[jira] [Updated] (SPARK-21849) Make the serializer function more robust
[ https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21849: Issue Type: Improvement (was: Bug) > Make the serializer function more robust > > > Key: SPARK-21849 > URL: https://issues.apache.org/jira/browse/SPARK-21849 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.2.0 > Reporter: DjvuLee > Priority: Minor > > Make sure the `close` function is called in the `serialize` function.
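The hardening the issue asks for amounts to closing the stream even when serialization fails. A minimal sketch in Scala, assuming a hypothetical `serialize` helper rather than Spark's actual serializer code:

{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Illustrative only: wrap the write in try/finally so `close` always runs,
// even if writeObject throws part-way through.
def serialize[T](value: T): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  try {
    out.writeObject(value)
  } finally {
    out.close()  // always executed, even on failure
  }
  bytes.toByteArray
}
{code}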
[jira] [Created] (SPARK-21849) Make the serializer function more robust
DjvuLee created SPARK-21849: --- Summary: Make the serializer function more robust Key: SPARK-21849 URL: https://issues.apache.org/jira/browse/SPARK-21849 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: DjvuLee Priority: Minor Make sure the `close` function is called in the `serialize` function.
[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120959#comment-16120959 ] DjvuLee commented on SPARK-21682: - Yes, our company also faced this scalability problem; the driver can easily die with a 70K-partition RDD. > Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?) > > > Key: SPARK-21682 > URL: https://issues.apache.org/jira/browse/SPARK-21682 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.0.2, 2.1.1, 2.2.0 > Reporter: Ryan Williams > > h3. Summary > * {{sc.parallelize(1 to 100000, 100000).cache.count}} causes a driver GC > stall midway through on every configuration and version I've tried in 2.x. > * It runs fine with no Full GCs as of 1.6.3 > * I think that {{internal.metrics.updatedBlockStatuses}} is the culprit, and > breaks a contract about what big-O sizes accumulators' values can be: > ** they are each of size O(P), where P is the number of partitions in a > cached RDD > ** ⇒ the driver must process O(P²) data from {{TaskEnd}} events, instead of > O(P) > ** ⇒ the driver also must process O(P*E) work every 10s from > {{ExecutorMetricsUpdates}} (where E is the number of executors; cf. > {{spark.executor.heartbeatInterval}}) > * when operating on a 100k-partition cached RDD, the driver enters a GC loop > due to all the allocations it must do to process {{ExecutorMetricsUpdate}} > and {{TaskEnd}} events with {{updatedBlockStatuses}} attached > * this metric should be disabled, or some ability to blacklist it from the > command-line should be added. > * [SPARK-20084|https://issues.apache.org/jira/browse/SPARK-20084] addressed > one part of this - the event-log size had exploded - but the root problem > still exists / is worse > h3. {{count}} a 100k-partition RDD: works fine without {{.cache}} > In Spark 2.2.0 or 2.1.1: > {code} > spark-shell --conf spark.driver.extraJavaOptions="-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps -verbose:gc" > scala> val rdd = sc.parallelize(1 to 100000, 100000) > scala> rdd.count > {code} > In YARN and local modes, this finishes in ~20 seconds with ~20 partial GCs > logged, all taking under 0.1s ([example > output|https://gist.github.com/ryan-williams/a3d02ea49d2b5021b53e1680f8fedc37#file-local-mode-rdd-not-cached-works-fine]); > all is well! > h3. {{count}} a 100k-partition cached RDD: GC-dies > If we {{cache}} the RDD first, the same {{count}} job quickly sends the > driver into a GC death spiral: full GCs start after a few thousand tasks and > increase in frequency and length until they last minutes / become continuous > (and, in YARN, the driver loses contact with any executors). > Example outputs: > [local|https://gist.github.com/ryan-williams/a3d02ea49d2b5021b53e1680f8fedc37#file-local-mode-rdd-cached-crashes], > [YARN|https://gist.github.com/ryan-williams/a3d02ea49d2b5021b53e1680f8fedc37#file-yarn-mode-cached-rdd-dies]. > The YARN example removes any confusion about whether the storing of the > blocks is causing memory pressure on the driver; the driver is basically > doing no work except receiving executor updates and events, and yet it > becomes overloaded. > h3. Can't effectively throw driver heap at the problem > I've tested with 1GB, 10GB, and 20GB heaps, and the larger heaps do what we'd > expect: delay the first Full GC, and make Full GCs longer when they happen. > I don't have a clear sense on whether the onset is linear or quadratic (i.e.
> do I get twice as far into the job before the first Full GC with a 20GB as > with a 10GB heap, or only sqrt(2) times as far?). > h3. Mostly memory pressure, not OOMs > An interesting note is that I'm rarely seeing OOMs as a result of this, even > on small heaps. > I think this is consistent with the idea that all this data is being > immediately discarded by the driver, as opposed to kept around to serve web > UIs or somesuch. > h3. Eliminating {{ExecutorMetricsUpdate}}'s doesn't seem to help > Interestingly, setting large values of {{spark.executor.heartbeatInterval}} > doesn't seem to mitigate the problem; GC-stall sets in at about the same > point in the {{count}} job. > This implies that, in this example, the {{TaskEnd}} events are doing most or > all of the damage. > h3. CMS helps but doesn't solve the problem > In some rough testing, I saw the {{count}} get about twice as far before > dying when using the CMS collector. > h3. What bandwidth do we expect the driver to process events at? > IIUC, every 10s the driver gets O(T) (~100k?) block updates from each of ~500 > executors, and allocating objects for these updates is pushing it over a > tipping point where it can't keep up. > I don't know how to get good numbers on how much data the driver is >
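A back-of-envelope sketch of the O(P²) claim above, using assumed (not measured) numbers: each of the P {{TaskEnd}} events can carry O(P) block statuses, so the driver may have to churn through on the order of P² status records over the course of the job.

{code}
// Assumed numbers for illustration; the issue reports P = 100k partitions.
val partitions = 100000L
val statusRecords = partitions * partitions // O(P^2) upper bound
println(s"up to $statusRecords block-status records across TaskEnd events")
// => up to 10000000000 block-status records
{code}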
[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106292#comment-16106292 ] DjvuLee commented on SPARK-21547: - OK, I will try it and post the result later. > Spark cleaner cost too many time > > > Key: SPARK-21547 > URL: https://issues.apache.org/jira/browse/SPARK-21547 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.0.0 > Reporter: DjvuLee > > Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. > I posted the driver's log in the comments below; we can see there that the cleaner costs more than 2 minutes.
[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103459#comment-16103459 ] DjvuLee commented on SPARK-21547: - Yes, I agree this is related to the workload, but doing nothing for about 3 minutes is too long for a streaming application. My proposal is that we inspect whether the current cleaner strategy is good enough. > Spark cleaner cost too many time > > > Key: SPARK-21547 > URL: https://issues.apache.org/jira/browse/SPARK-21547 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.0.0 > Reporter: DjvuLee > > Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. > I posted the driver's log in the comments below; we can see there that the cleaner costs more than 2 minutes.
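One knob worth checking while inspecting the cleaner strategy is {{spark.cleaner.referenceTracking.blocking}} (true by default), which makes the ContextCleaner block on each cleanup RPC. A sketch of turning it off as an experiment, not a recommended fix:

{code}
import org.apache.spark.SparkConf

// Experiment only: let cleanup RPCs fire without blocking on each one, so a
// burst of accumulator/broadcast cleanups does not stall the driver.
val conf = new SparkConf()
  .set("spark.cleaner.referenceTracking.blocking", "false")
{code}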
[jira] [Updated] (SPARK-21547) Spark cleaner cost too many time
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21547: Description: Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. I posted the driver's log in the comments below; we can see there that the cleaner costs more than 2 minutes. was: Spark Streaming sometime cost so many time deal with cleaning, and this can become worse when enable the dynamic allocation. I post the Driver Log in the following, in this log we can find that the cleaner cost more than 2min. > Spark cleaner cost too many time > > > Key: SPARK-21547 > URL: https://issues.apache.org/jira/browse/SPARK-21547 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.0.0 > Reporter: DjvuLee > > Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. > I posted the driver's log in the comments below; we can see there that the cleaner costs more than 2 minutes.
[jira] [Updated] (SPARK-21547) Spark cleaner cost too many time
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21547: Description: Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. I posted the driver log below; in this log we can see that the cleaner costs more than 2 minutes. was:Spark Streaming sometime cost so many time deal with cleaning, and this can become worse when enable the dynamic allocation. > Spark cleaner cost too many time > > > Key: SPARK-21547 > URL: https://issues.apache.org/jira/browse/SPARK-21547 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.0.0 > Reporter: DjvuLee > > Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. > I posted the driver log below; in this log we can see that the cleaner costs more than 2 minutes.
[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103005#comment-16103005 ] DjvuLee commented on SPARK-21547: - 17/07/27 11:29:51 INFO TaskSetManager: Finished task 169.0 in stage 1504.0 (TID 1504369) in 43975 ms on n6-195-137.byted.org (999/1000) 17/07/27 11:29:55 INFO TaskSetManager: Finished task 882.0 in stage 1504.0 (TID 1504905) in 44153 ms on n6-195-137.byted.org (1000/1000) 17/07/27 11:29:55 INFO YarnScheduler: Removed TaskSet 1504.0, whose tasks have all completed, from pool 17/07/27 11:29:55 INFO DAGScheduler: ResultStage 1504 (call at /spark2/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py:2230) finished in 457.863 s 17/07/27 11:29:55 INFO DAGScheduler: Job 1504 finished: call at /spark2/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py:2230, took 457.877969 s 17/07/27 11:30:02 INFO JobScheduler: Added jobs for time 150112620 ms 17/07/27 11:30:32 INFO JobScheduler: Added jobs for time 150112623 ms 17/07/27 11:31:02 INFO JobScheduler: Added jobs for time 150112626 ms 17/07/27 11:31:32 INFO JobScheduler: Added jobs for time 150112629 ms 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906391 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906392 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906396 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906402 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906404 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492509 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492508 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492507 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492506 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492505 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492504 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492503 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492502 ... 
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906397 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906398 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906395 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906399 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906403 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906400 17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906401 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 10.6.131.75:23734 in memory (size: 35.9 KB, free: 2.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-157-227.byted.org:13090 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-157-158.byted.org:21120 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n6-195-150.byted.org:13277 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-156-165.byted.org:35355 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n6-132-023.byted.org:52521 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-136-133.byted.org:25696 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-150-029.byted.org:34673 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-148-038.byted.org:22503 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on n8-150-038.byted.org:28209 in memory (size: 35.9 KB, free: 9.4 GB) ... 17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on n8-163-151.byted.org:33703 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on n8-148-028.byted.org:36086 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on n8-151-039.byted.org:21081 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on n8-157-167.byted.org:29370 in memory (size: 35.9 KB, free: 9.4 GB) 17/07/27 11:32:02 INFO JobScheduler: Added jobs for time 150112632 ms 17/07/27 11:32:32 INFO JobScheduler: Added jobs for time 150112635 ms 17/07/27 11:32:45 INFO JobScheduler: Finished job streaming job 150111696 ms.0 from job set of time 150111696 ms 17/07/27 11:32:45 INFO JobScheduler: Total delay: 9405.183 s for time 150111696 ms (execution: 1169.595 s) 17/07/27 11:32:45 INFO JobScheduler: Starting job streaming
[jira] [Updated] (SPARK-21547) Spark cleaner cost too many time
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21547: Description: Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled. > Spark cleaner cost too many time > > > Key: SPARK-21547 > URL: https://issues.apache.org/jira/browse/SPARK-21547 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.0.0 > Reporter: DjvuLee > > Spark Streaming sometimes spends a long time on cleaning, and this can become worse when dynamic allocation is enabled.
[jira] [Created] (SPARK-21547) Spark cleaner cost too many time
DjvuLee created SPARK-21547: --- Summary: Spark cleaner cost too many time Key: SPARK-21547 URL: https://issues.apache.org/jira/browse/SPARK-21547 Project: Spark Issue Type: Bug Components: DStreams Affects Versions: 2.0.0 Reporter: DjvuLee
[jira] [Updated] (SPARK-21383) YARN can allocate too many executors
[ https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21383: Summary: YARN can allocate too many executors (was: YARN can allocate to many executors) > YARN can allocate too many executors > > > Key: SPARK-21383 > URL: https://issues.apache.org/jira/browse/SPARK-21383 > Project: Spark > Issue Type: Bug > Components: YARN > Affects Versions: 2.0.0 > Reporter: Thomas Graves > > The YarnAllocator doesn't properly track containers that have been launched but are not yet running. If it takes time to launch the containers on the NM, they don't show up in numExecutorsRunning, but they are already out of the pending list, so if the allocateResources call happens again it can think it has missing executors even when it doesn't (they just haven't been launched yet). > This was introduced by SPARK-12447. > Where it checks for missing executors: > https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L297 > numExecutorsRunning is only updated after the NM has started the container: > https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L524 > Thus, if the NM or the network is slow, it can miscount and start additional executors.
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Affects Version/s: (was: 2.2.1) 2.3.0 > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.3.0 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049965#comment-16049965 ] DjvuLee commented on SPARK-21082: - Data locality, task input size, and scheduling order all matter a lot, even when all the nodes have the same computation capacity. Suppose there are two executors with the same computation capacity and four tasks with input sizes 10G, 3G, 10G, and 20G. There is then a chance that, under the current scheduling policy, one executor will cache 30GB and the other 13GB, as worked through below. If each executor has only 25GB of memory for storage, then not all the data can be cached in memory. I will give a more detailed description of the proposal if it seems OK so far. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
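Spelling out the arithmetic in that example under its stated assumptions (one possible assignment order, 25GB of storage memory per executor):

{code}
// Task input sizes in GB from the example above.
val tasks = Seq(10, 3, 10, 20)
// One possible outcome: executor A gets tasks 0 and 3, executor B gets 1 and 2.
val cachedA = 10 + 20 // 30 GB
val cachedB = 3 + 10  // 13 GB
val storagePerExecutor = 25
assert(cachedA > storagePerExecutor) // A cannot cache all of its partitions
{code}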
[jira] [Comment Edited] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049933#comment-16049933 ] DjvuLee edited comment on SPARK-21082 at 6/15/17 2:47 AM: -- This is not really a fast-node vs. slow-node problem. Even when all the nodes have equal computation power, many factors affect the data cached by executors, such as the data locality of each task's input, the network, the scheduling order, etc. `It is reasonable to schedule more tasks onto a fast node`, but what actually happens is that more tasks are scheduled onto idle executors. The scheduler has no notion of fast or slow for each executor; it mostly considers locality and idleness. I agree that it is better not to change the code, but I cannot find any configuration that solves the problem. Is there any good solution to keep the used memory balanced across executors? was (Author: djvulee): Not a really fast node and slow node problem. Even all the nodes have equal computation power, but there are lots of factor affect the data cached by Executors. Such as the data locality for the task's input, the network, and scheduling order etc. `it is reasonable to schedule more tasks on to fast node.` but the fact is schedule more tasks to ideal Executors. Scheduler has no meaning of fast or slow for each Executor, it considers more about locality and idle. I agree that it is better not to change the code, but I can not find any configuration to solve the problem. Is there any good solution to keep the used memory balanced across the Executors? > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049933#comment-16049933 ] DjvuLee commented on SPARK-21082: - This is not really a fast-node vs. slow-node problem. Even when all the nodes have equal computation power, many factors affect the data cached by executors, such as the data locality of each task's input, the network, the scheduling order, etc. `It is reasonable to schedule more tasks onto a fast node`, but what actually happens is that more tasks are scheduled onto idle executors. The scheduler has no notion of fast or slow for each executor; it mostly considers locality and idleness. I agree that it is better not to change the code, but I cannot find any configuration that solves the problem. Is there any good solution to keep the used memory balanced across the executors? > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Description: The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. We can offer a configuration letting the user decide whether the scheduler should consider memory usage. was: Spark Scheduler do not consider the memory usage during dispatch tasks, this can lead to Executor OOM if the RDD is cached sometimes, because Spark can not estimate the memory usage enough well(especially when the RDD type is not flatten), scheduler may dispatch so many tasks on one Executor. We can offer a configuration for user to decide whether scheduler will consider the memory usage. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048835#comment-16048835 ] DjvuLee commented on SPARK-21082: - Yes, one reason Spark does not balance tasks well enough is data locality. Considering data locality is good in most cases, but when we want to cache an RDD and run many analyses over it, memory balance is more important than keeping data locality while loading the data. If we cannot get all of these considerations right, offering a configuration to users is valuable when dealing with memory. I will open a pull request soon if this suggestion is not defective at first sight. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Comment Edited] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048653#comment-16048653 ] DjvuLee edited comment on SPARK-21082 at 6/14/17 3:15 AM: -- [~srowen] This situation occurs when the partition number is larger than the number of CPU cores. Consider 1000 partitions and 100 CPU cores, where we want to cache an RDD across all the executors. If one executor executes tasks faster at first, the scheduler will dispatch more tasks to it, so after all the tasks are scheduled, some executors will have used all of their storage memory while others use very little. The executors that used more memory may not be able to cache every RDD partition scheduled onto them, because there is no memory left for some tasks. In this situation, we cannot cache all the partitions even though we have enough memory overall. What's more, if OOM occurs on some executors during subsequent computation, the scheduler may dispatch tasks to executors that have no storage memory left, which can sometimes lead to more and more OOMs if Spark cannot estimate the memory well enough. Having the scheduler try to schedule tasks onto executors with more free memory would ease this situation. Maybe we could use `coalesce` to decrease the partition number, but this does not work well with speculation. was (Author: djvulee): [~srowen] This situation occurred when the partition number is larger than the CPU core. Consider there are 1000 partition and 100 CPU core, we want cache RDD among all the Executors. If one Executor executes tasks fast at first time, then the scheduler will dispatch more tasks to it, so after all the tasks is scheduled, some Executors will used all the storage memory, but some Executors just use few memory, Executors which used more memory may not cache all the RDD partition scheduled on it, because there is no more memory for some tasks. Under this situation, we can not cache all the partition even we have enough memory. What's more, if some Executors occurred OOM during following compute, the scheduler may dispatch tasks to Executor which have no more storage memory, and sometimes can lead to more and more OOM if Spark can not estimate the memory. But if the scheduler try to schedule tasks to Executors which own more free memory can ease this situation. Maybe we can use the `coalesce` to decrease the partition number, but this is not good enough for speculating. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048654#comment-16048654 ] DjvuLee commented on SPARK-21082: - My idea is to take the BlockManager information into account during scheduling, if the user specifies the configuration. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
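A minimal sketch of that idea, with hypothetical names ({{Offer}}, {{freeStorageMemory}}, {{memoryAware}}) standing in for the scheduler's real types: when the proposed flag is on, order resource offers by free storage memory instead of shuffling them.

{code}
import scala.util.Random

// Hypothetical stand-in for the scheduler's per-executor resource offer.
case class Offer(executorId: String, cores: Int, freeStorageMemory: Long)

// Prefer executors with the most free storage memory when the proposed
// memory-aware flag is set; otherwise keep today's random order.
def orderOffers(offers: Seq[Offer], memoryAware: Boolean): Seq[Offer] =
  if (memoryAware) offers.sortBy(-_.freeStorageMemory)
  else Random.shuffle(offers)
{code}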
[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048653#comment-16048653 ] DjvuLee commented on SPARK-21082: - [~srowen] This situation occurs when the partition number is larger than the number of CPU cores. Consider 1000 partitions and 100 CPU cores, where we want to cache an RDD across all the executors. If one executor executes tasks faster at first, the scheduler will dispatch more tasks to it, so after all the tasks are scheduled, some executors will have used all of their storage memory while others use very little. The executors that used more memory may not be able to cache every RDD partition scheduled onto them, because there is no memory left for some tasks. In this situation, we cannot cache all the partitions even though we have enough memory overall. What's more, if OOM occurs on some executors during subsequent computation, the scheduler may dispatch tasks to executors that have no storage memory left, which can sometimes lead to more and more OOMs if Spark cannot estimate the memory well. Having the scheduler try to schedule tasks onto executors with more free memory would ease this situation. Maybe we could use `coalesce` to decrease the partition number, but this does not work well with speculation. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
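One way to observe the described imbalance from a spark-shell session, assuming a cluster with more partitions than total cores; {{sc.getExecutorMemoryStatus}} reports (max, remaining) storage memory per executor:

{code}
val rdd = sc.parallelize(1 to 1000000, 1000).cache()
rdd.count() // materialize the cache across executors

// A wide spread in used memory here is the imbalance this issue describes.
sc.getExecutorMemoryStatus.foreach { case (executor, (max, remaining)) =>
  val usedMB = (max - remaining) >> 20
  println(s"$executor: used $usedMB MB of ${max >> 20} MB")
}
{code}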
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Description: The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. We can offer a configuration letting the user decide whether the scheduler should consider memory usage. was: Spark Scheduler do not consider the memory usage during dispatch tasks, this can lead to Executor OOM if the RDD is cached sometimes, because Spark can not estimate the memory usage enough well(especially when the RDD type is not flatten), scheduler may dispatch so many task on one Executor. We can offer a configuration for user to decide whether scheduler will consider the memory usage. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048146#comment-16048146 ] DjvuLee commented on SPARK-21082: - If this feature is a good suggestion (we did encounter this problem in practice), I will open a pull request. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Description: The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. We can offer a configuration letting the user decide whether the scheduler should consider memory usage. was: Spark Scheduler do not consider the memory usage during dispatch tasks, this can lead to Executor OOM if the RDD is cached sometimes, because Spark can not estimate the memory usage enough well(especially when the RDD type is not flatten). We can offer a configuration for user to decide whether scheduler will consider the memory usage to relief the OOM. > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat), so the scheduler may dispatch too many tasks onto one executor. > We can offer a configuration letting the user decide whether the scheduler should consider memory usage.
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Description: The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat). We can offer a configuration letting the user decide whether the scheduler should consider memory usage, to relieve the OOM. (was: When we cache the ) > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > The Spark scheduler does not consider memory usage when dispatching tasks. This can sometimes lead to executor OOM when the RDD is cached, because Spark cannot estimate the memory usage well enough (especially when the RDD type is not flat). We can offer a configuration letting the user decide whether the scheduler should consider memory usage, to relieve the OOM.
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Description: When we cache the > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > When we cache the
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Component/s: Scheduler > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee > > When we cache the
[jira] [Created] (SPARK-21082) Consider the Executor's Memory usage when scheduling task
DjvuLee created SPARK-21082: --- Summary: Consider the Executor's Memory usage when scheduling task Key: SPARK-21082 URL: https://issues.apache.org/jira/browse/SPARK-21082 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.2.1 Reporter: DjvuLee
[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task
[ https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21082: Summary: Consider Executor's memory usage when scheduling task (was: Consider the Executor's Memory usage when scheduling task ) > Consider Executor's memory usage when scheduling task > -- > > Key: SPARK-21082 > URL: https://issues.apache.org/jira/browse/SPARK-21082 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.2.1 > Reporter: DjvuLee >
[jira] [Commented] (SPARK-21064) Fix the default value bug in NettyBlockTransferServiceSuite
[ https://issues.apache.org/jira/browse/SPARK-21064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046475#comment-16046475 ] DjvuLee commented on SPARK-21064: - The default value for `spark.port.maxRetries` is 100, but we use 10 in the suite file. > Fix the default value bug in NettyBlockTransferServiceSuite > --- > > Key: SPARK-21064 > URL: https://issues.apache.org/jira/browse/SPARK-21064 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.2.0 > Reporter: DjvuLee >
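A sketch of the kind of fix implied here (illustrative, not the actual patch): pin the retry budget explicitly in the suite's configuration so the test and the default cannot silently disagree.

{code}
import org.apache.spark.SparkConf

// Make the assumed retry budget explicit in the test configuration
// instead of hard-coding a different number than the default.
val conf = new SparkConf()
  .set("spark.port.maxRetries", "100")
{code}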
[jira] [Created] (SPARK-21064) Fix the default value bug in NettyBlockTransferServiceSuite
DjvuLee created SPARK-21064: --- Summary: Fix the default value bug in NettyBlockTransferServiceSuite Key: SPARK-21064 URL: https://issues.apache.org/jira/browse/SPARK-21064 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: DjvuLee
[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040633#comment-16040633 ] DjvuLee commented on SPARK-18085: - [~vanzin] Loading the history summary page still takes quite a while. Is there any further plan to address this? > Better History Server scalability for many / large applications > --- > > Key: SPARK-18085 > URL: https://issues.apache.org/jira/browse/SPARK-18085 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI > Affects Versions: 2.0.0 > Reporter: Marcelo Vanzin > Attachments: spark_hs_next_gen.pdf > > > It's a known fact that the History Server currently has some annoying issues > when serving lots of applications, and when serving large applications. > I'm filing this umbrella to track work related to addressing those issues. > I'll be attaching a document shortly describing the issues and suggesting a > path to how to solve them.
[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038210#comment-16038210 ] DjvuLee commented on SPARK-18085: - [~vanzin] I want to try your branch. Is all the information the same as what you posted above? If there is anything I can help with, you can post the work below. > Better History Server scalability for many / large applications > --- > > Key: SPARK-18085 > URL: https://issues.apache.org/jira/browse/SPARK-18085 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI > Affects Versions: 2.0.0 > Reporter: Marcelo Vanzin > Attachments: spark_hs_next_gen.pdf > > > It's a known fact that the History Server currently has some annoying issues > when serving lots of applications, and when serving large applications. > I'm filing this umbrella to track work related to addressing those issues. > I'll be attaching a document shortly describing the issues and suggesting a > path to how to solve them.
[jira] [Updated] (SPARK-18085) Better History Server scalability for many / large applications
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-18085: Yes, you're right. I just want to impact as little as possible. > Better History Server scalability for many / large applications > --- > > Key: SPARK-18085 > URL: https://issues.apache.org/jira/browse/SPARK-18085 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI > Affects Versions: 2.0.0 > Reporter: Marcelo Vanzin > Attachments: spark_hs_next_gen.pdf > > > It's a known fact that the History Server currently has some annoying issues > when serving lots of applications, and when serving large applications. > I'm filing this umbrella to track work related to addressing those issues. > I'll be attaching a document shortly describing the issues and suggesting a > path to how to solve them.
[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896652#comment-15896652 ] DjvuLee commented on SPARK-18085: - "A separate jar file" means we generate a new jar file for the history function, just like Spark put the `network` functionality in its own jar file rather than in Spark core; I just don't want to impact the existing jar files. You can update the information here when this design is complete; many people may want to try it. Thanks for your reply! > Better History Server scalability for many / large applications > --- > > Key: SPARK-18085 > URL: https://issues.apache.org/jira/browse/SPARK-18085 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI > Affects Versions: 2.0.0 > Reporter: Marcelo Vanzin > Attachments: spark_hs_next_gen.pdf > > > It's a known fact that the History Server currently has some annoying issues > when serving lots of applications, and when serving large applications. > I'm filing this umbrella to track work related to addressing those issues. > I'll be attaching a document shortly describing the issues and suggesting a > path to how to solve them.
[jira] [Commented] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896107#comment-15896107 ] DjvuLee commented on SPARK-19823: - [~zsxwing] Can you take a look? > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.0.2 > Reporter: DjvuLee >
[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096 ] DjvuLee edited comment on SPARK-19823 at 3/5/17 7:19 AM: - When Spark distributes tasks to executors, it uses a round-robin approach: tasks are spread across all executors as much as possible, which is a good strategy in most cases. However, once dynamic allocation and concurrent jobs are introduced, users may want gang distribution for a single job, meaning that the job's tasks are packed onto a few executors so that the unused resources can be released. Under the current approach, executors may never be released, since a long-running stage will place at least one task on each executor, even when each executor has many cores. We can offer a configuration for users, and this will not introduce much complexity. We only need to modify this line: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L340 val shuffledOffers = shuffleOffers(filteredOffers) was (Author: djvulee): When Spark distributes tasks to Executors, it uses a Round-Robin way, this means we distribute tasks to all Executors as possible, this is a good strategy in most case. However, when we introduce the dynamic allocation and concurrent jobs, users may need the gang distribution for one single job, this means we allocate tasks for one job in few Executors, so we can release the unused resource. Under current way, Executors may not get released since the long-running stage will distributed at lease one task in each Executor, even we have more cores in each Executor. We can offer a configuration for users, and this will not introduce much complexity. > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.0.2 > Reporter: DjvuLee >
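A hedged sketch of the proposed change at that line, behind a hypothetical flag (not a tested patch): sort the offers so that busy executors fill up first instead of spreading tasks randomly, leaving idle executors empty for dynamic allocation to release.

{code}
import scala.util.Random

// Hypothetical stand-in for the scheduler's WorkerOffer.
case class WorkerOffer(executorId: String, host: String, freeCores: Int)

// Packing order: executors with fewer free cores (already busy) come first,
// so tasks concentrate on them; otherwise keep the existing random shuffle.
def shuffleOffers(offers: IndexedSeq[WorkerOffer], gang: Boolean): IndexedSeq[WorkerOffer] =
  if (gang) offers.sortBy(_.freeCores)
  else Random.shuffle(offers)
{code}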
[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096 ] DjvuLee edited comment on SPARK-19823 at 3/5/17 7:10 AM: - When Spark distributes tasks to executors, it uses a round-robin approach: tasks are spread across all executors as much as possible, which is a good strategy in most cases. However, once dynamic allocation and concurrent jobs are introduced, users may want gang distribution for a single job, meaning that the job's tasks are packed onto a few executors so that the unused resources can be released. Under the current approach, executors may never be released, since a long-running stage will place at least one task on each executor, even when each executor has many cores. We can offer a configuration for users, and this will not introduce much complexity. was (Author: djvulee): When Spark distributes tasks to Executors, it uses a Round-Robin way, this means we distribute tasks to all Executors as possible, this is a good strategy in most case. However, when we introduce the dynamic allocation and concurrent jobs, users may need the gang distribution for one single job, this means we allocate tasks for one job in few Executors, so we can release the unused resource. In current ways, Executors may not get released since the long-running stage will distributed at lease one task in each Executors, even we have more cores in each Executor. We can offer a configuration for users, and this will not introduce much complexity. > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.0.2 > Reporter: DjvuLee >
[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096 ] DjvuLee edited comment on SPARK-19823 at 3/5/17 7:10 AM: - When Spark distributes tasks to executors, it uses a round-robin approach: tasks are spread across all executors as much as possible, which is a good strategy in most cases. However, once dynamic allocation and concurrent jobs are introduced, users may want gang distribution for a single job, meaning that the job's tasks are packed onto a few executors so that the unused resources can be released. Under the current approach, executors may never be released, since a long-running stage will place at least one task on each executor, even when each executor has many cores. We can offer a configuration for users, and this will not introduce much complexity. was (Author: djvulee): When Spark distributes tasks to Executors, it uses a Round-Robin way, this means we distribute tasks to all Executors as possible, this is a good strategy in most case. However, when we introduce the dynamic allocation and concurrent jobs, users may need the gang distribution for one single job, this means we allocate tasks for one job in few Executors, so we can release the unused resource. In current ways, Executors may not get released since the long-running stage will distributed at lease one task in each Executor, even we have more cores in each Executor. We can offer a configuration for users, and this will not introduce much complexity. > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core > Affects Versions: 2.0.2 > Reporter: DjvuLee >
[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096 ] DjvuLee edited comment on SPARK-19823 at 3/5/17 7:10 AM: - When Spark distributes tasks to Executors, it uses a Round-Robin strategy, which spreads tasks across all Executors as evenly as possible; this is a good strategy in most cases. However, once dynamic allocation and concurrent jobs come into play, users may want gang distribution for a single job, i.e. packing the tasks of one job onto a few Executors so that the unused resources can be released. With the current strategy, Executors may never get released, because a long-running stage will place at least one task on each Executor even when each Executor has spare cores. We can offer a configuration option for users, and this will not introduce much complexity. was (Author: djvulee): When Spark distributes tasks to Executors, it uses a Round-Robin way, this means we distribute tasks to all Executors as possible, this is a good strategy in most case. However, when we introduce the dynamic allocation and concurrent jobs, users may need the gang distribution for one single job, this means we allocate tasks for one job in few Executors, so we can release the unused resource. In current ways, Executors may not get released since the long-running stage will distributed on task in each Executors, even we have more cores in each Executor. We can offer a configuration for users, and this will not introduce much complexity. > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 2.0.2 >Reporter: DjvuLee > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096 ] DjvuLee commented on SPARK-19823: - When Spark distributes tasks to Executors, it uses a Round-Robin strategy, which spreads tasks across all Executors as evenly as possible; this is a good strategy in most cases. However, once dynamic allocation and concurrent jobs come into play, users may want gang distribution for a single job, i.e. packing the tasks of one job onto a few Executors so that the unused resources can be released. With the current strategy, Executors may never get released, because a long-running stage will place at least one task on each Executor even when each Executor has spare cores. We can offer a configuration option for users, and this will not introduce much complexity. > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 2.0.2 >Reporter: DjvuLee > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19823) Support Gang Distribution of Task
[ https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896097#comment-15896097 ] DjvuLee commented on SPARK-19823: - If this sounds like a good idea, I will open a Pull Request. > Support Gang Distribution of Task > -- > > Key: SPARK-19823 > URL: https://issues.apache.org/jira/browse/SPARK-19823 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 2.0.2 >Reporter: DjvuLee > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19823) Support Gang Distribution of Task
DjvuLee created SPARK-19823: --- Summary: Support Gang Distribution of Task Key: SPARK-19823 URL: https://issues.apache.org/jira/browse/SPARK-19823 Project: Spark Issue Type: Improvement Components: Scheduler, Spark Core Affects Versions: 2.0.2 Reporter: DjvuLee -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
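To make the Round-Robin vs. gang distinction in the comments above concrete, here is a toy sketch in plain Python. This is not Spark's scheduler code; all names and the structure are invented for illustration only.

{code}
# Toy sketch: Round-Robin spreading vs. gang-style packing of tasks.
def round_robin(num_tasks, executors):
    """Spread tasks over all executors in turn (the current behaviour)."""
    assignment = {e: [] for e in executors}
    for t in range(num_tasks):
        assignment[executors[t % len(executors)]].append(t)
    return assignment

def gang_pack(num_tasks, executors, cores_per_executor):
    """Fill one executor up to its core count before touching the next.
    Assumes num_tasks <= len(executors) * cores_per_executor."""
    assignment = {e: [] for e in executors}
    for t in range(num_tasks):
        assignment[executors[t // cores_per_executor]].append(t)
    return assignment

executors = ["exec-1", "exec-2", "exec-3", "exec-4"]
print(round_robin(6, executors))
# {'exec-1': [0, 4], 'exec-2': [1, 5], 'exec-3': [2], 'exec-4': [3]}
# Every executor holds a task, so dynamic allocation can release none of them.
print(gang_pack(6, executors, cores_per_executor=4))
# {'exec-1': [0, 1, 2, 3], 'exec-2': [4, 5], 'exec-3': [], 'exec-4': []}
# exec-3 and exec-4 stay idle and could be released.
{code}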
[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896070#comment-15896070 ] DjvuLee commented on SPARK-18085: - [~vanzin] Thanks for your reply! Will your new solution generate a separate jar file? I would like to try it if so. The History Server problem is an issue in our production environment now. > Better History Server scalability for many / large applications > --- > > Key: SPARK-18085 > URL: https://issues.apache.org/jira/browse/SPARK-18085 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin > Attachments: spark_hs_next_gen.pdf > > > It's a known fact that the History Server currently has some annoying issues > when serving lots of applications, and when serving large applications. > I'm filing this umbrella to track work related to addressing those issues. > I'll be attaching a document shortly describing the issues and suggesting a > path to how to solve them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19821) Throw out the Read-only disk information when create file for Shuffle
[ https://issues.apache.org/jira/browse/SPARK-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896067#comment-15896067 ] DjvuLee commented on SPARK-19821: - Currently, when the disk is read-only, we just throw the FileNotFoundException. We can do better by reporting that the disk is read-only, and maybe we can achieve better fault tolerance for a single task. > Throw out the Read-only disk information when create file for Shuffle > - > > Key: SPARK-19821 > URL: https://issues.apache.org/jira/browse/SPARK-19821 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 2.0.2 >Reporter: DjvuLee > > java.io.FileNotFoundException: > /data01/yarn/nmdata/usercache/tiger/appcache/application_1486364177723_1047735/blockmgr-23098754-a97a-4673-ba73-3de5e167da87/2c/shuffle_55_47_0.index.0347f74b-a9c1-473e-b81f-40be394cc00f > (Input/output error) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at java.io.FileOutputStream.(FileOutputStream.java:162) > at > org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:143) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:219) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19821) Throw out the Read-only disk information when create file for Shuffle
[ https://issues.apache.org/jira/browse/SPARK-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-19821: Description: java.io.FileNotFoundException: /data01/yarn/nmdata/usercache/tiger/appcache/application_1486364177723_1047735/blockmgr-23098754-a97a-4673-ba73-3de5e167da87/2c/shuffle_55_47_0.index.0347f74b-a9c1-473e-b81f-40be394cc00f (Input/output error) at java.io.FileOutputStream.open0(Native Method) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.(FileOutputStream.java:213) at java.io.FileOutputStream.(FileOutputStream.java:162) at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:143) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:219) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) > Throw out the Read-only disk information when create file for Shuffle > - > > Key: SPARK-19821 > URL: https://issues.apache.org/jira/browse/SPARK-19821 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 2.0.2 >Reporter: DjvuLee > > java.io.FileNotFoundException: > /data01/yarn/nmdata/usercache/tiger/appcache/application_1486364177723_1047735/blockmgr-23098754-a97a-4673-ba73-3de5e167da87/2c/shuffle_55_47_0.index.0347f74b-a9c1-473e-b81f-40be394cc00f > (Input/output error) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at java.io.FileOutputStream.(FileOutputStream.java:162) > at > org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:143) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:219) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19821) Throw out the Read-only disk information when create file for Shuffle
DjvuLee created SPARK-19821: --- Summary: Throw out the Read-only disk information when create file for Shuffle Key: SPARK-19821 URL: https://issues.apache.org/jira/browse/SPARK-19821 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 2.0.2 Reporter: DjvuLee -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
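A minimal sketch of the improvement suggested in the comment above, written in plain Python for illustration (the real change would live in the JVM-side shuffle code, e.g. around IndexShuffleBlockResolver): when file creation fails, check whether the parent directory is writable and raise a clearer error.

{code}
import os

def create_shuffle_file(path):
    """Sketch only: open a shuffle file, reporting a read-only directory
    explicitly instead of surfacing a bare file-creation error."""
    try:
        return open(path, "wb")
    except OSError:
        parent = os.path.dirname(path)
        if os.path.isdir(parent) and not os.access(parent, os.W_OK):
            raise OSError(
                "cannot create %s: directory %s is not writable "
                "(disk may be mounted read-only)" % (path, parent))
        raise
{code}

With a message like this, an operator (or a retry policy) can tell a read-only disk apart from a genuinely missing directory, which is the first step toward better fault tolerance for the single failed task.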
[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894074#comment-15894074 ] DjvuLee commented on SPARK-18085: - [~vanzin] This is a nice design. There is not much information about deletion, though. The history log can become large after a few weeks; will this local DB delete old data as specified by the configuration? > Better History Server scalability for many / large applications > --- > > Key: SPARK-18085 > URL: https://issues.apache.org/jira/browse/SPARK-18085 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin > Attachments: spark_hs_next_gen.pdf > > > It's a known fact that the History Server currently has some annoying issues > when serving lots of applications, and when serving large applications. > I'm filing this umbrella to track work related to addressing those issues. > I'll be attaching a document shortly describing the issues and suggesting a > path to how to solve them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17300) ClosedChannelException caused by missing block manager when speculative tasks are killed
[ https://issues.apache.org/jira/browse/SPARK-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853630#comment-15853630 ] DjvuLee commented on SPARK-17300: - [~rdblue] Was this issue addressed by a later fix? > ClosedChannelException caused by missing block manager when speculative tasks > are killed > > > Key: SPARK-17300 > URL: https://issues.apache.org/jira/browse/SPARK-17300 > Project: Spark > Issue Type: Bug >Reporter: Ryan Blue > > We recently backported SPARK-10530 to our Spark build, which kills > unnecessary duplicate/speculative tasks when one completes (either a > speculative task or the original). In large jobs with 500+ executors, this > caused some executors to die and resulted in the same error that was fixed by > SPARK-15262: ClosedChannelException when trying to connect to the block > manager on affected hosts. > {code} > java.nio.channels.ClosedChannelException > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) > at > io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) > at > io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.nio.channels.ClosedChannelException > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19327) Improve the partition method when read data from jdbc
DjvuLee created SPARK-19327: --- Summary: Improve the partition method when read data from jdbc Key: SPARK-19327 URL: https://issues.apache.org/jira/browse/SPARK-19327 Project: Spark Issue Type: Improvement Components: SQL Reporter: DjvuLee The partition method in `jdbc`, when the column is specified, splits the value range into equal-width steps, which can lead to skew between partitions. The new method introduces balanced partitioning based on the number of elements, which avoids the skew problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
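For reference, the equal-step scheme the description refers to looks roughly like the following simplified Python sketch of how a column plus lowerBound/upperBound/numPartitions are turned into WHERE clauses (not the exact Spark implementation):

{code}
def column_partition(column, lower, upper, num_partitions):
    """Each partition covers an equal-width range of column values, so if
    the values are skewed, the row counts per partition are skewed too."""
    stride = (upper - lower) // num_partitions
    clauses = []
    for i in range(num_partitions):
        lo = lower + i * stride
        hi = lo + stride
        if i == 0:
            clauses.append("%s < %d" % (column, hi))
        elif i == num_partitions - 1:
            clauses.append("%s >= %d" % (column, lo))
        else:
            clauses.append("%s >= %d AND %s < %d" % (column, lo, column, hi))
    return clauses

print(column_partition("id", 0, 1000, 4))
# ['id < 250', 'id >= 250 AND id < 500', 'id >= 500 AND id < 750', 'id >= 750']
{code}

A balanced method would instead pick the partition boundaries from (approximate) quantiles of the column, so that each partition holds a similar number of rows.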
[jira] [Updated] (SPARK-19239) Check the lowerBound and upperBound whether equal None in jdbc API
[ https://issues.apache.org/jira/browse/SPARK-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-19239: Summary: Check the lowerBound and upperBound whether equal None in jdbc API (was: Check the lowerBound and upperBound equal None in jdbc API) > Check the lowerBound and upperBound whether equal None in jdbc API > -- > > Key: SPARK-19239 > URL: https://issues.apache.org/jira/browse/SPARK-19239 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: DjvuLee > > When we use the ``jdbc`` in pyspark, if we check the lowerBound and > upperBound, we can give a more friendly suggestion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19239) Check the lowerBound and upperBound equal None in jdbc API
[ https://issues.apache.org/jira/browse/SPARK-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-19239: Description: When we use ``jdbc`` in PySpark, checking whether lowerBound and upperBound are None lets us give a friendlier error message. > Check the lowerBound and upperBound equal None in jdbc API > -- > > Key: SPARK-19239 > URL: https://issues.apache.org/jira/browse/SPARK-19239 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: DjvuLee > > When we use ``jdbc`` in PySpark, checking whether lowerBound and > upperBound are None lets us give a friendlier error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19239) Check the lowerBound and upperBound equal None in jdbc API
DjvuLee created SPARK-19239: --- Summary: Check the lowerBound and upperBound equal None in jdbc API Key: SPARK-19239 URL: https://issues.apache.org/jira/browse/SPARK-19239 Project: Spark Issue Type: Improvement Components: PySpark Reporter: DjvuLee -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
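A minimal sketch of the suggested check, with a hypothetical helper name (the real change would live in PySpark's DataFrameReader.jdbc): validate the partitioning arguments up front and fail with a friendly message instead of a confusing error later.

{code}
def check_jdbc_partition_args(column, lowerBound, upperBound, numPartitions):
    """Hypothetical validation helper for the jdbc() partitioning arguments."""
    if column is not None and (lowerBound is None or upperBound is None
                               or numPartitions is None):
        raise ValueError(
            "When 'column' is specified, 'lowerBound', 'upperBound' and "
            "'numPartitions' must all be specified as well (got "
            "lowerBound=%r, upperBound=%r, numPartitions=%r)"
            % (lowerBound, upperBound, numPartitions))
{code}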
[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403 ] DjvuLee edited comment on SPARK-18778 at 12/8/16 7:46 AM: -- When I simply run ./bin/spark-shell in our environment, the spark-shell hits the above error. I can fix it by passing -usejavacp directly to the spark-shell, i.e. running ./bin/spark-shell -usejavacp. My environment is jdk1.8.0_91, we do not have Scala installed, and the OS is Debian 4.6.4. was (Author: djvulee): When I just run the ./bin/spark-shell under our environment, the spark-shell occurs the above error. I can fix it by pass the -usejavacp directly to the spark-shell, like running ./bin/spark-shell -usejavacp My environment is jdk1.8.0_91, and we do not install the scala. the -Dscala.usejavacp=true in bin/spark-shell seems not work. > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731455#comment-15731455 ] DjvuLee commented on SPARK-18778: - [~srowen] [~andrewor14] Can you take a look? > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431 ] DjvuLee edited comment on SPARK-18778 at 12/8/16 7:38 AM: -- I gave a fix in https://github.com/apache/spark/pull/16210. This seems a little weird, since -Dscala.usejavacp=true was meant to fix this: [https://issues.apache.org/jira/browse/SPARK-4161] was (Author: djvulee): I give a fix in the https://github.com/apache/spark/pull/16210. This seems a little wired, since the -Dscala.usejavacp=true is try to fix this. > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431 ] DjvuLee edited comment on SPARK-18778 at 12/8/16 7:38 AM: -- I gave a fix in https://github.com/apache/spark/pull/16210. This seems a little weird, since -Dscala.usejavacp=true was meant to fix this: [https://issues.apache.org/jira/browse/SPARK-4161] was (Author: djvulee): I give a fix in the https://github.com/apache/spark/pull/16210. This seems a little wired, since the -Dscala.usejavacp=true is try to fix this.[https://issues.apache.org/jira/browse/SPARK-4161] > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431 ] DjvuLee edited comment on SPARK-18778 at 12/8/16 7:35 AM: -- I gave a fix in https://github.com/apache/spark/pull/16210. This seems a little weird, since -Dscala.usejavacp=true was meant to fix this. was (Author: djvulee): I give a fix in the https://github.com/apache/spark/pull/16210 > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431 ] DjvuLee commented on SPARK-18778: - I gave a fix in https://github.com/apache/spark/pull/16210 > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403 ] DjvuLee commented on SPARK-18778: - When I simply run ./bin/spark-shell in our environment, the spark-shell hits the above error. I can fix it by passing -usejavacp directly to the spark-shell, i.e. running ./bin/spark-shell -usejavacp. My environment is jdk1.8.0_91, we do not have Scala installed, and the -Dscala.usejavacp=true in bin/spark-shell does not seem to work. > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403 ] DjvuLee edited comment on SPARK-18778 at 12/8/16 7:25 AM: -- When I simply run ./bin/spark-shell in our environment, the spark-shell hits the above error. I can fix it by passing -usejavacp directly to the spark-shell, i.e. running ./bin/spark-shell -usejavacp. My environment is jdk1.8.0_91, we do not have Scala installed, and the -Dscala.usejavacp=true in bin/spark-shell does not seem to work. was (Author: djvulee): When I just run the ./bin/spark-shell under our environment, the spark-shell occurs the above error. I can fix it by pass the -usejavacp directly to the spark-shell, like running ./bin/spark-sell -usejavacp My environment is jdk1.8.0_91, and we do not install the scala. the -Dscala.usejavacp=true in bin/spark-shell seems not work. > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-18778: Affects Version/s: 1.6.1 2.0.2 > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. > Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18778) Fix the Scala classpath in the spark-shell
[ https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-18778: Description: Failed to initialize compiler: object scala.runtime in compiler mirror not found. ** Note that as of 2.8 scala does not assume use of the java classpath. ** For the old behavior pass -usejavacp to scala, or if using a Settings ** object programatically, settings.usejavacp.value = true. Exception in thread "main" java.lang.AssertionError: assertion failed: null at scala.Predef$.assert(Predef.scala:179) at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Fix the Scala classpath in the spark-shell > -- > > Key: SPARK-18778 > URL: https://issues.apache.org/jira/browse/SPARK-18778 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.2 >Reporter: DjvuLee > > Failed to initialize compiler: object scala.runtime in compiler mirror not > found. > ** Note that as of 2.8 scala does not assume use of the java classpath. > ** For the old behavior pass -usejavacp to scala, or if using a Settings > ** object programatically, settings.usejavacp.value = true. 
> Exception in thread "main" java.lang.AssertionError: assertion failed: null > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18778) Fix the Scala classpath in the spark-shell
DjvuLee created SPARK-18778: --- Summary: Fix the Scala classpath in the spark-shell Key: SPARK-18778 URL: https://issues.apache.org/jira/browse/SPARK-18778 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: DjvuLee -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18181) Huge managed memory leak (2.7G) when running reduceByKey
[ https://issues.apache.org/jira/browse/SPARK-18181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685568#comment-15685568 ] DjvuLee commented on SPARK-18181: - [~barrybecker4] Can you reproduce this on Spark 2.x? > Huge managed memory leak (2.7G) when running reduceByKey > > > Key: SPARK-18181 > URL: https://issues.apache.org/jira/browse/SPARK-18181 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2 >Reporter: Barry Becker > > For a while now, I have noticed messages like > 16/10/31 09:44:25 ERROR Executor: Managed memory leak detected; size = > 5251642 bytes, TID = 64204 > when running jobs with spark 1.6.2. > I have seen others post bugs on this, but they are all marked fixed in > earlier versions. I am certain that the issue still exists in 1.6.2. > In the following case, I can even get it to leak 2.7G all at once. > The message is: > 16/10/31 11:12:47 ERROR Executor: Managed memory leak detected; size = > 2724723111 bytes, TID = 18 > The code snippet causing it is: > {code} > val nonZeros: RDD[((Int, Float), Array[Long])] = > featureValues.map(y => (y._1._1 + "," + y._1._2, y._2)).reduceByKey { > case (v1, v2) => > (v1, v2).zipped.map(_ + _) > }.map(y => { > val s = y._1.split(",") > ((s(0).toInt, s(1).toFloat), y._2) > }) > {code} > and the stack trace is: > {code} > 16/10/31 11:12:47 ERROR Executor: Exception in task 0.0 in stage 11.0 (TID 18) > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofLong.mkArray(ArrayBuilder.scala:388) > at > scala.collection.mutable.ArrayBuilder$ofLong.resize(ArrayBuilder.scala:394) > at > scala.collection.mutable.ArrayBuilder$ofLong.sizeHint(ArrayBuilder.scala:399) > at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69) > at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:22) > at scala.runtime.Tuple2Zipped$.map$extension(Tuple2Zipped.scala:41) > at > org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:151) > at > org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:150) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:187) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:186) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:144) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 16/10/31 11:12:47 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker-0,5,main] > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofLong.mkArray(ArrayBuilder.scala:388) > at >
scala.collection.mutable.ArrayBuilder$ofLong.resize(ArrayBuilder.scala:394) > at > scala.collection.mutable.ArrayBuilder$ofLong.sizeHint(ArrayBuilder.scala:399) > at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69) > at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:22) > at scala.runtime.Tuple2Zipped$.map$extension(Tuple2Zipped.scala:41) > at > org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:151) > at > org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:150) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:187) > at > org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:186) > at > org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:144) > at > org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32) > at >
[jira] [Commented] (SPARK-18528) limit + groupBy leads to java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-18528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685451#comment-15685451 ] DjvuLee commented on SPARK-18528: - I just tested your example, and it works: >>> df.limit(3).groupBy("user_id").count().show() +---+-+ |user_id|count| +---+-+ | 1|2| | 2|1| +---+-+ I tested on the spark2.0.1-hadoop2.6 version downloaded from the Spark website. Can you reproduce your error? > limit + groupBy leads to java.lang.NullPointerException > --- > > Key: SPARK-18528 > URL: https://issues.apache.org/jira/browse/SPARK-18528 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.1 > Environment: CentOS release 6.6, Linux 2.6.32-504.el6.x86_64 >Reporter: Corey > > Using limit on a DataFrame prior to groupBy will lead to a crash. > Repartitioning will avoid the crash. > *will crash:* {{df.limit(3).groupBy("user_id").count().show()}} > *will work:* {{df.limit(3).coalesce(1).groupBy('user_id').count().show()}} > *will work:* > {{df.limit(3).repartition('user_id').groupBy('user_id').count().show()}} > Here is a reproducible example along with the error message: > {quote} > >>> df = spark.createDataFrame([ (1, 1), (1, 3), (2, 1), (3, 2), (3, 3) ], > >>> ["user_id", "genre_id"]) > >>> > >>> df.show() > +---++ > |user_id|genre_id| > +---++ > | 1| 1| > | 1| 3| > | 2| 1| > | 3| 2| > | 3| 3| > +---++ > >>> df.groupBy("user_id").count().show() > +---+-+ > |user_id|count| > +---+-+ > | 1|2| > | 3|2| > | 2|1| > +---+-+ > >>> df.limit(3).groupBy("user_id").count().show() > [Stage 8:===>(1964 + 24) / > 2000]16/11/21 01:59:27 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID > 8204, lvsp20hdn012.stubprod.com): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right
[ https://issues.apache.org/jira/browse/SPARK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee closed SPARK-17500. --- Resolution: Not A Bug > The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right > - > > Key: SPARK-17500 > URL: https://issues.apache.org/jira/browse/SPARK-17500 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0 >Reporter: DjvuLee > > The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increased > by the file size on each spill, but we only need the file size from the last spill. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right
[ https://issues.apache.org/jira/browse/SPARK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-17500: Description: The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increased by file size in each spill, but we only need the file size at the last time. (was: The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increment by file size in each spill, but we only need the file size at the last time.) > The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right > - > > Key: SPARK-17500 > URL: https://issues.apache.org/jira/browse/SPARK-17500 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0 >Reporter: DjvuLee > > The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increased > by file size in each spill, but we only need the file size at the last time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right
[ https://issues.apache.org/jira/browse/SPARK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-17500: Summary: The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right (was: The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is wrong) > The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right > - > > Key: SPARK-17500 > URL: https://issues.apache.org/jira/browse/SPARK-17500 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0 >Reporter: DjvuLee > > The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increment > by file size in each spill, but we only need the file size at the last time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is wrong
DjvuLee created SPARK-17500: --- Summary: The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is wrong Key: SPARK-17500 URL: https://issues.apache.org/jira/browse/SPARK-17500 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.0, 1.6.2, 1.6.1, 1.6.0 Reporter: DjvuLee The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increment by file size in each spill, but we only need the file size at the last time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR
[ https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-3630: --- Comment: was deleted (was: How much data do you test? we encounter this error in our production. Our data is about several TB. The Spark version is 1.6.1, and the snappy version is 1.1.2.4。) > Identify cause of Kryo+Snappy PARSING_ERROR > --- > > Key: SPARK-3630 > URL: https://issues.apache.org/jira/browse/SPARK-3630 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 1.1.0, 1.2.0 >Reporter: Andrew Ash >Assignee: Josh Rosen > > A recent GraphX commit caused non-deterministic exceptions in unit tests so > it was reverted (see SPARK-3400). > Separately, [~aash] observed the same exception stacktrace in an > application-specific Kryo registrator: > {noformat} > com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to > uncompress the chunk: PARSING_ERROR(2) > com.esotericsoftware.kryo.io.Input.fill(Input.java:142) > com.esotericsoftware.kryo.io.Input.require(Input.java:169) > com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) > com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) > com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127) > > com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117) > > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) > > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) > > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > ... > {noformat} > This ticket is to identify the cause of the exception in the GraphX commit so > the faulty commit can be fixed and merged back into master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR
[ https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430215#comment-15430215 ] DjvuLee edited comment on SPARK-3630 at 8/22/16 7:10 AM: - Can I ask how much data you tested with? We encounter this error in our production; our data is about several TB. The Spark version is 1.6.1, and the snappy version is 1.1.2.4. When the data is small, we never encounter this error. was (Author: djvulee): How much data do you test? we encounter this error in our production. Our data is about several TB. The Spark version is 1.6.1, and the snappy version is 1.1.2.4。 > Identify cause of Kryo+Snappy PARSING_ERROR > --- > > Key: SPARK-3630 > URL: https://issues.apache.org/jira/browse/SPARK-3630 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 1.1.0, 1.2.0 >Reporter: Andrew Ash >Assignee: Josh Rosen > > A recent GraphX commit caused non-deterministic exceptions in unit tests so > it was reverted (see SPARK-3400). > Separately, [~aash] observed the same exception stacktrace in an > application-specific Kryo registrator: > {noformat} > com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to > uncompress the chunk: PARSING_ERROR(2) > com.esotericsoftware.kryo.io.Input.fill(Input.java:142) > com.esotericsoftware.kryo.io.Input.require(Input.java:169) > com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) > com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) > com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127) > > com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117) > > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) > > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) > > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > ... > {noformat} > This ticket is to identify the cause of the exception in the GraphX commit so > the faulty commit can be fixed and merged back into master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR
[ https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430215#comment-15430215 ] DjvuLee commented on SPARK-3630: How much data did you test with? We encounter this error in our production; our data is about several TB. The Spark version is 1.6.1, and the snappy version is 1.1.2.4. > Identify cause of Kryo+Snappy PARSING_ERROR > --- > > Key: SPARK-3630 > URL: https://issues.apache.org/jira/browse/SPARK-3630 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 1.1.0, 1.2.0 >Reporter: Andrew Ash >Assignee: Josh Rosen > > A recent GraphX commit caused non-deterministic exceptions in unit tests so > it was reverted (see SPARK-3400). > Separately, [~aash] observed the same exception stacktrace in an > application-specific Kryo registrator: > {noformat} > com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to > uncompress the chunk: PARSING_ERROR(2) > com.esotericsoftware.kryo.io.Input.fill(Input.java:142) > com.esotericsoftware.kryo.io.Input.require(Input.java:169) > com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) > com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) > com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127) > > com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117) > > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) > > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) > > com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) > ... > {noformat} > This ticket is to identify the cause of the exception in the GraphX commit so > the faulty commit can be fixed and merged back into master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11054) WARN ReliableDeliverySupervisor: Association with remote system
[ https://issues.apache.org/jira/browse/SPARK-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952225#comment-14952225 ] DjvuLee commented on SPARK-11054: - Since the default values for spark.akka.heartbeat.interval and spark.akka.heartbeat.pauses are very large, they should have no relation to this error. > WARN ReliableDeliverySupervisor: Association with remote system > --- > > Key: SPARK-11054 > URL: https://issues.apache.org/jira/browse/SPARK-11054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: DjvuLee >Priority: Blocker > > WARN ReliableDeliverySupervisor: Association with remote system > [akka.tcp://sparkExecutor@10.23.0.36:38375] has failed, address is now gated > for [5000] ms. Reason: [Disassociated] > = > This WARN occurs very frequently in 1.5.0, and I do not see it in > 1.4.1. Some related issues report that setting spark.akka.heartbeat.interval (default > 1000s) or spark.akka.heartbeat.pauses (default 6000s) can help, but it does not > work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
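[Editor's note] For anyone trying to rule these settings out, this is roughly how they would be set; a minimal sketch only, with the app name mine and the values taken from the defaults quoted in the report (and, as the comment says, tuning them did not make the WARN go away).
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical test harness: set the Akka heartbeat settings named above.
val conf = new SparkConf()
  .setAppName("heartbeat-tuning-test")
  .set("spark.akka.heartbeat.interval", "1000s") // default per the report
  .set("spark.akka.heartbeat.pauses", "6000s")   // default per the report
val sc = new SparkContext(conf)
{code}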
[jira] [Created] (SPARK-11054) WARN ReliableDeliverySupervisor: Association with remote system
DjvuLee created SPARK-11054: --- Summary: WARN ReliableDeliverySupervisor: Association with remote system Key: SPARK-11054 URL: https://issues.apache.org/jira/browse/SPARK-11054 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.0 Reporter: DjvuLee Priority: Blocker WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@10.23.0.36:38375] has failed, address is now gated for [5000] ms. Reason: [Disassociated] = This WARN occurs very frequently in 1.5.0, and I do not see it in 1.4.1. Some related issues report that setting spark.akka.heartbeat.interval (default 1000s) or spark.akka.heartbeat.pauses (default 6000s) can help, but it does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11054) WARN ReliableDeliverySupervisor: Association with remote system
[ https://issues.apache.org/jira/browse/SPARK-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952653#comment-14952653 ] DjvuLee commented on SPARK-11054: - I am very sorry for setting the wrong priority! I knew this problem has occurred many times, and not just for me, which is why I tried to report it. > WARN ReliableDeliverySupervisor: Association with remote system > --- > > Key: SPARK-11054 > URL: https://issues.apache.org/jira/browse/SPARK-11054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: DjvuLee >Priority: Minor > > WARN ReliableDeliverySupervisor: Association with remote system > [akka.tcp://sparkExecutor@10.23.0.36:38375] has failed, address is now gated > for [5000] ms. Reason: [Disassociated] > = > This WARN occurs very frequently in 1.5.0, and I do not see it in > 1.4.1. Some related issues report that setting spark.akka.heartbeat.interval (default > 1000s) or spark.akka.heartbeat.pauses (default 6000s) can help, but it does not > work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10717) remove the with Logging in the NioBlockTransferService
[ https://issues.apache.org/jira/browse/SPARK-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877397#comment-14877397 ] DjvuLee commented on SPARK-10717: - Change it to: final class NioBlockTransferService(conf: SparkConf, securityManager: SecurityManager) extends BlockTransferService {} > remove the with Logging in the NioBlockTransferService > - > > Key: SPARK-10717 > URL: https://issues.apache.org/jira/browse/SPARK-10717 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: DjvuLee >Priority: Minor > > Since BlockTransferService implements the Logging trait, we can remove the > with Logging from NioBlockTransferService, keeping it consistent with > NettyBlockTransferService: > abstract class BlockTransferService extends ShuffleClient with Closeable with > Logging {} > final class NioBlockTransferService(conf: SparkConf, securityManager: > SecurityManager) > extends BlockTransferService with Logging {} > class NettyBlockTransferService(conf: SparkConf, securityManager: > SecurityManager, numCores: Int) extends BlockTransferService {} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10717) remove the with Logging in the NioBlockTransferService
DjvuLee created SPARK-10717: --- Summary: remove the with Logging in the NioBlockTransferService Key: SPARK-10717 URL: https://issues.apache.org/jira/browse/SPARK-10717 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.5.0 Reporter: DjvuLee Priority: Minor Since BlockTransferService implements the Logging trait, we can remove the with Logging from NioBlockTransferService, keeping it consistent with NettyBlockTransferService: abstract class BlockTransferService extends ShuffleClient with Closeable with Logging {} final class NioBlockTransferService(conf: SparkConf, securityManager: SecurityManager) extends BlockTransferService with Logging {} class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManager, numCores: Int) extends BlockTransferService {} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
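[Editor's note] A minimal sketch of why the mixin is redundant; the trait and class bodies below are stand-ins of my own, not Spark's real ones. A trait mixed into the abstract base class is already inherited by every subclass, so repeating it adds nothing.
{code}
// Stand-in definitions, not Spark's actual classes.
trait Logging { def log(msg: String): Unit = println(msg) }

abstract class BlockTransferService extends Logging

// Redundant: `with Logging` repeats what the superclass already provides.
final class NioStyleService extends BlockTransferService with Logging

// Equivalent and cleaner, matching the Netty implementation's style.
final class NettyStyleService extends BlockTransferService {
  def init(): Unit = log("Logging is inherited from BlockTransferService")
}
{code}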
[jira] [Commented] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661207#comment-14661207 ] DjvuLee commented on SPARK-4105: Is there any progress on this bug? I see the same error on Spark 1.1.0. It occurred when I used groupByKey, just as posted above, or a SQL join. When the data is not very big this bug does not occur, but when the data is bigger, FAILED_TO_UNCOMPRESS is thrown. By the way, I am just using the hash-based shuffle. FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle - Key: SPARK-4105 URL: https://issues.apache.org/jira/browse/SPARK-4105 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Affects Versions: 1.2.0, 1.2.1, 1.3.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Blocker Attachments: JavaObjectToSerialize.java, SparkFailedToUncompressGenerator.scala We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during shuffle read. Here's a sample stacktrace from an executor: {code} 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 33053) java.io.IOException: FAILED_TO_UNCOMPRESS(5) at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) at org.xerial.snappy.Snappy.uncompress(Snappy.java:427) at org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127) at org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) at org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58) at org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090) at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116) at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) at
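[Editor's note] For context on the comment's last remark: in the Spark 1.x line the shuffle implementation is selected with the spark.shuffle.manager setting. A minimal sketch only (the app name is mine; "hash" and "sort" are the documented values, with "sort" being the default from 1.2 onward).
{code}
import org.apache.spark.{SparkConf, SparkContext}

// "hash" is the hash-based shuffle the comment used; "sort" is the
// sort-based shuffle named in the ticket title.
val conf = new SparkConf()
  .setAppName("shuffle-manager-test")
  .set("spark.shuffle.manager", "hash")
val sc = new SparkContext(conf)
{code}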
[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map
[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316393#comment-14316393 ] DjvuLee commented on SPARK-5739: Yes, 1M may be enough for the KMeans algorithm. But if we consider other machine learning algorithms, such as logistic regression, then 10^7 dimensions is not that big. LR with such dimensions is common in real-world ad-click models (so I have heard from friends), so how can Spark deal well with this? The weight vector may be the only parameter in LR, but when the dimension is up to a billion, it alone can reach GBs. Size exceeds Integer.MAX_VALUE in File Map -- Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850) at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270) at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143) at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348) at KMeansDataGenerator$.main(kmeans.scala:105) at KMeansDataGenerator.main(kmeans.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail:
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
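[Editor's note] The back-of-envelope behind that comment: a dense double-precision weight vector costs 8 bytes per dimension, so a billion-dimension LR model is several GiB on its own. A quick check:
{code}
// 10^9 dimensions * 8 bytes per double, expressed in GiB.
val dims = 1000000000L
val bytes = dims * 8L
println(f"dense LR weight vector: ${bytes / math.pow(1024, 3)}%.2f GiB") // ~7.45 GiB
{code}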
[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map
[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316430#comment-14316430 ] DjvuLee commented on SPARK-5739: OK, got it; I will look at the code in more detail. I will close this issue, but is adding a check for the size still good advice? Size exceeds Integer.MAX_VALUE in File Map -- Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850) at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270) at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143) at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348) at KMeansDataGenerator$.main(kmeans.scala:105) at KMeansDataGenerator.main(kmeans.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map
[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317384#comment-14317384 ] DjvuLee commented on SPARK-5739: Yes, I did not explain it clearly. What I mean is that we can add a check in the getBytes method in DiskStore.scala, just before calling the following function: Some(channel.map(MapMode.READ_ONLY, segment.offset, segment.length)) Size exceeds Integer.MAX_VALUE in File Map -- Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850) at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270) at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143) at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348) at KMeansDataGenerator$.main(kmeans.scala:105) at KMeansDataGenerator.main(kmeans.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
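[Editor's note] A sketch of the check being proposed in the comment above. The method signature is simplified from DiskStore's real one and the message text is mine; the underlying fact is that FileChannel.map cannot map more than Integer.MAX_VALUE bytes, so the idea is to fail fast with an explicit error.
{code}
import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel.MapMode

// Simplified stand-in for DiskStore.getBytes.
def getBytes(file: File, offset: Long, length: Long): ByteBuffer = {
  require(length < Integer.MAX_VALUE,
    s"Cannot memory-map a block segment of $length bytes: " +
    s"FileChannel.map is limited to ${Integer.MAX_VALUE} bytes (SPARK-5739)")
  val channel = new RandomAccessFile(file, "r").getChannel
  try channel.map(MapMode.READ_ONLY, offset, length)
  finally channel.close() // the mapping stays valid after the channel closes
}
{code}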
[jira] [Updated] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map
[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-5739: --- Summary: Size exceeds Integer.MAX_VALUE in File Map (was: Size exceeds Integer.MAX_VALUE in FileMap) Size exceeds Integer.MAX_VALUE in File Map -- Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map
[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315845#comment-14315845 ] DjvuLee commented on SPARK-5739: The data is generated by the KMeansDataGenerator example in Spark; I just changed the parameters to: val parts = 480; val numPoints = 480 /*1500*/; val k = 10 /*args(3).toInt*/; val d = 1000 /*args(4).toInt*/; val r = 1.0 /*args(5).toDouble*/; val iter = 8 Size exceeds Integer.MAX_VALUE in File Map -- Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850) at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270) at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143) at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348) at KMeansDataGenerator$.main(kmeans.scala:105) at KMeansDataGenerator.main(kmeans.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
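[Editor's note] Plugging those parameters into MLlib would look roughly like this. generateKMeansRDD and KMeans.train are the real MLlib entry points; sc is assumed to be an existing SparkContext, and using iter as maxIterations is my assumption about how the values above were wired together.
{code}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.KMeansDataGenerator

val parts = 480; val numPoints = 480; val k = 10
val d = 1000; val r = 1.0; val iter = 8

// generateKMeansRDD yields RDD[Array[Double]]; KMeans.train wants Vectors.
val data = KMeansDataGenerator
  .generateKMeansRDD(sc, numPoints, k, d, r, parts)
  .map(a => Vectors.dense(a))
  .cache()
val model = KMeans.train(data, k, iter)
{code}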
[jira] [Created] (SPARK-5739) Size exceeds Integer.MAX_VALUE in FileMap
DjvuLee created SPARK-5739: -- Summary: Size exceeds Integer.MAX_VALUE in FileMap Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map
[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-5739: --- Description: I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850) at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270) at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143) at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338) at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348) at KMeansDataGenerator$.main(kmeans.scala:105) at KMeansDataGenerator.main(kmeans.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) was: I just run the kmeans algorithm using a random generate data,but occurred this problem after some iteration. I try several time, and this problem is reproduced. Because the data is random generate, so I guess is there a bug ? Or if random data can lead to such a scenario that the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? Size exceeds Integer.MAX_VALUE in File Map -- Key: SPARK-5739 URL: https://issues.apache.org/jira/browse/SPARK-5739 Project: Spark Issue Type: Bug Affects Versions: 1.1.1 Environment: Spark 1.1.1 on a cluster with 12 nodes. Every node has 128GB RAM and 24 cores. The data is just 40GB, and there are 48 parallel tasks on a node. Reporter: DjvuLee I just ran the kmeans algorithm using randomly generated data, but hit this problem after some iterations. I tried several times, and the problem is reproducible. Because the data is randomly generated, I guess there is a bug? Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map? 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850) at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140) at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
[jira] [Updated] (SPARK-5375) Specify more clearly about the max thread meaning in the ConnectionManager
[ https://issues.apache.org/jira/browse/SPARK-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-5375: --- Description: In the ConnectionManager.scala file, there are three thread pools: handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor. For example: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt("spark.core.connection.handler.threads.min", 20), conf.getInt("spark.core.connection.handler.threads.max", 60), conf.getInt("spark.core.connection.handler.threads.keepalive", 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory("handle-message-executor")) Since we use a LinkedBlockingDeque, the max thread parameter has no meaning. Every time I read the code this is confusing; maybe we can add a comment in those places? was: In the ConnectionManager.scala file, there is three thread pool: handleMessageExecutor, handleReadWriteExecutor, handleConnectExecutor. such as: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt(spark.core.connection.handler.threads.min, 20), conf.getInt(spark.core.connection.handler.threads.max, 60), conf.getInt(spark.core.connection.handler.threads.keepalive, 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory(handle-message-executor)) Since we use a LinkedBlockingDeque, so the max thread parameter have no meaning. Every time I read the code, this can lead to Confusing for me , Maybe we can add some comment in those place? Specify more clearly about the max thread meaning in the ConnectionManager -- Key: SPARK-5375 URL: https://issues.apache.org/jira/browse/SPARK-5375 Project: Spark Issue Type: Improvement Affects Versions: 1.1.0 Reporter: DjvuLee In the ConnectionManager.scala file, there are three thread pools: handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor. For example: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt("spark.core.connection.handler.threads.min", 20), conf.getInt("spark.core.connection.handler.threads.max", 60), conf.getInt("spark.core.connection.handler.threads.keepalive", 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory("handle-message-executor")) Since we use a LinkedBlockingDeque, the max thread parameter has no meaning. Every time I read the code this is confusing; maybe we can add a comment in those places? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5375) Specify more clearly about the max thread meaning in the ConnectionManager
DjvuLee created SPARK-5375: -- Summary: Specify more clearly about the max thread meaning in the ConnectionManager Key: SPARK-5375 URL: https://issues.apache.org/jira/browse/SPARK-5375 Project: Spark Issue Type: Improvement Affects Versions: 1.1.0 Reporter: DjvuLee In the ConnectionManager.scala file, there are three thread pools: handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor. For example: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt("spark.core.connection.handler.threads.min", 20), conf.getInt("spark.core.connection.handler.threads.max", 60), conf.getInt("spark.core.connection.handler.threads.keepalive", 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory("handle-message-executor")) Since we use a LinkedBlockingDeque, the max thread parameter has no meaning. Every time I read the code this is confusing; maybe we can add a comment in those places? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
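[Editor's note] The underlying java.util.concurrent behavior can be seen with a small stand-alone sketch (not Spark code; assumes Scala 2.12+ for the Runnable lambda): with an unbounded work queue, a ThreadPoolExecutor never grows past its core size, because extra threads are only created when the queue rejects an offer. The max argument is effectively dead.
{code}
import java.util.concurrent.{LinkedBlockingDeque, ThreadPoolExecutor, TimeUnit}

val pool = new ThreadPoolExecutor(
  20,                 // core size: the only size that takes effect here
  60,                 // max size: unreachable, the queue below never rejects
  60, TimeUnit.SECONDS,
  new LinkedBlockingDeque[Runnable]())

// Flood the pool; extra work just queues instead of spawning threads 21-60.
(1 to 1000).foreach(_ => pool.execute(() => Thread.sleep(100)))
println(s"pool size stays at core: ${pool.getPoolSize}") // prints <= 20
pool.shutdown()
{code}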
[jira] [Updated] (SPARK-5375) Specify more clearly about the max thread meaning in the ConnectionManager
[ https://issues.apache.org/jira/browse/SPARK-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-5375: --- Description: In the ConnectionManager.scala file, there are three thread pools: handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor. For example: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt("spark.core.connection.handler.threads.min", 20), conf.getInt("spark.core.connection.handler.threads.max", 60), conf.getInt("spark.core.connection.handler.threads.keepalive", 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory("handle-message-executor")) Since we use a LinkedBlockingDeque, the max thread parameter has no meaning. Every time I read the code this is confusing; maybe we can add a comment in those places? was: In the ConnectionManager.scala file, there is three thread pool: handleMessageExecutor, handleReadWriteExecutor, handleConnectExecutor. such as: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt(spark.core.connection.handler.threads.min, 20), conf.getInt(spark.core.connection.handler.threads.max, 60), conf.getInt(spark.core.connection.handler.threads.keepalive, 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory(handle-message-executor)) Since we use a LinkedBlockingDeque, so the max thread parameter have no meaning. Every time I read the code, this can lead to Confusing for me , Maybe we can add some comment in those place? Specify more clearly about the max thread meaning in the ConnectionManager -- Key: SPARK-5375 URL: https://issues.apache.org/jira/browse/SPARK-5375 Project: Spark Issue Type: Improvement Affects Versions: 1.1.0 Reporter: DjvuLee In the ConnectionManager.scala file, there are three thread pools: handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor. For example: private val handleMessageExecutor = new ThreadPoolExecutor( conf.getInt("spark.core.connection.handler.threads.min", 20), conf.getInt("spark.core.connection.handler.threads.max", 60), conf.getInt("spark.core.connection.handler.threads.keepalive", 60), TimeUnit.SECONDS, new LinkedBlockingDeque[Runnable](), Utils.namedThreadFactory("handle-message-executor")) Since we use a LinkedBlockingDeque, the max thread parameter has no meaning. Every time I read the code this is confusing; maybe we can add a comment in those places? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073244#comment-14073244 ] DjvuLee commented on SPARK-1112: Has anyone tested version 0.9.2? I found it also fails there, while v1.0.1 and v1.1.0 are OK. When spark.akka.frameSize > 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Guillaume Pitel Assignee: Xiangrui Meng Priority: Blocker Fix For: 0.9.2 When I set the spark.akka.frameSize to something over 10, the messages sent from the executors to the driver completely block the execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's ok) Workaround is to set the spark.akka.frameSize to 10. In this case, since 0.8.1, the blockManager deal with the data to be sent. It seems slower than akka direct message though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064965#comment-14064965 ] DjvuLee commented on SPARK-2156: I see this is fixed in the Spark branch-0.9 on GitHub, but is it included in the Spark v0.9.1 release on the http://spark.apache.org/ site? When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks -- Key: SPARK-2156 URL: https://issues.apache.org/jira/browse/SPARK-2156 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1, 1.0.0 Environment: AWS EC2 1 master 2 slaves with the instance type of r3.2xlarge Reporter: Chen Jin Assignee: Xiangrui Meng Priority: Blocker Fix For: 0.9.2, 1.0.1, 1.1.0 Original Estimate: 504h Remaining Estimate: 504h I have done some experiments when the frameSize is around 10MB. 1) spark.akka.frameSize = 10 If one of the partition sizes is very close to 10MB, say 9.97MB, the execution blocks without any exception or warning. The worker finished the task and sent the serialized result, and then threw an exception saying the hadoop IPC client connection stops (with logging changed to debug level). However, the master never receives the results and the program just hangs. But if the sizes for all the partitions are less than some number between 9.96MB and 9.97MB, the program works fine. 2) spark.akka.frameSize = 9 When the partition size is just a little bit smaller than 9MB, it fails as well. This bug behavior is not exactly what spark-1112 is about. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061804#comment-14061804 ] DjvuLee commented on SPARK-2138: [~piotrszul] In my opinion, if your task size is bigger than akka.frameSize, it should fail, because akka.frameSize is the threshold. What confuses me is that the serialized task size should not keep growing. In any case, it seems that KMeans can succeed in v1.0.1. In v0.9.0, even if you increase the akka.frameSize parameter, your kmeans job cannot finish. The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor lost and task lost; after several occurrences, the application exits. When this error occurred, the size of the serialized task was bigger than 10MB, and the size became larger as the iterations increased. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061843#comment-14061843 ] DjvuLee commented on SPARK-2138: Oh, I see. So can we improve this? The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor lost and task lost; after several occurrences, the application exits. When this error occurred, the size of the serialized task was bigger than 10MB, and the size became larger as the iterations increased. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062026#comment-14062026 ] DjvuLee commented on SPARK-2138: In my experiment, I set akka.frameSize=200 and my data is 1.8GB. When the 18th iteration is reached, the size of the serialized task grows larger than 200MB and the Spark application exits, so maybe we should improve this. As for this issue, it is not actually solved, although the KMeans job can run successfully when a large akka.frameSize parameter is set, so I think keeping this issue open is helpful. The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
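The growth described here is easiest to see by measuring the serialized size of a task's payload directly. A minimal sketch using plain Java serialization (only an approximation of what a Java-serialization-based closure serializer would ship; the helper name is hypothetical):
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper: estimate an object's Java-serialized size,
// roughly what a Java-serialization closure serializer would send.
def serializedSizeBytes(obj: AnyRef): Int = {
  val buffer = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buffer)
  out.writeObject(obj)
  out.close()
  buffer.size()
}

// Example: an array of candidate centers captured in a task closure.
// 10000 centers x 100 double dimensions is ~8MB on the wire, already
// close to the default 10MB frame size.
val centers = Array.fill(10000)(Array.fill(100)(0.0))
println(f"~${serializedSizeBytes(centers) / (1024.0 * 1024.0)}%.1f MB")
{code}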
[jira] [Commented] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062163#comment-14062163 ] DjvuLee commented on SPARK-2138: Oh, I am a little sorry that I made a mistake in my last comment: 18 is not the iteration, it is the stage id. [~mengxr] your speculation is right; after reading the code, I found that it indeed happens in the initialization stage, with no relationship to the iteration number. And if you request a larger number of clusters for the final result, the size of the serialized task will become larger during the initialization stage. The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
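Since the blow-up is tied to the initialization stage (presumably the default k-means|| step, which carries candidate centers with the tasks, so a larger k means larger tasks), one workaround is to switch to random initialization. A hedged sketch against the 1.0-era MLlib API, with toy data standing in for the real input:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext(new SparkConf().setAppName("kmeans-init-sketch"))

// Toy data; the setup reported above uses ~1.8GB of input.
val data = sc
  .parallelize(1 to 100000, 100)
  .map(_ => Vectors.dense(Array.fill(10)(math.random)))
  .cache()

val model = new KMeans()
  .setK(500)                             // a large k inflates init-stage task sizes
  .setMaxIterations(20)
  .setInitializationMode(KMeans.RANDOM)  // sidestep the k-means|| init stage
  .run(data)
{code}
Random initialization trades the heavier k-means|| init stage for potentially worse starting centers, so it is a workaround rather than a fix for the underlying task-size growth.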
[jira] [Commented] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048488#comment-14048488 ] DjvuLee commented on SPARK-2138: @[~piotrszul] Thanks for your test. I will also test this as soon as possible; if I have any results, I will post an update on this issue. The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046597#comment-14046597 ] DjvuLee commented on SPARK-2138: If this bug is fixed, shall we close this issue? [~mengxr] The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036892#comment-14036892 ] DjvuLee commented on SPARK-2138: Thanks very much! I am very glad to see that I reported a real bug, not one caused by my own mistake. The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee Assignee: Xiangrui Meng When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
DjvuLee created SPARK-2138: -- Summary: The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.1, 0.9.0 Reporter: DjvuLee When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2138) The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-2138: --- Description: When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 was: When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The KMeans algorithm in MLlib can lead to the serialized task size becoming bigger and bigger --- Key: SPARK-2138 URL: https://issues.apache.org/jira/browse/SPARK-2138 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 0.9.0, 0.9.1 Reporter: DjvuLee When the algorithm reaches a certain stage and runs the reduceByKey() function, it can lead to executor loss and task loss; after several occurrences, the application exits. When this error occurs, the size of the serialized task is bigger than 10MB, and the size becomes larger as the iterations increase. The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622 The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5 -- This message was sent by Atlassian JIRA (v6.2#6252)
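The linked gists are not reproduced here; as a rough, hypothetical stand-in for the kind of job that triggers this report (synthetic dense points in place of the gist's data generator, assuming the 1.0-era MLlib API):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

import scala.util.Random

// Hypothetical reproduction sketch, not the linked gists: dense
// synthetic points and enough clusters and iterations to surface the
// serialized-task-size growth described in this issue.
val sc = new SparkContext(new SparkConf().setAppName("spark-2138-repro-sketch"))

val points = sc
  .parallelize(1 to 2000000, 200)
  .map(_ => Vectors.dense(Array.fill(50)(Random.nextDouble())))
  .cache()

// Train with k = 1000 clusters and 30 iterations.
val model = KMeans.train(points, 1000, 30)
println(s"cost = ${model.computeCost(points)}")
{code}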