[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320759#comment-17320759 ] Hyunjun Kim commented on SPARK-19371: - I had the same problem with Spark 2.4.7. My workaround was to use 'sc.textFile()' function by specifying the 'defaultMinPartitions' parameter. I didn't try other file formats though, try splitting your data as many as the number of partitions you want. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges >Priority: Major > Labels: bulk-closed > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100878#comment-17100878 ] Thunder Stumpges commented on SPARK-19371: -- Thank you for the comments [~honor], [~danmeany], and [~lebedev] ! I am glad to see there are others with this issue. We have had to "just live with it" for these years. And this job is STILL running in production, every 10 seconds, wasting computing resources and time due to imbalanced cached partitions across the executors. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges >Priority: Major > Labels: bulk-closed > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100517#comment-17100517 ] serdar onur commented on SPARK-19371: - The year is 2020 and I am still trying to find a solution to this. I totally understand what [~thunderstumpges] was trying to achieve and I am trying to achieve the same. For a tool like spark, it is unacceptable not to be able to distribute the created partitions to the executors evenly. You know, we can create a custom partitioner to distribute the data to the partitions evenly by creating our own partition index. I was under the impression that a similar approach could be applied to spread these partitions to the executors evenly(using some sort of executor index for selection of executors during partition distribution). I have been googling this for a day now and I am very disappointed to say that up to now this seems to be not possible. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges >Priority: Major > Labels: bulk-closed > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349440#comment-16349440 ] Dan Meany commented on SPARK-19371: --- We have had this issue on many occasions and nothing I tried related to repartitioning would redistribute the work evenly. The job would spend most of its time using one executor with the rest idle. This was on both Spark 1 and 2. The work around was to create a few threads (like say 3 or 4) on the driver, and split the work into multiple independent units to run on the threads. This worked to fully saturate the executors with work all the time and reduced runtimes dramatically. Works for streaming or batch. There are two requirements for this to work: 1) The work units done in the threads must be independent of each other. 2) If there are any RDDs, DataFrames, or DataSets created in the main thread (such as dimension data for joining etc) that are used by the worker threads, it is a good idea to persist or cachtable and run an action like count on them in the main thread, just to be sure they are fully calculated first before launching the workers. Remember to unpersist/uncachetable afterward if using streaming so storage doesn't accumulate. 3) I save any exceptions generated in the thread objects, and after the Thread.joins in the main thread, check to see if any exceptions were thrown by the workers and report them. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges >Priority: Major > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257809#comment-16257809 ] Sergei Lebedev commented on SPARK-19371: > Usually the answer is to force a shuffle [...] [~srowen] we are seeing exactly the same imbalance as a result of a shuffle. From the Executors page, it looks like a couple of executors get much more "reduce" tasks than the others. Does this sound like a possible scenario? > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253554#comment-16253554 ] Sean Owen commented on SPARK-19371: --- Yes, I get why you want to spread the partitions. It doesn't happen naturally because other factors can get in the way -- imbalanced hashing, or, locality. It won't bother to move data around preemptively as it's not obvious it's worth the effort; a task that scheduled on A because it was local to some data produced a cached partition on A and that's about the best guess about where work on that data should live. The idea behind setting locality to 0 is that it should mean tasks didn't 'clump' together on the few nodes with data in the first place, to produce cached partitions. It's a crude tool though because means setting this to 0 globally. It's more of a diagnostic, although, often, lowering it is a good idea. Usually the answer is to force a shuffle, and that thread ended above saying there wasn't a shuffle option, but Dataset.repartition() should shuffle. You don't have to change the number of partitions, even. If that's producing an imbalance, then I'm surprised, and would next have to figure out just what it's hashing the rows on, and whether that's the issue. Or else, why the tasks performing the shuffle aren't spread out, as they should be wide dependencies where locality isn't going to help. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252969#comment-16252969 ] Thunder Stumpges commented on SPARK-19371: -- Here's a view of something that happened this time around as I started the job (this time with locality.wait using default). This RDD somehow got all 120 partitions on ONLY TWO executors, both on the same physical machine. From now on, all jobs either run on that one single machine, or (if I set locality.wait=0) move that data to other executors before executing, correct? !RDD Block Distribution on two executors.png! > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > Attachments: RDD Block Distribution on two executors.png, Unbalanced > RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and > resulting task imbalance.png, execution timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252953#comment-16252953 ] Thunder Stumpges commented on SPARK-19371: -- In my case, it is important because that cached RDD is partitioned carefully, and used to join to a streaming dataset every 10 seconds to perform processing. The stream RDD is shuffled to align/join with this data-set, and work is then done. This runs for days or months at a time. If I could just once at the beginning move the RDD blocks so they're balanced, the benefit of that one time move (all in memory) would pay off many times over. Setting locality.wait=0 only causes this same network traffic to be ongoing (a task starting on one executor and its RDD block being on another executor), correct? BTW, I have tried my job with spark.locality.wait=0, but it seems to have limited effect. Tasks are still being executed on the executors with the RDD blocks. Only two tasks did NOT run as PROCESS_LOCAL, and they took many times longer to execute: !execution timeline.png! It all seems very clear in my head, am I missing something? Is this a really complicated thing to do? Basically the operation would be a "rebalance" that applies to cached RDDs and would attempt to balance the distribution of blocks across executors. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > Attachments: Unbalanced RDD Blocks, and resulting task imbalance.png, > Unbalanced RDD Blocks, and resulting task imbalance.png, execution > timeline.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251990#comment-16251990 ] Sean Owen commented on SPARK-19371: --- The question is whether this is something that needs a change in Spark. What do you propose? I think the effect you cite relates to locality. See above? If so that is working as intended insofar as there are downsides to spending time sending cached partitions around and/or waiting for local execution. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > Attachments: Unbalanced RDD Blocks, and resulting task imbalance.png, > Unbalanced RDD Blocks, and resulting task imbalance.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251971#comment-16251971 ] Thunder Stumpges commented on SPARK-19371: -- Hi Sean, >From my perspective, you can attach whatever label you want (bug, feature, >story, PITA) but it is obviously causing troubles for more than just an >isolated case. And to re-state the issue, this is NOT (in my case at least) related to partitioner / hashing. My partitions are sized evenly, the keys are unique words (String) in a vocabulary. The issue is related to where the RDD blocks are distributed to the executors in cache storage. See the image below which shows the problem. What would solve this for me is not really a "shuffle" but a "rebalance storage" that moves RDD blocks in memory to balance them across executors. !Unbalanced RDD Blocks, and resulting task imbalance.png! > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > Attachments: Unbalanced RDD Blocks, and resulting task imbalance.png, > Unbalanced RDD Blocks, and resulting task imbalance.png > > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247515#comment-16247515 ] Sean Owen commented on SPARK-19371: --- All: this isn't a bug. See issues above. If you rule these out, you need to look at your partitioner and why its hashing isn't spreading keys around. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247215#comment-16247215 ] Jing Weng commented on SPARK-19371: --- I'm also having the same issue in Spark 2.1. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000770#comment-16000770 ] Lucas Mendonça commented on SPARK-19371: I'm also having the same issue in Spark 2.1 . Is there any update on this? > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893877#comment-15893877 ] Paul Lysak commented on SPARK-19371: I'm observing similar behavior in Spark 2.1 - unfortunately, due to complex workflow of our application wasn't yet able to identify after which operation exactly all the partitions of DataFrame end up on a single executor, so no matter how big cluster is - only one executor picks all the job. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840569#comment-15840569 ] Thunder Stumpges commented on SPARK-19371: -- Thanks Sean. I don't think I can (always) rely on perfect distribution of the underlying data in HDFS and data locality for this as the number of files (I'm using parquet, but this could vary a lot) and data source in general could be different in different cases. An extreme example might be a query to ElasticSearch or Cassandra or other external store that has no locality with the spark executors. I guess I can agree that you might not want to classify this as a bug, but it is at minimum a very important improvement / new feature request, as it can make certain workloads prohibitively slow, and create quite unbalanced utilization of a cluster on the whole. I think shuffling is probably the right idea. It seems to me once an RDD is cached, a simple "rebalance" could be used to move the partitions around the executors in preparation for additional operations. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840113#comment-15840113 ] Sean Owen commented on SPARK-19371: --- There's a tension between waiting for locality and taking a free slot that's not local. You can increase spark.locality.wait to prefer locality even at the expense of delays. If your underlying HDFS data isn't spread out well, it won't help though. You should make sure you have some HDFS replication enabled. You can try 2x caching replication to make more copies available, obviously at the expense of more memory. Shuffling is actually a decent idea; one way or the other if the data isn't evenly spread, some if it has to be copied. I don't think this is a bug per se. > Cannot spread cached partitions evenly across executors > --- > > Key: SPARK-19371 > URL: https://issues.apache.org/jira/browse/SPARK-19371 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Thunder Stumpges > > Before running an intensive iterative job (in this case a distributed topic > model training), we need to load a dataset and persist it across executors. > After loading from HDFS and persisting, the partitions are spread unevenly > across executors (based on the initial scheduling of the reads which are not > data locale sensitive). The partition sizes are even, just not their > distribution over executors. We currently have no way to force the partitions > to spread evenly, and as the iterative algorithm begins, tasks are > distributed to executors based on this initial load, forcing some very > unbalanced work. > This has been mentioned a > [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059] > of > [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html] > in > [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html] > user/dev group threads. > None of the discussions I could find had solutions that worked for me. Here > are examples of things I have tried. All resulted in partitions in memory > that were NOT evenly distributed to executors, causing future tasks to be > imbalanced across executors as well. > *Reduce Locality* > {code}spark.shuffle.reduceLocality.enabled=false/true{code} > *"Legacy" memory mode* > {code}spark.memory.useLegacyMode = true/false{code} > *Basic load and repartition* > {code} > val numPartitions = 48*16 > val df = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions). > persist > df.count > {code} > *Load and repartition to 2x partitions, then shuffle repartition down to > desired partitions* > {code} > val numPartitions = 48*16 > val df2 = sqlContext.read. > parquet("/data/folder_to_load"). > repartition(numPartitions*2) > val df = df2.repartition(numPartitions). > persist > df.count > {code} > It would be great if when persisting an RDD/DataFrame, if we could request > that those partitions be stored evenly across executors in preparation for > future tasks. > I'm not sure if this is a more general issue (I.E. not just involving > persisting RDDs), but for the persisted in-memory case, it can make a HUGE > difference in the over-all running time of the remaining work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org