Re: Always two tasks slower than others, and then job fails
Data skew? Maybe your partition key has some special value, like null or an empty string.

On Fri, Aug 14, 2015 at 11:01 AM, randylu randyl...@gmail.com wrote:
> It is strange that there are always two tasks slower than the others, and the corresponding partitions' data are larger, no matter how many partitions I use. [...]

--
Best Regards

Jeff Zhang
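A quick way to see why one special key causes this: under hash partitioning, every record that shares a key lands in the same partition, so a hot key (such as null) overloads exactly one task. A minimal pure-Python simulation of the idea (not Spark code; the key names and counts are made up for illustration):

```python
# Simulate a HashPartitioner: partition = hash(key) mod numPartitions.
# All records with the same key go to one partition, so a single hot
# key (here None) creates one oversized partition -- i.e. data skew.
from collections import Counter

num_partitions = 100

def partition_for(key):
    return hash(key) % num_partitions

# 1,000 records with distinct keys, plus 10,000 records whose key is None.
keys = [f"user_{i}" for i in range(1000)] + [None] * 10000

sizes = Counter(partition_for(k) for k in keys)
hot_partition = partition_for(None)

print("records in hot partition:", sizes[hot_partition])
print("largest other partition:",
      max(v for p, v in sizes.items() if p != hot_partition))
```

Increasing the partition count does not help here, which matches the symptom in the original post: the skewed partitions stay large no matter how many partitions are used.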
Re: Always two tasks slower than others, and then job fails
Data skew is still a problem with Spark.

- If you use groupByKey, try to express your logic without groupByKey (for example with reduceByKey or aggregateByKey, which combine values map-side before the shuffle).
- If you really need groupByKey, all you can do is scale vertically.
- If you can, repartition with a finer HashPartitioner. You will have many tasks per stage, but tasks are lightweight in Spark, so this should not introduce heavy overhead. If you have your own domain partitioner, try to rewrite it by introducing a secondary key.

I hope this gives some insight and helps.

On Fri, Aug 14, 2015 at 9:37 AM Jeff Zhang zjf...@gmail.com wrote:
> Data skew? Maybe your partition key has some special value, like null or an empty string. [...]
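One way to read the "secondary key" suggestion is key salting: append a small random salt to the hot key so its records spread across several partitions, aggregate per salted key, then strip the salt and combine. A pure-Python sketch of the two-pass idea (in Spark this would be two reduceByKey passes; the key names and counts are invented for illustration):

```python
import random
from collections import defaultdict

SALT_BUCKETS = 8

def salted(key):
    # Secondary key: (key, salt) turns one hot key into up to
    # SALT_BUCKETS distinct keys, spreading its records around.
    return (key, random.randrange(SALT_BUCKETS))

records = [("hot", 1)] * 10000 + [(f"k{i}", 1) for i in range(100)]

# Pass 1: aggregate by (key, salt) -- in Spark, reduceByKey on salted keys.
partial = defaultdict(int)
for key, value in records:
    partial[salted(key)] += value

# Pass 2: strip the salt and combine partial sums -- a second reduceByKey.
totals = defaultdict(int)
for (key, _salt), value in partial.items():
    totals[key] += value

print(totals["hot"])  # 10000, same result as a direct per-key sum
```

This only works for aggregations that can be computed in two stages (sums, counts, max, and so on); a true groupByKey that needs all values together cannot be salted this way.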
Always two tasks slower than others, and then job fails
It is strange that there are always two tasks slower than the others, and the corresponding partitions' data are larger, no matter how many partitions I use:

Executor ID  Address                  Task Time       Shuffle Read Size / Records
1            slave129.vsvs.com:56691  16 s (1 task)   99.5 MB / 18865432
*10          slave317.vsvs.com:59281  0 ms (0 tasks)  413.5 MB / 311001318*
100          slave290.vsvs.com:60241  19 s (1 task)   110.8 MB / 27075926
101          slave323.vsvs.com:36246  14 s (1 task)   126.1 MB / 25052808

The task time and record count for Executor 10 look strange, and the CPUs on that node are all 100% busy. Has anyone met the same problem? Thanks in advance for any answer!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Always-two-tasks-slower-than-others-and-then-job-fails-tp24257.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
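To confirm the data-skew hypothesis from the replies, one can count records per key before the shuffle (in Spark, for instance, by calling countByValue() on a sample of the keys). The counting idea in plain Python, with a made-up sample:

```python
# Count occurrences per key in a sample to find a hot key.
# The sample below is hypothetical; in practice you would
# sample the keys of the real dataset before the shuffle.
from collections import Counter

sampled_keys = ["a", None, None, "b", None, "c", "a"]

key_counts = Counter(sampled_keys)
hot_key, hot_count = key_counts.most_common(1)[0]
print(hot_key, hot_count)  # None 3
```

If one key accounts for a disproportionate share of the sample, that key (often null or an empty string, as suggested above) explains the two oversized partitions.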