I'm finding the following messages in the driver. Can this potentially have
anything to do with these drastic slowdowns?


14/06/28 00:00:17 INFO ShuffleBlockManager: Could not find files for
shuffle 8 for deleting
14/06/28 00:00:17 INFO ContextCleaner: Cleaned shuffle 8
14/06/28 00:00:17 INFO ShuffleBlockManager: Could not find files for
shuffle 7 for deleting
14/06/28 00:00:17 INFO ContextCleaner: Cleaned shuffle 7
14/06/28 00:00:17 INFO ShuffleBlockManager: Could not find files for
shuffle 6 for deleting
14/06/28 00:00:17 INFO ContextCleaner: Cleaned shuffle 6




On Fri, Jun 27, 2014 at 11:35 PM, Sung Hwan Chung <coded...@cs.stanford.edu>
wrote:

> I'm doing something like this:
>
> rdd.groupBy.map().collect()
>
> The work load on final map is pretty much evenly distributed.
>
> When collect happens, say on 60 partitions, the first 55 or so partitions
> finish very quickly say within 10 seconds. However, the last 5,
> particularly the very last one, typically get very slow, the overall
> collect time reaching 30 seconds to sometimes even 1 minute.
>
> E.g., it would get stuck in a state like 54/55 for a much longer time.
>
> Another interesting thing is the first iteration typically doesn't have
> this problem, but it gets progressively worse despite having about the same
> workload/partition sizes in subsequent iterations.
>
> This problem worsens with smaller akka framesize and/or maxMbInFlight
>
> Anyone know why this is so?
>

Reply via email to