I have seen that link. I am using an RDD of byte arrays with Kryo serialization.
When I measure time inside mapPartition, it is never more than 1 ms, whereas
the total time taken by the application is around 30 min. The codebase has a lot
of dependencies, so I am trying to come up with a simple version where I can
reproduce this problem.
Also, the GC time reported by the Spark UI is always in the range of 3-4% of
total time.
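One thing worth double-checking when timing inside mapPartitions: Scala iterators are lazy, so if the timed span only wraps the construction of a mapped iterator, the per-record work actually runs later, when a downstream stage consumes the iterator, and the measured time will look like ~1 ms regardless of the real cost. A minimal sketch of this (with a hypothetical process function standing in for the real per-record logic):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical per-record work; stands in for the real logic.
def process(bytes: Array[Byte]): Array[Byte] = bytes

def timedMapPartitions(rdd: RDD[Array[Byte]]): RDD[Array[Byte]] =
  rdd.mapPartitions { iter =>
    val start = System.nanoTime()
    // iter.map(process) alone is lazy and would execute outside the
    // timed span; toArray forces the whole partition to be processed
    // here, so elapsedMs covers the actual work.
    val out = iter.map(process).toArray
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"partition processed in $elapsedMs%.1f ms")
    out.iterator
  }
```

Materializing the partition changes memory behavior (the whole partition is held at once), so this is only a diagnostic sketch, not something to keep in production code.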

On Thu, Jan 1, 2015, 14:05 Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Would be great if you could share the piece of code running inside your
> mapPartition. I'm assuming you are creating/handling a lot of complex
> objects, which slows down performance. Here's a link
> <http://spark.apache.org/docs/latest/tuning.html> to the performance tuning
> guide if you haven't seen it already.
>
> Thanks
> Best Regards
>
> On Wed, Dec 31, 2014 at 8:45 AM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> I have a Spark app that involves a series of mapPartition operations and
>> then a keyBy operation. I have measured the time inside the mapPartition
>> function blocks. These blocks take trivial time, yet the application takes
>> far too long overall, and the Spark UI shows the same.
>> So I was wondering where the time goes and how I can reduce it.
>>
>> Thanks
>> Raghavendra
>>
>
>
