I have seen that link. I am using an RDD of byte arrays and Kryo serialization. Inside mapPartitions, when I measure time it is never more than 1 ms, whereas the total time taken by the application is around 30 minutes. The codebase has a lot of dependencies, so I am trying to come up with a simple version where I can reproduce this problem. Also, the GC time reported by the Spark UI is always in the range of 3-4% of total time.
On Thu, Jan 1, 2015, 14:05 Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Would be great if you could share the piece of code running inside your
> mapPartition. I'm assuming you are creating/handling a lot of complex
> objects and hence it slows down the performance. Here's a link
> <http://spark.apache.org/docs/latest/tuning.html> to performance tuning
> if you haven't seen it already.
>
> Thanks
> Best Regards
>
> On Wed, Dec 31, 2014 at 8:45 AM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> I have a Spark app that involves a series of mapPartitions operations and
>> then a keyBy operation. I have measured the time inside the mapPartitions
>> function blocks. These blocks take trivial time. Still, the application
>> takes way too much time, and even the Spark UI shows that much time.
>> So I was wondering where the time goes and how I can reduce it.
>>
>> Thanks
>> Raghavendra
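[Editor's note, not part of the original thread: one possible explanation for the discrepancy, offered as an assumption rather than a confirmed diagnosis, is that a mapPartitions function typically returns a lazy iterator. A timer wrapped around the function body then measures only iterator construction; the per-record work is actually paid later, when a downstream stage (such as the shuffle triggered by keyBy) consumes the iterator. A minimal plain-JVM sketch of the effect, with an illustrative `slowDouble` transformation standing in for the real per-record work:]

```java
import java.util.stream.LongStream;

public class LazyTimingDemo {
    // Deliberately non-trivial per-record work (illustrative name, not from the thread).
    static long slowDouble(long x) {
        long acc = 0;
        for (int i = 0; i < 1000; i++) acc += x; // busy work
        return acc / 500;                        // == 2 * x
    }

    public static void main(String[] args) {
        // Timing "inside mapPartitions": only the lazy pipeline setup is measured.
        long t0 = System.nanoTime();
        LongStream mapped = LongStream.rangeClosed(1, 100_000)
                                      .map(LazyTimingDemo::slowDouble); // lazy: nothing runs yet
        double buildMs = (System.nanoTime() - t0) / 1e6;

        // The real cost appears only when the pipeline is consumed downstream.
        long t1 = System.nanoTime();
        long sum = mapped.sum();
        double consumeMs = (System.nanoTime() - t1) / 1e6;

        System.out.printf("construction: %.3f ms, consumption: %.3f ms%n", buildMs, consumeMs);
        System.out.println("sum = " + sum);
    }
}
```

[If this is what is happening, forcing the iterator (e.g. materializing it before returning) inside the timed block should make the in-function timings line up with the wall-clock time shown in the Spark UI.]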