It would be great if you could share the code inside your mapPartitions call.
I'm assuming you are creating/handling a lot of complex objects, which would
slow down performance. If you haven't seen it yet, here's a link to the
performance tuning guide:
http://spark.apache.org/docs/latest/tuning.html
I have seen that link. I am using an RDD of byte arrays and Kryo
serialization. When I measure the time inside mapPartitions it is never more
than 1 ms, whereas the total time taken by the application is around 30 min.
The codebase has a lot of dependencies, so I'm trying to come up with a
simple version where I can reproduce the problem.
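Since the original code can't be shared, here is a minimal pure-Python sketch (no Spark, and not the poster's actual code) of the kind of measurement being described: timing the work inside a mapPartitions-style function that receives a partition as an iterator. The `timed_partition` function and the `x * 2` per-record work are illustrative placeholders.

```python
import time

def timed_partition(iterator):
    """mapPartitions-style function: consume one partition's iterator,
    timing only the work done inside this block."""
    start = time.perf_counter()
    result = [x * 2 for x in iterator]  # placeholder for real per-record work
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"partition processed in {elapsed_ms:.3f} ms")
    return iter(result)

# Simulate two partitions of an RDD with plain iterators:
partitions = [iter(range(5)), iter(range(5, 10))]
output = [list(timed_partition(p)) for p in partitions]
print(output)  # [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
```

Note that this measures only the time spent materializing records inside the function body; it does not capture deserialization, scheduling, or shuffle costs, which Spark incurs outside the user function.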
I have a Spark app that involves a series of mapPartitions operations
followed by a keyBy operation. I have measured the time inside the
mapPartitions function blocks, and these blocks take trivial time. Still,
the application takes far too long, and the Spark UI confirms that time.
So I was wondering where the time is being spent.
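One possible explanation for "trivial time inside the block, long time overall" (an assumption on my part, since the code isn't shown): if the mapPartitions function returns a lazy generator, a timer around the function body finishes before any records are actually processed, so it reports near-zero time while the real work happens later, when Spark consumes the iterator. A pure-Python sketch of that pitfall, with `work` as a hypothetical stand-in for per-record processing:

```python
import time

def work(x):
    time.sleep(0.001)  # stand-in for real per-record work (~1 ms each)
    return x * 2

def lazy_partition(iterator):
    start = time.perf_counter()
    result = (work(x) for x in iterator)  # generator: nothing runs yet
    inner_ms = (time.perf_counter() - start) * 1000
    # The generator has not been consumed, so this reads near zero:
    print(f"measured inside block: {inner_ms:.3f} ms")
    return result

t0 = time.perf_counter()
out = list(lazy_partition(iter(range(100))))  # real work happens here
total_ms = (time.perf_counter() - t0) * 1000
print(f"actual time to consume the partition: {total_ms:.1f} ms")
```

With 100 records at ~1 ms each, the inner measurement stays tiny while the actual consumption takes over 100 ms. If something like this applies, the 30 minutes would be attributable to downstream consumption, serialization, or the shuffle, not to the timed block itself.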