Maybe JIT? During the 1st stage, the scheduler code hasn't been JIT-compiled yet, so it runs slower than it does in later stages.
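
One way to check this outside Spark is a minimal sketch like the one below. It is not Spark's scheduler code: FakeTask, serialize and avgSerializeMillis are hypothetical names made up for illustration, and plain Java serialization is assumed. It serializes two batches of small, near-identical objects in one JVM and prints the average time per object for each batch. Note that the first batch also pays for class loading, so this does not isolate JIT from other warm-up costs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationWarmup {
    // Hypothetical stand-in for a task: a few ints plus a pre-serialized byte array.
    static class FakeTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final int stageId;
        final int partitionId;
        final byte[] taskBinary;

        FakeTask(int stageId, int partitionId, byte[] taskBinary) {
            this.stageId = stageId;
            this.partitionId = partitionId;
            this.taskBinary = taskBinary;
        }
    }

    // Serializes one object with standard Java serialization.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Serializes `count` tasks and returns the average time per task in milliseconds.
    static double avgSerializeMillis(int stageId, int count, byte[] binary) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            serialize(new FakeTask(stageId, i, binary));
        }
        return (System.nanoTime() - start) / 1e6 / count;
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = new byte[1000];
        // Same code path both times; only the JVM's warm-up state differs.
        System.out.printf("batch 1 (cold): %.3f ms/task%n", avgSerializeMillis(0, 256, binary));
        System.out.printf("batch 2 (warm): %.3f ms/task%n", avgSerializeMillis(1, 64, binary));
    }
}
```

If warm-up is the cause, the first batch should show a noticeably higher average even though both batches do the same work. The equivalent experiment in Spark would be to run the same job twice in one driver and compare the 1st-stage timings of the two runs.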

On Wed, May 13, 2015 at 9:18 AM, Akshat Aranya <aara...@gmail.com> wrote:

> Hi,
> Any input on this?  I'm willing to instrument further and experiment
> if there are any ideas.
>
> On Mon, May 4, 2015 at 11:27 AM, Akshat Aranya <aara...@gmail.com> wrote:
> > Hi,
> >
> > I have been investigating scheduling delays in Spark and I found some
> > unexplained anomalies.  In my use case, I have two stages after
> > collapsing the transformations: the first is a mapPartitions() and the
> > second is a sortByKey().  I found that the task serialization for the
> > first stage takes much longer than the second.
> >
> > 1. mapPartitions() - this launches 256 tasks in 603 ms (avg. 2.363
> > ms). Each task serializes to 1220 bytes.
> > 2. sortByKey() - this launches 64 tasks in 12 ms (avg. 0.187 ms). Each
> > task serializes to 1139 bytes.
> >
> > Note that the serialized size of the task is similar, but the avg.
> > scheduling time is very different.  I also instrumented the code to
> > print out the serialization time, and it seems like it is indeed the
> > serialization that takes much longer.  This seemed weird to me because
> > the biggest part of the Task, the taskBinary, is actually copied
> > directly from a byte array.
> >
> > Any explanation of why this happens?
> >
> > Thanks,
> > Akshat
