Re: JavaSerializerInstance is slow

2021-09-07 Thread Kohki Nishio
A Spark job creates 200 partitions, and the executors try to deserialize the task at the same time. That creates a chain of blocked threads, because all executors are deserializing the same task and loadClass takes a lock per class name. I often observe that many threads are making that chain from the
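
The per-class-name lock described above can be observed in the JDK directly. The following is a minimal sketch, assuming a class loader that is registered as parallel capable; the thread does not say which loader Spark uses on this path, and `ProbeLoader` and the `com.example.*` class names are hypothetical, introduced only for illustration. It shows that `ClassLoader.getClassLoadingLock` returns one shared lock object per class name, so threads resolving the same name queue behind each other while threads resolving different names do not.

```java
class ClassLoadLockDemo {
    // Hypothetical loader (not from the thread); it exists only to expose
    // the protected JDK method getClassLoadingLock.
    static class ProbeLoader extends ClassLoader {
        static {
            // Registration has to happen before any instance is created.
            registerAsParallelCapable();
        }

        ProbeLoader() {
            super(ProbeLoader.class.getClassLoader());
        }

        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    // Two lookups of the same class name return the same lock object.
    static boolean sameNameSharesLock() {
        ProbeLoader loader = new ProbeLoader();
        return loader.lockFor("com.example.Task") == loader.lockFor("com.example.Task");
    }

    // Lookups of different class names return distinct lock objects.
    static boolean differentNamesUseDifferentLocks() {
        ProbeLoader loader = new ProbeLoader();
        return loader.lockFor("com.example.Task") != loader.lockFor("com.example.Other");
    }

    public static void main(String[] args) {
        System.out.println("same name shares lock: " + sameNameSharesLock());
        System.out.println("different names differ: " + differentNamesUseDifferentLocks());
    }
}
```

If the loader were not parallel capable, `getClassLoadingLock` would return the loader itself, so every class load on it would serialize on a single lock rather than one lock per name.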

Re: JavaSerializerInstance is slow

2021-09-03 Thread Sean Owen
I don't know if Java serialization is slow in that case; that shows blocking on a class load, which may or may not be directly due to deserialization. Indeed, I don't think (some) things are serialized in local mode within one JVM, so I'm not sure that's actually what's going on. On Thu, Sep 2, 2021

Re: JavaSerializerInstance is slow

2021-09-02 Thread Antonin Delpeuch (lists)
Hi Kohki, Serialization of tasks happens in local mode too, and as far as I am aware there is no way to disable this (although it would definitely be useful in my opinion). You can see local mode as a testing mode, in which you would want to catch any serialization errors before they appear
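
To illustrate the kind of serialization error that such a check surfaces, here is a minimal sketch using plain Java serialization. It is not Spark's own code path; `SerializableFn`, `canSerialize`, `cleanClosure` and `leakyClosure` are hypothetical names introduced for this example. A lambda that captures only serializable values can be written out, while one that captures a non-serializable object fails with `NotSerializableException`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationCheckDemo {
    // Hypothetical functional interface; extending Serializable makes
    // lambdas of this type serializable.
    interface SerializableFn extends Serializable {
        int apply(int x);
    }

    // Returns true if Java serialization accepts the value, false if it
    // rejects it with NotSerializableException.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Captures only an int, which serializes fine.
    static SerializableFn cleanClosure() {
        int offset = 1;
        return x -> x + offset;
    }

    // Captures a plain Object, which is not Serializable.
    static SerializableFn leakyClosure() {
        Object handle = new Object();
        return x -> x + handle.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("clean closure serializable: " + canSerialize(cleanClosure()));
        System.out.println("leaky closure serializable: " + canSerialize(leakyClosure()));
    }
}
```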

JavaSerializerInstance is slow

2021-09-02 Thread Kohki Nishio
I'm seeing many threads deserializing a task. I understand that since a lambda is involved, we can't use Kryo for this. However, I'm running in local mode, so this serialization is not really necessary, no? Is there any trick I can apply to get rid of this thread contention? I'm