Re: JavaSerializerInstance is slow

2021-09-07 Thread Kohki Nishio
A Spark job creates 200 partitions, and the executor threads try to
deserialize the task at the same time. That creates a chain of blocking,
because all of them are deserializing the same task and loadClass takes a
lock per class name. I often see many threads forming that chain in the
thread dumps.
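Here is a minimal Java sketch of the "lock per class name" behaviour. It assumes a parallel-capable class loader, as the JDK's built-in loaders typically are. For such a loader, `ClassLoader.getClassLoadingLock(name)` returns one lock object per class name, so threads resolving the same class queue on the same monitor while threads resolving different classes do not. The `(a java.lang.Object)` in the trace is consistent with this, because the per-name lock is a plain Object. `ClassLoadingLockDemo`, `InspectableLoader` and `sameLock` are hypothetical names, and the subclass exists only to reach the protected method.

```java
public class ClassLoadingLockDemo {

    // Hypothetical loader, used only to expose the protected
    // ClassLoader.getClassLoadingLock(String) hook for inspection.
    static class InspectableLoader extends ClassLoader {
        static {
            // Registering as parallel capable makes getClassLoadingLock hand out
            // one lock object per class name instead of returning `this`.
            ClassLoader.registerAsParallelCapable();
        }

        InspectableLoader(ClassLoader parent) {
            super(parent);
        }

        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    // True if both class names map to the very same lock object.
    static boolean sameLock(String a, String b) {
        InspectableLoader loader =
                new InspectableLoader(ClassLoadingLockDemo.class.getClassLoader());
        return loader.lockFor(a) == loader.lockFor(b);
    }

    public static void main(String[] args) {
        // Same name: every thread loading this class contends on one monitor.
        System.out.println(sameLock("scala.Function1", "scala.Function1")); // true
        // Different names: these threads do not block each other on the name lock.
        System.out.println(sameLock("scala.Function1", "scala.Option"));    // false
    }
}
```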

We're using Spark as a high-TPS search engine and can't really afford to
allocate resources per query, so we're going with local mode. I believe
other people use it in a similar way in production, but anyway, thanks for
the comments. For now it seems the Java deserializer is the only option ...
so I'll have to add more machines to handle higher TPS.

Thanks
-Kohki


-- 
Kohki Nishio


Re: JavaSerializerInstance is slow

2021-09-03 Thread Sean Owen
I don't know if Java serialization is slow in that case; the trace shows
blocking on a class load, which may or may not be directly due to
deserialization.
Indeed, I don't think (some) things are serialized in local mode within one
JVM, so I'm not sure that's actually what's going on.
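Sampling can separate class-loading contention from deserialization cost. The rough diagnostic below assumes you can run code in the same JVM as the local-mode driver. It uses the standard `ThreadMXBean` to count threads that are BLOCKED with a given method as their top frame. Calling `countBlockedAt("java.lang.ClassLoader", "loadClass")` repeatedly therefore shows how many task threads are parked on a class-loading lock at each instant. `BlockedFrameScan`, `countBlockedAt` and the self-check `demoBlockedCount` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedFrameScan {

    // Counts threads that are BLOCKED on a monitor and whose top stack frame
    // is the given method, e.g. ("java.lang.ClassLoader", "loadClass").
    static int countBlockedAt(String className, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int count = 0;
        // (false, false): skip monitor/synchronizer details; stacks are still included.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0
                    && stack[0].getClassName().equals(className)
                    && stack[0].getMethodName().equals(methodName)) {
                count++;
            }
        }
        return count;
    }

    // Enters `lock`; blocks if another thread already holds it.
    static void contend(Object lock) {
        synchronized (lock) {
            // Nothing to do: only the wait to enter matters here.
        }
    }

    // Self-check: make one thread block inside contend() and count it.
    static int demoBlockedCount() throws InterruptedException {
        Object lock = new Object();
        Thread waiter = new Thread(() -> contend(lock), "demo-waiter");
        int count;
        synchronized (lock) {
            waiter.start();
            // Wait until the waiter is stuck trying to enter the monitor we hold.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
            count = countBlockedAt(BlockedFrameScan.class.getName(), "contend");
        }
        waiter.join();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked in contend(): " + demoBlockedCount());
        System.out.println("blocked in loadClass: "
                + countBlockedAt("java.lang.ClassLoader", "loadClass"));
    }
}
```

A consistently high loadClass count, compared with threads that are RUNNABLE inside deserialization code, would point at lock contention rather than raw serialization cost.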



Re: JavaSerializerInstance is slow

2021-09-02 Thread Antonin Delpeuch (lists)
Hi Kohki,

Serialization of tasks happens in local mode too, and as far as I am
aware there is no way to disable it (although that would definitely be
useful in my opinion).
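Plain Java can show this kind of task serialization in miniature. The sketch below round-trips a serializable Java lambda through `ObjectOutputStream`/`ObjectInputStream`. On the way back, the JDK rebuilds the lambda by calling a synthetic `$deserializeLambda$` method on the class that created it. Scala 2.12+ closures take an analogous route through `scala.runtime.LambdaDeserialize`, which is the frame visible at the bottom of Kohki's trace. This is a Java analogue, not Spark's own code path. `LambdaRoundTrip`, `roundTrip` and `addOne` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class LambdaRoundTrip {

    // Serializes `value` with plain Java serialization and reads it back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            // Reading a lambda resolves its capturing class and calls its
            // synthetic $deserializeLambda$ method.
            return (T) in.readObject();
        }
    }

    // The intersection cast makes the lambda Serializable.
    static Function<Integer, Integer> addOne() {
        return (Function<Integer, Integer> & Serializable) x -> x + 1;
    }

    public static void main(String[] args) throws Exception {
        Function<Integer, Integer> copy = roundTrip(addOne());
        System.out.println(copy.apply(41)); // 42
    }
}
```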

You can think of local mode as a testing mode, in which you would want to
catch any serialization errors before they appear in production.
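As a small illustration of catching serialization errors early, here is a hypothetical helper, `isJavaSerializable`, that is not a Spark API. It attempts plain Java serialization and reports a `NotSerializableException` as `false`. The third case mirrors a common failure: the lambda itself is serializable, but it captures an object that is not. `SerializabilityCheck` and `Connection` are illustrative stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class SerializabilityCheck {

    // True if `value` survives Java serialization, false if some object
    // in its graph is not Serializable.
    static boolean isJavaSerializable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for a client or connection that is not Serializable.
    static final class Connection {
        int offset() { return 10; }
    }

    static Function<Integer, Integer> plain() {
        return x -> x + 1; // not Serializable at all
    }

    static Function<Integer, Integer> serializable() {
        return (Function<Integer, Integer> & Serializable) x -> x + 1;
    }

    static Function<Integer, Integer> capturing(Connection conn) {
        // Serializable lambda, but it captures a non-serializable object.
        return (Function<Integer, Integer> & Serializable) x -> x + conn.offset();
    }

    public static void main(String[] args) {
        System.out.println(isJavaSerializable(serializable()));              // true
        System.out.println(isJavaSerializable(plain()));                     // false
        System.out.println(isJavaSerializable(capturing(new Connection()))); // false
    }
}
```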

There are also some important bugs that are present in local mode and
are not deemed worth fixing because it is not intended to be used in
production (https://issues.apache.org/jira/browse/SPARK-5300).

I think there would definitely be interest in a reliable and efficient
local mode in Spark, but it's a pretty different use case from what Spark
originally focused on.

Antonin


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



JavaSerializerInstance is slow

2021-09-02 Thread Kohki Nishio
I'm seeing many threads deserializing a task. I understand that since
lambdas are involved, we can't use Kryo for this. However, I'm running in
local mode, so this serialization is not really necessary, is it?

Is there any trick I can apply to get rid of this thread contention? I'm
seeing many threads like the one below in the thread dumps:

"Executor task launch worker for task 11.0 in stage 15472514.0 (TID 19788863)" #732821 daemon prio=5 os_prio=0 tid=0x7f02581b2800 nid=0x355d waiting for monitor entry [0x7effd1e3f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:400)
    - waiting to lock <0x7f0f7246edf8> (a java.lang.Object)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:51)
    at scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)
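The LambdaDeserializer frames above come down to how a serializable lambda is written. It is stored not as bytecode but as a `java.lang.invoke.SerializedLambda` that names the capturing class and the implementation method. Reading it back therefore has to resolve classes by name, which is where `loadClass` and its per-name lock come in. The Java sketch below prints that serialized form. Reaching the generated `writeReplace` method through reflection is a widely used trick rather than a documented API, so treat it as an assumption that may not hold on every JVM. `LambdaAnatomy`, `addOne` and `serializedForm` are hypothetical names.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaAnatomy {

    // The intersection cast makes the lambda Serializable.
    static Function<Integer, Integer> addOne() {
        return (Function<Integer, Integer> & Serializable) x -> x + 1;
    }

    // Serializable lambdas get a generated writeReplace() that returns a
    // SerializedLambda; we call it reflectively to inspect the serialized form.
    static SerializedLambda serializedForm(Object lambda) throws ReflectiveOperationException {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        return (SerializedLambda) writeReplace.invoke(lambda);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        SerializedLambda form = serializedForm(addOne());
        // Class that must be loaded on the reading side to rebuild the lambda.
        System.out.println(form.getCapturingClass());
        // Synthetic method holding the lambda body (a lambda$addOne$... name).
        System.out.println(form.getImplMethodName());
        System.out.println(form.getFunctionalInterfaceClass()); // java/util/function/Function
    }
}
```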


Thanks
-Kohki