Thanks all for the answers. Although I wasn’t able to use the extra
parameters to get the information I needed, I did solve my issue. It
turned out that I was using pureconfig to read a certain config from
Hadoop before the Spark session was initialized, so pureconfig would fail
to deserialize the class before Spark could configure itself properly.
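
For anyone who hits something similar, here is a minimal sketch of the
fix in Scala (the JobConfig class and its fields are made up for
illustration); the point is simply to build the SparkSession before
asking pureconfig to deserialize anything:

import org.apache.spark.sql.SparkSession
import pureconfig._
import pureconfig.generic.auto._

// Hypothetical config class, just for illustration.
case class JobConfig(inputPath: String, outputPath: String)

object Main {
  def main(args: Array[String]): Unit = {
    // Build the SparkSession first...
    val spark = SparkSession.builder().appName("my-job").getOrCreate()
    // ...and only then load the config, so pureconfig is not asked to
    // deserialize JobConfig before Spark has finished configuring.
    val jobConf = ConfigSource.default.loadOrThrow[JobConfig]
    // ... rest of the job ...
    spark.stop()
  }
}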


On Tue, Feb 18, 2020 at 10:24 AM Maxim Gekk <maxim.g...@databricks.com>
wrote:

> Hi Ruijing,
>
> Spark uses SerializationDebugger (
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html)
> as the default debugger to detect serialization issues. You can get more
> detailed information about serialization exceptions by setting the
> following when creating a cluster:
>
> spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
> spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
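>
> For example, with spark-submit (just a sketch; the class and jar names
> below are placeholders, and the same settings can also go in
> spark-defaults.conf):
>
> spark-submit \
>   --conf "spark.driver.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
>   --conf "spark.executor.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
>   --class com.example.MyJob \
>   my-job.jar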
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Tue, Feb 18, 2020 at 1:02 PM Ruijing Li <liruijin...@gmail.com> wrote:
>
>> Hi all,
>>
>> When working with Spark jobs, I sometimes have to deal with
>> serialization issues, and I have a difficult time fixing them. A lot of
>> the time, the serialization issues happen only in cluster mode, across
>> the network in a Mesos container, so I can’t debug locally. And the
>> exception thrown by Spark is not very helpful for finding the cause.
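>>
>> To give a concrete (simplified) example of the kind of failure I mean,
>> here is a Scala closure capturing an enclosing non-serializable object:
>>
>> import org.apache.spark.sql.SparkSession
>>
>> class Processor(multiplier: Int) { // note: not Serializable
>>   def run(spark: SparkSession): Unit = {
>>     val rdd = spark.sparkContext.parallelize(1 to 10)
>>     // The lambda reads `multiplier` through `this`, so Spark tries to
>>     // serialize the whole Processor and fails with Task not serializable.
>>     rdd.map(_ * multiplier).collect()
>>   }
>> }
>>
>> Copying the field into a local val before the closure fixes this one,
>> but finding which capture is at fault from the stack trace is the hard
>> part.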
>>
>> I’d love to hear some tips on where to look when debugging these. I’d
>> also be interested to know whether future releases could point out which
>> class or function is causing the serialization issue (right now I find
>> it’s either Java generic classes or the class Spark itself is running).
>> Thanks!
>> --
>> Cheers,
>> Ruijing Li
>>
--
Cheers,
Ruijing Li
