Thanks all for the answers. Unfortunately I wasn’t able to use the extra parameters to get the information I needed, but I did solve my issue. It turned out I was using pureconfig to read a config from Hadoop before the Spark session initialized, so pureconfig would error out while deserializing the class before Spark could configure itself properly.
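For anyone who hits something similar, here is a rough sketch of the fix (the AppConfig case class and the HDFS path are placeholders, not my actual config): build the SparkSession first, then read the file through its Hadoop configuration and hand the raw string to pureconfig:

    // A sketch, not my exact code: AppConfig and the path are made up.
    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession
    import pureconfig.ConfigSource
    import pureconfig.generic.auto._

    case class AppConfig(inputPath: String, outputPath: String)

    // Build the session first, so Spark (and its Hadoop configuration)
    // is fully set up before pureconfig tries to deserialize anything.
    val spark = SparkSession.builder().appName("my-job").getOrCreate()

    // Read the raw config file from HDFS via the session's Hadoop conf.
    val path = new Path("hdfs:///conf/app.conf")
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val in = fs.open(path)
    val raw =
      try scala.io.Source.fromInputStream(in).mkString
      finally in.close()

    // Deserialize only after Spark is up.
    val appConf = ConfigSource.string(raw).loadOrThrow[AppConfig]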
On Tue, Feb 18, 2020 at 10:24 AM Maxim Gekk <maxim.g...@databricks.com> wrote:

> Hi Ruijing,
>
> Spark uses SerializationDebugger (
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html)
> as the default debugger to detect serialization issues. You can get more
> detailed serialization exception information by setting the following when
> creating a cluster:
> spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
> spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Tue, Feb 18, 2020 at 1:02 PM Ruijing Li <liruijin...@gmail.com> wrote:
>
>> Hi all,
>>
>> When working with Spark jobs, I sometimes have to tackle serialization
>> issues, and I have a hard time fixing them. Often the serialization
>> issues happen only in cluster mode, across the network in a Mesos
>> container, so I can’t debug locally. And the exception thrown by Spark
>> is not very helpful in finding the cause.
>>
>> I’d love to hear some tips on how to debug in the right places. I’d also
>> be interested to know whether future releases could point out which
>> class or function is causing the serialization issue (right now I find
>> it’s either Java generic classes or the class Spark itself is running).
>> Thanks!
>> --
>> Cheers,
>> Ruijing Li
>

--
Cheers,
Ruijing Li
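P.S. For anyone finding this in the archives: a sketch of passing the options Maxim mentioned at submit time (the class name and jar are placeholders):

    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
      --conf "spark.executor.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
      --class com.example.MyJob \
      my-job.jar

Note that in client mode spark.driver.extraJavaOptions cannot be set from inside the application, because the driver JVM has already started by then; it has to go through spark-submit or spark-defaults.conf.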