Hi Ruijing,
Spark uses SerializationDebugger (
https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html)
as the default debugger to detect serialization issues. You can get more
detailed serialization exception information by setting the following
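For context on what that extra detail is attached to: the underlying failure is the JVM's `java.io.NotSerializableException`, raised when a task (or something it captures) holds a non-serializable field. A minimal standalone sketch, no Spark required; the `Config`/`Task` classes here are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for some dependency that is not Serializable.
class Config { }

// Illustrative stand-in for a Spark task/closure: Serializable itself,
// but it drags in a non-serializable field.
class Task implements Serializable {
    Config config = new Config();
}

public class SerDemo {
    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            // Serializing Task fails because of the Config field.
            out.writeObject(new Task());
        } catch (NotSerializableException e) {
            // The exception message names the offending class.
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

The stock message only names the class; the point of the extra debug information is to also show the reference chain (here, `Task.config`) that pulled the non-serializable object in.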
Hi all,
When working with Spark jobs, I sometimes have to tackle serialization
issues, and I have a hard time fixing them. A lot of the time, the
serialization issues happen only in cluster mode, across the network in
a Mesos container, so I can’t debug locally. And the exception
> either materialize the Dataframe on HDFS (e.g. parquet or checkpoint)
I wonder if Avro is a better candidate for this: since it's row-oriented,
it should be faster to write/read for such a task. I had never heard
of checkpoint.
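For what it's worth, a sketch of the options under discussion, using the Java API. This assumes a local SparkSession; the paths are illustrative, and the Avro writer additionally needs the spark-avro package on the classpath:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MaterializeDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("materialize-demo")
            .getOrCreate();
        // checkpoint() needs a checkpoint directory configured up front.
        spark.sparkContext().setCheckpointDir("/tmp/spark-checkpoints");

        Dataset<Row> df = spark.range(1000).toDF("id");

        // Option 1: materialize to a Parquet file (columnar).
        df.write().mode("overwrite").parquet("/tmp/intermediate.parquet");

        // Option 2: Avro (row-oriented); uncomment with spark-avro available.
        // df.write().format("avro").save("/tmp/intermediate.avro");

        // Option 3: checkpoint — writes the data to the checkpoint dir
        // and truncates the lineage of the returned Dataset.
        Dataset<Row> materialized = df.checkpoint();
        System.out.println(materialized.count());

        spark.stop();
    }
}
```

The row-vs-column trade-off is real for a pure write-then-read-back barrier, since nothing benefits from columnar pruning; checkpoint has the advantage of also cutting the lineage, which is often the point of materializing in the first place.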
Enrico Minack writes:
> It is not about very large or small, it is