Yes, it is in the REPL. I will try your solution and also submit the whole
thing as a job JAR. If this is true, it should be fixed, right? I will check
whether there is already a ticket. Somebody pointed me to
https://issues.apache.org/jira/browse/SPARK-2620, but I need to investigate.
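A minimal sketch of what the compiled, job-JAR route relies on. The `Record` class, its fields, and the `DistinctSketch` object are hypothetical stand-ins for the CSV-derived case class, and plain Scala collections replace the RDD so the example runs without Spark. It shows that a normally compiled case class gets structural `equals` and `hashCode`, which is what deduplication depends on; `RDD.distinct` likewise groups elements by those two methods.

```scala
// Hypothetical record type for one CSV row, defined at top level in a
// compiled source file (not pasted into a shell), so the compiler
// generates the usual structural equals/hashCode.
final case class Record(id: Int, name: String)

object DistinctSketch {
  // Parses "id,name" lines into Records; assumes well-formed input with
  // exactly two comma-separated fields per line.
  def parse(lines: Seq[String]): Seq[Record] =
    lines.map { line =>
      val fields = line.split(",", 2)
      Record(fields(0).trim.toInt, fields(1).trim)
    }

  def main(args: Array[String]): Unit = {
    val records = parse(Seq("1,alice", "2,bob", "1,alice"))
    // Two separately constructed Records with equal fields compare equal
    // and share a hash code...
    println(records(0) == records(2))
    println(records(0).hashCode == records(2).hashCode)
    // ...so distinct keeps only the first occurrence of each.
    println(records.distinct)
  }
}
```

Compiled and run on its own, this should print `true`, `true`, and `List(Record(1,alice), Record(2,bob))`. If the same class pasted into the Spark shell still yields duplicates after `distinct`, that is consistent with the shell behavior described in the reply about case classes.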
Dear Spark Users,
I have been searching the web for several hours now but haven't found a
solution to my problem, so maybe someone on this list can help.
I have an RDD of case classes, generated from CSV files with Spark. When I used
the distinct operator, there were still duplicates. So I investigated
Is this in the Spark shell? Unfortunately, case classes don't work correctly in
the Spark shell (though they do work in the Scala shell), because we change the
way lines of code are compiled to allow shipping functions across the network.
The best way to get case classes in there is to compile them