Lee, which cluster manager do you use: standalone, yarn-cluster, yarn-client, or Mesos? In yarn-cluster mode the driver program is executed inside one of the nodes in the cluster, so it might be that the driver code needs to be serialized in order to be sent to that node.
On 5 June 2015 at 22:55, Lee McFadden <splee...@gmail.com> wrote:
> On Fri, Jun 5, 2015 at 12:30 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> Ignoring the serialization thing (seems like a red herring):
>
> People seem surprised that I'm getting the Serialization exception at all -
> I'm not convinced it's a red herring per se, but on to the blocking issue...
>
>> You might be using this Cassandra library with an incompatible version of
>> Spark; the `TaskMetrics` class has changed in the past, and the method it's
>> looking for does not exist at least in 1.4.
>
> You are correct, I was being a bonehead. We recently downgraded to Spark
> 1.2.1 and I was running the compiled jar using Spark 1.3.1 on my local
> machine. Running the job with threading on my 1.2.1 cluster worked. Thank
> you for finding the obvious mistake :)
>
> Regarding serialization, I'm still confused as to why I was getting a
> serialization error in the first place, as I'm executing these Runnable
> classes from a Java thread pool. I'm fairly new to the Scala/JVM world and
> there doesn't seem to be any Spark documentation explaining *why* I need to
> declare the sc variable as @transient (or even that I should).
>
> I was under the impression that objects only need to be serializable when
> they are sent over the network, and that doesn't seem to be occurring as
> far as I can tell.
>
> Apologies if this is simple stuff, but I don't like "fixing things" without
> knowing the full reason why the changes I made fixed things :)
>
> Thanks again for your time!
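
To make the @transient question concrete, here is a minimal sketch of one common way that exception shows up even though the Runnable itself never leaves the driver. The class and field names (ImportJob, prefix) are invented for illustration and are not from Lee's actual code:

import java.util.concurrent.Executors
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: ImportJob and prefix are made-up names.
// The lambda passed to map() refers to `prefix`, which is a field of this
// object, so when Spark packages the task closure it tries to serialize the
// whole ImportJob instance. Without @transient that drags the SparkContext
// field along and serialization fails; with @transient the field is skipped
// during serialization but remains usable here on the driver.
class ImportJob(@transient private val sc: SparkContext, prefix: String)
    extends Runnable with Serializable {

  override def run(): Unit = {
    val ids = sc.parallelize(1 to 100)
    // Referencing `prefix` pulls `this` into the closure that Spark serializes.
    val keys = ids.map(i => s"$prefix-$i")
    println(keys.count())
  }
}

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("transient-sketch"))
    val pool = Executors.newFixedThreadPool(2)
    pool.submit(new ImportJob(sc, "user"))
    pool.submit(new ImportJob(sc, "event"))
    pool.shutdown()
  }
}

Nothing in the sketch is sent over the network explicitly; the serialization happens on the driver when Spark prepares the task closure for the executors, which is why the error can appear even though the Runnable only ever runs in a local thread pool.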