Re: Variables outside of mapPartitions scope
Right now I am not using any class variables (references to "this"). All my variables are created within the scope of the method I am running. I did more debugging and found this strange behavior:

    variables here
    for loop
      mapPartitions call
        use variables here
      end mapPartitions
    endfor

This results in a serialization error, but this does not:

    variables here
    for loop
      create new references to variables here
      mapPartitions call
        use new reference variables here
      end mapPartitions
    endfor

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Variables-outside-of-mapPartitions-scope-tp5517p5528.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
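The second pattern works because a closure captures only what it references. The sketch below is not Spark code. It assumes Scala 2.12 or later and uses plain Java serialization (roughly what Spark does when shipping a task closure) to compare a closure that reads a field through a non-serializable object with one that reads a fresh local copy of that field. Settings, capturesObject, capturesLocalCopy and isSerializable are hypothetical names. The exact capture rules for closures nested inside a for-loop also differ between Scala versions, so this shows the principle rather than reproducing the original error.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object LocalCopyDemo {
  // Hypothetical helper: try to Java-serialize `obj`, roughly what Spark does to a task closure.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  // Hypothetical non-serializable object, standing in for e.g. a client or config holder.
  class Settings(val factor: Int)

  // The closure reads `settings.factor`, so it captures the whole Settings object.
  def capturesObject(settings: Settings): Int => Int =
    x => x * settings.factor

  // A fresh local reference holds only the Int, so that is all the closure captures.
  def capturesLocalCopy(settings: Settings): Int => Int = {
    val factor = settings.factor
    x => x * factor
  }
}
```

With s = new LocalCopyDemo.Settings(3), serializing capturesObject(s) should fail while capturesLocalCopy(s) serializes fine, even though both compute x * 3.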
Re: Variables outside of mapPartitions scope
In general, you can find out exactly what's not serializable by adding -Dsun.io.serialization.extendedDebugInfo=true to SPARK_JAVA_OPTS; the NotSerializableException message then includes the path of objects and fields that led to the offending object. Since a reference to "this" (the enclosing class) is often what causes the problem, a general workaround is to move the mapPartitions call into a static method (in Scala, a method on an object) where there is no "this" reference. This transforms this:

    class A {
      def f() = rdd.mapPartitions(iter => ...)
    }

into this:

    class A {
      def f() = A.helper(rdd)
    }

    object A {
      def helper(rdd: RDD[...]) = rdd.mapPartitions(iter => ...)
    }
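The same idea can be checked outside Spark. In the sketch below, which assumes Scala 2.12 or later and uses plain Java serialization as a rough stand-in for how Spark ships closures, class A is deliberately not Serializable. f() builds a closure that reads a field and therefore captures the A instance, while g() routes through a companion-object helper that receives only the value it needs. The names A, helper, factor, SerializationCheck and isSerializable are illustrative, and helper takes an Int rather than an RDD so the example runs without Spark.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Illustrative enclosing class; deliberately NOT Serializable.
class A(val factor: Int) {
  // Reading `factor` means reading `this.factor`, so this closure captures the A instance.
  def f(): Int => Int = x => x * factor

  // Delegates to the companion object, passing only the value the closure needs.
  def g(): Int => Int = A.helper(factor)
}

object A {
  // No instance of class A is in scope here: the closure captures just the Int argument.
  def helper(factor: Int): Int => Int = x => x * factor
}

object SerializationCheck {
  // Try Java serialization, as a rough stand-in for Spark serializing a task closure.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

With a = new A(3), isSerializable(a.f()) should be false and isSerializable(a.g()) true, while both functions compute the same values.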
Re: Variables outside of mapPartitions scope
Scala's for-loop is not just a loop; it is not a native loop at the bytecode level. The compiler rewrites it into method calls (for example, for (i <- 0 until n) body becomes (0 until n).foreach(i => body)), so at runtime it creates a couple of objects, including a closure for the loop body, and performs a lot of method calls on them. As a result, if you refer to variables defined outside the for-loop, the whole for-loop closure and every variable it references have to be serializable. Since the closures Scala generates for the for-loop are themselves serializable, I guess you have something non-serializable inside the for-loop. A while-loop in Scala compiles to a native loop, so you won't have this issue if you use a while-loop instead.

Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Fri, May 9, 2014 at 1:13 PM, pedro wrote:
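The rewrite described above can be seen without Spark: a for-loop without yield becomes a foreach call that takes the loop body as a closure object, whereas a while-loop stays a plain loop. The sketch below shows the three equivalent forms side by side; LoopForms and its method names are hypothetical, and the foreach version is only roughly what the compiler generates.

```scala
import scala.collection.mutable.ArrayBuffer

object LoopForms {
  // What you write: a for-loop without `yield`.
  def withFor(n: Int): Seq[Int] = {
    val out = ArrayBuffer.empty[Int]
    for (i <- 0 until n) out += i * i
    out.toList
  }

  // Roughly what the compiler turns it into: a foreach call whose argument
  // is a closure object wrapping the loop body.
  def withForeach(n: Int): Seq[Int] = {
    val out = ArrayBuffer.empty[Int]
    (0 until n).foreach(i => out += i * i)
    out.toList
  }

  // A while-loop compiles to a plain jump-based loop; no closure is created.
  def withWhile(n: Int): Seq[Int] = {
    val out = ArrayBuffer.empty[Int]
    var i = 0
    while (i < n) {
      out += i * i
      i += 1
    }
    out.toList
  }
}
```

All three return List(0, 1, 4, 9) for n = 4; only the first two allocate a closure for the loop body, which is the object that nested closures can end up dragging into serialization.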