Re: Variables outside of mapPartitions scope

2014-05-12 Thread pedro
Right now I am not using any class variables (references to this). All my
variables are created within the scope of the method I am running.

I did more debugging and found this strange behavior.
variables here
for loop
mapPartitions call
use variables here
end mapPartitions
endfor

This will result in a serializable bug, but this won't

variables here
for loop
create new references to variables here
mapPartitions call
use new reference variables here
end mapPartitions
endfor



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Variables-outside-of-mapPartitions-scope-tp5517p5528.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Variables outside of mapPartitions scope

2014-05-13 Thread ankurdave
In general, you can find out exactly what's not serializable by adding
-Dsun.io.serialization.extendedDebugInfo=true to SPARK_JAVA_OPTS.
Since a this reference to the enclosing class is often what's causing the
problem, a general workaround is to move the mapPartitions call to a static
method where there is no this reference. This transforms this:
class A {  def f() = rdd.mapPartitions(iter => ...)}
into this:
class A {  def f() = A.helper(rdd)}object A {  def helper(rdd: RDD[...]) =
rdd.mapPartitions(iter => ...)}




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Variables-outside-of-mapPartitions-scope-tp5517p5527.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Variables outside of mapPartitions scope

2014-05-13 Thread DB Tsai
Scala's for-loop is not just looping; it's not native looping in bytecode
level. It will create a couple of objects at runtime and performs a
truckload of method calls on them. As a result, if you are referring the
variables outside the for-loop, the whole for-loop object and any variable
inside the loop have to be serializable. Since the for-loop is serializable
in scala, I guess you have something non-serializable inside the for-loop.

The while-loop in scala is native, so you won't have this issue if you use
while-loop.


Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Fri, May 9, 2014 at 1:13 PM, pedro  wrote:

> Right now I am not using any class variables (references to this). All my
> variables are created within the scope of the method I am running.
>
> I did more debugging and found this strange behavior.
> variables here
> for loop
> mapPartitions call
> use variables here
> end mapPartitions
> endfor
>
> This will result in a serializable bug, but this won't
>
> variables here
> for loop
> create new references to variables here
> mapPartitions call
> use new reference variables here
> end mapPartitions
> endfor
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Variables-outside-of-mapPartitions-scope-tp5517p5528.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>