are functions deserialized once per task?

Michael Albert Fri, 02 Oct 2015 09:38:23 -0700

Greetings!
Is it true that functions, such as those passed to RDD.map(), are deserialized 
once per task? This seems to be the case looking at Executor.scala, but I don't 
really understand the code.
I'm hoping the answer is yes because that makes it easier to write code without 
worrying about thread safety. For example, suppose I have something like this:

    class FooToBarTransformer {
      def transform(foo: Foo): Bar = .....
    }

Now I want to do something like this:

    val rddFoo: RDD[Foo] = ....
    val transformer = new FooToBarTransformer
    val rddBar = rddFoo.map(foo => transformer.transform(foo))

If the "transformer" object is deserialized once per task, then I do not need 
to worry whether "transform()" is thread safe. If, for example, the 
implementation tried to "optimize" matters by caching the deserialization, so 
that one object was shared by all threads in a single JVM, then presumably one 
would need to worry about the thread safety of transform().
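
To make the concern concrete, here is a rough sketch in plain Scala (no Spark) of what "one deserialized copy per task" would mean. Everything in it is an assumption for illustration only: the fields of Foo and Bar, the mutable counter inside the transformer, the DeserializeDemo helper, and the use of plain Java serialization as a stand-in for whatever Executor.scala really does.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-ins for the Foo and Bar types in the question.
case class Foo(value: Int)
case class Bar(text: String)

// Hypothetical transformer with mutable state, so that sharing one
// instance across threads would be unsafe without synchronization.
class FooToBarTransformer extends Serializable {
  private var calls = 0
  def transform(foo: Foo): Bar = {
    calls += 1
    Bar(s"${foo.value}#$calls")
  }
}

object DeserializeDemo {
  // Serialize an object to bytes with plain Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  // Every call builds a brand-new object graph from the bytes.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(new FooToBarTransformer)
    // Two simulated "tasks" each deserialize their own copy.
    val a = deserialize[FooToBarTransformer](bytes)
    val b = deserialize[FooToBarTransformer](bytes)
    println(a eq b) // false: the copies are distinct instances
    println(a.transform(Foo(1)))
    println(b.transform(Foo(1)))
  }
}
```

If each task gets its own copy, as a and b do here, the counter is never touched by two threads at once. A single cached instance shared by all threads in the JVM would not have that property.
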
Is my understanding correct? Is this likely to continue to be true in future 
releases? Answers of "yes" would be much appreciated :-).

Thanks!
-Mike
