When an HTTP connection is opened, you are opening a connection between one
specific machine (with its own IP and NIC) and another specific machine, so
the connection cannot be serialized and reused on a different machine.
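As a quick illustration (a plain-Python sketch, not Spark itself): a live network connection can't even be pickled locally, and pickle is the same kind of serialization Spark's Python API uses to ship closures to workers:

```python
import pickle
import socket

# A live TCP socket wraps an OS-level file descriptor bound to this
# machine, so pickle refuses to serialize it -- the same reason a
# connection object captured in a Spark closure fails.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    pickle.dumps(sock)
except TypeError as e:
    print("Cannot serialize:", e)
finally:
    sock.close()
```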

This isn't a Spark limitation.

I made a simple diagram if it helps. Objects created at the driver and
passed to the workers need to be serialized; objects created at the workers
do not.

In the diagram you have to create the HTTPConnection on the executors
independently of the driver.
The HTTPConnection created at Executor-1 can be used for partitions P1-P3 of
the RDD available on that executor.
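In Spark code this usually means using mapPartitions, so the connection is created once per partition on the executor rather than on the driver. Here is a minimal sketch of that pattern in plain Python (FakeHttpConnection and the partition list are stand-ins for a real connection and the RDD's partitions):

```python
class FakeHttpConnection:
    """Stand-in for a real HTTP connection (hypothetical, for illustration)."""
    opened = 0  # class-level counter so we can see how many were created

    def __init__(self):
        FakeHttpConnection.opened += 1

    def fetch(self, record):
        return f"fetched:{record}"

def process_partition(records):
    # The connection is created HERE, on the executor, once per partition,
    # instead of being created on the driver and shipped to the workers.
    conn = FakeHttpConnection()
    for record in records:
        yield conn.fetch(record)

# Simulate rdd.mapPartitions(process_partition) over three partitions.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
results = [out for part in partitions for out in process_partition(part)]
print(results)                    # one result per record
print(FakeHttpConnection.opened)  # one connection per partition: 3
```

In real Spark code the call would be `rdd.mapPartitions(process_partition)`; Spark invokes the function on each executor, so the connection never crosses the driver/worker boundary.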

Spark is tolerant and does allow passing objects from the driver to the
workers, but when it reports "Task not serializable" it is telling you that
some object has a problem. Mark the class as Serializable if you think its
objects can be serialized. As I said at the beginning, not everything can be
serialized, particularly HTTP connections, JDBC connections, etc.
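If a class really must cross the driver/worker boundary but holds a non-serializable connection inside, a common workaround is to exclude the connection from serialization and recreate it lazily on the worker. A plain-Python sketch of that idea (ApiClient and its fields are hypothetical; in Scala the analogous trick is a @transient lazy val):

```python
import pickle

class ApiClient:
    """Hypothetical client that lazily (re)creates its connection."""
    def __init__(self, host):
        self.host = host
        self._conn = None  # the non-serializable part

    @property
    def conn(self):
        if self._conn is None:
            # Stand-in for opening a real connection.
            self._conn = ("connection-to", self.host)
        return self._conn

    def __getstate__(self):
        # Drop the live connection when pickling; ship only the config.
        state = self.__dict__.copy()
        state["_conn"] = None
        return state

client = ApiClient("example.org")
_ = client.conn                              # opened on the "driver"
copy = pickle.loads(pickle.dumps(client))
print(copy._conn)   # None: the connection itself was not shipped
print(copy.conn)    # recreated on demand on the "worker"
```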

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t8878/Picture1.png> 

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
