When an HTTP connection is opened, you are opening a connection between one specific machine (with its own IP and NIC) and another specific machine, so the connection object can't be serialized and reused on a different machine, right?
This isn't a Spark limitation. I made a simple diagram, in case it helps. Objects created on the driver and passed to the workers need to be serialized; objects created on the workers do not. In the diagram, you create the HTTPConnection on each executor, independently of the driver. The HTTPConnection created on Executor-1 can then be used for partitions P1-P3 of the RDD that live on that executor.

Spark does allow passing objects from the driver to the workers, but when it reports "Task not serializable" it is telling you that some object captured by the task cannot be serialized. Mark a class as Serializable if its objects really can be serialized; but as I said at the beginning, not everything can be, HTTP connections and JDBC connections in particular.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t8878/Picture1.png>
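To illustrate the two points above outside of Spark itself: connection objects wrap OS-level sockets, which cannot be serialized, and the usual fix is to construct the connection inside the task (e.g. via mapPartitions/foreachPartition) so it is created on the executor. A minimal Python sketch of both ideas follows; the `make_connection` factory and the partition data are hypothetical stand-ins, and a plain iterator stands in for an RDD partition:

```python
import pickle
import socket

# 1) Why shipping a connection from the driver fails: a socket-backed
#    object cannot be serialized. Pickling one raises TypeError, which is
#    the same root cause as Spark's "Task not serializable" error.
s = socket.socket()
try:
    pickle.dumps(s)
    serializable = True
except TypeError:
    serializable = False
s.close()
print("socket serializable?", serializable)  # → False

# 2) The fix: build the connection inside the task, so it is created on
#    the executor rather than the driver. `make_connection` is a
#    hypothetical factory standing in for opening an HTTP/JDBC connection.
def make_connection():
    return {"conn": object()}  # stand-in for a real connection object

def handle_partition(rows):
    conn = make_connection()   # one connection per partition, on the executor
    for row in rows:
        # pretend we used `conn` to process this row
        yield (row, id(conn["conn"]))
    # a real connection would be closed here

partition = [1, 2, 3]          # stand-in for one RDD partition
results = list(handle_partition(iter(partition)))
print(len(results))            # → 3

# In Spark this pattern would be written as:
#   rdd.mapPartitions(handle_partition)
```

All rows of a partition reuse the single connection created at the top of `handle_partition`, which is also why this pattern is preferred over creating one connection per record.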