Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread Andrew Ehrlich
+1 for the misleading error. Messages about failing to connect often mean that an executor has died. If so, dig into the executor logs and find out why the executor died (out of memory, perhaps). Andrew > On Jul 23, 2016, at 11:39 AM, VG wrote: > > Hi Pedro, > > Based on

Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread VG
Hi Pedro, Based on your suggestion, I deployed this on an AWS node and it worked fine. Thanks for your advice. I am still trying to figure out the issues in the local environment. Anyway, thanks again. -VG On Sat, Jul 23, 2016 at 9:26 PM, Pedro Rodriguez wrote: > Have

Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread Marco Mistroni
Hi VG, I believe the error message is misleading. I had a similar one with PySpark yesterday after calling a count on a data frame, where the real error was an incorrect user-defined function being applied. Please send me some sample code with a trimmed-down version of the data and I will see if I can

Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread Pedro Rodriguez
Have you changed spark-env.sh or spark-defaults.conf from the default? It looks like Spark is trying to address local workers based on a network address (e.g. 192.168……) instead of on localhost (localhost, 127.0.0.1, 0.0.0.0,…). Additionally, that network address doesn’t resolve correctly. You
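
One way to check the point above on the local machine is to ask the JVM what the local hostname resolves to. The following is a minimal sketch, not taken from the thread: the class name `LocalAddressCheck` and the helper `isLocalOnly` are hypothetical, and it uses only the standard `java.net.InetAddress` API. It reports whether the resolved address is a loopback or wildcard address, or a network address such as the 192.168 one mentioned above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of Spark.
public class LocalAddressCheck {

    // True if the address is loopback (127.x.x.x, ::1) or the wildcard (0.0.0.0).
    static boolean isLocalOnly(InetAddress addr) {
        return addr.isLoopbackAddress() || addr.isAnyLocalAddress();
    }

    // Convenience overload for IP literals such as "127.0.0.1".
    static boolean isLocalOnly(String ipLiteral) {
        try {
            return isLocalOnly(InetAddress.getByName(ipLiteral));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse or resolve: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        try {
            // What does this machine's hostname resolve to?
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("hostname: " + local.getHostName());
            System.out.println("address:  " + local.getHostAddress());
            System.out.println("loopback or wildcard: " + isLocalOnly(local));
        } catch (UnknownHostException e) {
            // The hostname has no usable entry in DNS or the hosts file.
            System.out.println("local hostname does not resolve: " + e.getMessage());
        }
    }
}
```

If this prints a network address or fails to resolve, the machine's hosts file or hostname configuration is a reasonable place to look next. Setting the `SPARK_LOCAL_IP` environment variable to 127.0.0.1 is another thing to try, though whether that fixes this particular failure is an assumption.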

Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread VG
Hi Pedro, Apologies for not adding this earlier. This is running on a local cluster set up as follows. JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR"); Any suggestions based on this? The ports are not blocked by a firewall. Regards, On Sat, Jul 23, 2016 at 8:35 PM, Pedro

Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread Pedro Rodriguez
Make sure that you don’t have ports firewalled. You don’t really give much information to work from, but it looks like the master can’t access the worker nodes for some reason. If you give more information on the cluster, networking, etc, it would help. For example, on AWS you can create a

Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread VG
Please suggest if I am doing something wrong or an alternative way of doing this. I have an RDD with two values as follows: JavaPairRDD rdd. When I execute rdd.collectAsMap() it always fails with IO exceptions. 16/07/23 19:03:58 ERROR RetryingBlockFetcher: Exception while