How big is each entry, and how much memory do you have on each
executor? You generated all the data on the driver, and
sc.parallelize(bytesList) will send the entire dataset to a single
executor. You may run into I/O or memory issues. If the entries are
generated, you should create a simple RDD with sc.parallelize(0 until
20, 20) and call mapPartitions to generate them in parallel. -Xiangrui
On Wed, Apr 23, 2014 at 9:23 AM, amit karmakar
amit.codenam...@gmail.com wrote:
Spark hangs after I perform the following operations:
ArrayList<byte[]> bytesList = new ArrayList<byte[]>();
/*
add 40k entries to bytesList
*/
JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList);
System.out.println("Count=" + rdd.count());
If I add just one entry, it works.
It also works if I modify
JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList);
to
JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList, 20);
There is nothing in the logs that helps explain the hang.
What could be the reason for this?
Regards,
Amit Kumar Karmakar