Re: IOException and appcache FileNotFoundException in Spark 1.02

2014-10-14 Thread Ilya Ganelin
Hello all. Does anyone else have any suggestions? Even understanding where this error comes from would help a lot. On Oct 11, 2014 12:56 AM, Ilya Ganelin ilgan...@gmail.com wrote: Hi Akhil - I tried your suggestions and tried varying my partition sizes. Reducing the number of partitions led to

Re: IOException and appcache FileNotFoundException in Spark 1.02

2014-10-10 Thread Akhil Das
You could be hitting this issue https://issues.apache.org/jira/browse/SPARK-3633 (or similar). You can try the following workarounds, set on the SparkConf before the context is created: conf.set("spark.core.connection.ack.wait.timeout", "600") conf.set("spark.akka.frameSize", "50") Also reduce the number of partitions; you could be hitting the kernel's ulimit.
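
One way to check the ulimit suggestion above is to look at the limits a process actually sees on a worker node. The following is a minimal sketch, assuming the limit in question is the open-file limit (the value reported by `ulimit -n`), which the message does not state explicitly. The helper names `open_file_limits` and `describe` are made up for illustration, and the script uses only the Python standard library on a Unix-like system.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors,
    # the same value that `ulimit -n` reports in a shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def describe(limit):
    """Render a limit as text, mapping the 'no limit' sentinel to a word."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("open files: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

Run it as the same user that runs the Spark executors, since limits are per user and per process. A low soft limit relative to the number of partitions would be consistent with the ulimit explanation, though it would not confirm it.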

Re: IOException and appcache FileNotFoundException in Spark 1.02

2014-10-10 Thread Ilya Ganelin
Thank you - I will try this. If I drop the partition count, am I not more likely to hit memory issues, especially if the dataset is rather large? On Oct 10, 2014 3:19 AM, Akhil Das ak...@sigmoidanalytics.com wrote: You could be hitting this issue https://issues.apache.org/jira/browse/SPARK-3633

Re: IOException and appcache FileNotFoundException in Spark 1.02

2014-10-10 Thread Ilya Ganelin
Hi Akhil - I tried your suggestions and tried varying my partition sizes. Reducing the number of partitions led to memory errors (presumably - I saw IOExceptions much sooner). With the settings you provided, the program ran for longer but ultimately crashed in the same way. I would like to

IOException and appcache FileNotFoundException in Spark 1.02

2014-10-09 Thread Ilya Ganelin
On Oct 9, 2014 10:18 AM, Ilya Ganelin ilgan...@gmail.com wrote: Hi all – I could use some help figuring out a couple of exceptions I’ve been getting regularly. I have been running on a fairly large dataset (150 gigs). With smaller datasets I don't have any issues. My sequence of operations is

IOException and appcache FileNotFoundException in Spark 1.02

2014-10-09 Thread Ilya Ganelin
Hi all – I could use some help figuring out a couple of exceptions I’ve been getting regularly. I have been running on a fairly large dataset (150 gigs). With smaller datasets I don't have any issues. My sequence of operations is as follows – unless otherwise specified, I am not caching: Map a