Spark with Cassandra - Shuffle opening too many files

2015-01-07 Thread Ankur Srivastava
Hello, we are currently running our data pipeline on Spark, which uses Cassandra as the data source. We are facing an issue at the step where we create an RDD over data in a Cassandra table and then run flatMapToPair to transform the data, but we are hitting "Too many open files" errors. I …
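The error here is the operating system's EMFILE ("Too many open files") limit, which a Spark executor can hit when a shuffle creates many intermediate files. A minimal, self-contained sketch of the mechanism, using an arbitrary small descriptor limit of 64 and a hypothetical helper name `exhaust_fds` (not from the original thread):

```python
import errno
import resource
import tempfile

def exhaust_fds(soft_limit=64):
    """Open temp files until the process hits its descriptor limit.

    Illustrative only: simulates the "Too many open files" error a
    shuffle-heavy Spark executor sees when its per-process ulimit is
    too low. The limit of 64 is an arbitrary small value for the demo.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lower only the soft limit; the hard limit is left untouched.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    handles = []
    try:
        for _ in range(soft_limit + 1):
            handles.append(tempfile.TemporaryFile())
    except OSError as e:
        return e.errno  # EMFILE: "Too many open files"
    finally:
        for h in handles:
            h.close()
    return None

if __name__ == "__main__":
    print(exhaust_fds() == errno.EMFILE)
```

The same failure mode appears inside an executor JVM when shuffle spill files exceed the ulimit of the user running the worker.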

Re: Spark with Cassandra - Shuffle opening too many files

2015-01-07 Thread Cody Koeninger
General ideas regarding "too many open files": make sure ulimit is actually being set, especially if you're on Mesos (because of https://issues.apache.org/jira/browse/MESOS-123 ); find the pid of the executor process and cat /proc/<pid>/limits; set spark.shuffle.consolidateFiles = true; try …
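The first two checks Cody suggests can be run directly from a shell. A minimal sketch, using this shell's own pid (`$$`) as a stand-in for the executor pid:

```shell
# Soft limit on open file descriptors for the current shell
ulimit -n

# Effective limits of a running process; for a Spark executor, replace $$
# with the executor JVM's pid (found with `jps` or `ps`). This verifies the
# limit the process actually inherited, which may differ from your login shell's.
grep "Max open files" /proc/$$/limits
```

Checking `/proc/<pid>/limits` matters because an executor launched by a daemonized worker may not inherit the ulimit you set in your interactive shell.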

Re: Spark with Cassandra - Shuffle opening too many files

2015-01-07 Thread Ankur Srivastava
Thank you, Cody! I am going to try the two settings you mentioned. We are currently running with the Spark standalone cluster manager. Thanks, Ankur. On Wed, Jan 7, 2015 at 1:20 PM, Cody Koeninger c...@koeninger.org wrote: General ideas regarding too many open files: make sure ulimit …
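For reference, the setting Cody suggested can go in conf/spark-defaults.conf (or be set on a SparkConf in code). Note, as a caveat, that it only affected the old hash-based shuffle manager in Spark 1.x and was removed in later releases once sort-based shuffle became standard:

```
# conf/spark-defaults.conf
# Consolidate many small shuffle output files into larger ones
# (hash-based shuffle manager only; removed in later Spark versions).
spark.shuffle.consolidateFiles  true
```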