Hello,
We are currently running our data pipeline on Spark, using Cassandra as the
data source.
We are facing an issue at the step where we create an RDD over the data in a
Cassandra table and then run flatMapToPair to transform it: we keep running
into "Too many open files" errors.
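(For context on what the error means at the OS level: every process has a
file-descriptor limit, and shuffle-heavy Spark stages can open many files at
once. The following is a minimal standalone Python sketch, not Spark-specific,
that reproduces the error by lowering the soft RLIMIT_NOFILE for the current
process.)

```python
import errno
import resource

# Lower the soft limit on open file descriptors for this process only.
# The hard limit is left unchanged so this stays reversible.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

files = []
try:
    # Keep opening descriptors until the limit is hit.
    while True:
        files.append(open("/dev/null"))
except OSError as e:
    # errno.EMFILE is the "Too many open files" error Spark surfaces.
    assert e.errno == errno.EMFILE
    print(f"hit the limit after {len(files)} open files: {e}")
finally:
    for f in files:
        f.close()
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

This is the same condition the executor hits, just triggered deliberately;
the fixes below raise the limit or reduce how many files the shuffle opens.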
General ideas regarding "too many open files":
- Make sure ulimit is actually being set, especially if you're on Mesos
  (because of https://issues.apache.org/jira/browse/MESOS-123). Find the pid
  of the executor process, and cat /proc/<pid>/limits.
- Set spark.shuffle.consolidateFiles = true
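(A quick way to do the check Cody describes from inside a process: on Linux,
/proc/<pid>/limits shows the limits a running process actually has, and
Python's stdlib resource module reports the same numbers. A small sketch,
shown here against the current process rather than an executor pid:)

```python
import resource

# Soft/hard limits on open file descriptors for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"max open files: soft={soft} hard={hard}")

# On Linux the same values appear in /proc/<pid>/limits, which is the
# file you would cat for the executor's pid to verify ulimit took effect.
try:
    with open("/proc/self/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                print(line.rstrip())
except FileNotFoundError:
    pass  # /proc is Linux-only
```

If the executor's soft limit is lower than expected, the ulimit setting is
not reaching the process that launches the workers; and the
spark.shuffle.consolidateFiles setting Cody mentions would go in
conf/spark-defaults.conf (or on the SparkConf) as
`spark.shuffle.consolidateFiles  true`.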
Thank you Cody!!
I am going to try the two settings you mentioned.
We are currently running with the Spark standalone cluster manager.
Thanks
Ankur
On Wed, Jan 7, 2015 at 1:20 PM, Cody Koeninger c...@koeninger.org wrote: