After a lot of grovelling through logs, I found that the Nagios monitoring process had detected that the machine was almost out of memory and killed the SNAP executor process.
So why is the machine running out of memory? Each node has 128 GB of RAM, 4 executors, and about 40 GB of data. The job did run out of memory when I tried to cache() the RDD, but I would hope that persist() is implemented so that it streams to disk without trying to materialize too much data in RAM.

Ravi

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p12032.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
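For intuition, here is a back-of-the-envelope sketch of why cache() (i.e. MEMORY_ONLY) could plausibly blow the storage budget on a node like this. All specifics are assumptions for illustration only: the Spark 1.x defaults spark.storage.memoryFraction = 0.6 and spark.storage.safetyFraction = 0.9, an even split of node RAM across executors, and a guessed 3x expansion factor for deserialized Java objects versus on-disk size.

```python
# Back-of-the-envelope check of whether MEMORY_ONLY caching could OOM.
# Assumed figures (not from the original post): Spark 1.x defaults
# spark.storage.memoryFraction = 0.6, spark.storage.safetyFraction = 0.9,
# and an illustrative 3x blow-up for deserialized RDD records.

node_ram_gb = 128
executors_per_node = 4
data_gb = 40          # per-node data, per the post

# Heap per executor if node RAM is split evenly (assumption).
heap_per_executor_gb = node_ram_gb / executors_per_node        # 32 GB

# Fraction of the heap usable for cached blocks under 1.x defaults.
storage_fraction = 0.6 * 0.9
storage_per_executor_gb = heap_per_executor_gb * storage_fraction  # ~17.3 GB

# Deserialized caches often occupy 2-5x the on-disk size; assume 3x.
expansion = 3
cached_per_executor_gb = data_gb * expansion / executors_per_node  # 30 GB

print(f"storage budget per executor: {storage_per_executor_gb:.1f} GB")
print(f"cached size per executor:    {cached_per_executor_gb:.1f} GB")
# Under these assumptions the cached working set exceeds the storage
# budget, so MEMORY_ONLY caching evicts or fails, while a disk-backed
# level (e.g. MEMORY_AND_DISK) should spill partitions instead.
```

Under these (assumed) numbers the in-memory cache per executor comfortably exceeds the storage budget, which is consistent with cache() failing while a disk-spilling persist level would be expected to survive.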