Hi Anthony,

Could you retry your scenario without the '-exec spark' option? By default, SystemML runs in hybrid_spark mode, which is more efficient.
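For reference, a sketch of what that resubmission might look like: it is the command from the original message with only the '-exec spark' flag dropped. Every other option is copied unchanged from that message; whether each of them is still needed has not been checked here.

```shell
# Same submission as in the original message, minus '-exec spark',
# so that SystemML falls back to its default execution mode.
# All paths and settings are taken verbatim from the original command.
$SPARK_HOME/bin/spark-submit \
  --driver-memory=5G \
  --executor-memory=25G \
  --conf spark.driver.maxResultSize=0 \
  --conf spark.akka.frameSize=128 \
  --verbose \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  $SYSTEMML_HOME/SystemML.jar -f example.dml -explain runtime
```

Keeping '-explain runtime' means the generated runtime plan is still printed, which should show which instructions end up running on Spark in this mode.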
Thanks,
Glenn

From: Anthony Thomas <ahtho...@eng.ucsd.edu>
To: dev@systemml.apache.org
Date: 06/15/2017 09:50 AM
Subject: Unexpected Executor Crash

Hi SystemML Developers,

I'm running the following simple DML script under SystemML 0.14:

    M = read('/scratch/M5.csv')
    N = read('/scratch/M5.csv')
    MN = M %*% N
    if (1 == 1) {
        print(as.scalar(MN[1,1]))
    }

The matrix M is square and about 5GB on disk (stored in HDFS). I am submitting the script to a 2 node spark cluster where each physical machine has 30GB of RAM. I am using the following command to submit the job:

    $SPARK_HOME/bin/spark-submit --driver-memory=5G --executor-memory=25G --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --verbose --conf spark.serializer=org.apache.spark.serializer.KryoSerializer $SYSTEMML_HOME/SystemML.jar -f example.dml -exec spark -explain runtime

However, I consistently run into errors like:

    ERROR TaskSchedulerImpl: Lost executor 1 on 172.31.3.116: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

and the job eventually aborts. Consulting the output of executors shows they are crashing with OutOfMemory exceptions. Even if one executor needed to store M, N and MN in memory simultaneously it seems like there should be enough memory so I'm unsure why the executor is crashing. In addition, I was under the impression that Spark would spill to disk if there was insufficient memory.

I've tried various combinations of increasing/decreasing the number of executor cores (from 1 to 8), using more/fewer executors, increasing/decreasing Spark's memoryFraction, and increasing/decreasing Spark's default parallelism all without success.

Can anyone offer any advice or suggestions to debug this issue further? I'm not a very experienced Spark user so it's very possible I haven't configured something correctly. Please let me know if you'd like any further information.

Best,

Anthony Thomas