Hello,

I am currently working on a project in which I spawn a standalone Apache Spark MLlib job, in Standalone mode, from a running Java process. In the code of the Spark job I have the following:

    SparkConf sparkConf = new SparkConf().setAppName("SparkParallelLoad");
    sparkConf.set("spark.executor.memory", "8g");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    ...

Also, in my ~/spark/conf/spark-env.sh I have the following values:

    SPARK_WORKER_CORES=1
    export SPARK_WORKER_CORES=1
    SPARK_WORKER_MEMORY=2g
    export SPARK_WORKER_MEMORY=2g
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"

At runtime I get a Java OutOfMemory exception and a core dump. My dataset is less than 1 GB, and I want to make sure that all of it is cached in memory for my ML task.

Am I increasing the JVM heap memory correctly? Am I doing something wrong?

Thank you,
Nick
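P.S. In case it helps, the dataset is loaded and cached roughly like this (a simplified sketch, not my exact code; the libSVM format and the input path are just placeholders):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;
    import org.apache.spark.storage.StorageLevel;

    // Load the (< 1 GB) training set; path and format here are placeholders.
    JavaRDD<LabeledPoint> data =
        MLUtils.loadLibSVMFile(sc.sc(), "hdfs://...").toJavaRDD();

    // Keep the whole dataset as deserialized objects in executor memory.
    data.persist(StorageLevel.MEMORY_ONLY());

    // Force materialization so the data is actually cached before training.
    long n = data.count();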