Spark ignores SPARK_WORKER_MEMORY?
Hello,

Although I'm setting SPARK_WORKER_MEMORY in spark-env.sh, it looks like this setting is ignored. I can't find any indication in the scripts under bin/sbin that -Xms/-Xmx are being set. If I run ps on the worker pid, it looks like the heap is set to 1G:

[hadoop@sl-env1-hadoop1 spark-1.5.2-bin-hadoop2.6]$ ps -ef | grep 20232
hadoop 20232 1 0 02:01 ? 00:00:22 /usr/java/latest//bin/java -cp /workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/workspace/3rd-party/hadoop/2.6.3//etc/hadoop/ -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://10.52.39.92:7077

Am I missing something?

Thanks.
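For completeness, here is a minimal spark-env.sh sketch of the memory variables as I understand them. The assumption (mine, not something I've confirmed) is that the worker daemon's own heap is sized by SPARK_DAEMON_MEMORY, which defaults to 1g, while SPARK_WORKER_MEMORY only caps the total memory the worker may hand out to executors:

# spark-env.sh -- sketch with illustrative values
# Total memory this worker may allocate to executors (not the daemon's own heap)
export SPARK_WORKER_MEMORY=4g
# Heap of the Worker/Master daemon JVM itself, i.e. the -Xms/-Xmx shown in the
# ps output above (assumed to default to 1g when unset)
export SPARK_DAEMON_MEMORY=2g

If that reading is right, seeing a 1g heap on the worker process would be expected even with SPARK_WORKER_MEMORY set.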
Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)
Hello,

I have a 5-node cluster which hosts both HDFS datanodes and Spark workers. Each node has 8 CPUs and 16G of memory. The Spark version is 1.5.2, and spark-env.sh is as follows:

export SPARK_MASTER_IP=10.52.39.92
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=4g

More settings are done in the application code:

sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator", InternalKryoRegistrator.class.getName());
sparkConf.set("spark.kryo.registrationRequired", "true");
sparkConf.set("spark.kryoserializer.buffer.max.mb", "512");
sparkConf.set("spark.default.parallelism", "300");
sparkConf.set("spark.rpc.askTimeout", "500");

I'm trying to load data from HDFS and run some SQL on it (mostly group-by) using DataFrames. The logs keep saying that tasks are lost due to OutOfMemoryError (GC overhead limit exceeded). Can you advise what the recommended settings (memory, cores, partitions, etc.) are for the given hardware?

Thanks!
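In case it's relevant, here is a sketch of the submit-time flags I could use to size the executors explicitly; the class name and jar are placeholders, and the numbers are illustrative rather than a tuned recommendation. As far as I understand, spark.executor.memory defaults to 1g regardless of SPARK_WORKER_MEMORY, so with the settings above each executor JVM would still get a 1g heap unless it is raised:

# Sketch only -- placeholder app class/jar, illustrative numbers
spark-submit \
  --master spark://10.52.39.92:7077 \
  --executor-memory 3g \
  --total-executor-cores 40 \
  --conf spark.default.parallelism=300 \
  --class com.example.MyApp my-app.jar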
Spark 1.5.1 standalone cluster - wrong Akka remoting config?
Taking my first steps with Spark, I'm facing problems submitting jobs to the cluster from application code. Digging through the logs, I noticed some periodic WARN messages in the master log:

15/10/08 13:00:00 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.254.167:64014] has failed, address is now gated for [5000] ms. Reason: [Disassociated]

The problem is that this IP address does not exist on our network and wasn't configured anywhere. The same wrong IP shows up in the worker log when it tries to execute the task (the wrong IP is passed to --driver-url):

15/10/08 12:58:21 INFO worker.ExecutorRunner: Launch command: "/usr/java/latest//bin/java" "-cp" "/path/spark/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/hadoop/2.6.0//etc/hadoop/" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=64014" "-Dspark.driver.port=53411" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.254.167:64014/user/CoarseGrainedScheduler" "--executor-id" "39" "--hostname" "192.168.10.214" "--cores" "16" "--app-id" "app-20151008123702-0003" "--worker-url" "akka.tcp://sparkWorker@192.168.10.214:37625/user/Worker"
15/10/08 12:59:28 INFO worker.Worker: Executor app-20151008123702-0003/39 finished with state EXITED message Command exited with code 1 exitStatus 1

Any idea what I did wrong and how this can be fixed? The Java version is 1.8.0_20, and I'm using the pre-built Spark binaries.

Thanks!
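For reference, here is a sketch of how I could try pinning the address the driver advertises to the cluster; all values in angle brackets are placeholders, and spark.driver.host / SPARK_LOCAL_IP are simply the settings I'm looking at, not something I've confirmed to be the fix:

# Sketch only -- <driver-routable-ip> stands for the address of the submitting
# machine that the workers can actually reach; <master-host> for the master
export SPARK_LOCAL_IP=<driver-routable-ip>
spark-submit \
  --master spark://<master-host>:7077 \
  --conf spark.driver.host=<driver-routable-ip> \
  --class com.example.MyApp my-app.jar

Since I'm submitting from application code, the equivalent would presumably be setting spark.driver.host on the SparkConf before the SparkContext is created.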