Spark memory distribution

2020-07-27 Thread dben
Hi, I have a computation that runs on top of a big dynamic model that constantly receives changes / online updates. I therefore thought that working in batch (stateless) mode, which requires the heavy model to be shipped to Spark each time, would be less appropriate than working in stream mode. Therefore, was able to

Re: Understanding Spark Memory distribution

2015-04-13 Thread Imran Rashid
Broadcast variables count towards spark.storage.memoryFraction, so they use the same pool of memory as cached RDDs. That said, I'm really not sure why you are running into problems; it seems like you have plenty of memory available. Most likely it's got nothing to do with broadcast
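[For context, a minimal Scala sketch of the shared pool this reply describes, assuming the Spark 1.x legacy memory manager; the app name and lookup data are illustrative:]

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("broadcast-memory-demo")
      // Cached RDD blocks and broadcast blocks share this storage pool (default 0.6).
      .set("spark.storage.memoryFraction", "0.6")
    val sc = new SparkContext(conf)

    // The driver holds the object once; each executor fetches a single copy.
    val lookup: Map[Int, String] = (1 to 100000).map(i => i -> s"v$i").toMap
    val bc = sc.broadcast(lookup)
    val hits = sc.parallelize(1 to 100).filter(i => bc.value.contains(i)).count()
    println(s"hits = $hits")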

Re: Understanding Spark Memory distribution

2015-03-30 Thread giive chen
Hi Ankur, if you are using standalone mode, your config is wrong. You should use export SPARK_DAEMON_MEMORY=xxx in conf/spark-env.sh. At least it works on my Spark 1.3.0 standalone-mode machine. BTW, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like standalone mode doesn't use this
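[One way to verify which setting actually took effect is to print the heap each JVM received; a hedged Scala sketch, assuming an existing SparkContext named sc:]

    val gb = 1024.0 * 1024 * 1024
    // Driver-side heap, as reported by the driver JVM itself.
    println(f"driver max heap: ${Runtime.getRuntime.maxMemory / gb}%.1f GB")
    // Executor-side heap; the output lands in the executor's stdout log.
    sc.parallelize(1 to 1, 1).foreach { _ =>
      println(f"executor max heap: ${Runtime.getRuntime.maxMemory / gb}%.1f GB")
    }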

Re: Understanding Spark Memory distribution

2015-03-30 Thread Ankur Srivastava
Hi Wisely, I am running Spark 1.2.1, and I have checked the process heap: it is running with all the heap I am assigning. As I mentioned earlier, I get OOM on the workers, not the driver or master. Thanks Ankur On Mon, Mar 30, 2015 at 9:24 AM, giive chen thegi...@gmail.com wrote: Hi Ankur
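[Since the OOM is on the workers, the knobs that matter are executor-side, not driver-side; a sketch of the relevant Spark 1.x properties, with illustrative values rather than a recommendation:]

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "24g")         // executor JVM heap
      .set("spark.storage.memoryFraction", "0.6")  // slice of that heap for cache + broadcast blocks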

Re: Understanding Spark Memory distribution

2015-03-29 Thread Ankur Srivastava
Hi Wisely, I am running on Amazon EC2 instances, so I cannot blame the hardware. Moreover, my other pipelines run successfully; only this one, which involves broadcasting a large object, fails. My spark-env.sh settings are: SPARK_MASTER_IP=MASTER-IP SPARK_LOCAL_IP=LOCAL-IP SPARK_DRIVER_MEMORY=24g

Re: Understanding Spark Memory distribution

2015-03-28 Thread Wisely Chen
Hi Ankur, if your hardware is OK, it looks like a config problem. Can you show me your spark-env.sh or JVM config? Thanks Wisely Chen 2015-03-28 15:39 GMT+08:00 Ankur Srivastava ankur.srivast...@gmail.com: Hi Wisely, I have 26 GB for the driver, and the master is running on m3.2xlarge

Re: Understanding Spark Memory distribution

2015-03-28 Thread Wisely Chen
Hi, in a broadcast, Spark materializes the whole 3 GB object on the driver (in standalone mode often the same machine as the master) and then broadcasts it to each slave. It is a very common situation that this node doesn't have enough memory. What are your master node settings? Wisely Chen Ankur Srivastava ankur.srivast...@gmail.com wrote on Saturday, 2015-03-28: I
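[To gauge whether the driver can hold the object at all, Spark ships a rough size estimator (a developer API); a small sketch with a stand-in payload:]

    import org.apache.spark.util.SizeEstimator

    // Stand-in for the real 3 GB object; replace with the actual value to broadcast.
    val payload: Array[Byte] = new Array[Byte](64 * 1024 * 1024)
    // The driver heap must hold this object plus its serialized torrent blocks.
    println(s"estimated size: ${SizeEstimator.estimate(payload)} bytes")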

Understanding Spark Memory distribution

2015-03-27 Thread Ankur Srivastava
Hi All, I am running a Spark cluster on EC2 instances of type m3.2xlarge. I have given 26 GB of memory with all 8 cores to my executors. I can see that in the logs too: 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added: app-20150327213106-/0 on

Re: Understanding Spark Memory distribution

2015-03-27 Thread Ankur Srivastava
I have increased spark.storage.memoryFraction to 0.4, but I still get OOM errors on the Spark executor nodes: 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms 15/03/27 23:19:52
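[A back-of-envelope check of the storage pool this configuration leaves, assuming the Spark 1.x defaults of the era (spark.storage.safetyFraction = 0.9):]

    val heapGb = 26.0            // executor heap from this thread
    val memoryFraction = 0.4     // spark.storage.memoryFraction as set above
    val safetyFraction = 0.9     // spark.storage.safetyFraction default
    // storage pool = heap * memoryFraction * safetyFraction ≈ 9.4 GB, which a
    // ~3 GB broadcast plus cached blocks and task working memory can exhaust.
    println(f"storage pool: ${heapGb * memoryFraction * safetyFraction}%.1f GB")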