Re: Count distinct and driver memory

2020-10-19 Thread Raghavendra Ganesh
…application with multiple actions, Spark re-executes the entire DAG for each action unless there is a cache in between. I was trying to avoid reloading 1/2 a terabyte of data. Also, cache should use up executor memory, not driver memory. As it turns out, cache was the problem.

Re: Count distinct and driver memory

2020-10-19 Thread ayan guha
…with multiple actions, Spark re-executes the entire DAG for each action unless there is a cache in between. I was trying to avoid reloading 1/2 a terabyte of data. Also, cache should use up executor memory, not driver memory. -- Why not count the parquet…

Re: Count distinct and driver memory

2020-10-19 Thread Nicolas Paris
…terabyte of data. Also, cache should use up executor memory, not driver memory. -- Why not count the parquet file instead? Writing/reading a parquet file is more efficient than caching, in my experience. If you really need caching, you could choose a better strategy such as DISK.
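
A sketch of the two alternatives suggested here, assuming a SparkSession named spark and the half-terabyte DataFrame df from this thread (the output path is hypothetical):

    import org.apache.spark.sql.functions.countDistinct
    import org.apache.spark.storage.StorageLevel

    // Alternative 1: write once, then re-read the parquet output for the count,
    // so nothing has to stay pinned in executor memory.
    df.write.parquet("/data/out")
    val fromDisk = spark.read.parquet("/data/out")
    fromDisk.select(countDistinct("a", "b", "c")).show()

    // Alternative 2: if caching is really needed, keep the blocks on executor
    // local disk instead of in memory.
    df.persist(StorageLevel.DISK_ONLY)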

Re: Count distinct and driver memory

2020-10-19 Thread Mich Talebzadeh
…in between. I was trying to avoid reloading 1/2 a terabyte of data. Also, cache should use up executor memory, not driver memory. As it turns out, cache was the problem. I didn't expect cache to take executor memory and spill over to disk. I don't know why it's taking driver memory…

Re: Count distinct and driver memory

2020-10-19 Thread Lalwani, Jayesh
…cache should use up executor memory, not driver memory. As it turns out, cache was the problem. I didn't expect cache to take executor memory and spill over to disk. I don't know why it's taking driver memory. The input data has millions of partitions, which results in millions of tasks. Perhaps…

Re: Count distinct and driver memory

2020-10-19 Thread Nicolas Paris
…Before I write the data frame to parquet, I do df.cache. After writing the file out, I do df.countDistinct(“a”, “b”, “c”).collect() -- If you write the df to parquet, why would you also cache it? Caching by default loads into memory; this might affect later use, such as collect. The resulting GC…
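
The df.countDistinct call quoted above is shorthand; in the actual DataFrame API the sequence would look roughly like the following sketch (per-column counts shown, since the post is ambiguous between per-column and tuple-wise counting; df and the path are assumptions):

    import org.apache.spark.sql.functions.countDistinct

    df.cache()                      // pins the input in executor memory until unpersisted
    df.write.parquet("/data/out")   // hypothetical path; this action materializes the cache
    val counts = df.select(
      countDistinct("a").as("distinct_a"),
      countDistinct("b").as("distinct_b"),
      countDistinct("c").as("distinct_c")
    ).collect()                     // a single small row comes back to the driver
    df.unpersist()                  // release the cached blocks when done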

Re: Count distinct and driver memory

2020-10-18 Thread Gourav Sengupta
Hi, 6 billion rows is quite small; I can do it on my laptop with around 4 GB RAM. What is the version of Spark you are using, and what is the effective memory that you have per executor? Regards, Gourav Sengupta

Count distinct and driver memory

2020-10-18 Thread Lalwani, Jayesh
I have a Dataframe with around 6 billion rows and about 20 columns. First of all, I want to write this dataframe out to parquet. Then, out of the 20 columns, I have 3 columns of interest, and I want to find how many distinct values of those columns there are in the file. I don’t need the actual…
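
Since the post suggests the exact distinct values are not needed, an approximate count avoids the full distinct aggregation entirely. A sketch, assuming Spark 2.1+ (older releases spell it approxCountDistinct) and that roughly 1% error is acceptable:

    import org.apache.spark.sql.functions.approx_count_distinct

    // HyperLogLog-based estimate; rsd is the target relative standard deviation
    df.select(
      approx_count_distinct("a", rsd = 0.01),
      approx_count_distinct("b", rsd = 0.01),
      approx_count_distinct("c", rsd = 0.01)
    ).show()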

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-09-11 Thread Teja
…partition. We are using Spark v2.2.2 as of now. The major problem we are facing is due to GC on the driver. All of the driver memory (30G) is getting filled and GC is very active, which is taking more than 50% of the runtime for Full GC Evacuation…

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-09-11 Thread Teja
Sorry for the poor formatting.

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Mridul Muralidharan
…have ~30k partitions, which makes ~90MB per partition. We are using Spark v2.2.2 as of now. The major problem we are facing is due to GC on the driver. All of the driver memory (30G) is getting filled and GC is very active, which is taking more than 50% of the runtime for Full GC Evacuation…

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Waleed Fateem
Hi Teja, the only thought I have is maybe to consider decreasing the spark.scheduler.listenerbus.eventqueue.capacity parameter. That should decrease the driver memory pressure, but of course you'll end up dropping events, probably more frequently, meaning you can't really trust anything you…
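
A sketch of where that setting goes; the 5000 is illustrative (the default capacity is 10000), and it must be in place before the SparkContext starts:

    // At submit time:
    //   spark-submit --conf spark.scheduler.listenerbus.eventqueue.capacity=5000 ...
    // or programmatically, before the context is created:
    val conf = new org.apache.spark.SparkConf()
      .set("spark.scheduler.listenerbus.eventqueue.capacity", "5000")
    val sc = new org.apache.spark.SparkContext(conf)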

LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Teja
…on the driver. All of the driver memory (30G) is getting filled and GC is very active, which is taking more than 50% of the runtime for Full GC Evacuation. The heap dump indicates that 80% of the memory is being occupied by LiveListenerBus, and it's not being cleared by GC. Frequent GC runs are clearing newly…

Re: Driver Memory taken up by BlockManager

2018-12-14 Thread Davide.Mandrini
Hello, I am facing a similar issue; have you found a solution for it? Cheers, Davide

--driver-memory allocation question

2018-04-20 Thread klrmowse
Newb question... say memory per node is 16GB for 6 nodes (for a total of 96GB for the cluster). Is 16GB the max amount of memory that can be allocated to the driver (since it is, after all, 16GB per node)? Thanks

Re: pySpark driver memory limit

2017-11-09 Thread Sebastian Piu
This is my experience too, when running under YARN at least. -- Can anyone clarify the driver memory aspects of pySpark? According to [1], spark.driver.memory…

Re: pySpark driver memory limit

2017-11-08 Thread Nicolas Paris
On 06 Nov 2017 at 19:56, Nicolas Paris wrote: Can anyone clarify the driver memory aspects of pySpark? According to [1], spark.driver.memory limits JVM + python memory. In case spark.driver.memory=2G, does it mean the user won't be able to use more…

pySpark driver memory limit

2017-11-06 Thread Nicolas Paris
Hi there. Can anyone clarify the driver memory aspects of pySpark? According to [1], spark.driver.memory limits JVM + python memory. In case spark.driver.memory=2G, does it mean the user won't be able to use more than 2G, whatever the Python code + the RDD stuff he is using? Thanks. [1]…

Re: [E] Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-25 Thread Mich Talebzadeh
This typically works OK for standalone mode with moderate resources: ${SPARK_HOME}/bin/spark-submit --driver-memory 6G --executor-memory 2G --num-executors 2 --executor-cores 2 --master spark…

Re: [E] Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-12 Thread Rastogi, Pankaj
…You can add memory in your command m…

Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-10 Thread vaquar khan
…spark.apache.org/docs/1.1.0/submitting-applications.html. Also try to avoid functions that need memory, like collect etc. Regards, Vaquar Khan

Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-04 Thread khwunchai jaengsawang
…GitHub <https://github.com/khwunchai> -- I'm working on Spark with Standalone Cluster mode. I need to increase the driver memory as I got OOM in the driver thread. I found that when…

Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-04 Thread Abdulfattah Safa
I'm working on Spark with Standalone Cluster mode. I need to increase the driver memory, as I got OOM in the driver thread. I found that when setting the driver memory to be greater than the executor memory, the submitted job is stuck at SUBMITTED in the driver and the application never starts.

Re: No way to set mesos cluster driver memory overhead?

2016-10-13 Thread drewrobb
It seems like this is a real issue, so I've opened one: https://issues.apache.org/jira/browse/SPARK-17928

Re: No way to set mesos cluster driver memory overhead?

2016-10-13 Thread Michael Gummelt
…When using Spark on Mesos and deploying a job in cluster mode using the dispatcher, there appears to be no memory overhead configuration for the launched driver processes ("--driver-memory" is the same as Xmx, which is the same as the memory quota)…

Re: No way to set mesos cluster driver memory overhead?

2016-10-13 Thread Rodrick Brown
…When using Spark on Mesos and deploying a job in cluster mode using the dispatcher, there appears to be no memory overhead configuration for the launched driver processes ("--driver-memory"…

No way to set mesos cluster driver memory overhead?

2016-10-13 Thread drewrobb
When using Spark on Mesos and deploying a job in cluster mode using the dispatcher, there appears to be no memory overhead configuration for the launched driver processes ("--driver-memory" is the same as Xmx, which is the same as the memory quota). This makes it almost a guarantee that a lo…

Spark driver memory breakdown

2016-08-26 Thread Mich Talebzadeh
…this driver memory low, you end up with a heap space issue and the job crashes. So I had to increase the driver memory from 1G to 8G to make the job run. So, in a nutshell, how is this driver memory allocated in standalone mode, given that we also have executor memory (--executor-memory) that I set separately…

Spark driver memory keeps growing

2016-08-08 Thread Pierre Villard
Hi, I'm running a job on Spark 1.5.2 and I get an OutOfMemoryError on broadcast variable access. The thing is, I am not sure I understand why the broadcast keeps growing, and why it does so at this place in the code. Basically, I have a large input file, each line having a key. I group my lines by key to…

RE: Spark SQL driver memory keeps rising

2016-06-16 Thread Mohammed Guller
…I'm using pyspark and running in YARN client mode. I managed to…

Re: Spark SQL driver memory keeps rising

2016-06-16 Thread Khaled Hammouda
…the job was failing with the error "serialized results of x tasks () is bigger than spark.driver.maxResultSize (xxx)", which means the tasks were sending something to the driver, and that's probably what's causing the driver memory usage to keep rising. This happens at stages that read/write…
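
The quoted error is the spark.driver.maxResultSize guard. A sketch of raising it (the thread uses pyspark, but Scala is shown here for consistency; values are illustrative):

    // What tasks send back must fit under maxResultSize, and maxResultSize
    // itself must fit inside the driver heap; in client mode the heap is set
    // at submit time (e.g. spark-submit --driver-memory 8g ...), not from code.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.driver.maxResultSize", "4g")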

Re: Spark SQL driver memory keeps rising

2016-06-15 Thread Mich Talebzadeh
…

RE: Spark SQL driver memory keeps rising

2016-06-15 Thread Mohammed Guller
…I'm having trouble with a Spark SQL job in which I run a series of SQL transformations on data loaded from HDFS. The first two stages load data from HDFS input without…

Spark SQL driver memory keeps rising

2016-06-14 Thread Khaled Hammouda
I'm having trouble with a Spark SQL job in which I run a series of SQL transformations on data loaded from HDFS. The first two stages load data from HDFS input without issues, but later stages that require shuffles cause the driver memory to keep rising until it is exhausted, and then the driver…

Re: Spark 1.6 Driver Memory Issue

2016-06-01 Thread kali.tumm...@gmail.com
…"spark.driver.maxResultSize"="20g" -- is it --conf or -conf? Thanks

Re: Spark 1.6 Driver Memory Issue

2016-06-01 Thread ashesh_28
…yarn.nodemanager.vmem-pmem-ratio = 5

Re: Spark 1.6 Driver Memory Issue

2016-06-01 Thread Kishoore MV
…Hi All, I am getting a Spark driver memory issue even after overriding the conf using --conf spark.driver.maxResultSize=20g; I also set it in my SQL script (set spark.driver.maxResultSize=16;) but the same error still happens. Job aborted…

Spark 1.6 Driver Memory Issue

2016-06-01 Thread kali.tumm...@gmail.com
Hi All, I am getting a Spark driver memory issue even after overriding the conf using --conf spark.driver.maxResultSize=20g; I also set it in my SQL script (set spark.driver.maxResultSize=16;) but the same error still happens. Job aborted due to stage failure: Total size…
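
To the syntax question asked later in this thread: spark-submit takes a double dash, i.e. --conf spark.driver.maxResultSize=20g. Note also that a bare number is parsed as bytes, so the "=16" in the quoted SQL script would mean 16 bytes, not 16 gigabytes. A sketch using the public parser Spark itself relies on (assuming JavaUtils from spark-network-common is on the classpath):

    import org.apache.spark.network.util.JavaUtils

    JavaUtils.byteStringAsBytes("20g") // 21474836480
    JavaUtils.byteStringAsBytes("16")  // 16 -- bytes, not gigabytes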

Re: LogisticRegression models consumes all driver memory

2015-09-25 Thread Eugene Zhulenev
The problem turned out to be too high a 'spark.default.parallelism': BinaryClassificationMetrics does a combineByKey, which internally shuffles the train dataset. Lower parallelism plus cutting the train set RDD history with a save/read into parquet solved the problem. Thanks for the hint!
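
A sketch of the lineage-cutting half of that fix, assuming a Spark 1.5-era SQLContext, a DataFrame named training, and a hypothetical path:

    // Round-trip through parquet so the re-read plan is one step deep,
    // instead of dragging the whole iterative-training history around.
    training.write.parquet("/tmp/train_snapshot")
    val trainingFresh = sqlContext.read.parquet("/tmp/train_snapshot")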

LogisticRegression models consumes all driver memory

2015-09-23 Thread Eugene Zhulenev
We are running Apache Spark 1.5.0 (latest code from the 1.5 branch). We are running 2-3 LogisticRegression models in parallel (we'd love to run 10-20, actually); they are not really big at all, maybe 1-2 million rows in each model. The cluster itself and all executors look good: enough free memory and no…

Re: LogisticRegression models consumes all driver memory

2015-09-23 Thread DB Tsai
You want to reduce the # of partitions to around the # of executors * cores, since having so many tasks/partitions puts a lot of pressure on treeReduce in LoR. Let me know if this helps. Sincerely, DB Tsai

Re: LogisticRegression models consumes all driver memory

2015-09-23 Thread DB Tsai
Could you paste some of your code for diagnosis? Sincerely, DB Tsai

Re: LogisticRegression models consumes all driver memory

2015-09-23 Thread DB Tsai
Your code looks correct to me. How many features do you have in this training? How many tasks are running in the job? Sincerely, DB Tsai

Re: LogisticRegression models consumes all driver memory

2015-09-23 Thread Eugene Zhulenev
~3000 features, pretty sparse; I think about 200-300 non-zero features in each row. We have 100 executors x 8 cores. The number of tasks is pretty big, 30k-70k, I can't remember the exact number. The training set is the result of a pretty big join of multiple data frames, but it's cached. However, as I understand…
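
Putting DB Tsai's suggestion together with the numbers in this thread (a sketch; trainingData is a stand-in for the cached training RDD):

    // 100 executors x 8 cores gives ~800 slots, so shrink the 30k-70k
    // partitions to about that many; coalesce avoids a full shuffle
    // when only reducing the partition count.
    val slots = 100 * 8
    val compacted = trainingData.coalesce(slots)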

How does driver memory utilized

2015-09-15 Thread Renu
Hi, I have a query regarding driver memory: what are the tasks in which driver memory is used? Please help.

How does driver memory utilized

2015-09-15 Thread Renu Yadav
Hi, I have a query regarding driver memory: what are the tasks in which driver memory is used? Please help.

Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-09-05 Thread Timothy Sum Hon Mun
Hi Krishna, thanks for your reply. I will definitely take a look at it to understand the configuration details. Best regards, Tim

Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-09-01 Thread Krishna Sangeeth KS
Hi Timothy, I think the driver memory in all your examples is more than what is necessary in usual cases, and the executor memory is quite low. I found this devops talk [1] at Spark Summit to be super useful in understanding a few of these configuration details. [1] https://www.youtube.com/watch?v…

Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-08-31 Thread Timothy Sum Hon Mun
…/bin/spark-submit --class <class name> --master yarn-cluster --driver-memory 11g --executor-memory 1g --num-executors 3 --executor-cores 1 --jars <jars>. If I do not mess with the default memory overhead settings as above, I have to use driver memory greater than 10g for my job to run…
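
The overhead settings in question, under that era's names (a sketch; values are illustrative, and later releases rename them to spark.driver.memoryOverhead and spark.executor.memoryOverhead):

    // The driver container is sized before user code runs, so its overhead
    // must be passed at submit time, e.g.:
    //   spark-submit --conf spark.yarn.driver.memoryOverhead=1024 ...   (MB)
    // Executor overhead can also be set programmatically:
    val conf = new org.apache.spark.SparkConf()
      .set("spark.yarn.executor.memoryOverhead", "512") // MB, illustrative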

Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-08-31 Thread timothy22000
Added log files and diagnostics to the first and second cases, and removed the images.

Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-08-31 Thread Sandy Ryza
…list. First Case: /bin/spark-submit --class <class name> --master yarn-cluster --driver-memory 7g --executor-memory 1g --num-executors 3 --executor-cores 1 --jars <jars>. If I run my program with any driver memory less than 11g, I will get the error below, which is the…

Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-08-29 Thread timothy22000
…filters an RDD to make it smaller (removing examples as part of an algorithm), then does mapToPair and collect to gather the results and save them in a list. First Case: /bin/spark-submit --class <class name> --master yarn-cluster --driver-memory 7g --executor-memory 1g --num-executors 3…

Driver memory default setting stops background jobs

2015-05-01 Thread Andreas Marfurt
Hi all, I encountered strange behavior with the driver memory setting, and was wondering if some of you have experienced it as well, or know what the problem is. I want to start a Spark job in the background with spark-submit. If I have the driver memory setting in my spark-defaults.conf…

Re: Driver memory leak?

2015-04-29 Thread Sean Owen
…is running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB physical memory used; 3.2 GB of 50 GB virtual memory used. Killing container. I set --driver-memory to 2g. In my mind, the driver is responsible for job scheduling and job monitoring (please correct me if I'm wrong), so why is it using so…

Re: Driver memory leak?

2015-04-29 Thread Serega Sheypak
…memory limits. Current usage: 2.5 GB of 2.5 GB physical memory used; 3.2 GB of 50 GB virtual memory used. Killing container. I set --driver-memory to 2g. In my mind, the driver is responsible for job scheduling and job monitoring (please correct me if I'm wrong), so why is it using so much memory? So I…

Re: Driver memory leak?

2015-04-29 Thread Conor Fennell
…limits. Current usage: 2.5 GB of 2.5 GB physical memory used; 3.2 GB of 50 GB virtual memory used. Killing container. I set --driver-memory to 2g. In my mind, the driver is responsible for job scheduling and job monitoring (please correct me if I'm wrong), so why is it using so much memory? So I used…

Re: Driver memory leak?

2015-04-29 Thread Sean Owen
…GB of 2.5 GB physical memory used; 3.2 GB of 50 GB virtual memory used. Killing container. I set --driver-memory to 2g. In my mind, the driver is responsible for job scheduling and job monitoring (please correct me if I'm wrong), so why is it using so much memory? So I used jmap to monitor the other program…

Re: Driver memory leak?

2015-04-29 Thread Tathagata Das
…the program failed because of the driver's OOM, as follows: Container [pid=49133,containerID=container_1429773909253_0050_02_01] is running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB physical memory used; 3.2 GB of 50 GB virtual memory used. Killing container. I set --driver-memory…

RE: --driver-memory parameter doesn't work for spark-submit on yarn?

2015-04-07 Thread Shuai Zheng
Sorry for the late reply. I bypassed this by setting _JAVA_OPTIONS. And the ps aux | grep spark output: hadoop 14442 0.6 0.2 34334552 128560 pts/0 Sl+ 14:37 0:01 /usr/java/latest/bin/java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --driver-memory=5G --executor-memory=10G --master yarn-client…

Re: --driver-memory parameter doesn't work for spark-submit on yarn?

2015-04-01 Thread Akhil Das
…Hi All, below is my shell script: /home/hadoop/spark/bin/spark-submit --driver-memory=5G --executor-memory=40G --master yarn-client --class com.***.FinancialEngineExecutor /home/hadoop/lib/my.jar s3://bucket/vriscBatchConf.properties. My driver…

RE: --driver-memory parameter doesn't work for spark-submit on yarn?

2015-04-01 Thread Shuai Zheng
Hi Akhil, thanks a lot! After setting export _JAVA_OPTIONS=-Xmx5g, the OutOfMemory exception disappeared. But this confuses me: the driver-memory option doesn't work for spark-submit to YARN (I haven't checked other clusters); is it a bug? Regards, Shuai

Re: --driver-memory parameter doesn't work for spark-submit on yarn?

2015-04-01 Thread Sean Owen
I feel like I recognize that problem, and it's almost the inverse of https://issues.apache.org/jira/browse/SPARK-3884, which I was looking at today. The spark-class script didn't seem to handle all the ways that driver memory can be set. I think this is also something fixed by the new launcher…

RE: --driver-memory parameter doesn't work for spark-submit on yarn?

2015-04-01 Thread Shuai Zheng
…Regards, Shuai

RE: --driver-memory parameter doesn't work for spark-submit on yarn?

2015-04-01 Thread Bozeman, Christopher
…Nice. But my case shows that even when I use yarn-client, I have the same issue. I verified it several times. And I am running 1.3.0 on EMR (using the version dispatch…

--driver-memory parameter doesn't work for spark-submit on yarn?

2015-03-31 Thread Shuai Zheng
Hi All, below is my shell script: /home/hadoop/spark/bin/spark-submit --driver-memory=5G --executor-memory=40G --master yarn-client --class com.***.FinancialEngineExecutor /home/hadoop/lib/my.jar s3://bucket/vriscBatchConf.properties. My driver will load some resources…
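
Worth noting for this thread: the quoted command writes --driver-memory=5G with an equals sign, and per Sean Owen's reply in this thread, the spark-class/bootstrapper scripts of that era missed some of the ways driver memory could be set (SPARK-3884). The space-separated form --driver-memory 5G, a spark.driver.memory line in spark-defaults.conf, or SPARK_DRIVER_MEMORY are safer spellings, since in yarn-client mode the driver JVM is already running before any SparkConf set in code is read. A quick verification sketch once the context is up:

    // "unset" would mean the flag never reached the driver JVM.
    println(sc.getConf.get("spark.driver.memory", "unset"))
    println(s"driver max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")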

Re: Spark on YARN driver memory allocation bug?

2014-10-17 Thread Boduo Li
It may also cause a problem when running in yarn-client mode. If --driver-memory is large, YARN has to allocate a lot of memory to the AM container, but the AM doesn't really need the memory. Boduo

Re: Spark on YARN driver memory allocation bug?

2014-10-09 Thread Greg Hill
$MASTER is 'yarn-cluster' in spark-env.sh. spark-submit --driver-memory 12424m --class org.apache.spark.examples.SparkPi /usr/lib/spark-yarn/lib/spark-examples*.jar 1000. OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0006fd28, 4342677504, 0) failed; error='Cannot allocate…

Re: Spark on YARN driver memory allocation bug?

2014-10-09 Thread Sandy Ryza
I filed https://issues.apache.org/jira/browse/SPARK-3884 to address this. -Sandy

Spark on YARN driver memory allocation bug?

2014-10-08 Thread Greg Hill
So, I think this is a bug, but I wanted to get some feedback before I reported it as such. On Spark on YARN 1.1.0, if you specify the --driver-memory value to be higher than the memory available on the client machine, Spark errors out due to failing to allocate enough memory. This happens…

Re: Spark on YARN driver memory allocation bug?

2014-10-08 Thread Andrew Or
…the --driver-memory value to be higher than the memory available on the client machine, Spark errors out due to failing to allocate enough memory. This happens even in yarn-cluster mode. Shouldn't it only allocate that memory on the YARN node that is going to run the driver process?…

driver memory management

2014-09-28 Thread Brad Miller
Hi All, I am interested in collect()-ing a large RDD so that I can run a learning algorithm on it. I've noticed that when I don't increase SPARK_DRIVER_MEMORY I can run out of memory. I've also noticed that it looks like the same fraction of memory is reserved for storage on the driver as on the…

Re: driver memory management

2014-09-28 Thread Reynold Xin
The storage fraction only limits the amount of memory used for storage; it doesn't actually limit anything else. I.e., you can use all the memory if you want in collect.

recommended values for spark driver memory?

2014-09-23 Thread Greg Hill
I know the recommendation is "it depends", but can people share what sort of memory allocations they're using for their driver processes? I'd like to get an idea of what the range looks like, so we can provide sensible defaults without necessarily knowing what the jobs will look like. The…

How to clear broadcast variable from driver memory?

2014-09-03 Thread Kevin Jung
Hi, I tried Broadcast.unpersist() on Spark 1.0.1, but the MemoryStore (driver memory) still holds it. Logs: Block broadcast_0 stored as values to memory (estimated size 380.1 MB, free 5.7 GB). The free size of memory was the same after calling unpersist. Can I clear this?

Re: How to clear broadcast variable from driver memory?

2014-09-03 Thread Andrew Or
…(driver memory) still holds it. Logs: Block broadcast_0 stored as values to memory (estimated size 380.1 MB, free 5.7 GB). The free size of memory was the same after calling unpersist. Can I clear this?…
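
The behavior Kevin describes matches unpersist's contract, which only drops the executor copies. A sketch of the distinction, assuming a SparkContext sc (destroy arrived in releases after 1.0.x, so it is labeled accordingly):

    val bigValue = new Array[Byte](380 << 20)  // stand-in for the 380.1 MB block in the log
    val bc = sc.broadcast(bigValue)

    bc.unpersist(blocking = true) // drops executor copies; the driver keeps the
                                  // value so the broadcast can be re-sent later
    bc.destroy()                  // later releases only: removes driver and executor
                                  // copies; using bc afterwards throws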

driver memory

2014-07-23 Thread mrm
Hi, how do I increase the driver memory? These are my configs right now: sed 's/INFO/ERROR/' spark/conf/log4j.properties.template ./ephemeral-hdfs/conf/log4j.properties; sed 's/INFO/ERROR/' spark/conf/log4j.properties.template spark/conf/log4j.properties # Environment variables and Spark…

Re: driver memory

2014-07-23 Thread mrm
Hi, I figured out my problem, so I wanted to share my findings. I was basically trying to broadcast an array with 4 million elements and a size of approximately 150 MB. Every time I tried to broadcast, I got an OutOfMemory error. I fixed my problem by increasing the driver memory using…

Re: driver memory

2014-07-23 Thread Andrew Or
…and SPARK_EXECUTOR_MEMORY and other environment variables and configs. Note that while spark.executor.memory is an equivalent config, spark.driver.memory is only used for YARN. If you are using Spark 1.0+, the recommended way of specifying driver memory is through the --driver-memory command line argument of spark-submit…
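
A sketch tying the thread together: the submit-time forms recommended above, then the broadcast that was failing (the array size and element type are stand-ins, not the poster's actual data):

    // Preferred in Spark 1.0+:  spark-submit --driver-memory 2g ...
    // Environment variable:     export SPARK_DRIVER_MEMORY=2g
    // With the driver heap raised, a large lookup array can be broadcast:
    val lookup = new Array[Double](4 * 1000 * 1000) // ~32 MB here; the thread's array was ~150 MB
    val bc = sc.broadcast(lookup)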