Re: How can we control CPU and Memory per Spark job operation..

2016-07-22 Thread Pedro Rodriguez
Sorry, wasn’t very clear (looks like Pavan’s response was dropped from the list for some reason as well). I am assuming that: 1) the first map is CPU bound, 2) the second map is heavily memory bound. To be specific, let’s say you are using 4 m3.2xlarge instances, which have 8 CPUs and 30 GB of RAM each
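
For concreteness, a minimal Scala sketch (my own, not from the thread) of one way to size executors on that hypothetical cluster: two executors per node, each taking 4 of the node's 8 vcores and 12 GB of its 30 GB of RAM, leaving headroom for the OS and overhead. All figures are illustrative assumptions.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative sizing only: 4 nodes x 2 executors = 8 executors,
    // each with 4 vcores and 12 GB of memory.
    val conf = new SparkConf()
      .setAppName("cpu-then-memory-pipeline")
      .set("spark.executor.instances", "8")
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "12g")
    val sc = new SparkContext(conf)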

Re: How can we control CPU and Memory per Spark job operation..

2016-07-17 Thread Jacek Laskowski
Hi, How would that help?! Why would you do that? Jacek On 17 Jul 2016 7:19 a.m., "Pedro Rodriguez" wrote: > You could call map on an RDD which has “many” partitions, then call > repartition/coalesce to drastically reduce the number of partitions so that > your second

Re: How can we control CPU and Memory per Spark job operation..

2016-07-16 Thread Pedro Rodriguez
You could call map on an RDD which has “many” partitions, then call repartition/coalesce to drastically reduce the number of partitions so that your second map has fewer tasks running at once. — Pedro Rodriguez PhD Student in Large-Scale Machine Learning | CU Boulder Systems Oriented Data Scientist
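
A rough Scala sketch of that pattern (my own example; rawRdd, cpuIntensiveParse and memoryIntensiveNlp are made-up names, and the partition counts are arbitrary). Note that coalesce needs shuffle = true here; otherwise Spark would narrow the first map down to the same small partition count.

    // Step 1: run the CPU-bound map with high parallelism.
    val parsed = rawRdd
      .repartition(256)
      .map(cpuIntensiveParse)

    // Step 2: force a shuffle boundary down to a few partitions so only a few
    // memory-hungry tasks run at a time, then apply the memory-bound map.
    val scored = parsed
      .coalesce(8, shuffle = true)
      .map(memoryIntensiveNlp)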

Re: How can we control CPU and Memory per Spark job operation..

2016-07-16 Thread Jacek Laskowski
Hi, My understanding is that these two map functions will end up as a single job with one stage (as if you wrote the two maps as a single map), so you really need as many vcores and as much memory as possible for map1 and map2 together. I initially thought about dynamic allocation of executors that may or may not help
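
To illustrate the stage fusion Jacek is pointing at, a small sketch (not from the thread; f and g are placeholder functions): consecutive maps are narrow transformations, so Spark pipelines them into one stage and every task needs enough CPU and memory for both.

    val fused = rdd.map(f).map(g)   // one stage: each task applies f then g
    fused.count()

    // Only a wide dependency such as repartition introduces a stage boundary:
    val split = rdd.map(f).repartition(8).map(g)
    split.count()                   // two stages: map(f) -> shuffle -> map(g)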

How can we control CPU and Memory per Spark job operation..

2016-07-15 Thread Pavan Achanta
Hi All, Here is my use case: I have a pipeline job consisting of 2 map functions: 1. A CPU-intensive map operation that does not require a lot of memory. 2. A memory-intensive map operation that requires up to 4 GB of memory. And this 4 GB of memory cannot be distributed, since it is an NLP model.
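
For reference, a sketch of that pipeline shape in Scala (hypothetical names throughout). The second step uses mapPartitions so the ~4 GB model is loaded once per partition rather than once per record, which is one common way to handle a large non-distributable model, though the post itself just describes a plain map.

    // Step 1: CPU-intensive map, little memory needed.
    val tokens = documents.map(cpuIntensiveTokenize)

    // Step 2: memory-intensive map; each task holds one ~4 GB NLP model.
    val tagged = tokens.mapPartitions { records =>
      val model = loadNlpModel()   // hypothetical loader for the 4 GB model
      records.map(model.annotate)
    }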