Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
So, increasing executors without increasing physical resources? If I have a 16 GB RAM system and I allocate 1 GB to each executor, and set the number of executors to 8, then I am increasing the resources, right? In this case, how do you explain it? Thank you. On Sun, Feb 22, 2015 at 6:12 AM, Aaron

Re: Worker and Nodes

2015-02-21 Thread Aaron Davidson
Note that the parallelism (i.e., number of partitions) is just an upper bound on how much of the work can be done in parallel. If you have 200 partitions, then you can divide the work among anywhere between 1 and 200 cores and all resources will remain utilized. If you have more than 200 cores, though,
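The upper-bound point can be illustrated with a small sketch. This is a toy model and not Spark code: it assumes one task per partition and that every task takes the same amount of time, and the `schedule` function and its field names are hypothetical, introduced only for illustration.

```python
import math


def schedule(num_partitions: int, num_cores: int) -> dict:
    """Toy model: one task per partition, all tasks equally long (an assumption)."""
    # At most one core per partition can ever be doing useful work.
    busy_cores = min(num_partitions, num_cores)
    # Number of rounds ("waves") of tasks needed to process every partition.
    waves = math.ceil(num_partitions / num_cores)
    # Cores beyond the partition count never receive a task in this model.
    idle_cores = max(0, num_cores - num_partitions)
    return {"waves": waves, "busy_cores": busy_cores, "idle_cores": idle_cores}


if __name__ == "__main__":
    for cores in (1, 50, 200, 400):
        print(cores, schedule(200, cores))
```

Under these assumptions, going from 200 to 400 cores leaves the number of waves unchanged, so the extra cores would not shorten the job.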

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
Also, if I take SparkPageRank (org.apache.spark.examples) as an example, various RDDs are created and transformed in the code. If I want to increase the number of partitions and test what the optimum number of partitions is that gives me the best performance, I

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
In this case, I just wanted to know whether a single-node cluster with several workers acts like a simulator of a multi-node cluster with several nodes. For example, if we have a single-node cluster with 10 workers, can we say that the same behavior will occur on a cluster of 10 nodes? It is

Re: Worker and Nodes

2015-02-21 Thread Frank Austin Nothaft
There could be many different things causing this. For example, if you only have a single partition of data, increasing the number of tasks will only increase execution time due to higher scheduling overhead. Additionally, how large is a single partition in your application relative to the
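A rough way to see the single-partition case is the sketch below. It is a hypothetical cost model, not a measurement and not Spark's scheduler: the function `estimated_seconds`, its parameters, and the per-task overhead value are all assumptions chosen for illustration.

```python
def estimated_seconds(compute_seconds: float,
                      data_partitions: int,
                      num_tasks: int,
                      overhead_per_task: float = 0.5) -> float:
    """Hypothetical model: useful parallelism is capped by the partition count,
    while every scheduled task adds a fixed overhead (assumed value)."""
    useful_parallelism = min(data_partitions, num_tasks)
    return compute_seconds / useful_parallelism + num_tasks * overhead_per_task


if __name__ == "__main__":
    # With a single partition, extra tasks only add overhead in this model.
    for tasks in (1, 2, 4, 8):
        print(tasks, estimated_seconds(10.0, data_partitions=1, num_tasks=tasks))
```

In this model the estimated time grows with the task count whenever the data has only one partition, which matches the direction of the effect described above; the actual magnitudes depend on the workload.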

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
So, as the number of worker instances increases, if I also increase the degree of parallelism, will it make any difference? I can use this model the other way round too, right? I can always predict the performance of an app with the increase in number of worker instances, the deterioration

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
Yes, I am talking about a standalone single-node cluster. No, I am not increasing parallelism. I just wanted to know if this is natural. Does message passing across the workers account for what is happening? I am running SparkKMeans, just to validate one prediction model. I am using several data sets.

Re: Worker and Nodes

2015-02-21 Thread Sean Owen
What's your storage like? Are you adding worker machines that are remote from where the data lives? I wonder if it just means you are spending more and more time sending the data over the network as you try to ship more of it to more remote workers. To answer your question, no, in general more

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
No, I just have a single-node standalone cluster. I am not tweaking the code to increase parallelism. I am just running the SparkKMeans example that ships with Spark-1.0.0. I just wanted to know if this behavior is natural. And if so, what causes this? Thank you On Sat, Feb 21, 2015 at 8:32

Re: Worker and Nodes

2015-02-21 Thread Sean Owen
"Workers" has a specific meaning in Spark. You are running many on one machine? That's possible but not usual. Each worker's executors then have access to a fraction of your machine's resources. If you're not increasing parallelism, maybe you're not actually using additional workers, so are using

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
Yes, I have decreased the executor memory. But if I have to do this, then I have to tweak the code for each configuration, right? On Sat, Feb 21, 2015 at 8:47 PM, Sean Owen so...@cloudera.com wrote: Workers has a specific meaning in Spark. You are running many on one

Re: Worker and Nodes

2015-02-21 Thread Deep Pradhan
So, if I keep the number of instances constant and increase the degree of parallelism in steps, can I expect the performance to increase? Thank You On Sat, Feb 21, 2015 at 9:07 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: So, with the increase in the number of worker instances, if I also

Re: Worker and Nodes

2015-02-21 Thread Sean Owen
I can imagine a few reasons. Adding workers might cause fewer tasks to execute locally (?), so you may be executing more remotely. Are you increasing parallelism? For trivial jobs, chopping them up further may cause you to pay more overhead of managing so many small tasks, for no speedup in

Worker and Nodes

2015-02-21 Thread Deep Pradhan
Hi, I have been running some jobs in my local single-node standalone cluster. I am varying the worker instances for the same job, and the time taken for the job to complete increases as the number of workers increases. I repeated some experiments varying the number of nodes in a cluster too

Re: Worker and Nodes

2015-02-21 Thread Yiannis Gkoufas
Hi, I have experienced the same behavior. You are talking about standalone cluster mode, right? BR On 21 February 2015 at 14:37, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, I have been running some jobs in my local single node stand alone cluster. I am varying the worker instances for