What's your storage like? Are you adding worker machines that are
remote from where the data lives? I wonder if it just means you are
spending more and more time sending the data over the network as you
try to ship more of it to more remote workers.

To answer your question: no, in general more workers means more
parallelism and therefore faster execution. But that depends on a lot
of things. For example, if your process isn't parallelized to use all
available execution slots, adding more slots doesn't do anything.
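
As a rough sketch of that last point, here is a toy model (not Spark's
actual scheduler; the function name and the numbers are made up for
illustration). It assumes every task takes the same time and that tasks
run in waves of at most one task per slot, ignoring network and
scheduling overhead:

```python
import math


def estimated_seconds(num_tasks, num_slots, seconds_per_task):
    """Toy estimate: tasks run in waves of at most num_slots at a time."""
    waves = math.ceil(num_tasks / num_slots)
    return waves * seconds_per_task


if __name__ == "__main__":
    # Hypothetical job: 4 tasks (partitions), 10 seconds each.
    for slots in (1, 2, 4, 8, 16):
        print(slots, "slots:", estimated_seconds(4, slots, 10.0), "seconds")
```

In this model, once the number of slots reaches the number of tasks, the
estimate stops improving, so any extra cost from more workers (shipping
data, managing tasks) can only make the total worse.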

On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> Yes, I am talking about a standalone single-node cluster.
>
> No, I am not increasing parallelism. I just wanted to know if it is natural.
> Does message passing across the workers account for what is happening?
>
> I am running SparkKMeans, just to validate one prediction model. I am using
> several data sets. I am in standalone mode. I am varying the workers from 1
> to 16.
>
> On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> I can imagine a few reasons. Adding workers might cause fewer tasks to
>> execute locally (?), so you may be executing more remotely.
>>
>> Are you increasing parallelism? For trivial jobs, chopping them up
>> further may cause you to pay more overhead for managing so many small
>> tasks, for no speedup in execution time.
>>
>> Can you provide any more specifics though? You haven't said what
>> you're running, what mode, how many workers, how long it takes, etc.
>>
>> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan <pradhandeep1...@gmail.com>
>> wrote:
>> > Hi,
>> > I have been running some jobs in my local single node stand alone
>> > cluster. I am varying the worker instances for the same job, and the
>> > time taken for the job to complete increases with increase in the
>> > number of workers. I repeated some experiments varying the number of
>> > nodes in a cluster too and the same behavior is seen.
>> > Can the idea of worker instances be extrapolated to the nodes in a
>> > cluster?
>> >
>> > Thank You
>
>
