Re: Spark job uses only one Worker

2016-01-08 Thread Michael Pisula
Hi Annabel,

I am using Spark in stand-alone mode (deployed using the EC2 scripts
packaged with Spark).
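
For reference, the cluster launch via the packaged scripts looks roughly like
this (key pair, identity file, and cluster name are placeholders; -s 3 gives
the three workers mentioned below):

ec2/spark-ec2 -k <keypair> -i <key-file.pem> -s 3 launch <cluster-name>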

Cheers,
Michael

On 08.01.2016 00:43, Annabel Melongo wrote:
> Michael,
>
> I don't know what your environment is, but if it's Cloudera, you should
> be able to see the link to your master in Hue.
>
> Thanks
>
>
> On Thursday, January 7, 2016 5:03 PM, Michael Pisula
> <michael.pis...@tngtech.com> wrote:
>
>
> I had tried several parameters, including --total-executor-cores, no
> effect.
> As for the port, I tried 7077, but if I remember correctly I got some
> kind of error that suggested to try 6066, with which it worked just
> fine (apart from this issue here).
>
> Each worker has two cores. I also tried increasing cores, again no
> effect. I was able to increase the number of cores the job was using
> on one worker, but it would not use any other worker (and it would not
> start if the number of cores the job wanted was higher than the number
> available on one worker).
>
> On 07.01.2016 22:51, Igor Berman wrote:
>> Read about *--total-executor-cores*.
>> Not sure why you specify port 6066 in the master URL... usually it's 7077.
>> Verify in the master UI (usually port 8080) how many cores are there
>> (depends on other configs, but usually workers connect to the master with
>> all their cores).
>>
>> On 7 January 2016 at 23:46, Michael Pisula <michael.pis...@tngtech.com> wrote:
>>
>> Hi,
>>
>> I start the cluster using the spark-ec2 scripts, so the cluster
>> is in stand-alone mode.
>> Here is how I submit my job:
>> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis
>> --master spark://:6066 --deploy-mode cluster
>> demo/Demo-1.0-SNAPSHOT-all.jar
>>
>> Cheers,
>> Michael
>>
>>
>> On 07.01.2016 22:41, Igor Berman wrote:
>>> Share how you submit your job.
>>> What cluster (YARN, standalone)?
>>>
>>> On 7 January 2016 at 23:24, Michael Pisula <michael.pis...@tngtech.com> wrote:
>>>
>>> Hi there,
>>>
>>> I ran a simple batch application on a Spark cluster on EC2. Despite
>>> having 3 worker nodes, I could not get the application processed on
>>> more than one node, regardless of whether I submitted the application
>>> in cluster or client mode.
>>> I also tried manually increasing the number of partitions in the code,
>>> with no effect. I also pass the master into the application.
>>> I verified on the nodes themselves that only one node was active while
>>> the job was running.
>>> I pass enough data to make the job take 6 minutes to process.
>>> The job is simple enough: it reads data from two S3 files, joins
>>> records on a shared field, filters out some records, and writes the
>>> result back to S3.
>>>
>>> I tried all kinds of things, but could not make it work. I did find
>>> similar questions, but had already tried the solutions that worked in
>>> those cases.
>>> I would be really happy about any pointers.
>>>
>>> Cheers,
>>> Michael
>>>

-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082



Re: Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
Hi,

I start the cluster using the spark-ec2 scripts, so the cluster is in
stand-alone mode.
Here is how I submit my job:
spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master
spark://:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar
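
(For context: 7077 is the standalone master's normal RPC port, while 6066 is
its REST submission endpoint, which --deploy-mode cluster goes through; the
master hostname is omitted in the line above. The client-mode submit I also
tried would look roughly like this, with <master-host> standing in for the
actual master address:)

spark/bin/spark-submit --class demo.spark.StaticDataAnalysis \
  --master spark://<master-host>:7077 \
  demo/Demo-1.0-SNAPSHOT-all.jar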

Cheers,
Michael

On 07.01.2016 22:41, Igor Berman wrote:
> Share how you submit your job.
> What cluster (YARN, standalone)?
>
> On 7 January 2016 at 23:24, Michael Pisula <michael.pis...@tngtech.com> wrote:
>
> Hi there,
>
> I ran a simple batch application on a Spark cluster on EC2. Despite
> having 3 worker nodes, I could not get the application processed on more
> than one node, regardless of whether I submitted the application in
> cluster or client mode.
> I also tried manually increasing the number of partitions in the code,
> with no effect. I also pass the master into the application.
> I verified on the nodes themselves that only one node was active while
> the job was running.
> I pass enough data to make the job take 6 minutes to process.
> The job is simple enough: it reads data from two S3 files, joins records
> on a shared field, filters out some records, and writes the result back
> to S3.
>
> I tried all kinds of things, but could not make it work. I did find
> similar questions, but had already tried the solutions that worked in
> those cases.
> I would be really happy about any pointers.
>
> Cheers,
> Michael
>

-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082



Re: Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
I had tried several parameters, including --total-executor-cores, with no
effect.
As for the port, I tried 7077, but if I remember correctly I got some
kind of error that suggested trying 6066, with which it worked just fine
(apart from this issue here).

Each worker has two cores. I also tried increasing cores, again with no
effect. I was able to increase the number of cores the job was using on
one worker, but it would not use any other worker (and it would not
start if the number of cores the job wanted was higher than the number
available on one worker).
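
(For completeness: in standalone mode the same cap can also be set from
inside the application via spark.cores.max, which is what
--total-executor-cores maps to. A minimal sketch, assuming the app builds
its own SparkConf; the names here are illustrative, not from my actual code:)

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("StaticDataAnalysis")
  // Upper bound on the total number of cores the app may use across the
  // whole cluster; equivalent to --total-executor-cores on spark-submit.
  .set("spark.cores.max", "4")
val sc = new SparkContext(conf)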

On 07.01.2016 22:51, Igor Berman wrote:
> Read about *--total-executor-cores*.
> Not sure why you specify port 6066 in the master URL... usually it's 7077.
> Verify in the master UI (usually port 8080) how many cores are there
> (depends on other configs, but usually workers connect to the master with
> all their cores).
>
> On 7 January 2016 at 23:46, Michael Pisula <michael.pis...@tngtech.com> wrote:
>
> Hi,
>
> I start the cluster using the spark-ec2 scripts, so the cluster is
> in stand-alone mode.
> Here is how I submit my job:
> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis
> --master spark://:6066 --deploy-mode cluster
> demo/Demo-1.0-SNAPSHOT-all.jar
>
> Cheers,
> Michael
>
>
> On 07.01.2016 22:41, Igor Berman wrote:
>> Share how you submit your job.
>> What cluster (YARN, standalone)?
>>
>> On 7 January 2016 at 23:24, Michael Pisula <michael.pis...@tngtech.com> wrote:
>>
>> Hi there,
>>
>> I ran a simple batch application on a Spark cluster on EC2. Despite
>> having 3 worker nodes, I could not get the application processed on
>> more than one node, regardless of whether I submitted the application
>> in cluster or client mode.
>> I also tried manually increasing the number of partitions in the code,
>> with no effect. I also pass the master into the application.
>> I verified on the nodes themselves that only one node was active while
>> the job was running.
>> I pass enough data to make the job take 6 minutes to process.
>> The job is simple enough: it reads data from two S3 files, joins
>> records on a shared field, filters out some records, and writes the
>> result back to S3.
>>
>> I tried all kinds of things, but could not make it work. I did find
>> similar questions, but had already tried the solutions that worked in
>> those cases.
>> I would be really happy about any pointers.
>>
>> Cheers,
>> Michael
>>

-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082



Re: Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
All the workers were connected; I even saw the job being processed on
different workers, so that part was working fine.
I will fire up the cluster again tomorrow and post the results of
connecting to 7077 and using --total-executor-cores 4.
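
For the record, the submit I plan to try would look roughly like the
following, in client mode since cluster deploy mode is what pushed me
towards 6066 earlier (<master-host> is a placeholder for the actual master
address):

spark/bin/spark-submit --class demo.spark.StaticDataAnalysis \
  --master spark://<master-host>:7077 \
  --total-executor-cores 4 \
  demo/Demo-1.0-SNAPSHOT-all.jar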

Thanks for the help

On 07.01.2016 23:10, Igor Berman wrote:
> Do you see in the master UI that the workers are connected to the master,
> and that before you run your app there are 2 available cores per worker?
> I understand that there are 2 cores on each worker - the question is
> whether they got registered with the master.
>
> Regarding the port, it's very strange; please post what the problem is
> when connecting to 7077.
>
> Use *--total-executor-cores 4* in your submit.
>
> If you can, post a master UI screenshot after you have submitted your app.
>
>
> On 8 January 2016 at 00:02, Michael Pisula <michael.pis...@tngtech.com> wrote:
>
> I had tried several parameters, including --total-executor-cores,
> no effect.
> As for the port, I tried 7077, but if I remember correctly I got
> some kind of error that suggested to try 6066, with which it
> worked just fine (apart from this issue here).
>
> Each worker has two cores. I also tried increasing cores, again no
> effect. I was able to increase the number of cores the job was
> using on one worker, but it would not use any other worker (and it
> would not start if the number of cores the job wanted was higher
> than the number available on one worker).
>
>
> On 07.01.2016 22:51, Igor Berman wrote:
>> Read about *--total-executor-cores*.
>> Not sure why you specify port 6066 in the master URL... usually it's 7077.
>> Verify in the master UI (usually port 8080) how many cores are there
>> (depends on other configs, but usually workers connect to the master with
>> all their cores).
>>
>> On 7 January 2016 at 23:46, Michael Pisula <michael.pis...@tngtech.com> wrote:
>>
>> Hi,
>>
>> I start the cluster using the spark-ec2 scripts, so the
>> cluster is in stand-alone mode.
>> Here is how I submit my job:
>> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis
>> --master spark://:6066 --deploy-mode cluster
>> demo/Demo-1.0-SNAPSHOT-all.jar
>>
>> Cheers,
>> Michael
>>
>>
>> On 07.01.2016 22:41, Igor Berman wrote:
>>> Share how you submit your job.
>>> What cluster (YARN, standalone)?
>>>
>>> On 7 January 2016 at 23:24, Michael Pisula <michael.pis...@tngtech.com> wrote:
>>>
>>> Hi there,
>>>
>>> I ran a simple batch application on a Spark cluster on EC2. Despite
>>> having 3 worker nodes, I could not get the application processed on
>>> more than one node, regardless of whether I submitted the application
>>> in cluster or client mode.
>>> I also tried manually increasing the number of partitions in the code,
>>> with no effect. I also pass the master into the application.
>>> I verified on the nodes themselves that only one node was active while
>>> the job was running.
>>> I pass enough data to make the job take 6 minutes to process.
>>> The job is simple enough: it reads data from two S3 files, joins
>>> records on a shared field, filters out some records, and writes the
>>> result back to S3.
>>>
>>> I tried all kinds of things, but could not make it work. I did find
>>> similar questions, but had already tried the solutions that worked in
>>> those cases.
>>> I would be really happy about any pointers.
>>>
>>> Cheers,
>>> Michael
>>>

Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
Hi there,

I ran a simple batch application on a Spark cluster on EC2. Despite having 3
worker nodes, I could not get the application processed on more than one
node, regardless of whether I submitted the application in cluster or client
mode.
I also tried manually increasing the number of partitions in the code, with
no effect. I also pass the master into the application.
I verified on the nodes themselves that only one node was active while the
job was running.
I pass enough data to make the job take 6 minutes to process.
The job is simple enough: it reads data from two S3 files, joins records on
a shared field, filters out some records, and writes the result back to S3.
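
For what it's worth, the job boils down to something like the sketch below
(bucket names, field positions, and the filter condition are made up; the
repartition call is the manual partition increase mentioned above):

import org.apache.spark.{SparkConf, SparkContext}

object StaticDataAnalysis {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StaticDataAnalysis"))

    // Read both inputs from S3 and key each record by the shared field
    // (assumed here to be the first column of a CSV line).
    val left  = sc.textFile("s3n://my-bucket/input-a.csv")
      .map(_.split(",")).map(f => (f(0), f))
    val right = sc.textFile("s3n://my-bucket/input-b.csv")
      .map(_.split(",")).map(f => (f(0), f))

    // Join on the shared field, drop some records, write the result back.
    // repartition(24) forces more partitions than the few S3 input splits
    // would otherwise produce.
    val result = left.repartition(24)
      .join(right)
      .filter { case (_, (a, _)) => a(1).nonEmpty }
      .map { case (_, (a, b)) => (a ++ b).mkString(",") }

    result.saveAsTextFile("s3n://my-bucket/output")
    sc.stop()
  }
}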

I tried all kinds of things, but could not make it work. I did find similar
questions, but had already tried the solutions that worked in those cases.
I would be really happy about any pointers.

Cheers,
Michael


