Re: Spark job uses only one Worker

Michael Pisula Thu, 07 Jan 2016 14:15:07 -0800

All the workers were connected; I even saw the job being processed on
different workers, so that was working fine.
I will fire up the cluster again tomorrow and post the results of
connecting to 7077 and using --total-executor-cores 4.
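
For reference, a rough sketch of the submit command as it would look with both suggestions applied. The class name, jar path and cluster deploy mode are copied from the original command quoted below; port 7077 and --total-executor-cores 4 are the values suggested in this thread. The helper name build_submit_cmd is made up for illustration, and <host> remains a placeholder for the master's address. The helper only prints the command (a dry run), so nothing is submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): prints a spark-submit command
# line for a standalone master instead of executing it.
#   $1 = master host (placeholder, fill in your own)
#   $2 = total number of cores to request across the whole cluster
build_submit_cmd() {
  printf 'spark/bin/spark-submit'
  printf ' --class demo.spark.StaticDataAnalysis'
  # 7077 is the port suggested in this thread instead of 6066
  printf ' --master spark://%s:7077' "$1"
  printf ' --deploy-mode cluster'
  # Upper bound on cores for the application, summed over all workers
  printf ' --total-executor-cores %s' "$2"
  printf ' demo/Demo-1.0-SNAPSHOT-all.jar\n'
}

# Dry run with the placeholder host and 4 cores, as suggested
build_submit_cmd "<host>" 4
```

Printing instead of executing keeps the sketch safe to run without a cluster; the printed line can then be pasted into a shell on the machine that has the spark directory.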


Thanks for the help

On 07.01.2016 23:10, Igor Berman wrote:
> Do you see in the master UI that the workers are connected to the master,
> and that before you run your app each worker shows 2 available cores?
> I understand that there are 2 cores on each worker; the question is
> whether they got registered with the master.
>
> Regarding the port, that's very strange; please post what the problem is
> when connecting to 7077.
>
> Use *--total-executor-cores 4* in your submit.
>
> If you can, post a screenshot of the master UI after you have submitted
> your app.
>
>
> On 8 January 2016 at 00:02, Michael Pisula <michael.pis...@tngtech.com>
> wrote:
>
>     I had tried several parameters, including --total-executor-cores,
>     with no effect.
>     As for the port, I tried 7077, but if I remember correctly I got
>     some kind of error that suggested trying 6066, with which it
>     worked just fine (apart from this issue here).
>
>     Each worker has two cores. I also tried increasing cores, again no
>     effect. I was able to increase the number of cores the job was
>     using on one worker, but it would not use any other worker (and it
>     would not start if the number of cores the job wanted was higher
>     than the number available on one worker).
>
>
>     On 07.01.2016 22:51, Igor Berman wrote:
>>     Read about *--total-executor-cores*.
>>     I'm not sure why you specify port 6066 for the master; usually it's 7077.
>>     Verify in the master UI (usually port 8080) how many cores are
>>     available (this depends on other configs, but usually workers connect
>>     to the master with all their cores).
>>
>>     On 7 January 2016 at 23:46, Michael Pisula
>>     <michael.pis...@tngtech.com> wrote:
>>
>>         Hi,
>>
>>         I start the cluster using the spark-ec2 scripts, so the
>>         cluster is in stand-alone mode.
>>         Here is how I submit my job:
>>         spark/bin/spark-submit --class demo.spark.StaticDataAnalysis
>>         --master spark://<host>:6066 --deploy-mode cluster
>>         demo/Demo-1.0-SNAPSHOT-all.jar
>>
>>         Cheers,
>>         Michael
>>
>>
>>         On 07.01.2016 22:41, Igor Berman wrote:
>>>         Please share how you submit your job and
>>>         which cluster manager you use (YARN, standalone).
>>>
>>>         On 7 January 2016 at 23:24, Michael Pisula
>>>         <michael.pis...@tngtech.com> wrote:
>>>
>>>             Hi there,
>>>
>>>             I ran a simple batch application on a Spark cluster on
>>>             EC2. Despite having 3 worker nodes, I could not get the
>>>             application processed on more than one node, regardless of
>>>             whether I submitted the application in cluster or client mode.
>>>             I also tried manually increasing the number of partitions in
>>>             the code, with no effect. I also pass the master into the
>>>             application.
>>>             I verified on the nodes themselves that only one node was
>>>             active while the job was running.
>>>             I pass enough data to make the job take 6 minutes to process.
>>>             The job is simple enough: it reads data from two S3 files,
>>>             joins records on a shared field, filters out some records and
>>>             writes the result back to S3.
>>>
>>>             I tried all kinds of things, but could not make it work. I
>>>             did find similar questions, but had already tried the
>>>             solutions that worked in those cases.
>>>             I would be really happy about any pointers.
>>>
>>>             Cheers,
>>>             Michael
>>>
>>
>>         -- 
>>         Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
>>         TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>         Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>         Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>
>>
>
>
>

-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
