Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-04 Thread Nicholas Chammas
Quick question: are you processing gzipped files by any chance? Gzipped
files are not splittable, so each one is read by a single task; it's a
common stumbling block people hit.

See: http://stackoverflow.com/q/27531816/877069
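
If so, a minimal sketch of the usual workaround, assuming the sc you get
from spark-shell and a made-up S3 path, is to repartition right after the
read so later stages can use every executor:

    // Gzip is not splittable, so this RDD starts out as a single partition.
    val raw = sc.textFile("s3n://my-bucket/data.txt.gz")
    println("partitions after read: " + raw.partitions.length)   // typically 1

    // Repartition (the count here is only illustrative) so downstream
    // stages get spread across the cluster.
    val spread = raw.repartition(10)
    println("partitions after repartition: " + spread.partitions.length)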

Nick

On Fri, Dec 4, 2015 at 2:28 PM Kyohey Hamaguchi wrote:

> Hi,
>
> I have set up a Spark standalone cluster with 5 workers using the
> spark-ec2 script.
>
> After submitting my Spark application, I noticed that just one worker
> seemed to run the application while the other 4 workers were doing
> nothing. I confirmed this by checking CPU and memory usage on the
> Spark Web UI (CPU usage indicates zero and memory is almost fully
> available).
>
> This is the command used to launch:
>
> $ ~/spark/ec2/spark-ec2 -k awesome-keypair-name -i
> /path/to/.ssh/awesome-private-key.pem --region ap-northeast-1
> --zone=ap-northeast-1a --slaves 5 --instance-type m1.large
> --hadoop-major-version yarn launch awesome-spark-cluster
>
> And the command to run application:
>
> $ ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name
> "mkdir ~/awesome"
> $ scp -i ~/path/to/awesome-private-key.pem spark.jar
> root@ec2-master-host-name:~/awesome && ssh -i
> ~/path/to/awesome-private-key.pem root@ec2-master-host-name
> "~/spark-ec2/copy-dir ~/awesome"
> $ ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name
> "~/spark/bin/spark-submit --num-executors 5 --executor-cores 2
> --executor-memory 5G --total-executor-cores 10 --driver-cores 2
> --driver-memory 5G --class com.example.SparkIsAwesome
> awesome/spark.jar"
>
> How do I get all of the workers to execute the app?
>
> Or do I have the wrong understanding of what workers, slaves and executors are?
>
> My understanding is that the Spark driver (or maybe the master?) sends
> parts of the job to each worker (== executor == slave), so a Spark
> cluster automatically exploits all of the resources available in the
> cluster. Is this some sort of misconception?
>
> Thanks,
>
> --
> Kyohey Hamaguchi
> TEL:  080-6918-1708
> Mail: tnzk.ma...@gmail.com
> Blog: http://blog.tnzk.org/
>


Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-04 Thread Andy Davidson
Hi Kyohey

I think you need to pass the argument --master $MASTER_URL

The master URL is something like
spark://ec2-54-215-112-121.us-west-1.compute.amazonaws.com:7077

It's the public URL of your master.
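
For example, the submit command from your mail with that flag added (the
host name below is just the placeholder from your mail, and 7077 is the
standalone master's default port):

    ~/spark/bin/spark-submit \
      --master spark://ec2-master-host-name:7077 \
      --num-executors 5 --executor-cores 2 --executor-memory 5G \
      --total-executor-cores 10 --driver-cores 2 --driver-memory 5G \
      --class com.example.SparkIsAwesome awesome/spark.jar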


Andy

From:  Kyohey Hamaguchi 
Date:  Friday, December 4, 2015 at 11:28 AM
To:  "user @spark" 
Subject:  Not all workers seem to run in a standalone cluster setup by
spark-ec2 script


Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-04 Thread Kyohey Hamaguchi
Andy,

Thank you for replying.

I am specifying --master exactly like that; I just missed it when writing
that email.
On Sat, Dec 5, 2015 at 9:27 Andy Davidson wrote:


Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-07 Thread Akhil Das
What's in your SparkIsAwesome class? Just make sure that you are giving
Spark enough partitions to evenly distribute the job throughout the
cluster.
Try submitting the job this way:

~/spark/bin/spark-submit --executor-cores 10 --executor-memory 5G
--driver-memory 5G --class com.example.SparkIsAwesome awesome/spark.jar
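
On the code side, a minimal sketch of that kind of partition check (the
input path, class name and partition count are only illustrative, not
taken from your job):

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partition-check"))

        // Placeholder input; a small or non-splittable source may come back
        // as only one or two partitions.
        val data = sc.textFile("hdfs:///user/root/input.txt")
        println("partitions after read: " + data.partitions.length)

        // With so few partitions most executors sit idle, so spread the data
        // out, e.g. to roughly 2-3x the total number of cores in the cluster.
        val spread = data.repartition(20)
        println("total characters: " + spread.map(_.length).reduce(_ + _))

        sc.stop()
      }
    }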


Thanks
Best Regards

On Sat, Dec 5, 2015 at 12:58 AM, Kyohey Hamaguchi wrote:
