Re: spark on ec2
I don't see anything that says you must explicitly restart them to load the new settings, but most daemons trap a signal [or require a brute-force full restart] to reload their configuration. I'd take a guess and use the $SPARK_HOME/sbin/{stop,start}-slaves.sh scripts on your master node and see. ( http://spark.apache.org/docs/1.2.0/spark-standalone.html#cluster-launch-scripts )

I just tested this on my integration EC2 cluster and got odd results for stopping the workers (no workers found), but the start script seemed to work. My integration cluster was running and functioning after executing both scripts, though I hadn't made any changes to spark-env either.

On Thu Feb 05 2015 at 9:49:49 PM Kane Kim wrote:
> Hi,
>
> I'm trying to change a setting as described here:
> http://spark.apache.org/docs/1.2.0/ec2-scripts.html
> export SPARK_WORKER_CORES=6
>
> Then I ran ~/spark-ec2/copy-dir /root/spark/conf to distribute it to the
> slaves, but without any effect. Do I have to restart the workers?
> How do I do that with spark-ec2?
>
> Thanks.
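Put together, the sequence discussed above might look like this, run on the master node. This is only a sketch assuming the standard spark-ec2 layout (Spark under /root/spark); paths may differ on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: push a spark-env.sh change to all workers and restart them so
# they pick it up. Assumes the spark-ec2 layout with Spark in /root/spark.
set -euo pipefail

export SPARK_HOME=/root/spark

# 1. Change the config on the master, e.g. set SPARK_WORKER_CORES=6
echo 'export SPARK_WORKER_CORES=6' >> "$SPARK_HOME/conf/spark-env.sh"

# 2. Distribute the conf directory to every slave
~/spark-ec2/copy-dir "$SPARK_HOME/conf"

# 3. Restart the workers so the new settings take effect
"$SPARK_HOME/sbin/stop-slaves.sh"
"$SPARK_HOME/sbin/start-slaves.sh"
```

The restart in step 3 matters because the worker daemons read spark-env.sh only at startup; copying the file alone (step 2) changes nothing, as Kane observed.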
Re: spark on ec2
Oh yeah, they picked up the changes after a restart, thanks!

On Thu, Feb 5, 2015 at 8:13 PM, Charles Feduke wrote:
> I'd take a guess and use the $SPARK_HOME/sbin/{stop,start}-slaves.sh
> scripts on your master node and see.
Re: Spark on EC2
Hi,

A real Spark cluster needs at least one master and one slave, so you need to launch two machines; the second machine is not free. However, if you run Spark in local mode on a single EC2 machine, it is free.

What AWS charges you depends on how many machines you launch and their instance types, not on how much you actually use them.

Hope it helps.

Cheers
Gen

On Tue, Feb 24, 2015 at 3:55 PM, Deep Pradhan wrote:
> Hi,
> I have just signed up for Amazon AWS because I learnt that it provides
> service for free for the first 12 months.
> I want to run Spark on an EC2 cluster. Will they charge me for this?
>
> Thank You
Re: Spark on EC2
Kindly bear with my questions as I am new to this.

>> If you run spark on local mode on a ec2 machine

What does this mean? Is it that I launch the Spark cluster from my local machine, i.e., by running the shell script that is there in /spark/ec2?

On Tue, Feb 24, 2015 at 8:32 PM, gen tang wrote:
> As a real spark cluster needs a least one master and one slaves, you need
> to launch two machine. Therefore the second machine is not free.
> However, If you run spark on local mode on a ec2 machine. It is free.
Re: Spark on EC2
The free tier includes 750 hours of t2.micro instance time per month: http://aws.amazon.com/free/

That's basically a month of hours, so it's all free if you run only one instance at a time. If you run 4, you'll be able to run your cluster of 4 for about a week free.

A t2.micro has 1GB of memory, which is small but something you could possibly get work done with.

However, it provides only burst CPU: you can only use about 10% of 1 vCPU continuously due to capping. Think of it as about 1/10th of one core on your laptop. It would be incredibly slow.

This is not to mention the network and I/O bottlenecks you're likely to run into, as you don't get much provisioning with these free instances.

So, no, you really can't use this for anything that is at all CPU intensive. It's for, say, running a low-traffic web service.

On Tue, Feb 24, 2015 at 2:55 PM, Deep Pradhan wrote:
> I want to run Spark on EC2 cluster. Will they charge me for this?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
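Sean's arithmetic is easy to check. A quick sketch — the 750-hour budget comes from the AWS free-tier page linked above; everything else is plain division:

```shell
# 750 free t2.micro instance-hours per month, shared across the cluster:
# with n instances running simultaneously, the cluster can run 750/n hours.
FREE_HOURS=750

for n in 1 4; do
  hours=$(( FREE_HOURS / n ))
  days=$(( hours / 24 ))
  echo "cluster of $n: $hours free hours (~$days days)"
done
```

Running it prints roughly a full month for one instance and about a week for four, matching the estimate above.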
Re: Spark on EC2
Thank You Sean. I was just trying to experiment with the performance of Spark applications with various worker instances (I hope you remember that we discussed the worker instances). I thought EC2 would be a good place to try it. So it doesn't work out, does it?

Thank You

On Tue, Feb 24, 2015 at 8:40 PM, Sean Owen wrote:
> So, no you really can't use this for anything that is at all CPU
> intensive. It's for, say, running a low-traffic web service.
Re: Spark on EC2
This should help you understand the cost of running a Spark cluster for a short period of time: http://www.ec2instances.info/

If you run an instance for even 1 second of an hour, you are charged for that complete hour. So before you shut down your miniature cluster, make sure you really are done with what you want to do, as firing up the cluster again will cost another hour's worth of time.

The purpose of EC2's free tier is to get you to buy into AWS services. At the free level it's not terribly useful except for the simplest of web applications (which you could host on Heroku - also built on AWS - for free) or simple long-running but largely dormant shell processes.

On Tue Feb 24 2015 at 10:16:56 AM Deep Pradhan wrote:
> I thought it would be a good one to try in EC2. So, it doesn't work out,
> does it?
Re: Spark on EC2
You can definitely, easily, try a 1-node standalone cluster for free. Just don't be surprised when the CPU capping kicks in within about 5 minutes of any non-trivial computation and suddenly the instance is very s-l-o-w.

I would consider just paying the ~$0.07/hour to play with an m3.medium, which ought to be pretty OK for basic experimentation.

On Tue, Feb 24, 2015 at 3:14 PM, Deep Pradhan wrote:
> I thought it would be a good one to try in EC2. So, it doesn't work out,
> does it?
Re: Spark on EC2
No, I think I am OK with the time it takes. It's just that, as the number of partitions increases along with the number of workers, I want to see the improvement in the performance of an application. I just want to see this happen. Any comments?

Thank You

On Tue, Feb 24, 2015 at 8:52 PM, Sean Owen wrote:
> I would consider just paying the ~$0.07/hour to play with an
> m3.medium, which ought to be pretty OK for basic experimentation.
Re: Spark on EC2
Hi,

I am sorry, I made a mistake about the AWS pricing. Please read Sean Owen's email, which better explains the strategies for running Spark on AWS.

As for your question: it means that you just download Spark and unzip it, then run the Spark shell via ./bin/spark-shell or ./bin/pyspark. It is useful for getting familiar with Spark, and you can do this on your laptop as well as on EC2. By contrast, running ./ec2/spark-ec2 launches Spark in standalone mode on a cluster; you can find more details here: https://spark.apache.org/docs/latest/spark-standalone.html

Cheers
Gen

On Tue, Feb 24, 2015 at 4:07 PM, Deep Pradhan wrote:
> What does this mean? Is it that I launch Spark cluster from my local
> machine, i.e., by running the shell script that is there in /spark/ec2?
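Concretely, local mode needs nothing beyond an unpacked Spark distribution. A sketch — the version and download URL are illustrative; any Spark 1.x release works the same way:

```shell
# Sketch: run Spark in local mode on a single machine (your laptop or one
# EC2 box). Version and URL below are illustrative examples.
wget http://archive.apache.org/dist/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz
tar -xzf spark-1.2.0-bin-hadoop2.4.tgz
cd spark-1.2.0-bin-hadoop2.4

# Scala shell: "local[2]" means 2 worker threads in-process, no cluster at all.
./bin/spark-shell --master "local[2]"

# Or the Python shell:
./bin/pyspark --master "local[2]"
```

In local mode the master, driver, and executors all live in one JVM, which is why it costs nothing beyond the single machine it runs on.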
Re: Spark on EC2
Thank You All. I think I will look into paying the ~$0.07/hr for an m3.medium, as Sean suggested.

On Tue, Feb 24, 2015 at 9:01 PM, gen tang wrote:
> I am sorry that I made a mistake on AWS tarif. You can read the email of
> sean owen which explains better the strategies to run spark on AWS.
Re: Spark on EC2
If you sign up for Google Compute Cloud, you will get $300 in free credits for 3 months, and you can start a pretty good cluster for your testing purposes. :)

Thanks
Best Regards

On Tue, Feb 24, 2015 at 8:25 PM, Deep Pradhan wrote:
> I want to run Spark on EC2 cluster. Will they charge me for this?
Re: Spark on EC2
Yes it is :)

Thanks
Best Regards

On Tue, Feb 24, 2015 at 9:09 PM, Deep Pradhan wrote:
> Thank You Akhil. Will look into it.
> It's free, isn't it? I am still a student :)
Re: Spark on EC2
Thank You Akhil. Will look into it. It's free, isn't it? I am still a student :)

On Tue, Feb 24, 2015 at 9:06 PM, Akhil Das wrote:
> If you signup for Google Compute Cloud, you will get free $300 credits for
> 3 months and you can start a pretty good cluster for your testing purposes. :)
Re: Spark on EC2
Hmm... you've gotten further than me. Which AMIs are you using?

On Sun, Jun 1, 2014 at 2:21 PM, superback wrote:
> Hi,
> I am trying to run an example on Amazon EC2 and have successfully set up
> one cluster with two nodes on EC2. However, when I was testing an example
> using the following command,
>
>     ./run-example org.apache.spark.examples.GroupByTest spark://`hostname`:7077
>
> I got the following warnings and errors. Can anyone help me solve this
> problem? Thanks very much!
>
> 46781 [Timer-0] WARN org.apache.spark.scheduler.TaskSchedulerImpl - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 61544 [spark-akka.actor.default-dispatcher-3] ERROR org.apache.spark.deploy.client.AppClient$ClientActor - All masters are unresponsive! Giving up.
> 61544 [spark-akka.actor.default-dispatcher-3] ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend - Spark cluster looks dead, giving up.
> 61546 [spark-akka.actor.default-dispatcher-3] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Remove TaskSet 0.0 from pool
> 61549 [main] INFO org.apache.spark.scheduler.DAGScheduler - Failed to run count at GroupByTest.scala:50
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-EC2-tp6638.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
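The first warning in the trace says to check the cluster UI for registered workers, and "All masters are unresponsive" points at basic connectivity to the master. A quick sketch of those checks from the driver machine — MASTER_HOST is a placeholder, and 8080/7077 are the standalone defaults, which may differ on your setup:

```shell
# Sketch: verify the standalone master is reachable before debugging the job.
# MASTER_HOST is a placeholder -- substitute your master's hostname or IP.
MASTER_HOST=ec2-xx-xx-xx-xx.compute-1.amazonaws.com

# 7077 is the default master RPC port the spark:// URL points at.
# If this fails, the "All masters are unresponsive" error is expected.
nc -z -w 5 "$MASTER_HOST" 7077 && echo "master RPC port reachable"

# 8080 is the default master web UI, which lists registered workers.
# Eyeball this page (or fetch it) to confirm workers actually joined.
curl -s "http://$MASTER_HOST:8080" | head -n 40
```

A common cause of this symptom is a master URL mismatch: the `spark://...:7077` address passed to the job must match exactly what the master bound to (shown at the top of its web UI), so `` spark://`hostname`:7077 `` only works when run on the master itself.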
Re: Spark on EC2
I haven't set up an AMI yet. I am just trying to run a simple job on the EC2 cluster. So, is setting up an AMI a prerequisite for running a simple Spark example like org.apache.spark.examples.GroupByTest?

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-EC2-tp6638p6681.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Spark on EC2
No, you don't have to set up your own AMI. Actually, it's probably simpler and less error-prone to let spark-ec2 manage that for you as you first start to get comfortable with Spark. Just spin up a cluster without any explicit mention of an AMI and it will do the right thing.

On Sunday, June 1, 2014, superback wrote:
> I haven't set up AMI yet. I am just trying to run a simple job on the EC2
> cluster. So, is setting up AMI a prerequisite for running simple Spark
> example like org.apache.spark.examples.GroupByTest?
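For reference, a minimal spark-ec2 launch along those lines might look like the sketch below. The key pair name, key file, and cluster name are placeholders; the flags follow the spark-ec2 script bundled with Spark 1.x, which picks its own default Spark AMI when none is specified:

```shell
# Sketch: launch a small standalone cluster with spark-ec2, letting the
# script choose its default AMI. Names below are placeholders.
cd spark-1.2.0-bin-hadoop2.4   # or wherever your Spark distribution lives

./ec2/spark-ec2 \
  --key-pair=my-keypair \
  --identity-file=~/.ssh/my-keypair.pem \
  --slaves=2 \
  launch my-test-cluster

# ...run your jobs, then tear the cluster down to stop the hourly billing:
./ec2/spark-ec2 destroy my-test-cluster
```

The script provisions the master and slaves, installs Spark, and wires the standalone cluster together, which is exactly the setup that is fiddly to get right by hand.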
Re: Spark on EC2
You're probably requesting more instances than your account allows, so the error gets generated for the extra instances. Try launching a smaller cluster.

On Wed, Apr 1, 2015 at 12:41 PM, Vadim Bichutskiy <vadim.bichuts...@gmail.com> wrote:
> Hi all,
>
> I just tried launching a Spark cluster on EC2 as described in
> http://spark.apache.org/docs/1.3.0/ec2-scripts.html
>
> I got the following response:
>
> "PendingVerification: Your account is currently being verified.
> Verification normally takes less than 2 hours. Until your account is
> verified, you may not be able to launch additional instances or create
> additional volumes. If you are still receiving this message after more
> than 2 hours, please let us know by writing to aws-verificat...@amazon.com.
> We appreciate your patience..."
>
> However, I can see the EC2 instances in the AWS console as "running".
>
> Any thoughts on what's going on?
>
> Thanks,
> Vadim

--
Dan Osipov
Shazam
Re: Spark on EC2
Hi Gilberto,

Could you please attach the driver logs as well, so that we can pinpoint what's going wrong? Could you also add the flag `--driver-memory 4g` while submitting your application and try that as well?

Best,
Burak

----- Original Message -----
From: "Gilberto Lira"
To: user@spark.apache.org
Sent: Thursday, September 18, 2014 11:48:03 AM
Subject: Spark on EC2

Hello,

I am trying to run a Python script that uses the MLlib k-means, and I'm not getting anywhere. I'm using a c3.xlarge instance as master and 10 c3.large instances as slaves. In the code I map a 600MB CSV file in S3, where each row has 128 integer columns. The problem is that around TID 7 my slave stops responding, and I cannot finish my processing. Could you help me with this problem? I'm sending my script attached for review.

Thank you,
Gilberto
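Burak's suggestion translates into the submit command roughly as sketched below. The script name, master URL, and log path are placeholders; only `--driver-memory 4g` is the flag he actually asked for:

```shell
# Sketch: resubmit the Python job with a larger driver heap, as suggested.
# kmeans_job.py and the master URL are placeholders for your own values.
./bin/spark-submit \
  --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 \
  --driver-memory 4g \
  kmeans_job.py

# The driver log is what prints to this terminal (capture it with `tee`);
# per-executor logs live under $SPARK_HOME/work/<app-id>/ on each slave.
```

If executors stop responding around a fixed task ID, an undersized driver or executor heap during the k-means aggregation is a plausible culprit, which is why the logs plus the memory bump are the first things to check.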