To narrow this down, you can try the following:

1) Is the job going to the same node every time you execute it? Enable spark.speculation, add a Thread.sleep of about 2 minutes, and see whether the job moves to a worker other than the one it started on initially (this is to rule out connection or setup issues between the master and the other workers). See the spark-submit example after this list.

2) What is your spark.executor.memory? Try decreasing the executor memory to a value smaller than the data size and see whether that helps distribute the work (also covered in the example below).

3) While launching the cluster, play around with the number of slaves, starting with 1:
./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>
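For 1) and 2), both settings can be passed at submit time without touching the code. This is only a sketch: the memory value is a placeholder, and the class, jar and master are taken from the thread below:

spark/bin/spark-submit --class demo.spark.StaticDataAnalysis \
  --master spark://<host>:6066 --deploy-mode cluster \
  --conf spark.speculation=true \
  --conf spark.executor.memory=512m \
  demo/Demo-1.0-SNAPSHOT-all.jar

After that, the master UI on port 8080 should show whether executors come up on more than one worker.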
On Fri, Jan 8, 2016 at 2:53 PM, Michael Pisula <michael.pis...@tngtech.com> wrote:
> Hi Annabel,
>
> I am using Spark in stand-alone mode (deployment using the ec2 scripts packaged with spark).
>
> Cheers,
> Michael
>
> On 08.01.2016 00:43, Annabel Melongo wrote:
>
> Michael,
>
> I don't know what's your environment but if it's Cloudera, you should be able to see the link to your master in the Hue.
>
> Thanks
>
> On Thursday, January 7, 2016 5:03 PM, Michael Pisula <michael.pis...@tngtech.com> wrote:
>
> I had tried several parameters, including --total-executor-cores, no effect.
> As for the port, I tried 7077, but if I remember correctly I got some kind of error that suggested to try 6066, with which it worked just fine (apart from this issue here).
>
> Each worker has two cores. I also tried increasing cores, again no effect.
> I was able to increase the number of cores the job was using on one worker, but it would not use any other worker (and it would not start if the number of cores the job wanted was higher than the number available on one worker).
>
> On 07.01.2016 22:51, Igor Berman wrote:
>
> read about *--total-executor-cores*
> not sure why you specify port 6066 in master...usually it's 7077
> verify in master ui (usually port 8080) how many cores are there (depends on other configs, but usually workers connect to master with all their cores)
>
> On 7 January 2016 at 23:46, Michael Pisula <michael.pis...@tngtech.com> wrote:
>
> Hi,
>
> I start the cluster using the spark-ec2 scripts, so the cluster is in stand-alone mode.
> Here is how I submit my job:
> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master spark://<host>:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar
>
> Cheers,
> Michael
>
> On 07.01.2016 22:41, Igor Berman wrote:
>
> share how you submit your job
> what cluster (yarn, standalone)
>
> On 7 January 2016 at 23:24, Michael Pisula <michael.pis...@tngtech.com> wrote:
>
> Hi there,
>
> I ran a simple Batch Application on a Spark Cluster on EC2. Despite having 3 Worker Nodes, I could not get the application processed on more than one node, regardless if I submitted the Application in Cluster or Client mode.
> I also tried manually increasing the number of partitions in the code, no effect. I also pass the master into the application.
> I verified on the nodes themselves that only one node was active while the job was running.
> I pass enough data to make the job take 6 minutes to process.
> The job is simple enough, reading data from two S3 files, joining records on a shared field, filtering out some records and writing the result back to S3.
>
> Tried all kinds of stuff, but could not make it work. I did find similar questions, but had already tried the solutions that worked in those cases.
> Would be really happy about any pointers.
>
> Cheers,
> Michael
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
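On the original poster's note about manually increasing the number of partitions in the code: for reference, forcing higher parallelism on the inputs and the join usually looks roughly like the sketch below. This is not the actual application code; the paths, key field and partition count are placeholders, and it assumes an existing SparkContext sc:

// read the two S3 inputs and key each record on the shared field
// (placeholder: first CSV column)
val left  = sc.textFile("s3n://<bucket>/<file-a>").map(l => (l.split(",")(0), l))
val right = sc.textFile("s3n://<bucket>/<file-b>").map(l => (l.split(",")(0), l))
// ask the join for more partitions than one worker has cores (24 is a placeholder)
val joined = left.join(right, 24)
  .filter { case (_, (a, b)) => a.nonEmpty && b.nonEmpty } // placeholder filter
joined.saveAsTextFile("s3n://<bucket>/<output>")

Even with more partitions, the master UI (port 8080) is worth checking to confirm the application has actually been granted cores on more than one worker.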