Hi there,

I ran a simple batch application on a Spark cluster on EC2. Despite having 3
worker nodes, I could not get the application to run on more than one node,
regardless of whether I submitted it in cluster or client mode. I also tried
manually increasing the number of partitions in the code, but that had no
effect, and I pass the master URL into the application explicitly.
I verified on the nodes themselves that only one node was active while the
job was running.
The input data is large enough that the job takes about 6 minutes to process.
The job itself is simple: it reads data from two S3 files, joins the records
on a shared field, filters out some records, and writes the result back to
S3.
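
For context, the job is roughly shaped like the sketch below (a minimal
Scala/DataFrame outline with made-up S3 paths, column names, and partition
counts; it only illustrates the structure, not the actual application code):

    import org.apache.spark.sql.SparkSession

    object JoinJob {
      def main(args: Array[String]): Unit = {
        // Master URL is passed into the application explicitly
        val spark = SparkSession.builder()
          .appName("S3JoinFilter")
          .master(args(0)) // e.g. spark://<master-host>:7077
          .getOrCreate()

        // Read the two input files from S3 (paths are placeholders)
        val left  = spark.read.option("header", "true").csv("s3a://my-bucket/input/a.csv")
        val right = spark.read.option("header", "true").csv("s3a://my-bucket/input/b.csv")

        // Manually increase the number of partitions before the join
        val joined = left.repartition(24)
          .join(right.repartition(24), Seq("sharedField"))

        // Filter out some records and write the result back to S3
        joined.filter(joined("status") =!= "excluded")
          .write.mode("overwrite").csv("s3a://my-bucket/output/")

        spark.stop()
      }
    }

Submission was along these lines (host, class, and jar names are placeholders):

    spark-submit --class JoinJob --master spark://<master-host>:7077 \
      --deploy-mode cluster my-job.jar spark://<master-host>:7077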

I have tried all kinds of things, but could not make it work. I did find
similar questions, but I had already tried the solutions that worked in
those cases. I would be really happy about any pointers.

Cheers,
Michael



