Hi there, I ran a simple batch application on a Spark cluster on EC2. Despite having 3 worker nodes, I could not get the application to run on more than one of them, regardless of whether I submitted it in cluster or client mode. I also tried manually increasing the number of partitions in the code, with no effect, and I do pass the master URL to the application. I verified on the nodes themselves that only one node was active while the job was running. The input is large enough that the job takes about 6 minutes to process. The job itself is simple: it reads data from two S3 files, joins the records on a shared field, filters out some records, and writes the result back to S3.
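For reference, here is a minimal sketch of the shape of the job; the bucket names, file layout, filter predicate, and partition count (24) are placeholders, not my actual code:

import org.apache.spark.{SparkConf, SparkContext}

object JoinJob {
  def main(args: Array[String]): Unit = {
    // The master is not hard-coded; it is passed in via spark-submit --master.
    val conf = new SparkConf().setAppName("S3JoinJob")
    val sc = new SparkContext(conf)

    // Read the two S3 inputs and key each record on the shared field
    // (assumed here to be the first CSV column).
    val left = sc.textFile("s3n://my-bucket/input/left.csv")
      .map(_.split(","))
      .map(fields => (fields(0), fields))

    val right = sc.textFile("s3n://my-bucket/input/right.csv")
      .map(_.split(","))
      .map(fields => (fields(0), fields))

    // Join on the shared field with an explicit partition count (this is
    // where I tried, without effect, to spread the work across more nodes),
    // then filter some records out and write the result back to S3.
    left.join(right, 24)
      .filter { case (_, (l, r)) => l(1) == r(1) } // placeholder predicate
      .map { case (key, (l, r)) => (key +: (l ++ r)).mkString(",") }
      .saveAsTextFile("s3n://my-bucket/output/")

    sc.stop()
  }
}

And the submit command looks roughly like this (I tried both client and cluster deploy mode):

spark-submit --class JoinJob \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  path/to/job.jar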
I have tried all kinds of things, but could not make it work. I did find similar questions, but had already tried the solutions that worked in those cases. I would be really happy about any pointers. Cheers, Michael