Hello,

I have a problem joining two tables in Spark. I have tried both Spark SQL and the API, but with no progress so far. I have basically two tables: ACCOUNTS with 16 million records and TRANSACTIONS with 2.5 billion records. When I try to join the tables (please see the code), the job gets stuck in the last stage for a very long time (please see the console output). After about 2 hours it writes a strange exception to the output:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
I have tried several strategies (repartitioning the RDDs, broadcasting the smaller one), but the result is always the same. Does anybody have an idea what is happening?

Source code: AccJoin.java <http://apache-spark-user-list.1001560.n3.nabble.com/file/n21018/AccJoin.java>
Console output: AccJoin_0.html <http://apache-spark-user-list.1001560.n3.nabble.com/file/n21018/AccJoin_0.html>
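
For anyone reading along, here is a minimal plain-Java sketch of what "broadcast the smaller one" means in principle: build an in-memory lookup from the smaller table once, then stream the larger table and probe the lookup, so the large side never has to be shuffled by key. It is not taken from AccJoin.java and uses no Spark classes. The Account, Transaction and Joined types and their fields are hypothetical, since the real table columns are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastJoinSketch {

    // Hypothetical row of the smaller table (the real columns are unknown).
    static final class Account {
        final long accountId;
        final String owner;
        Account(long accountId, String owner) {
            this.accountId = accountId;
            this.owner = owner;
        }
    }

    // Hypothetical row of the larger table.
    static final class Transaction {
        final long accountId;
        final double amount;
        Transaction(long accountId, double amount) {
            this.accountId = accountId;
            this.amount = amount;
        }
    }

    // Hypothetical joined row.
    static final class Joined {
        final long accountId;
        final String owner;
        final double amount;
        Joined(long accountId, String owner, double amount) {
            this.accountId = accountId;
            this.owner = owner;
            this.amount = amount;
        }
    }

    // Build the lookup once from the smaller side; this is the part
    // that would be copied to every worker in a broadcast join.
    static Map<Long, Account> buildLookup(List<Account> accounts) {
        Map<Long, Account> lookup = new HashMap<>();
        for (Account a : accounts) {
            lookup.put(a.accountId, a);
        }
        return lookup;
    }

    // Inner join: stream the larger side and probe the lookup.
    // Transactions without a matching account are dropped.
    static List<Joined> join(Map<Long, Account> lookup, List<Transaction> transactions) {
        List<Joined> out = new ArrayList<>();
        for (Transaction t : transactions) {
            Account a = lookup.get(t.accountId);
            if (a != null) {
                out.add(new Joined(t.accountId, a.owner, t.amount));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account(1L, "alice"), new Account(2L, "bob"));
        List<Transaction> transactions = Arrays.asList(
                new Transaction(1L, 10.0),
                new Transaction(3L, 99.0),   // no matching account
                new Transaction(2L, 5.5));
        for (Joined j : join(buildLookup(accounts), transactions)) {
            System.out.println(j.accountId + " " + j.owner + " " + j.amount);
        }
    }
}
```

In a Spark job the lookup map would typically be handed to JavaSparkContext.broadcast(...) and read through Broadcast.value() inside a map over the transactions. Whether a map built from 16 million accounts fits comfortably in executor memory depends on the row size, so treat this as an illustration of the idea rather than a recommendation.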