Hello,

I have a problem joining two tables in Spark. I have tried it with both
Spark SQL and the API, with no progress so far. I have basically two tables:
ACCOUNTS (16 million records) and TRANSACTIONS (2.5 billion records). When I
try to join the tables (please see the code), the job gets stuck in the last
stage for a very long time (please see the console output). After about 2
hours it writes a strange exception to the output:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location for shuffle 0

I have tried several strategies (repartitioning the RDDs, broadcasting the
smaller one), but the result is always the same.
Does anybody have an idea what is happening?
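
To make clear what I mean by broadcasting the smaller table: the idea is a
map-side hash join, where the small side is held in memory and the large side
is streamed past it without a shuffle by key. Below is a minimal plain-Java
sketch of that idea only. It does not use Spark and is not the code from
AccJoin.java; the class name, method name and record layout are hypothetical
and chosen just for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a map-side (broadcast-style) hash join.
// No Spark involved; all names are hypothetical, not taken from AccJoin.java.
class BroadcastJoinSketch {

    // Joins each transaction {accountId, amount} to its account name.
    // The small side (accounts) is a hash map, so the large side
    // (transactions) is scanned once and is never regrouped by key.
    static List<String> join(Map<Long, String> accounts, List<long[]> transactions) {
        List<String> joined = new ArrayList<>();
        for (long[] tx : transactions) {
            String name = accounts.get(tx[0]);
            if (name != null) { // inner join: drop transactions with no account
                joined.add(tx[0] + "," + name + "," + tx[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Long, String> accounts = new HashMap<>();
        accounts.put(1L, "alice");
        accounts.put(2L, "bob");

        List<long[]> transactions = new ArrayList<>();
        transactions.add(new long[] {1L, 100L});
        transactions.add(new long[] {3L, 50L}); // no matching account
        transactions.add(new long[] {2L, 75L});

        for (String row : join(accounts, transactions)) {
            System.out.println(row);
        }
    }
}
```

In Spark the same pattern would presumably mean shipping the account map to
every executor and doing the lookup inside a map over TRANSACTIONS; whether
16 million accounts fit in executor memory is an assumption I have not
verified.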

Source code: AccJoin.java
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n21018/AccJoin.java>
Console output: AccJoin_0.html
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n21018/AccJoin_0.html>
  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Join-stucks-in-the-last-stage-step-tp21018.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
