I have about 15-20 joins to perform. Each of these tables is on the order
of 6 million to 66 million rows, and the number of columns ranges from 20
to 400.

I read the Parquet files and obtain SchemaRDDs, then use the join
functionality on two SchemaRDDs at a time, joining each previous join
result with the next SchemaRDD.
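Here is roughly what the code looks like, to make the pattern concrete
(Spark 1.1-style SchemaRDD API; the file paths, the join key "id", and the
columns x/y/z are just placeholders, not my real schema):

    import org.apache.spark.sql.SQLContext

    // Assuming an existing SparkContext `sc`
    val sqlContext = new SQLContext(sc)

    // Read each Parquet file into a SchemaRDD
    val a = sqlContext.parquetFile("/data/table_a.parquet")
    val b = sqlContext.parquetFile("/data/table_b.parquet")
    val c = sqlContext.parquetFile("/data/table_c.parquet")

    a.registerTempTable("a")   // registerAsTable on Spark 1.0
    b.registerTempTable("b")
    c.registerTempTable("c")

    // First join, then chain its result into the next join,
    // and so on for all 15-20 tables.
    val ab = sqlContext.sql(
      "SELECT a.id, a.x, b.y FROM a JOIN b ON a.id = b.id")
    ab.registerTempTable("ab")

    val abc = sqlContext.sql(
      "SELECT ab.id, ab.x, ab.y, c.z FROM ab JOIN c ON ab.id = c.id")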

Any ideas on how to deal with such a join-intensive Spark SQL process?
Any advice on how to handle these joins in a better way?

I would appreciate any input.

Thanks!



