I have about 15-20 joins to perform. Each of these tables is on the order of 6 million to 66 million rows, and the number of columns ranges from 20 to 400.
I read the Parquet files to obtain SchemaRDDs, then use the join functionality on two SchemaRDDs at a time, joining each previous result with the next SchemaRDD. Any ideas on how to deal with such a join-intensive Spark SQL process? Any advice on better ways to handle the joins? I would appreciate any input. Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/LeftOuter-Join-issue-tp21398.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
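As a sketch of one way to structure such a chain (assuming a Spark 1.x SQLContext in the shell; the file paths and the `id` join key below are hypothetical), registering each SchemaRDD as a temporary table lets all the joins be expressed in a single SQL statement, so the optimizer sees the whole chain at once instead of one pairwise join at a time:

```scala
import org.apache.spark.sql.SQLContext

// `sc` is the SparkContext provided by the Spark shell.
val sqlContext = new SQLContext(sc)

// Register each Parquet file as a temp table (hypothetical paths).
val paths = Seq("t1.parquet", "t2.parquet", "t3.parquet")
paths.zipWithIndex.foreach { case (path, i) =>
  sqlContext.parquetFile(path).registerTempTable(s"t${i + 1}")
}

// One SQL statement over the whole join chain; "id" is a
// hypothetical join key standing in for the real columns.
val joined = sqlContext.sql("""
  SELECT *
  FROM t1
  LEFT OUTER JOIN t2 ON t1.id = t2.id
  LEFT OUTER JOIN t3 ON t1.id = t3.id
""")
```

With 20-400 columns per table, it may also help to select only the needed columns from each table before joining, so less data is shuffled at each step.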