Re: Help explaining explain() after DataFrame join reordering

2018-06-05 Thread Matteo Cossu
Hello, as explained here, the optimizer can change the join order. The difference introduced in Spark 2.2 is that the reordering is based on statistics instead of heuristics, which can make the resulting order appear "random"
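
To make the point about statistics concrete, here is a toy sketch in plain Python. It is not Spark's actual algorithm, which weighs estimated costs of candidate plans; it only shows why an order driven by table statistics need not resemble the order the joins were written in. The function name `reorder_by_stats` and all row counts are hypothetical, made up for illustration.

```python
# Toy illustration only: this is NOT Spark's optimizer.
# All row counts below are invented for the example.

def reorder_by_stats(tables, row_counts):
    """Order tables so that the smallest estimated relations are joined first."""
    return sorted(tables, key=lambda name: row_counts[name])

# The order in which the joins were written in the query.
written_order = ["mongo", "cassandra", "jdbc", "parquet", "csv"]

# Hypothetical per-table row-count statistics.
hypothetical_row_counts = {
    "mongo": 5_000_000,
    "cassandra": 200_000,
    "jdbc": 1_000,
    "parquet": 50_000_000,
    "csv": 10_000,
}

planned_order = reorder_by_stats(written_order, hypothetical_row_counts)
print(" * ".join(planned_order))
```

With these made-up numbers, a different written order would give the same planned order, since only the statistics drive the result. In Spark itself, statistics-based join reordering is controlled by the `spark.sql.cbo.enabled` and `spark.sql.cbo.joinReorder.enabled` settings and relies on table statistics being available.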

Help explaining explain() after DataFrame join reordering

2018-06-01 Thread Mohamed Nadjib MAMI
Dear Sparkers, I'm loading data from 5 sources into DataFrames (using the official connectors): Parquet, MongoDB, Cassandra, MySQL and CSV. I'm then joining those DataFrames in two different orders:
- mongo * cassandra * jdbc * parquet * csv (random order)
- parquet * csv * cassandra * jdbc *