GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/20684
[SPARK-23523] [SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery

## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result:

```
[c,e,a]
```

The rule `OptimizeMetadataOnlyQuery` has a bug: it must respect the attribute order of the original leaf node. This PR fixes it.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark optimizeMetadataOnly

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20684.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20684

----

commit 292e87f09861558f590aa7e735fa8dccd001ae89
Author: gatorsmile <gatorsmile@...>
Date: 2018-02-27T05:18:38Z

    fix.
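The ordering issue the PR describes can be illustrated without Spark. The following is a minimal, hypothetical sketch (not the actual `OptimizeMetadataOnlyQuery` code; the `Attribute` type and both helper functions are made up for illustration) of why collecting partition attributes in query-reference order, rather than in the leaf node's output order, pairs partition values with the wrong columns:

```Scala
// Hypothetical model of the bug: partition attributes must be taken in the
// order they appear in the leaf node's output, not in the order the query
// happens to reference the partition columns.
case class Attribute(name: String)

// Buggy variant: orders attributes by the referenced-column list, so a query
// that selects the columns in a different (or differently cased) order gets
// its partition values attached to the wrong attributes.
def partitionAttrsBuggy(referenced: Seq[String], leafOutput: Seq[Attribute]): Seq[Attribute] =
  referenced.flatMap(col => leafOutput.find(_.name.equalsIgnoreCase(col)))

// Fixed variant: keep the original output order of the leaf node and only
// filter, matching case-insensitively as Spark does for column resolution.
def partitionAttrsFixed(referenced: Seq[String], leafOutput: Seq[Attribute]): Seq[Attribute] = {
  val wanted = referenced.map(_.toLowerCase).toSet
  leafOutput.filter(a => wanted.contains(a.name.toLowerCase))
}
```

With a leaf output of `cOl3, cOl1, cOl5` (the partition directory order in the reproduction above) and a query selecting `CoL1, CoL5, CoL3`, the buggy variant reorders the attributes while the fixed variant preserves the leaf's order, which is what keeps each partition value aligned with its column.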