[ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586121#comment-15586121 ]
Reynold Xin commented on SPARK-17983:
-------------------------------------

We can update Parquet to make it case-insensitive too.

> Can't filter over mixed case parquet columns of converted Hive tables
> ---------------------------------------------------------------------
>
>                 Key: SPARK-17983
>                 URL: https://issues.apache.org/jira/browse/SPARK-17983
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Eric Liang
>            Priority: Critical
>
> We should probably revive https://github.com/apache/spark/pull/14750 in order to fix this issue and related classes of issues.
> The only other alternatives are (1) reconciling on-disk schemas with the metastore schema at planning time, which seems pretty messy, and (2) fixing all the data sources to support case-insensitive matching, which also has issues.
> Reproduction:
> {code}
> private def setupPartitionedTable(tableName: String, dir: File): Unit = {
>   spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
>     .partitionBy("partCol1", "partCol2")
>     .mode("overwrite")
>     .parquet(dir.getAbsolutePath)
>
>   spark.sql(s"""
>     |create external table $tableName (normalCol long)
>     |partitioned by (partCol1 int, partCol2 int)
>     |stored as parquet
>     |location "${dir.getAbsolutePath}"""".stripMargin)
>   spark.sql(s"msck repair table $tableName")
> }
>
> test("filter by mixed case col") {
>   withTable("test") {
>     withTempDir { dir =>
>       setupPartitionedTable("test", dir)
>       val df = spark.sql("select * from test where normalCol = 3")
>       assert(df.count() == 1)
>     }
>   }
> }
> {code}
> cc [~cloud_fan]
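
For context on alternative (2) above, here is a minimal sketch of the resolution rule it implies; this is not code from the Spark or Parquet codebases, and the names {{CaseInsensitiveResolution}} and {{resolve}} are made up for illustration. It maps lower-cased metastore column names onto the mixed-case field names kept in the Parquet footer, and refuses to match when two on-disk fields differ only by case:

{code}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helper, not part of Spark: map each metastore column name to
// the matching on-disk Parquet field name, ignoring case. Returns None when a
// column is missing or when two on-disk fields collide after lower-casing.
object CaseInsensitiveResolution {
  def resolve(metastoreSchema: StructType, parquetSchema: StructType): Option[Map[String, String]] = {
    val onDiskByLowerCase = parquetSchema.fields.groupBy(_.name.toLowerCase)
    val pairs = metastoreSchema.fields.map { col =>
      onDiskByLowerCase.get(col.name.toLowerCase) match {
        case Some(Array(onDisk)) => Some(col.name -> onDisk.name) // unique case-insensitive match
        case _                   => None // missing column, or on-disk names differing only by case
      }
    }
    if (pairs.forall(_.isDefined)) Some(pairs.flatten.toMap) else None
  }
}

// In the reproduction above, the Hive metastore stores "normalCol" lower-cased
// while the Parquet footer keeps the original case, so an exact-match lookup fails:
val metastoreSchema = StructType(Seq(StructField("normalcol", LongType)))
val parquetSchema   = StructType(Seq(StructField("normalCol", LongType)))
CaseInsensitiveResolution.resolve(metastoreSchema, parquetSchema)
// => Some(Map("normalcol" -> "normalCol"))
{code}

A real fix would have to live in the Parquet read path (or at planning time, per alternative (1)), but the matching rule itself would look roughly like this.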