[ 
https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Liang updated SPARK-17983:
-------------------------------
    Description: 
We should probably revive https://github.com/apache/spark/pull/14750 in order 
to fix this issue and related classes of issues.

The only other alternatives are (1) reconciling the on-disk schema with the 
metastore schema at planning time, which seems pretty messy, and (2) fixing all 
the datasources to support case-insensitive matching, which also has issues.
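As a rough sketch of what alternative (2) would entail (this is illustrative pseudocode, not Spark's actual internals; the helper name is hypothetical): resolve each metastore column name, which Hive stores lowercased, against the mixed-case names in the Parquet footer using case-insensitive comparison, bailing out on ambiguity.

```scala
/** Hypothetical helper: resolve a (lowercased) metastore column name against
  * the mixed-case column names found in a Parquet footer. Returns None when
  * there is no match or the case-insensitive match is ambiguous. */
def resolveCaseInsensitive(
    metastoreCol: String,
    parquetCols: Seq[String]): Option[String] = {
  val matches = parquetCols.filter(_.equalsIgnoreCase(metastoreCol))
  if (matches.size == 1) Some(matches.head) else None
}
```

For example, `resolveCaseInsensitive("normalcol", Seq("normalCol", "partCol1"))` resolves to `Some("normalCol")`, whereas an exact-name lookup would miss. The ambiguity check matters because two on-disk columns differing only in case cannot be safely mapped back to one metastore column.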

Reproduction:
{code}
  private def setupPartitionedTable(tableName: String, dir: File): Unit = {
    spark.range(5)
      .selectExpr("id as normalCol", "id as partCol1", "id as partCol2")
      .write
      .partitionBy("partCol1", "partCol2")
      .mode("overwrite")
      .parquet(dir.getAbsolutePath)

    spark.sql(s"""
      |create external table $tableName (normalCol long)
      |partitioned by (partCol1 int, partCol2 int)
      |stored as parquet
      |location "${dir.getAbsolutePath}"""".stripMargin)
    spark.sql(s"msck repair table $tableName")
  }

  test("filter by mixed case col") {
    withTable("test") {
      withTempDir { dir =>
        setupPartitionedTable("test", dir)
        val df = spark.sql("select * from test where normalCol = 3")
        assert(df.count() == 1)
      }
    }
  }
{code}
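For context, a minimal Spark-free sketch of the suspected mechanism (names mirror the repro above): the Hive metastore lowercases column names, while the Parquet footer keeps the original case, so a case-sensitive reader never finds the data column, reads it back as null, and the filter drops every row.

```scala
// Illustrative only: the metastore lowercases names, the Parquet footer
// preserves the case written by the datasource writer.
val metastoreSchema   = Seq("normalcol", "partcol1", "partcol2")
val parquetFooterCols = Seq("normalCol", "partCol1", "partCol2")

// Exact matching (case-sensitive lookup) misses the data column entirely...
val exactHit = parquetFooterCols.contains(metastoreSchema.head)

// ...while a case-insensitive lookup would find it.
val ciHit = parquetFooterCols.exists(_.equalsIgnoreCase(metastoreSchema.head))
```

Here `exactHit` is false and `ciHit` is true, which is why the `normalCol = 3` filter above matches nothing.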
cc [~cloud_fan]



> Can't filter over mixed case parquet columns of converted Hive tables
> ---------------------------------------------------------------------
>
>                 Key: SPARK-17983
>                 URL: https://issues.apache.org/jira/browse/SPARK-17983
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Eric Liang
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
