[ https://issues.apache.org/jira/browse/SPARK-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595569#comment-15595569 ]
Michael Allman commented on SPARK-17990:
----------------------------------------

The main problem as I see it is one of user experience. If the user puts an upper-case letter in the partition column name, things won't work as expected. If we're not going to support partition column names with upper-case letters, I think we should throw an informative error when the user tries to create one, preferably during planning. What do you think?

> ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-17990
>                 URL: https://issues.apache.org/jira/browse/SPARK-17990
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>         Environment: Linux
>                      Mac OS with a case-sensitive filesystem
>            Reporter: Michael Allman
>
> Writing partition data to an external table's file location and then adding those directories as table partition metadata is a common use case. However, for tables whose partition column names contain upper-case letters, the SQL command {{ALTER TABLE ... ADD PARTITION}} does not work, as illustrated in the following example:
>
> {code}
> scala> sql("create external table mixed_case_partitioning (a bigint) PARTITIONED BY (partCol bigint) STORED AS parquet LOCATION '/tmp/mixed_case_partitioning'")
> res0: org.apache.spark.sql.DataFrame = []
>
> scala> spark.sqlContext.range(10).selectExpr("id as a", "id as partCol").write.partitionBy("partCol").mode("overwrite").parquet("/tmp/mixed_case_partitioning")
> {code}
>
> At this point, running {{hadoop fs -ls /tmp/mixed_case_partitioning}} produces the following:
>
> {code}
> [msa@jupyter ~]$ hadoop fs -ls /tmp/mixed_case_partitioning
> Found 11 items
> -rw-r--r--   3 msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/_SUCCESS
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=0
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=1
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=2
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=3
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=4
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=5
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=6
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=7
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=8
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=9
> {code}
>
> Returning to the Spark shell, we execute the following to add the partition metadata:
>
> {code}
> scala> (0 to 9).foreach { p => sql(s"alter table mixed_case_partitioning add partition(partCol=$p)") }
> {code}
>
> Examining the HDFS file listing again, we see:
>
> {code}
> [msa@jupyter ~]$ hadoop fs -ls /tmp/mixed_case_partitioning
> Found 21 items
> -rw-r--r--   3 msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/_SUCCESS
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=0
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=1
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=2
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=3
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=4
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=5
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=6
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=7
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=8
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=9
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=0
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=1
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=2
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=3
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=4
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=5
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=6
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=7
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=8
> drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=9
> {code}
>
> The {{ALTER TABLE ... ADD PARTITION}} commands created new lower-case {{partcol=N}} directories instead of registering the existing {{partCol=N}} directories. Note that {{msck repair table mixed_case_partitioning}} does not exhibit this behavior; it handles this use case correctly.
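For reference, a minimal workaround sketch based on the note above: since {{msck repair table}} is reported to register the existing {{partCol=N}} directories correctly, it can be run on a freshly created table in place of the {{ALTER TABLE ... ADD PARTITION}} loop. The table and path are the ones from the reproduction; the {{show partitions}} and {{select count(*)}} checks at the end are only suggested ways to verify the result, not part of the original report.

{code}
// Workaround sketch (spark-shell): let partition discovery register the existing
// partCol=N directories instead of the ALTER TABLE ... ADD PARTITION loop.
scala> sql("msck repair table mixed_case_partitioning")

// Suggested verification: list the registered partitions ...
scala> sql("show partitions mixed_case_partitioning").show(false)

// ... and confirm the written rows are visible through the table.
scala> sql("select count(*) from mixed_case_partitioning").show()
{code}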
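To illustrate the informative error suggested in the comment above, here is a minimal, hypothetical sketch of such a check. It is not Spark's actual code path: the helper name and the error message are made up, and a real implementation would run during analysis/planning and presumably raise an {{AnalysisException}} rather than a plain {{IllegalArgumentException}}.

{code}
// Hypothetical illustration only -- not Spark's implementation.
// Fail fast, with an informative message, when a partition column name contains
// upper-case letters, since (per this issue) ALTER TABLE ... ADD PARTITION ends up
// creating lower-cased partcol=N directories that don't match the written data.
def checkPartitionColumnCase(tableName: String, partitionColumnNames: Seq[String]): Unit = {
  val mixedCase = partitionColumnNames.filter(name => name != name.toLowerCase)
  if (mixedCase.nonEmpty) {
    // In Spark's planner this would surface as an AnalysisException instead.
    throw new IllegalArgumentException(
      s"Table $tableName has partition column(s) ${mixedCase.mkString(", ")} containing " +
        "upper-case letters; partition column names are handled case-insensitively, so " +
        "ALTER TABLE ... ADD PARTITION will not match the existing directories.")
  }
}

// Example: would reject the table from the reproduction above.
checkPartitionColumnCase("mixed_case_partitioning", Seq("partCol"))
{code}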