[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

ericl Tue, 29 Nov 2016 10:04:59 -0800

Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/15998
  
    @mallman I'll take a look today
    
    On Tue, Nov 29, 2016, 9:45 AM Michael Allman <notificati...@github.com>
    wrote:
    
    > Hi Guys,
    >
    > Repeating my comment/query for @ericl <https://github.com/ericl>. I'm
    > hoping someone can provide affirmation/refutation to my question before I
    > proceed with new unit tests.
    >
    > I've run some tests to compare behavior between Hive and Spark in handling
    > gnarly partition column names, and I found some disparities. We've spent a
    > considerable amount of time wrangling with partition column name handling
    > recently, and I'm not sure what semantics we've decided on. To ensure the
    > behavior I'm seeing is what we're expecting, I want to describe a scenario
    > I ran.
    >
    > In my test scenario, I created a table named test with the stock Hive
    > 2.1.0 distribution. (I simply downloaded it from its download page and
    > initialized an empty Derby schema store.) The exact DDL I used to create
    > this table is as follows:
    >
    > create table test(a string) partitioned by (`P``Дr t` int);
    >
    > When I do a describe test with Hive, it shows the column name as p`дr t.
    > It appears to lowercase the P and the Cyrillic Д before storing the table
    > schema in the metastore. I then run
    >
    > alter table test add partition(`P``Дr t`=0);
    >
    > When I run show partitions test in Hive, it gives me p`дr t=0.
    > Additionally, when I list the contents of the test table's base directory
    > in HDFS, the partition directory entry is
    >
    > /user/hive/warehouse/test/p`дr t=0
    >
    > If I drop the table, create it with spark-sql using the same DDL as
    > before and do a describe test, the partition column is given as P`Дr t.
    > Spark has preserved the case of the partition column name. If I then do
    >
    > alter table test add partition(`P``Дr t`=0);
    >
    > in spark-sql and show partitions test, I get P`Дr t=0. When I list the
    > directory contents in HDFS, I get
    >
    > /user/hive/warehouse/test/P`Дr t=0
    >
    > The upshot is Hive is lowercasing the partition column name and Spark is
    > leaving it unaltered. Is this correct?

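The disparity described in the quoted comment can be sketched in a few lines. This is only an illustration of the observed behavior, not code from Hive or Spark: `partition_dir` is a hypothetical helper, and the sketch assumes the only difference is whether the partition column name is lowercased before the `name=value` directory entry is built (it does not model any path escaping either system may apply).

```python
def partition_dir(column: str, value: object, lowercase_name: bool) -> str:
    """Build a `name=value` partition directory entry.

    Hypothetical helper for illustration only. lowercase_name=True models
    the Hive behavior reported in the quoted comment; lowercase_name=False
    models the spark-sql behavior reported there.
    """
    name = column.lower() if lowercase_name else column
    return f"{name}={value}"


# Column name as it appears after the doubled backtick is unescaped.
column = "P`Дr t"

hive_style = partition_dir(column, 0, lowercase_name=True)
spark_style = partition_dir(column, 0, lowercase_name=False)

print(hive_style)
print(spark_style)
print(hive_style == spark_style)
```

Under these assumptions the two entries differ only in the case of `P` and `Д`, which matches the two HDFS directory names reported in the quoted comment.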


