Github user budde commented on the issue:

    https://github.com/apache/spark/pull/16797
  
    > is it completely a compatibility issue? Seems like the only problem is,
when we write out mixed-case-schema parquet files directly, and create an 
external table pointing to these files with Spark prior to 2.1, then read this 
table with Spark 2.1+.
    
    Fundamentally, I wouldn't assume that Spark is being used to create and 
maintain the tables in the Hive Metastore that Spark is querying against. We're 
currently using Spark to add and update metastore tables in our use case, but I 
don't think Spark should make any assumptions about how a table was created or 
what properties may be set on it.
    
    In regard to the underlying issue, we've been using Spark in production for 
over two years and have several petabytes of case-sensitive Parquet data that 
we've both written and queried using Spark. As of Spark 2.1, we are no longer 
able to use Spark to query any of this data, as any query containing a 
case-sensitive field name returns 0 results. I would argue this is a 
compatibility regression.
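    
    For concreteness, here's a rough sketch of the failure mode as we hit it 
(the path, table, and column names below are made up, and it assumes a 
Hive-enabled SparkSession; in our case the external table was created with a 
pre-2.1 Spark and is now read with 2.1+):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    object MixedCaseParquetRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("mixed-case-parquet-repro")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._
    
        // Write Parquet files directly with a mixed-case column name.
        Seq((1L, "a"), (2L, "b"))
          .toDF("eventId", "payload")
          .write
          .mode("overwrite")
          .parquet("/tmp/mixed_case_data")
    
        // Point an external metastore table at those files; the metastore keeps
        // the column names lower-cased (eventid, payload).
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS mixed_case_tbl (eventid BIGINT, payload STRING)
            |STORED AS PARQUET
            |LOCATION '/tmp/mixed_case_data'""".stripMargin)
    
        // Prior to 2.1 this returned the matching row; on 2.1 the lower-cased
        // metastore schema no longer resolves the mixed-case Parquet field and
        // the query comes back empty.
        spark.sql("SELECT * FROM mixed_case_tbl WHERE eventid = 1").show()
    
        spark.stop()
      }
    }
    ```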
    
    > For tables in hive, as long as hive can read it, Spark should be
able to read it too.
    
    In our case, other Hive-compatible query engines like Presto don't have a 
problem with case-sensitive Parquet files. I haven't tried Hive itself in a long 
time, but as far as I remember we didn't have a problem there either.

