[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

vanzin Fri, 18 Aug 2017 12:54:47 -0700

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18849
  
    I think there is still a lot of confusion around here about what this is 
fixing. I see a bunch of comments related to testing the schema for 
compatibility.
    
    That does not work. Schema compatibility is not the issue here; the issue 
is whether the table was *initially* created as Hive-compatible or not. This is 
the Hive metastore, not Spark, complaining, so the Spark-side schema for 
non-compatible tables is pretty irrelevant.
    
    The schema by itself does not provide enough information to detect whether 
a table is compatible or not. Even if the schema is Hive compatible, the data 
source may not have a Hive counterpart, or the table might have been initially 
created in a case sensitive session and have conflicting column names when case 
is ignored, or a few other things, all of which are checked at table creation 
time.
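
    As a rough illustration of one such creation-time check, here is a minimal 
sketch of detecting column names that collide when case is ignored. This is not 
Spark's code; the function name `case_insensitive_conflicts` is hypothetical and 
only shows why a schema that looks fine in a case-sensitive session can still 
be rejected.

```python
from collections import defaultdict


def case_insensitive_conflicts(column_names):
    """Return groups of column names that collide when case is ignored.

    Hypothetical helper for illustration only; not part of Spark.
    """
    groups = defaultdict(list)
    for name in column_names:
        # Names differing only by case end up under the same key.
        groups[name.lower()].append(name)
    # Keep only the groups that contain more than one distinct spelling.
    return [names for names in groups.values() if len(names) > 1]
```

    For example, `case_insensitive_conflicts(["id", "ID", "name"])` reports 
`["id", "ID"]` as a conflicting group, while a list with no such collisions 
yields an empty result.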
    
    The same checks *cannot* be done later, and should not be done. If the 
table was non-compatible it should remain non-compatible, and vice-versa. The 
only thing that is needed is a way to detect that single property of the table. 
You cannot do that just from the schema as has been proposed a few times here.
    
    There are two options:
    - use an explicit option, which is the approach I took
    - use some combination of metadata written by old Spark versions that tells 
you whether the table is compatible or not.
    
    The only thing that exists for the second one is the serde field in the 
storage descriptor. Spark sets it to either `None` or some placeholder that 
does not match the datasource serde. I use that fact as a fallback for when the 
property does not exist, but I think it's safer to have an explicit property 
for that instead of relying on these artifacts.
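
    The lookup order described above could be sketched roughly as follows. 
This is only an illustration under stated assumptions, not the code in the PR: 
the property key `example.hive.compatible`, the placeholder serde name, and the 
function `was_created_hive_compatible` are all made up for the example, and the 
fallback assumes that a missing or placeholder serde marks a table that was 
created as non-compatible.

```python
from typing import Mapping, Optional

# Hypothetical identifiers for illustration; not Spark's actual names.
HIVE_COMPAT_PROP = "example.hive.compatible"
PLACEHOLDER_SERDES = frozenset({"example.placeholder.SerDe"})


def was_created_hive_compatible(
    properties: Mapping[str, str], serde: Optional[str]
) -> bool:
    """Decide whether a table was initially created as Hive-compatible."""
    explicit = properties.get(HIVE_COMPAT_PROP)
    if explicit is not None:
        # Preferred path: the explicit property recorded for the table.
        return explicit.strip().lower() == "true"
    # Fallback for tables written before the property existed (assumption):
    # a missing or placeholder serde is treated as "not Hive-compatible".
    return serde is not None and serde not in PLACEHOLDER_SERDES
```

    Note that the schema is never consulted: the explicit property wins when 
present, and the serde is only examined when the property is absent.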
    
    Hope that clarifies things.



