[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

windpiger Thu, 19 Jan 2017 21:44:25 -0800

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16593
  
    Thanks all, let me summarize:
    1. no CTAS
    `
    create table t(a int, b int, c string, d string)
    using $provider
    partitioned by(d, c)
    `
    The schema order of the table in the catalog should be `a, b, d, c`:
    data columns first, then the partition columns in `partitioned by` order.
    a) For data source tables
    This is already ensured by `DataSource.getOrInferFileFormatSchema`:
    
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L182
    
    b) For Hive tables
      As @lins05 commented, we currently do not handle this case. As
      suggested, we should add a new rule for it.
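
    To make the expected ordering concrete, here is a minimal sketch in plain
    Python (not Spark code). The function name `reorder_partition_columns_last`
    is hypothetical, and the sketch assumes exact, case-sensitive column name
    matching. It only illustrates the ordering described above.

```python
def reorder_partition_columns_last(schema, partition_cols):
    """Return the column names with partition columns moved to the end.

    Data columns keep their declared order; partition columns follow in the
    order given in PARTITIONED BY. Hypothetical helper, not a Spark API.
    """
    missing = [c for c in partition_cols if c not in schema]
    if missing:
        raise ValueError(f"partition columns not in schema: {missing}")
    data_cols = [c for c in schema if c not in partition_cols]
    return data_cols + list(partition_cols)


# create table t(a int, b int, c string, d string) ... partitioned by(d, c)
declared = ["a", "b", "c", "d"]
print(reorder_partition_columns_last(declared, ["d", "c"]))
```

    For the table above this gives `a, b, d, c`.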
    
    2. CTAS
    `
    create table t
    using $provider
    partitioned by(d, c)
    select 1 as b, 2 as a, 'x' as c, 'y' as d
    `
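    The schema order of the table in the catalog should be `b, a, d, c`:
    the query's output order for data columns, then the partition columns in
    `partitioned by` order.
    a) For data source tables
    This is already ensured by creating the table with the updated schema: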
    
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L159
    
    b) For Hive tables
      This PR puts the logic in `CreateHiveTableAsSelectCommand`. If we add a
      new rule, we can merge it with the logic for the non-CTAS Hive case.
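
    As a rough illustration of how one shared step could serve both the
    non-CTAS and CTAS paths, here is a plain Python sketch (not Spark code).
    `normalize_schema` and the `(name, type)` tuples are hypothetical, and it
    assumes case-sensitive name matching: use the declared schema when there
    is one, otherwise the query output, then move partition columns to the end.

```python
def normalize_schema(declared, query_output, partition_cols):
    """Pick the source schema, then put partition columns last.

    declared / query_output: lists of (name, type) tuples; either may be
    empty or None. Hypothetical helper, not a Spark API.
    """
    source = declared if declared else query_output
    if not source:
        raise ValueError("no declared schema and no query output")
    by_name = {name: (name, typ) for name, typ in source}
    missing = [p for p in partition_cols if p not in by_name]
    if missing:
        raise ValueError(f"partition columns not in schema: {missing}")
    data_cols = [col for col in source if col[0] not in partition_cols]
    return data_cols + [by_name[p] for p in partition_cols]


# CTAS: select 1 as b, 2 as a, 'x' as c, 'y' as d ... partitioned by(d, c)
ctas_output = [("b", "int"), ("a", "int"), ("c", "string"), ("d", "string")]
print([name for name, _ in normalize_schema([], ctas_output, ["d", "c"])])
```

    For the CTAS statement above this yields the column order `b, a, d, c`.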
    
    To sum up, to ensure the schema order in the catalog is what we expect,
    we need to add a new rule for Hive tables. This test branch implements the
    new rule:
https://github.com/windpiger/spark/commit/acca991d3d92116ce3a88918b3798d14d32849f8#diff-73bd90660f41c12a87ee9fe8d35d856aR463
    
    But before implementing the new rule, we should first merge PR #16642 so
    that we get a `tableDesc` with a non-empty schema, which we can then use
    here:
https://github.com/windpiger/spark/commit/acca991d3d92116ce3a88918b3798d14d32849f8#diff-73bd90660f41c12a87ee9fe8d35d856aR470
 
    
    @cloud-fan @lins05 does this look OK?


