[GitHub] spark pull request #14946: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SPARK-...

gatorsmile Fri, 02 Sep 2016 23:41:13 -0700

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/14946


    [SPARK-17353] [SPARK-16943] [SPARK-16942] [SPARK-16959] [BACKPORT-2.0] 
[SQL] Fix multiple bugs in CREATE TABLE LIKE command

    ### What changes were proposed in this pull request?
    This PR backports https://github.com/apache/spark/pull/14531 and
https://github.com/apache/spark/pull/14550
    
    The existing `CREATE TABLE LIKE` command has multiple issues:
    
    - The generated table is non-empty when the source table is a data source
table. The main reason is that a data source table uses the table property
`path` to store the location of its contents. Currently, we keep this property
unchanged, so the generated table points to the same location as the source.
    
    - The table type of the generated table is `EXTERNAL` when the source table
is an external Hive Serde table. Currently, we explicitly set it to `MANAGED`,
but Hive checks the table property `EXTERNAL` to decide whether the table
is `EXTERNAL` or not. (See
https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1407-L1408)
 Thus, the created table is still `EXTERNAL`.
    
    - When the source table is a `VIEW`, the metadata of the generated table
contains the view text and the view original text. So far, this has not
broken anything, but it could cause problems in Hive. (For example,
https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1405-L1406)
    
    - The table `comment` is handled incorrectly. To follow what Hive does, the
table comment should be cleared, but the column comments should still be kept.
    
    - `INDEX` tables are not supported. Thus, we should throw an exception when
the source table is an `INDEX` table.
    
    - `owner` should not be retained. `toHiveTable` sets it
[here](https://github.com/apache/spark/blob/e679bc3c1cd418ef0025d2ecbc547c9660cac433/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L793)
 no matter which value we set in `CatalogTable`. We set it to an empty string
to avoid confusing output in Explain.
    
    - Add support for temp tables.
    
    - Like Hive, we should not copy the table properties from the source table
to the created table, especially the statistics-related properties, which
could be wrong for the created table.
    
    - `unsupportedFeatures` should not be copied from the source table. The 
created table does not have these unsupported features.
    
    - When the source table is a view, the target table uses the default
format of data source tables, as specified by `spark.sql.sources.default`.
    
    This PR fixes the above issues.
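
    As an illustration only, the rules in the list above can be modeled in plain Python. This is not the Scala code in this PR: `Column`, `TableMeta`, `create_table_like` and `DEFAULT_PROVIDER` are hypothetical names, the `location` field stands in for the `path` property of a data source table, and temp table handling is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-in for the value of spark.sql.sources.default.
DEFAULT_PROVIDER = "parquet"


@dataclass
class Column:
    name: str
    data_type: str
    comment: Optional[str] = None


@dataclass
class TableMeta:
    """Simplified, hypothetical model of table metadata (not Spark's CatalogTable)."""
    name: str
    table_type: str  # "MANAGED", "EXTERNAL", "VIEW" or "INDEX"
    columns: List[Column] = field(default_factory=list)
    provider: Optional[str] = None  # data source format; None for Hive serde tables
    location: Optional[str] = None  # models the `path` of a data source table
    properties: Dict[str, str] = field(default_factory=dict)
    comment: Optional[str] = None
    owner: str = ""
    view_text: Optional[str] = None
    view_original_text: Optional[str] = None
    unsupported_features: List[str] = field(default_factory=list)


def create_table_like(source: TableMeta, target_name: str,
                      default_provider: str = DEFAULT_PROVIDER) -> TableMeta:
    """Derive metadata for the new table, following the rules listed above."""
    if source.table_type == "INDEX":
        raise ValueError("CREATE TABLE LIKE does not support index tables")

    # A view has no storage format of its own, so fall back to the default.
    provider = default_provider if source.table_type == "VIEW" else source.provider

    return TableMeta(
        name=target_name,
        table_type="MANAGED",          # never EXTERNAL, whatever the source is
        columns=list(source.columns),  # schema and column comments are kept
        provider=provider,
        location=None,                 # do not reuse the source location
        properties={},                 # drops EXTERNAL, statistics, and the rest
        comment=None,                  # table comment is cleared
        owner="",                      # owner is not retained
        view_text=None,
        view_original_text=None,
        unsupported_features=[],
    )
```

    In this sketch only the schema (including column comments) and, for non-view sources, the format carry over. Everything else is reset, so a stale `EXTERNAL` property or stale statistics cannot reach the new table.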
    
    ### How was this patch tested?
    Improved the test coverage by adding more test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark createTableLike20

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14946
    
----
commit 535ec1b148935e55aec3171ea803949b991a2895
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-09-03T05:05:45Z

    fix

commit fc419f698f09263f968245979c8ea6176531339f
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-09-03T06:36:35Z

    fix

----


