GitHub user CrazyJacky opened a pull request:

    https://github.com/apache/spark/pull/19434

    [SPARK-21785][SQL] Support CREATE TABLE from a Parquet file schema

    ## Support CREATE TABLE from a Parquet file schema
    As described in the JIRA ticket:
    ```sql
    CREATE EXTERNAL TABLE IF NOT EXISTS test LIKE 'PARQUET' '/user/test/abc/a.snappy.parquet'
    STORED AS PARQUET
    LOCATION '/user/test/def/';
    ```
    This is a rough first pass, and I would like someone to help review and refine it.
    It currently only supports creating Hive tables.
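
    For context (not part of this patch), roughly the same effect can be achieved
    today through the public Catalog API. A minimal sketch, assuming a Hive-enabled
    `SparkSession` and reusing the illustrative paths from the example above:

    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    // Infer the schema from an existing Parquet file...
    val schema = spark.read.parquet("/user/test/abc/a.snappy.parquet").schema

    // ...then create an external table at a different location with that schema.
    spark.catalog.createTable("test", "parquet", schema, Map("path" -> "/user/test/def/"))
    ```
    The proposed syntax expresses the same intent directly in DDL, without going
    through the DataFrame API.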
    
    ## Tested with the test case below and by building a runnable distribution
    
    ```scala
    test("create table like parquet") {
    
        val f = getClass.getClassLoader.
          getResource("test-data/dec-in-fixed-len.parquet").getPath
        val v1 =
          """
            |create table if not exists db1.table1 like 'parquet'
          """.stripMargin.concat("'" + f + "'").concat(
          """
            |stored as sequencefile
            |location '/tmp/table1'
          """.stripMargin
          )
    
        val (desc, allowExisting) = extractTableDesc(v1)
    
        assert(allowExisting)
        assert(desc.identifier.database == Some("db1"))
        assert(desc.identifier.table == "table1")
        assert(desc.tableType == CatalogTableType.EXTERNAL)
        assert(desc.schema == new StructType()
          .add("fixed_len_dec", "decimal(10,2)"))
        assert(desc.bucketSpec.isEmpty)
        assert(desc.viewText.isEmpty)
        assert(desc.viewDefaultDatabase.isEmpty)
        assert(desc.viewQueryColumnNames.isEmpty)
        assert(desc.storage.locationUri == Some(new URI("/tmp/table1")))
        assert(desc.storage.inputFormat == 
Some("org.apache.hadoop.mapred.SequenceFileInputFormat"))
        assert(desc.storage.outputFormat == 
Some("org.apache.hadoop.mapred.SequenceFileOutputFormat"))
        assert(desc.storage.serde == 
Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
      }
    ```
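
    The expected column in the schema assertion comes from the bundled
    `test-data/dec-in-fixed-len.parquet` resource. As a quick independent sanity
    check (a sketch, not part of the patch, assuming a local `SparkSession` named
    `spark`), the inferred schema can be printed directly:

    ```scala
    // Should report something like: struct<fixed_len_dec:decimal(10,2)>
    val resource = getClass.getClassLoader
      .getResource("test-data/dec-in-fixed-len.parquet").getPath
    println(spark.read.parquet(resource).schema.simpleString)
    ```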


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jacshen/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19434
    
----
commit 6b23cb8ff5a778f4f1b4ca4f218cbe8c4e422101
Author: Shen <jacs...@lm-sea-11008031.corp.ebay.com>
Date:   2017-10-04T20:35:03Z

    Add support to create table which schema is reading from a given parquet file

commit 877a57ec439db4e688c71568ddd312bdc2a50cec
Author: jacshen <jacs...@ebay.com>
Date:   2017-10-04T20:37:08Z

    Merge branch 'master' of https://github.com/apache/spark

commit a22c39e795ab4a730d0277c4162cdfadd37dbf22
Author: jacshen <jacs...@ebay.com>
Date:   2017-10-04T21:21:02Z

    Add support to create table which schema is reading from a given parquet file

----

