[GitHub] spark pull request #19434: [SPARK-21785][SQL]Support create table from a par...

2018-11-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19434


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19434: [SPARK-21785][SQL]Support create table from a par...

2017-10-04 Thread CrazyJacky

GitHub user CrazyJacky opened a pull request:

https://github.com/apache/spark/pull/19434

[SPARK-21785][SQL]Support create table from a parquet file schema

## Support create table from a parquet file schema
As described in the JIRA issue:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS test
LIKE 'PARQUET' '/user/test/abc/a.snappy.parquet'
STORED AS PARQUET
LOCATION '/user/test/def/';
```
This is a rough fix, and I would appreciate help reviewing and refining it.
It currently supports creating Hive tables only.
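
For comparison, here is a minimal sketch of how a similar result can be approximated with existing public APIs, without the new syntax. It is not the implementation in this patch. It assumes a `SparkSession` with Hive support and reuses the illustrative paths and table name from the SQL above; the object name is hypothetical. Note that `spark.catalog.createTable` creates a data source table rather than the Hive SerDe table this patch targets.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object CreateTableFromParquetSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-table-from-parquet-schema")
      .enableHiveSupport()
      .getOrCreate()

    // Only the schema is used; the DataFrame is lazy, so no rows are collected.
    val schema = spark.read.parquet("/user/test/abc/a.snappy.parquet").schema

    // Emulate IF NOT EXISTS, then create a table over the given location
    // using the schema taken from the Parquet file.
    if (!spark.catalog.tableExists("test")) {
      spark.catalog.createTable(
        "test",
        "parquet",
        schema,
        Map("path" -> "/user/test/def/"))
    }

    spark.stop()
  }
}
```

Because a `path` option is supplied, the table is registered over the existing directory and dropping it leaves the files in place.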

## Tested with a unit test and by building the runnable distribution

```scala
test("create table like parquet") {
  val f = getClass.getClassLoader
    .getResource("test-data/dec-in-fixed-len.parquet").getPath
  val v1 =
    s"""
       |create table if not exists db1.table1 like 'parquet' '$f'
       |stored as sequencefile
       |location '/tmp/table1'
     """.stripMargin

  val (desc, allowExisting) = extractTableDesc(v1)

  assert(allowExisting)
  assert(desc.identifier.database == Some("db1"))
  assert(desc.identifier.table == "table1")
  assert(desc.tableType == CatalogTableType.EXTERNAL)
  assert(desc.schema == new StructType()
    .add("fixed_len_dec", "decimal(10,2)"))
  assert(desc.bucketSpec.isEmpty)
  assert(desc.viewText.isEmpty)
  assert(desc.viewDefaultDatabase.isEmpty)
  assert(desc.viewQueryColumnNames.isEmpty)
  assert(desc.storage.locationUri == Some(new URI("/tmp/table1")))
  assert(desc.storage.inputFormat ==
    Some("org.apache.hadoop.mapred.SequenceFileInputFormat"))
  assert(desc.storage.outputFormat ==
    Some("org.apache.hadoop.mapred.SequenceFileOutputFormat"))
  assert(desc.storage.serde ==
    Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
}
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jacshen/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19434


commit 6b23cb8ff5a778f4f1b4ca4f218cbe8c4e422101
Author: Shen 
Date:   2017-10-04T20:35:03Z

Add support to create table which schema is reading from a given parquet 
file

commit 877a57ec439db4e688c71568ddd312bdc2a50cec
Author: jacshen 
Date:   2017-10-04T20:37:08Z

Merge branch 'master' of https://github.com/apache/spark

commit a22c39e795ab4a730d0277c4162cdfadd37dbf22
Author: jacshen 
Date:   2017-10-04T21:21:02Z

Add support to create table which schema is reading from a given parquet 
file




