GitHub user CrazyJacky opened a pull request:
https://github.com/apache/spark/pull/19434
[SPARK-21785][SQL]Support create table from a parquet file schema
## Support create table from a parquet file schema
As described in the JIRA issue, the goal is to support this syntax:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS test LIKE 'PARQUET'
'/user/test/abc/a.snappy.parquet' STORED AS PARQUET LOCATION
'/user/test/def/';
```
This is a rough fix, and I would like someone to help review and refine it.
It currently supports only creating Hive tables.
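To make the proposed grammar concrete, here is a minimal, hypothetical Scala sketch that recognizes only the new `LIKE '<format>' '<path>'` clause and extracts the file format and the sample-file path whose schema would become the table's columns. It is not Spark's actual parser (Spark SQL's grammar is ANTLR-based). The names `LikeFileClause` and `parse` are made up for illustration, and a naive regex like this ignores escaped quotes and SQL comments.

```scala
// Hypothetical model of the proposed clause: LIKE '<format>' '<path>'
// (illustration only; not Spark's parser or any Spark API)
final case class LikeFileClause(format: String, path: String)

object LikeFileClause {
  // (?is): case-insensitive, and '.' also matches newlines, so the clause
  // may span several lines. Both the format and the path must be quoted,
  // which distinguishes this from an ordinary "LIKE other_table".
  private val Pattern = """(?is).*\bLIKE\s+'([^']+)'\s+'([^']+)'.*""".r

  // Returns the format (lower-cased) and the schema-file path, if present
  def parse(sql: String): Option[LikeFileClause] = sql match {
    case Pattern(fmt, path) => Some(LikeFileClause(fmt.toLowerCase, path))
    case _                  => None
  }
}
```

For the statement above, `LikeFileClause.parse` yields `Some(LikeFileClause("parquet", "/user/test/abc/a.snappy.parquet"))`, while `create table t like other_table` yields `None` because the source table name is not quoted.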
## Tested with a unit test and by building a runnable distribution
```scala
test("create table like parquet") {
  // The table schema is read from this Parquet file in the test resources
  val f = getClass.getClassLoader
    .getResource("test-data/dec-in-fixed-len.parquet").getPath
  val v1 =
    s"""
       |create table if not exists db1.table1 like 'parquet' '$f'
       |stored as sequencefile
       |location '/tmp/table1'
     """.stripMargin
  val (desc, allowExisting) = extractTableDesc(v1)
  assert(allowExisting)
  assert(desc.identifier.database == Some("db1"))
  assert(desc.identifier.table == "table1")
  // EXTERNAL is expected even though the statement only gives a LOCATION
  assert(desc.tableType == CatalogTableType.EXTERNAL)
  // Columns come from the Parquet file, not from a column list in the DDL
  assert(desc.schema == new StructType()
    .add("fixed_len_dec", "decimal(10,2)"))
  assert(desc.bucketSpec.isEmpty)
  assert(desc.viewText.isEmpty)
  assert(desc.viewDefaultDatabase.isEmpty)
  assert(desc.viewQueryColumnNames.isEmpty)
  assert(desc.storage.locationUri == Some(new URI("/tmp/table1")))
  // Storage format comes from STORED AS, not from the schema file
  assert(desc.storage.inputFormat ==
    Some("org.apache.hadoop.mapred.SequenceFileInputFormat"))
  assert(desc.storage.outputFormat ==
    Some("org.apache.hadoop.mapred.SequenceFileOutputFormat"))
  assert(desc.storage.serde ==
    Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
}
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jacshen/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19434.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19434
commit 6b23cb8ff5a778f4f1b4ca4f218cbe8c4e422101
Author: Shen
Date: 2017-10-04T20:35:03Z
Add support to create table which schema is reading from a given parquet
file
commit 877a57ec439db4e688c71568ddd312bdc2a50cec
Author: jacshen
Date: 2017-10-04T20:37:08Z
Merge branch 'master' of https://github.com/apache/spark
commit a22c39e795ab4a730d0277c4162cdfadd37dbf22
Author: jacshen
Date: 2017-10-04T21:21:02Z
Add support to create table which schema is reading from a given parquet
file