[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-05-11 Thread xwu0226
Github user xwu0226 closed the pull request at:

https://github.com/apache/spark/pull/12579


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-05-11 Thread xwu0226
Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-218549665
  
@liancheng Thank you for the detailed explanation! Yes, if the goal is to 
make sure Spark SQL can handle the generated DDL, then we need to leave out 
some Hive features for now. I will close this PR. 





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-05-11 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-218379687
  
Hey @xwu0226, sorry that I didn't explain why I opened another PR for the 
same issue; I was in a code rush for 2.0...

So one of the considerations for all the native DDL commands is that we 
don't want these DDL commands to rely on Hive anymore. This is because we'd 
like to remove the Hive dependency from Spark SQL core and gradually make Hive 
a separate data source in the future. This means we shouldn't add new code in 
places like `HiveClientImpl`. These new DDL commands should be implemented on 
top of interfaces like `CatalogTable`.

One apparent problem with this approach is that the current Spark SQL 
interfaces don't capture all the semantics of Hive. For example, some table 
metadata, like the skew spec, is not covered in `CatalogTable` yet. Our general 
strategies are:

1. For easy ones, like "owner" and "compressed" in #12844, we may just add 
them to the interface and leverage them.
2. For features that are not supported in Spark SQL, for example, skew 
spec, we can simply ignore them for now, since Spark can't handle them anyway.

There will be a follow-up to #12781 to add support for Hive tables. After 
an offline discussion with @yhuai, we decided to add a flag in `CatalogTable` 
to indicate whether there is unrecognized metadata that was provided by the 
underlying external catalog but not translated and included in `CatalogTable`. 
In this way, when applying `SHOW CREATE TABLE` to tables containing such 
metadata, this flag can be set to true, and we can simply refuse to output 
anything by checking this flag. This makes sense because even if you add things 
like the skew spec to the result of `SHOW CREATE TABLE`, Spark can't handle the 
generated DDL statement.
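
The flag-based refusal described above can be sketched roughly as follows. 
This is a minimal illustration in Python, not Spark code: `TableMetadata`, 
`has_unrecognized_metadata`, and `show_create_table` are hypothetical names 
standing in for `CatalogTable` and whatever the follow-up PR actually 
introduces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableMetadata:
    """Hypothetical stand-in for `CatalogTable`; not Spark's actual class."""
    name: str
    columns: List[str]  # e.g. ["c1 INT", "c2 STRING"]
    owner: str = ""     # an "easy" field that can simply be added to the model
    # Assumed flag: set by the external catalog when it saw metadata
    # (for example a skew spec) that it could not translate into this model.
    has_unrecognized_metadata: bool = False


def show_create_table(table: TableMetadata) -> str:
    """Return a CREATE TABLE statement, or refuse if metadata was dropped."""
    if table.has_unrecognized_metadata:
        # Any DDL generated here would silently omit the untranslated parts,
        # so refuse instead of emitting a misleading statement.
        raise ValueError(
            f"Cannot generate DDL for {table.name}: the table has metadata "
            "that is not represented in the catalog model"
        )
    cols = ", ".join(table.columns)
    return f"CREATE TABLE {table.name} ({cols})"
```

The design point is that the check happens on the catalog-level model only, 
so the command never needs to reach into Hive-specific classes.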





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-05-10 Thread xwu0226
Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-218238618
  
@srowen Yes, for datasource tables. This PR also includes the work for Hive 
syntax DDL. I see #12781 mentions that there will be a follow-up PR taking 
care of the Hive syntax DDL, so I am wondering whether I should continue with 
this PR. I can close this one if there is no need. Thanks!





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-05-10 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-218192270
  
@xwu0226 I think this is superseded by 
https://github.com/apache/spark/pull/12781 ?





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-216763938
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-27 Thread xwu0226
Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-215303758
  
@yhuai @liancheng, I see PR 
[#12734](https://github.com/apache/spark/pull/12734) takes care of the 
PARTITIONED BY and CLUSTERED BY (with SORTED BY) clauses for CTAS syntax, but 
not for non-CTAS syntax. I now need to change my PR to adapt to this change, 
which means that the generated DDL will be something like `create table t1 (c1 
int, ...) using .. options (..) partitioned by (..) clustered by (...) sorted 
by (...) into ... buckets`. There won't be a "select clause" following it, 
since we do not have the original query, so the generated statement will not 
run because [#12734](https://github.com/apache/spark/pull/12734) does not 
support it. Can we add a fake select clause with a warning message?
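
To make the shape of that generated statement concrete, here is a minimal 
Python sketch that assembles such a DDL string from table metadata. The 
function name `build_datasource_ddl` and all of its parameters are 
hypothetical, the clause order simply follows the example in the comment, and 
nothing here implies that Spark will parse or run the result.

```python
from typing import Dict, List, Optional


def build_datasource_ddl(
    table: str,
    columns: List[str],
    provider: str,
    options: Dict[str, str],
    partition_cols: Optional[List[str]] = None,
    bucket_cols: Optional[List[str]] = None,
    sort_cols: Optional[List[str]] = None,
    num_buckets: Optional[int] = None,
) -> str:
    """Assemble a CREATE TABLE ... USING ... statement (illustrative only)."""
    parts = [f"CREATE TABLE {table} ({', '.join(columns)})", f"USING {provider}"]
    if options:
        # Naive quoting: values containing quotes are not escaped here.
        opts = ", ".join(f"{key} '{value}'" for key, value in options.items())
        parts.append(f"OPTIONS ({opts})")
    if partition_cols:
        parts.append(f"PARTITIONED BY ({', '.join(partition_cols)})")
    if bucket_cols and num_buckets:
        parts.append(f"CLUSTERED BY ({', '.join(bucket_cols)})")
        if sort_cols:
            # SORTED BY is only emitted together with CLUSTERED BY.
            parts.append(f"SORTED BY ({', '.join(sort_cols)})")
        parts.append(f"INTO {num_buckets} BUCKETS")
    return " ".join(parts)
```

Note that no SELECT clause is appended, which is exactly the gap raised in 
the question above.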

Also, the DataFrameWriter.saveAsTable case is like CTAS. Can we then 
generate the DDL as regular CTAS syntax? This will change my current 
implementation in this PR. 
Please advise, thanks a lot!





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214553367
  
retest this please





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread xwu0226
Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214472079
  
@liancheng Thanks for triggering the test! I am looking into the test 
failure. 





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214466178
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214465973
  
**[Test build #56899 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56899/consoleFull)**
 for PR 12579 at commit 
[`13e9775`](https://github.com/apache/spark/commit/13e9775604f3365683bf2b0f3b35b80a30f05dd4).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214466180
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56899/





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214420652
  
**[Test build #56899 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56899/consoleFull)**
 for PR 12579 at commit 
[`13e9775`](https://github.com/apache/spark/commit/13e9775604f3365683bf2b0f3b35b80a30f05dd4).





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-25 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-214419087
  
test this please





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-21 Thread xwu0226
Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-213126393
  
@yhuai @andrewor14 Thanks!





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12579#issuecomment-213032974
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

2016-04-21 Thread xwu0226
GitHub user xwu0226 opened a pull request:

https://github.com/apache/spark/pull/12579

[SPARK-14346][SQL] Show Create Table (Native)

This is a rebased version of 
[#12132](https://github.com/apache/spark/pull/12132) and 
[#12406](https://github.com/apache/spark/pull/12406)

## What changes were proposed in this pull request?
Allow users to issue the "`SHOW CREATE TABLE`" command natively in Spark SQL. 
-- For tables that are created by Hive, this command will display the DDL 
in Hive syntax. If the syntax includes a `CLUSTERED BY`, `SKEWED BY` or 
`STORED BY` clause, there will be a warning message saying that this DDL is 
not yet supported in Spark SQL native DDL. 

-- For tables that are created by datasource DDL, such as "`CREATE TABLE... 
USING ... OPTIONS (...)`", it will show the DDL in this syntax. 

-- For tables that are created by the dataframe API, such as 
"`df.write.partitionBy(...).saveAsTable(...)`", the command currently 
displays DDL with the syntax "`CREATE TABLE.. USING...OPTIONS(...)`". However, 
this syntax loses the partitioning information. It is proposed to display the 
create table statement in the dataframe API format instead.
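
As a rough illustration of the warning check in the first case, the Python 
sketch below scans a generated Hive DDL string for the three clauses named 
above. The helper `unsupported_clauses` is hypothetical and uses plain 
substring matching, so it could also match text inside a string literal or 
comment; the PR itself may detect these clauses differently.

```python
import re
from typing import List

# Clauses named in the description as not yet supported natively.
UNSUPPORTED_CLAUSES = ("CLUSTERED BY", "SKEWED BY", "STORED BY")


def unsupported_clauses(ddl: str) -> List[str]:
    """Return the unsupported clauses that appear in the given DDL text."""
    # Upper-case and collapse whitespace so "clustered   by" still matches.
    normalized = re.sub(r"\s+", " ", ddl.upper())
    return [clause for clause in UNSUPPORTED_CLAUSES if clause in normalized]
```

A caller could print a warning whenever the returned list is non-empty and 
still show the DDL.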

## How was this patch tested?
Unit tests were added. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xwu0226/spark show_create_table_3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12579


commit 0ebb0142e13db3ce8fb474ee5682528b0f87d2d2
Author: xin Wu 
Date:   2016-04-02T01:46:16Z

show create table DDL -- hive metastore table

commit 6d060be797d4127f0b86fa59c1bc848d75215533
Author: xin Wu 
Date:   2016-04-02T06:01:46Z

update upon review

commit 2799672162d715b209cad9a5c103d6f09692d8dc
Author: xin Wu 
Date:   2016-04-02T18:19:26Z

ignoring sqlContext temp table and considering datasource table ddl

commit 98c020aa9a5374861d1470fa0c305148e8314ada
Author: xin Wu 
Date:   2016-04-04T21:54:32Z

fix scala style issue

commit efd889821bf84e328ef6dd8d0b6a645729248251
Author: xin Wu 
Date:   2016-04-04T22:40:26Z

fix scala style issue in testcase

commit b370630f5827071bc5076e9b3fa9c92720b27eb2
Author: xin Wu 
Date:   2016-04-05T01:31:46Z

fix testcase for test failure

commit 8cb7a7299df84f2608b91b092a7df6795b85d41e
Author: xin Wu 
Date:   2016-04-06T18:12:07Z

continue the database ddl generation

commit 8b67d22c5ed8fd6b309df772e4a372e741acf630
Author: xin Wu 
Date:   2016-04-08T20:57:12Z

support datasource ddl

commit 9ab863fb7f8127d1acd083b1ba857f5c1fd2769c
Author: xin Wu 
Date:   2016-04-08T22:04:05Z

scala style fix

commit a40273c7989bebdf62b93ce6e604bb14cacce100
Author: xin Wu 
Date:   2016-04-13T22:54:16Z

merge the code committed by CREATE TABLE native support

commit d214a3b0c54641a6234ba39eef82b2b8ac4c87dd
Author: xin Wu 
Date:   2016-04-14T23:49:03Z

rework show create ddl based on new native supported create table DDL work

commit 1680ea0403f0d29185d9a3f8f81d15599be81aac
Author: xin Wu 
Date:   2016-04-14T23:51:03Z

Merge branch 'show_create_table_1' into show_create_table_2

commit fa8373c3fd2d27cf2b3356ee0214c8e04dfc0f36
Author: xin Wu 
Date:   2016-04-15T02:03:41Z

remove spaces

commit 5095b6c871de55e871c5ea606ade6ab0b2166627
Author: xin Wu 
Date:   2016-04-15T16:24:53Z

update upon review - use visitTableIdentifier

commit 15f226c7d4f195947cbb1acc341eaaae4072d4a6
Author: xin Wu 
Date:   2016-04-20T18:28:29Z

generate dataframe API create table for some datasource tables

commit 601867ae71cc370770deddd56cc8883b04dcf8ee
Author: xin Wu 
Date:   2016-04-20T18:31:27Z

synch up with master branch

commit 687f7aca56cf5c032ceac09c341b2dfd00129b8e
Author: xin Wu 
Date:   2016-04-20T21:54:51Z

update upon review

commit bf3512ba01e773a514350030cfa91087de10fc03
Author: xin Wu 
Date:   2016-04-20T22:07:55Z

synch up with latest change

commit ca44d67584f358bd588743d33de2b7d689df584d
Author: xin Wu 
Date:   2016-04-21T05:35:04Z

synch up again



