[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-07-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r69385174
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -45,11 +45,11 @@ statement
 | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
 | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
 | createTableHeader ('(' colTypeList ')')? tableProvider
-(OPTIONS tablePropertyList)?
+(OPTIONS optionParameterList)?
 (PARTITIONED BY partitionColumnNames=identifierList)?
 bucketSpec?
#createTableUsing
 | createTableHeader tableProvider
-(OPTIONS tablePropertyList)?
--- End diff --

@hvanhovell I see. Thanks; I didn't realize you were on holiday.
I will push some commits and wait. Please feel free to review when you have 
some time!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-07-02 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r69380919
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -45,11 +45,11 @@ statement
 | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
 | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
 | createTableHeader ('(' colTypeList ')')? tableProvider
-(OPTIONS tablePropertyList)?
+(OPTIONS optionParameterList)?
 (PARTITIONED BY partitionColumnNames=identifierList)?
 bucketSpec?
#createTableUsing
 | createTableHeader tableProvider
-(OPTIONS tablePropertyList)?
--- End diff --

@HyukjinKwon I am on holiday, so I am a bit slow with my responses.

You have understood me correctly. What I am suggesting will affect DBPROPERTIES and TBLPROPERTIES; it will also allow boolean and numeric options. I don't think this is a bad thing: it is better to have a lenient parser and to constrain behavior in the `AstBuilder` (this allows us to throw much better error messages).
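The "lenient parser, strict builder" idea above can be sketched in a few lines. This is hypothetical Python, not Spark code: the parse stage accepts any literal as a property value, and a builder stage rejects what a particular clause does not allow, with a targeted error message.

```python
# Hypothetical sketch of "lenient parse, strict build" -- not Spark code.
def parse_property(key, value):
    # Lenient stage: accept string, boolean, and numeric literals alike.
    return (key, value)

def build_tbl_properties(pairs):
    # Strict stage: suppose TBLPROPERTIES only allows string values.
    for key, value in pairs:
        if not isinstance(value, str):
            raise ValueError(
                f"TBLPROPERTIES value for key '{key}' must be a string, "
                f"got {type(value).__name__}")
    return dict(pairs)

pairs = [parse_property("comment", "demo"), parse_property("retention", 30)]
try:
    build_tbl_properties(pairs)
except ValueError as err:
    print(err)
```

Rejecting the value in the builder lets the error name the offending key and type, which a grammar-level rejection cannot do.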





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-07-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r69377675
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -45,11 +45,11 @@ statement
 | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
 | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
 | createTableHeader ('(' colTypeList ')')? tableProvider
-(OPTIONS tablePropertyList)?
+(OPTIONS optionParameterList)?
 (PARTITIONED BY partitionColumnNames=identifierList)?
 bucketSpec?
#createTableUsing
 | createTableHeader tableProvider
-(OPTIONS tablePropertyList)?
--- End diff --

(Sorry for pinging @hvanhovell)





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r68871740
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -45,11 +45,11 @@ statement
 | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
 | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
 | createTableHeader ('(' colTypeList ')')? tableProvider
-(OPTIONS tablePropertyList)?
+(OPTIONS optionParameterList)?
 (PARTITIONED BY partitionColumnNames=identifierList)?
 bucketSpec?
#createTableUsing
 | createTableHeader tableProvider
-(OPTIONS tablePropertyList)?
--- End diff --

Ah, do you mean it is okay to support other types for other related rules?





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r68869270
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -45,11 +45,11 @@ statement
 | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
 | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
 | createTableHeader ('(' colTypeList ')')? tableProvider
-(OPTIONS tablePropertyList)?
+(OPTIONS optionParameterList)?
 (PARTITIONED BY partitionColumnNames=identifierList)?
 bucketSpec?
#createTableUsing
 | createTableHeader tableProvider
-(OPTIONS tablePropertyList)?
--- End diff --

I hope I understood this correctly. I guess you meant the `tableProperty` rule, for example, as below:

```antlr
tableProperty
    : key=tablePropertyKey (EQ? value=optionValue)?
    ;
```

If so, I am worried that this would affect other rules such as `DBPROPERTIES` and `TBLPROPERTIES` (allowing other types as values). I made this a separate rule because allowing other types in the `OPTIONS` clause seems to comply with standard SQL.

If not, could you give me some advice in a bit more detail, please?





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r68699390
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -435,6 +434,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }

   /**
+   * Parse a key-value map from a [[OptionParameterListContext]], assuming all values are
+   * specified. This allows string, boolean, decimal and integer literals which are converted
+   * to strings.
+   */
+  override def visitOptionParameterList(ctx: OptionParameterListContext): Map[String, String] = {
+    // TODO: Currently it does not treat null. Hive does not allow null for metadata and
+    // throws an exception.
+    val properties = ctx.optionParameter.asScala.map { property =>
+      val key = visitTablePropertyKey(property.key)
+      val value = if (property.value.STRING != null) {
+        string(property.value.STRING)
+      } else if (property.value.booleanValue != null) {
+        property.value.getText.toLowerCase
+      } else {
+        property.value.getText
+      }
+      key -> value
+    }
+
+    // Check for duplicate property names.
+    checkDuplicateKeys(properties, ctx)
+    val props = properties.toMap
+    val badKeys = props.filter { case (_, v) => v == null }.keys
--- End diff --

NIT (not your code): `val badKeys = props.collect { case (key, null) => key }`





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r68699131
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -45,11 +45,11 @@ statement
 | ALTER DATABASE identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties
 | DROP DATABASE (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase
 | createTableHeader ('(' colTypeList ')')? tableProvider
-(OPTIONS tablePropertyList)?
+(OPTIONS optionParameterList)?
 (PARTITIONED BY partitionColumnNames=identifierList)?
 bucketSpec?
#createTableUsing
 | createTableHeader tableProvider
-(OPTIONS tablePropertyList)?
--- End diff --

Why not generalize the `tableProperty` rule and use `optionValue` (rename it to something more consistent) as its value rule? Seems easier.





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r68698738
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -252,6 +252,21 @@ tablePropertyKey
 | STRING
 ;
 
+optionParameterList
+: '(' optionParameter (',' optionParameter)* ')'
+;
+
+optionParameter
+: key=tablePropertyKey (EQ? value=optionValue)?
--- End diff --

We could remove `EQ?` here. This is actually not supported by data source tables.





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r65991481
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -435,6 +434,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }

   /**
+   * Parse a key-value map from a [[OptionParameterListContext]], assuming all values are
+   * specified. This allows string, boolean, decimal and integer literals which are converted
+   * to strings.
+   */
+  override def visitOptionParameterList(ctx: OptionParameterListContext): Map[String, String] = {
+    // TODO: Currently it does not treat null. Hive does not allow null for metadata and
+    // throws an exception.
+    val properties = ctx.optionParameter.asScala.map { property =>
+      val key = visitTablePropertyKey(property.key)
+      val value = if (property.value.STRING != null) {
+        string(property.value.STRING)
+      } else if (property.value.booleanValue != null) {
+        property.value.getText.toLowerCase
+      } else {
+        property.value.getText
+      }
+      key -> value
+    }
+
+    // Check for duplicate property names.
+    checkDuplicateKeys(properties, ctx)
+    val props = properties.toMap
+    val badKeys = props.filter { case (_, v) => v == null }.keys
+    if (badKeys.nonEmpty) {
+      throw operationNotAllowed(
+        s"Values should not be specified for key(s): ${badKeys.mkString("[", ",", "]")}", ctx)
--- End diff --

Thanks! I just fixed.
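For readers following along, here is a rough Python analogue (hypothetical helper, not Spark's code) of what `visitOptionParameterList` does in the diff above: every literal is converted to a string, boolean literals are lowercased, and duplicate keys and missing values are rejected.

```python
from collections import Counter

# Hypothetical Python analogue of visitOptionParameterList -- not Spark code.
def parse_options(pairs):
    counts = Counter(key for key, _ in pairs)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Found duplicate keys: {duplicates}")
    missing = sorted(key for key, value in pairs if value is None)
    if missing:
        raise ValueError(f"A value must be specified for key(s): {missing}")
    # Every literal becomes a string; boolean literals are lowercased,
    # mirroring the `.getText.toLowerCase` branch in the Scala diff.
    return {key: (str(value).lower() if isinstance(value, bool) else str(value))
            for key, value in pairs}

print(parse_options([("path", "/tmp/t"), ("header", True), ("samplingRatio", 0.5)]))
# {'path': '/tmp/t', 'header': 'true', 'samplingRatio': '0.5'}
```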





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r65975446
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -435,6 +434,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }

   /**
+   * Parse a key-value map from a [[OptionParameterListContext]], assuming all values are
+   * specified. This allows string, boolean, decimal and integer literals which are converted
+   * to strings.
+   */
+  override def visitOptionParameterList(ctx: OptionParameterListContext): Map[String, String] = {
+    // TODO: Currently it does not treat null. Hive does not allow null for metadata and
+    // throws an exception.
+    val properties = ctx.optionParameter.asScala.map { property =>
+      val key = visitTablePropertyKey(property.key)
+      val value = if (property.value.STRING != null) {
+        string(property.value.STRING)
+      } else if (property.value.booleanValue != null) {
+        property.value.getText.toLowerCase
+      } else {
+        property.value.getText
+      }
+      key -> value
+    }
+
+    // Check for duplicate property names.
+    checkDuplicateKeys(properties, ctx)
+    val props = properties.toMap
+    val badKeys = props.filter { case (_, v) => v == null }.keys
+    if (badKeys.nonEmpty) {
+      throw operationNotAllowed(
+        s"Values should not be specified for key(s): ${badKeys.mkString("[", ",", "]")}", ctx)
--- End diff --

This error message does not match the behavior.





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r65835938
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -180,6 +180,9 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         :param path: string represents path to the JSON dataset,
                      or RDD of Strings storing JSON objects.
         :param schema: an optional :class:`StructType` for the input schema.
+        :param samplingRatio: sets the ratio for sampling and reading the input data to infer
--- End diff --

Ah, I see. It does not affect the actual I/O; it just drops some records and then tries to infer the schema from the rest. I will remove the change.

BTW, I have found another option, [`mergeSchema`](https://github.com/apache/spark/blob/431542765785304edb76a19885fbc5f9b8ae7d64/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L148-L152), in the Parquet data source, which I guess should be located in `ParquetOptions`. Could that maybe be done here together?





[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13517#discussion_r65834648
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -180,6 +180,9 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         :param path: string represents path to the JSON dataset,
                      or RDD of Strings storing JSON objects.
         :param schema: an optional :class:`StructType` for the input schema.
+        :param samplingRatio: sets the ratio for sampling and reading the input data to infer
--- End diff --

It was actually intentional that `samplingRatio` was undocumented: regardless of the value, Spark still needs to read all the data, so it might as well be 1 all the time.
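As a toy illustration of that point (hypothetical Python, not Spark's implementation): sampling-based schema inference still iterates every record; the ratio only controls how many records are inspected for types, not how much data is read.

```python
import random

def infer_type(value):
    # Check bool before int: in Python, bool is a subclass of int.
    for t in (bool, int, float):
        if isinstance(value, t):
            return t.__name__
    return "string"

def infer_schema(records, sampling_ratio=1.0, seed=0):
    rng = random.Random(seed)
    schema = {}
    for record in records:                  # full pass over the data...
        if rng.random() > sampling_ratio:   # ...but only a sample is inspected
            continue
        for key, value in record.items():
            schema.setdefault(key, infer_type(value))
    return schema

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(infer_schema(rows, sampling_ratio=1.0))
# {'id': 'int', 'name': 'string'}
```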






[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types as opt...

2016-06-05 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/13517

[SPARK-14839][SQL] Support for other types as option in OPTIONS clause

## What changes were proposed in this pull request?

Currently, the Scala API supports options of the types `String`, `Long`, `Double`, and `Boolean`, and the Python API also supports other types.

This PR adds support for other types (string, boolean, double, and integer) as options in the SQL `OPTIONS` clause so that data source options are supported in a consistent way.
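For illustration, this is the kind of statement the change enables (table name and option values here are hypothetical; previously every option value had to be a quoted string):

```sql
CREATE TABLE people
USING org.apache.spark.sql.json
OPTIONS (path '/tmp/people.json', samplingRatio 0.5, primitivesAsString true)
```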

While looking through the other options, I realized that the `samplingRatio` documentation was missing, so I added it.

Also, `TODO add bucketing and partitioning.` was removed because it was 
resolved in 
https://github.com/apache/spark/commit/24bea000476cdd0b43be5160a76bc5b170ef0b42

## How was this patch tested?

Unit test in `MetastoreDataSourcesSuite.scala`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-14839

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13517


commit ba014eee8f910262b6db2d9ec4fe84478890a03c
Author: hyukjinkwon 
Date:   2016-06-05T06:26:49Z

Support for other types in option clause in SQL Syntax

commit 1f94ca4a47ae3ae99d41551466f42d4ce96c3f07
Author: hyukjinkwon 
Date:   2016-06-05T06:38:44Z

Update commnets for a new function

commit 1833d4f66a8c80314f3075d0088cc1903f6ed16f
Author: hyukjinkwon 
Date:   2016-06-05T07:07:55Z

Add a todo

commit 1e6bdd3a3b6e303006004bd893b6bbe20561016f
Author: hyukjinkwon 
Date:   2016-06-05T07:22:16Z

Add override

commit 905bdc594ebbb658d9392585af3a0ab7206841e1
Author: hyukjinkwon 
Date:   2016-06-05T07:27:12Z

Change documentation for the correct explanation



