[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69475221 Yeah, no problem. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3431
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69473637 Thanks for working on this, guys! Merging to master. @yhuai, can you clarify the difference between `SchemaRelationProvider` and `RelationProvider` in the Scaladoc in your next PR?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69436044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25354/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69436041

[Test build #25354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull) for PR 3431 at commit [`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait SchemaRelationProvider`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433301

[Test build #25354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull) for PR 3431 at commit [`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa).
* This patch merges cleanly.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433261 ok, merged!
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433190 https://github.com/scwf/spark/pull/22
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433129 @scwf I have done it and will open a PR against your branch.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22754925

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -46,6 +46,33 @@ trait RelationProvider {

 /**
  * ::DeveloperApi::
+ * Implemented by objects that produce relations for a specific kind of data source. When
+ * Spark SQL is given a DDL operation with
+ * 1. USING clause: to specify the implemented SchemaRelationProvider
+ * 2. User defined schema: users can define schema optionally when create table
+ *
+ * Users may specify the fully qualified class name of a given data source. When that class is
+ * not found Spark SQL will append the class name `DefaultSource` to the path, allowing for
+ * less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the
+ * data source 'org.apache.spark.sql.json.DefaultSource'
+ *
+ * A new instance of this class with be instantiated each time a DDL call is made.
+ */
+@DeveloperApi
+trait SchemaRelationProvider {
+  /**
+   * Returns a new base relation with the given parameters and user defined schema.
+   * Note: the parameters' keywords are case insensitive and this insensitivity is enforced
+   * by the Map that is passed to the function.
+   */
+  def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation
--- End diff --

My initial idea was to stay compatible with the old trait; since we will have two traits, I will fix this.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22742665

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
+  def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation
--- End diff --

Why is this an `Option`? We have two traits, and `Option` is not very friendly to Java.
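The objection above is that a single `createRelation(..., schema: Option[StructType])` forces Java implementors to deal with `scala.Option`, while splitting the API into two traits removes the need for it. Below is a minimal, self-contained sketch of that two-trait shape; `SQLContext`, `StructType`, `BaseRelation`, and `JsonRelation` are simplified stand-ins here, not the real Spark classes.

```scala
// Stand-in types so the sketch runs without Spark on the classpath.
case class SQLContext(name: String)
case class StructType(fieldNames: Seq[String])
trait BaseRelation

// Chosen when the DDL statement carries no user-defined schema.
trait RelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation
}

// Chosen when the DDL statement carries a user-defined schema. The schema is a
// plain StructType rather than Option[StructType], which keeps the API usable
// from Java: no scala.Option at the call boundary.
trait SchemaRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation
}

// A hypothetical relation produced by the source below.
case class JsonRelation(schema: StructType) extends BaseRelation

// A source can implement both traits and share its logic: the no-schema
// overload infers a schema, then delegates to the schema-taking overload.
class DefaultSource extends RelationProvider with SchemaRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    createRelation(sqlContext, parameters, StructType(Seq("inferred")))

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation =
    JsonRelation(schema)
}
```

The resolver can then pick the trait based on whether the DDL supplied a schema, instead of threading an `Option` through every data source.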
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22739718

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
@@ -18,31 +18,38 @@
 package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
--- End diff --

Default values do not preserve binary compatibility, only source compatibility.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69288714

[Test build #25287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25287/consoleFull) for PR 3431 at commit [`a852b10`](https://github.com/apache/spark/commit/a852b100b5fc6ddd6a19271f01c8df12c00553a6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69288718 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25287/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69285760

[Test build #25283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25283/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69285764 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25283/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69284463

[Test build #25287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25287/consoleFull) for PR 3431 at commit [`a852b10`](https://github.com/apache/spark/commit/a852b100b5fc6ddd6a19271f01c8df12c00553a6).
* This patch merges cleanly.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22697822

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -83,10 +118,73 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")}

   protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) }
+
+  protected lazy val column: Parser[StructField] =
+    ident ~ dataType ^^ { case columnName ~ typ =>
+      StructField(cleanIdentifier(columnName), typ)
+    }
+
+  protected lazy val primitiveType: Parser[DataType] =
+    STRING ^^^ StringType |
+    BINARY ^^^ BinaryType |
+    BOOLEAN ^^^ BooleanType |
+    TINYINT ^^^ ByteType |
+    SMALLINT ^^^ ShortType |
+    INT ^^^ IntegerType |
+    BIGINT ^^^ LongType |
+    FLOAT ^^^ FloatType |
+    DOUBLE ^^^ DoubleType |
+    fixedDecimalType |                  // decimal with precision/scale
+    DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale
+    DATE ^^^ DateType |
+    TIMESTAMP ^^^ TimestampType |
+    VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType
+
+  protected lazy val fixedDecimalType: Parser[DataType] =
+    (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
+      case precision ~ scale => DecimalType(precision.toInt, scale.toInt)
+    }
+
+  protected lazy val arrayType: Parser[DataType] =
+    ARRAY ~> "<" ~> dataType <~ ">" ^^ {
+      case tpe => ArrayType(tpe)
+    }
+
+  protected lazy val mapType: Parser[DataType] =
+    MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ {
+      case t1 ~ _ ~ t2 => MapType(t1, t2)
+    }
+
+  protected lazy val structField: Parser[StructField] =
+    ident ~ ":" ~ dataType ^^ {
+      case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true)
+    }
+
+  protected lazy val structType: Parser[DataType] =
+    (STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ {
+      case fields => new StructType(fields)
+    }) |
+    (STRUCT ~> "<>" ^^ {
+      case fields => new StructType(Nil)
+    })
+
+  private[sql] lazy val dataType: Parser[DataType] =
+    arrayType |
+    mapType |
+    structType |
+    primitiveType
+
+  protected val escapedIdentifier = "`([^`]+)`".r
+
+  /** Strips backticks from ident if present */
+  protected def cleanIdentifier(ident: String): String = ident match {
+    case escapedIdentifier(i) => i
+    case plainIdent => plainIdent
+  }
--- End diff --

Thank you:)
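For readers following the grammar above: the new rules turn type strings from the DDL (e.g. `array<int>`, `map<string, array<int>>`) into a `DataType` tree, recursing through `dataType` for nested element types. The dependency-free toy below illustrates that same recursive shape for a small subset of types; it is a hand-rolled sketch, not Spark's combinator-based parser.

```scala
// A toy model of the DataType tree the DDL parser builds.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Recursive-descent parser for: int | string | array<T> | map<K, V>
def parseType(s: String): DataType = s.trim match {
  case "int"    => IntegerType
  case "string" => StringType
  case a if a.startsWith("array<") && a.endsWith(">") =>
    ArrayType(parseType(a.stripPrefix("array<").stripSuffix(">")))
  case m if m.startsWith("map<") && m.endsWith(">") =>
    val inner = m.stripPrefix("map<").stripSuffix(">")
    // Find the comma separating key and value at nesting depth 0,
    // so commas inside nested array<...>/map<...> are skipped.
    var depth = 0
    val cut = inner.indexWhere {
      case '<' => depth += 1; false
      case '>' => depth -= 1; false
      case ',' => depth == 0
      case _   => false
    }
    MapType(parseType(inner.take(cut)), parseType(inner.drop(cut + 1)))
  case other => sys.error(s"unsupported type: $other")
}
```

The depth counter is the essential trick: without it, `map<string, array<int>>` would be split at the wrong comma.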
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22697765

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
+  protected val escapedIdentifier = "`([^`]+)`".r
+
+  /** Strips backticks from ident if present */
+  protected def cleanIdentifier(ident: String): String = ident match {
+    case escapedIdentifier(i) => i
+    case plainIdent => plainIdent
+  }
--- End diff --

ok
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22697226

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
+  protected val escapedIdentifier = "`([^`]+)`".r
+
+  /** Strips backticks from ident if present */
+  protected def cleanIdentifier(ident: String): String = ident match {
+    case escapedIdentifier(i) => i
+    case plainIdent => plainIdent
+  }
--- End diff --

It seems that when we use `ident`, the parser automatically takes care of backticks, so we can remove this. I am sorry I just noticed it.
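For context on the snippet under discussion: `cleanIdentifier` strips one pair of surrounding backticks, and because Scala's `Regex` pattern matching anchors to the whole string, anything that is not entirely backtick-wrapped passes through unchanged. A standalone copy of the helper (plain Scala, runnable outside Spark):

```scala
// Matches an identifier fully wrapped in backticks, capturing the inner text.
val escapedIdentifier = "`([^`]+)`".r

/** Strips backticks from ident if present; Regex patterns match the full string. */
def cleanIdentifier(ident: String): String = ident match {
  case escapedIdentifier(i) => i          // `my col` -> my col
  case plainIdent           => plainIdent // anything else unchanged
}
```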
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69280879

[Test build #25283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25283/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch merges cleanly.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69280728 retest this please
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69279959

[Test build #25279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25279/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69279963 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25279/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69275198

[Test build #25279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25279/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch merges cleanly.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69252713 @scwf Some updates in https://github.com/scwf/spark/pull/21.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69127505

[Test build #25189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25189/consoleFull) for PR 3431 at commit [`1eeb769`](https://github.com/apache/spark/commit/1eeb769b358ce40a7ef03e86989a35abc2bbe091).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69127509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25189/
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69126742 @yhuai Doing it later is also OK :)
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69125862 @scwf yeah, I agree with you :) Since the data source API is a separate module at the same level as HiveContext etc., it will have its own DDL parser, DDL commands, and other commands.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69125679 @scwf How about we do it later if necessary and keep this tightly coupled stuff together for now? @marmbrus Can you take a look? I think this PR is ready.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69122288 Hi @yhuai, I have merged your PR. The reason I moved ```CreateTableUsing``` to a new commands.scala is that in the future there may be more commands in the data source API, such as commands for sinks, so we can keep them all in a single commands.scala.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-6916 [Test build #25189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25189/consoleFull) for PR 3431 at commit [`1eeb769`](https://github.com/apache/spark/commit/1eeb769b358ce40a7ef03e86989a35abc2bbe091). * This patch merges cleanly.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69115562 @scwf I have created a PR to your branch for the above comments. I also made a few updates in the tests.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22619369 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -67,15 +89,25 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val ddl: Parser[LogicalPlan] = createTable /** - * CREATE TEMPORARY TABLE avroTable + * `CREATE TEMPORARY TABLE avroTable * USING org.apache.spark.sql.avro - * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro") + * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")` + * or + * `CREATE TEMPORARY TABLE avroTable(intField int, stringField string...) + * USING org.apache.spark.sql.avro + * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")` */ protected lazy val createTable: Parser[LogicalPlan] = -CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ { - case tableName ~ provider ~ opts => -CreateTableUsing(tableName, provider, opts) + ( +CREATE ~ TEMPORARY ~ TABLE ~> ident + ~ (tableCols).? ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ { + case tableName ~ columns ~ provider ~ opts => +val tblColumns = if(columns.isEmpty) Seq.empty else columns.get --- End diff -- I feel it is better to use the `Option` instead of converting a `None` to an empty `Seq`.
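For illustration, yhuai's suggestion above (propagating the optional user-specified schema as an `Option` instead of collapsing `None` into an empty `Seq`) can be sketched as follows. This is a minimal, hypothetical sketch with simplified stand-in types, not Spark's actual `StructField` or `CreateTableUsing` classes:

```scala
// Hypothetical simplified types, not Spark's actual classes. Keeping the
// user-specified schema as an Option[Seq[...]] instead of collapsing None
// into an empty Seq lets downstream code distinguish "no schema given,
// ask the data source to infer one" from "schema explicitly given".
case class StructField(name: String, dataType: String)

case class CreateTableUsing(
    tableName: String,
    userSpecifiedSchema: Option[Seq[StructField]],
    provider: String,
    options: Map[String, String])

object SchemaOptionDemo {
  val inferred = CreateTableUsing(
    "avroTable", None, "org.apache.spark.sql.avro",
    Map("path" -> "/tmp/episodes.avro"))

  val explicit = CreateTableUsing(
    "avroTable",
    Some(Seq(StructField("intField", "int"), StructField("stringField", "string"))),
    "org.apache.spark.sql.avro",
    Map("path" -> "/tmp/episodes.avro"))

  // A pattern match keeps the two cases explicit at the use site.
  def describe(c: CreateTableUsing): String = c.userSpecifiedSchema match {
    case None         => "schema will be inferred by the data source"
    case Some(fields) => s"user-specified schema with ${fields.size} column(s)"
  }

  def main(args: Array[String]): Unit = {
    println(describe(inferred))
    println(describe(explicit))
  }
}
```

Keeping the `Option` preserves exactly the distinction the new `CREATE TEMPORARY TABLE name(cols) USING ...` syntax introduces over the schemaless form.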
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69071019 @scwf @OopsOutOfMemory Actually, can we revert the last commit (moving `CreateTableUsing` out of `ddl.scala`)? We'd like to keep it in `ddl.scala` since it is part of the work of supporting foreign DDL commands.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69000429 [Test build #25156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25156/consoleFull) for PR 3431 at commit [`b621c8f`](https://github.com/apache/spark/commit/b621c8f8885dbcc67dc34c53fbef79346900cb9c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DefaultSource extends SchemaRelationProvider ` * `case class ParquetRelation2(` * `sys.error(s"Failed to load class for data source: $provider")` * `trait SchemaRelationProvider `
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69000432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25156/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68993444 [Test build #25156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25156/consoleFull) for PR 3431 at commit [`b621c8f`](https://github.com/apache/spark/commit/b621c8f8885dbcc67dc34c53fbef79346900cb9c). * This patch merges cleanly.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22575577 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,73 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +(STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { +case fields => new StructType(fields) +}) | +(STRUCT ~> "<>" ^^ { + case fields => new 
StructType(Nil) +}) + + private[sql] lazy val dataType: Parser[DataType] = +arrayType | +mapType | +structType | +primitiveType + + protected val escapedIdentifier = "`([^`]+)`".r + /** Strips backticks from ident if present */ + protected def cleanIdentifier(ident: String): String = ident match { +case escapedIdentifier(i) => i +case plainIdent => plainIdent + } } private[sql] case class CreateTableUsing( --- End diff -- OK, I will move it.
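The `dataType` grammar in the diff above can be exercised in isolation. Below is a trimmed-down, self-contained sketch, not the real `DDLParser`: it uses a tiny hypothetical `DataType` ADT and `JavaTokenParsers` with lowercase literal keywords instead of Spark's `StandardTokenParsers`-based lexer, and it assumes the scala-parser-combinators library is on the classpath (it shipped with the standard library in the Scala versions Spark targeted at the time):

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Trimmed-down sketch of the dataType grammar from the patch. The DataType
// ADT here is hypothetical and much smaller than Spark's.
object MiniTypeParser extends JavaTokenParsers {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case class ArrayType(elementType: DataType) extends DataType
  case class MapType(keyType: DataType, valueType: DataType) extends DataType
  case class StructField(name: String, dataType: DataType)
  case class StructType(fields: List[StructField]) extends DataType

  lazy val primitiveType: Parser[DataType] =
    "int" ^^^ IntegerType | "string" ^^^ StringType

  lazy val arrayType: Parser[DataType] =
    "array" ~> "<" ~> dataType <~ ">" ^^ ArrayType

  lazy val mapType: Parser[DataType] =
    "map" ~> "<" ~> (dataType ~ ("," ~> dataType)) <~ ">" ^^ {
      case k ~ v => MapType(k, v)
    }

  lazy val structField: Parser[StructField] =
    ident ~ (":" ~> dataType) ^^ { case name ~ t => StructField(name, t) }

  lazy val structType: Parser[DataType] =
    "struct" ~> "<" ~> repsep(structField, ",") <~ ">" ^^ StructType

  lazy val dataType: Parser[DataType] =
    arrayType | mapType | structType | primitiveType

  def parse(s: String): DataType = parseAll(dataType, s) match {
    case Success(t, _) => t
    case failure       => sys.error(failure.toString)
  }

  def main(args: Array[String]): Unit = {
    println(parse("array<map<string, int>>"))
    println(parse("struct<a: int, b: array<string>>"))
  }
}
```

Because `JavaTokenParsers` matches literal strings directly on the character stream, nested closers such as the `>>` in `array<map<string, int>>` pose no problem here; the `<>` delimiter issue discussed elsewhere in this thread is specific to lexer-based `StandardTokenParsers`-style tokenization.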
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22574984 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,73 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +(STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { +case fields => new StructType(fields) +}) | +(STRUCT ~> "<>" ^^ { + case fields 
=> new StructType(Nil) +}) + + private[sql] lazy val dataType: Parser[DataType] = +arrayType | +mapType | +structType | +primitiveType + + protected val escapedIdentifier = "`([^`]+)`".r + /** Strips backticks from ident if present */ + protected def cleanIdentifier(ident: String): String = ident match { +case escapedIdentifier(i) => i +case plainIdent => plainIdent + } } private[sql] case class CreateTableUsing( --- End diff -- It would be nice to move this `class CreateTableUsing` to `org.apache.spark.sql.sources.commands.scala`.
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68991149 @scwf One thing to note :) The RunnableCommand `CreateTableUsing` does not belong in `ddl.scala`; it should be defined in `org.apache.spark.sql.sources.commands.scala`.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68986040 @yhuai if we want to handle it in ```org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute```, it is hard to reuse the DDL data type parser, so maybe we can/should revert the ```HiveMetastoreTypes``` change? Actually I think whether to allow users to define ```struct<>``` is a small point; how about letting users do this and having ```org.apache.spark.sql.hive.MetastoreRelation``` reuse the DDL data type parser? Both options are OK for me; if you have any other concerns, let me know.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68982855 @scwf Actually, I tried `struct<>` with `runSqlHive` and got an exception. https://issues.apache.org/jira/browse/HIVE-3323 seems to be the original Hive JIRA that introduced `struct<>` (for Thrift support). Maybe we should not allow users to define `struct<>` and instead handle it specially in `org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute`?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68982516 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25137/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68982511 [Test build #25137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25137/consoleFull) for PR 3431 at commit [`d02547f`](https://github.com/apache/spark/commit/d02547f1c2df677fc9b76bd5cf052e8fd700a2b5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DefaultSource extends SchemaRelationProvider ` * `case class ParquetRelation2(` * `trait SchemaRelationProvider `
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68981491 @marmbrus told me that `<>` is a delimiter defined in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SparkSQLParser.scala#L63. So it seems we need to handle `struct<>` specially for now.
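The effect of a two-character delimiter on the lexer can be reproduced in a tiny `StandardTokenParsers` grammar. This is a hypothetical sketch, not Spark's `SqlLexical` or `DDLParser`, and it assumes the scala-parser-combinators library: once "<>" is registered as a delimiter, the input `struct<>` yields a single "<>" token, so a rule written only as `"<" ~> ... <~ ">"` can never match the empty struct, and an explicit `"<>"` alternative is needed:

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Sketch of why `struct<>` needs its own alternative: with "<>" registered
// as a delimiter, the lexer emits one "<>" token for the input "struct<>",
// which a `"<" ~> ... <~ ">"` rule cannot consume.
object EmptyStructDemo extends StandardTokenParsers {
  lexical.delimiters ++= Seq("<", ">", "<>", ",")
  lexical.reserved += "struct"

  case class MiniStruct(fieldNames: List[String])

  lazy val nonEmpty: Parser[MiniStruct] =
    "struct" ~> "<" ~> repsep(ident, ",") <~ ">" ^^ MiniStruct

  // The extra alternative handles the single "<>" token.
  lazy val structType: Parser[MiniStruct] =
    nonEmpty | ("struct" ~> "<>" ^^^ MiniStruct(Nil))

  def parse(rule: Parser[MiniStruct], in: String): Option[MiniStruct] =
    phrase(rule)(new lexical.Scanner(in)) match {
      case Success(s, _) => Some(s)
      case _             => None
    }

  def main(args: Array[String]): Unit = {
    println(parse(nonEmpty, "struct<>"))    // fails: "<>" lexed as one token
    println(parse(structType, "struct<>"))  // succeeds via the "<>" branch
    println(parse(structType, "struct<a, b>"))
  }
}
```

`StdLexical` tries longer delimiters first during tokenization, which is why "<>" wins over "<" followed by ">"; this mirrors the two-branch fix quoted further down in the thread.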
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68978806 [Test build #25137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25137/consoleFull) for PR 3431 at commit [`d02547f`](https://github.com/apache/spark/commit/d02547f1c2df677fc9b76bd5cf052e8fd700a2b5). * This patch merges cleanly.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22570806 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,70 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { --- End diff -- Very weird: the old version also uses ```repsep(structField, ",")``` and has no
```STRUCT ~> "<>"```
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68978023 It seems this change cannot parse ```structField struct<>```.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22570712 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,70 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { --- End diff -- @scwf I am not sure why it failed. 
I tried the following change and the error disappears: ```scala protected lazy val structType: Parser[DataType] = (STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { case fields => new StructType(fields) }) | (STRUCT ~> "<>" ^^ { case fields => new StructType(Nil) }) ```
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22570296

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +118,70 @@ (context: inside `private[sql] class DDLParser extends StandardTokenParsers with PackratParsers …`)

```scala
protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".") }

protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k, v) }

protected lazy val column: Parser[StructField] =
  ident ~ dataType ^^ { case columnName ~ typ =>
    StructField(cleanIdentifier(columnName), typ)
  }

protected lazy val primitiveType: Parser[DataType] =
  STRING ^^^ StringType |
  BINARY ^^^ BinaryType |
  BOOLEAN ^^^ BooleanType |
  TINYINT ^^^ ByteType |
  SMALLINT ^^^ ShortType |
  INT ^^^ IntegerType |
  BIGINT ^^^ LongType |
  FLOAT ^^^ FloatType |
  DOUBLE ^^^ DoubleType |
  fixedDecimalType |                  // decimal with precision/scale
  DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale
  DATE ^^^ DateType |
  TIMESTAMP ^^^ TimestampType |
  VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType

protected lazy val fixedDecimalType: Parser[DataType] =
  (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
    case precision ~ scale => DecimalType(precision.toInt, scale.toInt)
  }

protected lazy val arrayType: Parser[DataType] =
  ARRAY ~> "<" ~> dataType <~ ">" ^^ {
    case tpe => ArrayType(tpe)
  }

protected lazy val mapType: Parser[DataType] =
  MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ {
    case t1 ~ _ ~ t2 => MapType(t1, t2)
  }

protected lazy val structField: Parser[StructField] =
  ident ~ ":" ~ dataType ^^ {
    case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true)
  }

protected lazy val structType: Parser[DataType] =
  STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ {  // <-- review comment below refers to this line
```

Comment: Seems this line caused the failure. But I have not found a fix. Still trying.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68976404 Have not found the root cause; any ideas here, @yhuai?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68975092 The parser change causes ```HiveCompatibilitySuite``` to fail.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68973513

[Test build #25128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25128/consoleFull) for PR 3431 at commit [`ddab984`](https://github.com/apache/spark/commit/ddab9848dee0bbf0cafbdd8339889153a26b38b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider `
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68973517 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25128/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68970836 [Test build #25128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25128/consoleFull) for PR 3431 at commit [`ddab984`](https://github.com/apache/spark/commit/ddab9848dee0bbf0cafbdd8339889153a26b38b3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68970820 And I will add more tests here.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68970720 @yhuai I have merged the PR :)
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68966142 @scwf I made a pull request (https://github.com/scwf/spark/pull/19/) to your branch. This PR moves the data type parser into our `DDLParser`. With it, data type keywords are also parsed in a case-insensitive way. Let me know if it makes sense. Also, we probably want to add more tests for the parser.
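The case-insensitive keyword parsing mentioned above can be illustrated with a small standalone sketch. This is not Spark's actual `DDLParser`/lexer code (which enumerates all case versions of each keyword via `allCaseVersions`); here a `(?i)` regex achieves a similar effect with `RegexParsers`. The `DT`/`IntType`/`StringType`/`ArrayT` names are hypothetical stand-ins for Spark SQL's `DataType` hierarchy, and the `scala-parser-combinators` module is assumed available (it shipped with the standard library in the Scala versions Spark used at the time):

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical stand-ins for Spark SQL's DataType hierarchy.
sealed trait DT
case object IntType extends DT
case object StringType extends DT
case class ArrayT(elem: DT) extends DT

object DataTypeParserSketch extends RegexParsers {
  // Case-insensitive keyword helper (a sketch; Spark's lexer instead
  // builds every case version of each keyword).
  private def keyword(k: String): Parser[String] = ("(?i)" + k + "\\b").r

  lazy val primitive: Parser[DT] =
    keyword("int") ^^^ IntType |
    keyword("string") ^^^ StringType

  lazy val array: Parser[DT] =
    keyword("array") ~> "<" ~> dataType <~ ">" ^^ (e => ArrayT(e))

  lazy val dataType: Parser[DT] = array | primitive

  def parse(s: String): DT = parseAll(dataType, s) match {
    case Success(result, _) => result
    case failure: NoSuccess => sys.error(s"Unsupported dataType: $s")
  }
}
```

With this, `parse("INT")`, `parse("int")`, and `parse("Array<String>")` all succeed regardless of casing, which is the behavior the change above aims for.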
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68953584 @scwf I am working on the data type parser. Will try to make a pull request to your branch soon.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22548615

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: Oh, actually, I put this comment on the wrong line. I meant to put it under line 108.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22547365

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +99,104 @@

```scala
protected lazy val column: Parser[StructField] =
  ( ident ~ ident ^^ { case name ~ typ =>
      StructField(name, metastoreTypes.toDataType(typ))
    }
  | ident ~ (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
      case name ~ precision ~ scale =>
        StructField(name, DecimalType(precision.toInt, scale.toInt))
    }
  )
}

/**
 * :: DeveloperApi ::
 * Provides a parser for data types.
 */
@DeveloperApi
private[sql] class MetastoreTypes extends RegexParsers {
  protected lazy val primitiveType: Parser[DataType] =
    "string" ^^^ StringType |
    "float" ^^^ FloatType |
    "int" ^^^ IntegerType |
    "tinyint" ^^^ ByteType |
    "smallint" ^^^ ShortType |
    "double" ^^^ DoubleType |
    "bigint" ^^^ LongType |
    "binary" ^^^ BinaryType |
    "boolean" ^^^ BooleanType |
    fixedDecimalType |                    // decimal with precision/scale
    "decimal" ^^^ DecimalType.Unlimited | // decimal with no precision/scale
    "date" ^^^ DateType |
    "timestamp" ^^^ TimestampType |
    "varchar\\((\\d+)\\)".r ^^^ StringType

  protected lazy val fixedDecimalType: Parser[DataType] =
    ("decimal" ~> "(" ~> "\\d+".r) ~ ("," ~> "\\d+".r <~ ")") ^^ {
      case precision ~ scale =>
        DecimalType(precision.toInt, scale.toInt)
    }

  protected lazy val arrayType: Parser[DataType] =
    "array" ~> "<" ~> dataType <~ ">" ^^ {
      case tpe => ArrayType(tpe)
    }

  protected lazy val mapType: Parser[DataType] =
    "map" ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ {
      case t1 ~ _ ~ t2 => MapType(t1, t2)
    }

  protected lazy val structField: Parser[StructField] =
    "[a-zA-Z0-9_]*".r ~ ":" ~ dataType ^^ {
      case name ~ _ ~ tpe => StructField(name, tpe, nullable = true)
    }

  protected lazy val structType: Parser[DataType] =
    "struct" ~> "<" ~> repsep(structField, ",") <~ ">" ^^ {
      case fields => new StructType(fields)
    }

  private[sql] lazy val dataType: Parser[DataType] =
    arrayType |
    mapType |
    structType |
    primitiveType

  def toDataType(metastoreType: String): DataType = parseAll(dataType, metastoreType) match {
    case Success(result, _) => result
    case failure: NoSuccess => sys.error(s"Unsupported dataType: $metastoreType")
  }

  def toMetastoreType(dt: DataType): String = dt match {  // <-- review comment below refers to this line
```

Comment: Seems we do not need it here, since we only need to parse strings.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22535746

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: Seems `fixedDecimalType` defined below already does the work.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22526157

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +99,104 @@

```scala
protected lazy val column: Parser[StructField] =
  ( ident ~ ident ^^ { case name ~ typ =>
      StructField(name, metastoreTypes.toDataType(typ))  // <-- review comment below refers to this line
```

Comment: To also handle whatever casing the user writes, we can force the type to lower case here so __toDataType__ handles it correctly. Like:

```scala
StructField(name, metastoreTypes.toDataType(typ.toLowerCase))
```
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22525893

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +99,104 @@

```scala
protected lazy val column: Parser[StructField] =
  ( ident ~ ident ^^ { case name ~ typ =>
      StructField(name, metastoreTypes.toDataType(typ))
    }
  | ident ~ (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
      case name ~ precision ~ scale =>
        StructField(name, DecimalType(precision.toInt, scale.toInt))
    }
  )
}

/**
 * :: DeveloperApi ::
 * Provides a parser for data types.
 */
@DeveloperApi
private[sql] class MetastoreTypes extends RegexParsers {
  protected lazy val primitiveType: Parser[DataType] =
    "string" ^^^ StringType |  // <-- review comment below refers to this line
```

Comment: We'd better handle both the lowercase and uppercase forms of the `type` the user writes, e.g.:

```
name STRING, salary FLOAT, subordinates ARRAY, deductions MAP, address STRUCT
```
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22525682

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -67,15 +68,30 @@

```scala
protected lazy val ddl: Parser[LogicalPlan] = createTable

/**
 * `CREATE TEMPORARY TABLE avroTable
 * USING org.apache.spark.sql.avro
 * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
 * or
 * `CREATE TEMPORARY TABLE avroTable(intField int, stringField string...)
 * USING org.apache.spark.sql.avro
 * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
 */
protected lazy val createTable: Parser[LogicalPlan] =
  ( CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
      case tableName ~ provider ~ opts =>
        CreateTableUsing(tableName, Seq.empty, provider, opts)
    }
  |  // <-- review comment below refers to this alternative
```

Comment: These two alternatives can be merged into one rule with an optional column list:

```scala
CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (tableCols).? ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
  case tableName ~ columns ~ provider ~ opts =>
    val tblColumns = columns.getOrElse(Seq.empty)
    CreateTableUsing(tableName, tblColumns, provider, opts)
}
```
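The merge OopsOutOfMemory proposes can be sketched standalone. This is a hypothetical, heavily simplified grammar, not the real `DDLParser`: `Col`, `CreateTableUsing`, and the identifier/`USING` handling below are toy stand-ins, and the `scala-parser-combinators` module is assumed available. The key point is `tableCols.?`, which makes the column list optional so one alternative covers both `CREATE TEMPORARY TABLE` forms:

```scala
import scala.util.parsing.combinator.RegexParsers

object CreateTableSketch extends RegexParsers {
  // Toy stand-ins for the real plan/column types.
  case class Col(name: String, typ: String)
  case class CreateTableUsing(table: String, cols: Seq[Col], provider: String)

  // Identifiers; dots allowed so a provider class name parses too.
  lazy val ident: Parser[String] = "[a-zA-Z_][a-zA-Z0-9_.]*".r
  lazy val col: Parser[Col] = ident ~ ident ^^ { case n ~ t => Col(n, t) }
  lazy val tableCols: Parser[Seq[Col]] = "(" ~> repsep(col, ",") <~ ")"

  // One rule covers both forms: ".?" makes the column list optional,
  // replacing the two separate alternatives in the diff above.
  lazy val createTable: Parser[CreateTableUsing] =
    "CREATE" ~ "TEMPORARY" ~ "TABLE" ~> ident ~ tableCols.? ~ ("USING" ~> ident) ^^ {
      case table ~ cols ~ provider =>
        CreateTableUsing(table, cols.getOrElse(Seq.empty), provider)
    }

  def parse(s: String): CreateTableUsing = parseAll(createTable, s).get
}
```

`parse("CREATE TEMPORARY TABLE t USING pkg.Source")` yields an empty column list, while adding a `(a int, b string)` clause fills it; note the `Option` result of `.?` is resolved with `getOrElse` rather than the `if(columns.isEmpty)` test in the quoted suggestion, which would not typecheck on an `Option`.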
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22516253

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: This is used to parse decimal types written in the ```decimal(10,2)``` form.
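As a concrete illustration of the `decimal(10,2)` form this keyword supports, here is a minimal standalone sketch of a `fixedDecimalType`-style rule. It is an assumption-laden toy, not the PR's code: `DecimalType` below is a hypothetical stand-in, raw regexes replace the DDL lexer's `numericLit`, and the `scala-parser-combinators` module is assumed available:

```scala
import scala.util.parsing.combinator.RegexParsers

object DecimalTypeSketch extends RegexParsers {
  // Toy stand-in for Spark SQL's DecimalType.
  case class DecimalType(precision: Int, scale: Int)

  // Parses "decimal(p, s)" case-insensitively into precision and scale.
  lazy val fixedDecimalType: Parser[DecimalType] =
    ("(?i)decimal".r ~> "(" ~> "\\d+".r) ~ ("," ~> "\\d+".r <~ ")") ^^ {
      case precision ~ scale => DecimalType(precision.toInt, scale.toInt)
    }

  def parse(s: String): DecimalType = parseAll(fixedDecimalType, s).get
}
```

The `~>`/`<~` combinators keep only the numeric captures, so the result carries just the two integers the `DecimalType` constructor needs.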
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22508765

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/NewTableScanSuite.scala @@ -0,0 +1,136 @@ (new file)

```scala
// (standard Apache License 2.0 header)

package org.apache.spark.sql.sources

import org.apache.spark.sql._
import java.sql.{Timestamp, Date}
import org.apache.spark.sql.execution.RDDConversions

case class PrimaryData(
    stringField: String,
    intField: Int,
    longField: Long,
    floatField: Float,
    doubleField: Double,
    shortField: Short,
    byteField: Byte,
    booleanField: Boolean,
    decimalField: BigDecimal,
    date: Date,
    timestampField: Timestamp)  // <-- review comment below refers to this declaration
```

Comment: Also test complex data types?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22507890

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: Can you explain the reason that we need this?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68416333

[Test build #24919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24919/consoleFull) for PR 3431 at commit [`cf982d2`](https://github.com/apache/spark/commit/cf982d2759c61988f1612380a950942cd41e734a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider `
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68416337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24919/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68412433 [Test build #24919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24919/consoleFull) for PR 3431 at commit [`cf982d2`](https://github.com/apache/spark/commit/cf982d2759c61988f1612380a950942cd41e734a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68410619 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24915/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68410616

[Test build #24915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24915/consoleFull) for PR 3431 at commit [`445b57b`](https://github.com/apache/spark/commit/445b57bc8dca3753be5dfeecc964be072933898a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider `
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68409631 [Test build #24915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24915/consoleFull) for PR 3431 at commit [`445b57b`](https://github.com/apache/spark/commit/445b57bc8dca3753be5dfeecc964be072933898a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22369162

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala @@ -19,10 +19,11 @@

```scala
package org.apache.spark.sql.sources

import org.apache.spark.sql._

class PrunedScanSource extends SchemaRelationProvider {  // <-- review comment below refers to this line
```

Comment: ok
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22369155

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -99,8 +209,15 @@

```scala
val dataSource =
  clazz.newInstance().asInstanceOf[org.apache.spark.sql.sources.SchemaRelationProvider]  // <-- review comment below refers to this line
```

Comment: Ok, so if users do not specify a schema in the DDL, go with ```RelationProvider```; otherwise go with ```SchemaRelationProvider```.
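The fallback rule scwf describes can be sketched in plain Scala. Hedged heavily: `BaseRelation`, the two provider traits, and `relationFor` below are simplified stand-ins for the actual `org.apache.spark.sql.sources` API (the schema is modeled as name/type pairs rather than a `StructType`), so this shows the dispatch shape only, not Spark's real code:

```scala
// Simplified stand-ins for the data source API under discussion.
trait BaseRelation

trait RelationProvider {
  def createRelation(options: Map[String, String]): BaseRelation
}

trait SchemaRelationProvider {
  def createRelation(options: Map[String, String], schema: Seq[(String, String)]): BaseRelation
}

object ResolveProvider {
  // If the DDL supplied a schema, the source must be a SchemaRelationProvider;
  // otherwise fall back to the original RelationProvider API, so libraries
  // written against the old trait keep working.
  def relationFor(
      dataSource: AnyRef,
      options: Map[String, String],
      userSchema: Option[Seq[(String, String)]]): BaseRelation =
    (userSchema, dataSource) match {
      case (Some(schema), p: SchemaRelationProvider) => p.createRelation(options, schema)
      case (None, p: RelationProvider) => p.createRelation(options)
      case (Some(_), _) => sys.error("Data source does not allow user-specified schemas")
      case (None, _) => sys.error("A schema must be specified for this data source")
    }
}
```

Dispatching on the presence of the user schema, rather than unconditionally casting to `SchemaRelationProvider`, is exactly what avoids the API break marmbrus flags below.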
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22364492

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala @@ -19,10 +19,11 @@

```scala
package org.apache.spark.sql.sources

import org.apache.spark.sql._

class PrunedScanSource extends SchemaRelationProvider {  // <-- review comment below refers to this line
```

Comment: I don't think we should change these, otherwise we have no tests for libraries that were written against the old API.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22364389

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -99,8 +209,15 @@

```scala
val dataSource =
  clazz.newInstance().asInstanceOf[org.apache.spark.sql.sources.SchemaRelationProvider]  // <-- review comment below refers to this line
```

Comment: This is still breaking support for libraries that were written against the original API.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68378887

[Test build #24898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24898/consoleFull) for PR 3431 at commit [`02a662c`](https://github.com/apache/spark/commit/02a662c4cb3605b3abc7033ad14e3b7400c30964).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68378891

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24898/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68372387

[Test build #24898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24898/consoleFull) for PR 3431 at commit [`02a662c`](https://github.com/apache/spark/commit/02a662c4cb3605b3abc7033ad14e3b7400c30964).

* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68371499

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24897/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68371497

[Test build #24897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24897/consoleFull) for PR 3431 at commit [`44eb70c`](https://github.com/apache/spark/commit/44eb70cda9049a68d7a3a4a4ca74e5bc41f04991).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68371410

[Test build #24897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24897/consoleFull) for PR 3431 at commit [`44eb70c`](https://github.com/apache/spark/commit/44eb70cda9049a68d7a3a4a4ca74e5bc41f04991).

* This patch merges cleanly.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68336828

Hi @marmbrus, still working on this; I will update it tomorrow.
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68336623

ping. any progress here?
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21974049

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -49,6 +48,15 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)

+  protected val STRING = Keyword("STRING")
+  protected val SHORT = Keyword("SHORT")
+  protected val DOUBLE = Keyword("DOUBLE")
+  protected val BOOLEAN = Keyword("BOOLEAN")
+  protected val BYTE = Keyword("BYTE")
+  protected val FLOAT = Keyword("FLOAT")
+  protected val INT = Keyword("INT")
+  protected val INTEGER = Keyword("INTEGER")
+  protected val LONG = Keyword("LONG")
```

--- End diff --

Does this mean we will follow Hive QL in the future?
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21973647

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
     val fileName = parameters.getOrElse("path", sys.error("Option 'path' not specified"))
     val samplingRatio = parameters.get("samplingRatio").map(_.toDouble).getOrElse(1.0)
-    JSONRelation(fileName, samplingRatio)(sqlContext)
+    JSONRelation(fileName, samplingRatio, schema)(sqlContext)
   }
 }

-private[sql] case class JSONRelation(fileName: String, samplingRatio: Double)(
+private[sql] case class JSONRelation(
+    fileName: String,
+    samplingRatio: Double,
+    _schema: Option[StructType])(
```

--- End diff --

ok
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21973220

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
```

--- End diff --

Or using a default value for schema:

```scala
schema: Option[StructType] = None
```

?
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21950698

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -49,6 +48,15 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)

+  protected val STRING = Keyword("STRING")
+  protected val SHORT = Keyword("SHORT")
+  protected val DOUBLE = Keyword("DOUBLE")
+  protected val BOOLEAN = Keyword("BOOLEAN")
+  protected val BYTE = Keyword("BYTE")
+  protected val FLOAT = Keyword("FLOAT")
+  protected val INT = Keyword("INT")
+  protected val INTEGER = Keyword("INTEGER")
+  protected val LONG = Keyword("LONG")
```

--- End diff --

IMO we should follow Hive here. I don't think it's that confusing. I would actually start by doing only the Hive types and not SHORT or BYTE in order to make it clearer to users. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types for what they are called.
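Concretely, following mateiz's suggestion would mean keying the grammar on Hive's type names and mapping each to the corresponding Catalyst type. A hedged sketch — the `BIGINT`/`SMALLINT`/`TINYINT` keywords are hypothetical additions per the suggestion above (they are not in the diff), and the `DataType` import path follows the code quoted in this thread, not necessarily the merged patch:

```scala
// Sketch: Hive type names mapped onto Catalyst DataTypes inside DDLParser.
// Assumes Keyword definitions (BIGINT, SMALLINT, TINYINT, ...) analogous to
// the ones shown in the diff, and the ^^^ combinator from StandardTokenParsers.
import org.apache.spark.sql.catalyst.types._

protected lazy val primitiveType: Parser[DataType] =
  STRING ^^^ StringType |
  (INT | INTEGER) ^^^ IntegerType |
  BIGINT ^^^ LongType |      // Hive's name for a 64-bit integer
  SMALLINT ^^^ ShortType |   // Hive's name for a 16-bit integer
  TINYINT ^^^ ByteType |     // Hive's name for an 8-bit integer
  FLOAT ^^^ FloatType |
  DOUBLE ^^^ DoubleType |
  BOOLEAN ^^^ BooleanType
```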
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-67250038

Thanks for working on this! I know a couple of people who want to use this in data sources they are writing.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21940324

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/FilteredScanSuite.scala ---

```diff
@@ -24,18 +24,26 @@ import org.apache.spark.sql._

 class FilteredScanSource extends RelationProvider {
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    SimpleFilteredScan(parameters("from").toInt, parameters("to").toInt)(sqlContext)
+      parameters: Map[String, String],
```

--- End diff --

Also make sure to add a test case for identifiers that have backticks.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21940281

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -67,16 +75,35 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected lazy val ddl: Parser[LogicalPlan] = createTable

   /**
-   * CREATE TEMPORARY TABLE avroTable
+   * `CREATE TEMPORARY TABLE avroTable
    * USING org.apache.spark.sql.avro
-   * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
+   * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
+   * or
+   * `CREATE TEMPORARY TABLE avroTable(intField int, stringField string...)
+   * USING org.apache.spark.sql.avro
+   * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
    */
   protected lazy val createTable: Parser[LogicalPlan] =
-    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
+    ( CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
       case tableName ~ provider ~ opts =>
-        CreateTableUsing(tableName, provider, opts)
+        CreateTableUsing(tableName, Seq.empty, provider, opts)
+    }
+      |
+      CREATE ~ TEMPORARY ~ TABLE ~> ident
+        ~ tableCols ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
+        case tableName ~ tableColumns ~ provider ~ opts =>
+          CreateTableUsing(tableName, tableColumns, provider, opts)
+    }
+    )

+  protected lazy val tableCol: Parser[(String, String)] =
```

--- End diff --

I'd just have this be `Parser[StructField]` since you have all the info you need here to construct the final data structure. This will also make handling nested types easier. Look at `HiveMetastoreTypes` for an example.
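The change marmbrus suggests is small: instead of returning a `(name, typeName)` pair and converting later, build the `StructField` inside the production. A sketch under the assumptions of this thread — `primitiveType` is a hypothetical `Parser[DataType]` in the style of `HiveMetastoreTypes`, not code from the patch:

```scala
// Sketch: tableCol produces a StructField directly, so the schema can be
// assembled as StructType(tableCols) with no intermediate string mapping,
// and nested types fall out of composing primitiveType with array/struct parsers.
import org.apache.spark.sql.catalyst.types.StructField

protected lazy val tableCol: Parser[StructField] =
  ident ~ primitiveType ^^ {
    case columnName ~ dataType => StructField(columnName, dataType, nullable = true)
  }
```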
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21940231

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/FilteredScanSuite.scala ---

```diff
@@ -24,18 +24,26 @@ import org.apache.spark.sql._

 class FilteredScanSource extends RelationProvider {
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    SimpleFilteredScan(parameters("from").toInt, parameters("to").toInt)(sqlContext)
+      parameters: Map[String, String],
```

--- End diff --

I'm not really sure if we need to rewrite all of these test cases, since the only code path that is changing here is `DDLParser -> RelationProvider`. The planner is exactly the same. I'd just add one new test case that makes sure that schemas are passed correctly when using the new syntax. Also, there needs to be better coverage for the various datatypes.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21939982

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -49,6 +48,15 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)

+  protected val STRING = Keyword("STRING")
+  protected val SHORT = Keyword("SHORT")
+  protected val DOUBLE = Keyword("DOUBLE")
+  protected val BOOLEAN = Keyword("BOOLEAN")
+  protected val BYTE = Keyword("BYTE")
+  protected val FLOAT = Keyword("FLOAT")
+  protected val INT = Keyword("INT")
+  protected val INTEGER = Keyword("INTEGER")
+  protected val LONG = Keyword("LONG")
```

--- End diff --

We also need to handle nested types and arrays. Another decision is whether we want to support the standard SQL types (`TINYINT`, etc.) as aliases for the JVM type names. We could do this like Hive does, unless there is another way? Thoughts @mateiz and @rxin?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21939724

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
     val fileName = parameters.getOrElse("path", sys.error("Option 'path' not specified"))
     val samplingRatio = parameters.get("samplingRatio").map(_.toDouble).getOrElse(1.0)
-    JSONRelation(fileName, samplingRatio)(sqlContext)
+    JSONRelation(fileName, samplingRatio, schema)(sqlContext)
   }
 }

-private[sql] case class JSONRelation(fileName: String, samplingRatio: Double)(
+private[sql] case class JSONRelation(
+    fileName: String,
+    samplingRatio: Double,
+    _schema: Option[StructType])(
```

--- End diff --

instead of `_schema`, how about `userSpecifiedSchema`?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21939479

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
```

--- End diff --

We cannot change the function signature, otherwise we will break existing libraries. Instead I think we need to create a new interface, `SchemaRelationProvider` maybe?
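One possible shape for the new interface marmbrus proposes: leave `RelationProvider` exactly as it is, and add a second trait whose `createRelation` takes the user-specified schema as an extra argument. This is a sketch of the idea, not the definition that was merged:

```scala
// Sketch: a parallel trait so existing RelationProvider implementations
// compile unchanged, while schema-aware sources implement this instead.
// Takes the schema as a plain StructType (the DDL parser only uses this
// path when the user actually wrote column definitions).
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.types.StructType

trait SchemaRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation
}
```

A source such as the JSON `DefaultSource` would then extend `SchemaRelationProvider` when it can honor a user schema, and `CreateTableUsing` can pattern-match on which trait the loaded class implements.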
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-66743935

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24396/