[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69475221 Yeah, no problem. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3431
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69473637 Thanks for working on this, guys! Merging to master. @yhuai, can you clarify the difference between `SchemaRelationProvider` and `RelationProvider` in the Scaladoc in your next PR?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69436044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25354/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69436041

[Test build #25354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull) for PR 3431 at commit [`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait SchemaRelationProvider`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433301

[Test build #25354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull) for PR 3431 at commit [`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa).
* This patch merges cleanly.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433261 ok, merged!
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433190 https://github.com/scwf/spark/pull/22
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69433129 @scwf I have done it and will open a PR against your branch.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22754925

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -46,6 +46,33 @@ trait RelationProvider {

 /**
  * ::DeveloperApi::
+ * Implemented by objects that produce relations for a specific kind of data source. When
+ * Spark SQL is given a DDL operation with
+ * 1. USING clause: to specify the implemented SchemaRelationProvider
+ * 2. User defined schema: users can define schema optionally when create table
+ *
+ * Users may specify the fully qualified class name of a given data source. When that class is
+ * not found Spark SQL will append the class name `DefaultSource` to the path, allowing for
+ * less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the
+ * data source 'org.apache.spark.sql.json.DefaultSource'
+ *
+ * A new instance of this class with be instantiated each time a DDL call is made.
+ */
+@DeveloperApi
+trait SchemaRelationProvider {
+  /**
+   * Returns a new base relation with the given parameters and user defined schema.
+   * Note: the parameters' keywords are case insensitive and this insensitivity is enforced
+   * by the Map that is passed to the function.
+   */
+  def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation
--- End diff --

My initial idea was to stay compatible with the old trait; since we will have two traits, I will fix this.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22742665

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
+  def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation
--- End diff --

Why is this an `Option`? We have two traits, and `Option` is not very friendly to Java.
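The objection above is that a single `createRelation(..., schema: Option[StructType])` forces Java implementors to deal with `scala.Option`, while splitting the API into two traits removes the need for it. Below is a minimal, self-contained sketch of that two-trait shape; `SQLContext`, `StructType`, `BaseRelation`, and `JsonRelation` are simplified stand-ins here, not the real Spark classes.

```scala
// Stand-in types so the sketch runs without Spark on the classpath.
case class SQLContext(name: String)
case class StructType(fieldNames: Seq[String])
trait BaseRelation

// Chosen when the DDL statement carries no user-defined schema.
trait RelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation
}

// Chosen when the DDL statement carries a user-defined schema. The schema is a
// plain StructType rather than Option[StructType], which keeps the API usable
// from Java: no scala.Option at the call boundary.
trait SchemaRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation
}

// A hypothetical relation produced by the source below.
case class JsonRelation(schema: StructType) extends BaseRelation

// A source can implement both traits and share its logic: the no-schema
// overload infers a schema, then delegates to the schema-taking overload.
class DefaultSource extends RelationProvider with SchemaRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    createRelation(sqlContext, parameters, StructType(Seq("inferred")))

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation =
    JsonRelation(schema)
}
```

The resolver can then pick the trait based on whether the DDL supplied a schema, instead of threading an `Option` through every data source.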
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22739718

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
@@ -18,31 +18,38 @@
 package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
--- End diff --

Default values do not preserve binary compatibility, only source compatibility.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69288714

[Test build #25287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25287/consoleFull) for PR 3431 at commit [`a852b10`](https://github.com/apache/spark/commit/a852b100b5fc6ddd6a19271f01c8df12c00553a6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69288718 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25287/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69285760

[Test build #25283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25283/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69285764 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25283/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69284463

[Test build #25287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25287/consoleFull) for PR 3431 at commit [`a852b10`](https://github.com/apache/spark/commit/a852b100b5fc6ddd6a19271f01c8df12c00553a6).
* This patch merges cleanly.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22697822

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -83,10 +118,73 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")}

   protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) }
+
+  protected lazy val column: Parser[StructField] =
+    ident ~ dataType ^^ { case columnName ~ typ =>
+      StructField(cleanIdentifier(columnName), typ)
+    }
+
+  protected lazy val primitiveType: Parser[DataType] =
+    STRING ^^^ StringType |
+    BINARY ^^^ BinaryType |
+    BOOLEAN ^^^ BooleanType |
+    TINYINT ^^^ ByteType |
+    SMALLINT ^^^ ShortType |
+    INT ^^^ IntegerType |
+    BIGINT ^^^ LongType |
+    FLOAT ^^^ FloatType |
+    DOUBLE ^^^ DoubleType |
+    fixedDecimalType |                  // decimal with precision/scale
+    DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale
+    DATE ^^^ DateType |
+    TIMESTAMP ^^^ TimestampType |
+    VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType
+
+  protected lazy val fixedDecimalType: Parser[DataType] =
+    (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
+      case precision ~ scale => DecimalType(precision.toInt, scale.toInt)
+    }
+
+  protected lazy val arrayType: Parser[DataType] =
+    ARRAY ~> "<" ~> dataType <~ ">" ^^ {
+      case tpe => ArrayType(tpe)
+    }
+
+  protected lazy val mapType: Parser[DataType] =
+    MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ {
+      case t1 ~ _ ~ t2 => MapType(t1, t2)
+    }
+
+  protected lazy val structField: Parser[StructField] =
+    ident ~ ":" ~ dataType ^^ {
+      case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true)
+    }
+
+  protected lazy val structType: Parser[DataType] =
+    (STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ {
+      case fields => new StructType(fields)
+    }) |
+    (STRUCT ~> "<>" ^^ {
+      case fields => new StructType(Nil)
+    })
+
+  private[sql] lazy val dataType: Parser[DataType] =
+    arrayType |
+    mapType |
+    structType |
+    primitiveType
+
+  protected val escapedIdentifier = "`([^`]+)`".r
+
+  /** Strips backticks from ident if present */
+  protected def cleanIdentifier(ident: String): String = ident match {
+    case escapedIdentifier(i) => i
+    case plainIdent => plainIdent
+  }
--- End diff --

Thank you:)
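For readers following the grammar above: the new rules turn type strings from the DDL (e.g. `array<int>`, `map<string, array<int>>`) into a `DataType` tree, recursing through `dataType` for nested element types. The dependency-free toy below illustrates that same recursive shape for a small subset of types; it is a hand-rolled sketch, not Spark's combinator-based parser.

```scala
// A toy model of the DataType tree the DDL parser builds.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Recursive-descent parser for: int | string | array<T> | map<K, V>
def parseType(s: String): DataType = s.trim match {
  case "int"    => IntegerType
  case "string" => StringType
  case a if a.startsWith("array<") && a.endsWith(">") =>
    ArrayType(parseType(a.stripPrefix("array<").stripSuffix(">")))
  case m if m.startsWith("map<") && m.endsWith(">") =>
    val inner = m.stripPrefix("map<").stripSuffix(">")
    // Find the comma separating key and value at nesting depth 0,
    // so commas inside nested array<...>/map<...> are skipped.
    var depth = 0
    val cut = inner.indexWhere {
      case '<' => depth += 1; false
      case '>' => depth -= 1; false
      case ',' => depth == 0
      case _   => false
    }
    MapType(parseType(inner.take(cut)), parseType(inner.drop(cut + 1)))
  case other => sys.error(s"unsupported type: $other")
}
```

The depth counter is the essential trick: without it, `map<string, array<int>>` would be split at the wrong comma.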
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22697765

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
+  protected val escapedIdentifier = "`([^`]+)`".r
+
+  /** Strips backticks from ident if present */
+  protected def cleanIdentifier(ident: String): String = ident match {
+    case escapedIdentifier(i) => i
+    case plainIdent => plainIdent
+  }
--- End diff --

ok
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22697226

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
+  protected val escapedIdentifier = "`([^`]+)`".r
+
+  /** Strips backticks from ident if present */
+  protected def cleanIdentifier(ident: String): String = ident match {
+    case escapedIdentifier(i) => i
+    case plainIdent => plainIdent
+  }
--- End diff --

It seems that when we use `ident`, the parser automatically takes care of backticks, so we can remove this. I am sorry I just noticed it.
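For context on the snippet under discussion: `cleanIdentifier` strips one pair of surrounding backticks, and because Scala's `Regex` pattern matching anchors to the whole string, anything that is not entirely backtick-wrapped passes through unchanged. A standalone copy of the helper (plain Scala, runnable outside Spark):

```scala
// Matches an identifier fully wrapped in backticks, capturing the inner text.
val escapedIdentifier = "`([^`]+)`".r

/** Strips backticks from ident if present; Regex patterns match the full string. */
def cleanIdentifier(ident: String): String = ident match {
  case escapedIdentifier(i) => i          // `my col` -> my col
  case plainIdent           => plainIdent // anything else unchanged
}
```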
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69280879

[Test build #25283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25283/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch merges cleanly.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69280728 retest this please
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69279959

[Test build #25279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25279/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69279963 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25279/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69275198

[Test build #25279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25279/consoleFull) for PR 3431 at commit [`f336a16`](https://github.com/apache/spark/commit/f336a16c4b1e6241d160d2c149cdb13dba4b9263).
* This patch merges cleanly.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69252713 @scwf Some updates in https://github.com/scwf/spark/pull/21.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69127505

[Test build #25189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25189/consoleFull) for PR 3431 at commit [`1eeb769`](https://github.com/apache/spark/commit/1eeb769b358ce40a7ef03e86989a35abc2bbe091).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69127509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25189/
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69126742 @yhuai Doing it later is also OK :)
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69125862 @scwf yeah, I agree with you :) Since the data source API is a separate module at the same level as HiveContext etc., it will have its own DDL parser, DDL commands, and other commands.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69125679 @scwf How about we do it later if necessary and keep this tightly coupled stuff together for now? @marmbrus Can you take a look? I think this PR is ready.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69122288 Hi @yhuai, I have merged your PR. The reason I moved ```CreateTableUsing``` to a new commands.scala is that in the future there may be more commands in the data source API, such as commands for sinks, so we can keep them all in a single commands.scala.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-6916 [Test build #25189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25189/consoleFull) for PR 3431 at commit [`1eeb769`](https://github.com/apache/spark/commit/1eeb769b358ce40a7ef03e86989a35abc2bbe091). * This patch merges cleanly.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69115562 @scwf I have created a PR to your branch for the above comments. I also made a few updates in the tests.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22619369 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -67,15 +89,25 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val ddl: Parser[LogicalPlan] = createTable /** - * CREATE TEMPORARY TABLE avroTable + * `CREATE TEMPORARY TABLE avroTable * USING org.apache.spark.sql.avro - * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro") + * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")` + * or + * `CREATE TEMPORARY TABLE avroTable(intField int, stringField string...) + * USING org.apache.spark.sql.avro + * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")` */ protected lazy val createTable: Parser[LogicalPlan] = -CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ { - case tableName ~ provider ~ opts => -CreateTableUsing(tableName, provider, opts) + ( +CREATE ~ TEMPORARY ~ TABLE ~> ident + ~ (tableCols).? ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ { + case tableName ~ columns ~ provider ~ opts => +val tblColumns = if(columns.isEmpty) Seq.empty else columns.get --- End diff -- I feel it is better to use the `Option` instead of converting a `None` to an empty `Seq`.
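For illustration, yhuai's suggestion above (propagating the optional user-specified schema as an `Option` instead of collapsing `None` into an empty `Seq`) can be sketched as follows. This is a minimal, hypothetical sketch with simplified stand-in types, not Spark's actual `StructField` or `CreateTableUsing` classes:

```scala
// Hypothetical simplified types, not Spark's actual classes. Keeping the
// user-specified schema as an Option[Seq[...]] instead of collapsing None
// into an empty Seq lets downstream code distinguish "no schema given,
// ask the data source to infer one" from "schema explicitly given".
case class StructField(name: String, dataType: String)

case class CreateTableUsing(
    tableName: String,
    userSpecifiedSchema: Option[Seq[StructField]],
    provider: String,
    options: Map[String, String])

object SchemaOptionDemo {
  val inferred = CreateTableUsing(
    "avroTable", None, "org.apache.spark.sql.avro",
    Map("path" -> "/tmp/episodes.avro"))

  val explicit = CreateTableUsing(
    "avroTable",
    Some(Seq(StructField("intField", "int"), StructField("stringField", "string"))),
    "org.apache.spark.sql.avro",
    Map("path" -> "/tmp/episodes.avro"))

  // A pattern match keeps the two cases explicit at the use site.
  def describe(c: CreateTableUsing): String = c.userSpecifiedSchema match {
    case None         => "schema will be inferred by the data source"
    case Some(fields) => s"user-specified schema with ${fields.size} column(s)"
  }

  def main(args: Array[String]): Unit = {
    println(describe(inferred))
    println(describe(explicit))
  }
}
```

Keeping the `Option` preserves exactly the distinction the new `CREATE TEMPORARY TABLE name(cols) USING ...` syntax introduces over the schemaless form.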
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69071019 @scwf @OopsOutOfMemory Actually, can we revert the last commit (moving `CreateTableUsing` out of `ddl.scala`)? We'd like to keep it in `ddl.scala` since it is part of the work of supporting foreign DDL commands.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69000429 [Test build #25156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25156/consoleFull) for PR 3431 at commit [`b621c8f`](https://github.com/apache/spark/commit/b621c8f8885dbcc67dc34c53fbef79346900cb9c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DefaultSource extends SchemaRelationProvider ` * `case class ParquetRelation2(` * `sys.error(s"Failed to load class for data source: $provider")` * `trait SchemaRelationProvider `
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-69000432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25156/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68993444 [Test build #25156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25156/consoleFull) for PR 3431 at commit [`b621c8f`](https://github.com/apache/spark/commit/b621c8f8885dbcc67dc34c53fbef79346900cb9c). * This patch merges cleanly.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22575577 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,73 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +(STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { +case fields => new StructType(fields) +}) | +(STRUCT ~> "<>" ^^ { + case fields => new 
StructType(Nil) +}) + + private[sql] lazy val dataType: Parser[DataType] = +arrayType | +mapType | +structType | +primitiveType + + protected val escapedIdentifier = "`([^`]+)`".r + /** Strips backticks from ident if present */ + protected def cleanIdentifier(ident: String): String = ident match { +case escapedIdentifier(i) => i +case plainIdent => plainIdent + } } private[sql] case class CreateTableUsing( --- End diff -- OK, I will move it.
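The `dataType` grammar in the diff above can be exercised in isolation. Below is a trimmed-down, self-contained sketch, not the real `DDLParser`: it uses a tiny hypothetical `DataType` ADT and `JavaTokenParsers` with lowercase literal keywords instead of Spark's `StandardTokenParsers`-based lexer, and it assumes the scala-parser-combinators library is on the classpath (it shipped with the standard library in the Scala versions Spark targeted at the time):

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Trimmed-down sketch of the dataType grammar from the patch. The DataType
// ADT here is hypothetical and much smaller than Spark's.
object MiniTypeParser extends JavaTokenParsers {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case class ArrayType(elementType: DataType) extends DataType
  case class MapType(keyType: DataType, valueType: DataType) extends DataType
  case class StructField(name: String, dataType: DataType)
  case class StructType(fields: List[StructField]) extends DataType

  lazy val primitiveType: Parser[DataType] =
    "int" ^^^ IntegerType | "string" ^^^ StringType

  lazy val arrayType: Parser[DataType] =
    "array" ~> "<" ~> dataType <~ ">" ^^ ArrayType

  lazy val mapType: Parser[DataType] =
    "map" ~> "<" ~> (dataType ~ ("," ~> dataType)) <~ ">" ^^ {
      case k ~ v => MapType(k, v)
    }

  lazy val structField: Parser[StructField] =
    ident ~ (":" ~> dataType) ^^ { case name ~ t => StructField(name, t) }

  lazy val structType: Parser[DataType] =
    "struct" ~> "<" ~> repsep(structField, ",") <~ ">" ^^ StructType

  lazy val dataType: Parser[DataType] =
    arrayType | mapType | structType | primitiveType

  def parse(s: String): DataType = parseAll(dataType, s) match {
    case Success(t, _) => t
    case failure       => sys.error(failure.toString)
  }

  def main(args: Array[String]): Unit = {
    println(parse("array<map<string, int>>"))
    println(parse("struct<a: int, b: array<string>>"))
  }
}
```

Because `JavaTokenParsers` matches literal strings directly on the character stream, nested closers such as the `>>` in `array<map<string, int>>` pose no problem here; the `<>` delimiter issue discussed elsewhere in this thread is specific to lexer-based `StandardTokenParsers`-style tokenization.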
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22574984 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,73 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +(STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { +case fields => new StructType(fields) +}) | +(STRUCT ~> "<>" ^^ { + case fields 
=> new StructType(Nil) +}) + + private[sql] lazy val dataType: Parser[DataType] = +arrayType | +mapType | +structType | +primitiveType + + protected val escapedIdentifier = "`([^`]+)`".r + /** Strips backticks from ident if present */ + protected def cleanIdentifier(ident: String): String = ident match { +case escapedIdentifier(i) => i +case plainIdent => plainIdent + } } private[sql] case class CreateTableUsing( --- End diff -- It would be nice to move this `class CreateTableUsing` to `org.apache.spark.sql.sources.commands.scala`.
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68991149 @scwf One thing to note :) The RunnableCommand `CreateTableUsing` does not belong in `ddl.scala`; it should be defined in `org.apache.spark.sql.sources.commands.scala`.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68986040 @yhuai if we want to handle it in ```org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute```, it is hard to reuse the DDL data type parser, so maybe we can/should revert the ```HiveMetastoreTypes``` change? Actually I think whether to allow users to define ```struct<>``` is a small point; how about letting users do this and having ```org.apache.spark.sql.hive.MetastoreRelation``` reuse the DDL data type parser? Both options are OK for me; if you have any other concerns, let me know.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68982855 @scwf Actually, I tried `struct<>` with `runSqlHive` and got an exception. https://issues.apache.org/jira/browse/HIVE-3323 seems to be the original Hive JIRA that introduced `struct<>` (for Thrift support). Maybe we should not allow users to define `struct<>` and instead handle it specially in `org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute`?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68982516 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25137/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68982511 [Test build #25137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25137/consoleFull) for PR 3431 at commit [`d02547f`](https://github.com/apache/spark/commit/d02547f1c2df677fc9b76bd5cf052e8fd700a2b5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DefaultSource extends SchemaRelationProvider ` * `case class ParquetRelation2(` * `trait SchemaRelationProvider `
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68981491 @marmbrus told me that `<>` is a delimiter defined in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SparkSQLParser.scala#L63. So it seems we need to handle `struct<>` specially for now.
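The effect of a two-character delimiter on the lexer can be reproduced in a tiny `StandardTokenParsers` grammar. This is a hypothetical sketch, not Spark's `SqlLexical` or `DDLParser`, and it assumes the scala-parser-combinators library: once "<>" is registered as a delimiter, the input `struct<>` yields a single "<>" token, so a rule written only as `"<" ~> ... <~ ">"` can never match the empty struct, and an explicit `"<>"` alternative is needed:

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Sketch of why `struct<>` needs its own alternative: with "<>" registered
// as a delimiter, the lexer emits one "<>" token for the input "struct<>",
// which a `"<" ~> ... <~ ">"` rule cannot consume.
object EmptyStructDemo extends StandardTokenParsers {
  lexical.delimiters ++= Seq("<", ">", "<>", ",")
  lexical.reserved += "struct"

  case class MiniStruct(fieldNames: List[String])

  lazy val nonEmpty: Parser[MiniStruct] =
    "struct" ~> "<" ~> repsep(ident, ",") <~ ">" ^^ MiniStruct

  // The extra alternative handles the single "<>" token.
  lazy val structType: Parser[MiniStruct] =
    nonEmpty | ("struct" ~> "<>" ^^^ MiniStruct(Nil))

  def parse(rule: Parser[MiniStruct], in: String): Option[MiniStruct] =
    phrase(rule)(new lexical.Scanner(in)) match {
      case Success(s, _) => Some(s)
      case _             => None
    }

  def main(args: Array[String]): Unit = {
    println(parse(nonEmpty, "struct<>"))    // fails: "<>" lexed as one token
    println(parse(structType, "struct<>"))  // succeeds via the "<>" branch
    println(parse(structType, "struct<a, b>"))
  }
}
```

`StdLexical` tries longer delimiters first during tokenization, which is why "<>" wins over "<" followed by ">"; this mirrors the two-branch fix quoted further down in the thread.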
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68978806 [Test build #25137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25137/consoleFull) for PR 3431 at commit [`d02547f`](https://github.com/apache/spark/commit/d02547f1c2df677fc9b76bd5cf052e8fd700a2b5). * This patch merges cleanly.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22570806 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,70 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { --- End diff -- Very weird: the old version also uses ```repsep(structField, ",")``` and has no
```STRUCT ~> "<>"```
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68978023 It seems this change cannot parse ```structField struct<>```.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22570712 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -83,10 +118,70 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".")} protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k,v) } + + protected lazy val column: Parser[StructField] = +ident ~ dataType ^^ { case columnName ~ typ => + StructField(cleanIdentifier(columnName), typ) +} + + protected lazy val primitiveType: Parser[DataType] = +STRING ^^^ StringType | +BINARY ^^^ BinaryType | +BOOLEAN ^^^ BooleanType | +TINYINT ^^^ ByteType | +SMALLINT ^^^ ShortType | +INT ^^^ IntegerType | +BIGINT ^^^ LongType | +FLOAT ^^^ FloatType | +DOUBLE ^^^ DoubleType | +fixedDecimalType | // decimal with precision/scale +DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale +DATE ^^^ DateType | +TIMESTAMP ^^^ TimestampType | +VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType + + protected lazy val fixedDecimalType: Parser[DataType] = +(DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ { + case precision ~ scale => DecimalType(precision.toInt, scale.toInt) +} + + protected lazy val arrayType: Parser[DataType] = +ARRAY ~> "<" ~> dataType <~ ">" ^^ { + case tpe => ArrayType(tpe) +} + + protected lazy val mapType: Parser[DataType] = +MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ { + case t1 ~ _ ~ t2 => MapType(t1, t2) +} + + protected lazy val structField: Parser[StructField] = +ident ~ ":" ~ dataType ^^ { + case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true) +} + + protected lazy val structType: Parser[DataType] = +STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { --- End diff -- @scwf I am not sure why it failed. 
I tried the following change and the error disappears: ```scala protected lazy val structType: Parser[DataType] = (STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ { case fields => new StructType(fields) }) | (STRUCT ~> "<>" ^^ { case fields => new StructType(Nil) }) ```
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22570296

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +118,70 @@ (context: inside `private[sql] class DDLParser extends StandardTokenParsers with PackratParsers …`)

```scala
protected lazy val className: Parser[String] = repsep(ident, ".") ^^ { case s => s.mkString(".") }

protected lazy val pair: Parser[(String, String)] = ident ~ stringLit ^^ { case k ~ v => (k, v) }

protected lazy val column: Parser[StructField] =
  ident ~ dataType ^^ { case columnName ~ typ =>
    StructField(cleanIdentifier(columnName), typ)
  }

protected lazy val primitiveType: Parser[DataType] =
  STRING ^^^ StringType |
  BINARY ^^^ BinaryType |
  BOOLEAN ^^^ BooleanType |
  TINYINT ^^^ ByteType |
  SMALLINT ^^^ ShortType |
  INT ^^^ IntegerType |
  BIGINT ^^^ LongType |
  FLOAT ^^^ FloatType |
  DOUBLE ^^^ DoubleType |
  fixedDecimalType |                  // decimal with precision/scale
  DECIMAL ^^^ DecimalType.Unlimited | // decimal with no precision/scale
  DATE ^^^ DateType |
  TIMESTAMP ^^^ TimestampType |
  VARCHAR ~ "(" ~ numericLit ~ ")" ^^^ StringType

protected lazy val fixedDecimalType: Parser[DataType] =
  (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
    case precision ~ scale => DecimalType(precision.toInt, scale.toInt)
  }

protected lazy val arrayType: Parser[DataType] =
  ARRAY ~> "<" ~> dataType <~ ">" ^^ {
    case tpe => ArrayType(tpe)
  }

protected lazy val mapType: Parser[DataType] =
  MAP ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ {
    case t1 ~ _ ~ t2 => MapType(t1, t2)
  }

protected lazy val structField: Parser[StructField] =
  ident ~ ":" ~ dataType ^^ {
    case fieldName ~ _ ~ tpe => StructField(cleanIdentifier(fieldName), tpe, nullable = true)
  }

protected lazy val structType: Parser[DataType] =
  STRUCT ~> "<" ~> repsep(structField, ",") <~ ">" ^^ {  // <-- review comment below refers to this line
```

Comment: Seems this line caused the failure. But I have not found a fix. Still trying.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68976404 Have not found the root cause; any ideas here, @yhuai?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68975092 The parser change causes ```HiveCompatibilitySuite``` to fail.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68973513

[Test build #25128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25128/consoleFull) for PR 3431 at commit [`ddab984`](https://github.com/apache/spark/commit/ddab9848dee0bbf0cafbdd8339889153a26b38b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider `
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68973517 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25128/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68970836 [Test build #25128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25128/consoleFull) for PR 3431 at commit [`ddab984`](https://github.com/apache/spark/commit/ddab9848dee0bbf0cafbdd8339889153a26b38b3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68970820 And I will add more tests here.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68970720 @yhuai I have merged the PR :)
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68966142 @scwf I made a pull request (https://github.com/scwf/spark/pull/19/) to your branch. This PR moves the data type parser into our `DDLParser`. With it, data type keywords are also parsed in a case-insensitive way. Let me know if it makes sense. Also, we probably want to add more tests for the parser.
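The case-insensitive keyword parsing mentioned above can be illustrated with a small standalone sketch. This is not Spark's actual `DDLParser`/lexer code (which enumerates all case versions of each keyword via `allCaseVersions`); here a `(?i)` regex achieves a similar effect with `RegexParsers`. The `DT`/`IntType`/`StringType`/`ArrayT` names are hypothetical stand-ins for Spark SQL's `DataType` hierarchy, and the `scala-parser-combinators` module is assumed available (it shipped with the standard library in the Scala versions Spark used at the time):

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical stand-ins for Spark SQL's DataType hierarchy.
sealed trait DT
case object IntType extends DT
case object StringType extends DT
case class ArrayT(elem: DT) extends DT

object DataTypeParserSketch extends RegexParsers {
  // Case-insensitive keyword helper (a sketch; Spark's lexer instead
  // builds every case version of each keyword).
  private def keyword(k: String): Parser[String] = ("(?i)" + k + "\\b").r

  lazy val primitive: Parser[DT] =
    keyword("int") ^^^ IntType |
    keyword("string") ^^^ StringType

  lazy val array: Parser[DT] =
    keyword("array") ~> "<" ~> dataType <~ ">" ^^ (e => ArrayT(e))

  lazy val dataType: Parser[DT] = array | primitive

  def parse(s: String): DT = parseAll(dataType, s) match {
    case Success(result, _) => result
    case failure: NoSuccess => sys.error(s"Unsupported dataType: $s")
  }
}
```

With this, `parse("INT")`, `parse("int")`, and `parse("Array<String>")` all succeed regardless of casing, which is the behavior the change above aims for.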
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68953584 @scwf I am working on the data type parser. Will try to make a pull request to your branch soon.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22548615

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: Oh, actually, I put this comment on the wrong line. I meant to put it under line 108.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22547365

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +99,104 @@

```scala
protected lazy val column: Parser[StructField] =
  ( ident ~ ident ^^ { case name ~ typ =>
      StructField(name, metastoreTypes.toDataType(typ))
    }
  | ident ~ (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
      case name ~ precision ~ scale =>
        StructField(name, DecimalType(precision.toInt, scale.toInt))
    }
  )
}

/**
 * :: DeveloperApi ::
 * Provides a parser for data types.
 */
@DeveloperApi
private[sql] class MetastoreTypes extends RegexParsers {
  protected lazy val primitiveType: Parser[DataType] =
    "string" ^^^ StringType |
    "float" ^^^ FloatType |
    "int" ^^^ IntegerType |
    "tinyint" ^^^ ByteType |
    "smallint" ^^^ ShortType |
    "double" ^^^ DoubleType |
    "bigint" ^^^ LongType |
    "binary" ^^^ BinaryType |
    "boolean" ^^^ BooleanType |
    fixedDecimalType |                    // decimal with precision/scale
    "decimal" ^^^ DecimalType.Unlimited | // decimal with no precision/scale
    "date" ^^^ DateType |
    "timestamp" ^^^ TimestampType |
    "varchar\\((\\d+)\\)".r ^^^ StringType

  protected lazy val fixedDecimalType: Parser[DataType] =
    ("decimal" ~> "(" ~> "\\d+".r) ~ ("," ~> "\\d+".r <~ ")") ^^ {
      case precision ~ scale =>
        DecimalType(precision.toInt, scale.toInt)
    }

  protected lazy val arrayType: Parser[DataType] =
    "array" ~> "<" ~> dataType <~ ">" ^^ {
      case tpe => ArrayType(tpe)
    }

  protected lazy val mapType: Parser[DataType] =
    "map" ~> "<" ~> dataType ~ "," ~ dataType <~ ">" ^^ {
      case t1 ~ _ ~ t2 => MapType(t1, t2)
    }

  protected lazy val structField: Parser[StructField] =
    "[a-zA-Z0-9_]*".r ~ ":" ~ dataType ^^ {
      case name ~ _ ~ tpe => StructField(name, tpe, nullable = true)
    }

  protected lazy val structType: Parser[DataType] =
    "struct" ~> "<" ~> repsep(structField, ",") <~ ">" ^^ {
      case fields => new StructType(fields)
    }

  private[sql] lazy val dataType: Parser[DataType] =
    arrayType |
    mapType |
    structType |
    primitiveType

  def toDataType(metastoreType: String): DataType = parseAll(dataType, metastoreType) match {
    case Success(result, _) => result
    case failure: NoSuccess => sys.error(s"Unsupported dataType: $metastoreType")
  }

  def toMetastoreType(dt: DataType): String = dt match {  // <-- review comment below refers to this line
```

Comment: Seems we do not need it here, since we only need to parse strings.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22535746

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: Seems `fixedDecimalType` defined below already does the work.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22526157

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +99,104 @@

```scala
protected lazy val column: Parser[StructField] =
  ( ident ~ ident ^^ { case name ~ typ =>
      StructField(name, metastoreTypes.toDataType(typ))  // <-- review comment below refers to this line
```

Comment: To also handle whatever casing the user writes, we can force the type to lower case here so __toDataType__ handles it correctly. Like:

```scala
StructField(name, metastoreTypes.toDataType(typ.toLowerCase))
```
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22525893

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -83,10 +99,104 @@

```scala
protected lazy val column: Parser[StructField] =
  ( ident ~ ident ^^ { case name ~ typ =>
      StructField(name, metastoreTypes.toDataType(typ))
    }
  | ident ~ (DECIMAL ~ "(" ~> numericLit) ~ ("," ~> numericLit <~ ")") ^^ {
      case name ~ precision ~ scale =>
        StructField(name, DecimalType(precision.toInt, scale.toInt))
    }
  )
}

/**
 * :: DeveloperApi ::
 * Provides a parser for data types.
 */
@DeveloperApi
private[sql] class MetastoreTypes extends RegexParsers {
  protected lazy val primitiveType: Parser[DataType] =
    "string" ^^^ StringType |  // <-- review comment below refers to this line
```

Comment: We'd better handle both the lowercase and uppercase forms of the `type` the user writes, e.g.:

```
name STRING, salary FLOAT, subordinates ARRAY, deductions MAP, address STRUCT
```
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22525682

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -67,15 +68,30 @@

```scala
protected lazy val ddl: Parser[LogicalPlan] = createTable

/**
 * `CREATE TEMPORARY TABLE avroTable
 * USING org.apache.spark.sql.avro
 * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
 * or
 * `CREATE TEMPORARY TABLE avroTable(intField int, stringField string...)
 * USING org.apache.spark.sql.avro
 * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
 */
protected lazy val createTable: Parser[LogicalPlan] =
  ( CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
      case tableName ~ provider ~ opts =>
        CreateTableUsing(tableName, Seq.empty, provider, opts)
    }
  |  // <-- review comment below refers to this alternative
```

Comment: These two alternatives can be merged into one rule with an optional column list:

```scala
CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (tableCols).? ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
  case tableName ~ columns ~ provider ~ opts =>
    val tblColumns = columns.getOrElse(Seq.empty)
    CreateTableUsing(tableName, tblColumns, provider, opts)
}
```
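The merge OopsOutOfMemory proposes can be sketched standalone. This is a hypothetical, heavily simplified grammar, not the real `DDLParser`: `Col`, `CreateTableUsing`, and the identifier/`USING` handling below are toy stand-ins, and the `scala-parser-combinators` module is assumed available. The key point is `tableCols.?`, which makes the column list optional so one alternative covers both `CREATE TEMPORARY TABLE` forms:

```scala
import scala.util.parsing.combinator.RegexParsers

object CreateTableSketch extends RegexParsers {
  // Toy stand-ins for the real plan/column types.
  case class Col(name: String, typ: String)
  case class CreateTableUsing(table: String, cols: Seq[Col], provider: String)

  // Identifiers; dots allowed so a provider class name parses too.
  lazy val ident: Parser[String] = "[a-zA-Z_][a-zA-Z0-9_.]*".r
  lazy val col: Parser[Col] = ident ~ ident ^^ { case n ~ t => Col(n, t) }
  lazy val tableCols: Parser[Seq[Col]] = "(" ~> repsep(col, ",") <~ ")"

  // One rule covers both forms: ".?" makes the column list optional,
  // replacing the two separate alternatives in the diff above.
  lazy val createTable: Parser[CreateTableUsing] =
    "CREATE" ~ "TEMPORARY" ~ "TABLE" ~> ident ~ tableCols.? ~ ("USING" ~> ident) ^^ {
      case table ~ cols ~ provider =>
        CreateTableUsing(table, cols.getOrElse(Seq.empty), provider)
    }

  def parse(s: String): CreateTableUsing = parseAll(createTable, s).get
}
```

`parse("CREATE TEMPORARY TABLE t USING pkg.Source")` yields an empty column list, while adding a `(a int, b string)` clause fills it; note the `Option` result of `.?` is resolved with `getOrElse` rather than the `if(columns.isEmpty)` test in the quoted suggestion, which would not typecheck on an `Option`.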
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22516253

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: This is used to parse decimal types written in the ```decimal(10,2)``` form.
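As a concrete illustration of the `decimal(10,2)` form this keyword supports, here is a minimal standalone sketch of a `fixedDecimalType`-style rule. It is an assumption-laden toy, not the PR's code: `DecimalType` below is a hypothetical stand-in, raw regexes replace the DDL lexer's `numericLit`, and the `scala-parser-combinators` module is assumed available:

```scala
import scala.util.parsing.combinator.RegexParsers

object DecimalTypeSketch extends RegexParsers {
  // Toy stand-in for Spark SQL's DecimalType.
  case class DecimalType(precision: Int, scale: Int)

  // Parses "decimal(p, s)" case-insensitively into precision and scale.
  lazy val fixedDecimalType: Parser[DecimalType] =
    ("(?i)decimal".r ~> "(" ~> "\\d+".r) ~ ("," ~> "\\d+".r <~ ")") ^^ {
      case precision ~ scale => DecimalType(precision.toInt, scale.toInt)
    }

  def parse(s: String): DecimalType = parseAll(fixedDecimalType, s).get
}
```

The `~>`/`<~` combinators keep only the numeric captures, so the result carries just the two integers the `DecimalType` constructor needs.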
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22508765

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/NewTableScanSuite.scala @@ -0,0 +1,136 @@ (new file)

```scala
// (standard Apache License 2.0 header)

package org.apache.spark.sql.sources

import org.apache.spark.sql._
import java.sql.{Timestamp, Date}
import org.apache.spark.sql.execution.RDDConversions

case class PrimaryData(
    stringField: String,
    intField: Int,
    longField: Long,
    floatField: Float,
    doubleField: Double,
    shortField: Short,
    byteField: Byte,
    booleanField: Boolean,
    decimalField: BigDecimal,
    date: Date,
    timestampField: Timestamp)  // <-- review comment below refers to this declaration
```

Comment: Also test complex data types?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22507890

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -50,6 +50,7 @@

```
   protected val CREATE = Keyword("CREATE")
+  protected val DECIMAL = Keyword("DECIMAL")
```

Comment: Can you explain the reason that we need this?
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68416333

[Test build #24919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24919/consoleFull) for PR 3431 at commit [`cf982d2`](https://github.com/apache/spark/commit/cf982d2759c61988f1612380a950942cd41e734a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider `
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68416337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24919/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68412433 [Test build #24919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24919/consoleFull) for PR 3431 at commit [`cf982d2`](https://github.com/apache/spark/commit/cf982d2759c61988f1612380a950942cd41e734a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68410619 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24915/
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68410616

[Test build #24915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24915/consoleFull) for PR 3431 at commit [`445b57b`](https://github.com/apache/spark/commit/445b57bc8dca3753be5dfeecc964be072933898a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider `
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider `
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68409631 [Test build #24915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24915/consoleFull) for PR 3431 at commit [`445b57b`](https://github.com/apache/spark/commit/445b57bc8dca3753be5dfeecc964be072933898a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22369162

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala @@ -19,10 +19,11 @@

```scala
package org.apache.spark.sql.sources

import org.apache.spark.sql._

class PrunedScanSource extends SchemaRelationProvider {  // <-- review comment below refers to this line
```

Comment: ok
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22369155

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -99,8 +209,15 @@

```scala
val dataSource =
  clazz.newInstance().asInstanceOf[org.apache.spark.sql.sources.SchemaRelationProvider]  // <-- review comment below refers to this line
```

Comment: Ok, so if users do not specify a schema in the DDL, go with ```RelationProvider```; otherwise go with ```SchemaRelationProvider```.
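The fallback rule scwf describes can be sketched in plain Scala. Hedged heavily: `BaseRelation`, the two provider traits, and `relationFor` below are simplified stand-ins for the actual `org.apache.spark.sql.sources` API (the schema is modeled as name/type pairs rather than a `StructType`), so this shows the dispatch shape only, not Spark's real code:

```scala
// Simplified stand-ins for the data source API under discussion.
trait BaseRelation

trait RelationProvider {
  def createRelation(options: Map[String, String]): BaseRelation
}

trait SchemaRelationProvider {
  def createRelation(options: Map[String, String], schema: Seq[(String, String)]): BaseRelation
}

object ResolveProvider {
  // If the DDL supplied a schema, the source must be a SchemaRelationProvider;
  // otherwise fall back to the original RelationProvider API, so libraries
  // written against the old trait keep working.
  def relationFor(
      dataSource: AnyRef,
      options: Map[String, String],
      userSchema: Option[Seq[(String, String)]]): BaseRelation =
    (userSchema, dataSource) match {
      case (Some(schema), p: SchemaRelationProvider) => p.createRelation(options, schema)
      case (None, p: RelationProvider) => p.createRelation(options)
      case (Some(_), _) => sys.error("Data source does not allow user-specified schemas")
      case (None, _) => sys.error("A schema must be specified for this data source")
    }
}
```

Dispatching on the presence of the user schema, rather than unconditionally casting to `SchemaRelationProvider`, is exactly what avoids the API break marmbrus flags below.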
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22364492

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala @@ -19,10 +19,11 @@

```scala
package org.apache.spark.sql.sources

import org.apache.spark.sql._

class PrunedScanSource extends SchemaRelationProvider {  // <-- review comment below refers to this line
```

Comment: I don't think we should change these, otherwise we have no tests for libraries that were written against the old API.
[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r22364389

Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala @@ -99,8 +209,15 @@

```scala
val dataSource =
  clazz.newInstance().asInstanceOf[org.apache.spark.sql.sources.SchemaRelationProvider]  // <-- review comment below refers to this line
```

Comment: This is still breaking support for libraries that were written against the original API.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68378887

[Test build #24898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24898/consoleFull) for PR 3431 at commit [`02a662c`](https://github.com/apache/spark/commit/02a662c4cb3605b3abc7033ad14e3b7400c30964).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68378891

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24898/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68372387

[Test build #24898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24898/consoleFull) for PR 3431 at commit [`02a662c`](https://github.com/apache/spark/commit/02a662c4cb3605b3abc7033ad14e3b7400c30964).

* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68371499

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24897/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68371497

[Test build #24897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24897/consoleFull) for PR 3431 at commit [`44eb70c`](https://github.com/apache/spark/commit/44eb70cda9049a68d7a3a4a4ca74e5bc41f04991).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DefaultSource extends SchemaRelationProvider`
  * `case class ParquetRelation2(`
  * `trait SchemaRelationProvider`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68371410

[Test build #24897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24897/consoleFull) for PR 3431 at commit [`44eb70c`](https://github.com/apache/spark/commit/44eb70cda9049a68d7a3a4a4ca74e5bc41f04991).

* This patch merges cleanly.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68336828

Hi @marmbrus, still working on this; I will update it tomorrow.
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-68336623

ping. any progress here?
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21974049

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -49,6 +48,15 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)

+  protected val STRING = Keyword("STRING")
+  protected val SHORT = Keyword("SHORT")
+  protected val DOUBLE = Keyword("DOUBLE")
+  protected val BOOLEAN = Keyword("BOOLEAN")
+  protected val BYTE = Keyword("BYTE")
+  protected val FLOAT = Keyword("FLOAT")
+  protected val INT = Keyword("INT")
+  protected val INTEGER = Keyword("INTEGER")
+  protected val LONG = Keyword("LONG")
```

--- End diff --

Does this mean we will follow Hive QL in the future?
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21973647

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
     val fileName = parameters.getOrElse("path", sys.error("Option 'path' not specified"))
     val samplingRatio = parameters.get("samplingRatio").map(_.toDouble).getOrElse(1.0)
-    JSONRelation(fileName, samplingRatio)(sqlContext)
+    JSONRelation(fileName, samplingRatio, schema)(sqlContext)
   }
 }

-private[sql] case class JSONRelation(fileName: String, samplingRatio: Double)(
+private[sql] case class JSONRelation(
+    fileName: String,
+    samplingRatio: Double,
+    _schema: Option[StructType])(
```

--- End diff --

ok
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21973220

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
```

--- End diff --

Or using a default value for schema:

```scala
schema: Option[StructType] = None
```

?
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21950698

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -49,6 +48,15 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)

+  protected val STRING = Keyword("STRING")
+  protected val SHORT = Keyword("SHORT")
+  protected val DOUBLE = Keyword("DOUBLE")
+  protected val BOOLEAN = Keyword("BOOLEAN")
+  protected val BYTE = Keyword("BYTE")
+  protected val FLOAT = Keyword("FLOAT")
+  protected val INT = Keyword("INT")
+  protected val INTEGER = Keyword("INTEGER")
+  protected val LONG = Keyword("LONG")
```

--- End diff --

IMO we should follow Hive here. I don't think it's that confusing. I would actually start by doing only the Hive types and not SHORT or BYTE in order to make it clearer to users. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types for what they are called.
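Concretely, following mateiz's suggestion would mean keying the grammar on Hive's type names and mapping each to the corresponding Catalyst type. A hedged sketch — the `BIGINT`/`SMALLINT`/`TINYINT` keywords are hypothetical additions per the suggestion above (they are not in the diff), and the `DataType` import path follows the code quoted in this thread, not necessarily the merged patch:

```scala
// Sketch: Hive type names mapped onto Catalyst DataTypes inside DDLParser.
// Assumes Keyword definitions (BIGINT, SMALLINT, TINYINT, ...) analogous to
// the ones shown in the diff, and the ^^^ combinator from StandardTokenParsers.
import org.apache.spark.sql.catalyst.types._

protected lazy val primitiveType: Parser[DataType] =
  STRING ^^^ StringType |
  (INT | INTEGER) ^^^ IntegerType |
  BIGINT ^^^ LongType |      // Hive's name for a 64-bit integer
  SMALLINT ^^^ ShortType |   // Hive's name for a 16-bit integer
  TINYINT ^^^ ByteType |     // Hive's name for an 8-bit integer
  FLOAT ^^^ FloatType |
  DOUBLE ^^^ DoubleType |
  BOOLEAN ^^^ BooleanType
```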
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-67250038

Thanks for working on this! I know a couple of people who want to use this in data sources they are writing.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21940324

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/FilteredScanSuite.scala ---

```diff
@@ -24,18 +24,26 @@ import org.apache.spark.sql._

 class FilteredScanSource extends RelationProvider {
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    SimpleFilteredScan(parameters("from").toInt, parameters("to").toInt)(sqlContext)
+      parameters: Map[String, String],
```

--- End diff --

Also make sure to add a test case for identifiers that have backticks.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21940281

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -67,16 +75,35 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected lazy val ddl: Parser[LogicalPlan] = createTable

   /**
-   * CREATE TEMPORARY TABLE avroTable
+   * `CREATE TEMPORARY TABLE avroTable
    * USING org.apache.spark.sql.avro
-   * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
+   * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
+   * or
+   * `CREATE TEMPORARY TABLE avroTable(intField int, stringField string...)
+   * USING org.apache.spark.sql.avro
+   * OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")`
    */
   protected lazy val createTable: Parser[LogicalPlan] =
-    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
+    ( CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
       case tableName ~ provider ~ opts =>
-        CreateTableUsing(tableName, provider, opts)
+        CreateTableUsing(tableName, Seq.empty, provider, opts)
+    }
+      |
+      CREATE ~ TEMPORARY ~ TABLE ~> ident
+        ~ tableCols ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
+        case tableName ~ tableColumns ~ provider ~ opts =>
+          CreateTableUsing(tableName, tableColumns, provider, opts)
+    }
+    )

+  protected lazy val tableCol: Parser[(String, String)] =
```

--- End diff --

I'd just have this be `Parser[StructField]` since you have all the info you need here to construct the final data structure. This will also make handling nested types easier. Look at `HiveMetastoreTypes` for an example.
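The change marmbrus suggests is small: instead of returning a `(name, typeName)` pair and converting later, build the `StructField` inside the production. A sketch under the assumptions of this thread — `primitiveType` is a hypothetical `Parser[DataType]` in the style of `HiveMetastoreTypes`, not code from the patch:

```scala
// Sketch: tableCol produces a StructField directly, so the schema can be
// assembled as StructType(tableCols) with no intermediate string mapping,
// and nested types fall out of composing primitiveType with array/struct parsers.
import org.apache.spark.sql.catalyst.types.StructField

protected lazy val tableCol: Parser[StructField] =
  ident ~ primitiveType ^^ {
    case columnName ~ dataType => StructField(columnName, dataType, nullable = true)
  }
```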
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21940231

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/FilteredScanSuite.scala ---

```diff
@@ -24,18 +24,26 @@ import org.apache.spark.sql._

 class FilteredScanSource extends RelationProvider {
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    SimpleFilteredScan(parameters("from").toInt, parameters("to").toInt)(sqlContext)
+      parameters: Map[String, String],
```

--- End diff --

I'm not really sure if we need to rewrite all of these test cases, since the only code path that is changing here is `DDLParser -> RelationProvider`. The planner is exactly the same. I'd just add one new test case that makes sure that schemas are passed correctly when using the new syntax. Also, there needs to be better coverage for the various datatypes.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21939982

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---

```diff
@@ -49,6 +48,15 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)

+  protected val STRING = Keyword("STRING")
+  protected val SHORT = Keyword("SHORT")
+  protected val DOUBLE = Keyword("DOUBLE")
+  protected val BOOLEAN = Keyword("BOOLEAN")
+  protected val BYTE = Keyword("BYTE")
+  protected val FLOAT = Keyword("FLOAT")
+  protected val INT = Keyword("INT")
+  protected val INTEGER = Keyword("INTEGER")
+  protected val LONG = Keyword("LONG")
```

--- End diff --

We also need to handle nested types and arrays. Another decision is whether we want to support the standard SQL types (`TINYINT`, etc.) as aliases for the JVM type names. We could do this like Hive does, unless there is another way? Thoughts @mateiz and @rxin?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21939724

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
     val fileName = parameters.getOrElse("path", sys.error("Option 'path' not specified"))
     val samplingRatio = parameters.get("samplingRatio").map(_.toDouble).getOrElse(1.0)
-    JSONRelation(fileName, samplingRatio)(sqlContext)
+    JSONRelation(fileName, samplingRatio, schema)(sqlContext)
   }
 }

-private[sql] case class JSONRelation(fileName: String, samplingRatio: Double)(
+private[sql] case class JSONRelation(
+    fileName: String,
+    samplingRatio: Double,
+    _schema: Option[StructType])(
```

--- End diff --

instead of `_schema`, how about `userSpecifiedSchema`?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3431#discussion_r21939479

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---

```diff
@@ -18,31 +18,38 @@ package org.apache.spark.sql.json

 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.types.StructType
 import org.apache.spark.sql.sources._

 private[sql] class DefaultSource extends RelationProvider {
   /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
+      parameters: Map[String, String],
+      schema: Option[StructType]): BaseRelation = {
```

--- End diff --

We cannot change the function signature, otherwise we will break existing libraries. Instead I think we need to create a new interface, `SchemaRelationProvider` maybe?
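One possible shape for the new interface marmbrus proposes: leave `RelationProvider` exactly as it is, and add a second trait whose `createRelation` takes the user-specified schema as an extra argument. This is a sketch of the idea, not the definition that was merged:

```scala
// Sketch: a parallel trait so existing RelationProvider implementations
// compile unchanged, while schema-aware sources implement this instead.
// Takes the schema as a plain StructType (the DDL parser only uses this
// path when the user actually wrote column definitions).
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.types.StructType

trait SchemaRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation
}
```

A source such as the JSON `DefaultSource` would then extend `SchemaRelationProvider` when it can honor a user schema, and `CreateTableUsing` can pattern-match on which trait the loaded class implements.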
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3431#issuecomment-66743935

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24396/