Repository: spark Updated Branches: refs/heads/branch-2.0 fcbb7f653 -> b52e639a8
[SPARK-16698][SQL] Field names having dots should be allowed for datasources based on FileFormat ## What changes were proposed in this pull request? It seems this is a regression assuming from https://issues.apache.org/jira/browse/SPARK-16698. Field name having dots throws an exception. For example the codes below: ```scala val path = "/tmp/path" val json =""" {"a.b":"data"}""" spark.sparkContext .parallelize(json :: Nil) .saveAsTextFile(path) spark.read.json(path).collect() ``` throws an exception as below: ``` Unable to resolve a.b given [a.b]; org.apache.spark.sql.AnalysisException: Unable to resolve a.b given [a.b]; at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134) at scala.Option.getOrElse(Option.scala:121) ``` This problem was introduced in https://github.com/apache/spark/commit/17eec0a71ba8713c559d641e3f43a1be726b037c#diff-27c76f96a7b2733ecfd6f46a1716e153R121 When extracting the data columns, it does not count that it can contains dots in field names. Actually, it seems the fields name are not expected as quoted when defining schema. So, It not have to consider whether this is wrapped with quotes because the actual schema (inferred or user-given schema) would not have the quotes for fields. For example, this throws an exception. (**Loading JSON from RDD is fine**) ```scala val json =""" {"a.b":"data"}""" val rdd = spark.sparkContext.parallelize(json :: Nil) spark.read.schema(StructType(Seq(StructField("`a.b`", StringType, true)))) .json(rdd).select("`a.b`").printSchema() ``` as below: ``` cannot resolve '```a.b```' given input columns: [`a.b`]; org.apache.spark.sql.AnalysisException: cannot resolve '```a.b```' given input columns: [`a.b`]; at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) ``` ## How was this patch tested? Unit tests in `FileSourceStrategySuite`. Author: hyukjinkwon <gurwls...@gmail.com> Closes #14339 from HyukjinKwon/SPARK-16698-regression. (cherry picked from commit 79826f3c7936ee27457d030c7115d5cac69befd7) Signed-off-by: Cheng Lian <l...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b52e639a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b52e639a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b52e639a Branch: refs/heads/branch-2.0 Commit: b52e639a84a851e0b9159a0f6dae92664425042e Parents: fcbb7f6 Author: hyukjinkwon <gurwls...@gmail.com> Authored: Mon Jul 25 22:51:30 2016 +0800 Committer: Cheng Lian <l...@databricks.com> Committed: Mon Jul 25 22:51:56 2016 +0800 ---------------------------------------------------------------------- .../sql/catalyst/plans/logical/LogicalPlan.scala | 2 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 15 +++++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/b52e639a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---------------------------------------------------------------------- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala index d0b2b5d..6d77991 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala @@ -127,7 +127,7 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { */ def resolve(schema: StructType, resolver: Resolver): Seq[Attribute] = { schema.map { field => - resolveQuoted(field.name, resolver).map { + resolve(field.name :: Nil, resolver).map { case a: AttributeReference => a case other => sys.error(s"can not handle nested schema yet... plan $this") }.getOrElse { http://git-wip-us.apache.org/repos/asf/spark/blob/b52e639a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---------------------------------------------------------------------- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index f1a2410..be84dff 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -2946,4 +2946,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { """.stripMargin), Nil) } } + + test("SPARK-16674: field names containing dots for both fields and partitioned fields") { + withTempPath { path => + val data = (1 to 10).map(i => (i, s"data-$i", i % 2, if ((i % 2) == 0) "a" else "b")) + .toDF("col.1", "col.2", "part.col1", "part.col2") + data.write + .format("parquet") + .partitionBy("part.col1", "part.col2") + .save(path.getCanonicalPath) + val readBack = spark.read.format("parquet").load(path.getCanonicalPath) + checkAnswer( + readBack.selectExpr("`part.col1`", "`col.1`"), + data.selectExpr("`part.col1`", "`col.1`")) + } + } } --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org