[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22563


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22563#discussion_r220846789
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -155,37 +155,35 @@ case class InSubquery(values: Seq[Expression], query: ListQuery)

   override def checkInputDataTypes(): TypeCheckResult = {
-    val mismatchOpt = !DataType.equalsStructurally(query.dataType, value.dataType,
-      ignoreNullability = true)
-    if (mismatchOpt) {
-      if (values.length != query.childOutputs.length) {
-        TypeCheckResult.TypeCheckFailure(
-          s"""
-             |The number of columns in the left hand side of an IN subquery does not match the
-             |number of columns in the output of subquery.
-             |#columns in left hand side: ${values.length}.
-             |#columns in right hand side: ${query.childOutputs.length}.
-             |Left side columns:
-             |[${values.map(_.sql).mkString(", ")}].
-             |Right side columns:
-             |[${query.childOutputs.map(_.sql).mkString(", ")}].""".stripMargin)
-      } else {
-        val mismatchedColumns = values.zip(query.childOutputs).flatMap {
-          case (l, r) if l.dataType != r.dataType =>
-            Seq(s"(${l.sql}:${l.dataType.catalogString}, ${r.sql}:${r.dataType.catalogString})")
-          case _ => None
-        }
-        TypeCheckResult.TypeCheckFailure(
-          s"""
-             |The data type of one or more elements in the left hand side of an IN subquery
-             |is not compatible with the data type of the output of the subquery
-             |Mismatched columns:
-             |[${mismatchedColumns.mkString(", ")}]
-             |Left side:
-             |[${values.map(_.dataType.catalogString).mkString(", ")}].
-             |Right side:
-             |[${query.childOutputs.map(_.dataType.catalogString).mkString(", ")}].""".stripMargin)
+    if (values.length != query.childOutputs.length) {
--- End diff --

Yes, thanks.
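For context, the guard removed in the diff above relied on a structural type comparison that ignores nullability (`DataType.equalsStructurally(..., ignoreNullability = true)`). A minimal, self-contained sketch of that idea follows; the `DType`/`StructField`/`StructType` names are simplified stand-ins, not Spark's actual catalyst classes:

```scala
// Sketch of a structural type comparison that ignores nullability (and field
// names), in the spirit of DataType.equalsStructurally with ignoreNullability.
sealed trait DType
case object IntType extends DType
case object StringType extends DType
final case class StructField(name: String, dataType: DType, nullable: Boolean)
final case class StructType(fields: Seq[StructField]) extends DType

def equalsStructurally(a: DType, b: DType): Boolean = (a, b) match {
  case (StructType(fa), StructType(fb)) =>
    // Same arity and element types; field names and nullability do not matter.
    fa.length == fb.length &&
      fa.zip(fb).forall { case (x, y) => equalsStructurally(x.dataType, y.dataType) }
  case _ => a == b
}
```

Because this comparison only reports a boolean, the surrounding code had to re-derive *why* it failed (arity vs. element types), which is what the follow-up refactoring simplifies.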


---




[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22563#discussion_r220830491
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
(quotes the same `checkInputDataTypes` hunk as above)
--- End diff --

That's not how it's designed to work. The reason we have `CheckAnalysis` is to do as much analysis work as we can and report the most precise error at the end. Performance is not a big concern in the analyzer; giving users the best error message matters more. Although in this case the error message is not a problem, it's better to follow the designed approach.


---




[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22563#discussion_r220826272
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
(quotes the same `checkInputDataTypes` hunk as above)
--- End diff --

The reason I added the check in `ResolveSubquery` was to fail fast, without wasting time on the rest of the analysis before the data types are checked. With that in place, the check in `checkInputDataTypes` was effectively redundant, but I left it there for safety.

I agree with unifying them and I am fine with this change, but I'd prefer to keep the check in `ResolveSubquery` so we fail immediately (it is not a big deal anyway). What do you think?


---




[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22563#discussion_r220783617
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
(quotes the same `checkInputDataTypes` hunk as above)
--- End diff --

To keep the behavior the same as before, we should do the length check first, then the type check.
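The ordering requested here can be sketched as follows. This is a self-contained, simplified model of the unified check; `Column`, `CheckResult`, `CheckSuccess`, and `CheckFailure` are hypothetical stand-ins for Spark's `Expression`/`TypeCheckResult` machinery, not the actual catalyst API:

```scala
// Simplified sketch: report a column-count mismatch before comparing any
// element types, so the arity error always wins over the type error.
sealed trait CheckResult
case object CheckSuccess extends CheckResult
final case class CheckFailure(message: String) extends CheckResult

final case class Column(sql: String, dataType: String)

def checkInSubquery(values: Seq[Column], childOutputs: Seq[Column]): CheckResult = {
  if (values.length != childOutputs.length) {
    // Length check first, as requested in the review comment above.
    CheckFailure(
      s"The number of columns in the left hand side of an IN subquery (${values.length}) " +
        s"does not match the number of columns in the output of the subquery (${childOutputs.length}).")
  } else {
    // Only pairwise type comparison remains once the lengths agree.
    val mismatched = values.zip(childOutputs).collect {
      case (l, r) if l.dataType != r.dataType =>
        s"(${l.sql}:${l.dataType}, ${r.sql}:${r.dataType})"
    }
    if (mismatched.isEmpty) CheckSuccess
    else CheckFailure(s"Mismatched columns: [${mismatched.mkString(", ")}]")
  }
}
```

With this shape there is no outer `equalsStructurally` guard: the two branches themselves decide success or failure, which is what lets the duplicated check in `ResolveSubquery` be removed.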


---




[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-26 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/22563

[SPARK-24341][SQL][followup] remove duplicated error checking

## What changes were proposed in this pull request?

There are two places where we check for a problematic `InSubquery`: the rule `ResolveSubquery` and `InSubquery.checkInputDataTypes`. We should unify them.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark followup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22563.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22563


commit d810e0dfec9bc8f182509ccf75623f2e92e95290
Author: Wenchen Fan 
Date:   2018-09-27T03:16:17Z

remove duplicated error checking




---
