[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn

2022-06-15 Thread GitBox


viirya commented on code in PR #36809:
URL: https://github.com/apache/spark/pull/36809#discussion_r898653220


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##
@@ -50,8 +50,6 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
 
   val DATA_TYPE_MISMATCH_ERROR = TreeNodeTag[Boolean]("dataTypeMismatchError")
 
-  val DATA_TYPE_MISMATCH_ERROR_MESSAGE = 
TreeNodeTag[String]("dataTypeMismatchError")
-

Review Comment:
   just to make sure this isn't related to `TempResolvedColumn`, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn

2022-06-15 Thread GitBox


viirya commented on code in PR #36809:
URL: https://github.com/apache/spark/pull/36809#discussion_r898652235


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -4345,32 +4340,42 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
 }
 
 /**
- * Removes all [[TempResolvedColumn]]s in the query plan. This is the last 
resort, in case some
- * rules in the main resolution batch miss to remove [[TempResolvedColumn]]s. 
We should run this
- * rule right after the main resolution batch.
+ * The rule `ResolveAggregationFunctions` in the main resolution batch creates
+ * [[TempResolvedColumn]] in filter conditions and sort expressions to hold 
the temporarily resolved
+ * column with `agg.child`. When filter conditions or sort expressions are 
resolved,
+ * `ResolveAggregationFunctions` will replace [[TempResolvedColumn]], to 
[[AttributeReference]] if
+ * it's inside aggregate functions or group expressions, or to 
[[UnresolvedAttribute]] otherwise,
+ * hoping other rules can resolve it.
+ *
+ * This rule runs after the main resolution batch, and can still hit 
[[TempResolvedColumn]] if
+ * filter conditions or sort expressions are not resolved. When this happens, 
there is no point to
+ * turn [[TempResolvedColumn]] to [[UnresolvedAttribute]], as we can't resolve 
the column
+ * differently, and query will fail. This rule strips all 
[[TempResolvedColumn]]s in Filter/Sort and
+ * turns them to `AttributeReference` so that the error message can tell users 
why the filter

Review Comment:
   nit:
   
   ```suggestion
* turns them to [[AttributeReference]] so that the error message can tell 
users why the filter
   ```
   
   To be consistent in the whole comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn

2022-06-15 Thread GitBox


viirya commented on code in PR #36809:
URL: https://github.com/apache/spark/pull/36809#discussion_r898640889


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -4345,32 +4340,42 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
 }
 
 /**
- * Removes all [[TempResolvedColumn]]s in the query plan. This is the last 
resort, in case some
- * rules in the main resolution batch miss to remove [[TempResolvedColumn]]s. 
We should run this
- * rule right after the main resolution batch.
+ * The rule `ResolveAggregationFunctions` in the main resolution batch creates
+ * [[TempResolvedColumn]] in filter conditions and sort expressions to hold 
the temporarily resolved
+ * column with `agg.child`. When filter conditions or sort expressions are 
resolved,
+ * `ResolveAggregationFunctions` will turn [[TempResolvedColumn]] to 
[[AttributeReference]] if it's
+ * inside aggregate functions or group expressions, or turn it to 
[[UnresolvedAttribute]] otherwise,
+ * hoping other rules can resolve it.
+ *
+ * This rule runs after the main resolution batch, and can still hit 
[[TempResolvedColumn]] if
+ * filter conditions or sort expressions are not resolved. When this happens, 
there is no point to

Review Comment:
   So you mean if `TempResolvedColumn` is inside filter conditions or sort 
expressions and these expressions are resolved, `TempResolvedColumn` will be 
replaced with `AttributeReference`.
   
   
   If the filter conditions or sort expressions are not resolved, 
`TempResolvedColumn` are remaining here.
   
   Is it correct?
   
   What `TempResolvedColumn` will be replaced with `UnresolvedAttribute`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn

2022-06-15 Thread GitBox


viirya commented on code in PR #36809:
URL: https://github.com/apache/spark/pull/36809#discussion_r898623736


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -4345,32 +4340,42 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
 }
 
 /**
- * Removes all [[TempResolvedColumn]]s in the query plan. This is the last 
resort, in case some
- * rules in the main resolution batch miss to remove [[TempResolvedColumn]]s. 
We should run this
- * rule right after the main resolution batch.
+ * The rule `ResolveAggregationFunctions` in the main resolution batch creates
+ * [[TempResolvedColumn]] in filter conditions and sort expressions to hold 
the temporarily resolved
+ * column with `agg.child`. When filter conditions or sort expressions are 
resolved,
+ * `ResolveAggregationFunctions` will turn [[TempResolvedColumn]] to 
[[AttributeReference]] if it's
+ * inside aggregate functions or group expressions, or turn it to 
[[UnresolvedAttribute]] otherwise,
+ * hoping other rules can resolve it.
+ *
+ * This rule runs after the main resolution batch, and can still hit 
[[TempResolvedColumn]] if
+ * filter conditions or sort expressions are not resolved. When this happens, 
there is no point to

Review Comment:
   It reads like a contradiction here. `ResolveAggregationFunctions` will turn 
`TempResolvedColumn` to `UnresolvedAttribute` and hope other rules can resolve 
it.
   
   So why `RemoveTempResolvedColumn` can hit `TempResolvedColumn`? If it is not 
resolved, isn't it turned to `UnresolvedAttribute`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org