[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn
viirya commented on code in PR #36809: URL: https://github.com/apache/spark/pull/36809#discussion_r898653220 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -50,8 +50,6 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { val DATA_TYPE_MISMATCH_ERROR = TreeNodeTag[Boolean]("dataTypeMismatchError") - val DATA_TYPE_MISMATCH_ERROR_MESSAGE = TreeNodeTag[String]("dataTypeMismatchError") - Review Comment: just to make sure this isn't related to `TempResolvedColumn`, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn
viirya commented on code in PR #36809: URL: https://github.com/apache/spark/pull/36809#discussion_r898652235 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -4345,32 +4340,42 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { } /** - * Removes all [[TempResolvedColumn]]s in the query plan. This is the last resort, in case some - * rules in the main resolution batch miss to remove [[TempResolvedColumn]]s. We should run this - * rule right after the main resolution batch. + * The rule `ResolveAggregationFunctions` in the main resolution batch creates + * [[TempResolvedColumn]] in filter conditions and sort expressions to hold the temporarily resolved + * column with `agg.child`. When filter conditions or sort expressions are resolved, + * `ResolveAggregationFunctions` will replace [[TempResolvedColumn]], to [[AttributeReference]] if + * it's inside aggregate functions or group expressions, or to [[UnresolvedAttribute]] otherwise, + * hoping other rules can resolve it. + * + * This rule runs after the main resolution batch, and can still hit [[TempResolvedColumn]] if + * filter conditions or sort expressions are not resolved. When this happens, there is no point to + * turn [[TempResolvedColumn]] to [[UnresolvedAttribute]], as we can't resolve the column + * differently, and query will fail. This rule strips all [[TempResolvedColumn]]s in Filter/Sort and + * turns them to `AttributeReference` so that the error message can tell users why the filter Review Comment: nit: ```suggestion * turns them to [[AttributeReference]] so that the error message can tell users why the filter ``` To be consistent in the whole comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn
viirya commented on code in PR #36809: URL: https://github.com/apache/spark/pull/36809#discussion_r898640889 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -4345,32 +4340,42 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { } /** - * Removes all [[TempResolvedColumn]]s in the query plan. This is the last resort, in case some - * rules in the main resolution batch miss to remove [[TempResolvedColumn]]s. We should run this - * rule right after the main resolution batch. + * The rule `ResolveAggregationFunctions` in the main resolution batch creates + * [[TempResolvedColumn]] in filter conditions and sort expressions to hold the temporarily resolved + * column with `agg.child`. When filter conditions or sort expressions are resolved, + * `ResolveAggregationFunctions` will turn [[TempResolvedColumn]] to [[AttributeReference]] if it's + * inside aggregate functions or group expressions, or turn it to [[UnresolvedAttribute]] otherwise, + * hoping other rules can resolve it. + * + * This rule runs after the main resolution batch, and can still hit [[TempResolvedColumn]] if + * filter conditions or sort expressions are not resolved. When this happens, there is no point to Review Comment: So you mean if `TempResolvedColumn` is inside filter conditions or sort expressions and these expressions are resolved, `TempResolvedColumn` will be replaced with `AttributeReference`. If the filter conditions or sort expressions are not resolved, `TempResolvedColumn` are remaining here. Is it correct? What `TempResolvedColumn` will be replaced with `UnresolvedAttribute`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a diff in pull request #36809: [SPARK-39488][SQL] Simplify the error handling of TempResolvedColumn
viirya commented on code in PR #36809: URL: https://github.com/apache/spark/pull/36809#discussion_r898623736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -4345,32 +4340,42 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { } /** - * Removes all [[TempResolvedColumn]]s in the query plan. This is the last resort, in case some - * rules in the main resolution batch miss to remove [[TempResolvedColumn]]s. We should run this - * rule right after the main resolution batch. + * The rule `ResolveAggregationFunctions` in the main resolution batch creates + * [[TempResolvedColumn]] in filter conditions and sort expressions to hold the temporarily resolved + * column with `agg.child`. When filter conditions or sort expressions are resolved, + * `ResolveAggregationFunctions` will turn [[TempResolvedColumn]] to [[AttributeReference]] if it's + * inside aggregate functions or group expressions, or turn it to [[UnresolvedAttribute]] otherwise, + * hoping other rules can resolve it. + * + * This rule runs after the main resolution batch, and can still hit [[TempResolvedColumn]] if + * filter conditions or sort expressions are not resolved. When this happens, there is no point to Review Comment: It reads like a contradiction here. `ResolveAggregationFunctions` will turn `TempResolvedColumn` to `UnresolvedAttribute` and hope other rules can resolve it. So why `RemoveTempResolvedColumn` can hit `TempResolvedColumn`? If it is not resolved, isn't it turned to `UnresolvedAttribute`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org