[jira] [Updated] (SPARK-48871) Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in CheckAnalysis
[ https://issues.apache.org/jira/browse/SPARK-48871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carmen Kwan updated SPARK-48871:
--------------------------------

Description:

I encountered the following exception when attempting to use a non-deterministic UDF in my query.

{code:java}
[info] org.apache.spark.sql.catalyst.ExtendedAnalysisException: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression, but the actual expression is "[some expression]".; line 2 pos 1
[info] [some logical plan]
[info]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:761)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:182)
[info]   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:182)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:164)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:160)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:150)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
[info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
[info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[info]   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
[info]   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
[info]   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
[info]   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
{code}

The non-deterministic expression can be safely allowed for my custom LogicalPlan, but it is rejected in the checkAnalysis phase. The CheckAnalysis rule is so strict that reasonable uses of non-deterministic expressions are also rejected.

To fix this, we could add a trait that logical plans can extend, implementing a method that decides whether non-deterministic expressions are permitted for the operator, and consult that method in checkAnalysis. This delegates the validation to frameworks that extend Spark, so more operators than the few explicitly named logical plans (e.g. `Project`, `Filter`) can be allow-listed.
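The proposed extension point could look roughly like the sketch below. This is a self-contained model, not Spark's actual API: `AllowsNonDeterministicExpressions`, `CustomMerge`, and the tiny `LogicalPlan`/`Expr` stubs are all hypothetical names standing in for the corresponding Catalyst classes.

```scala
// All names below are illustrative stubs for the SPARK-48871 proposal,
// not real Spark API.
case class Expr(deterministic: Boolean)

trait LogicalPlan { def expressions: Seq[Expr] }

// Proposed trait: an operator opts in to non-deterministic expressions.
trait AllowsNonDeterministicExpressions { self: LogicalPlan =>
  def allowNonDeterministicExpressions: Boolean
}

// A built-in operator that CheckAnalysis already allow-lists by name.
case class Project(expressions: Seq[Expr]) extends LogicalPlan

// An operator from a framework extending Spark; it vouches for itself,
// so no change to Spark's hard-coded list is needed.
case class CustomMerge(expressions: Seq[Expr]) extends LogicalPlan
    with AllowsNonDeterministicExpressions {
  override def allowNonDeterministicExpressions: Boolean = true
}

// An operator with no opt-in: still rejected, as today.
case class UnlistedOp(expressions: Seq[Expr]) extends LogicalPlan

// Sketch of the CheckAnalysis branch consulting the trait before failing.
def checkNonDeterministic(plan: LogicalPlan): Either[String, Unit] = plan match {
  case p: AllowsNonDeterministicExpressions if p.allowNonDeterministicExpressions =>
    Right(())
  case p if p.expressions.exists(!_.deterministic) && !p.isInstanceOf[Project] =>
    Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS")
  case _ => Right(())
}
```

With this shape, `checkNonDeterministic(CustomMerge(Seq(Expr(deterministic = false))))` passes, while the same non-deterministic expression inside `UnlistedOp` still fails analysis.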
[jira] [Created] (SPARK-48871) Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in CheckAnalysis
Carmen Kwan created SPARK-48871:
-----------------------------------

             Summary: Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in CheckAnalysis
                 Key: SPARK-48871
                 URL: https://issues.apache.org/jira/browse/SPARK-48871
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0, 3.5.2, 3.4.4
            Reporter: Carmen Kwan

I encountered the following exception when attempting to use a non-deterministic UDF in my query. The non-deterministic expression can be safely allowed for my custom LogicalPlan, but it is rejected in the checkAnalysis phase. The CheckAnalysis rule is so strict that reasonable uses of non-deterministic expressions are also rejected.

To fix this, we could add a trait that logical plans can extend, implementing a method that decides whether non-deterministic expressions are permitted for the operator, and consult that method in checkAnalysis. This delegates the validation to frameworks that extend Spark, so more operators than the few explicitly named logical plans (e.g. `Project`, `Filter`) can be allow-listed.

{code:java}
[info] org.apache.spark.sql.catalyst.ExtendedAnalysisException: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression, but the actual expression is "[some expression]".; line 2 pos 1;
[info] [some logical plan]
[info]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:761)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:182)
[info]   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:182)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:164)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:160)
[info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:150)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
[info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
[info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[info]   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
[info]   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
[info]   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
[info]   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48473) CheckAnalysis should be more flexible
[ https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carmen Kwan resolved SPARK-48473.
Fix Version/s: (was: 4.0.0)
Resolution: Abandoned

> CheckAnalysis should be more flexible
>
> Key: SPARK-48473
> URL: https://issues.apache.org/jira/browse/SPARK-48473
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Carmen Kwan
> Priority: Major
>
> CheckAnalysis should be more flexible.
[jira] [Updated] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
[ https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carmen Kwan updated SPARK-48473:
Fix Version/s: 4.0.0, 3.5.2

> Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
>
> Key: SPARK-48473
> URL: https://issues.apache.org/jira/browse/SPARK-48473
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Carmen Kwan
> Priority: Major
> Fix For: 4.0.0, 3.5.2
>
> CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when there is a non-deterministic expression within an operator that is not allow-listed in the case match check [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
> {code:java}
> case o if o.expressions.exists(!_.deterministic) &&
>   !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>   !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
>   !o.isInstanceOf[Expand] &&
>   !o.isInstanceOf[Generate] &&
>   // Lateral join is checked in checkSubqueryExpression.
>   !o.isInstanceOf[LateralJoin] =>
>   // The rule above is used to check Aggregate operator.
>   o.failAnalysis(
>     errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
>     messageParameters = Map("sqlExprs" -> o.expressions.map(toSQLExpr(_)).mkString(", "))
>   ){code}
> It would be nice to add a generic, allow-listed trait/class to this case match, so that when new non-deterministic expressions that live in other repositories need to be allow-listed, we don't have to wait for a new Spark release.
> For example, in Delta Lake, we want to allow-list a specific non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause operator as part of Delta's [Identity Column implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner overall to add an abstract generic class there than to put Delta-specific logic into this CheckAnalysis rule.
> It would be beneficial to backport this to Spark 3.5 so that we don't need to wait for Spark 4 to benefit from this low-risk change.
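The extensible trait proposed above can be sketched in miniature as follows. This is a self-contained illustration only: the names (`Operator`, `SupportsNonDeterministicExpression`, `checkOperator`, `CustomMergeClause`) are assumptions made for the sketch, not the exact Spark API. The point is that an operator defined in an external repository can opt out of the strict check by mixing in the trait, without CheckAnalysis having to name it.

```scala
// Minimal stand-in for a logical operator; only the property the
// check cares about is modeled.
trait Operator {
  // true when every expression in the operator is deterministic
  def expressionsDeterministic: Boolean
}

// Operators extending this trait decide for themselves whether
// non-deterministic expressions are acceptable, so frameworks that
// extend Spark (e.g. Delta Lake) can opt in without patching Spark.
trait SupportsNonDeterministicExpression { self: Operator =>
  def allowNonDeterministicExpression: Boolean
}

// Simplified stand-in for the CheckAnalysis validation: the trait
// check comes before the strict catch-all case.
def checkOperator(op: Operator): Either[String, Unit] = op match {
  case s: SupportsNonDeterministicExpression if s.allowNonDeterministicExpression =>
    Right(()) // operator opted in; skip the strict check
  case o if !o.expressionsDeterministic =>
    Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS")
  case _ =>
    Right(())
}

// A hypothetical custom operator from an external framework, opting in:
case class CustomMergeClause(expressionsDeterministic: Boolean)
    extends Operator with SupportsNonDeterministicExpression {
  override def allowNonDeterministicExpression: Boolean = true
}

// An ordinary operator that stays subject to the strict check:
case class PlainOperator(expressionsDeterministic: Boolean) extends Operator
```

With this shape, `checkOperator(CustomMergeClause(expressionsDeterministic = false))` passes while `checkOperator(PlainOperator(false))` still fails analysis, which matches the delegation described in the issue.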
[jira] [Created] (SPARK-48824) Add SQL syntax in create/replace table to create an identity column
Carmen Kwan created SPARK-48824:
---
Summary: Add SQL syntax in create/replace table to create an identity column
Key: SPARK-48824
URL: https://issues.apache.org/jira/browse/SPARK-48824
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 4.0.0
Reporter: Carmen Kwan

Add SQL support for creating identity columns. Identity Column syntax should be flexible such that users can specify:
* whether identity values are always generated by the system
* (optionally) the starting value of the column
* (optionally) the increment/step of the column

The SQL syntax support should also allow flexible ordering of the increment and starting values, as both variants are used in the wild by other systems (e.g. [PostgreSQL|https://www.postgresql.org/docs/current/sql-createsequence.html], [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-SEQUENCE.html#GUID-E9C78A8C-615A-4757-B2A8-5E6EFB130571]). That is, we should allow both
{code:java}
START WITH INCREMENT BY
{code}
and
{code:java}
INCREMENT BY START WITH
{code}
For example, we should be able to define
{code:java}
CREATE TABLE default.example (
  id LONG GENERATED ALWAYS AS IDENTITY,
  id2 LONG GENERATED BY DEFAULT START WITH 0 INCREMENT BY -10,
  id3 LONG GENERATED ALWAYS AS IDENTITY INCREMENT BY 2 START WITH -8,
  value LONG
)
{code}
This will enable defining identity columns in Spark SQL for data sources that support it.
[jira] [Updated] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
[ https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carmen Kwan updated SPARK-48473:
Component/s: SQL (was: Spark Core)

> Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
>
> Key: SPARK-48473
> URL: https://issues.apache.org/jira/browse/SPARK-48473
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Carmen Kwan
> Priority: Major
>
> CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when there is a non-deterministic expression within an operator that is not allow-listed in the case match check [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
> {code:java}
> case o if o.expressions.exists(!_.deterministic) &&
>   !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>   !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
>   !o.isInstanceOf[Expand] &&
>   !o.isInstanceOf[Generate] &&
>   // Lateral join is checked in checkSubqueryExpression.
>   !o.isInstanceOf[LateralJoin] =>
>   // The rule above is used to check Aggregate operator.
>   o.failAnalysis(
>     errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
>     messageParameters = Map("sqlExprs" -> o.expressions.map(toSQLExpr(_)).mkString(", "))
>   ){code}
> It would be nice to add a generic, allow-listed trait/class to this case match, so that when new non-deterministic expressions that live in other repositories need to be allow-listed, we don't have to wait for a new Spark release.
> For example, in Delta Lake, we want to allow-list a specific non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause operator as part of Delta's [Identity Column implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner overall to add an abstract generic class there than to put Delta-specific logic into this CheckAnalysis rule.
> It would be beneficial to backport this to Spark 3.5 so that we don't need to wait for Spark 4 to benefit from this low-risk change.
[jira] [Created] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
Carmen Kwan created SPARK-48473:
---
Summary: Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
Key: SPARK-48473
URL: https://issues.apache.org/jira/browse/SPARK-48473
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0, 3.5.2
Reporter: Carmen Kwan

CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when there is a non-deterministic expression within an operator that is not allow-listed in the case match check [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
{code:java}
case o if o.expressions.exists(!_.deterministic) &&
  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
  !o.isInstanceOf[Expand] &&
  !o.isInstanceOf[Generate] &&
  // Lateral join is checked in checkSubqueryExpression.
  !o.isInstanceOf[LateralJoin] =>
  // The rule above is used to check Aggregate operator.
  o.failAnalysis(
    errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
    messageParameters = Map("sqlExprs" -> o.expressions.map(toSQLExpr(_)).mkString(", "))
  ){code}
It would be nice to add a generic, allow-listed trait/class to this case match, so that when new non-deterministic expressions that live in other repositories need to be allow-listed, we don't have to wait for a new Spark release. For example, in Delta Lake, we want to allow-list a specific non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause operator as part of Delta's [Identity Column implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner overall to add an abstract generic class there than to put Delta-specific logic into this CheckAnalysis rule. It would be beneficial to backport this to Spark 3.5 so that we don't need to wait for Spark 4 to benefit from this low-risk change.
[jira] [Created] (SPARK-40315) Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects
Carmen Kwan created SPARK-40315:
---
Summary: Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects
Key: SPARK-40315
URL: https://issues.apache.org/jira/browse/SPARK-40315
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.2
Reporter: Carmen Kwan

There is no explicit `hashCode()` override for the `ArrayBasedMapData` class. As a result, the `hashCode()` computed for `ArrayBasedMapData` can differ between two equal objects (objects with equal keys and values). This error is non-deterministic and hard to reproduce, since we don't control the default `hashCode()` implementation. We should override `hashCode` so that it behaves exactly as we expect. We should also add an explicit `equals()` function for consistency with how `Literals` check for equality of `ArrayBasedMapData`.
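The fix described above, keeping `hashCode()` consistent with `equals()`, can be sketched with a simplified stand-in for the class. The name `ArrayMapData` and its fields are illustrative assumptions, not the actual `ArrayBasedMapData` implementation; the sketch only shows why deriving the hash from exactly the fields that `equals` compares makes equal objects hash identically.

```scala
// Simplified stand-in for ArrayBasedMapData: a map-like value backed
// by parallel key/value sequences. Names are illustrative only.
class ArrayMapData(val keys: Seq[Any], val values: Seq[Any]) {

  // equals compares the backing sequences element-wise, mirroring how
  // equality of the underlying data is intended to behave.
  override def equals(other: Any): Boolean = other match {
    case o: ArrayMapData => keys == o.keys && values == o.values
    case _               => false
  }

  // Derive hashCode from exactly the fields equals compares, so two
  // equal objects always produce the same hash. Without this override,
  // the default identity-based hashCode differs per instance, which is
  // the non-deterministic behavior the issue reports.
  override def hashCode(): Int = 31 * keys.hashCode() + values.hashCode()
}
```

With both overrides in place, two instances built from equal keys and values compare equal and hash identically, satisfying the standard `equals`/`hashCode` contract.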