[jira] [Updated] (SPARK-39893) Remove Aggregate if it is group only and all grouping and aggregate expressions are foldable
[ https://issues.apache.org/jira/browse/SPARK-39893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Kun updated SPARK-39893:
Description:
If all groupingExpressions and aggregateExpressions in an aggregate are foldable, we can remove this aggregate.

For example, for the query:
{code:java}
SELECT distinct 1001 as id, cast('2022-06-03' as date) AS DT FROM testData
{code}
the grouping expressions are *[1001, 2022-06-03]* and the aggregate expressions are *[1001 AS id#274, 2022-06-03 AS DT#275]*, so we can skip scanning table testData and remove the aggregate operation.

Before this PR:
{code:java}
Aggregate [1001, 2022-06-03], [1001 AS id#274, 2022-06-03 AS DT#275], Statistics(sizeInBytes=16.0 EiB)
+- SerializeFromObject, Statistics(sizeInBytes=8.0 EiB)
   +- ExternalRDD [obj#12], Statistics(sizeInBytes=8.0 EiB)
{code}

After this PR:
{code:java}
Project [1001 AS id#218, 2022-06-03 AS DT#219], Statistics(sizeInBytes=2.0 B)
+- OneRowRelation, Statistics(sizeInBytes=1.0 B)
{code}

was:
If all groupingExpressions and aggregateExpressions in an aggregate are foldable, we can remove this aggregate.

For example, for the query:
{code:java}
SELECT distinct 1001 as id, cast('2022-06-03' as date) AS DT FROM testData
{code}
the grouping expressions are *[1001, 2022-06-03]* and the aggregate expressions are *[1001 AS id#274, 2022-06-03 AS DT#275]*, so we can skip scanning table testData and remove the aggregate operation.

> Remove Aggregate if it is group only and all grouping and aggregate
> expressions are foldable
>
> Key: SPARK-39893
> URL: https://issues.apache.org/jira/browse/SPARK-39893
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Wan Kun
> Priority: Major
>
> If all groupingExpressions and aggregateExpressions in an aggregate are
> foldable, we can remove this aggregate.
> For example, for the query:
> {code:java}
> SELECT distinct 1001 as id, cast('2022-06-03' as date) AS DT FROM testData
> {code}
> the grouping expressions are *[1001, 2022-06-03]* and the aggregate
> expressions are *[1001 AS id#274, 2022-06-03 AS DT#275]*, so we can skip
> scanning table testData and remove the aggregate operation.
> Before this PR:
> {code:java}
> Aggregate [1001, 2022-06-03], [1001 AS id#274, 2022-06-03 AS DT#275], Statistics(sizeInBytes=16.0 EiB)
> +- SerializeFromObject, Statistics(sizeInBytes=8.0 EiB)
>    +- ExternalRDD [obj#12], Statistics(sizeInBytes=8.0 EiB)
> {code}
> After this PR:
> {code:java}
> Project [1001 AS id#218, 2022-06-03 AS DT#219], Statistics(sizeInBytes=2.0 B)
> +- OneRowRelation, Statistics(sizeInBytes=1.0 B)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
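The rewrite the issue proposes can be sketched as an optimizer rule: when a group-only Aggregate has only foldable grouping and aggregate expressions, its output is a single constant row, so the child scan can be replaced by a projection over a one-row relation. The following is a minimal, hypothetical Python sketch (the class names are illustrative, not Spark's actual Catalyst classes, and it ignores edge cases such as empty inputs):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Expr:
    value: object          # the constant this expression folds to
    foldable: bool = True  # literals (and aliases of literals) are foldable

@dataclass
class Aggregate:
    grouping: List[Expr]
    aggregates: List[Expr]
    child: object  # any child plan, e.g. a table scan

@dataclass
class Project:
    projections: List[Expr]
    child: object

ONE_ROW_RELATION = "OneRowRelation"

def remove_foldable_aggregate(plan):
    """If a group-only Aggregate has only foldable expressions, its result is
    one constant row, so the child scan can be skipped entirely."""
    if (isinstance(plan, Aggregate)
            and all(e.foldable for e in plan.grouping)
            and all(e.foldable for e in plan.aggregates)):
        return Project(plan.aggregates, ONE_ROW_RELATION)
    return plan
```

Applied to the example above, an `Aggregate([1001, 2022-06-03], ...)` over a scan of testData becomes a `Project` over `OneRowRelation`, matching the "After this PR" plan.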
[jira] [Created] (SPARK-39893) Remove Aggregate if it is group only and all grouping and aggregate expressions are foldable
Wan Kun created SPARK-39893:
---
Summary: Remove Aggregate if it is group only and all grouping and aggregate expressions are foldable
Key: SPARK-39893
URL: https://issues.apache.org/jira/browse/SPARK-39893
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Wan Kun

If all groupingExpressions and aggregateExpressions in an aggregate are foldable, we can remove this aggregate.

For example, for the query:
{code:java}
SELECT distinct 1001 as id, cast('2022-06-03' as date) AS DT FROM testData
{code}
the grouping expressions are *[1001, 2022-06-03]* and the aggregate expressions are *[1001 AS id#274, 2022-06-03 AS DT#275]*, so we can skip scanning table testData and remove the aggregate operation.
[jira] [Resolved] (SPARK-39834) Include the origin stats and constraints for LogicalRDD if it comes from DataFrame
[ https://issues.apache.org/jira/browse/SPARK-39834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-39834.
--
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 37248
[https://github.com/apache/spark/pull/37248]

> Include the origin stats and constraints for LogicalRDD if it comes from
> DataFrame
>
> Key: SPARK-39834
> URL: https://issues.apache.org/jira/browse/SPARK-39834
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Structured Streaming
> Affects Versions: 3.4.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Fix For: 3.4.0
>
> With SPARK-39748, Spark includes the origin logical plan for LogicalRDD if it
> comes from DataFrame, to carry over stats as well as to provide information
> that could connect two disconnected logical plans into one.
> After introducing that change, we found several issues:
> 1. One of the major use cases for DataFrame.checkpoint is ML, especially
> "iterative algorithms", whose purpose is to "prune" the logical plan. That
> works against the purpose of including the origin logical plan, and we risk
> nested LogicalRDDs that grow the size of the logical plan without bound.
> 2. We leverage the logical plan to carry over stats, but the correct stats
> information is in the optimized plan.
> 3. (Not an issue, but a missing spot) constraints are also something we can
> carry over.
> To address the above issues, it would be better to include stats and
> constraints in LogicalRDD rather than the logical plan.
[jira] [Assigned] (SPARK-39834) Include the origin stats and constraints for LogicalRDD if it comes from DataFrame
[ https://issues.apache.org/jira/browse/SPARK-39834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-39834:
Assignee: Jungtaek Lim

> Include the origin stats and constraints for LogicalRDD if it comes from
> DataFrame
>
> Key: SPARK-39834
> URL: https://issues.apache.org/jira/browse/SPARK-39834
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Structured Streaming
> Affects Versions: 3.4.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
>
> With SPARK-39748, Spark includes the origin logical plan for LogicalRDD if it
> comes from DataFrame, to carry over stats as well as to provide information
> that could connect two disconnected logical plans into one.
> After introducing that change, we found several issues:
> 1. One of the major use cases for DataFrame.checkpoint is ML, especially
> "iterative algorithms", whose purpose is to "prune" the logical plan. That
> works against the purpose of including the origin logical plan, and we risk
> nested LogicalRDDs that grow the size of the logical plan without bound.
> 2. We leverage the logical plan to carry over stats, but the correct stats
> information is in the optimized plan.
> 3. (Not an issue, but a missing spot) constraints are also something we can
> carry over.
> To address the above issues, it would be better to include stats and
> constraints in LogicalRDD rather than the logical plan.
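The design shift described above, storing only the derived stats and constraints on the leaf node instead of the entire origin plan, can be illustrated with a small Python sketch. All names here are hypothetical (this is not Spark's actual LogicalRDD implementation); the point is that copying numbers from the optimized plan once means repeated checkpoints can never nest plans:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass
class Stats:
    size_in_bytes: int
    row_count: Optional[int] = None

@dataclass
class LogicalRDDNode:
    """Leaf node for a checkpointed DataFrame: it carries only the stats and
    constraints derived from the origin *optimized* plan, not the plan itself."""
    rdd_id: int
    stats: Optional[Stats] = None
    constraints: FrozenSet[str] = frozenset()

def checkpoint(optimized_stats, optimized_constraints, rdd_id):
    # Copy stats/constraints from the optimized plan once; the origin plan is
    # then dropped, so iterative checkpointing cannot grow the plan tree.
    return LogicalRDDNode(rdd_id, optimized_stats, frozenset(optimized_constraints))
```

Checkpointing the result of a previous checkpoint just produces another flat leaf node, which is exactly the pruning behavior iterative ML workloads rely on.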
[jira] [Assigned] (SPARK-39892) Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
[ https://issues.apache.org/jira/browse/SPARK-39892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39892:
Assignee: Apache Spark

> Use ArrowType.Decimal(precision, scale, bitWidth) instead of
> ArrowType.Decimal(precision, scale)
>
> Key: SPARK-39892
> URL: https://issues.apache.org/jira/browse/SPARK-39892
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Assignee: Apache Spark
> Priority: Minor
> Fix For: 3.4.0
>
> [warn]
> /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49:
> [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType |
> origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal. |
> version=] constructor Decimal in class Decimal is deprecated
[jira] [Assigned] (SPARK-39892) Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
[ https://issues.apache.org/jira/browse/SPARK-39892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39892:
Assignee: (was: Apache Spark)

> Use ArrowType.Decimal(precision, scale, bitWidth) instead of
> ArrowType.Decimal(precision, scale)
>
> Key: SPARK-39892
> URL: https://issues.apache.org/jira/browse/SPARK-39892
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> [warn]
> /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49:
> [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType |
> origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal. |
> version=] constructor Decimal in class Decimal is deprecated
[jira] [Commented] (SPARK-39892) Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
[ https://issues.apache.org/jira/browse/SPARK-39892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571780#comment-17571780 ]

Apache Spark commented on SPARK-39892:
User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37315

> Use ArrowType.Decimal(precision, scale, bitWidth) instead of
> ArrowType.Decimal(precision, scale)
>
> Key: SPARK-39892
> URL: https://issues.apache.org/jira/browse/SPARK-39892
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> [warn]
> /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49:
> [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType |
> origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal. |
> version=] constructor Decimal in class Decimal is deprecated
[jira] [Commented] (SPARK-39892) Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
[ https://issues.apache.org/jira/browse/SPARK-39892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571779#comment-17571779 ]

Apache Spark commented on SPARK-39892:
User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37315

> Use ArrowType.Decimal(precision, scale, bitWidth) instead of
> ArrowType.Decimal(precision, scale)
>
> Key: SPARK-39892
> URL: https://issues.apache.org/jira/browse/SPARK-39892
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> [warn]
> /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49:
> [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType |
> origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal. |
> version=] constructor Decimal in class Decimal is deprecated
[jira] [Updated] (SPARK-39892) Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)
[ https://issues.apache.org/jira/browse/SPARK-39892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan updated SPARK-39892:
Summary: Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale) (was: Use ArrowType.Decimal(int precision, int scale, int bitWidth) instead of ArrowType.Decimal(int precision, int scale))

> Use ArrowType.Decimal(precision, scale, bitWidth) instead of
> ArrowType.Decimal(precision, scale)
>
> Key: SPARK-39892
> URL: https://issues.apache.org/jira/browse/SPARK-39892
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> [warn]
> /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49:
> [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType |
> origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal. |
> version=] constructor Decimal in class Decimal is deprecated
[jira] [Created] (SPARK-39892) Use ArrowType.Decimal(int precision, int scale, int bitWidth) instead of ArrowType.Decimal(int precision, int scale)
BingKun Pan created SPARK-39892:
---
Summary: Use ArrowType.Decimal(int precision, int scale, int bitWidth) instead of ArrowType.Decimal(int precision, int scale)
Key: SPARK-39892
URL: https://issues.apache.org/jira/browse/SPARK-39892
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: BingKun Pan
Fix For: 3.4.0

[warn]
/home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49:
[deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType |
origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal. |
version=] constructor Decimal in class Decimal is deprecated
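The deprecation warning above comes from calling Arrow's two-argument Decimal constructor, which leaves the bit width implicit; the fix is to pass it explicitly. As a hedged illustration (plain Python, not Spark's Scala `ArrowUtils`, and `to_arrow_decimal` is a hypothetical helper), building the type descriptor with an explicit bit width might look like this:

```python
# 128-bit Arrow decimals can represent up to 38 digits of precision, which
# matches Spark's maximum DecimalType precision.
DECIMAL128_MAX_PRECISION = 38

def to_arrow_decimal(precision, scale):
    """Return an explicit (precision, scale, bitWidth) triple rather than
    relying on the deprecated two-argument constructor's implicit default."""
    if not (1 <= precision <= DECIMAL128_MAX_PRECISION):
        raise ValueError(f"precision out of range for a 128-bit decimal: {precision}")
    return ("decimal", precision, scale, 128)
```

The design point is simply that the bit width becomes part of the call site, so a future switch to 256-bit decimals is an explicit, reviewable change rather than a silent default.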
[jira] [Commented] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571768#comment-17571768 ]

Ivan Sadikov commented on SPARK-39833:
Interesting, I will take a look.

> Filtered parquet data frame count() and show() produce inconsistent results
> when spark.sql.parquet.filterPushdown is true
>
> Key: SPARK-39833
> URL: https://issues.apache.org/jira/browse/SPARK-39833
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Reporter: Michael Allman
> Priority: Major
> Labels: correctness
>
> One of our data scientists discovered a problem wherein a data frame
> `.show()` call printed non-empty results, but `.count()` printed 0. I've
> narrowed the issue down to a small, reproducible test case which exhibits
> this aberrant behavior. In pyspark, run the following code:
> {code:python}
> from pyspark.sql.types import *
> parquet_pushdown_bug_df = spark.createDataFrame([{"COL0": int(0)}],
>     schema=StructType(fields=[StructField("COL0", IntegerType(), True)]))
> parquet_pushdown_bug_df.repartition(1).write.mode("overwrite").parquet("parquet_pushdown_bug/col0=0/parquet_pushdown_bug.parquet")
> reread_parquet_pushdown_bug_df = spark.read.parquet("parquet_pushdown_bug")
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> print(reread_parquet_pushdown_bug_df.filter("col0 = 0").count())
> {code}
> In my usage, this prints a data frame with 1 row but a count of 0. However,
> disabling `spark.sql.parquet.filterPushdown` produces consistent results:
> {code:python}
> spark.conf.set("spark.sql.parquet.filterPushdown", False)
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> reread_parquet_pushdown_bug_df.filter("col0 = 0").count()
> {code}
> This prints the same data frame, but this time a count of 1. The key to
> triggering this bug is not just enabling `spark.sql.parquet.filterPushdown`
> (which is enabled by default): the case of the column in the data frame
> (before writing) must also differ from the case of the partition column in
> the file path, i.e. COL0 versus col0 or col0 versus COL0.
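The case mismatch described above suggests a mechanism: Spark's analyzer resolves column names case-insensitively by default, while a filter pushed down against the Parquet file schema can end up matching case-sensitively and find no such column. The following toy sketch illustrates that disagreement only; it is an assumption about the mechanism, not Spark's or Parquet's actual resolution code:

```python
def match_column(name, file_columns, case_sensitive):
    """Resolve `name` against a file schema, either exactly or ignoring case.
    Returns the matched column name, or None if nothing matches."""
    if case_sensitive:
        return name if name in file_columns else None
    by_lower = {c.lower(): c for c in file_columns}
    return by_lower.get(name.lower())

# The analyzer's case-insensitive lookup finds COL0 for "col0", but a
# case-sensitive lookup during pushdown sees no column named "col0" at all,
# so the two code paths can disagree about whether any rows survive.
```

When the two resolution modes disagree, a row-group filter evaluated against the "missing" column can incorrectly prune everything, which would explain `count()` returning 0 while `show()` still prints the row.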
[jira] [Resolved] (SPARK-39888) Fix scala code style in SecurityManagerSuite
[ https://issues.apache.org/jira/browse/SPARK-39888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiaoping.huang resolved SPARK-39888.
Resolution: Won't Do

> Fix scala code style in SecurityManagerSuite
>
> Key: SPARK-39888
> URL: https://issues.apache.org/jira/browse/SPARK-39888
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: xiaoping.huang
> Priority: Major
>
> Usually, semicolons are not used at the end of a line of code in Scala.
> To maintain a consistent code style, we need to delete some semicolons.
[jira] [Commented] (SPARK-39891) Bump h2 to 2.1.214
[ https://issues.apache.org/jira/browse/SPARK-39891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571765#comment-17571765 ]

Apache Spark commented on SPARK-39891:
User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37314

> Bump h2 to 2.1.214
>
> Key: SPARK-39891
> URL: https://issues.apache.org/jira/browse/SPARK-39891
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> release notes:
> https://github.com/h2database/h2database/releases
[jira] [Assigned] (SPARK-39891) Bump h2 to 2.1.214
[ https://issues.apache.org/jira/browse/SPARK-39891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39891:
Assignee: Apache Spark

> Bump h2 to 2.1.214
>
> Key: SPARK-39891
> URL: https://issues.apache.org/jira/browse/SPARK-39891
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Assignee: Apache Spark
> Priority: Minor
> Fix For: 3.4.0
>
> release notes:
> https://github.com/h2database/h2database/releases
[jira] [Assigned] (SPARK-39891) Bump h2 to 2.1.214
[ https://issues.apache.org/jira/browse/SPARK-39891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39891:
Assignee: (was: Apache Spark)

> Bump h2 to 2.1.214
>
> Key: SPARK-39891
> URL: https://issues.apache.org/jira/browse/SPARK-39891
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> release notes:
> https://github.com/h2database/h2database/releases
[jira] [Commented] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571763#comment-17571763 ]

Hyukjin Kwon commented on SPARK-39833:
Seems like a bug on the Parquet side, in row-group filtering.

> Filtered parquet data frame count() and show() produce inconsistent results
> when spark.sql.parquet.filterPushdown is true
>
> Key: SPARK-39833
> URL: https://issues.apache.org/jira/browse/SPARK-39833
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Reporter: Michael Allman
> Priority: Major
> Labels: correctness
>
> One of our data scientists discovered a problem wherein a data frame
> `.show()` call printed non-empty results, but `.count()` printed 0. I've
> narrowed the issue down to a small, reproducible test case which exhibits
> this aberrant behavior. In pyspark, run the following code:
> {code:python}
> from pyspark.sql.types import *
> parquet_pushdown_bug_df = spark.createDataFrame([{"COL0": int(0)}],
>     schema=StructType(fields=[StructField("COL0", IntegerType(), True)]))
> parquet_pushdown_bug_df.repartition(1).write.mode("overwrite").parquet("parquet_pushdown_bug/col0=0/parquet_pushdown_bug.parquet")
> reread_parquet_pushdown_bug_df = spark.read.parquet("parquet_pushdown_bug")
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> print(reread_parquet_pushdown_bug_df.filter("col0 = 0").count())
> {code}
> In my usage, this prints a data frame with 1 row but a count of 0. However,
> disabling `spark.sql.parquet.filterPushdown` produces consistent results:
> {code:python}
> spark.conf.set("spark.sql.parquet.filterPushdown", False)
> reread_parquet_pushdown_bug_df.filter("col0 = 0").show()
> reread_parquet_pushdown_bug_df.filter("col0 = 0").count()
> {code}
> This prints the same data frame, but this time a count of 1. The key to
> triggering this bug is not just enabling `spark.sql.parquet.filterPushdown`
> (which is enabled by default): the case of the column in the data frame
> (before writing) must also differ from the case of the partition column in
> the file path, i.e. COL0 versus col0 or col0 versus COL0.
[jira] [Commented] (SPARK-39891) Bump h2 to 2.1.214
[ https://issues.apache.org/jira/browse/SPARK-39891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571764#comment-17571764 ]

Apache Spark commented on SPARK-39891:
User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37314

> Bump h2 to 2.1.214
>
> Key: SPARK-39891
> URL: https://issues.apache.org/jira/browse/SPARK-39891
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: BingKun Pan
> Priority: Minor
> Fix For: 3.4.0
>
> release notes:
> https://github.com/h2database/h2database/releases
[jira] [Created] (SPARK-39891) Bump h2 to 2.1.214
BingKun Pan created SPARK-39891:
---
Summary: Bump h2 to 2.1.214
Key: SPARK-39891
URL: https://issues.apache.org/jira/browse/SPARK-39891
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.4.0
Reporter: BingKun Pan
Fix For: 3.4.0

release notes:
https://github.com/h2database/h2database/releases
[jira] [Commented] (SPARK-39889) Use different error classes for numeric/interval divided by 0
[ https://issues.apache.org/jira/browse/SPARK-39889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571751#comment-17571751 ]

Apache Spark commented on SPARK-39889:
User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37313

> Use different error classes for numeric/interval divided by 0
>
> Key: SPARK-39889
> URL: https://issues.apache.org/jira/browse/SPARK-39889
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
>
> Currently, when numbers are divided by 0 under ANSI mode, the error message
> is like:
> {quote}[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate
> divisor being 0 and return NULL instead. If necessary set "ansi_mode" to
> "false" (except for ANSI interval type) to bypass this error.{quote}
> The "(except for ANSI interval type)" part is confusing. We should remove it
> and have a new error class "INTERVAL_DIVIDED_BY_ZERO".
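The proposed split can be sketched as a dispatch on the dividend's type: intervals get their own error class, since the `try_divide`/ANSI-off escape hatches in the generic message do not apply to them. This is a hypothetical Python helper for illustration; in Spark the actual change lives in the error-classes definitions, not in code like this:

```python
# Hypothetical type names for illustration; Spark's ANSI interval types.
INTERVAL_TYPES = {"interval year to month", "interval day to second"}

def divide_by_zero_error_class(dividend_type):
    """Pick the error class for `x / 0` under ANSI mode. Intervals get a
    dedicated class so the message need not carry the confusing
    "(except for ANSI interval type)" caveat."""
    if dividend_type in INTERVAL_TYPES:
        return "INTERVAL_DIVIDED_BY_ZERO"
    return "DIVIDE_BY_ZERO"
```

With the dispatch in place, each message can be worded precisely for its case instead of one message hedging about the other.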
[jira] [Assigned] (SPARK-39889) Use different error classes for numeric/interval divided by 0
[ https://issues.apache.org/jira/browse/SPARK-39889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39889:
Assignee: Apache Spark (was: Gengliang Wang)

> Use different error classes for numeric/interval divided by 0
>
> Key: SPARK-39889
> URL: https://issues.apache.org/jira/browse/SPARK-39889
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Gengliang Wang
> Assignee: Apache Spark
> Priority: Major
>
> Currently, when numbers are divided by 0 under ANSI mode, the error message
> is like:
> {quote}[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate
> divisor being 0 and return NULL instead. If necessary set "ansi_mode" to
> "false" (except for ANSI interval type) to bypass this error.{quote}
> The "(except for ANSI interval type)" part is confusing. We should remove it
> and have a new error class "INTERVAL_DIVIDED_BY_ZERO".
[jira] [Assigned] (SPARK-39889) Use different error classes for numeric/interval divided by 0
[ https://issues.apache.org/jira/browse/SPARK-39889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39889:
Assignee: Gengliang Wang (was: Apache Spark)

> Use different error classes for numeric/interval divided by 0
>
> Key: SPARK-39889
> URL: https://issues.apache.org/jira/browse/SPARK-39889
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
>
> Currently, when numbers are divided by 0 under ANSI mode, the error message
> is like:
> {quote}[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate
> divisor being 0 and return NULL instead. If necessary set "ansi_mode" to
> "false" (except for ANSI interval type) to bypass this error.{quote}
> The "(except for ANSI interval type)" part is confusing. We should remove it
> and have a new error class "INTERVAL_DIVIDED_BY_ZERO".
[jira] [Commented] (SPARK-39889) Use different error classes for numeric/interval divided by 0
[ https://issues.apache.org/jira/browse/SPARK-39889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571750#comment-17571750 ]

Apache Spark commented on SPARK-39889:
User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37313

> Use different error classes for numeric/interval divided by 0
>
> Key: SPARK-39889
> URL: https://issues.apache.org/jira/browse/SPARK-39889
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
>
> Currently, when numbers are divided by 0 under ANSI mode, the error message
> is like:
> {quote}[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate
> divisor being 0 and return NULL instead. If necessary set "ansi_mode" to
> "false" (except for ANSI interval type) to bypass this error.{quote}
> The "(except for ANSI interval type)" part is confusing. We should remove it
> and have a new error class "INTERVAL_DIVIDED_BY_ZERO".