[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844720#comment-17844720 ]

Dylan Walker commented on SPARK-47134:
--------------------------------------

This ticket can be withdrawn. I can confirm that it is not an issue with ASF's Spark distributions. I have not been permitted to provide further details, nor is there publicly available information to point to. Apologies for the misdirected request and the delayed follow-up.

> Unexpected nulls when casting decimal values in specific cases
> ---------------------------------------------------------------
>
>                 Key: SPARK-47134
>                 URL: https://issues.apache.org/jira/browse/SPARK-47134
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.1, 3.5.0
>            Reporter: Dylan Walker
>            Priority: Major
>         Attachments: 321queryplan.txt, 341queryplan.txt
>
> In specific cases, casting decimal values can result in `null` values where no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction:
>
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
>
> scala> ds.createOrReplaceTempView("t")
> {code}
>
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +--------+
> |      ct|
> +--------+
> | 9508.00|
> |13879.00|
> +--------+
> {code}
>
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +-------+
> |     ct|
> +-------+
> |   null|
> |9508.00|
> +-------+
> {code}
>
> This is fairly delicate:
> - removing the {{ORDER BY}} clause produces the correct result
> - removing the {{CAST}} produces the correct result
> - changing the number of 0s in the argument to {{SUM}} produces the correct result
> - setting {{spark.ansi.enabled}} to {{true}} produces the correct result (and does not throw an error)
>
> Also, even with the {{ORDER BY}} removed, writing {{ds}} to parquet first will still produce the unexpected nulls.
>
> Please let me know if you need additional information. We are also interested in understanding whether setting {{spark.ansi.enabled}} can be considered a reliable workaround to this issue prior to a fix being released, if possible.
>
> Text files that include {{explain()}} output are attached.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
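For context on why the nulls in the report are unexpected: a Spark {{DecimalType(precision, scale)}} can hold at most (precision - scale) integral digits, so DECIMAL(28,14) has room for 14 integral digits and the group sums 9508.00 and 13879.00 are nowhere near overflow. The sketch below is a rough, standalone illustration of that precision/scale rule using Python's stdlib {{decimal}} module; it is not Spark's implementation (Spark performs a similar check internally when changing a decimal's precision), and the helper name is made up for illustration.

```python
from decimal import Decimal, InvalidOperation, localcontext

def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Hypothetical helper: does `value` fit in DECIMAL(precision, scale)?

    Mirrors the basic rule that a decimal with `scale` fractional digits
    may use at most `precision` digits in total, i.e. at most
    (precision - scale) integral digits.
    """
    with localcontext() as ctx:
        ctx.prec = precision + scale  # headroom so the rescaling itself never fails
        try:
            # Rescale to exactly `scale` fractional digits.
            rescaled = value.quantize(Decimal(1).scaleb(-scale))
        except InvalidOperation:
            return False
        # Count the digits of the unscaled integer representation.
        unscaled = int(rescaled.scaleb(scale))
        return len(str(abs(unscaled))) <= precision

# The two group sums from the reproduction: SUM(1.00) over 13879 "B" rows
# and 9508 "A" rows. Both fit in DECIMAL(28,14), so no overflow is involved.
print(fits_decimal(Decimal("9508.00"), 28, 14))   # True
print(fits_decimal(Decimal("13879.00"), 28, 14))  # True
# A value with more than 14 integral digits would genuinely overflow:
print(fits_decimal(Decimal("1e15"), 28, 14))      # False
```

On the workaround question: under ANSI mode a genuinely overflowing CAST raises an error rather than returning null, so the reporter's observation that ANSI mode yields the correct result without any error is consistent with no real overflow occurring. Note that the full configuration key in Spark is {{spark.sql.ansi.enabled}}; the comments above abbreviate it as {{spark.ansi.enabled}}.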
[jira] [Comment Edited] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819834#comment-17819834 ]

Dylan Walker edited comment on SPARK-47134 at 2/22/24 10:32 PM:
----------------------------------------------------------------

[~bersprockets] Hmm, it's possible I may have made too many assumptions. I left out that this is on EMR, which does have its own fork of Spark. If this is referring to names that don't exist in the Apache Spark codebase, this may be an Amazon thing. I will reach out to AWS support to confirm, and apologies if this turns out to be the case. Unfortunately, they don't do a great job of documenting the differences.

was (Author: JIRAUSER304364):
[~bersprockets] Hmm, it's possible I may have made too many assumptions. I left out that this is on EMR, which does have its own fork of Spark. If this is referring to names that don't exist in the Apache Spark codebase, this may be an Amazon thing. I will reach out to AWS support to confirm, and apologies if this turns out to be the case.
[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819834#comment-17819834 ]

Dylan Walker commented on SPARK-47134:
--------------------------------------

[~bersprockets] Hmm, it's possible I may have made too many assumptions. I left out that this is on EMR, which does have its own fork of Spark. If this is referring to names that don't exist in the Apache Spark codebase, this may be an Amazon thing. I will reach out to AWS support to confirm, and apologies if this turns out to be the case.
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Walker updated SPARK-47134:
---------------------------------
    Attachment: 321queryplan.txt
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Walker updated SPARK-47134:
---------------------------------
    Attachment: 341queryplan.txt
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Walker updated SPARK-47134:
---------------------------------
    Description: (formatting-only revision: inline backtick markup converted to {{monospace}}; the reproduction content is unchanged and appears in full in the first message above)
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Walker updated SPARK-47134:
---------------------------------
    Description: (formatting-only revision: the show() tables were re-aligned; the reproduction content is unchanged and appears in full in the first message above)
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Walker updated SPARK-47134:
---------------------------------
    Description: (formatting-only revision: the Setup and behaviour headings were made bold; the reproduction content is unchanged and appears in full in the first message above)
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Walker updated SPARK-47134:
---------------------------------
    Description: (formatting-only revision: fenced code blocks converted to {code:scala} blocks; the reproduction content is unchanged and appears in full in the first message above)
[jira] [Created] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
Dylan Walker created SPARK-47134:
------------------------------------

             Summary: Unexpected nulls when casting decimal values in specific cases
                 Key: SPARK-47134
                 URL: https://issues.apache.org/jira/browse/SPARK-47134
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0, 3.4.1
            Reporter: Dylan Walker

In specific cases, casting decimal values can result in `null` values where no overflow exists. (The full spark-shell reproduction appears in the issue description quoted in the first message above.)