[jira] [Created] (SPARK-43446) Upgrade Apache Arrow to 12.0.0
Dongjoon Hyun created SPARK-43446: - Summary: Upgrade Apache Arrow to 12.0.0 Key: SPARK-43446 URL: https://issues.apache.org/jira/browse/SPARK-43446 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43424) Support vanilla JDBC CHAR/VARCHAR through STS
[ https://issues.apache.org/jira/browse/SPARK-43424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43424. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41102 [https://github.com/apache/spark/pull/41102] > Support vanilla JDBC CHAR/VARCHAR through STS > - > > Key: SPARK-43424 > URL: https://issues.apache.org/jira/browse/SPARK-43424 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Priority: Major > Fix For: 3.5.0 > >
[jira] [Assigned] (SPARK-43424) Support vanilla JDBC CHAR/VARCHAR through STS
[ https://issues.apache.org/jira/browse/SPARK-43424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43424: - Assignee: Kent Yao > Support vanilla JDBC CHAR/VARCHAR through STS > - > > Key: SPARK-43424 > URL: https://issues.apache.org/jira/browse/SPARK-43424 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.5.0 > >
[jira] [Created] (SPARK-43445) Enable GroupBySlowTests.test_split_apply_combine_on_series for pandas 2.0.0.
Haejoon Lee created SPARK-43445: --- Summary: Enable GroupBySlowTests.test_split_apply_combine_on_series for pandas 2.0.0. Key: SPARK-43445 URL: https://issues.apache.org/jira/browse/SPARK-43445 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee Enable GroupBySlowTests.test_split_apply_combine_on_series for pandas 2.0.0.
[jira] [Created] (SPARK-43444) Enable GroupBySlowTests.test_value_counts for pandas 2.0.0.
Haejoon Lee created SPARK-43444: --- Summary: Enable GroupBySlowTests.test_value_counts for pandas 2.0.0. Key: SPARK-43444 URL: https://issues.apache.org/jira/browse/SPARK-43444 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee Enable GroupBySlowTests.test_value_counts for pandas 2.0.0.
[jira] [Resolved] (SPARK-43441) makeDotNode should not fail when DeterministicLevel is absent
[ https://issues.apache.org/jira/browse/SPARK-43441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43441. --- Fix Version/s: 3.4.1 Resolution: Fixed Issue resolved by pull request 41124 [https://github.com/apache/spark/pull/41124] > makeDotNode should not fail when DeterministicLevel is absent > - > > Key: SPARK-43441 > URL: https://issues.apache.org/jira/browse/SPARK-43441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Qi Tan >Assignee: Qi Tan >Priority: Minor > Fix For: 3.4.1 > >
[jira] [Created] (SPARK-43443) Add benchmark for Timestamp type inference when using invalid values
Jia Fan created SPARK-43443: --- Summary: Add benchmark for Timestamp type inference when using invalid values Key: SPARK-43443 URL: https://issues.apache.org/jira/browse/SPARK-43443 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Jia Fan We need a benchmark to measure whether our optimization of Timestamp type inference is useful. We currently have a benchmark for valid Timestamp values, but not one for invalid values during Timestamp type inference.
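To illustrate why such a benchmark matters, here is a minimal pure-Python sketch (a toy model, not Spark's actual inference code) of timestamp type inference: every invalid value forces the parser down the exception path before the inferrer can fall back to string, which is exactly the cost an invalid-value benchmark would measure.

```python
from datetime import datetime

# Single candidate pattern for the toy inferrer (Spark tries real
# timestamp formatters; this is only an illustration).
FORMAT = "%Y-%m-%d %H:%M:%S"

def infer_type(values):
    """Return 'timestamp' if every value parses, else fall back to 'string'."""
    for v in values:
        try:
            datetime.strptime(v, FORMAT)
        except ValueError:
            # An invalid value takes the (slow) exception path; a benchmark
            # over invalid data measures this fallback cost.
            return "string"
    return "timestamp"

print(infer_type(["2023-05-11 10:00:00", "2023-05-11 11:00:00"]))  # timestamp
print(infer_type(["2023-05-11 10:00:00", "not-a-timestamp"]))      # string
```

A benchmark would time `infer_type` over all-valid versus mostly-invalid inputs to expose the exception-handling overhead.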
[jira] [Commented] (SPARK-43442) Split test module `pyspark_pandas_connect`
[ https://issues.apache.org/jira/browse/SPARK-43442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721609#comment-17721609 ] Snoot.io commented on SPARK-43442: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41127 > Split test module `pyspark_pandas_connect` > -- > > Key: SPARK-43442 > URL: https://issues.apache.org/jira/browse/SPARK-43442 > Project: Spark > Issue Type: Test > Components: Connect, PySpark, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Minor >
[jira] [Commented] (SPARK-43403) GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is closed
[ https://issues.apache.org/jira/browse/SPARK-43403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721605#comment-17721605 ] Snoot.io commented on SPARK-43403: -- User 'zhouyifan279' has created a pull request for this issue: https://github.com/apache/spark/pull/41105 > GET /history//1/jobs/ failed: java.lang.IllegalStateException: DB is > closed > -- > > Key: SPARK-43403 > URL: https://issues.apache.org/jira/browse/SPARK-43403 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Zhou Yifan >Priority: Major > Attachments: image-2023-05-08-11-33-13-634.png > > > !image-2023-05-08-11-33-13-634.png!
[jira] [Created] (SPARK-43442) Split test module `pyspark_pandas_connect`
Ruifeng Zheng created SPARK-43442: - Summary: Split test module `pyspark_pandas_connect` Key: SPARK-43442 URL: https://issues.apache.org/jira/browse/SPARK-43442 Project: Spark Issue Type: Test Components: Connect, PySpark, Tests Affects Versions: 3.5.0 Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-43425) Add TimestampNTZType to ColumnarBatchRow
[ https://issues.apache.org/jira/browse/SPARK-43425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-43425: - Fix Version/s: 3.4.1 > Add TimestampNTZType to ColumnarBatchRow > > > Key: SPARK-43425 > URL: https://issues.apache.org/jira/browse/SPARK-43425 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 3.4.1, 3.5.0 > >
[jira] [Updated] (SPARK-43440) Support registration of an Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43440: - Description: Currently, when users register an Arrow-optimized Python UDF, it will be registered as a pickled Python UDF and thus executed without Arrow optimization. We should support registration of Arrow-optimized Python UDFs and execute them with Arrow optimization. was:Support registration of an Arrow-optimized Python UDF > Support registration of an Arrow-optimized Python UDF > -- > > Key: SPARK-43440 > URL: https://issues.apache.org/jira/browse/SPARK-43440 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Priority: Major > > Currently, when users register an Arrow-optimized Python UDF, it will be > registered as a pickled Python UDF and thus executed without Arrow > optimization. > We should support registration of Arrow-optimized Python UDFs and execute them > with Arrow optimization.
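The pitfall the description mentions can be sketched as a toy model in plain Python (this is not PySpark's internal code; the eval-type names mirror PySpark's constants, but the numeric values and the registry itself are hypothetical): if registration re-wraps a UDF with the default pickled eval type instead of preserving its Arrow eval type, the optimization is silently lost at execution time.

```python
# Eval-type constants mirroring PySpark's naming; values are illustrative only.
SQL_BATCHED_UDF = 100        # pickled Python UDF
SQL_ARROW_BATCHED_UDF = 101  # Arrow-optimized Python UDF

class UDF:
    """Toy stand-in for a user-defined function plus its eval type."""
    def __init__(self, func, eval_type):
        self.func = func
        self.eval_type = eval_type

registry = {}

def register_buggy(name, udf):
    # Bug being described: re-wrap with the default eval type,
    # silently dropping the Arrow flag.
    registry[name] = UDF(udf.func, SQL_BATCHED_UDF)

def register_fixed(name, udf):
    # Desired behavior: preserve the eval type the UDF was created with.
    registry[name] = udf

arrow_udf = UDF(lambda x: x + 1, SQL_ARROW_BATCHED_UDF)

register_buggy("f", arrow_udf)
print(registry["f"].eval_type == SQL_ARROW_BATCHED_UDF)  # False: optimization lost

register_fixed("f", arrow_udf)
print(registry["f"].eval_type == SQL_ARROW_BATCHED_UDF)  # True
```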
[jira] [Assigned] (SPARK-43441) makeDotNode should not fail when DeterministicLevel is absent
[ https://issues.apache.org/jira/browse/SPARK-43441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43441: - Assignee: Qi Tan > makeDotNode should not fail when DeterministicLevel is absent > - > > Key: SPARK-43441 > URL: https://issues.apache.org/jira/browse/SPARK-43441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Qi Tan >Assignee: Qi Tan >Priority: Minor >
[jira] [Created] (SPARK-43441) makeDotNode should not fail when DeterministicLevel is absent
Qi Tan created SPARK-43441: -- Summary: makeDotNode should not fail when DeterministicLevel is absent Key: SPARK-43441 URL: https://issues.apache.org/jira/browse/SPARK-43441 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Qi Tan
[jira] [Created] (SPARK-43440) Support registration of an Arrow-optimized Python UDF
Xinrong Meng created SPARK-43440: Summary: Support registration of an Arrow-optimized Python UDF Key: SPARK-43440 URL: https://issues.apache.org/jira/browse/SPARK-43440 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng Support registration of an Arrow-optimized Python UDF
[jira] [Commented] (SPARK-42523) Apache Spark 3.4 release
[ https://issues.apache.org/jira/browse/SPARK-42523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721549#comment-17721549 ] Xinrong Meng commented on SPARK-42523: -- I am wondering if we shall keep the ticket open for minor releases such as the upcoming 3.4.1. > Apache Spark 3.4 release > > > Key: SPARK-42523 > URL: https://issues.apache.org/jira/browse/SPARK-42523 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > An umbrella for Apache Spark 3.4 release
[jira] [Resolved] (SPARK-43412) Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-43412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-43412. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41053 [https://github.com/apache/spark/pull/41053] > Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs > -- > > Key: SPARK-43412 > URL: https://issues.apache.org/jira/browse/SPARK-43412 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > > We are about to improve nested non-atomic input/output support of an > Arrow-optimized Python UDF. > However, currently, it shares the same EvalType with a pickled Python UDF, > but the same implementation as a Pandas UDF. > Introducing a dedicated EvalType enables isolating the changes to Arrow-optimized > Python UDFs.
[jira] [Resolved] (SPARK-43430) ExecutePlanRequest should have the ability to set request options.
[ https://issues.apache.org/jira/browse/SPARK-43430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-43430. --- Fix Version/s: 3.5.0 Resolution: Fixed > ExecutePlanRequest should have the ability to set request options. > -- > > Key: SPARK-43430 > URL: https://issues.apache.org/jira/browse/SPARK-43430 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.5.0 > >
[jira] [Assigned] (SPARK-43430) ExecutePlanRequest should have the ability to set request options.
[ https://issues.apache.org/jira/browse/SPARK-43430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-43430: - Assignee: Martin Grund > ExecutePlanRequest should have the ability to set request options. > -- > > Key: SPARK-43430 > URL: https://issues.apache.org/jira/browse/SPARK-43430 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major >
[jira] [Updated] (SPARK-43439) Drop does not work when passed a string with an alias
[ https://issues.apache.org/jira/browse/SPARK-43439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frederik Paradis updated SPARK-43439: - Affects Version/s: 3.4.0 (was: 3.3.2) > Drop does not work when passed a string with an alias > - > > Key: SPARK-43439 > URL: https://issues.apache.org/jira/browse/SPARK-43439 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Frederik Paradis >Priority: Major > > When passing a string to the drop method, if the string contains an alias, > the column is not dropped. However, passing a column object with the same > name and alias, it works. > {code:python} > from pyspark.sql import SparkSession > import pyspark.sql.functions as F > spark = > SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate() > df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a") > j = df.drop("a.hour") > print(j) # DataFrame[any: bigint, hour: bigint] > jj = df.drop(F.col("a.hour")) > print(jj) # DataFrame[any: bigint] > {code} > > Related issues: > https://issues.apache.org/jira/browse/SPARK-31123 > https://issues.apache.org/jira/browse/SPARK-14759 >
[jira] [Created] (SPARK-43439) Drop does not work when passed a string with an alias
Frederik Paradis created SPARK-43439: Summary: Drop does not work when passed a string with an alias Key: SPARK-43439 URL: https://issues.apache.org/jira/browse/SPARK-43439 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.2 Reporter: Frederik Paradis When passing a string to the drop method, if the string contains an alias, the column is not dropped. However, passing a column object with the same name and alias, it works. {code:python} from pyspark.sql import SparkSession import pyspark.sql.functions as F spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate() df1 = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a") j = df1.drop("a.hour") print(j) # DataFrame[any: bigint, hour: bigint] jj = df1.drop(F.col("a.hour")) print(jj) # DataFrame[any: bigint] {code} Related issues: https://issues.apache.org/jira/browse/SPARK-31123 https://issues.apache.org/jira/browse/SPARK-14759
[jira] [Updated] (SPARK-43439) Drop does not work when passed a string with an alias
[ https://issues.apache.org/jira/browse/SPARK-43439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frederik Paradis updated SPARK-43439: - Description: When passing a string to the drop method, if the string contains an alias, the column is not dropped. However, passing a column object with the same name and alias, it works. {code:python} from pyspark.sql import SparkSession import pyspark.sql.functions as F spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate() df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a") j = df.drop("a.hour") print(j) # DataFrame[any: bigint, hour: bigint] jj = df.drop(F.col("a.hour")) print(jj) # DataFrame[any: bigint] {code} Related issues: https://issues.apache.org/jira/browse/SPARK-31123 https://issues.apache.org/jira/browse/SPARK-14759 was: When passing a string to the drop method, if the string contains an alias, the column is not dropped. However, passing a column object with the same name and alias, it works. {code:python} from pyspark.sql import SparkSession import pyspark.sql.functions as F spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate() df1 = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a") j = df1.drop("a.hour") print(j) # DataFrame[any: bigint, hour: bigint] jj = df1.drop(F.col("a.hour")) print(jj) # DataFrame[any: bigint] {code} Related issues: https://issues.apache.org/jira/browse/SPARK-31123 https://issues.apache.org/jira/browse/SPARK-14759 > Drop does not work when passed a string with an alias > - > > Key: SPARK-43439 > URL: https://issues.apache.org/jira/browse/SPARK-43439 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.2 >Reporter: Frederik Paradis >Priority: Major > > When passing a string to the drop method, if the string contains an alias, > the column is not dropped. However, passing a column object with the same > name and alias, it works.
> {code:python} > from pyspark.sql import SparkSession > import pyspark.sql.functions as F > spark = > SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate() > df = spark.createDataFrame([(1, 10)], ["any", "hour"]).alias("a") > j = df.drop("a.hour") > print(j) # DataFrame[any: bigint, hour: bigint] > jj = df.drop(F.col("a.hour")) > print(jj) # DataFrame[any: bigint] > {code} > > Related issues: > https://issues.apache.org/jira/browse/SPARK-31123 > https://issues.apache.org/jira/browse/SPARK-14759 >
[jira] [Created] (SPARK-43438) Fix mismatched column list error on INSERT
Serge Rielau created SPARK-43438: Summary: Fix mismatched column list error on INSERT Key: SPARK-43438 URL: https://issues.apache.org/jira/browse/SPARK-43438 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Serge Rielau This error message is pretty bad, and common: "_LEGACY_ERROR_TEMP_1038" : { "message" : [ "Cannot write to table due to mismatched user specified column size() and data column size()." ] }, It can perhaps be merged with this one, after giving it a proper error class: "_LEGACY_ERROR_TEMP_1168" : { "message" : [ " requires that the data to be inserted have the same number of columns as the target table: target table has column(s) but the inserted data has column(s), including partition column(s) having constant value(s)." ] }, Repro: CREATE TABLE tabtest(c1 INT, c2 INT); INSERT INTO tabtest SELECT 1; -- fails with: `spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s). INSERT INTO tabtest(c1) SELECT 1, 2, 3; -- fails with: Cannot write to table due to mismatched user specified column size(1) and data column size(3).; line 1 pos 24
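The merge the ticket proposes can be sketched with a single parameterized template that covers both repros (a hedged illustration only: the final error class name and wording are up to the Spark developers, and the template text here is hypothetical):

```python
# Hypothetical unified error template; both repros from the ticket
# instantiate the same three parameters.
TEMPLATE = (
    "{table} requires that the data to be inserted have the same number of "
    "columns as the target column list: target has {target} column(s) but "
    "the inserted data has {actual} column(s)."
)

# Repro 1: INSERT INTO tabtest SELECT 1;
print(TEMPLATE.format(table="`spark_catalog`.`default`.`tabtest`",
                      target=2, actual=1))

# Repro 2: INSERT INTO tabtest(c1) SELECT 1, 2, 3;
# (the user-specified column list shrinks the target to 1 column)
print(TEMPLATE.format(table="`spark_catalog`.`default`.`tabtest`",
                      target=1, actual=3))
```

The point is that both legacy messages describe the same condition, a target/actual column-count mismatch, and differ only in which side supplied the column list.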
[jira] [Commented] (SPARK-35198) Add support for calling debugCodegen from Python & Java
[ https://issues.apache.org/jira/browse/SPARK-35198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721475#comment-17721475 ] Ignite TC Bot commented on SPARK-35198: --- User 'juanvisoler' has created a pull request for this issue: https://github.com/apache/spark/pull/40608 > Add support for calling debugCodegen from Python & Java > --- > > Key: SPARK-35198 > URL: https://issues.apache.org/jira/browse/SPARK-35198 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.2.0 >Reporter: Holden Karau >Priority: Minor > Labels: starter > > Because it is implemented with an implicit conversion, it's a bit complicated > to call; we should add a direct method to get debug state for Java & Python > users of Dataframes.
[jira] [Created] (SPARK-43437) Upgrade Arrow to 12.0.0
Yang Jie created SPARK-43437: Summary: Upgrade Arrow to 12.0.0 Key: SPARK-43437 URL: https://issues.apache.org/jira/browse/SPARK-43437 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie
[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver
[ https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43014: -- Affects Version/s: 3.5.0 (was: 3.4.0) (was: 3.3.2) > Support spark.kubernetes.setSubmitTimeInDriver > -- > > Key: SPARK-43014 > URL: https://issues.apache.org/jira/browse/SPARK-43014 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zhou Yifan >Assignee: Zhou Yifan >Priority: Minor > Fix For: 3.5.0 > > > If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be > overwritten when the driver starts.
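Judging from the new summary, the fix presumably gates the driver-side overwrite of `spark.app.submitTime` behind a boolean conf. A hypothetical spark-defaults.conf fragment showing how it might be used (the flag's default value and exact semantics are assumptions based on the title, not stated in these messages):

```
# spark-defaults.conf (hypothetical usage)
# Keep the submit-time value set by spark-submit instead of letting the
# driver overwrite it when running in k8s cluster mode.
spark.kubernetes.setSubmitTimeInDriver  false
```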
[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver
[ https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43014: -- Summary: Support spark.kubernetes.setSubmitTimeInDriver (was: spark.app.submitTime is not right in k8s cluster mode) > Support spark.kubernetes.setSubmitTimeInDriver > -- > > Key: SPARK-43014 > URL: https://issues.apache.org/jira/browse/SPARK-43014 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Zhou Yifan >Assignee: Zhou Yifan >Priority: Major > Fix For: 3.5.0 > > > If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be > overwritten when the driver starts.
[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver
[ https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43014: -- Priority: Minor (was: Major) > Support spark.kubernetes.setSubmitTimeInDriver > -- > > Key: SPARK-43014 > URL: https://issues.apache.org/jira/browse/SPARK-43014 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Zhou Yifan >Assignee: Zhou Yifan >Priority: Minor > Fix For: 3.5.0 > > > If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be > overwritten when the driver starts.
[jira] [Updated] (SPARK-43014) Support spark.kubernetes.setSubmitTimeInDriver
[ https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43014: -- Issue Type: Improvement (was: Bug) > Support spark.kubernetes.setSubmitTimeInDriver > -- > > Key: SPARK-43014 > URL: https://issues.apache.org/jira/browse/SPARK-43014 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Zhou Yifan >Assignee: Zhou Yifan >Priority: Major > Fix For: 3.5.0 > > > If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be > overwritten when the driver starts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43014) spark.app.submitTime is not right in k8s cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43014. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40645 [https://github.com/apache/spark/pull/40645] > spark.app.submitTime is not right in k8s cluster mode > - > > Key: SPARK-43014 > URL: https://issues.apache.org/jira/browse/SPARK-43014 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Zhou Yifan >Assignee: Zhou Yifan >Priority: Major > Fix For: 3.5.0 > > > If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be > overwritten when the driver starts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43014) spark.app.submitTime is not right in k8s cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43014: - Assignee: Zhou Yifan > spark.app.submitTime is not right in k8s cluster mode > - > > Key: SPARK-43014 > URL: https://issues.apache.org/jira/browse/SPARK-43014 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Zhou Yifan >Assignee: Zhou Yifan >Priority: Major > > If Spark is submitted in k8s cluster mode, `spark.app.submitTime` will be > overwritten when the driver starts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43436) Upgrade rocksdbjni to 8.1.1.1
Yang Jie created SPARK-43436: Summary: Upgrade rocksdbjni to 8.1.1.1 Key: SPARK-43436 URL: https://issues.apache.org/jira/browse/SPARK-43436 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie https://github.com/facebook/rocksdb/releases/tag/v8.1.1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43405) Remove useless code in `ScriptInputOutputSchema`
[ https://issues.apache.org/jira/browse/SPARK-43405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-43405: - Assignee: Jia Fan > Remove useless code in `ScriptInputOutputSchema` > > > Key: SPARK-43405 > URL: https://issues.apache.org/jira/browse/SPARK-43405 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Minor > > In the case class `ScriptInputOutputSchema`, some methods like `getRowFormatSQL` > are never used, so we can remove them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43405) Remove useless code in `ScriptInputOutputSchema`
[ https://issues.apache.org/jira/browse/SPARK-43405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-43405. --- Fix Version/s: 3.5.0 Resolution: Fixed > Remove useless code in `ScriptInputOutputSchema` > > > Key: SPARK-43405 > URL: https://issues.apache.org/jira/browse/SPARK-43405 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Minor > Fix For: 3.5.0 > > > In the case class `ScriptInputOutputSchema`, some methods like `getRowFormatSQL` > are never used, so we can remove them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40912) Overhead of Exceptions in DeserializationStream
[ https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40912: Assignee: Emil Ejbyfeldt > Overhead of Exceptions in DeserializationStream > > > Key: SPARK-40912 > URL: https://issues.apache.org/jira/browse/SPARK-40912 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Minor > > The interface of DeserializationStream forces implementations to raise > EOFException to indicate that there is no more data. For > KryoDeserializationStream it is even worse: since the kryo library does not raise > EOFException, we pay the price of two exceptions for each stream. For > large shuffles with lots of small streams this is quite a large overhead > (a couple % of CPU time observed). It is also less safe to depend on exceptions, as they > might be raised for different reasons, like corrupt data, and that currently > causes data loss. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40912) Overhead of Exceptions in DeserializationStream
[ https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40912. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 38428 [https://github.com/apache/spark/pull/38428] > Overhead of Exceptions in DeserializationStream > > > Key: SPARK-40912 > URL: https://issues.apache.org/jira/browse/SPARK-40912 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Assignee: Emil Ejbyfeldt >Priority: Minor > Fix For: 3.5.0 > > > The interface of DeserializationStream forces implementations to raise > EOFException to indicate that there is no more data. For > KryoDeserializationStream it is even worse: since the kryo library does not raise > EOFException, we pay the price of two exceptions for each stream. For > large shuffles with lots of small streams this is quite a large overhead > (a couple % of CPU time observed). It is also less safe to depend on exceptions, as they > might be raised for different reasons, like corrupt data, and that currently > causes data loss. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
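The overhead described in SPARK-40912 can be illustrated with a small, self-contained JVM sketch. The class and method names below are hypothetical, not Spark's actual DeserializationStream API: the first reader learns about end-of-stream only by catching EOFException (the pattern the interface forces), while the second probes for remaining bytes explicitly and never constructs an exception on the happy path.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these helpers are not Spark's API.
class EofOverhead {
    // Pattern the interface forces: end-of-stream is signalled by throwing
    // EOFException, so every stream pays for at least one exception.
    static List<Integer> readViaException(byte[] bytes) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        List<Integer> out = new ArrayList<>();
        try {
            while (true) out.add(in.readInt()); // throws EOFException at the end
        } catch (EOFException expected) {
            // the only way to learn the stream has ended
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // A plausible exception-free alternative: probe for remaining data.
    // For an in-memory stream, available() is a reliable stand-in for a
    // real hasNext-style check.
    static List<Integer> readViaCheck(byte[] bytes) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        List<Integer> out = new ArrayList<>();
        try {
            while (in.available() >= Integer.BYTES) out.add(in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Both readers produce the same elements; the difference is that the second never allocates an exception (or two, in the Kryo case) per stream, which is where the CPU time went on shuffles with many small streams.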
[jira] [Assigned] (SPARK-43434) Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`
[ https://issues.apache.org/jira/browse/SPARK-43434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43434: Assignee: Ruifeng Zheng > Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream` > --- > > Key: SPARK-43434 > URL: https://issues.apache.org/jira/browse/SPARK-43434 > Project: Spark > Issue Type: Test > Components: Connect, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43434) Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`
[ https://issues.apache.org/jira/browse/SPARK-43434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43434. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41114 [https://github.com/apache/spark/pull/41114] > Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream` > --- > > Key: SPARK-43434 > URL: https://issues.apache.org/jira/browse/SPARK-43434 > Project: Spark > Issue Type: Test > Components: Connect, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37942) Use error classes in the compilation errors of properties
[ https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-37942: Assignee: Narek Karapetian > Use error classes in the compilation errors of properties > - > > Key: SPARK-37942 > URL: https://issues.apache.org/jira/browse/SPARK-37942 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Narek Karapetian >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * cannotReadCorruptedTablePropertyError > * cannotCreateJDBCNamespaceWithPropertyError > * cannotSetJDBCNamespaceWithPropertyError > * cannotUnsetJDBCNamespaceWithPropertyError > * alterTableSerDePropertiesNotSupportedForV2TablesError > * unsetNonExistentPropertyError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37942) Use error classes in the compilation errors of properties
[ https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-37942. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41018 [https://github.com/apache/spark/pull/41018] > Use error classes in the compilation errors of properties > - > > Key: SPARK-37942 > URL: https://issues.apache.org/jira/browse/SPARK-37942 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Narek Karapetian >Priority: Major > Fix For: 3.5.0 > > > Migrate the following errors in QueryCompilationErrors: > * cannotReadCorruptedTablePropertyError > * cannotCreateJDBCNamespaceWithPropertyError > * cannotSetJDBCNamespaceWithPropertyError > * cannotUnsetJDBCNamespaceWithPropertyError > * alterTableSerDePropertiesNotSupportedForV2TablesError > * unsetNonExistentPropertyError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43386) Improve list of suggested column/attributes in `UNRESOLVED_COLUMN.WITH_SUGGESTION` error class
[ https://issues.apache.org/jira/browse/SPARK-43386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721264#comment-17721264 ] ASF GitHub Bot commented on SPARK-43386: User 'vitaliili-db' has created a pull request for this issue: https://github.com/apache/spark/pull/41038 > Improve list of suggested column/attributes in > `UNRESOLVED_COLUMN.WITH_SUGGESTION` error class > -- > > Key: SPARK-43386 > URL: https://issues.apache.org/jira/browse/SPARK-43386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Priority: Major > > Match the style of unresolved column/attribute when sorting list of suggested > columns. If an unresolved column name is single-part identifier - use same > style for suggested columns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43422) Tags are lost on LogicalRelation when adding _metadata
[ https://issues.apache.org/jira/browse/SPARK-43422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43422: Assignee: Jan-Ole Sasse > Tags are lost on LogicalRelation when adding _metadata > -- > > Key: SPARK-43422 > URL: https://issues.apache.org/jira/browse/SPARK-43422 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.4.0 >Reporter: Jan-Ole Sasse >Assignee: Jan-Ole Sasse >Priority: Minor > > The AddMetadataColumns does not copy tags for the LogicalRelation when > adding metadata output in addMetadataCol -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43422) Tags are lost on LogicalRelation when adding _metadata
[ https://issues.apache.org/jira/browse/SPARK-43422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43422. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41104 [https://github.com/apache/spark/pull/41104] > Tags are lost on LogicalRelation when adding _metadata > -- > > Key: SPARK-43422 > URL: https://issues.apache.org/jira/browse/SPARK-43422 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.4.0 >Reporter: Jan-Ole Sasse >Assignee: Jan-Ole Sasse >Priority: Minor > Fix For: 3.5.0 > > > The AddMetadataColumns does not copy tags for the LogicalRelation when > adding metadata output in addMetadataCol -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43435) re-enable doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`
Ruifeng Zheng created SPARK-43435: - Summary: re-enable doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream` Key: SPARK-43435 URL: https://issues.apache.org/jira/browse/SPARK-43435 Project: Spark Issue Type: Test Components: Connect, Tests Affects Versions: 3.5.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43434) Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`
Ruifeng Zheng created SPARK-43434: - Summary: Disable flaky doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream` Key: SPARK-43434 URL: https://issues.apache.org/jira/browse/SPARK-43434 Project: Spark Issue Type: Test Components: Connect, Tests Affects Versions: 3.5.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43427) Unsigned integer types are deserialized as signed numeric equivalents
[ https://issues.apache.org/jira/browse/SPARK-43427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Parth Upadhyay updated SPARK-43427: --- Description: I'm not sure if "bug" is the correct tag for this jira, but i've tagged it like that for now since the behavior seems odd, happy to update to "improvement" or something else based on the conversation! h2. Issue Protobuf supports unsigned integer types, including `uint32` and `uint64`. When deserializing protobuf values with fields of these types, uint32 is converted to `IntegerType` and uint64 is converted to `LongType` in the resulting spark struct. `IntegerType` and `LongType` are [signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer types, so this can lead to confusing results. Namely, if a uint32 value in a stored proto is above 2^31 or a uint64 value is above 2^63, their representation in binary will contain a 1 in the highest bit, which when interpreted as a signed integer will come out as negative (i.e. overflow). I propose that we deserialize unsigned integer types into a type that can contain them correctly, e.g. uint32 => `LongType` uint64 => `Decimal(20, 0)` h2. Backwards Compatibility / Default Behavior Should we maintain backwards compatibility and add an option that allows deserializing these types differently? Or should we change the default behavior (with an option to go back to the old way)? I think by default it makes more sense to deserialize them as the larger types so that it's semantically more correct. However, there may be existing users of this library that would be affected by this behavior change. Though, maybe we can justify the change since the function is tagged as `Experimental` (and spark 3.4.0 was only released very recently). h2. Precedent I believe that unsigned integer types in parquet are deserialized in a similar manner, i.e. put into a larger type so that the unsigned representation natively fits. 
https://issues.apache.org/jira/browse/SPARK-34817 and https://github.com/apache/spark/pull/31921 was: h2. Issue Protobuf supports unsigned integer types, including `uint32` and `uint64`. When deserializing protobuf values with fields of these types, uint32 is converted to `IntegerType` and uint64 is converted to `LongType` in the resulting spark struct. `IntegerType` and `LongType` are [signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer types, so this can lead to confusing results. Namely, if a uint32 value in a stored proto is above 2^31 or a uint64 value is above 2^63, their representation in binary will contain a 1 in the highest bit, which when interpreted as a signed integer will come out as negative (I.e. overflow). I propose that we deserialize unsigned integer types into a type that can contain them correctly, e.g. uint32 => `LongType` uint64 => `Decimal(20, 0)` h2. Backwards Compatibility / Default Behavior Should we maintain backwards compatibility and we add an option that allows deserializing these types differently? Or should we change change the default behavior (with an option to go back to the old way)? I think by default it makes more sense to deserialize them as the larger types so that it's semantically more correct. However, there may be existing users of this library that would be affected by this behavior change. Though, maybe we can justify the change since the function is tagged as `Experimental` (and spark 3.4.0 was only released very recently). h2. Precedent I believe that unsigned integer types in parquet are deserialized in a similar manner, i.e. put into a larger type so that the unsigned representation natively fits. 
https://issues.apache.org/jira/browse/SPARK-34817 and https://github.com/apache/spark/pull/31921 > Unsigned integer types are deserialized as signed numeric equivalents > - > > Key: SPARK-43427 > URL: https://issues.apache.org/jira/browse/SPARK-43427 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Parth Upadhyay >Priority: Major > > I'm not sure if "bug" is the correct tag for this jira, but i've tagged it > like that for now since the behavior seems odd, happy to update to > "improvement" or something else based on the conversation! > h2. Issue > Protobuf supports unsigned integer types, including `uint32` and `uint64`. > When deserializing protobuf values with fields of these types, uint32 is > converted to `IntegerType` and uint64 is converted to `LongType` in the > resulting spark struct. `IntegerType` and `LongType` are > [signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer > types, so this can lead to confusing results. > Namely, if a uint32 value in a stored proto
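The widening SPARK-43427 proposes can be sketched in plain Java (a hand-rolled illustration, not the spark-protobuf implementation): the raw signed JVM value that protobuf produces is reinterpreted as unsigned by moving it into a larger type, with BigInteger standing in for Spark's Decimal(20, 0).

```java
import java.math.BigInteger;

// Illustration of the proposed widening; not spark-protobuf code.
class UnsignedWidening {
    // uint32 arrives in a signed int; zero-extend its 32 bits into a long,
    // so values above 2^31 no longer print as negative.
    static long uint32ToLong(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    // uint64 arrives in a signed long; a value with the top bit set would
    // come out negative, so move it into BigInteger (Decimal(20, 0) in
    // Spark terms), which holds the full 0..2^64-1 range.
    static BigInteger uint64ToBigInteger(long raw) {
        return new BigInteger(Long.toUnsignedString(raw));
    }
}
```

For example, a stored uint32 of 2^32 - 1 round-trips through the signed int -1 and widens back to 4294967295, instead of surfacing as -1 the way a plain `IntegerType` would show it.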
[jira] [Commented] (SPARK-43357) Spark AWS Glue date partition push down broken
[ https://issues.apache.org/jira/browse/SPARK-43357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721219#comment-17721219 ] Stijn De Haes commented on SPARK-43357: --- Any chance we could backport this to older versions? 3.1, 3.2, 3.4 all have the same issue. I don't know which versions are actively supported. I am willing to make new PRs for these older versions if needed. > Spark AWS Glue date partition push down broken > -- > > Key: SPARK-43357 > URL: https://issues.apache.org/jira/browse/SPARK-43357 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, > 3.3.1, 3.2.3, 3.2.4, 3.3.2 >Reporter: Stijn De Haes >Assignee: Stijn De Haes >Priority: Major > Fix For: 3.5.0 > > > When using the following project: > [https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore] > To have Glue supported as a Hive metastore for Spark, there is an issue > when reading a date-partitioned data set. Writing is fine. > You get the following error: > {quote}org.apache.hadoop.hive.metastore.api.InvalidObjectException: > Unsupported expression '2023 - 05 - 03' (Service: AWSGlue; Status Code: 400; > Error Code: InvalidInputException; Request ID: > beed68c6-b228-442e-8783-52c25b9d2243; Proxy: null) > {quote} > > A fix for this is making sure the date passed to Glue is quoted -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
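The quoted fix ("make sure the date passed to Glue is quoted") comes down to how the partition-pruning filter expression is rendered before being sent to the catalog. A minimal sketch, with a hypothetical helper name and quoting style (the real catalog client may differ):

```java
import java.sql.Date;

// Hypothetical sketch of the fix: quote DATE literals in the partition
// filter so the metastore does not parse 2023-05-03 as the arithmetic
// expression "2023 - 05 - 03".
class GlueFilter {
    static String datePartitionFilter(String column, Date value) {
        // unquoted (broken): dt = 2023-05-03   -> parsed as subtraction
        // quoted (fixed):    dt = '2023-05-03' -> parsed as a date literal
        return column + " = '" + value.toString() + "'";
    }
}
```

With the literal unquoted, Glue's expression parser sees three numbers joined by minus signs, which is exactly the `Unsupported expression '2023 - 05 - 03'` error in the report above.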
[jira] [Updated] (SPARK-43425) Add TimestampNTZType to ColumnarBatchRow
[ https://issues.apache.org/jira/browse/SPARK-43425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-43425: - Issue Type: Bug (was: Improvement) > Add TimestampNTZType to ColumnarBatchRow > > > Key: SPARK-43425 > URL: https://issues.apache.org/jira/browse/SPARK-43425 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43400) Add Primary Key syntax support
[ https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43400: -- Summary: Add Primary Key syntax support (was: create table support the PRIMARY KEY keyword) > Add Primary Key syntax support > -- > > Key: SPARK-43400 > URL: https://issues.apache.org/jira/browse/SPARK-43400 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Apache Paimon and Hudi support primary key definitions. It is necessary to > support the primary key definition syntax: > https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties > [~gurwls223] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org