[jira] [Comment Edited] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461230#comment-17461230 ] jingxiong zhong edited comment on SPARK-36088 at 12/17/21, 6:53 AM: In cluster mode I have another question: when python3.6.6.zip is unpacked in the pod, the extracted Python has no execute permission. My submit command is as follows: {code:sh} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} was (Author: JIRAUSER281124): In cluster mode I have another question: when python3.6.6.zip is unpacked in the pod, the extracted Python has no execute permission. My submit command is as follows: {code:shell} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running Spark in a k8s cluster, there are two deploy modes: cluster and > client. In my tests, in cluster mode *spark.archives* extracts the > archive file to the working directory of both the executors and the driver, but in > client mode *spark.archives* only extracts the archive file to the > working directory of the executors. > > However, I need *spark.archives* to ship the conda-packaged virtual environment tar file > to both the driver and the executors in client mode (so that > the executors and the driver share the same Python environment). > > Why does *spark.archives* not extract the archive file into the working > directory of the driver in client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461230#comment-17461230 ] jingxiong zhong commented on SPARK-36088: - In cluster mode I have another question: when python3.6.6.zip is unpacked in the pod, the extracted Python has no execute permission. My submit command is as follows: {code:shell} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running Spark in a k8s cluster, there are two deploy modes: cluster and > client. In my tests, in cluster mode *spark.archives* extracts the > archive file to the working directory of both the executors and the driver, but in > client mode *spark.archives* only extracts the archive file to the > working directory of the executors. > > However, I need *spark.archives* to ship the conda-packaged virtual environment tar file > to both the driver and the executors in client mode (so that > the executors and the driver share the same Python environment). > > Why does *spark.archives* not extract the archive file into the working > directory of the driver in client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
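If the archive itself drops the execute bit (zip files often do not carry POSIX permissions through extraction), one user-level workaround is to restore the bit on the unpacked interpreter before pointing spark.pyspark.python at it, for example from a step that still runs under the image's system Python. This is a rough sketch only; the path below is a hypothetical guess based on the layout in the command above:
{code:python}
import os
import stat

# Hypothetical unpack location, mirroring python3.6.6.zip#python3.6.6 above;
# adjust to wherever the archive actually lands inside the pod.
env_bin = "python3.6.6/python3.6.6/bin"

for name in os.listdir(env_bin):
    path = os.path.join(env_bin, name)
    if os.path.isfile(path):
        mode = os.stat(path).st_mode
        # Re-add the owner/group/other execute bits that zip extraction may have dropped.
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
{code}
Another route described in the Spark documentation for Python dependency management is to package the environment with conda-pack as a .tar.gz and pass that to --archives, since tar archives preserve permission bits.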
[jira] [Commented] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461221#comment-17461221 ] Apache Spark commented on SPARK-37673: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/34932 > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37673: Assignee: (was: Apache Spark) > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461220#comment-17461220 ] Apache Spark commented on SPARK-37673: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/34932 > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37673: Assignee: Apache Spark > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37673) Implement `ps.timedelta_range` method
Xinrong Meng created SPARK-37673: Summary: Implement `ps.timedelta_range` method Key: SPARK-37673 URL: https://issues.apache.org/jira/browse/SPARK-37673 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
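For context, the proposed ps.timedelta_range mirrors an API that already exists in pandas; below is a quick sketch of the expected call shape using plain pandas (the pandas-on-Spark counterpart is exactly what this ticket adds, so it is not used here):
{code:python}
import pandas as pd

# The pandas function that ps.timedelta_range is intended to mirror:
# a fixed-frequency TimedeltaIndex spanning a range of offsets.
idx = pd.timedelta_range(start="1 day", periods=4, freq="6H")
print(idx)
# TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00',
#                 '1 days 12:00:00', '1 days 18:00:00'],
#                dtype='timedelta64[ns]', freq='6H')
{code}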
[jira] [Assigned] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37657: Assignee: (was: Apache Spark) > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37657: Assignee: Apache Spark > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461205#comment-17461205 ] Apache Spark commented on SPARK-37657: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/34931 > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
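For comparison, the same call in plain pandas does not raise on a string-only frame; it falls back to object-dtype statistics (count/unique/top/freq), which is the behavior the report asks pandas API on Spark to match. A small sketch:
{code:python}
import pandas as pd

# describe() on a frame with no numeric column summarizes the object columns
# instead of raising.
df = pd.DataFrame({"a": ["a", "b", "c"]})
print(df.describe())
#         a
# count   3
# unique  3
# top     a
# freq    1
{code}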
[jira] [Commented] (SPARK-37672) Support ANSI Aggregate Function: regr_sxx
[ https://issues.apache.org/jira/browse/SPARK-37672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461204#comment-17461204 ] jiaan.geng commented on SPARK-37672: I'm working on it. > Support ANSI Aggregate Function: regr_sxx > - > > Key: SPARK-37672 > URL: https://issues.apache.org/jira/browse/SPARK-37672 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_SXX is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37590) Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests
[ https://issues.apache.org/jira/browse/SPARK-37590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461202#comment-17461202 ] Apache Spark commented on SPARK-37590: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34930 > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests > > > Key: SPARK-37590 > URL: https://issues.apache.org/jira/browse/SPARK-37590 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37590) Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests
[ https://issues.apache.org/jira/browse/SPARK-37590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461201#comment-17461201 ] Apache Spark commented on SPARK-37590: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34930 > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests > > > Key: SPARK-37590 > URL: https://issues.apache.org/jira/browse/SPARK-37590 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-37657: Summary: Support str and timestamp for (Series|DataFrame).describe() (was: Fix the bug in ps.(Series|DataFrame).describe()) > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37672) Support ANSI Aggregate Function: regr_sxx
jiaan.geng created SPARK-37672: -- Summary: Support ANSI Aggregate Function: regr_sxx Key: SPARK-37672 URL: https://issues.apache.org/jira/browse/SPARK-37672 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng REGR_SXX is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37641) Support ANSI Aggregate Function: regr_r2
[ https://issues.apache.org/jira/browse/SPARK-37641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37641: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_r2 > > > Key: SPARK-37641 > URL: https://issues.apache.org/jira/browse/SPARK-37641 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_R2 is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37623) Support ANSI Aggregate Function: regr_slope & regr_intercept
[ https://issues.apache.org/jira/browse/SPARK-37623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37623: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_slope & regr_intercept > > > Key: SPARK-37623 > URL: https://issues.apache.org/jira/browse/SPARK-37623 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_SLOPE is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37614) Support ANSI Aggregate Function: regr_avgx & regr_avgy
[ https://issues.apache.org/jira/browse/SPARK-37614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37614: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_avgx & regr_avgy > -- > > Key: SPARK-37614 > URL: https://issues.apache.org/jira/browse/SPARK-37614 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_AVGX and REGR_AVGY are ANSI aggregate functions. Many databases support > them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37613) Support ANSI Aggregate Function: regr_count
[ https://issues.apache.org/jira/browse/SPARK-37613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37613: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_count > --- > > Key: SPARK-37613 > URL: https://issues.apache.org/jira/browse/SPARK-37613 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > REGR_COUNT is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37671) Support ANSI Aggregation Function of regression
jiaan.geng created SPARK-37671: -- Summary: Support ANSI Aggregation Function of regression Key: SPARK-37671 URL: https://issues.apache.org/jira/browse/SPARK-37671 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng Support ANSI Aggregation Function of regression -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
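As background on this family of functions, ANSI defines REGR_SXX(y, x) as the sum of squared deviations of x over the (y, x) pairs where both arguments are non-null. Until the built-ins are available, the same quantity can be sketched with existing Spark SQL aggregates; a rough equivalent for illustration, not the eventual implementation:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame(
    [(1.0, 2.0), (2.0, 3.0), (None, 4.0), (4.0, None)], "y double, x double"
).createOrReplaceTempView("t")

# ANSI REGR_SXX(y, x) = SUM((x - AVG(x))^2) over rows where BOTH y and x are non-null,
# which equals COUNT(*) * VAR_POP(x) over the filtered pairs.
spark.sql("""
    SELECT COUNT(*) * VAR_POP(x) AS regr_sxx
    FROM t
    WHERE y IS NOT NULL AND x IS NOT NULL
""").show()
# Only (1.0, 2.0) and (2.0, 3.0) qualify, so regr_sxx = 0.5.
{code}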
[jira] [Assigned] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37670: Assignee: (was: Apache Spark) > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37670: Assignee: Apache Spark > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461195#comment-17461195 ] Apache Spark commented on SPARK-37670: -- User 'maryannxue' has created a pull request for this issue: https://github.com/apache/spark/pull/34929 > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
Wei Xue created SPARK-37670: --- Summary: Support predicate pushdown and column pruning for de-duped CTEs Key: SPARK-37670 URL: https://issues.apache.org/jira/browse/SPARK-37670 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wei Xue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
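To make the title concrete, the targeted shape is a CTE that is referenced more than once (and is therefore de-duplicated into a shared plan) while each reference only needs a filtered subset of rows and columns; a small illustrative query, not taken from the pull request:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(0, 1000).selectExpr("id", "id % 10 AS bucket").createOrReplaceTempView("events")

# The CTE `agg` is referenced twice. With predicate pushdown and column pruning for
# de-duped CTEs, the filters on `bucket` and the narrow column set could be pushed
# toward the shared CTE plan rather than evaluated only above it.
spark.sql("""
    WITH agg AS (
        SELECT bucket, COUNT(*) AS cnt
        FROM events
        GROUP BY bucket
    )
    SELECT a.bucket, a.cnt AS cnt_a, b.cnt AS cnt_b
    FROM agg a JOIN agg b ON a.bucket = b.bucket + 1
    WHERE a.bucket < 5
""").show()
{code}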
[jira] [Resolved] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37654. -- Fix Version/s: 3.3.0 3.2.1 3.1.3 Resolution: Fixed Issue resolved by pull request 34928 [https://github.com/apache/spark/pull/34928] > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. 
Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37654: Assignee: Huaxin Gao > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.1.3, 3.2.1, 3.3.0 > > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. 
Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37613) Support ANSI Aggregate Function: regr_count
[ https://issues.apache.org/jira/browse/SPARK-37613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37613: --- Assignee: jiaan.geng > Support ANSI Aggregate Function: regr_count > --- > > Key: SPARK-37613 > URL: https://issues.apache.org/jira/browse/SPARK-37613 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > REGR_COUNT is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37613) Support ANSI Aggregate Function: regr_count
[ https://issues.apache.org/jira/browse/SPARK-37613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37613. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34880 [https://github.com/apache/spark/pull/34880] > Support ANSI Aggregate Function: regr_count > --- > > Key: SPARK-37613 > URL: https://issues.apache.org/jira/browse/SPARK-37613 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > REGR_COUNT is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37669: Assignee: Takuya Ueshin > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37669. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34926 [https://github.com/apache/spark/pull/34926] > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
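The rationale in the description can be checked directly: since Python 3.7 the built-in dict preserves insertion order as a language guarantee, so OrderedDict is only needed where its extra behavior (e.g. move_to_end or order-sensitive equality) matters. A minimal sketch:
{code:python}
from collections import OrderedDict

# Iteration order is identical for both containers on Python 3.7+.
plain = {"first": 1, "second": 2, "third": 3}
ordered = OrderedDict([("first", 1), ("second", 2), ("third", 3)])

assert list(plain) == list(ordered) == ["first", "second", "third"]
{code}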
[jira] [Assigned] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37654: Assignee: (was: Apache Spark) > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Priority: Major > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. 
Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37654: Assignee: Apache Spark > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Assignee: Apache Spark >Priority: Major > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. 
Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37666. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34925 [https://github.com/apache/spark/pull/34925] > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > Change the default mode from ECB to GCM in the AES functions aes_encrypt() and > aes_decrypt(). GCM is preferable because it is semantically secure. > It is also the default mode in other systems such as Snowflake; see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
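Callers who do not want to depend on whichever default a given Spark version uses can pass the mode explicitly as the third argument. A hedged sketch, assuming a Spark build that ships aes_encrypt/aes_decrypt and using a throwaway 16-character key:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Passing 'GCM' explicitly makes the round trip independent of the default mode
# (ECB before this change, GCM after). The key is only a placeholder.
spark.sql("""
    SELECT CAST(aes_decrypt(
             aes_encrypt('Spark', '0000111122223333', 'GCM'),
             '0000111122223333', 'GCM') AS STRING) AS roundtrip
""").show()
{code}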
[jira] [Commented] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461156#comment-17461156 ] Apache Spark commented on SPARK-37654: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34928 > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Priority: Major > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. 
Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37668) 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert
[ https://issues.apache.org/jira/browse/SPARK-37668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461143#comment-17461143 ] Haejoon Lee commented on SPARK-37668: - Thanks for the report! Let me take a look > 'Index' object has no attribute 'levels' in > pyspark.pandas.frame.DataFrame.insert > -- > > Key: SPARK-37668 > URL: https://issues.apache.org/jira/browse/SPARK-37668 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > [This piece of > code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] > in {{pyspark.pandas.frame}} is going to fail on runtime, when > {{is_name_like_tuple}} evaluates to {{True}} > {code:python} > if is_name_like_tuple(column): > if len(column) != len(self.columns.levels): > {code} > with > {code} > 'Index' object has no attribute 'levels' > {code} > To be honest, I am not sure what is intended behavior (initially, I suspected > that we should have > {code:python} > if len(column) != self.columns.nlevels > {code} > but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas > at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
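The distinction the report hinges on can be reproduced in plain pandas: `levels` only exists on MultiIndex, while `nlevels` is defined for both index types. A small sketch, unrelated to whatever the eventual fix looks like:
{code:python}
import pandas as pd

flat = pd.Index(["a", "b"])
multi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])

# nlevels is defined for both index types...
print(flat.nlevels, multi.nlevels)  # 1 2

# ...but levels only exists on MultiIndex, which is the source of the reported error.
print(multi.levels)  # FrozenList([['a', 'b'], [1, 2]])
try:
    flat.levels
except AttributeError as e:
    print(e)  # 'Index' object has no attribute 'levels'
{code}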
[jira] [Commented] (SPARK-34544) pyspark toPandas() should return pd.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461133#comment-17461133 ] Apache Spark commented on SPARK-34544: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/34927 > pyspark toPandas() should return pd.DataFrame > - > > Key: SPARK-34544 > URL: https://issues.apache.org/jira/browse/SPARK-34544 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Rafal Wojdyla >Assignee: Maciej Szymkiewicz >Priority: Major > > Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete > "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that > certain pandas methods are not present in {{DataFrameLike}}, even though those > methods are valid methods on pandas {{DataFrame}}, which is the actual type > of the object. This requires type-ignore comments or asserts. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34544) pyspark toPandas() should return pd.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34544: Assignee: Maciej Szymkiewicz (was: Apache Spark) > pyspark toPandas() should return pd.DataFrame > - > > Key: SPARK-34544 > URL: https://issues.apache.org/jira/browse/SPARK-34544 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Rafal Wojdyla >Assignee: Maciej Szymkiewicz >Priority: Major > > Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete > "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that > certain pandas methods are not present in {{DataFrameLike}}, even though those > methods are valid methods on pandas {{DataFrame}}, which is the actual type > of the object. This requires type-ignore comments or asserts. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34544) pyspark toPandas() should return pd.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34544: Assignee: Apache Spark (was: Maciej Szymkiewicz) > pyspark toPandas() should return pd.DataFrame > - > > Key: SPARK-34544 > URL: https://issues.apache.org/jira/browse/SPARK-34544 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Rafal Wojdyla >Assignee: Apache Spark >Priority: Major > > Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete > "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that > certain pandas methods are not present in {{DataFrameLike}}, even though those > methods are valid methods on pandas {{DataFrame}}, which is the actual type > of the object. This requires type-ignore comments or asserts. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
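Until the annotation changes, one way callers keep mypy happy (the asserts mentioned in the description) is to narrow the result explicitly to pandas.DataFrame; a minimal sketch:
{code:python}
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
pdf = spark.range(3).toPandas()

# Narrow the static type from the stub's DataFrameLike to the real pandas type so
# that pandas-only methods type-check; at runtime the assert is always true.
assert isinstance(pdf, pd.DataFrame)
print(pdf.convert_dtypes().dtypes)
{code}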
[jira] [Assigned] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37669: Assignee: (was: Apache Spark) > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37669: Assignee: Apache Spark > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461070#comment-17461070 ] Apache Spark commented on SPARK-37669: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/34926 > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36019) Cannot run leveldb related UTs on Mac OS of M1 architecture
[ https://issues.apache.org/jira/browse/SPARK-36019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36019: -- Parent: SPARK-35781 Issue Type: Sub-task (was: Bug) > Cannot run leveldb related UTs on Mac OS of M1 architecture > --- > > Key: SPARK-36019 > URL: https://issues.apache.org/jira/browse/SPARK-36019 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > When run leveldb related UTs on Mac OS of M1 architecture, there are some > test failed as follows: > {code:java} > [INFO] Running org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] Tests run: 10, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: > 0.18 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] > org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteReadDelete > Time elapsed: 0.146 s <<< ERROR! > java.lang.UnsatisfiedLinkError: > Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, > no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > > dlopen(/Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8, > 1): no suitable image found. Did find: > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper] > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > [ERROR] org.apache.spark.util.kvstore.LevelDBSuite.testObjectWriteReadDelete > Time elapsed: 0 s <<< ERROR! > java.lang.NoClassDefFoundError: Could not initialize class > org.fusesource.leveldbjni.JniDBFactory > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > > [ERROR] Tests run: 105, Failures: 0, Errors: 48, Skipped: 0{code} > There seems to be a lack of JNI support -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-36019) Cannot run leveldb related UTs on Mac OS of M1 architecture
[ https://issues.apache.org/jira/browse/SPARK-36019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-36019. - > Cannot run leveldb related UTs on Mac OS of M1 architecture > --- > > Key: SPARK-36019 > URL: https://issues.apache.org/jira/browse/SPARK-36019 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > When running LevelDB-related UTs on macOS on the M1 architecture, some tests > fail as follows: > {code:java} > [INFO] Running org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] Tests run: 10, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: > 0.18 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] > org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteReadDelete > Time elapsed: 0.146 s <<< ERROR! > java.lang.UnsatisfiedLinkError: > Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, > no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > > dlopen(/Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8, > 1): no suitable image found. Did find: > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper] > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > [ERROR] org.apache.spark.util.kvstore.LevelDBSuite.testObjectWriteReadDelete > Time elapsed: 0 s <<< ERROR! > java.lang.NoClassDefFoundError: Could not initialize class > org.fusesource.leveldbjni.JniDBFactory > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > > [ERROR] Tests run: 105, Failures: 0, Errors: 48, Skipped: 0{code} > There seems to be a lack of native JNI support for this architecture. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37317) Reduce weights in GaussianMixtureSuite
[ https://issues.apache.org/jira/browse/SPARK-37317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37317: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Reduce weights in GaussianMixtureSuite > -- > > Key: SPARK-37317 > URL: https://issues.apache.org/jira/browse/SPARK-37317 > Project: Spark > Issue Type: Sub-task > Components: MLlib, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > {code} > $ build/sbt "mllib/test" > ... > [info] *** 1 TEST FAILED *** > [error] Failed: Total 1760, Failed 1, Errors 0, Passed 1759, Ignored 7 > [error] Failed tests: > [error] org.apache.spark.ml.clustering.GaussianMixtureSuite > [error] (mllib / Test / test) sbt.TestsFailedException: Tests unsuccessful > [error] Total time: 625 s (10:25), completed Nov 13, 2021, 6:21:13 PM > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37522) Fix MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction
[ https://issues.apache.org/jira/browse/SPARK-37522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37522: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Fix MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction > -- > > Key: SPARK-37522 > URL: https://issues.apache.org/jira/browse/SPARK-37522 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > The failure happens on Java 17 native Apple Silicon version on Python > 3.9/3.10. > {code} > $ java -version > openjdk version "17.0.1" 2021-10-19 LTS > OpenJDK Runtime Environment Zulu17.30+15-CA (build 17.0.1+12-LTS) > OpenJDK 64-Bit Server VM Zulu17.30+15-CA (build 17.0.1+12-LTS, mixed mode, > sharing) > {code} > {code} > == > FAIL: test_raw_and_probability_prediction > (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest) > -- > Traceback (most recent call last): > File > "/Users/dongjoon/APACHE/spark-merge/python/pyspark/ml/tests/test_algorithms.py", > line 104, in test_raw_and_probability_prediction > self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, > rtol=0.102)) > AssertionError: False is not true > -- > Ran 1 test in 7.385s > FAILED (failures=1) > Had test failures in pyspark.ml.tests.test_algorithms > MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction with > python3; see logs. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37272) Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon
[ https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37272: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon > > > Key: SPARK-37272 > URL: https://issues.apache.org/jira/browse/SPARK-37272 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > Java 17 officially support Apple Silicon > - JEP 391: macOS/AArch64 Port > - https://bugs.openjdk.java.net/browse/JDK-8251280 > Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 supports Apple Silicon > natively. > {code} > /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable > arm64 > /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64 > /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable > arm64 > {code} > Since RocksDBJNI still doesn't support Apple Silicon natively, the following > failures occur on M1. > {code} > $ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite" > ... > [info] Run completed in 23 seconds, 281 milliseconds. > [info] Total number of tests run: 32 > [info] Suites: completed 2, aborted 2 > [info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0 > [info] *** 2 SUITES ABORTED *** > [info] *** 10 TESTS FAILED *** > [error] Failed tests: > [error] org.apache.spark.sql.streaming.StreamingSessionWindowSuite > [error] > org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite > [error] Error during tests: > [error] org.apache.spark.sql.execution.streaming.state.RocksDBSuite > [error] > org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite > [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful > [error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM > {code} > This issue aims to add ExtendedRocksDBTest to disable RocksDB selectively on > Apple Silicon. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37282) Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon
[ https://issues.apache.org/jira/browse/SPARK-37282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37282: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon > -- > > Key: SPARK-37282 > URL: https://issues.apache.org/jira/browse/SPARK-37282 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > Java 17 officially support Apple Silicon. > - JEP 391: macOS/AArch64 Port > - https://bugs.openjdk.java.net/browse/JDK-8251280 > Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 supports Apple Silicon > natively. > {code} > /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable > arm64 > /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64 > /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable > arm64 > {code} > Since LevelDBJNI still doesn't support Apple Silicon natively, the test cases > fail on M1. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37655) Add RocksDB Implementation for KVStore
[ https://issues.apache.org/jira/browse/SPARK-37655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37655: -- Parent: SPARK-35781 Issue Type: Sub-task (was: Improvement) > Add RocksDB Implementation for KVStore > -- > > Key: SPARK-37655 > URL: https://issues.apache.org/jira/browse/SPARK-37655 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37655) Add RocksDB Implementation for KVStore
[ https://issues.apache.org/jira/browse/SPARK-37655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37655: -- Parent: (was: SPARK-33772) Issue Type: Improvement (was: Sub-task) > Add RocksDB Implementation for KVStore > -- > > Key: SPARK-37655 > URL: https://issues.apache.org/jira/browse/SPARK-37655 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35781) Support Spark on Apple Silicon on macOS natively on Java 17
[ https://issues.apache.org/jira/browse/SPARK-35781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35781: -- Summary: Support Spark on Apple Silicon on macOS natively on Java 17 (was: Support Spark on Apple Silicon on macOS natively) > Support Spark on Apple Silicon on macOS natively on Java 17 > --- > > Key: SPARK-35781 > URL: https://issues.apache.org/jira/browse/SPARK-35781 > Project: Spark > Issue Type: New Feature > Components: Build >Affects Versions: 3.3.0 >Reporter: DB Tsai >Priority: Major > > This is an umbrella JIRA tracking the progress of supporting Apple Silicon on > macOS natively. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461057#comment-17461057 ] Takuya Ueshin commented on SPARK-37669: --- I'm working on this. > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37669) Remove unnecessary usages of OrderedDict
Takuya Ueshin created SPARK-37669: - Summary: Remove unnecessary usages of OrderedDict Key: SPARK-37669 URL: https://issues.apache.org/jira/browse/SPARK-37669 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Takuya Ueshin Now that supported Python is 3.7 and above, we can remove unnecessary usages of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37667) Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard column expansion
[ https://issues.apache.org/jira/browse/SPARK-37667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellan B Cummings updated SPARK-37667: -- Description: I'm seeing a TreeNodeException ("Couldn't find {_}gen_alias{_}") when running certain operations in Spark 3.1.2. A few conditions need to be met to trigger the bug: - a DF with a nested struct joins to a second DF - a filter that compares a column in the right DF to a column in the left DF - wildcard column expansion of the nested struct - a group by statement on a struct column *Data* g...@github.com:kellanburket/spark3bug.git {code:java} val rightDf = spark.read.parquet("right.parquet") val leftDf = spark.read.parquet("left.parquet"){code} *Schemas* {code:java} leftDf.printSchema() root |-- row: struct (nullable = true) | |-- mid: string (nullable = true) | |-- start: struct (nullable = true) | | |-- latitude: double (nullable = true) | | |-- longitude: double (nullable = true) |-- s2_cell_id: long (nullable = true){code} {code:java} rightDf.printSchema() root |-- id: string (nullable = true) |-- s2_cell_id: long (nullable = true){code} *Breaking Code* {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Working Examples* The following examples don't seem to be effected by the bug Works without group by: {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).show(){code} Works without filter {code:java} leftDf.join(rightDf, "s2_cell_id").select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works without wildcard expansion {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.start"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works with caching {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).cache().select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Error message* {code:java} org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116 ASC NULLS FIRST], false, 0 +- *(2) Project [_gen_alias_2133#2133 AS start#2116, id#2103] +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false :- BroadcastQueryStage 0 : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, bigint, false]),false), [id=#3768] : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, s2_cell_id#2108L] : +- *(1) Filter isnotnull(s2_cell_id#2108L) : +- FileScan parquet [row#2107,s2_cell_id#2108L] Batched: false, DataFilters: [isnotnull(s2_cell_id#2108L)], Format: Parquet, Location: 
InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: struct>,s2_cell_id:bigint> +- *(2) Filter (isnotnull(id#2103) AND isnotnull(s2_cell_id#2104L)) +- *(2) ColumnarToRow +- FileScan parquet [id#2103,s2_cell_id#2104L] Batched: true, DataFilters: [isnotnull(id#2103), isnotnull(s2_cell_id#2104L)], Format: Parquet, Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/right], PartitionFilters: [], PushedFilters: [IsNotNull(id), IsNotNull(s2_cell_id)], ReadSchema: struct at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$1(ShuffleExchangeExec.scala:101) at org.apache.spark.sql.util.LazyValue.getOrInit(LazyValue.scala:41) at org.apache.spark.sql.execution.exchange.Exchange.getOrInitMaterializeFuture(Exchange.scala:71) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.materializeFuture(ShuffleExchangeExec.scala:97) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.materialize(ShuffleExchangeExec.scala:85)
[jira] [Updated] (SPARK-37667) Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard column expansion
[ https://issues.apache.org/jira/browse/SPARK-37667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellan B Cummings updated SPARK-37667: -- Summary: Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard column expansion (was: Spark throws TreeNodeException during wildcard column expansion) > Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard > column expansion > --- > > Key: SPARK-37667 > URL: https://issues.apache.org/jira/browse/SPARK-37667 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kellan B Cummings >Priority: Major > > I'm seeing a TreeNodeException ("Couldn't find _gen_alias_") when running > certain operations in Spark 3.1.2. > A few conditions need to be met to trigger the bug: > - a DF with a nested struct joins to a second DF > - a filter that compares a column in the right DF to a column in the left DF > - wildcard column expansion of the nested struct > - a group by statement on a struct column > *Data* > g...@github.com:kellanburket/spark3bug.git > > {code:java} > val rightDf = spark.read.parquet("right.parquet") > val leftDf = spark.read.parquet("left.parquet"){code} > > *Schemas* > {code:java} > leftDf.printSchema() > root > |-- row: struct (nullable = true) > | |-- mid: string (nullable = true) > | |-- start: struct (nullable = true) > | | |-- latitude: double (nullable = true) > | | |-- longitude: double (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > {code:java} > rightDf.printSchema() > root > |-- id: string (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > > *Breaking Code* > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > > *Working Examples* > The following examples don't seem to be effected by the bug > Works without group by: > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).show(){code} > Works without filter > {code:java} > leftDf.join(rightDf, "s2_cell_id").select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works without variable expansion > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.start"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works with caching > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).cache().select( > col("row.*"), > col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > *Error message* > > > {code:java} > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] > +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) > null else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) > +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null > else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116 ASC 
NULLS FIRST], false, 0 > +- *(2) Project [_gen_alias_2133#2133 AS start#2116, id#2103] > +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], > Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false > :- BroadcastQueryStage 0 > : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > bigint, false]),false), [id=#3768] > : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, > s2_cell_id#2108L] > : +- *(1) Filter isnotnull(s2_cell_id#2108L) > : +- FileScan parquet [row#2107,s2_cell_id#2108L] > Batched: false, DataFilters: [isnotnull(s2_cell_id#2108L)], Format: Parquet, > Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], > PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: > struct>,s2_cell_id:bigint> > +- *(2) Filter (isnotnull(id#2103) AND > isnotnull(s2_cell_id#2104L)) > +- *(2) ColumnarToRow > +- FileScan parquet [id#2103,s2_cell_id#210
[jira] [Updated] (SPARK-37668) 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert
[ https://issues.apache.org/jira/browse/SPARK-37668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-37668: --- Summary: 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert (was: 'Index' object has no attribute 'levels') > 'Index' object has no attribute 'levels' in > pyspark.pandas.frame.DataFrame.insert > -- > > Key: SPARK-37668 > URL: https://issues.apache.org/jira/browse/SPARK-37668 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > [This piece of > code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] > in {{pyspark.pandas.frame}} is going to fail on runtime, when > {{is_name_like_tuple}} evaluates to {{True}} > {code:python} > if is_name_like_tuple(column): > if len(column) != len(self.columns.levels): > {code} > with > {code} > 'Index' object has no attribute 'levels' > {code} > To be honest, I am not sure what is intended behavior (initially, I suspected > that we should have > {code:python} > if len(column) != self.columns.nlevels > {code} > but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas > at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
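For context on the SPARK-37668 description above, a small pandas-only sketch (no Spark involved) of the failure mode: {{.levels}} exists only on a {{MultiIndex}}, while {{.nlevels}} exists on both kinds of index, which is why the guarded branch raises at runtime when the columns form a flat {{Index}}.
{code:python}
import pandas as pd

flat = pd.Index(["x", "y"])                      # plain (flat) Index
multi = pd.MultiIndex.from_tuples([("a", "b")])  # MultiIndex

print(flat.nlevels, multi.nlevels)   # 1 2   -> nlevels is defined for both
print(hasattr(flat, "levels"))       # False -> flat.levels raises AttributeError
print(len(multi.levels))             # 2     -> levels exists only on MultiIndex
{code}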
[jira] [Commented] (SPARK-37668) 'Index' object has no attribute 'levels'
[ https://issues.apache.org/jira/browse/SPARK-37668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461039#comment-17461039 ] Maciej Szymkiewicz commented on SPARK-37668: cc [~hyukjin.kwon], [~itholic], [~ueshin], [~XinrongM]. > 'Index' object has no attribute 'levels' > > > Key: SPARK-37668 > URL: https://issues.apache.org/jira/browse/SPARK-37668 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > [This piece of > code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] > in {{pyspark.pandas.frame}} is going to fail on runtime, when > {{is_name_like_tuple}} evaluates to {{True}} > {code:python} > if is_name_like_tuple(column): > if len(column) != len(self.columns.levels): > {code} > with > {code} > 'Index' object has no attribute 'levels' > {code} > To be honest, I am not sure what is intended behavior (initially, I suspected > that we should have > {code:python} > if len(column) != self.columns.nlevels > {code} > but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas > at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37667) Spark throws TreeNodeException during wildcard column expansion
[ https://issues.apache.org/jira/browse/SPARK-37667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellan B Cummings updated SPARK-37667: -- Summary: Spark throws TreeNodeException during wildcard column expansion (was: Spark throws TreeNodeException during variable expansion) > Spark throws TreeNodeException during wildcard column expansion > --- > > Key: SPARK-37667 > URL: https://issues.apache.org/jira/browse/SPARK-37667 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kellan B Cummings >Priority: Major > > I'm seeing a TreeNodeException ("Couldn't find _gen_alias_") when running > certain operations in Spark 3.1.2. > A few conditions need to be met to trigger the bug: > - a DF with a nested struct joins to a second DF > - a filter that compares a column in the right DF to a column in the left DF > - wildcard column expansion of the nested struct > - a group by statement on a struct column > *Data* > g...@github.com:kellanburket/spark3bug.git > > {code:java} > val rightDf = spark.read.parquet("right.parquet") > val leftDf = spark.read.parquet("left.parquet"){code} > > *Schemas* > {code:java} > leftDf.printSchema() > root > |-- row: struct (nullable = true) > | |-- mid: string (nullable = true) > | |-- start: struct (nullable = true) > | | |-- latitude: double (nullable = true) > | | |-- longitude: double (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > {code:java} > rightDf.printSchema() > root > |-- id: string (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > > *Breaking Code* > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > > *Working Examples* > The following examples don't seem to be effected by the bug > Works without group by: > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).show(){code} > Works without filter > {code:java} > leftDf.join(rightDf, "s2_cell_id").select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works without variable expansion > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.start"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works with caching > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).cache().select( > col("row.*"), > col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > *Error message* > > > {code:java} > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] > +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) > null else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) > +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null > else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116 ASC NULLS FIRST], false, 0 > +- *(2) Project [_gen_alias_2133#2133 AS 
start#2116, id#2103] > +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], > Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false > :- BroadcastQueryStage 0 > : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > bigint, false]),false), [id=#3768] > : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, > s2_cell_id#2108L] > : +- *(1) Filter isnotnull(s2_cell_id#2108L) > : +- FileScan parquet [row#2107,s2_cell_id#2108L] > Batched: false, DataFilters: [isnotnull(s2_cell_id#2108L)], Format: Parquet, > Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], > PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: > struct>,s2_cell_id:bigint> > +- *(2) Filter (isnotnull(id#2103) AND > isnotnull(s2_cell_id#2104L)) > +- *(2) ColumnarToRow > +- FileScan parquet [id#2103,s2_cell_id#2104L] Batched: > true, DataFilters: [isnotnull(id#2103), isnotnull(s2_cell_id#2104L)], Format:
[jira] [Created] (SPARK-37668) 'Index' object has no attribute 'levels'
Maciej Szymkiewicz created SPARK-37668: -- Summary: 'Index' object has no attribute 'levels' Key: SPARK-37668 URL: https://issues.apache.org/jira/browse/SPARK-37668 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: Maciej Szymkiewicz [This piece of code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] in {{pyspark.pandas.frame}} is going to fail on runtime, when {{is_name_like_tuple}} evaluates to {{True}} {code:python} if is_name_like_tuple(column): if len(column) != len(self.columns.levels): {code} with {code} 'Index' object has no attribute 'levels' {code} To be honest, I am not sure what is intended behavior (initially, I suspected that we should have {code:python} if len(column) != self.columns.nlevels {code} but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37667) Spark throws TreeNodeException during variable expansion
Kellan B Cummings created SPARK-37667: - Summary: Spark throws TreeNodeException during variable expansion Key: SPARK-37667 URL: https://issues.apache.org/jira/browse/SPARK-37667 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Kellan B Cummings I'm seeing a TreeNodeException ("Couldn't find _gen_alias_") when running certain operations in Spark 3.1.2. A few conditions need to be met to trigger the bug: - a DF with a nested struct joins to a second DF - a filter that compares a column in the right DF to a column in the left DF - wildcard column expansion of the nested struct - a group by statement on a struct column *Data* g...@github.com:kellanburket/spark3bug.git {code:java} val rightDf = spark.read.parquet("right.parquet") val leftDf = spark.read.parquet("left.parquet"){code} *Schemas* {code:java} leftDf.printSchema() root |-- row: struct (nullable = true) | |-- mid: string (nullable = true) | |-- start: struct (nullable = true) | | |-- latitude: double (nullable = true) | | |-- longitude: double (nullable = true) |-- s2_cell_id: long (nullable = true){code} {code:java} rightDf.printSchema() root |-- id: string (nullable = true) |-- s2_cell_id: long (nullable = true){code} *Breaking Code* {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Working Examples* The following examples don't seem to be effected by the bug Works without group by: {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).show(){code} Works without filter {code:java} leftDf.join(rightDf, "s2_cell_id").select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works without variable expansion {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.start"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works with caching {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).cache().select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Error message* {code:java} org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116 ASC NULLS FIRST], false, 0 +- *(2) Project [_gen_alias_2133#2133 AS start#2116, id#2103] +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false :- BroadcastQueryStage 0 : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, bigint, false]),false), [id=#3768] : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, s2_cell_id#2108L] : +- *(1) Filter isnotnull(s2_cell_id#2108L) : +- FileScan parquet [row#2107,s2_cell_id#2108L] Batched: false, DataFilters: 
[isnotnull(s2_cell_id#2108L)], Format: Parquet, Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: struct>,s2_cell_id:bigint> +- *(2) Filter (isnotnull(id#2103) AND isnotnull(s2_cell_id#2104L)) +- *(2) ColumnarToRow +- FileScan parquet [id#2103,s2_cell_id#2104L] Batched: true, DataFilters: [isnotnull(id#2103), isnotnull(s2_cell_id#2104L)], Format: Parquet, Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/right], PartitionFilters: [], PushedFilters: [IsNotNull(id), IsNotNull(s2_cell_id)], ReadSchema: struct at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$1(ShuffleExchangeExec.scala:101) at org.apache.spark.sql.util.LazyValue.getOrInit(LazyValue.scala:41) at org.apache.spark.sql.execution.exchange.Exchange.getOrInitMaterializeFuture(Exchange.scala:71) at org.apache.spark.sql.execution.exc
[jira] [Assigned] (SPARK-34521) spark.createDataFrame does not support Pandas StringDtype extension type
[ https://issues.apache.org/jira/browse/SPARK-34521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-34521: Assignee: Nicolas Azrak > spark.createDataFrame does not support Pandas StringDtype extension type > > > Key: SPARK-34521 > URL: https://issues.apache.org/jira/browse/SPARK-34521 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1 >Reporter: Pavel Ganelin >Assignee: Nicolas Azrak >Priority: Major > Fix For: 3.3.0 > > > The following test case demonstrates the problem: > {code:java} > import pandas as pd > from pyspark.sql import SparkSession, types > spark = SparkSession.builder.appName(__file__)\ > .config("spark.sql.execution.arrow.pyspark.enabled","true") \ > .getOrCreate() > good = pd.DataFrame([["abc"]], columns=["col"]) > schema = types.StructType([types.StructField("col", types.StringType(), > True)]) > df = spark.createDataFrame(good, schema=schema) > df.show() > bad = good.copy() > bad["col"]=bad["col"].astype("string") > schema = types.StructType([types.StructField("col", types.StringType(), > True)]) > df = spark.createDataFrame(bad, schema=schema) > df.show(){code} > The error: > {code:java} > C:\Python\3.8.3\lib\site-packages\pyspark\sql\pandas\conversion.py:289: > UserWarning: createDataFrame attempted Arrow optimization because > 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed > by the reason below: > Cannot specify a mask or a size when passing an object that is converted > with the __arrow_array__ protocol. > Attempting non-optimization as > 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true. > warnings.warn(msg) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
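A hedged workaround sketch for SPARK-34521 (not taken from the ticket; it reuses the {{spark}}, {{bad}}, and {{schema}} objects from the reproducer above): converting the pandas {{string}} extension dtype back to plain {{object}} should let the Arrow-optimized path succeed.
{code:python}
# Hypothetical workaround, reusing the reproducer's variables: downgrade the
# pandas "string" extension dtype to object before calling createDataFrame.
bad_as_object = bad.copy()
bad_as_object["col"] = bad_as_object["col"].astype(object)

df = spark.createDataFrame(bad_as_object, schema=schema)
df.show()
{code}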
[jira] [Assigned] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37666: Assignee: Max Gekk (was: Apache Spark) > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Change the default mode from ECB to GCM in AES functions: aes_encrypt() and > aes_decrypt(). GCM is much more preferable because it is semantically secure. > Also the mode is used the default one in other systems like Snowflake, see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
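A short PySpark sketch of what the SPARK-37666 change means for users (hedged: it assumes a Spark build in which the SQL functions {{aes_encrypt}}/{{aes_decrypt}} are available, i.e. 3.3.0 or a snapshot of it). Passing the mode explicitly keeps results stable no matter which default is chosen:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aes-gcm-sketch").getOrCreate()

key = "0000111122223333"  # illustrative 16-byte key (AES-128), not a real secret

# Spell out the mode instead of relying on the default (ECB today, GCM proposed).
roundtrip = spark.sql(f"""
    SELECT cast(
        aes_decrypt(aes_encrypt('Spark', '{key}', 'GCM'), '{key}', 'GCM')
    AS STRING) AS plaintext
""")
roundtrip.show()  # expected to show 'Spark'
{code}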
[jira] [Assigned] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37666: Assignee: Apache Spark (was: Max Gekk) > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Change the default mode from ECB to GCM in AES functions: aes_encrypt() and > aes_decrypt(). GCM is much more preferable because it is semantically secure. > Also the mode is used the default one in other systems like Snowflake, see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460984#comment-17460984 ] Apache Spark commented on SPARK-37666: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/34925 > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Change the default mode from ECB to GCM in AES functions: aes_encrypt() and > aes_decrypt(). GCM is much more preferable because it is semantically secure. > Also the mode is used the default one in other systems like Snowflake, see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
Max Gekk created SPARK-37666: Summary: Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` Key: SPARK-37666 URL: https://issues.apache.org/jira/browse/SPARK-37666 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk Change the default mode from ECB to GCM in AES functions: aes_encrypt() and aes_decrypt(). GCM is much more preferable because it is semantically secure. Also the mode is used the default one in other systems like Snowflake, see https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37664: -- Parent: SPARK-33772 Issue Type: Sub-task (was: Task) > Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java > 11/17 result > -- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37664: -- Summary: Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result (was: Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17) > Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java > 11/17 result > -- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37664: -- Issue Type: Task (was: Improvement) > Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java > 11/17 result > -- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37664. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34921 [https://github.com/apache/spark/pull/34921] > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37664: - Assignee: Yang Jie > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37145: Assignee: (was: Apache Spark) > Improvement for extending pod feature steps with KubernetesConf > --- > > Key: SPARK-37145 > URL: https://issues.apache.org/jira/browse/SPARK-37145 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: wangxin201492 >Priority: Major > > SPARK-33261 provides us with great convenience, but it only constructs a > `KubernetesFeatureConfigStep` through an empty (no-argument) constructor. > It would be better to also support constructing it with a `KubernetesConf` (or, > more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37145: Assignee: Apache Spark > Improvement for extending pod feature steps with KubernetesConf > --- > > Key: SPARK-37145 > URL: https://issues.apache.org/jira/browse/SPARK-37145 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: wangxin201492 >Assignee: Apache Spark >Priority: Major > > SPARK-33261 provides us with great convenience, but it only constructs a > `KubernetesFeatureConfigStep` through an empty (no-argument) constructor. > It would be better to also support constructing it with a `KubernetesConf` (or, > more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460889#comment-17460889 ] Apache Spark commented on SPARK-37145: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/34924 > Improvement for extending pod feature steps with KubernetesConf > --- > > Key: SPARK-37145 > URL: https://issues.apache.org/jira/browse/SPARK-37145 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: wangxin201492 >Priority: Major > > SPARK-33261 provides us with great convenience, but it only constructs a > `KubernetesFeatureConfigStep` through an empty (no-argument) constructor. > It would be better to also support constructing it with a `KubernetesConf` (or, > more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37630) Security issue from Log4j 1.X exploit
[ https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460832#comment-17460832 ] Ismail H edited comment on SPARK-37630 at 12/16/21, 4:06 PM: - to [~divekarsc] , extract from https://access.redhat.com/security/cve/CVE-2021-4104 : bq. Note this flaw ONLY affects applications which are specifically configured to use JMSAppender, which is not the default, or when the attacker has write access to the Log4j configuration for adding JMSAppender to the attacker's JMS Broker. so the question is, is Spark using JMSAppender ? was (Author: JIRAUSER281735): to [~divekarsc] , extract from https://access.redhat.com/security/cve/CVE-2021-4104 : bq. Note this flaw ONLY affects applications which are specifically configured to use JMSAppender, which is not the default, or when the attacker has write access to the Log4j configuration for adding JMSAppender to the attacker's JMS Broker. bq. so the question is, is Spark using JMSAppender ? > Security issue from Log4j 1.X exploit > - > > Key: SPARK-37630 > URL: https://issues.apache.org/jira/browse/SPARK-37630 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.2.0 >Reporter: Ismail H >Priority: Major > Labels: security > > log4j is being used in version [1.2.17|#L122]] > > This version has been deprecated and since [then have a known issue that > hasn't been adressed in 1.X > versions|https://www.cvedetails.com/cve/CVE-2019-17571/]. > > *Solution:* > * Upgrade log4j to version 2.15.0 which correct all known issues. [Last > known issues |https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37630) Security issue from Log4j 1.X exploit
[ https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460832#comment-17460832 ] Ismail H commented on SPARK-37630: -- to [~divekarsc] , extract from https://access.redhat.com/security/cve/CVE-2021-4104 : bq. Note this flaw ONLY affects applications which are specifically configured to use JMSAppender, which is not the default, or when the attacker has write access to the Log4j configuration for adding JMSAppender to the attacker's JMS Broker. bq. so the question is, is Spark using JMSAppender ? > Security issue from Log4j 1.X exploit > - > > Key: SPARK-37630 > URL: https://issues.apache.org/jira/browse/SPARK-37630 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.2.0 >Reporter: Ismail H >Priority: Major > Labels: security > > log4j is being used in version [1.2.17|#L122]] > > This version has been deprecated and since [then have a known issue that > hasn't been adressed in 1.X > versions|https://www.cvedetails.com/cve/CVE-2019-17571/]. > > *Solution:* > * Upgrade log4j to version 2.15.0 which correct all known issues. [Last > known issues |https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460777#comment-17460777 ] Apache Spark commented on SPARK-35739: -- User 'brandondahler' has created a pull request for this issue: https://github.com/apache/spark/pull/34923 > [Spark Sql] Add Java-comptable Dataset.join overloads > - > > Key: SPARK-35739 > URL: https://issues.apache.org/jira/browse/SPARK-35739 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0, 3.0.0 >Reporter: Brandon Dahler >Priority: Minor > > h2. Problem > When using Spark SQL with Java, the required syntax to utilize the following > two overloads is unnatural and not obvious to developers that haven't had to > interoperate with Scala before: > {code:java} > def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame > def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): > DataFrame > {code} > Examples: > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > // Overload with multiple usingColumns, no join type > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2"))) > .show(); > // Overload with multiple usingColumns and a join type > dataset1 > .join( > dataset2, > JavaConverters.asScalaBuffer(List.of("column", "column2")), > "left") > .show(); > {code} > > Additionally there is no overload that takes a single usingColumn and a > joinType, forcing the developer to use the Seq[String] overload regardless of > language. > Examples: > Scala > {code:java} > val dataset1 :DataFrame = ...; > val dataset2 :DataFrame = ...; > dataset1 > .join(dataset2, Seq("column"), "left") > .show(); > {code} > > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left") > .show(); > {code} > h2. Proposed Improvement > Add 3 additional overloads to Dataset: > > {code:java} > def join(right: Dataset[_], usingColumn: List[String]): DataFrame > def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame > def join(right: Dataset[_], usingColumn: List[String], joinType: String): > DataFrame > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460775#comment-17460775 ] Apache Spark commented on SPARK-35739: -- User 'brandondahler' has created a pull request for this issue: https://github.com/apache/spark/pull/34923 > [Spark Sql] Add Java-comptable Dataset.join overloads > - > > Key: SPARK-35739 > URL: https://issues.apache.org/jira/browse/SPARK-35739 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0, 3.0.0 >Reporter: Brandon Dahler >Priority: Minor > > h2. Problem > When using Spark SQL with Java, the required syntax to utilize the following > two overloads is unnatural and not obvious to developers that haven't had to > interoperate with Scala before: > {code:java} > def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame > def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): > DataFrame > {code} > Examples: > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > // Overload with multiple usingColumns, no join type > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2"))) > .show(); > // Overload with multiple usingColumns and a join type > dataset1 > .join( > dataset2, > JavaConverters.asScalaBuffer(List.of("column", "column2")), > "left") > .show(); > {code} > > Additionally there is no overload that takes a single usingColumn and a > joinType, forcing the developer to use the Seq[String] overload regardless of > language. > Examples: > Scala > {code:java} > val dataset1 :DataFrame = ...; > val dataset2 :DataFrame = ...; > dataset1 > .join(dataset2, Seq("column"), "left") > .show(); > {code} > > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left") > .show(); > {code} > h2. Proposed Improvement > Add 3 additional overloads to Dataset: > > {code:java} > def join(right: Dataset[_], usingColumn: List[String]): DataFrame > def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame > def join(right: Dataset[_], usingColumn: List[String], joinType: String): > DataFrame > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37483. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34918 [https://github.com/apache/spark/pull/34918] > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37483: --- Assignee: jiaan.geng > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37665) JDBC: Provide generic metadata functions
Daniel Haviv created SPARK-37665: Summary: JDBC: Provide generic metadata functions Key: SPARK-37665 URL: https://issues.apache.org/jira/browse/SPARK-37665 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Daniel Haviv JDBC driver vendors expose the metadata (databases/tables) for the underlying engine through the [DatabaseMetaData interface|https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html]. Today when a user wants to fetch the metadata for an engine, they have to execute a SQL statement tailored to a specific engine/syntax instead of using a more generic approach. I suggest we add two new functions to the JDBC reader: {code:java} listDatabases & listTables{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
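For context, this is roughly what engine-agnostic metadata access already looks like through plain JDBC; the proposed listDatabases and listTables helpers would presumably wrap calls like these. The connection URL and credentials below are placeholders.
{code:scala}
import java.sql.DriverManager

object JdbcMetadataSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; any JDBC driver on the classpath works.
    val conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/mydb", "user", "secret")
    try {
      val meta = conn.getMetaData
      // Engine-agnostic equivalent of a listDatabases() helper.
      val catalogs = meta.getCatalogs
      while (catalogs.next()) println(catalogs.getString("TABLE_CAT"))
      // Engine-agnostic equivalent of a listTables() helper.
      val tables = meta.getTables(null, null, "%", Array("TABLE"))
      while (tables.next()) println(tables.getString("TABLE_NAME"))
    } finally {
      conn.close()
    }
  }
}
{code}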
[jira] [Commented] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460612#comment-17460612 ] Apache Spark commented on SPARK-37663: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/34922 > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
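For background, a generic illustration of the failure mode (not the SparkContextSuite code or the SPARK-37315 fix): Scala 2.13 added fail-fast mutation checks to its mutable collections, so iterating a buffer that is modified during the iteration throws ConcurrentModificationException, and iterating over an immutable snapshot is a common mitigation.
{code:scala}
import scala.collection.mutable.ArrayBuffer

object CmeSketch {
  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(1, 2, 3)
    // On Scala 2.13 the next line fails fast with ConcurrentModificationException,
    // whereas Scala 2.12 silently tolerated mutation during iteration:
    // buf.foreach(_ => buf += 0)
    // Mitigation: iterate over an immutable snapshot instead of the live buffer.
    buf.toList.foreach(x => buf += x * 10)
    println(buf) // ArrayBuffer(1, 2, 3, 10, 20, 30)
  }
}
{code}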
[jira] [Assigned] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37663: Assignee: Apache Spark (was: Kousuke Saruta) > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37663: Assignee: Kousuke Saruta (was: Apache Spark) > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460539#comment-17460539 ] Apache Spark commented on SPARK-37664: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34921 > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37664: Assignee: (was: Apache Spark) > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37664: Assignee: Apache Spark > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460536#comment-17460536 ] Apache Spark commented on SPARK-37664: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34921 > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37664: - Component/s: Tests (was: SQL) > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9853) Optimize shuffle fetch of contiguous partition IDs
[ https://issues.apache.org/jira/browse/SPARK-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-9853: Priority: Major (was: Minor) > Optimize shuffle fetch of contiguous partition IDs > -- > > Key: SPARK-9853 > URL: https://issues.apache.org/jira/browse/SPARK-9853 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Reporter: Matei Alexandru Zaharia >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.0.0 > > > On the map side, we should be able to serve a block representing multiple > partition IDs in one block manager request -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
Yang Jie created SPARK-37664: Summary: Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17 Key: SPARK-37664 URL: https://issues.apache.org/jira/browse/SPARK-37664 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37663: --- Summary: Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite (was: Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite) > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37663) SPARK-37315][ML][TEST] Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite
Kousuke Saruta created SPARK-37663: -- Summary: SPARK-37315][ML][TEST] Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite Key: SPARK-37663 URL: https://issues.apache.org/jira/browse/SPARK-37663 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta ConcurrentModificationException can be thrown from tests in SparkContextSuite with Scala 2.13. The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37663) Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37663: --- Summary: Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite (was: SPARK-37315][ML][TEST] Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite) > Mitigate ConcurrentModificationException thrown from a test in > SparkContextSuite > > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-37661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37661: Assignee: Apache Spark > SparkSQLCLIDriver will use hive defaults to resolve warehouse dir > - > > Key: SPARK-37661 > URL: https://issues.apache.org/jira/browse/SPARK-37661 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > {code:log} > 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not > set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir > to the value of hive.metastore.warehouse.dir. > 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is > 'file:/user/hive/warehouse'. > ... > ... > 21/12/16 15:27:36.559 main INFO SharedState: Setting > hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. > 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is > 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-37661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460505#comment-17460505 ] Apache Spark commented on SPARK-37661: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/34920 > SparkSQLCLIDriver will use hive defaults to resolve warehouse dir > - > > Key: SPARK-37661 > URL: https://issues.apache.org/jira/browse/SPARK-37661 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:log} > 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not > set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir > to the value of hive.metastore.warehouse.dir. > 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is > 'file:/user/hive/warehouse'. > ... > ... > 21/12/16 15:27:36.559 main INFO SharedState: Setting > hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. > 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is > 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-37661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37661: Assignee: (was: Apache Spark) > SparkSQLCLIDriver will use hive defaults to resolve warehouse dir > - > > Key: SPARK-37661 > URL: https://issues.apache.org/jira/browse/SPARK-37661 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:log} > 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not > set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir > to the value of hive.metastore.warehouse.dir. > 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is > 'file:/user/hive/warehouse'. > ... > ... > 21/12/16 15:27:36.559 main INFO SharedState: Setting > hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. > 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is > 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37662) exception when handling late data with watermarking and window
luigi created SPARK-37662: - Summary: exception when handling late data with watermarking and window Key: SPARK-37662 URL: https://issues.apache.org/jira/browse/SPARK-37662 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.2.0 Environment: spark v3.2.0 scala v2.12.12 Reporter: luigi When I use a watermark to drop late data together with a window column for stateful de-duplication, the order of the two calls causes unexpected behavior. a) The code below fails with an exception stating that "Couldn't find timestamp#58-T5000ms in [window#550-T5000ms,raid#132L,app#528]" {code:java} withWatermark("timestamp", "5 seconds"). withColumn("window", window($"timestamp", "1 hours")). dropDuplicates("window", "raid", "app"). {code} b) But when I switch the order of the watermark and window calls as below, it works without any exception: {code:java} withColumn("window", window($"timestamp", "1 hours")). withWatermark("timestamp", "5 seconds"). dropDuplicates("window", "raid", "app"). {code} Please note that this issue does not exist on Spark v3.1.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
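A self-contained sketch of the two orderings described above. The column names raid and app come from the report; the rate source and the derived columns are placeholders used only to obtain a streaming timestamp column, and the behavior noted in the comments is as reported for Spark 3.2.0, not independently verified here.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object WatermarkWindowOrderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("watermark-window-order").getOrCreate()
    import spark.implicits._

    // The rate source provides "timestamp" and "value" columns; rename/derive
    // placeholder columns to mirror the report's schema.
    val events = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .withColumnRenamed("value", "raid")
      .withColumn("app", $"raid" % 3)

    // (a) Watermark first, then derive the window column; reportedly fails on 3.2.0.
    val dedupA = events
      .withWatermark("timestamp", "5 seconds")
      .withColumn("window", window($"timestamp", "1 hours"))
      .dropDuplicates("window", "raid", "app")

    // (b) Derive the window column first, then set the watermark; reportedly works.
    val dedupB = events
      .withColumn("window", window($"timestamp", "1 hours"))
      .withWatermark("timestamp", "5 seconds")
      .dropDuplicates("window", "raid", "app")

    dedupB.writeStream.format("console").start().awaitTermination()
  }
}
{code}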
[jira] [Created] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
Kent Yao created SPARK-37661: Summary: SparkSQLCLIDriver will use hive defaults to resolve warehouse dir Key: SPARK-37661 URL: https://issues.apache.org/jira/browse/SPARK-37661 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Kent Yao {code:log} 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir. 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is 'file:/user/hive/warehouse'. ... ... 21/12/16 15:27:36.559 main INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
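As a hedged illustration related to the log above, not a fix for SparkSQLCLIDriver itself: when spark.sql.warehouse.dir is set explicitly, SharedState should not fall back to hive.metastore.warehouse.dir. The path below is a placeholder.
{code:scala}
import org.apache.spark.sql.SparkSession

object WarehouseDirSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("warehouse-dir-sketch")
      // Placeholder path; with this set, SharedState should keep it as the
      // warehouse location instead of adopting hive.metastore.warehouse.dir.
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
      .getOrCreate()

    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.stop()
  }
}
{code}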