[jira] [Commented] (SPARK-42567) Track state store provider load time and log warning if it exceeds a threshold
[ https://issues.apache.org/jira/browse/SPARK-42567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693354#comment-17693354 ]

Apache Spark commented on SPARK-42567:
--------------------------------------

User 'anishshri-db' has created a pull request for this issue:

https://github.com/apache/spark/pull/40163

> Track state store provider load time and log warning if it exceeds a threshold
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-42567
>                 URL: https://issues.apache.org/jira/browse/SPARK-42567
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 3.4.1
>            Reporter: Anish Shrigondekar
>            Priority: Major
>
> Track state store provider load time and log a warning if it exceeds a
> threshold.
> In some cases, the filesystem initialization might take time the first time
> we create the provider and initialize it. This change will log the time
> taken if it exceeds a certain threshold.
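For illustration, a minimal sketch of the timing-and-warning logic described above; the threshold value and function names are assumptions for this sketch, not Spark's actual code:

{code:python}
import logging
import time

logger = logging.getLogger(__name__)

LOAD_TIME_THRESHOLD_MS = 2000  # hypothetical threshold


def load_provider_with_timing(load_provider):
    """Run the provider load and warn if it took longer than the threshold."""
    start = time.monotonic()
    result = load_provider()
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > LOAD_TIME_THRESHOLD_MS:
        logger.warning(
            "State store provider load took %.0f ms (threshold: %d ms)",
            elapsed_ms, LOAD_TIME_THRESHOLD_MS)
    return result
{code}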
[jira] [Commented] (SPARK-42566) RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD
[ https://issues.apache.org/jira/browse/SPARK-42566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693329#comment-17693329 ]

Apache Spark commented on SPARK-42566:
--------------------------------------

User 'huanliwang-db' has created a pull request for this issue:

https://github.com/apache/spark/pull/40162

> RocksDB StateStore lock acquisition should happen after getting input
> iterator from inputRDD
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42566
>                 URL: https://issues.apache.org/jira/browse/SPARK-42566
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Priority: Minor
>
> The current behavior of the {{compute}} method in both {{StateStoreRDD}}
> and {{ReadStateStoreRDD}} is: we first get the state store instance and
> then get the input iterator for the inputRDD.
> For the RocksDB state store, the running task will acquire and hold the
> lock for this instance. A retried or speculative task will fail to acquire
> the lock and eventually abort the job if there are network issues. For
> example, when we shrink the executors, the surviving one will try to fetch
> data from the killed ones because it doesn't know the target location
> (prefetched from the driver) is dead until it tries to fetch the data. The
> query might hang for a long time, as the executor will retry
> {{spark.shuffle.io.maxRetries=3}} times and, for each retry, wait
> {{spark.shuffle.io.connectionTimeout}} (default 120s) before timing out.
> In total, the task could hang for about 6 minutes, and the retried or
> speculative tasks won't be able to acquire the lock in this period.
> Making lock acquisition happen after retrieving the input iterator should
> avoid this situation.
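For illustration, a sketch of the reordering with a generic threading.Lock standing in for the RocksDB state store instance lock; the real change lives in Spark's Scala compute methods, so everything here is an illustrative stand-in:

{code:python}
import threading

store_lock = threading.Lock()  # stand-in for the per-instance RocksDB lock


def compute_before(fetch_input_iterator):
    # Before: the lock is taken first, so the (possibly slow, retried)
    # shuffle fetch runs while holding it, blocking retried/speculative tasks.
    with store_lock:
        return list(fetch_input_iterator())  # may block for minutes


def compute_after(fetch_input_iterator):
    # After: fetch the input iterator first; only then take the lock.
    rows = fetch_input_iterator()  # slow fetch happens without the lock
    with store_lock:
        return list(rows)
{code}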
[jira] [Assigned] (SPARK-42566) RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD
[ https://issues.apache.org/jira/browse/SPARK-42566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42566:
------------------------------------

    Assignee: Apache Spark

> RocksDB StateStore lock acquisition should happen after getting input
> iterator from inputRDD
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42566
>                 URL: https://issues.apache.org/jira/browse/SPARK-42566
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Assignee: Apache Spark
>            Priority: Minor
>
> The current behavior of the {{compute}} method in both {{StateStoreRDD}}
> and {{ReadStateStoreRDD}} is: we first get the state store instance and
> then get the input iterator for the inputRDD.
> For the RocksDB state store, the running task will acquire and hold the
> lock for this instance. A retried or speculative task will fail to acquire
> the lock and eventually abort the job if there are network issues. For
> example, when we shrink the executors, the surviving one will try to fetch
> data from the killed ones because it doesn't know the target location
> (prefetched from the driver) is dead until it tries to fetch the data. The
> query might hang for a long time, as the executor will retry
> {{spark.shuffle.io.maxRetries=3}} times and, for each retry, wait
> {{spark.shuffle.io.connectionTimeout}} (default 120s) before timing out.
> In total, the task could hang for about 6 minutes, and the retried or
> speculative tasks won't be able to acquire the lock in this period.
> Making lock acquisition happen after retrieving the input iterator should
> avoid this situation.
[jira] [Assigned] (SPARK-42566) RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD
[ https://issues.apache.org/jira/browse/SPARK-42566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42566:
------------------------------------

    Assignee: (was: Apache Spark)

> RocksDB StateStore lock acquisition should happen after getting input
> iterator from inputRDD
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42566
>                 URL: https://issues.apache.org/jira/browse/SPARK-42566
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Priority: Minor
>
> The current behavior of the {{compute}} method in both {{StateStoreRDD}}
> and {{ReadStateStoreRDD}} is: we first get the state store instance and
> then get the input iterator for the inputRDD.
> For the RocksDB state store, the running task will acquire and hold the
> lock for this instance. A retried or speculative task will fail to acquire
> the lock and eventually abort the job if there are network issues. For
> example, when we shrink the executors, the surviving one will try to fetch
> data from the killed ones because it doesn't know the target location
> (prefetched from the driver) is dead until it tries to fetch the data. The
> query might hang for a long time, as the executor will retry
> {{spark.shuffle.io.maxRetries=3}} times and, for each retry, wait
> {{spark.shuffle.io.connectionTimeout}} (default 120s) before timing out.
> In total, the task could hang for about 6 minutes, and the retried or
> speculative tasks won't be able to acquire the lock in this period.
> Making lock acquisition happen after retrieving the input iterator should
> avoid this situation.
[jira] [Assigned] (SPARK-42565) Error log improvement for the lock acquisition of RocksDB state store instance
[ https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42565:
------------------------------------

    Assignee: Apache Spark

> Error log improvement for the lock acquisition of RocksDB state store
> instance
> ----------------------------------------------------------------------
>
>                 Key: SPARK-42565
>                 URL: https://issues.apache.org/jira/browse/SPARK-42565
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Assignee: Apache Spark
>            Priority: Minor
>
> {code:java}
> 23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
> 23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default):
> RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in
> stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1
> in stage 57, TID 342] after 60002 ms.
> {code}
> We are seeing these error messages for a testing query. The *taskId* is not
> the *partitionId*, but the error log does not make this clear.
> It's confusing to read these logs: the second entry seems to talk about
> *task 3.0* (it's actually partition 3, retry attempt 0), but *TID 363* is
> already occupied by *task 2.0 in stage 57.1*.
> Also, it's unclear at which stage retry attempt the lock is acquired (or
> fails to be acquired).
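For illustration, a sketch of the clearer message this ticket asks for, spelling out partition id, task attempt, stage attempt, and TID separately; the field names and exact wording here are assumptions, not the change that actually landed:

{code:python}
def describe_task(t):
    # Spell out partition id and attempt numbers instead of the ambiguous
    # "task 3.0" notation, which reads like a task id.
    return (f"ThreadId: {t['thread_id']}, partition: {t['partition_id']} "
            f"(attempt {t['task_attempt']}) in stage {t['stage_id']}."
            f"{t['stage_attempt']}, TID {t['tid']}")


def format_lock_error(waiting, holder, waited_ms):
    return (f"RocksDB instance could not be acquired by "
            f"[{describe_task(waiting)}] as it was not released by "
            f"[{describe_task(holder)}] after {waited_ms} ms.")
{code}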
[jira] [Commented] (SPARK-42565) Error log improvement for the lock acquisition of RocksDB state store instance
[ https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693320#comment-17693320 ]

Apache Spark commented on SPARK-42565:
--------------------------------------

User 'huanliwang-db' has created a pull request for this issue:

https://github.com/apache/spark/pull/40161

> Error log improvement for the lock acquisition of RocksDB state store
> instance
> ----------------------------------------------------------------------
>
>                 Key: SPARK-42565
>                 URL: https://issues.apache.org/jira/browse/SPARK-42565
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Priority: Minor
>
> {code:java}
> 23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
> 23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default):
> RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in
> stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1
> in stage 57, TID 342] after 60002 ms.
> {code}
> We are seeing these error messages for a testing query. The *taskId* is not
> the *partitionId*, but the error log does not make this clear.
> It's confusing to read these logs: the second entry seems to talk about
> *task 3.0* (it's actually partition 3, retry attempt 0), but *TID 363* is
> already occupied by *task 2.0 in stage 57.1*.
> Also, it's unclear at which stage retry attempt the lock is acquired (or
> fails to be acquired).
[jira] [Assigned] (SPARK-42565) Error log improvement for the lock acquisition of RocksDB state store instance
[ https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42565:
------------------------------------

    Assignee: (was: Apache Spark)

> Error log improvement for the lock acquisition of RocksDB state store
> instance
> ----------------------------------------------------------------------
>
>                 Key: SPARK-42565
>                 URL: https://issues.apache.org/jira/browse/SPARK-42565
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Priority: Minor
>
> {code:java}
> 23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
> 23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default):
> RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in
> stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1
> in stage 57, TID 342] after 60002 ms.
> {code}
> We are seeing these error messages for a testing query. The *taskId* is not
> the *partitionId*, but the error log does not make this clear.
> It's confusing to read these logs: the second entry seems to talk about
> *task 3.0* (it's actually partition 3, retry attempt 0), but *TID 363* is
> already occupied by *task 2.0 in stage 57.1*.
> Also, it's unclear at which stage retry attempt the lock is acquired (or
> fails to be acquired).
[jira] [Commented] (SPARK-42509) WindowGroupLimitExec supports codegen
[ https://issues.apache.org/jira/browse/SPARK-42509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693180#comment-17693180 ]

Apache Spark commented on SPARK-42509:
--------------------------------------

User 'beliefer' has created a pull request for this issue:

https://github.com/apache/spark/pull/40159

> WindowGroupLimitExec supports codegen
> -------------------------------------
>
>                 Key: SPARK-42509
>                 URL: https://issues.apache.org/jira/browse/SPARK-42509
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: jiaan.geng
>            Priority: Major
>
[jira] [Assigned] (SPARK-42488) Upgrade commons-crypto from 1.1.0 to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-42488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42488:
------------------------------------

    Assignee: (was: Apache Spark)

> Upgrade commons-crypto from 1.1.0 to 1.2.0
> ------------------------------------------
>
>                 Key: SPARK-42488
>                 URL: https://issues.apache.org/jira/browse/SPARK-42488
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> https://github.com/apache/commons-crypto/compare/rel/commons-crypto-1.1.0...rel/commons-crypto-1.2.0
[jira] [Assigned] (SPARK-42488) Upgrade commons-crypto from 1.1.0 to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-42488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42488:
------------------------------------

    Assignee: Apache Spark

> Upgrade commons-crypto from 1.1.0 to 1.2.0
> ------------------------------------------
>
>                 Key: SPARK-42488
>                 URL: https://issues.apache.org/jira/browse/SPARK-42488
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Minor
>
> https://github.com/apache/commons-crypto/compare/rel/commons-crypto-1.1.0...rel/commons-crypto-1.2.0
[jira] [Commented] (SPARK-42551) Support subexpression elimination in FilterExec
[ https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693038#comment-17693038 ]

Apache Spark commented on SPARK-42551:
--------------------------------------

User 'wankunde' has created a pull request for this issue:

https://github.com/apache/spark/pull/40157

> Support subexpression elimination in FilterExec
> -----------------------------------------------
>
>                 Key: SPARK-42551
>                 URL: https://issues.apache.org/jira/browse/SPARK-42551
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: Wan Kun
>            Priority: Major
>
> Just like SPARK-33092, we can support subexpression elimination in
> FilterExec in whole-stage codegen.
> For example:
> {code:java}
> SELECT * FROM (
>   SELECT v, v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v > 0 and v1 > 5 and v1 < 10
> {code}
> Codegen plan:
> {code:java}
> *(1) Project [v#1, ((v#1 * v#1) + 1) AS v1#0]
> +- *(1) Filter (((v#1 > 0) AND (((v#1 * v#1) + 1) > 5)) AND (((v#1 * v#1) + 1) < 10))
>    +- *(1) LocalTableScan [v#1]
> {code}
> The subexpression *(v#1 * v#1) + 1* will be executed twice.
[jira] [Assigned] (SPARK-42551) Support subexpression elimination in FilterExec
[ https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42551:
------------------------------------

    Assignee: (was: Apache Spark)

> Support subexpression elimination in FilterExec
> -----------------------------------------------
>
>                 Key: SPARK-42551
>                 URL: https://issues.apache.org/jira/browse/SPARK-42551
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: Wan Kun
>            Priority: Major
>
> Just like SPARK-33092, we can support subexpression elimination in
> FilterExec in whole-stage codegen.
> For example:
> {code:java}
> SELECT * FROM (
>   SELECT v, v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v > 0 and v1 > 5 and v1 < 10
> {code}
> Codegen plan:
> {code:java}
> *(1) Project [v#1, ((v#1 * v#1) + 1) AS v1#0]
> +- *(1) Filter (((v#1 > 0) AND (((v#1 * v#1) + 1) > 5)) AND (((v#1 * v#1) + 1) < 10))
>    +- *(1) LocalTableScan [v#1]
> {code}
> The subexpression *(v#1 * v#1) + 1* will be executed twice.
[jira] [Assigned] (SPARK-42551) Support subexpression elimination in FilterExec
[ https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42551:
------------------------------------

    Assignee: Apache Spark

> Support subexpression elimination in FilterExec
> -----------------------------------------------
>
>                 Key: SPARK-42551
>                 URL: https://issues.apache.org/jira/browse/SPARK-42551
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: Wan Kun
>            Assignee: Apache Spark
>            Priority: Major
>
> Just like SPARK-33092, we can support subexpression elimination in
> FilterExec in whole-stage codegen.
> For example:
> {code:java}
> SELECT * FROM (
>   SELECT v, v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v > 0 and v1 > 5 and v1 < 10
> {code}
> Codegen plan:
> {code:java}
> *(1) Project [v#1, ((v#1 * v#1) + 1) AS v1#0]
> +- *(1) Filter (((v#1 > 0) AND (((v#1 * v#1) + 1) > 5)) AND (((v#1 * v#1) + 1) < 10))
>    +- *(1) LocalTableScan [v#1]
> {code}
> The subexpression *(v#1 * v#1) + 1* will be executed twice.
[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693005#comment-17693005 ]

Apache Spark commented on SPARK-41823:
--------------------------------------

User 'hvanhovell' has created a pull request for this issue:

https://github.com/apache/spark/pull/40156

> DataFrame.join creating ambiguous column names
> ----------------------------------------------
>
>                 Key: SPARK-41823
>                 URL: https://issues.apache.org/jira/browse/SPARK-41823
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Assignee: Ruifeng Zheng
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in <module>
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
>         ).toPandas()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException:
>     [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
>     Plan:
> {code}
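For context, a sketch of the ambiguity and the usual workaround in the classic PySpark API: drop the DataFrame-qualified Column rather than the bare string name. This illustrates the behavior under discussion, not the Connect-side fix itself:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(14, "Tom"), (23, "Alice")], ["age", "name"])
df2 = spark.createDataFrame([(80, "Tom")], ["height", "name"])

joined = df.join(df2, df.name == df2.name, "inner")
# Dropping the qualified column removes exactly one of the two `name`
# columns, so no ambiguous reference remains.
joined.drop(df2.name).show()
{code}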
[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693004#comment-17693004 ]

Apache Spark commented on SPARK-42534:
--------------------------------------

User 'sadikovi' has created a pull request for this issue:

https://github.com/apache/spark/pull/40155

> Fix DB2 Limit clause
> --------------------
>
>                 Key: SPARK-42534
>                 URL: https://issues.apache.org/jira/browse/SPARK-42534
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Ivan Sadikov
>            Priority: Major
>
[jira] [Assigned] (SPARK-42548) Add PlainReferences to skip rewriting attributes
[ https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42548:
------------------------------------

    Assignee: Apache Spark

> Add PlainReferences to skip rewriting attributes
> ------------------------------------------------
>
>                 Key: SPARK-42548
>                 URL: https://issues.apache.org/jira/browse/SPARK-42548
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: XiDuo You
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Assigned] (SPARK-42548) Add PlainReferences to skip rewriting attributes
[ https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42548:
------------------------------------

    Assignee: (was: Apache Spark)

> Add PlainReferences to skip rewriting attributes
> ------------------------------------------------
>
>                 Key: SPARK-42548
>                 URL: https://issues.apache.org/jira/browse/SPARK-42548
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: XiDuo You
>            Priority: Major
>
[jira] [Commented] (SPARK-42548) Add PlainReferences to skip rewriting attributes
[ https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692992#comment-17692992 ]

Apache Spark commented on SPARK-42548:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:

https://github.com/apache/spark/pull/40154

> Add PlainReferences to skip rewriting attributes
> ------------------------------------------------
>
>                 Key: SPARK-42548
>                 URL: https://issues.apache.org/jira/browse/SPARK-42548
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: XiDuo You
>            Priority: Major
>
[jira] [Assigned] (SPARK-42547) Make PySpark work with Python 3.7
[ https://issues.apache.org/jira/browse/SPARK-42547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42547:
------------------------------------

    Assignee: (was: Apache Spark)

> Make PySpark work with Python 3.7
> ---------------------------------
>
>                 Key: SPARK-42547
>                 URL: https://issues.apache.org/jira/browse/SPARK-42547
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Blocker
>
> {code}
> + ./python/run-tests --python-executables=python3
> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3']
> Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
> python3 python_implementation is CPython
> python3 version is: Python 3.7.16
> Starting test(python3): pyspark.ml.tests.test_feature (temp output: /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
> Starting test(python3): pyspark.ml.tests.test_base (temp output: /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output: /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output: /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
> Finished test(python3): pyspark.ml.tests.test_base (16s)
> Starting test(python3): pyspark.ml.tests.test_functions (temp output: /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 21, in <module>
>     from pyspark.ml.functions import predict_batch_udf
>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in <module>
>     from typing import Any, Callable, Iterator, List, Mapping, Protocol, TYPE_CHECKING, Tuple, Union
> ImportError: cannot import name 'Protocol' from 'typing' (/usr/lib64/python3.7/typing.py)
> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
> {code}
[jira] [Assigned] (SPARK-42547) Make PySpark work with Python 3.7
[ https://issues.apache.org/jira/browse/SPARK-42547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42547:
------------------------------------

    Assignee: Apache Spark

> Make PySpark work with Python 3.7
> ---------------------------------
>
>                 Key: SPARK-42547
>                 URL: https://issues.apache.org/jira/browse/SPARK-42547
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Blocker
>
> {code}
> + ./python/run-tests --python-executables=python3
> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3']
> Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
> python3 python_implementation is CPython
> python3 version is: Python 3.7.16
> Starting test(python3): pyspark.ml.tests.test_feature (temp output: /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
> Starting test(python3): pyspark.ml.tests.test_base (temp output: /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output: /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output: /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
> Finished test(python3): pyspark.ml.tests.test_base (16s)
> Starting test(python3): pyspark.ml.tests.test_functions (temp output: /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 21, in <module>
>     from pyspark.ml.functions import predict_batch_udf
>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in <module>
>     from typing import Any, Callable, Iterator, List, Mapping, Protocol, TYPE_CHECKING, Tuple, Union
> ImportError: cannot import name 'Protocol' from 'typing' (/usr/lib64/python3.7/typing.py)
> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
> {code}
[jira] [Commented] (SPARK-42547) Make PySpark work with Python 3.7
[ https://issues.apache.org/jira/browse/SPARK-42547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692988#comment-17692988 ]

Apache Spark commented on SPARK-42547:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:

https://github.com/apache/spark/pull/40153

> Make PySpark work with Python 3.7
> ---------------------------------
>
>                 Key: SPARK-42547
>                 URL: https://issues.apache.org/jira/browse/SPARK-42547
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Blocker
>
> {code}
> + ./python/run-tests --python-executables=python3
> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3']
> Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
> python3 python_implementation is CPython
> python3 version is: Python 3.7.16
> Starting test(python3): pyspark.ml.tests.test_feature (temp output: /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
> Starting test(python3): pyspark.ml.tests.test_base (temp output: /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output: /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output: /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
> Finished test(python3): pyspark.ml.tests.test_base (16s)
> Starting test(python3): pyspark.ml.tests.test_functions (temp output: /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 21, in <module>
>     from pyspark.ml.functions import predict_batch_udf
>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in <module>
>     from typing import Any, Callable, Iterator, List, Mapping, Protocol, TYPE_CHECKING, Tuple, Union
> ImportError: cannot import name 'Protocol' from 'typing' (/usr/lib64/python3.7/typing.py)
> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
> {code}
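The failure above occurs because typing.Protocol only exists from Python 3.8 onward. For illustration, the usual compatibility pattern, shown as a hedged sketch rather than the exact fix Spark adopted:

{code:python}
import sys

# typing.Protocol was added in Python 3.8; on 3.7 the typing_extensions
# backport provides it (this sketch assumes typing_extensions is installed).
if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    from typing_extensions import Protocol


class SupportsPredict(Protocol):
    def predict(self, batch: list) -> list: ...
{code}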
[jira] [Assigned] (SPARK-42545) Remove `experimental` from Volcano docs
[ https://issues.apache.org/jira/browse/SPARK-42545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42545:
------------------------------------

    Assignee: Apache Spark

> Remove `experimental` from Volcano docs
> ---------------------------------------
>
>                 Key: SPARK-42545
>                 URL: https://issues.apache.org/jira/browse/SPARK-42545
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, Kubernetes
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Commented] (SPARK-42545) Remove `experimental` from Volcano docs
[ https://issues.apache.org/jira/browse/SPARK-42545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692963#comment-17692963 ]

Apache Spark commented on SPARK-42545:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:

https://github.com/apache/spark/pull/40152

> Remove `experimental` from Volcano docs
> ---------------------------------------
>
>                 Key: SPARK-42545
>                 URL: https://issues.apache.org/jira/browse/SPARK-42545
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, Kubernetes
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>
[jira] [Assigned] (SPARK-42545) Remove `experimental` from Volcano docs
[ https://issues.apache.org/jira/browse/SPARK-42545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42545:
------------------------------------

    Assignee: (was: Apache Spark)

> Remove `experimental` from Volcano docs
> ---------------------------------------
>
>                 Key: SPARK-42545
>                 URL: https://issues.apache.org/jira/browse/SPARK-42545
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, Kubernetes
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>
[jira] [Assigned] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer
[ https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42121:
------------------------------------

    Assignee: Apache Spark

> Add built-in table-valued functions posexplode and posexplode_outer
> -------------------------------------------------------------------
>
>                 Key: SPARK-42121
>                 URL: https://issues.apache.org/jira/browse/SPARK-42121
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> Add `posexplode` and `posexplode_outer` to the built-in table function
> registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
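For context, once posexplode is registered as a table function, a FROM-clause call like the following should work; a sketch against a local session, assuming a Spark build that includes this change:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
# posexplode yields (pos, col) pairs, one row per array element.
spark.sql("SELECT pos, col FROM posexplode(array(10, 20, 30))").show()
{code}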
[jira] [Assigned] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer
[ https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42121:
------------------------------------

    Assignee: (was: Apache Spark)

> Add built-in table-valued functions posexplode and posexplode_outer
> -------------------------------------------------------------------
>
>                 Key: SPARK-42121
>                 URL: https://issues.apache.org/jira/browse/SPARK-42121
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Add `posexplode` and `posexplode_outer` to the built-in table function
> registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Commented] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer
[ https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692961#comment-17692961 ]

Apache Spark commented on SPARK-42121:
--------------------------------------

User 'allisonwang-db' has created a pull request for this issue:

https://github.com/apache/spark/pull/40151

> Add built-in table-valued functions posexplode and posexplode_outer
> -------------------------------------------------------------------
>
>                 Key: SPARK-42121
>                 URL: https://issues.apache.org/jira/browse/SPARK-42121
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Add `posexplode` and `posexplode_outer` to the built-in table function
> registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Commented] (SPARK-41834) Implement SparkSession.conf
[ https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692947#comment-17692947 ]

Apache Spark commented on SPARK-41834:
--------------------------------------

User 'ueshin' has created a pull request for this issue:

https://github.com/apache/spark/pull/40150

> Implement SparkSession.conf
> ---------------------------
>
>                 Key: SPARK-41834
>                 URL: https://issues.apache.org/jira/browse/SPARK-41834
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in <module>
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'
> {code}
[jira] [Assigned] (SPARK-41834) Implement SparkSession.conf
[ https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41834:
------------------------------------

    Assignee: Apache Spark

> Implement SparkSession.conf
> ---------------------------
>
>                 Key: SPARK-41834
>                 URL: https://issues.apache.org/jira/browse/SPARK-41834
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in <module>
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'
> {code}
[jira] [Assigned] (SPARK-41834) Implement SparkSession.conf
[ https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41834:
------------------------------------

    Assignee: (was: Apache Spark)

> Implement SparkSession.conf
> ---------------------------
>
>                 Key: SPARK-41834
>                 URL: https://issues.apache.org/jira/browse/SPARK-41834
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in <module>
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'
> {code}
[jira] [Commented] (SPARK-41834) Implement SparkSession.conf
[ https://issues.apache.org/jira/browse/SPARK-41834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692946#comment-17692946 ]

Apache Spark commented on SPARK-41834:
--------------------------------------

User 'ueshin' has created a pull request for this issue:

https://github.com/apache/spark/pull/40150

> Implement SparkSession.conf
> ---------------------------
>
>                 Key: SPARK-41834
>                 URL: https://issues.apache.org/jira/browse/SPARK-41834
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 2119, in pyspark.sql.connect.functions.unix_timestamp
> Failed example:
>     spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in <module>
>         spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>     AttributeError: 'SparkSession' object has no attribute 'conf'
> {code}
[jira] [Assigned] (SPARK-42122) Add built-in table-valued function stack
[ https://issues.apache.org/jira/browse/SPARK-42122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42122:
------------------------------------

    Assignee: (was: Apache Spark)

> Add built-in table-valued function stack
> ----------------------------------------
>
>                 Key: SPARK-42122
>                 URL: https://issues.apache.org/jira/browse/SPARK-42122
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Add `stack` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Commented] (SPARK-42122) Add built-in table-valued function stack
[ https://issues.apache.org/jira/browse/SPARK-42122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692945#comment-17692945 ]

Apache Spark commented on SPARK-42122:
--------------------------------------

User 'allisonwang-db' has created a pull request for this issue:

https://github.com/apache/spark/pull/40149

> Add built-in table-valued function stack
> ----------------------------------------
>
>                 Key: SPARK-42122
>                 URL: https://issues.apache.org/jira/browse/SPARK-42122
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Add `stack` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Assigned] (SPARK-42122) Add built-in table-valued function stack
[ https://issues.apache.org/jira/browse/SPARK-42122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42122:
------------------------------------

    Assignee: Apache Spark

> Add built-in table-valued function stack
> ----------------------------------------
>
>                 Key: SPARK-42122
>                 URL: https://issues.apache.org/jira/browse/SPARK-42122
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> Add `stack` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Commented] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692937#comment-17692937 ]

Apache Spark commented on SPARK-42544:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:

https://github.com/apache/spark/pull/40148

> Spark Connect Scala Client: support parameterized SQL
> -----------------------------------------------------
>
>                 Key: SPARK-42544
>                 URL: https://issues.apache.org/jira/browse/SPARK-42544
>             Project: Spark
>          Issue Type: Task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
[jira] [Assigned] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42544:
------------------------------------

    Assignee: Apache Spark  (was: Rui Wang)

> Spark Connect Scala Client: support parameterized SQL
> -----------------------------------------------------
>
>                 Key: SPARK-42544
>                 URL: https://issues.apache.org/jira/browse/SPARK-42544
>             Project: Spark
>          Issue Type: Task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Assigned] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42544:
------------------------------------

    Assignee: Rui Wang  (was: Apache Spark)

> Spark Connect Scala Client: support parameterized SQL
> -----------------------------------------------------
>
>                 Key: SPARK-42544
>                 URL: https://issues.apache.org/jira/browse/SPARK-42544
>             Project: Spark
>          Issue Type: Task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
[jira] [Commented] (SPARK-42544) Spark Connect Scala Client: support parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-42544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692936#comment-17692936 ]

Apache Spark commented on SPARK-42544:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:

https://github.com/apache/spark/pull/40148

> Spark Connect Scala Client: support parameterized SQL
> -----------------------------------------------------
>
>                 Key: SPARK-42544
>                 URL: https://issues.apache.org/jira/browse/SPARK-42544
>             Project: Spark
>          Issue Type: Task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
[jira] [Commented] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client
[ https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692924#comment-17692924 ]

Apache Spark commented on SPARK-42543:
--------------------------------------

User 'vicennial' has created a pull request for this issue:

https://github.com/apache/spark/pull/40147

> Specify protocol for UDF artifact transfer in JVM/Scala client
> --------------------------------------------------------------
>
>                 Key: SPARK-42543
>                 URL: https://issues.apache.org/jira/browse/SPARK-42543
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Venkata Sai Akhil Gudesa
>            Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote
> client may use a local JAR or a new class in their UDF that may not be
> present on the server. To handle these cases of missing "artifacts", a
> protocol for artifact transfer is needed to move the required artifacts
> from the client side over to the server side.
[jira] [Commented] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client
[ https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692923#comment-17692923 ]

Apache Spark commented on SPARK-42543:
--------------------------------------

User 'vicennial' has created a pull request for this issue:

https://github.com/apache/spark/pull/40147

> Specify protocol for UDF artifact transfer in JVM/Scala client
> --------------------------------------------------------------
>
>                 Key: SPARK-42543
>                 URL: https://issues.apache.org/jira/browse/SPARK-42543
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Venkata Sai Akhil Gudesa
>            Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote
> client may use a local JAR or a new class in their UDF that may not be
> present on the server. To handle these cases of missing "artifacts", a
> protocol for artifact transfer is needed to move the required artifacts
> from the client side over to the server side.
[jira] [Assigned] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client
[ https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42543:
------------------------------------

    Assignee: (was: Apache Spark)

> Specify protocol for UDF artifact transfer in JVM/Scala client
> --------------------------------------------------------------
>
>                 Key: SPARK-42543
>                 URL: https://issues.apache.org/jira/browse/SPARK-42543
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Venkata Sai Akhil Gudesa
>            Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote
> client may use a local JAR or a new class in their UDF that may not be
> present on the server. To handle these cases of missing "artifacts", a
> protocol for artifact transfer is needed to move the required artifacts
> from the client side over to the server side.
[jira] [Assigned] (SPARK-42543) Specify protocol for UDF artifact transfer in JVM/Scala client
[ https://issues.apache.org/jira/browse/SPARK-42543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42543:
------------------------------------

    Assignee: Apache Spark

> Specify protocol for UDF artifact transfer in JVM/Scala client
> --------------------------------------------------------------
>
>                 Key: SPARK-42543
>                 URL: https://issues.apache.org/jira/browse/SPARK-42543
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Venkata Sai Akhil Gudesa
>            Assignee: Apache Spark
>            Priority: Major
>
> An "artifact" is any file that may be used during the execution of a UDF.
> In the decoupled client-server architecture of Spark Connect, a remote
> client may use a local JAR or a new class in their UDF that may not be
> present on the server. To handle these cases of missing "artifacts", a
> protocol for artifact transfer is needed to move the required artifacts
> from the client side over to the server side.
[jira] [Assigned] (SPARK-42120) Add built-in table-valued function json_tuple
[ https://issues.apache.org/jira/browse/SPARK-42120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42120:
------------------------------------

    Assignee: Apache Spark

> Add built-in table-valued function json_tuple
> ---------------------------------------------
>
>                 Key: SPARK-42120
>                 URL: https://issues.apache.org/jira/browse/SPARK-42120
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> Add `json_tuple` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Commented] (SPARK-42120) Add built-in table-valued function json_tuple
[ https://issues.apache.org/jira/browse/SPARK-42120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692921#comment-17692921 ]

Apache Spark commented on SPARK-42120:
--------------------------------------

User 'allisonwang-db' has created a pull request for this issue:

https://github.com/apache/spark/pull/40146

> Add built-in table-valued function json_tuple
> ---------------------------------------------
>
>                 Key: SPARK-42120
>                 URL: https://issues.apache.org/jira/browse/SPARK-42120
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Add `json_tuple` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Assigned] (SPARK-42120) Add built-in table-valued function json_tuple
[ https://issues.apache.org/jira/browse/SPARK-42120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42120:
------------------------------------

    Assignee: (was: Apache Spark)

> Add built-in table-valued function json_tuple
> ---------------------------------------------
>
>                 Key: SPARK-42120
>                 URL: https://issues.apache.org/jira/browse/SPARK-42120
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Add `json_tuple` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
[jira] [Assigned] (SPARK-42541) Support Pivot with provided pivot column values
[ https://issues.apache.org/jira/browse/SPARK-42541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42541: Assignee: Apache Spark (was: Rui Wang) > Support Pivot with provided pivot column values > --- > > Key: SPARK-42541 > URL: https://issues.apache.org/jira/browse/SPARK-42541 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
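For reference, the classic Dataset API this sub-task brings to the Connect client is the existing pivot overload that takes explicit values. The data and column names below are made up; a session named `spark` is assumed.
{code:scala}
// The existing pivot overload with explicit pivot column values, which this
// sub-task adds to the Connect client. Supplying the values up front avoids
// the extra job Spark otherwise runs to discover the distinct pivot values.
import spark.implicits._  // assumes an existing SparkSession named `spark`

val df = Seq(("2023", "java", 10), ("2023", "scala", 7), ("2024", "scala", 12))
  .toDF("year", "course", "earnings")

df.groupBy("year")
  .pivot("course", Seq("java", "scala")) // provided pivot column values
  .sum("earnings")
  .show()
{code}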
[jira] [Assigned] (SPARK-42541) Support Pivot with provided pivot column values
[ https://issues.apache.org/jira/browse/SPARK-42541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42541: Assignee: Rui Wang (was: Apache Spark) > Support Pivot with provided pivot column values > --- > > Key: SPARK-42541 > URL: https://issues.apache.org/jira/browse/SPARK-42541 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42541) Support Pivot with provided pivot column values
[ https://issues.apache.org/jira/browse/SPARK-42541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692913#comment-17692913 ] Apache Spark commented on SPARK-42541: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40145 > Support Pivot with provided pivot column values > --- > > Key: SPARK-42541 > URL: https://issues.apache.org/jira/browse/SPARK-42541 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
[ https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42473: Assignee: Apache Spark > An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL > -- > > Key: SPARK-42473 > URL: https://issues.apache.org/jira/browse/SPARK-42473 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.1 > Environment: spark 3.3.1 >Reporter: kevinshin >Assignee: Apache Spark >Priority: Major > > *When 'union all' combines one select statement that uses a Literal as a column value with another select statement that has a computed expression in the same column, the whole statement will fail to compile. An explicit cast is needed.* > for example: > {color:#4c9aff}explain{color} > {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color} > {color:#4c9aff}*select* *null* *as* amt1, {*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2{color} > {color:#4c9aff}*union* *all*{color} > {color:#4c9aff}*select* {*}cast{*}('200.99' *as* > {*}decimal{*}(20,8)){*}/{*}100 *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2;{color} > *will get this error:* > org.apache.spark.{*}sql{*}.catalyst.expressions.Literal cannot be *cast* *to* > org.apache.spark.{*}sql{*}.catalyst.expressions.AnsiCast > The SQL needs to be changed to: > {color:#4c9aff}explain{color} > {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color} > {color:#4c9aff}*select* *null* *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2{color} > {color:#4c9aff}*union* *all*{color} > {color:#4c9aff}*select* {color:#de350b}{*}cast{*}({color}{*}cast{*}('200.99' > *as* {*}decimal{*}(20,8)){*}/{*}100 *as* > {*}decimal{*}(20,8){color:#de350b}){color} *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2;{color} > > *But this was not needed in Spark 3.2.1; is this a bug in Spark 3.3.1?* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
[ https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692857#comment-17692857 ] Apache Spark commented on SPARK-42473: -- User 'RunyaoChen' has created a pull request for this issue: https://github.com/apache/spark/pull/40140 > An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL > -- > > Key: SPARK-42473 > URL: https://issues.apache.org/jira/browse/SPARK-42473 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.1 > Environment: spark 3.3.1 >Reporter: kevinshin >Priority: Major > > *When 'union all' combines one select statement that uses a Literal as a column value with another select statement that has a computed expression in the same column, the whole statement will fail to compile. An explicit cast is needed.* > for example: > {color:#4c9aff}explain{color} > {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color} > {color:#4c9aff}*select* *null* *as* amt1, {*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2{color} > {color:#4c9aff}*union* *all*{color} > {color:#4c9aff}*select* {*}cast{*}('200.99' *as* > {*}decimal{*}(20,8)){*}/{*}100 *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2;{color} > *will get this error:* > org.apache.spark.{*}sql{*}.catalyst.expressions.Literal cannot be *cast* *to* > org.apache.spark.{*}sql{*}.catalyst.expressions.AnsiCast > The SQL needs to be changed to: > {color:#4c9aff}explain{color} > {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color} > {color:#4c9aff}*select* *null* *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2{color} > {color:#4c9aff}*union* *all*{color} > {color:#4c9aff}*select* {color:#de350b}{*}cast{*}({color}{*}cast{*}('200.99' > *as* {*}decimal{*}(20,8)){*}/{*}100 *as* > {*}decimal{*}(20,8){color:#de350b}){color} *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2;{color} > > *But this was not needed in Spark 3.2.1; is this a bug in Spark 3.3.1?* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
[ https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42473: Assignee: (was: Apache Spark) > An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL > -- > > Key: SPARK-42473 > URL: https://issues.apache.org/jira/browse/SPARK-42473 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.1 > Environment: spark 3.3.1 >Reporter: kevinshin >Priority: Major > > *When 'union all' combines one select statement that uses a Literal as a column value with another select statement that has a computed expression in the same column, the whole statement will fail to compile. An explicit cast is needed.* > for example: > {color:#4c9aff}explain{color} > {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color} > {color:#4c9aff}*select* *null* *as* amt1, {*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2{color} > {color:#4c9aff}*union* *all*{color} > {color:#4c9aff}*select* {*}cast{*}('200.99' *as* > {*}decimal{*}(20,8)){*}/{*}100 *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2;{color} > *will get this error:* > org.apache.spark.{*}sql{*}.catalyst.expressions.Literal cannot be *cast* *to* > org.apache.spark.{*}sql{*}.catalyst.expressions.AnsiCast > The SQL needs to be changed to: > {color:#4c9aff}explain{color} > {color:#4c9aff}*INSERT* OVERWRITE *TABLE* test.spark33_decimal_orc{color} > {color:#4c9aff}*select* *null* *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2{color} > {color:#4c9aff}*union* *all*{color} > {color:#4c9aff}*select* {color:#de350b}{*}cast{*}({color}{*}cast{*}('200.99' > *as* {*}decimal{*}(20,8)){*}/{*}100 *as* > {*}decimal{*}(20,8){color:#de350b}){color} *as* amt1,{*}cast{*}('256.99' *as* > {*}decimal{*}(20,8)) *as* amt2;{color} > > *But this was not needed in Spark 3.2.1; is this a bug in Spark 3.3.1?* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41991) Interpreted mode subexpression elimination can throw exception during insert
[ https://issues.apache.org/jira/browse/SPARK-41991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692854#comment-17692854 ] Apache Spark commented on SPARK-41991: -- User 'RunyaoChen' has created a pull request for this issue: https://github.com/apache/spark/pull/40140 > Interpreted mode subexpression elimination can throw exception during insert > > > Key: SPARK-41991 > URL: https://issues.apache.org/jira/browse/SPARK-41991 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Fix For: 3.4.0 > > > Example: > {noformat} > drop table if exists tbl1; > create table tbl1 (a int, b int) using parquet; > set spark.sql.codegen.wholeStage=false; > set spark.sql.codegen.factoryMode=NO_CODEGEN; > insert into tbl1 > select id as a, id as b > from range(1, 5); > {noformat} > This results in the following exception: > {noformat} > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.ExpressionProxy cannot be cast to > org.apache.spark.sql.catalyst.expressions.Cast > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2514) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2512) > {noformat} > The query produces 2 bigint values, but the table's schema expects 2 int > values, so Spark wraps each output field with a {{Cast}}. > Later, in {{InterpretedUnsafeProjection}}, {{prepareExpressions}} tries to > wrap the two {{Cast}} expressions with an {{ExpressionProxy}}. However, the > parent expression of each {{Cast}} is a {{CheckOverflowInTableInsert}} > expression, which does not accept {{ExpressionProxy}} as a child. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: Apache Spark > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)](https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438). > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then it's parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. 
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the
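The ordering reversal described above is easy to reproduce outside Spark. The following is a minimal sketch under stated assumptions, not Spark's actual HiveUtils logic; the JAR paths are made up.
{code:scala}
// Minimal sketch (not Spark's actual code) of why flattening a classloader
// chain bottom-up reverses precedence: walking child-to-parent appends the
// parent's URLs after the child's, so in the flattened loader the child
// (user) JARs are searched first.
import java.net.{URL, URLClassLoader}
import scala.annotation.tailrec

@tailrec
def collectJars(cl: ClassLoader, acc: Vector[URL] = Vector.empty): Vector[URL] =
  cl match {
    case ucl: URLClassLoader => collectJars(ucl.getParent, acc ++ ucl.getURLs)
    case null                => acc
    case other               => collectJars(other.getParent, acc)
  }

val parent = new URLClassLoader(Array(new URL("file:/spark/hive-exec-2.3.9.jar")), null)
val child  = new URLClassLoader(Array(new URL("file:/user/hive-exec-2.3.8.jar")), parent)

// Parent-first lookup in `child` prefers hive-exec-2.3.9.jar, but the
// flattened list puts the user JAR first, reversing that precedence:
collectJars(child).foreach(println)
// file:/user/hive-exec-2.3.8.jar
// file:/spark/hive-exec-2.3.9.jar
{code}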
[jira] [Commented] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692820#comment-17692820 ] Apache Spark commented on SPARK-42539: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/40144 > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)](https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438). > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then it's parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. 
This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all >
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: (was: Apache Spark) > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)](https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438). > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then it's parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. 
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of parent/child
[jira] [Assigned] (SPARK-42538) `functions#lit` support more types
[ https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42538: Assignee: (was: Apache Spark) > `functions#lit` support more types > --- > > Key: SPARK-42538 > URL: https://issues.apache.org/jira/browse/SPARK-42538 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
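The issue body is empty, so the following is only an assumption about scope: the kinds of literals `functions#lit` accepts in the classic API, which the Connect client would be expected to match.
{code:scala}
// Assumed scope (the ticket body is empty): literal types beyond the
// primitives that functions#lit already supports in the classic API and
// that the Connect client should accept as well. Assumes a session `spark`.
import org.apache.spark.sql.functions.{lit, typedLit}

val decimalLit = lit(BigDecimal("256.99"))                // decimal literal
val dateLit    = lit(java.time.LocalDate.of(2023, 2, 23)) // date literal
val arrayLit   = typedLit(Seq(1, 2, 3))                   // array literal

spark.range(1).select(decimalLit, dateLit, arrayLit).show()
{code}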
[jira] [Assigned] (SPARK-42538) `functions#lit` support more types
[ https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42538: Assignee: Apache Spark > `functions#lit` support more types > --- > > Key: SPARK-42538 > URL: https://issues.apache.org/jira/browse/SPARK-42538 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42538) `functions#lit` support more types
[ https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692698#comment-17692698 ] Apache Spark commented on SPARK-42538: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40143 > `functions#lit` support more types > --- > > Key: SPARK-42538 > URL: https://issues.apache.org/jira/browse/SPARK-42538 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41171) Push down filter through window when partitionSpec is empty
[ https://issues.apache.org/jira/browse/SPARK-41171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692628#comment-17692628 ] Apache Spark commented on SPARK-41171: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40142 > Push down filter through window when partitionSpec is empty > --- > > Key: SPARK-41171 > URL: https://issues.apache.org/jira/browse/SPARK-41171 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Sometimes, a filter compares a rank-like window function with a number. > {code:java} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 WHERE rn <= 5 > {code} > We can create a Limit(5) and push it down as the child of the Window. > {code:java} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER > BY a LIMIT 5) t > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692607#comment-17692607 ] Apache Spark commented on SPARK-42406: -- User 'rangadi' has created a pull request for this issue: https://github.com/apache/spark/pull/40141 > [PROTOBUF] Recursive field handling is incompatible with delta > -- > > Key: SPARK-42406 > URL: https://issues.apache.org/jira/browse/SPARK-42406 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.0 > > > The Protobuf deserializer (the `from_protobuf()` function) optionally supports > recursive fields by limiting the depth to a certain level. See the example below. > It assigns a 'NullType' for such a field when the allowed depth is reached. > This causes a few issues. E.g. a repeated field, as in the following example, > results in an Array field with 'NullType'. Delta does not support null type in > a complex type. > Actually `Array[NullType]` is not really useful anyway. > How about this fix: Drop the recursive field when the limit is reached rather > than using a NullType. > The example below makes it clear: > Consider a recursive Protobuf: > > {code} > message TreeNode { > string value = 1; > repeated TreeNode children = 2; > } > {code} > Allow a depth of 2: > > {code:python} > df.select( > from_protobuf('proto', > messageName = 'TreeNode', > options = { ... "recursive.fields.max.depth" : "2" }) > ).printSchema() > {code} > The schema looks like this: > {noformat} > root > |-- from_protobuf(proto): struct (nullable = true)| > | |-- value: string (nullable = true)| > | |-- children: array (nullable = false)| > | | |-- element: struct (containsNull = false)| > | | | |-- value: string (nullable = true)| > | | | |-- children: array (nullable = false)| > | | | | |-- element: struct (containsNull = false)| > | | | | | |-- value: string (nullable = true)| > | | | | | |-- children: array (nullable = false). [ === Proposed fix: Drop > this field === ]| > | | | | | | |-- element: void (containsNull = false) [ === NOTICE 'void' HERE > === ] > {noformat} > When we try to write this to a delta table, we get an error: > {noformat} > AnalysisException: Found nested NullType in column > from_protobuf(proto).children which is of ArrayType. Delta doesn't support > writing NullType in complex types. > {noformat} > > We could just drop the field 'element' when the recursion depth is reached. It is > simpler and does not need to deal with NullType. We are ignoring the value > anyway. There is no use in keeping the field. > Another issue is the 'recursive.fields.max.depth' setting: it is not enforced > correctly, and '0' does not make sense. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42406: Assignee: Raghu Angadi (was: Apache Spark) > [PROTOBUF] Recursive field handling is incompatible with delta > -- > > Key: SPARK-42406 > URL: https://issues.apache.org/jira/browse/SPARK-42406 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.0 > > > The Protobuf deserializer (the `from_protobuf()` function) optionally supports > recursive fields by limiting the depth to a certain level. See the example below. > It assigns a 'NullType' for such a field when the allowed depth is reached. > This causes a few issues. E.g. a repeated field, as in the following example, > results in an Array field with 'NullType'. Delta does not support null type in > a complex type. > Actually `Array[NullType]` is not really useful anyway. > How about this fix: Drop the recursive field when the limit is reached rather > than using a NullType. > The example below makes it clear: > Consider a recursive Protobuf: > > {code} > message TreeNode { > string value = 1; > repeated TreeNode children = 2; > } > {code} > Allow a depth of 2: > > {code:python} > df.select( > from_protobuf('proto', > messageName = 'TreeNode', > options = { ... "recursive.fields.max.depth" : "2" }) > ).printSchema() > {code} > The schema looks like this: > {noformat} > root > |-- from_protobuf(proto): struct (nullable = true)| > | |-- value: string (nullable = true)| > | |-- children: array (nullable = false)| > | | |-- element: struct (containsNull = false)| > | | | |-- value: string (nullable = true)| > | | | |-- children: array (nullable = false)| > | | | | |-- element: struct (containsNull = false)| > | | | | | |-- value: string (nullable = true)| > | | | | | |-- children: array (nullable = false). [ === Proposed fix: Drop > this field === ]| > | | | | | | |-- element: void (containsNull = false) [ === NOTICE 'void' HERE > === ] > {noformat} > When we try to write this to a delta table, we get an error: > {noformat} > AnalysisException: Found nested NullType in column > from_protobuf(proto).children which is of ArrayType. Delta doesn't support > writing NullType in complex types. > {noformat} > > We could just drop the field 'element' when the recursion depth is reached. It is > simpler and does not need to deal with NullType. We are ignoring the value > anyway. There is no use in keeping the field. > Another issue is the 'recursive.fields.max.depth' setting: it is not enforced > correctly, and '0' does not make sense. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42406: Assignee: Apache Spark (was: Raghu Angadi) > [PROTOBUF] Recursive field handling is incompatible with delta > -- > > Key: SPARK-42406 > URL: https://issues.apache.org/jira/browse/SPARK-42406 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > > The Protobuf deserializer (the `from_protobuf()` function) optionally supports > recursive fields by limiting the depth to a certain level. See the example below. > It assigns a 'NullType' for such a field when the allowed depth is reached. > This causes a few issues. E.g. a repeated field, as in the following example, > results in an Array field with 'NullType'. Delta does not support null type in > a complex type. > Actually `Array[NullType]` is not really useful anyway. > How about this fix: Drop the recursive field when the limit is reached rather > than using a NullType. > The example below makes it clear: > Consider a recursive Protobuf: > > {code} > message TreeNode { > string value = 1; > repeated TreeNode children = 2; > } > {code} > Allow a depth of 2: > > {code:python} > df.select( > from_protobuf('proto', > messageName = 'TreeNode', > options = { ... "recursive.fields.max.depth" : "2" }) > ).printSchema() > {code} > The schema looks like this: > {noformat} > root > |-- from_protobuf(proto): struct (nullable = true)| > | |-- value: string (nullable = true)| > | |-- children: array (nullable = false)| > | | |-- element: struct (containsNull = false)| > | | | |-- value: string (nullable = true)| > | | | |-- children: array (nullable = false)| > | | | | |-- element: struct (containsNull = false)| > | | | | | |-- value: string (nullable = true)| > | | | | | |-- children: array (nullable = false). [ === Proposed fix: Drop > this field === ]| > | | | | | | |-- element: void (containsNull = false) [ === NOTICE 'void' HERE > === ] > {noformat} > When we try to write this to a delta table, we get an error: > {noformat} > AnalysisException: Found nested NullType in column > from_protobuf(proto).children which is of ArrayType. Delta doesn't support > writing NullType in complex types. > {noformat} > > We could just drop the field 'element' when the recursion depth is reached. It is > simpler and does not need to deal with NullType. We are ignoring the value > anyway. There is no use in keeping the field. > Another issue is the 'recursive.fields.max.depth' setting: it is not enforced > correctly, and '0' does not make sense. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
[ https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692512#comment-17692512 ] Apache Spark commented on SPARK-42286: -- User 'RunyaoChen' has created a pull request for this issue: https://github.com/apache/spark/pull/40140 > Fix internal error for valid CASE WHEN expression with CAST when inserting > into a table > --- > > Key: SPARK-42286 > URL: https://issues.apache.org/jira/browse/SPARK-42286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Runyao.Chen >Assignee: Runyao.Chen >Priority: Major > Fix For: 3.4.0 > > > ``` > spark-sql> create or replace table es570639t1 as select x FROM values (1), > (2), (3) as tab(x); > spark-sql> create or replace table es570639t2 (x Decimal(9, 0)); > spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x > end) from es570639t1 where x = 1; > ``` > hits the following internal error > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast > > Stack trace: > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast at > org.apache.spark.SparkException$.internalError(SparkException.scala:78) at > org.apache.spark.SparkException$.internalError(SparkException.scala:82) at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570) > > This internal error comes from `CheckOverflowInTableInsert`'s `checkChild`, > which covers only the `Cast` expr and the `ExpressionProxy` expr, but not the > `CaseWhen` expr. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39859) Support v2 `DESCRIBE TABLE EXTENDED` for columns
[ https://issues.apache.org/jira/browse/SPARK-39859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692503#comment-17692503 ] Apache Spark commented on SPARK-39859: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/40139 > Support v2 `DESCRIBE TABLE EXTENDED` for columns > > > Key: SPARK-39859 > URL: https://issues.apache.org/jira/browse/SPARK-39859 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
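For reference, this is the statement shape the sub-task extends to v2 tables; the catalog, table, and column names below are placeholders, and a session named `spark` is assumed.
{code:scala}
// The per-column DESCRIBE shape this sub-task supports for v2 (DataSource V2)
// tables. testcat.ns.tbl and the column name `id` are placeholders.
spark.sql("DESCRIBE TABLE EXTENDED testcat.ns.tbl id").show(truncate = false)
// With EXTENDED, the per-column output typically includes statistics rows
// such as min, max, num_nulls, and distinct_count when they are available.
{code}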
[jira] [Commented] (SPARK-42049) Improve AliasAwareOutputExpression
[ https://issues.apache.org/jira/browse/SPARK-42049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692481#comment-17692481 ] Apache Spark commented on SPARK-42049: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40137 > Improve AliasAwareOutputExpression > -- > > Key: SPARK-42049 > URL: https://issues.apache.org/jira/browse/SPARK-42049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0 > > > AliasAwareOutputExpression currently does not support the case where an attribute has more than > one alias. > AliasAwareOutputExpression should also work for LogicalPlan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
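A small sketch of the multi-alias case called out above; the names are illustrative and a session named `spark` is assumed.
{code:scala}
// The multi-alias case: after projecting a sorted column under two names,
// the output is ordered by both x and y, so a planner that tracks every
// alias can satisfy a downstream sort on either one without re-sorting.
val sorted    = spark.range(10).orderBy("id")
val projected = sorted.selectExpr("id AS x", "id AS y")

// With the improvement, ordering by y (or x) should not add another Sort
// node, because both columns alias the already-sorted attribute id.
projected.orderBy("y").explain()
{code}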
[jira] [Assigned] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41793: Assignee: (was: Apache Spark) > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692480#comment-17692480 ] Apache Spark commented on SPARK-41793: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/40138 > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41793: Assignee: Apache Spark > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Assignee: Apache Spark >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692469#comment-17692469 ] Apache Spark commented on SPARK-42515: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40136 > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) 
> [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [info] at >
[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42515: Assignee: (was: Apache Spark) > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) 
> [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) > [info] at
[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42515: Assignee: Apache Spark > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > > Running `build/sbt clean "connect-client-jvm/test"` locally, > `ClientE2ETestSuite#write table` fails; the same test does not fail in GA (GitHub Actions). > > {code:java} > [info] - write table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at
org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) >
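For anyone reproducing this locally: the UNKNOWN gRPC status above wraps a missing class, org/apache/parquet/hadoop/api/ReadSupport, which ships in the parquet-hadoop artifact. A minimal sketch of the kind of local workaround this suggests, assuming an sbt build; the artifact version is illustrative and not taken from the issue:

{code:scala}
// Hypothetical build.sbt fragment: put parquet-hadoop, which provides
// org.apache.parquet.hadoop.api.ReadSupport, on the test classpath so the
// local server started by the E2E suite can complete the parquet write.
libraryDependencies += "org.apache.parquet" % "parquet-hadoop" % "1.12.3" % Test
{code}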
[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42444: Assignee: (was: Apache Spark) > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692467#comment-17692467 ] Apache Spark commented on SPARK-42444: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40135 > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42444: Assignee: Apache Spark > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
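Until the fix lands, one workaround consistent with the repro above is to drop Column references bound to a specific parent DataFrame instead of bare column names, which sidesteps the ambiguous `name` resolution entirely. A sketch using the Scala API, whose `Dataset.drop(col: Column)` overload is public; the PySpark equivalent uses df1["name"] and so on, and the data simply mirrors the repro:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val df1 = Seq((14, "Tom"), (23, "Alice"), (16, "Bob")).toDF("age", "name")
val df2 = Seq((80, "Tom"), (85, "Bob")).toDF("height", "name")

val joined = df1.join(df2, df1("name") === df2("name"), "inner")
// Drop bound Column references, never the bare string "name":
joined.drop(df1("name")).drop(df2("name")).drop(df1("age")).show()
// only `height` remains in the output
{code}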
[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692440#comment-17692440 ] Apache Spark commented on SPARK-42534: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/40134 > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42534: Assignee: (was: Apache Spark) > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42534: Assignee: Apache Spark > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
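The issue body carries no detail beyond the title, so for context: DB2 does not accept the generic `LIMIT n` suffix that many JDBC dialects take; the DB2 equivalent is `FETCH FIRST n ROWS ONLY`. A hedged sketch of the mapping involved; this is a standalone illustration, not the hook the linked PR actually changes:

{code:scala}
// Hypothetical helper: render a pushed-down row limit in DB2 syntax.
def db2LimitClause(limit: Int): String =
  if (limit > 0) s" FETCH FIRST $limit ROWS ONLY" else ""

// "SELECT name FROM people" + db2LimitClause(10)
//   ==> SELECT name FROM people FETCH FIRST 10 ROWS ONLY
{code}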
[jira] [Commented] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692420#comment-17692420 ] Apache Spark commented on SPARK-42533: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40133 > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add basic encryption support for the Scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42533: Assignee: (was: Apache Spark) > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add basic encryption support for the Scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42533: Assignee: Apache Spark > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > Add basic encryption support for the Scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
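Once this lands, requesting an encrypted channel from the Scala client should look roughly like the existing Python client flow, where SSL is toggled in the Spark Connect connection string. A usage sketch under that assumption; the host is a placeholder, and the builder method names follow the 3.4-era client rather than anything confirmed in this issue:

{code:scala}
import org.apache.spark.sql.SparkSession

// `use_ssl=true` mirrors the connection-string convention of the Python
// client; connect.example.com:443 is a placeholder endpoint.
val spark = SparkSession.builder()
  .remote("sc://connect.example.com:443/;use_ssl=true")
  .build()
{code}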
[jira] [Assigned] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42532: Assignee: Apache Spark > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42532: Assignee: (was: Apache Spark) > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692401#comment-17692401 ] Apache Spark commented on SPARK-42532: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40132 > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42150) Upgrade Volcano to 1.7.0
[ https://issues.apache.org/jira/browse/SPARK-42150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692384#comment-17692384 ] Apache Spark commented on SPARK-42150: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40131 > Upgrade Volcano to 1.7.0 > > > Key: SPARK-42150 > URL: https://issues.apache.org/jira/browse/SPARK-42150 > Project: Spark > Issue Type: Improvement > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42531) Scala Client Add Collection Functions
[ https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42531: Assignee: (was: Apache Spark) > Scala Client Add Collection Functions > - > > Key: SPARK-42531 > URL: https://issues.apache.org/jira/browse/SPARK-42531 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42531) Scala Client Add Collection Functions
[ https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42531: Assignee: Apache Spark > Scala Client Add Collection Functions > - > > Key: SPARK-42531 > URL: https://issues.apache.org/jira/browse/SPARK-42531 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42531) Scala Client Add Collection Functions
[ https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692367#comment-17692367 ] Apache Spark commented on SPARK-42531: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40130 > Scala Client Add Collection Functions > - > > Key: SPARK-42531 > URL: https://issues.apache.org/jira/browse/SPARK-42531 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
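The task title is terse; concretely, "collection functions" are the org.apache.spark.sql.functions helpers that operate on array and map columns. A sketch of the surface being added, assuming parity with the server-side API and shown here against a plain local session:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val df = Seq((Seq(1, 2, 3), Map("a" -> 1))).toDF("xs", "m")

df.select(
  array_contains(col("xs"), 2), // true when the array column contains 2
  map_keys(col("m")),           // keys of the map column
  explode(col("xs"))            // one output row per array element
).show()
{code}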
[jira] [Assigned] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42529: Assignee: Apache Spark (was: Rui Wang) > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692349#comment-17692349 ] Apache Spark commented on SPARK-42529: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40129 > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42529: Assignee: Rui Wang (was: Apache Spark) > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
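For reference, this wires the existing Dataset grouping API through Connect; a sketch assuming behavior identical to the server-side semantics, where rollup aggregates over hierarchical prefixes of the grouping columns and cube over every combination:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val df = Seq(("eng", "f", 100), ("eng", "m", 90), ("ops", "f", 80))
  .toDF("dept", "gender", "salary")

// rollup groups: (dept, gender), (dept), ()
df.rollup("dept", "gender").agg(sum("salary")).show()
// cube groups: the rollup groups plus (gender)
df.cube("dept", "gender").agg(sum("salary")).show()
{code}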
[jira] [Commented] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes
[ https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692340#comment-17692340 ] Apache Spark commented on SPARK-42466: -- User 'shrprasa' has created a pull request for this issue: https://github.com/apache/spark/pull/40128 > spark.kubernetes.file.upload.path not deleting files under HDFS after job > completes > --- > > Key: SPARK-42466 > URL: https://issues.apache.org/jira/browse/SPARK-42466 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Jagadeeswara Rao >Priority: Major > > In cluster mode, files uploaded to the HDFS location set by the > spark.kubernetes.file.upload.path property are not cleaned up after the job completes. > Each file is uploaded to an HDFS directory of the form > spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to > uploadFileUri. > [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310] > The driver log below shows the driver completing successfully, while the shutdown hook > does not delete the HDFS files. > {code:java} > 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all > executors > 23/02/16 18:06:56 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed. > 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared > 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped > 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped > 23/02/16 18:06:57 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! > 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext > 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7 > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes
[ https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42466: Assignee: (was: Apache Spark) > spark.kubernetes.file.upload.path not deleting files under HDFS after job > completes > --- > > Key: SPARK-42466 > URL: https://issues.apache.org/jira/browse/SPARK-42466 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Jagadeeswara Rao >Priority: Major > > In cluster mode, files uploaded to the HDFS location set by the > spark.kubernetes.file.upload.path property are not cleaned up after the job completes. > Each file is uploaded to an HDFS directory of the form > spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to > uploadFileUri. > [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310] > The driver log below shows the driver completing successfully, while the shutdown hook > does not delete the HDFS files. > {code:java} > 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all > executors > 23/02/16 18:06:56 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed. > 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared > 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped > 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped > 23/02/16 18:06:57 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! > 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext > 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7 > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes
[ https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42466: Assignee: Apache Spark > spark.kubernetes.file.upload.path not deleting files under HDFS after job > completes > --- > > Key: SPARK-42466 > URL: https://issues.apache.org/jira/browse/SPARK-42466 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Jagadeeswara Rao >Assignee: Apache Spark >Priority: Major > > In cluster mode, files uploaded to the HDFS location set by the > spark.kubernetes.file.upload.path property are not cleaned up after the job completes. > Each file is uploaded to an HDFS directory of the form > spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to > uploadFileUri. > [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310] > The driver log below shows the driver completing successfully, while the shutdown hook > does not delete the HDFS files. > {code:java} > 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all > executors > 23/02/16 18:06:56 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed. > 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared > 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped > 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped > 23/02/16 18:06:57 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! > 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext > 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7 > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
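A sketch of the general shape such a cleanup could take, not the change in the linked PR; the helper name and its wiring into the submit path are hypothetical:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: schedule recursive deletion of the generated
// spark-upload-<UUID> directory when the submitting JVM exits, the same
// way ShutdownHookManager removes the local temp dirs in the driver log above.
def deleteUploadDirOnExit(uploadDir: Path, hadoopConf: Configuration): Unit = {
  Runtime.getRuntime.addShutdownHook(new Thread(() => {
    val fs = FileSystem.get(uploadDir.toUri, hadoopConf)
    if (fs.exists(uploadDir)) fs.delete(uploadDir, true)
  }))
}
{code}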
[jira] [Commented] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692333#comment-17692333 ] Apache Spark commented on SPARK-42530: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40127 > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42530: Assignee: (was: Apache Spark) > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org