[jira] [Commented] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695254#comment-17695254 ] Apache Spark commented on SPARK-42631: -- User 'tomvanbussel' has created a pull request for this issue: https://github.com/apache/spark/pull/40234 > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Priority: Major >
[jira] [Assigned] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42631: Assignee: Apache Spark > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42631: Assignee: (was: Apache Spark) > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Priority: Major >
[jira] [Commented] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695252#comment-17695252 ] Apache Spark commented on SPARK-42631: -- User 'tomvanbussel' has created a pull request for this issue: https://github.com/apache/spark/pull/40234 > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Priority: Major >
[jira] [Commented] (SPARK-42632) Fix scala paths in tests
[ https://issues.apache.org/jira/browse/SPARK-42632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695232#comment-17695232 ] Apache Spark commented on SPARK-42632: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40235 > Fix scala paths in tests > > > Key: SPARK-42632 > URL: https://issues.apache.org/jira/browse/SPARK-42632 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > The jar resolution in the connect client tests can resolve the jar for the > wrong scala version.
[jira] [Assigned] (SPARK-42632) Fix scala paths in tests
[ https://issues.apache.org/jira/browse/SPARK-42632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42632: Assignee: Apache Spark (was: Herman van Hövell) > Fix scala paths in tests > > > Key: SPARK-42632 > URL: https://issues.apache.org/jira/browse/SPARK-42632 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > The jar resolution in the connect client tests can resolve the jar for the > wrong scala version.
[jira] [Assigned] (SPARK-42632) Fix scala paths in tests
[ https://issues.apache.org/jira/browse/SPARK-42632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42632: Assignee: Herman van Hövell (was: Apache Spark) > Fix scala paths in tests > > > Key: SPARK-42632 > URL: https://issues.apache.org/jira/browse/SPARK-42632 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > The jar resolution in the connect client tests can resolve the jar for the > wrong scala version.
[jira] [Assigned] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34827: Assignee: Apache Spark > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Blocker >
[jira] [Assigned] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34827: Assignee: (was: Apache Spark) > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker >
[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695157#comment-17695157 ] Apache Spark commented on SPARK-34827: -- User 'tomvanbussel' has created a pull request for this issue: https://github.com/apache/spark/pull/40234 > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker >
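For context, the combination at issue here is configuration-driven; a minimal sketch, assuming a plain SparkSession build and the standard config keys (whether the two settings can be honored together is exactly what this ticket tracks):
{code:scala}
import org.apache.spark.sql.SparkSession

// Batch fetch of contiguous shuffle blocks is an AQE optimization; with I/O
// encryption enabled, encrypted blocks cannot simply be concatenated, which
// is why the two settings historically could not be combined.
val spark = SparkSession.builder()
  .appName("batch-fetch-with-io-encryption")
  .config("spark.io.encryption.enabled", "true")
  .config("spark.sql.adaptive.fetchShuffleBlocksInBatch", "true")
  .getOrCreate()
{code}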
[jira] [Assigned] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42630: Assignee: (was: Apache Spark) > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Assigned] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42630: Assignee: Apache Spark > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695056#comment-17695056 ] Apache Spark commented on SPARK-42630: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40233 > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-42629) Update the description of default data source in the document
[ https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695046#comment-17695046 ] Apache Spark commented on SPARK-42629: -- User 'huangxiaopingRD' has created a pull request for this issue: https://github.com/apache/spark/pull/40232 > Update the description of default data source in the document > - > > Key: SPARK-42629 > URL: https://issues.apache.org/jira/browse/SPARK-42629 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Priority: Minor >
[jira] [Assigned] (SPARK-42629) Update the description of default data source in the document
[ https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42629: Assignee: (was: Apache Spark) > Update the description of default data source in the document > - > > Key: SPARK-42629 > URL: https://issues.apache.org/jira/browse/SPARK-42629 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Priority: Minor >
[jira] [Assigned] (SPARK-42629) Update the description of default data source in the document
[ https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42629: Assignee: Apache Spark > Update the description of default data source in the document > - > > Key: SPARK-42629 > URL: https://issues.apache.org/jira/browse/SPARK-42629 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-42628) Add a migration note for bloom filter join
[ https://issues.apache.org/jira/browse/SPARK-42628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42628: Assignee: (was: Apache Spark) > Add a migration note for bloom filter join > -- > > Key: SPARK-42628 > URL: https://issues.apache.org/jira/browse/SPARK-42628 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Commented] (SPARK-42628) Add a migration note for bloom filter join
[ https://issues.apache.org/jira/browse/SPARK-42628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694958#comment-17694958 ] Apache Spark commented on SPARK-42628: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40231 > Add a migration note for bloom filter join > -- > > Key: SPARK-42628 > URL: https://issues.apache.org/jira/browse/SPARK-42628 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Assigned] (SPARK-42628) Add a migration note for bloom filter join
[ https://issues.apache.org/jira/browse/SPARK-42628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42628: Assignee: Apache Spark > Add a migration note for bloom filter join > -- > > Key: SPARK-42628 > URL: https://issues.apache.org/jira/browse/SPARK-42628 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major >
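For context, the optimization the migration note covers is gated by a single SQL conf; a minimal sketch, assuming a SparkSession in scope as `spark`:
{code:scala}
// Runtime bloom-filter join pruning is controlled by this flag; a user who
// hits a regression after upgrading can switch it back off per session.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "false")
{code}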
[jira] [Commented] (SPARK-42521) Add NULL values for INSERT commands with user-specified lists of fewer columns than the target table
[ https://issues.apache.org/jira/browse/SPARK-42521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694799#comment-17694799 ] Apache Spark commented on SPARK-42521: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/40229 > Add NULL values for INSERT commands with user-specified lists of fewer > columns than the target table > > > Key: SPARK-42521 > URL: https://issues.apache.org/jira/browse/SPARK-42521 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major >
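A short sketch of the behavior this ticket describes, assuming a simple Parquet table; columns omitted from the user-specified column list are padded with NULLs instead of failing analysis:
{code:scala}
spark.sql("CREATE TABLE t (a INT, b INT, c STRING) USING parquet")
// Only column a is named, so b and c should be filled with NULL.
spark.sql("INSERT INTO t (a) VALUES (42)")
spark.sql("SELECT * FROM t").show()  // one row: 42, NULL, NULL
{code}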
[jira] [Commented] (SPARK-41874) Implement DataFrame `sameSemantics`
[ https://issues.apache.org/jira/browse/SPARK-41874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694796#comment-17694796 ] Apache Spark commented on SPARK-41874: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40228 > Implement DataFrame `sameSemantics` > --- > > Key: SPARK-41874 > URL: https://issues.apache.org/jira/browse/SPARK-41874 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major >
[jira] [Commented] (SPARK-41874) Implement DataFrame `sameSemantics`
[ https://issues.apache.org/jira/browse/SPARK-41874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694795#comment-17694795 ] Apache Spark commented on SPARK-41874: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40228 > Implement DataFrame `sameSemantics` > --- > > Key: SPARK-41874 > URL: https://issues.apache.org/jira/browse/SPARK-41874 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major >
[jira] [Commented] (SPARK-41870) Handle duplicate columns in `createDataFrame`
[ https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694791#comment-17694791 ] Apache Spark commented on SPARK-41870: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40227 > Handle duplicate columns in `createDataFrame` > - > > Key: SPARK-41870 > URL: https://issues.apache.org/jira/browse/SPARK-41870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 65, in test_duplicated_column_names > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 277, in createDataFrame > raise ValueError( > ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 > elements{code}
[jira] [Assigned] (SPARK-41870) Handle duplicate columns in `createDataFrame`
[ https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41870: Assignee: (was: Apache Spark) > Handle duplicate columns in `createDataFrame` > - > > Key: SPARK-41870 > URL: https://issues.apache.org/jira/browse/SPARK-41870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 65, in test_duplicated_column_names > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 277, in createDataFrame > raise ValueError( > ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 > elements{code}
[jira] [Assigned] (SPARK-41870) Handle duplicate columns in `createDataFrame`
[ https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41870: Assignee: Apache Spark > Handle duplicate columns in `createDataFrame` > - > > Key: SPARK-41870 > URL: https://issues.apache.org/jira/browse/SPARK-41870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 65, in test_duplicated_column_names > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 277, in createDataFrame > raise ValueError( > ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 > elements{code}
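For comparison, the classic (non-Connect) API tolerates duplicate column names at creation time, which is the behavior the Connect client needs to match; a minimal Scala sketch, assuming a SparkSession in scope as `spark`:
{code:scala}
// Both output columns are named "c"; creating the DataFrame succeeds, and
// only a later ambiguous reference to "c" would fail analysis.
val df = spark.range(1).selectExpr("1 AS c", "2 AS c")
df.printSchema()  // two int fields, both named "c"
{code}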
[jira] [Commented] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694781#comment-17694781 ] Apache Spark commented on SPARK-41868: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40226 > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Assigned] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41868: Assignee: Apache Spark > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Assigned] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41868: Assignee: (was: Apache Spark) > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Commented] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694780#comment-17694780 ] Apache Spark commented on SPARK-41868: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40226 > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Assigned] (SPARK-42625) Upgrade zstd-jni to 1.5.4-2
[ https://issues.apache.org/jira/browse/SPARK-42625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42625: Assignee: (was: Apache Spark) > Upgrade zstd-jni to 1.5.4-2 > --- > > Key: SPARK-42625 > URL: https://issues.apache.org/jira/browse/SPARK-42625 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor >
[jira] [Assigned] (SPARK-42625) Upgrade zstd-jni to 1.5.4-2
[ https://issues.apache.org/jira/browse/SPARK-42625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42625: Assignee: Apache Spark > Upgrade zstd-jni to 1.5.4-2 > --- > > Key: SPARK-42625 > URL: https://issues.apache.org/jira/browse/SPARK-42625 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor >
[jira] [Commented] (SPARK-42625) Upgrade zstd-jni to 1.5.4-2
[ https://issues.apache.org/jira/browse/SPARK-42625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694755#comment-17694755 ] Apache Spark commented on SPARK-42625: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40225 > Upgrade zstd-jni to 1.5.4-2 > --- > > Key: SPARK-42625 > URL: https://issues.apache.org/jira/browse/SPARK-42625 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor >
[jira] [Commented] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694753#comment-17694753 ] Apache Spark commented on SPARK-42539: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/40224 > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized.
This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839
[jira] [Commented] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694752#comment-17694752 ] Apache Spark commented on SPARK-42539: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/40224 > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized.
This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839
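A standalone sketch (not the actual HiveUtils code) of why flattening a parent-first hierarchy into one URL list reverses precedence, as the description above explains:
{code:scala}
import java.net.{URL, URLClassLoader}

// Walking from the user classloader up through its parents and concatenating
// URLs puts the child's JARs at the front of the flattened list; a single
// URLClassLoader built from that list searches user JARs before system JARs,
// the reverse of the parent-first delegation in the original hierarchy.
def flattenUrls(loader: ClassLoader): Seq[URL] = loader match {
  case null              => Seq.empty
  case u: URLClassLoader => u.getURLs.toSeq ++ flattenUrls(u.getParent)
  case other             => flattenUrls(other.getParent)
}
{code}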
[jira] [Assigned] (SPARK-42624) Reorganize imports in test_functions
[ https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42624: Assignee: (was: Apache Spark) > Reorganize imports in test_functions > > > Key: SPARK-42624 > URL: https://issues.apache.org/jira/browse/SPARK-42624 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major >
[jira] [Assigned] (SPARK-42624) Reorganize imports in test_functions
[ https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42624: Assignee: Apache Spark > Reorganize imports in test_functions > > > Key: SPARK-42624 > URL: https://issues.apache.org/jira/browse/SPARK-42624 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42624) Reorganize imports in test_functions
[ https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694729#comment-17694729 ] Apache Spark commented on SPARK-42624: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40223 > Reorganize imports in test_functions > > > Key: SPARK-42624 > URL: https://issues.apache.org/jira/browse/SPARK-42624 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major >
[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694661#comment-17694661 ] Apache Spark commented on SPARK-42615: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40222 > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.1 > >
[jira] [Commented] (SPARK-41551) Improve/complete PathOutputCommitProtocol support for dynamic partitioning
[ https://issues.apache.org/jira/browse/SPARK-41551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694632#comment-17694632 ] Apache Spark commented on SPARK-41551: -- User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/40221 > Improve/complete PathOutputCommitProtocol support for dynamic partitioning > -- > > Key: SPARK-41551 > URL: https://issues.apache.org/jira/browse/SPARK-41551 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Priority: Minor > > Followup to SPARK-40034 as > * that is incomplete as it doesn't record the partitions > * as long as the job doesn't call `newTaskTempFileAbsPath()`, and slow > renames are ok, both s3a committers are actually OK to use. > It's only the newTaskTempFileAbsPath operation which is unsupported in s3a > committers; the post-job dir rename is O(data) but file-by-file rename is > correct for a non-atomic job commit. > # Cut PathOutputCommitProtocol.newTaskTempFile; to update super > partitionPaths (needs a setter). The superclass can't just say if (committer > instanceof PathOutputCommitter) as spark-core needs to compile with older > hadoop versions > # downgrade failure in setup to log (info?) > # retain failure in the newTaskTempFileAbsPath call. > Testing: yes
[jira] [Commented] (SPARK-41551) Improve/complete PathOutputCommitProtocol support for dynamic partitioning
[ https://issues.apache.org/jira/browse/SPARK-41551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694629#comment-17694629 ] Apache Spark commented on SPARK-41551: -- User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/40221 > Improve/complete PathOutputCommitProtocol support for dynamic partitioning > -- > > Key: SPARK-41551 > URL: https://issues.apache.org/jira/browse/SPARK-41551 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Priority: Minor > > Followup to SPARK-40034 as > * that is incomplete as it doesn't record the partitions > * as long as the job doesn't call `newTaskTempFileAbsPath()`, and slow > renames are ok, both s3a committers are actually OK to use. > It's only the newTaskTempFileAbsPath operation which is unsupported in s3a > committers; the post-job dir rename is O(data) but file-by-file rename is > correct for a non-atomic job commit. > # Cut PathOutputCommitProtocol.newTaskTempFile; to update super > partitionPaths (needs a setter). The superclass can't just say if (committer > instanceof PathOutputCommitter) as spark-core needs to compile with older > hadoop versions > # downgrade failure in setup to log (info?) > # retain failure in the newTaskTempFileAbsPath call. > Testing: yes
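For reference, a hedged sketch of how a PathOutputCommitter-based committer is typically wired into SQL writes (class names from the spark-hadoop-cloud module, which must be on the classpath):
{code:scala}
// Route SQL file writes through the Hadoop PathOutputCommitter machinery;
// dynamic partition overwrite on this path is what the ticket improves.
spark.conf.set("spark.sql.sources.commitProtocolClass",
  "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
spark.conf.set("spark.sql.parquet.output.committer.class",
  "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
{code}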
[jira] [Commented] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694622#comment-17694622 ] Apache Spark commented on SPARK-42622: -- User 'jelmerk' has created a pull request for this issue: https://github.com/apache/spark/pull/40219 > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Assigned] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42622: Assignee: (was: Apache Spark) > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Commented] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694621#comment-17694621 ] Apache Spark commented on SPARK-42622: -- User 'jelmerk' has created a pull request for this issue: https://github.com/apache/spark/pull/40219 > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Assigned] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42622: Assignee: Apache Spark > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Assignee: Apache Spark >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Commented] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694533#comment-17694533 ] Apache Spark commented on SPARK-42579: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40218 > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
[jira] [Commented] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694532#comment-17694532 ] Apache Spark commented on SPARK-42579: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40218 > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
[jira] [Assigned] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42579: Assignee: (was: Apache Spark) > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
[jira] [Assigned] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42579: Assignee: Apache Spark > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
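To illustrate the gap, a sketch of a conversion that Literal.apply already handles in the classic API but which lit() in the Connect client could not yet express, since the protocol lacked nested literals:
{code:scala}
import org.apache.spark.sql.functions.lit

// Literal.apply can turn a Scala Array into an ARRAY<INT> literal; matching
// this in functions.lit on Connect requires nested literals in the protocol.
val arrayLiteral = lit(Array(1, 2, 3))
{code}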
[jira] [Assigned] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42559: Assignee: Apache Spark (was: BingKun Pan) > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > Implement DataFrameNaFunctions for connect and hook it up to Dataset.
[jira] [Assigned] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42559: Assignee: BingKun Pan (was: Apache Spark) > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: BingKun Pan >Priority: Major > > Implement DataFrameNaFunctions for connect and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694502#comment-17694502 ] Apache Spark commented on SPARK-42559: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40217 > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: BingKun Pan >Priority: Major > > Implement DataFrameNaFunctions for connect and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
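[editor's note] As a quick reference, the Dataset.na surface being wired up for connect looks like this in the existing Scala API; the data and session setup are illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((Some(1), "a"), (None, null: String)).toDF("i", "s")
df.na.drop().show()                            // drop rows with any null
df.na.fill(Map("i" -> 0, "s" -> "n/a")).show() // fill per-column defaults
df.na.replace("s", Map("a" -> "A")).show()     // replace values in a column
{code}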
[jira] [Assigned] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42593: Assignee: (was: Apache Spark) > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42593: Assignee: Apache Spark > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694464#comment-17694464 ] Apache Spark commented on SPARK-42593: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40216 > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694463#comment-17694463 ] Apache Spark commented on SPARK-42593: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40216 > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694401#comment-17694401 ] Apache Spark commented on SPARK-42376: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40215 > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With the introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > That ticket clearly described what was out of scope: "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified a production use case for a stream-stream time-interval join > followed by a stateful operator (e.g. window aggregation), and propose to > address that use case via this ticket. > The design will be described in the PR, but the sketched idea is to introduce a > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have the same input and output > watermark, which introduces the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce a delayed output watermark, and the downstream operator can > take the delay into account as its input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
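[editor's note] To make the use case concrete, here is a sketch of the query shape this ticket unlocks, assuming an active SparkSession `spark`; the rate sources and column names are placeholders, not part of the ticket:

{code:scala}
import org.apache.spark.sql.functions._

val impressions = spark.readStream.format("rate").load()
  .select(col("timestamp").as("impressionTime"), col("value").as("adId"))
  .withWatermark("impressionTime", "10 seconds")
  .alias("imp")
val clicks = spark.readStream.format("rate").load()
  .select(col("timestamp").as("clickTime"), col("value").as("adId"))
  .withWatermark("clickTime", "10 seconds")
  .alias("clk")

// The time-interval join can emit records delayed relative to its input
// watermark, so its simulated output watermark is delayed accordingly.
val joined = impressions.join(clicks, expr(
  "imp.adId = clk.adId AND " +
  "clickTime BETWEEN impressionTime AND impressionTime + interval 1 minute"))

// With per-operator propagation, this downstream aggregation receives an
// input watermark adjusted for the join's delay instead of a single
// query-wide watermark.
val counts = joined
  .groupBy(window(col("clickTime"), "1 minute"), col("imp.adId"))
  .count()
{code}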
[jira] [Assigned] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42591: Assignee: (was: Apache Spark) > Document SS guide doc for introducing watermark propagation among operators > --- > > Key: SPARK-42591 > URL: https://issues.apache.org/jira/browse/SPARK-42591 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > Once SPARK-42376 has been merged, we also want to provide an example of > using a stream-stream time-interval join followed by a streaming aggregation. > Adding the feature without proper documentation may mean that no one > even knows this is supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694400#comment-17694400 ] Apache Spark commented on SPARK-42591: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40215 > Document SS guide doc for introducing watermark propagation among operators > --- > > Key: SPARK-42591 > URL: https://issues.apache.org/jira/browse/SPARK-42591 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > Once SPARK-42376 has been merged, we also want to provide an example of > using a stream-stream time-interval join followed by a streaming aggregation. > Adding the feature without proper documentation may mean that no one > even knows this is supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42591: Assignee: Apache Spark > Document SS guide doc for introducing watermark propagation among operators > --- > > Key: SPARK-42591 > URL: https://issues.apache.org/jira/browse/SPARK-42591 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > Once SPARK-42376 has been merged, we also want to provide an example of > using a stream-stream time-interval join followed by a streaming aggregation. > Adding the feature without proper documentation may mean that no one > even knows this is supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42491: Assignee: (was: Apache Spark) > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42491: Assignee: Apache Spark > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694393#comment-17694393 ] Apache Spark commented on SPARK-42491: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40214 > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42599) Make `CompatibilitySuite` as a tool like `dev/mima`
[ https://issues.apache.org/jira/browse/SPARK-42599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694378#comment-17694378 ] Apache Spark commented on SPARK-42599: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40213 > Make `CompatibilitySuite` as a tool like `dev/mima` > --- > > Key: SPARK-42599 > URL: https://issues.apache.org/jira/browse/SPARK-42599 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > Using Maven to test `CompatibilitySuite` requires some pre-work (Maven needs to > build the sql & > connect-client-jvm modules before testing), so when we run `mvn package test`, > the following errors occur: > > {code:java} > CompatibilitySuite: > - compatibility MiMa tests *** FAILED *** > java.lang.AssertionError: assertion failed: Failed to find the jar inside > folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$1(CompatibilitySuite.scala:69) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > ... > - compatibility API tests: Dataset *** FAILED *** > java.lang.AssertionError: assertion failed: Failed to find the jar inside > folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$7(CompatibilitySuite.scala:110) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
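[editor's note] For orientation, a rough Scala sketch of the kind of jar lookup that trips the assertion above when the module jar has not been built yet; the names and logic are illustrative, not the actual IntegrationTestUtils code:

{code:scala}
import java.io.File

def findJar(targetDir: String, prefix: String): File = {
  // List the build output directory; empty if it does not exist yet.
  val files = Option(new File(targetDir).listFiles()).getOrElse(Array.empty[File])
  files.find(f => f.getName.startsWith(prefix) && f.getName.endsWith(".jar"))
    .getOrElse(throw new AssertionError(
      s"assertion failed: Failed to find the jar inside folder: $targetDir"))
}
{code}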
[jira] [Assigned] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42616: Assignee: Apache Spark > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42613: Assignee: Apache Spark > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Assignee: Apache Spark >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694324#comment-17694324 ] Apache Spark commented on SPARK-42613: -- User 'jzhuge' has created a pull request for this issue: https://github.com/apache/spark/pull/40212 > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
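[editor's note] A minimal Scala sketch of the proposed default, not the actual PythonRunner code: when the user has not set OMP_NUM_THREADS explicitly, size native thread pools by the cpus of a single task rather than of the whole executor:

{code:scala}
import org.apache.spark.SparkConf
import scala.collection.mutable

val conf = new SparkConf()
val env = mutable.Map[String, String]() // worker environment being built
if (!env.contains("OMP_NUM_THREADS")) {
  // spark.task.cpus defaults to 1; using spark.executor.cores would
  // oversubscribe when several tasks share the executor.
  env("OMP_NUM_THREADS") = conf.get("spark.task.cpus", "1")
}
{code}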
[jira] [Assigned] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42616: Assignee: (was: Apache Spark) > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42613: Assignee: (was: Apache Spark) > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694323#comment-17694323 ] Apache Spark commented on SPARK-42616: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/40211 > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694313#comment-17694313 ] Apache Spark commented on SPARK-42615: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40210 > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42615: Assignee: (was: Apache Spark) > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42615: Assignee: Apache Spark > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42427) Conv should return an error if the internal conversion overflows
[ https://issues.apache.org/jira/browse/SPARK-42427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694299#comment-17694299 ] Apache Spark commented on SPARK-42427: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40209 > Conv should return an error if the internal conversion overflows > > > Key: SPARK-42427 > URL: https://issues.apache.org/jira/browse/SPARK-42427 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
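[editor's note] A small illustration of the overflow this ticket targets; the exact error class comes from the fix and is assumed here to surface under ANSI mode:

{code:scala}
// 20 hex digits exceed the unsigned 64-bit range conv uses internally;
// per this ticket the overflow should raise an error rather than return
// a silently wrapped value (ANSI mode assumed).
spark.sql("SET spark.sql.ansi.enabled = true")
spark.sql("SELECT conv('ffffffffffffffffffff', 16, 10)").show()
{code}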
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: Apache Spark > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of parent/child JAR
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: (was: Apache Spark) > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of parent/child JAR
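[editor's note] A self-contained Scala sketch of the ordering problem described above; the jar names are illustrative. Flattening the loader hierarchy bottom-up puts user jars ahead of Spark's own jars in the resulting flat list:

{code:scala}
import java.net.{URL, URLClassLoader}

// Collect jar URLs child-first, mirroring the bottom-to-top traversal.
def flattenJars(cl: ClassLoader): Seq[URL] = cl match {
  case null => Seq.empty
  case u: URLClassLoader => u.getURLs.toSeq ++ flattenJars(u.getParent)
  case other => flattenJars(other.getParent)
}

val sparkJars = Array(new URL("file:/opt/spark/jars/hive-exec-2.3.9.jar"))
val userJars = Array(
  new URL("file:/tmp/foo.jar"),
  new URL("file:/tmp/hive-exec-2.3.8.jar"))
val parent = new URLClassLoader(sparkJars, null)      // system classes
val userLoader = new URLClassLoader(userJars, parent) // user classes (child)

// Prints the user jars first: in one flat loader built from this list,
// hive-exec-2.3.8.jar shadows the hive-exec-2.3.9.jar Spark needs.
flattenJars(userLoader).foreach(println)
{code}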
[jira] [Commented] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)
[ https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694281#comment-17694281 ] Apache Spark commented on SPARK-42592: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40208 > Document SS guide doc for supporting multiple stateful operators (especially > chained aggregations) > -- > > Key: SPARK-42592 > URL: https://issues.apache.org/jira/browse/SPARK-42592 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > We made a change to the guide doc for SPARK-40925 via SPARK-42105, but in > SPARK-42105 we only removed the section on the "limitation of global watermark". > That said, we haven't provided any example of the new functionality, especially > since users need to know about the change to the SQL function (window) in chained > time window aggregations. > In this ticket, we will add an example of chained time window aggregations, > introducing the new functionality of the SQL function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
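[editor's note] For reference, a sketch of the chained time window aggregations the guide should cover, assuming an active SparkSession `spark`; the rate source is a placeholder:

{code:scala}
import org.apache.spark.sql.functions._

val events = spark.readStream.format("rate").load()
  .select(col("timestamp").as("eventTime"), col("value"))
  .withWatermark("eventTime", "10 seconds")

// First aggregation: counts per 1-minute window.
val perMinute = events
  .groupBy(window(col("eventTime"), "1 minute"))
  .agg(count("*").as("cnt"))

// Second aggregation windows over the window column itself; applying the
// SQL function `window` to a time window column is the behavior change
// the guide needs to call out.
val perFiveMinutes = perMinute
  .groupBy(window(col("window"), "5 minutes"))
  .agg(sum("cnt").as("cnt"))
{code}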
[jira] [Commented] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694279#comment-17694279 ] Apache Spark commented on SPARK-42614: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40207 > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42614: Assignee: Herman van Hövell (was: Apache Spark) > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42614: Assignee: Apache Spark (was: Herman van Hövell) > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694278#comment-17694278 ] Apache Spark commented on SPARK-42614: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40207 > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42611: Assignee: (was: Apache Spark) > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694272#comment-17694272 ] Apache Spark commented on SPARK-42611: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40206 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42611: Assignee: Apache Spark > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694271#comment-17694271 ] Apache Spark commented on SPARK-42611: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40206 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
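[editor's note] To illustrate the gap, a sketch assuming an active SparkSession `spark`; the table and values are made up. When inner struct fields arrive in a different order than the table schema, resolution reorders them, and the CHAR/VARCHAR length check on the nested field must be inserted as well:

{code:scala}
spark.sql("CREATE TABLE t (s STRUCT<n: INT, c: CHAR(3)>) USING parquet")
// Fields are given in a different order than the schema, triggering the
// reordering path from SPARK-36498; 'way-too-long' must still fail CHAR(3).
spark.sql("INSERT INTO t SELECT named_struct('c', 'way-too-long', 'n', 1)")
{code}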
[jira] [Commented] (SPARK-42610) Add implicit encoders to SQLImplicits
[ https://issues.apache.org/jira/browse/SPARK-42610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694265#comment-17694265 ] Apache Spark commented on SPARK-42610: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40205 > Add implicit encoders to SQLImplicits > - > > Key: SPARK-42610 > URL: https://issues.apache.org/jira/browse/SPARK-42610 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
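[editor's note] As a reminder of what SQLImplicits-style implicit encoders give the client, a sketch assuming an active SparkSession `spark`:

{code:scala}
import spark.implicits._

case class Person(name: String, age: Int)
// The product encoder for Person is summoned implicitly by toDS().
val ds = Seq(Person("a", 1), Person("b", 2)).toDS()
ds.map(p => p.age + 1).show()
{code}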
[jira] [Commented] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694259#comment-17694259 ] Apache Spark commented on SPARK-42601: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40204 > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42601: Assignee: (was: Apache Spark) > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42601: Assignee: Apache Spark > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694258#comment-17694258 ] Apache Spark commented on SPARK-42601: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40204 > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42612: Assignee: (was: Apache Spark) > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42612: Assignee: Apache Spark > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694244#comment-17694244 ] Apache Spark commented on SPARK-42612: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40203 > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694245#comment-17694245 ] Apache Spark commented on SPARK-42612: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40203 > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694242#comment-17694242 ] Apache Spark commented on SPARK-42608: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40202 > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42608: Assignee: Apache Spark > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42608: Assignee: (was: Apache Spark) > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
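[editor's note] To illustrate the confusion, a sketch assuming an active SparkSession `spark`; the tables are made up. Both structs contain an inner field `x`, so an error naming only "x" does not say which column failed, while "b.x" would:

{code:scala}
spark.sql("CREATE TABLE src (a STRUCT<x: INT>, b STRUCT<x: ARRAY<INT>>) USING parquet")
spark.sql("CREATE TABLE dst (a STRUCT<x: INT>, b STRUCT<x: INT>) USING parquet")
// Resolution fails on b.x (ARRAY<INT> cannot be stored into INT); the error
// should name b.x rather than the bare field name x.
spark.sql("INSERT INTO dst SELECT * FROM src")
{code}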
[jira] [Assigned] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41725: Assignee: (was: Apache Spark) > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694208#comment-17694208 ] Apache Spark commented on SPARK-41725: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/40160 > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41725: Assignee: Apache Spark > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42510) Implement `DataFrame.mapInPandas`
[ https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694162#comment-17694162 ] Apache Spark commented on SPARK-42510: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40201 > Implement `DataFrame.mapInPandas` > - > > Key: SPARK-42510 > URL: https://issues.apache.org/jira/browse/SPARK-42510 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Implement `DataFrame.mapInPandas` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org