[jira] [Commented] (SPARK-42750) Support INSERT INTO by name
[ https://issues.apache.org/jira/browse/SPARK-42750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700509#comment-17700509 ]

Xinsen commented on SPARK-42750:
--------------------------------

May I take this? I'm interested in it. And by the way, does it cover only inserting into Hive tables and HDFS files, or does it also include JDBC tables like MySQL tables?

> Support INSERT INTO by name
> ---------------------------
>
> Key: SPARK-42750
> URL: https://issues.apache.org/jira/browse/SPARK-42750
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Jose Torres
> Priority: Major
>
> In some use cases, users have incoming dataframes with fixed column names
> which might differ from the canonical order. Currently there's no way to
> handle this easily through the INSERT INTO API - the user has to make sure
> the columns are in the right order as they would when inserting a tuple. We
> should add an optional BY NAME clause, such that:
>
> INSERT INTO tgt BY NAME
>
> takes each column of and inserts it into the column in `tgt` which
> has the same name according to the configured `resolver` logic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
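For context on what a BY NAME clause would do, here is a minimal pure-Python sketch of name-based column alignment. This is not Spark's implementation; `align_by_name` and the case-insensitive matcher are illustrative stand-ins for Spark's configured `resolver`:

```python
def align_by_name(target_cols, incoming_cols, row, resolver=None):
    """Reorder one incoming row so its values line up with the target
    table's column order, matching columns by name rather than position."""
    # Spark's default resolver is case-insensitive; model that here.
    resolver = resolver or (lambda a, b: a.lower() == b.lower())
    by_name = dict(zip(incoming_cols, row))
    aligned = []
    for tgt in target_cols:
        matches = [c for c in incoming_cols if resolver(c, tgt)]
        if len(matches) != 1:
            raise ValueError(f"cannot resolve column {tgt!r} by name")
        aligned.append(by_name[matches[0]])
    return tuple(aligned)

# Incoming dataframe has the same columns as the table, in a different order.
print(align_by_name(["id", "name", "age"], ["AGE", "id", "name"], (42, 7, "x")))
# -> (7, 'x', 42)
```

A positional INSERT would have mis-assigned all three values here; by-name alignment recovers the intended mapping regardless of incoming order.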
[jira] [Created] (SPARK-42801) Fix Flaky ClientE2ETestSuite
Dongjoon Hyun created SPARK-42801:
----------------------------------

Summary: Fix Flaky ClientE2ETestSuite
Key: SPARK-42801
URL: https://issues.apache.org/jira/browse/SPARK-42801
Project: Spark
Issue Type: Bug
Components: Connect, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-42706) Document the Spark SQL error classes in user-facing documentation.
[ https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700505#comment-17700505 ]

Apache Spark commented on SPARK-42706:
--------------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40433

> Document the Spark SQL error classes in user-facing documentation.
> -------------------------------------------------------------------
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> We need to add an error class list to the user-facing documentation.
[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
[ https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42800:
------------------------------------

Assignee: Apache Spark

> Implement ml function {array_to_vector, vector_to_array}
> ---------------------------------------------------------
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, ML, PySpark
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
[ https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42800:
------------------------------------

Assignee: (was: Apache Spark)

> Implement ml function {array_to_vector, vector_to_array}
> ---------------------------------------------------------
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, ML, PySpark
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Commented] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
[ https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700496#comment-17700496 ]

Apache Spark commented on SPARK-42800:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40432

> Implement ml function {array_to_vector, vector_to_array}
> ---------------------------------------------------------
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, ML, PySpark
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Created] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
Ruifeng Zheng created SPARK-42800:
----------------------------------

Summary: Implement ml function {array_to_vector, vector_to_array}
Key: SPARK-42800
URL: https://issues.apache.org/jira/browse/SPARK-42800
Project: Spark
Issue Type: Sub-task
Components: Connect, ML, PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-42799:
----------------------------------

Affects Version/s: 3.3.1
                   3.3.0
                   3.4.0
                   3.3.2

> Update SBT build `xercesImpl` version to match with pom.xml
> ------------------------------------------------------------
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.3.0, 3.2.2, 3.3.1, 3.3.2, 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Updated] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42797:
---------------------------------

Fix Version/s: 3.4.1
               (was: 3.4.0)

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect
> Overview doc pages
> -----------------------------------------------------------------------------
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Allan Folting
> Assignee: Allan Folting
> Priority: Major
> Fix For: 3.4.1
>
> Grammatical improvements; this is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview page
> https://issues.apache.org/jira/browse/SPARK-42496
[jira] [Assigned] (SPARK-42765) Enable importing `pandas_udf` from `pyspark.sql.connect.functions`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-42765:
------------------------------------

Assignee: Xinrong Meng

> Enable importing `pandas_udf` from `pyspark.sql.connect.functions`
> -------------------------------------------------------------------
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
>
> Remove the outdated import path of `pandas_udf`
[jira] [Resolved] (SPARK-42765) Enable importing `pandas_udf` from `pyspark.sql.connect.functions`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42765.
----------------------------------

Fix Version/s: 3.4.1
Resolution: Fixed

Issue resolved by pull request 40388
https://github.com/apache/spark/pull/40388

> Enable importing `pandas_udf` from `pyspark.sql.connect.functions`
> -------------------------------------------------------------------
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.4.1
>
> Remove the outdated import path of `pandas_udf`
[jira] [Resolved] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42797.
----------------------------------

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 40428
https://github.com/apache/spark/pull/40428

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect
> Overview doc pages
> -----------------------------------------------------------------------------
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Allan Folting
> Assignee: Allan Folting
> Priority: Major
> Fix For: 3.4.0
>
> Grammatical improvements; this is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview page
> https://issues.apache.org/jira/browse/SPARK-42496
[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42799:
------------------------------------

Assignee: Apache Spark

> Update SBT build `xercesImpl` version to match with pom.xml
> ------------------------------------------------------------
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.2.2
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Minor
[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-42797:
------------------------------------

Assignee: Allan Folting

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect
> Overview doc pages
> -----------------------------------------------------------------------------
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Allan Folting
> Assignee: Allan Folting
> Priority: Major
>
> Grammatical improvements; this is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview page
> https://issues.apache.org/jira/browse/SPARK-42496
[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42799:
------------------------------------

Assignee: (was: Apache Spark)

> Update SBT build `xercesImpl` version to match with pom.xml
> ------------------------------------------------------------
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.2.2
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Commented] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700492#comment-17700492 ]

Apache Spark commented on SPARK-42799:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40431

> Update SBT build `xercesImpl` version to match with pom.xml
> ------------------------------------------------------------
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.2.2
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Created] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
Dongjoon Hyun created SPARK-42799:
----------------------------------

Summary: Update SBT build `xercesImpl` version to match with pom.xml
Key: SPARK-42799
URL: https://issues.apache.org/jira/browse/SPARK-42799
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 3.2.2
Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-42666) Fix `createDataFrame` to work properly with rows and schema
[ https://issues.apache.org/jira/browse/SPARK-42666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee resolved SPARK-42666.
---------------------------------

Fix Version/s: 3.4.0
Resolution: Duplicate

Resolved from SPARK-42679

> Fix `createDataFrame` to work properly with rows and schema
> ------------------------------------------------------------
>
> Key: SPARK-42666
> URL: https://issues.apache.org/jira/browse/SPARK-42666
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 3.4.0
>
> The code below is not working properly in Spark Connect:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema)
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 94, in __repr__
>     return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 162, in dtypes
>     return [(str(f.name), f.dataType.simpleString()) for f in self.schema.fields]
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1346, in schema
>     self._schema = self._session.client.schema(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 614, in schema
>     proto_schema = self._analyze(method="schema", plan=plan).schema
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 755, in _analyze
>     self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 894, in _handle_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's required to be non-nullable.{code}
> whereas working properly in regular PySpark:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema).show()
> +---+
> | id|
> +---+
> |  5|
> |  6|
> |  7|
> |  8|
> |  9|
> +---+ {code}
[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42798:
------------------------------------

Assignee: Apache Spark

> Upgrade protobuf-java to 3.22.2
> -------------------------------
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Apache Spark
> Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42798:
------------------------------------

Assignee: (was: Apache Spark)

> Upgrade protobuf-java to 3.22.2
> -------------------------------
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Commented] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700484#comment-17700484 ]

Apache Spark commented on SPARK-42798:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40430

> Upgrade protobuf-java to 3.22.2
> -------------------------------
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Updated] (SPARK-42571) Provide a mode to replace Py4J for local communication
[ https://issues.apache.org/jira/browse/SPARK-42571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42571:
---------------------------------

Affects Version/s: 3.5.0
                   (was: 3.4.0)

> Provide a mode to replace Py4J for local communication
> -------------------------------------------------------
>
> Key: SPARK-42571
> URL: https://issues.apache.org/jira/browse/SPARK-42571
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> We can replace Py4J even when a master is specified, e.g.,
> SparkSession.builder.master("..."), and communicate with the JVM via Spark Connect
> instead of Py4J.
[jira] [Updated] (SPARK-42729) Update Submitting Applications page for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-42729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42729:
---------------------------------

Affects Version/s: 3.5.0
                   (was: 3.4.0)

> Update Submitting Applications page for Spark Connect
> ------------------------------------------------------
>
> Key: SPARK-42729
> URL: https://issues.apache.org/jira/browse/SPARK-42729
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Documentation
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> https://spark.apache.org/docs/latest/submitting-applications.html
> Should we add Spark Connect application-building content here, or create a
> separate Spark Connect application-building page?
[jira] [Updated] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-42798:
-----------------------------

Description: * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
             * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]

> Upgrade protobuf-java to 3.22.2
> -------------------------------
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Updated] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-42798:
-----------------------------

Environment: (was: * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
             * https://github.com/protocolbuffers/protobuf/releases/tag/v22.2)

> Upgrade protobuf-java to 3.22.2
> -------------------------------
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
[jira] [Created] (SPARK-42798) Upgrade protobuf-java to 3.22.2
Yang Jie created SPARK-42798:
-----------------------------

Summary: Upgrade protobuf-java to 3.22.2
Key: SPARK-42798
URL: https://issues.apache.org/jira/browse/SPARK-42798
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.5.0
Environment: * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
             * https://github.com/protocolbuffers/protobuf/releases/tag/v22.2
Reporter: Yang Jie
[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-42794:
------------------------------------

Assignee: Huanli Wang

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in
> Structure Streaming
> ---------------------------------------------------------------------------
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Huanli Wang
> Assignee: Huanli Wang
> Priority: Minor
>
> We are seeing query failures caused by RocksDB lock-acquisition failures in the retry tasks.
>
> * at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled))
> {code}
> * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
> It seems that task 7.0 is able to pass [*{{dataRDD.iterator(partition, ctxt)}}*|https://github.com/apache/spark/blob/4db8e7b7944302a3929dd6a1197ea1385eecc46a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala#L123] and acquires the RocksDB lock, as we are seeing:
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms.
> {code}
> Increasing the *lockAcquireTimeoutMs* to 2 minutes means that 4 task retries give us 8 minutes to acquire the lock, which is larger than connectionTimeout with retries (3 * 120s).
[jira] [Resolved] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-42794.
----------------------------------

Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40425
https://github.com/apache/spark/pull/40425

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in
> Structure Streaming
> ---------------------------------------------------------------------------
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Huanli Wang
> Assignee: Huanli Wang
> Priority: Minor
> Fix For: 3.5.0
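As a sanity check on the arithmetic in the SPARK-42794 description, the retry budget can be sketched in a few lines (assuming Spark's default of 4 task attempts via `spark.task.maxFailures`; the timeout values are taken from the ticket):

```python
# Values from the ticket: proposed lock timeout and the retry budget it buys.
lock_acquire_timeout_s = 120      # proposed lockAcquireTimeoutMs = 2 minutes
task_attempts = 4                 # Spark's default spark.task.maxFailures
connection_timeout_s = 3 * 120    # connectionTimeout with 3 retries (3 * 120s)

# Each attempt waits up to the full lock timeout before failing.
total_wait_s = task_attempts * lock_acquire_timeout_s
print(total_wait_s // 60, "minutes")  # -> 8 minutes

# The whole retry budget now exceeds the connection timeout window,
# so the stale lock holder has time to be cleaned up before the query fails.
assert total_wait_s > connection_timeout_s  # 480s > 360s
```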
[jira] [Updated] (SPARK-42374) User-facing documentation
[ https://issues.apache.org/jira/browse/SPARK-42374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Folting updated SPARK-42374:
----------------------------------

Summary: User-facing documentation (was: User-facing documentaiton)

> User-facing documentation
> --------------------------
>
> Key: SPARK-42374
> URL: https://issues.apache.org/jira/browse/SPARK-42374
> Project: Spark
> Issue Type: Documentation
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Haejoon Lee
> Priority: Major
>
> Should provide the user-facing documentation so end users know how to use Spark Connect.
[jira] [Updated] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42796:
---------------------------------

Fix Version/s: 3.4.1

> Support TimestampNTZ in Cached Batch
> -------------------------------------
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Fix For: 3.4.1
[jira] [Updated] (SPARK-42422) Upgrade `maven-shade-plugin` to 3.4.1
[ https://issues.apache.org/jira/browse/SPARK-42422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42422: -- Affects Version/s: 3.4.0 (was: 3.5.0) > Upgrade `maven-shade-plugin` to 3.4.1 > - > > Key: SPARK-42422 > URL: https://issues.apache.org/jira/browse/SPARK-42422 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > * > [https://github.com/apache/maven-shade-plugin/releases/tag/maven-shade-plugin-3.3.0] > * > https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.3.0...maven-shade-plugin-3.4.1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700448#comment-17700448 ] Apache Spark commented on SPARK-42775: -- User 'chenhao-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40429 > approx_percentile produces wrong results for large decimals. > > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); > 1000 > spark-sql> desc select approx_percentile(col, 0.5) from values > (999) as tab(col); > approx_percentile(col, 0.5, 1)decimal(19,0) > {code} > The result is actually not null, so the second query returns false. The first > query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. 
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the > result fits, and really returns a null or throws an exception when the result > doesn't fit. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
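The precision loss described above can be reproduced with plain Python doubles (assuming the truncated literal in the queries is a 19-digit value such as nineteen nines, which is the maximum a `decimal(19,0)` can hold):

```python
# Demonstrates the decimal -> double -> decimal round-trip loss behind
# the approx_percentile bug. The 19-nines input is an assumption standing
# in for the truncated literal in the report.

big = 10**19 - 1          # 9999999999999999999, fits decimal(19,0) exactly

# Casting to double rounds to the nearest representable 64-bit float.
# At this magnitude adjacent doubles are 2048 apart, so the value rounds
# up to 10**19 -- one digit more than decimal(19,0) can represent.
as_double = float(big)
back = int(as_double)

assert back == 10**19          # the round-trip gained a digit
assert len(str(back)) == 20    # 20 digits: out of decimal(19,0)'s range
```

This is why the result renders as a 20-digit string yet behaves as NULL when read back into `decimal(19,0)`.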
[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42775: Assignee: (was: Apache Spark) > approx_percentile produces wrong results for large decimals. > > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); > 1000 > spark-sql> desc select approx_percentile(col, 0.5) from values > (999) as tab(col); > approx_percentile(col, 0.5, 1)decimal(19,0) > {code} > The result is actually not null, so the second query returns false. The first > query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. 
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the > result fits, and really returns a null or throws an exception when the result > doesn't fit. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42775: Assignee: Apache Spark > approx_percentile produces wrong results for large decimals. > > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Assignee: Apache Spark >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); > 1000 > spark-sql> desc select approx_percentile(col, 0.5) from values > (999) as tab(col); > approx_percentile(col, 0.5, 1)decimal(19,0) > {code} > The result is actually not null, so the second query returns false. The first > query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. 
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the > result fits, and really returns a null or throws an exception when the result > doesn't fit. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42797: Assignee: Apache Spark > Spark Connect - Grammatical improvements to Spark Overview and Spark Connect > Overview doc pages > --- > > Key: SPARK-42797 > URL: https://issues.apache.org/jira/browse/SPARK-42797 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Allan Folting >Assignee: Apache Spark >Priority: Major > > Grammatical improvements, this is a follow-up to this ticket: > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > https://issues.apache.org/jira/browse/SPARK-42496 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700444#comment-17700444 ] Apache Spark commented on SPARK-42797: -- User 'allanf-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40428 > Spark Connect - Grammatical improvements to Spark Overview and Spark Connect > Overview doc pages > --- > > Key: SPARK-42797 > URL: https://issues.apache.org/jira/browse/SPARK-42797 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Allan Folting >Priority: Major > > Grammatical improvements, this is a follow-up to this ticket: > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > https://issues.apache.org/jira/browse/SPARK-42496 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42797: Assignee: (was: Apache Spark) > Spark Connect - Grammatical improvements to Spark Overview and Spark Connect > Overview doc pages > --- > > Key: SPARK-42797 > URL: https://issues.apache.org/jira/browse/SPARK-42797 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Allan Folting >Priority: Major > > Grammatical improvements, this is a follow-up to this ticket: > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > https://issues.apache.org/jira/browse/SPARK-42496 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
Allan Folting created SPARK-42797: - Summary: Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages Key: SPARK-42797 URL: https://issues.apache.org/jira/browse/SPARK-42797 Project: Spark Issue Type: Documentation Components: Spark Core Affects Versions: 3.4.0 Reporter: Allan Folting Grammatical improvements, this is a follow-up to this ticket: Introducing Spark Connect on the main page and adding Spark Connect Overview page https://issues.apache.org/jira/browse/SPARK-42496 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42496) Introducing Spark Connect on the main page and adding Spark Connect Overview page
[ https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Folting updated SPARK-42496: -- Summary: Introducing Spark Connect on the main page and adding Spark Connect Overview page (was: Introducting Spark Connect at main page) > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > - > > Key: SPARK-42496 > URL: https://issues.apache.org/jira/browse/SPARK-42496 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.1 > > > We should document the introduction of Spark Connect at PySpark main > documentation page to give a summary to users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42789) Rewrite multiple GetJsonObjects to a JsonTuple if their json expression is the same
[ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-42789: Description: Benchmark result:
{noformat}
Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 2
  Stopped after 2 iterations, 77193 ms
  Running case: Rewrite: 2
  Stopped after 2 iterations, 51699 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Default: 2                                 37914         38597        966        0.2       5244.0      1.0X
Rewrite: 2                                 24887         25850       1361        0.3       3442.2      1.5X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 3
  Stopped after 2 iterations, 110890 ms
  Running case: Rewrite: 3
  Stopped after 2 iterations, 56102 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Default: 3                                 52862         55445        NaN        0.1       7311.6      1.0X
Rewrite: 3                                 26752         28051       1837        0.3       3700.2      2.0X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 4
  Stopped after 2 iterations, 150828 ms
  Running case: Rewrite: 4
  Stopped after 2 iterations, 57110 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Default: 4                                 71680         75414        NaN        0.1       9914.4      1.0X
Rewrite: 4                                 28452         28555        145        0.3       3935.4      2.5X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 5
  Stopped after 2 iterations, 223367 ms
  Running case: Rewrite: 5
  Stopped after 2 iterations, 78193 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Default: 5                                108479        111684       1447        0.1      15004.2      1.0X
Rewrite: 5                                 36830         39097        NaN        0.2       5094.0      2.9X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 10
  Stopped after 2 iterations, 311453 ms
  Running case: Rewrite: 10
  Stopped after 2 iterations, 65873 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Default: 10                               153952        155727       2510        0.0      21293.7      1.0X
Rewrite: 10                                32436         32937        708        0.2       4486.3      4.7X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 15
  Stopped after 2 iterations, 451911 ms
  Running case: Rewrite: 15
  Stopped after 2 iterations, 69790 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Default: 15                               224950        225956       1423        0.0      31113.6      1.0X
Rewrite: 15                                34806         34895        126        0.2       4814.2      6.5X

Running benchmark: Benchmark
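The rationale for the rewrite measured above — parse each JSON document once and extract all fields, rather than once per `get_json_object` call — can be illustrated with a plain-Python analogue (a model of the optimization only, not Spark's implementation):

```python
import json

doc = '{"a": 1, "b": 2, "c": 3}'
fields = ["a", "b", "c"]

# Analogue of N separate get_json_object(col, '$.field') expressions:
# each call re-parses the entire JSON document.
def get_json_object(doc: str, field: str):
    return json.loads(doc).get(field)    # one full parse per field

many_parses = [get_json_object(doc, f) for f in fields]

# Analogue of a single json_tuple(col, 'a', 'b', 'c') expression:
# parse once, then extract every requested field from the parsed value.
def json_tuple(doc: str, *fields: str):
    parsed = json.loads(doc)             # one parse total
    return [parsed.get(f) for f in fields]

one_parse = json_tuple(doc, *fields)

# Both forms return the same values; the rewrite only removes the
# redundant parsing work, which is why the speedup grows with the
# number of extracted fields in the benchmark.
assert many_parses == one_parse == [1, 2, 3]
```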
[jira] [Resolved] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42793. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40424 [https://github.com/apache/spark/pull/40424] > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42793: Assignee: Dongjoon Hyun > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42757) Implement textFile for DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-42757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42757. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40377 [https://github.com/apache/spark/pull/40377] > Implement textFile for DataFrameReader > -- > > Key: SPARK-42757 > URL: https://issues.apache.org/jira/browse/SPARK-42757 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42757) Implement textFile for DataFrameReader
[ https://issues.apache.org/jira/browse/SPARK-42757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42757: Assignee: BingKun Pan > Implement textFile for DataFrameReader > -- > > Key: SPARK-42757 > URL: https://issues.apache.org/jira/browse/SPARK-42757 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42731) Update Spark Configuration
[ https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42731. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40416 [https://github.com/apache/spark/pull/40416] > Update Spark Configuration > -- > > Key: SPARK-42731 > URL: https://issues.apache.org/jira/browse/SPARK-42731 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > https://spark.apache.org/docs/latest/configuration.html > Add a section for Spark Connect configurations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42731) Update Spark Configuration
[ https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42731: Assignee: Hyukjin Kwon > Update Spark Configuration > -- > > Key: SPARK-42731 > URL: https://issues.apache.org/jira/browse/SPARK-42731 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > https://spark.apache.org/docs/latest/configuration.html > Add a section for Spark Connect configurations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42508) Extract the common .ml classes to `mllib-common`
[ https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42508. -- Resolution: Fixed Fixed in https://github.com/apache/spark/pull/40097 > Extract the common .ml classes to `mllib-common` > > > Key: SPARK-42508 > URL: https://issues.apache.org/jira/browse/SPARK-42508 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42508) Extract the common .ml classes to `mllib-common`
[ https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42508: - Fix Version/s: 3.5.0 > Extract the common .ml classes to `mllib-common` > > > Key: SPARK-42508 > URL: https://issues.apache.org/jira/browse/SPARK-42508 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700413#comment-17700413 ] Apache Spark commented on SPARK-42792: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40427 > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42792: Assignee: (was: Apache Spark) > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42792: Assignee: Apache Spark > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Apache Spark >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700414#comment-17700414 ] Apache Spark commented on SPARK-42796: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/40426 > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42796: Assignee: Apache Spark (was: Gengliang Wang) > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700412#comment-17700412 ] Apache Spark commented on SPARK-42796: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/40426 > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42796: Assignee: Gengliang Wang (was: Apache Spark) > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42796) Support TimestampNTZ in Cached Batch
Gengliang Wang created SPARK-42796: -- Summary: Support TimestampNTZ in Cached Batch Key: SPARK-42796 URL: https://issues.apache.org/jira/browse/SPARK-42796 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.1 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huanli Wang updated SPARK-42794: Description: We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor {code:java} 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) {code} * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the alive executor {code:java} 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, {code} It seems that task 7.0 is able to pass [*{{dataRDD.iterator(partition, ctxt)}}*|https://github.com/apache/spark/blob/4db8e7b7944302a3929dd6a1197ea1385eecc46a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala#L123] and acquires the rocksdb lock as we are seeing {code:java} 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. {code} Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries will give us 8 minutes to acquire the lock and it is larger than connectionTimeout with retries (3 * 120s). was: We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor {code:java} 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) {code} * at t1+2min, task 7 at its first attempt (i.e. 
task 7.0) is scheduled to the alive executor {code:java} 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, {code} It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* and acquires the rocksdb lock as we are seeing {code:java} 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition
[jira] [Commented] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700402#comment-17700402 ] Apache Spark commented on SPARK-42794: -- User 'huanliwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40425 > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. 
task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
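The failure pattern in the logs above — a retry attempt timing out on the state-store lock while the first attempt still holds it — can be sketched with a plain lock and a bounded acquire. This is a minimal illustration of the timeout semantics only, not Spark's actual RocksDB state-store code; the names and durations are invented:

```python
import threading
import time

store_lock = threading.Lock()

def first_attempt(hold_seconds: float) -> None:
    # Simulates task 7.0: grabs the state-store lock and holds it well past
    # the retries' acquisition timeout before finally releasing it.
    store_lock.acquire()
    time.sleep(hold_seconds)
    store_lock.release()

# task 7.0 wins the lock first
threading.Thread(target=first_attempt, args=(2.0,), daemon=True).start()
time.sleep(0.1)

# task 7.1 retries with a bounded acquire (the lockAcquireTimeoutMs analogue)
# and fails, mirroring the IllegalStateException in the logs above
got_lock = store_lock.acquire(timeout=0.3)

# a retry whose timeout outlives the lock holder eventually succeeds
got_lock_later = got_lock or store_lock.acquire(timeout=5.0)
if got_lock_later:
    store_lock.release()
```

The fix proposed in this ticket amounts to widening that per-attempt timeout so the last retry's bounded acquire outlives the stuck first attempt.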
[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42794: Assignee: (was: Apache Spark) > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. 
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42794: Assignee: Apache Spark > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Assignee: Apache Spark >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. 
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42795) Create analyzer golden file based test suite
Daniel created SPARK-42795: -- Summary: Create analyzer golden file based test suite Key: SPARK-42795 URL: https://issues.apache.org/jira/browse/SPARK-42795 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huanli Wang updated SPARK-42794: Priority: Minor (was: Major) > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. 
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700398#comment-17700398 ] Apache Spark commented on SPARK-42793: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40424 > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huanli Wang updated SPARK-42794: Description: We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor {code:java} 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) {code} * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the alive executor {code:java} 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, {code} It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* and acquires the rocksdb lock as we are seeing {code:java} 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. {code} Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries will give us 8 minutes to acquire the lock and it is larger than connectionTimeout with retries (3 * 120s). was: We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor {code:java} 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) {code} * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the alive executor {code:java} 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, {code} It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* and acquires the rocksdb lock as we are seeing {code:java} 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. {code} Increasing the
[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huanli Wang updated SPARK-42794: Description: We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor {code:java} 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) {code} * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the alive executor {code:java} 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, {code} It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* and acquires the rocksdb lock as we are seeing {code:java} 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. {code} Increasing the [lockAcquireTimeoutMs|https://src.dev.databricks.com/databricks/runtime/-/blob/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala?L927:3] to 2 minutes such that 4 task retries will give us 8 minutes to acquire the lock and it is larger than connectionTimeout with retries (3 * 120s). was: We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) * at t1+2min, task 7 at its first attempt (i.e. 
task 7.0) is scheduled to the alive executor 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* and acquires the rocksdb lock as we are seeing 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. Increasing the
[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42793: Assignee: (was: Apache Spark) > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42793: Assignee: Apache Spark > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700397#comment-17700397 ] Apache Spark commented on SPARK-42793: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40424 > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
Huanli Wang created SPARK-42794: --- Summary: Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming Key: SPARK-42794 URL: https://issues.apache.org/jira/browse/SPARK-42794 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Huanli Wang We are seeing query failure which is caused by RocksDB acquisition failure for the retry tasks. * at t1, we shrink the cluster to only have one executor 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because of kill request from HTTP endpoint (data migration disabled)) * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the alive executor 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 685) (10.166.225.249, executor 0, partition 7, ANY, It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* and acquires the rocksdb lock as we are seeing 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. 
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60006 ms. 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) (10.166.225.249 executor 0): java.lang.IllegalStateException: StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 133.0, TID 685] after 60003 ms. Increasing the [lockAcquireTimeoutMs|https://src.dev.databricks.com/databricks/runtime/-/blob/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala?L927:3] to 2 minutes such that 4 task retries will give us 8 minutes to acquire the lock and it is larger than connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
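The arithmetic behind the proposed value can be checked directly. This sketch assumes Spark's default of 4 task attempts (`spark.task.maxFailures`); the constant names are illustrative:

```python
TASK_MAX_ATTEMPTS = 4                     # spark.task.maxFailures default
LOCK_ACQUIRE_TIMEOUT_MS = 2 * 60 * 1000   # proposed value: 2 minutes per attempt

# total time the task attempts collectively wait for the lock: 8 minutes
total_lock_wait_ms = TASK_MAX_ATTEMPTS * LOCK_ACQUIRE_TIMEOUT_MS

# connectionTimeout with retries, from the description: 3 * 120s = 6 minutes
connection_timeout_with_retries_ms = 3 * 120 * 1000
```

With these numbers the combined lock-wait budget (8 minutes) exceeds the 6-minute connection-timeout-with-retries window, which is the property the ticket is after.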
[jira] [Created] (SPARK-42793) `connect` module requires `build_profile_flags`
Dongjoon Hyun created SPARK-42793: - Summary: `connect` module requires `build_profile_flags` Key: SPARK-42793 URL: https://issues.apache.org/jira/browse/SPARK-42793 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
Anish Shrigondekar created SPARK-42792: -- Summary: Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming Key: SPARK-42792 URL: https://issues.apache.org/jira/browse/SPARK-42792 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Anish Shrigondekar Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming. It is useful to get this metric for bytes written during flush from RocksDB as part of the DB custom metrics. We propose to add this to the existing metrics that are collected. There is no additional overhead since we are just querying the internal ticker gauge, similar to other metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700387#comment-17700387 ] Anish Shrigondekar commented on SPARK-42792: Will send the PR soon - cc - [~kabhwan] > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700332#comment-17700332 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40423 > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add in additional functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:java} > import cloudpickle > import os > if __name__ == "__main__": > train, args = cloudpickle.load(open(f"{tempdir}/train_input.pkl", "rb")) > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > cloudpickle.dump(output, open(f"{tempdir}/train_output.pkl", "wb")) {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created on the process with partitionId == > 0; if it has, then deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
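The four steps described in the issue can be sketched end-to-end in plain Python. This is a hedged illustration, not the actual `Distributor` code: it uses the stdlib `pickle` in place of `cloudpickle` (which additionally handles lambdas and closures), runs everything in a single process instead of under `torchrun`, and the `train` function and temp-file names are invented for the demo.

```python
import os
import pickle
import tempfile

def train(base, factor):
    # Stand-in "training" function; a real one would run a torch training loop.
    return base * factor

tempdir = tempfile.mkdtemp()
os.environ.setdefault("RANK", "0")  # torchrun would set RANK per worker process

# Step 1 (driver side): pickle the input function and its args.
with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
    pickle.dump((train, (6, 7)), f)

# Steps 2-3 (the generated train.py, normally launched via torchrun):
# load, run, and dump the output only on the rank-0 process.
with open(os.path.join(tempdir, "train_input.pkl"), "rb") as f:
    fn, args = pickle.load(f)
output = fn(*args)
if output is not None and os.environ.get("RANK", "") == "0":
    with open(os.path.join(tempdir, "train_output.pkl"), "wb") as f:
        pickle.dump(output, f)

# Step 4 (driver side): deserialize the rank-0 output if it was produced.
out_path = os.path.join(tempdir, "train_output.pkl")
with open(out_path, "rb") as f:
    result = pickle.load(f)
print(result)  # 42
```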
[jira] [Created] (SPARK-42791) Create golden file test framework for analysis
Daniel created SPARK-42791: -- Summary: Create golden file test framework for analysis Key: SPARK-42791 URL: https://issues.apache.org/jira/browse/SPARK-42791 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Daniel Here we track the work to add new golden file test support for the Spark analyzer. Each golden file can contain a list of SQL queries followed by the string representations of their analyzed logical plans. This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping after analysis and listing analyzed plans as the results instead of fully executing queries end-to-end. As another example, ZetaSQL has analyzer-based golden file testing like this as well [2]. This way, any changes to analysis will show up as test diffs, which are easy to spot in review and also easy to update automatically. This could help the community maintain the quality of Apache Spark's query analysis. [1] [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala] [2] [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
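The golden-file workflow described in the issue can be illustrated with a small sketch. This is a hedged analogy, not the proposed framework: the marker format, `analyze` stand-in, and `check_golden` helper are all invented here (the real `SQLQueryTestSuite` consumes `.sql` files and generates `.sql.out` results).

```python
import pathlib
import tempfile

def analyze(query: str) -> str:
    # Stand-in for "parse + analyze only, then pretty-print the logical plan".
    return f"AnalyzedLogicalPlan <- {query}"

def check_golden(queries, golden_path, regenerate=False):
    # Render each query followed by its analyzed plan, golden-file style.
    rendered = "\n\n".join(
        f"-- !query\n{q}\n-- !analyzed plan\n{analyze(q)}" for q in queries
    )
    path = pathlib.Path(golden_path)
    if regenerate or not path.exists():
        # Regenerating turns any analyzer change into a reviewable file diff.
        path.write_text(rendered)
        return True
    return path.read_text() == rendered

queries = ["select * from range(10)", "select 1 + 1"]
golden = pathlib.Path(tempfile.mkdtemp()) / "analysis.golden"
check_golden(queries, golden, regenerate=True)  # first run records the plans
print(check_golden(queries, golden))            # True: analysis unchanged
```

Adding a query (or changing how `analyze` renders a plan) makes the comparison fail, which is exactly the "changes show up as test diffs" property the issue is after.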
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700333#comment-17700333 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40423 > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add in additional functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:java} > import cloudpickle > import os > if __name__ == "__main__": > train, args = cloudpickle.load(open(f"{tempdir}/train_input.pkl", "rb")) > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > cloudpickle.dump(output, open(f"{tempdir}/train_output.pkl", "wb")) {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created on the process with partitionId == > 0; if it has, then deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42774) Expose VectorTypes API for DataSourceV2 Batch Scans
[ https://issues.apache.org/jira/browse/SPARK-42774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated SPARK-42774: Priority: Minor (was: Major) > Expose VectorTypes API for DataSourceV2 Batch Scans > --- > > Key: SPARK-42774 > URL: https://issues.apache.org/jira/browse/SPARK-42774 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.2 >Reporter: Micah Kornfield >Priority: Minor > > SparkPlan's vectorTypes attribute can be used to [specialize > codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151] > however > [BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala] > does not override this, so DSv2 sources do not get any benefit from concrete > class dispatch. > This proposes adding an override to BatchScanExecBase which delegates to a > new default method on > [PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java] > to expose vectorTypes: > {code:java} > default Optional<List<String>> getVectorTypes() { return Optional.empty(); } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42787) in spark-py docker images, arrowkeys do not work in (scala) spark-shell
[ https://issues.apache.org/jira/browse/SPARK-42787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700322#comment-17700322 ] Bjørn Jørgensen commented on SPARK-42787: - Have a look at https://github.com/apache/spark-docker > in spark-py docker images, arrowkeys do not work in (scala) spark-shell > --- > > Key: SPARK-42787 > URL: https://issues.apache.org/jira/browse/SPARK-42787 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 3.1.3, 3.3.1 > Environment: [https://hub.docker.com/r/apache/spark-py] 3.1.3 and > 3.3.1 in Docker on M1 MacBook pro OSX ventura >Reporter: Max Rieger >Priority: Minor > > I tested this for 3.1.3 and 3.3.1 from > [https://hub.docker.com/r/apache/spark-py/tags] > While it works for PySpark, it does not for the Scala spark-shell. > It seems this is due to the Scala REPL using {{jline}} for input management. > * creating a \{{.inputrc}} file with mappings for the arrow keys didn't work > * finally, building and running from > {{dev/create-release/spark-rm/Dockerfile}} with jline installed as in the > Dockerfile, things worked. > This is likely not limited to the {{spark-py}} images. > I'd do a PR, but I'm unsure if this is even the right Dockerfile to contribute > to in order to fix the Docker Hub images... > {code:sh} > diff --git a/dev/create-release/spark-rm/Dockerfile > b/dev/create-release/spark-rm/Dockerfile > --- dev/create-release/spark-rm/Dockerfile > +++ dev/create-release/spark-rm/Dockerfile > @@ -71,9 +71,9 @@ >$APT_INSTALL nodejs && \ ># Install needed python packages. Use pip for installing packages (for > consistency). >$APT_INSTALL python-is-python3 python3-pip python3-setuptools && \ ># qpdf is required for CRAN checks to pass. > - $APT_INSTALL qpdf jq && \ > + $APT_INSTALL qpdf jq libjline-java && \ >pip3 install $PIP_PKGS && \ ># Install R packages and dependencies used when building. ># R depends on pandoc*, libssl (which are installed above). 
># Note that PySpark doc generation also needs pandoc due to nbsphinx > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42779: Assignee: Apache Spark > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42779: Assignee: (was: Apache Spark) > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700316#comment-17700316 ] Apache Spark commented on SPARK-42779: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40421 > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41171) Push down filter through window when partitionSpec is empty
[ https://issues.apache.org/jira/browse/SPARK-41171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-41171: -- Affects Version/s: 3.5.0 (was: 3.4.0) > Push down filter through window when partitionSpec is empty > --- > > Key: SPARK-41171 > URL: https://issues.apache.org/jira/browse/SPARK-41171 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > Sometimes, filter compares the rank-like window functions with number. > {code:java} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 WHERE rn <= 5 > {code} > We can create a Limit(5) and push down it as the child of Window. > {code:java} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER > BY a LIMIT 5) t > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700286#comment-17700286 ] Timothy Miller edited comment on SPARK-42776 at 3/14/23 4:34 PM: - A little more detail about the sequence of events that cause this bug: * org.apache.spark.sql.execution.RemoveRedundantProjects is applied * that causes BroadcastHashJoinExec to get created * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast * a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions * Only after that can I replace BroadcastHashJoinExec with a columnar alternative, but by then it's too late. I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either. was (Author: JIRAUSER287471): A little more detail about the sequence of events that cause this bug: * org.apache.spark.sql.execution.RemoveRedundantProjects is applied * that causes BroadcastHashJoinExec to get created * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast * a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either. > BroadcastHashJoinExec.requiredChildDistribution called before columnar > replacement rules > > > Key: SPARK-42776 > URL: https://issues.apache.org/jira/browse/SPARK-42776 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.1 > Environment: I'm prototyping on a Mac, but that's not really relevant. 
>Reporter: Timothy Miller >Priority: Major > > I am trying to replace BroadcastHashJoinExec with a columnar equivalent. > However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets > called BEFORE the columnar replacement rules. As a result, the object that > gets broadcast is the plain old hashmap created from row data. By the time > the columnar replacement rules are applied, it's too late to get Spark to > broadcast any other kind of object. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42693) API Auditing
[ https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42693: -- Target Version/s: 3.4.0 > API Auditing > > > Key: SPARK-42693 > URL: https://issues.apache.org/jira/browse/SPARK-42693 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark, Spark Core, SQL, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Blocker > > Audit user-facing API of Spark 3.4. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42693) API Auditing
[ https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700300#comment-17700300 ] Dongjoon Hyun commented on SPARK-42693: --- Hi, [~XinrongM]. This JIRA is open as a `Blocker` issue, but there is no activity. Could you share the progress please, [~XinrongM]? > API Auditing > > > Key: SPARK-42693 > URL: https://issues.apache.org/jira/browse/SPARK-42693 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark, Spark Core, SQL, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Blocker > > Audit user-facing API of Spark 3.4. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42754: -- Target Version/s: 3.4.0 > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Assignee: Linhong Liu >Priority: Blocker > Fix For: 3.4.1 > > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. > The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. 
> *Proposed* {*}fix{*}: > I think we should change this field to be of type {{Option[Long]}} . I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
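The ambiguity described in the root cause can be illustrated outside of Jackson and Scala. This is a hedged Python analogy, not Spark code: a non-optional field whose missing value defaults to 0 is indistinguishable from a genuine root execution id of 0, while an `Option[Long]`-style field keeps "absent" distinct.

```python
import json

def root_id_non_optional(event_json: str) -> int:
    # Mimics Jackson's ignore-missing-properties behavior for a Long field:
    # a missing rootExecutionId silently becomes the default value 0.
    return json.loads(event_json).get("rootExecutionId", 0)

def root_id_optional(event_json: str):
    # Option[Long]-style decoding: a missing field stays None, never a legal id.
    return json.loads(event_json).get("rootExecutionId")

old_event = '{"executionId": 3}'                        # written by Spark <= 3.3
new_event = '{"executionId": 3, "rootExecutionId": 0}'  # written by Spark 3.4

# Non-optional decoding cannot tell the two events apart...
assert root_id_non_optional(old_event) == root_id_non_optional(new_event) == 0
# ...while optional decoding distinguishes "absent" from "root id 0".
assert root_id_optional(old_event) is None
assert root_id_optional(new_event) == 0
```

This is why the non-optional default makes every replayed pre-3.4 query look like a child of execution 0, and why the proposed `Option[Long]` fix removes the ambiguity.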
[jira] [Resolved] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42754. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40403 [https://github.com/apache/spark/pull/40403] > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Priority: Blocker > Fix For: 3.4.0 > > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. 
> The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. > *Proposed* {*}fix{*}: > I think we should change this field to be of type {{Option[Long]}} . I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42754: -- Fix Version/s: 3.4.1 (was: 3.4.0) > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Assignee: Linhong Liu >Priority: Blocker > Fix For: 3.4.1 > > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. > The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. 
> *Proposed* {*}fix{*}: > I think we should change this field to be of type {{Option[Long]}} . I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42754: - Assignee: Linhong Liu > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Assignee: Linhong Liu >Priority: Blocker > Fix For: 3.4.0 > > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. > The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. 
> *Proposed* {*}fix{*}: > I think we should change this field to be of type {{Option[Long]}} . I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42782) Port the tests for get_json_object from the Apache Hive project
[ https://issues.apache.org/jira/browse/SPARK-42782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42782. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40409 [https://github.com/apache/spark/pull/40409] > Port the tests for get_json_object from the Apache Hive project > --- > > Key: SPARK-42782 > URL: https://issues.apache.org/jira/browse/SPARK-42782 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > > https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFJson.java -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700286#comment-17700286 ] Timothy Miller commented on SPARK-42776: A little more detail about the sequence of events that cause this bug: * org.apache.spark.sql.execution.RemoveRedundantProjects is applied * that causes BroadcastHashJoinExec to get created * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast * a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either. > BroadcastHashJoinExec.requiredChildDistribution called before columnar > replacement rules > > > Key: SPARK-42776 > URL: https://issues.apache.org/jira/browse/SPARK-42776 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.1 > Environment: I'm prototyping on a Mac, but that's not really relevant. >Reporter: Timothy Miller >Priority: Major > > I am trying to replace BroadcastHashJoinExec with a columnar equivalent. > However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets > called BEFORE the columnar replacement rules. As a result, the object that > gets broadcast is the plain old hashmap created from row data. By the time > the columnar replacement rules are applied, it's too late to get Spark to > broadcast any other kind of object. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42782) Port the tests for get_json_object from the Apache Hive project
[ https://issues.apache.org/jira/browse/SPARK-42782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42782: -- Component/s: Tests > Port the tests for get_json_object from the Apache Hive project > --- > > Key: SPARK-42782 > URL: https://issues.apache.org/jira/browse/SPARK-42782 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > > https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFJson.java
[jira] [Assigned] (SPARK-42782) Port the tests for get_json_object from the Apache Hive project
[ https://issues.apache.org/jira/browse/SPARK-42782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42782: - Assignee: Yuming Wang > Port the tests for get_json_object from the Apache Hive project > --- > > Key: SPARK-42782 > URL: https://issues.apache.org/jira/browse/SPARK-42782 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFJson.java
[jira] [Assigned] (SPARK-42617) Support `isocalendar`
[ https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42617: Assignee: (was: Apache Spark) > Support `isocalendar` > - > > Key: SPARK-42617 > URL: https://issues.apache.org/jira/browse/SPARK-42617 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > We should support `isocalendar` to match pandas behavior > (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)
[jira] [Assigned] (SPARK-42617) Support `isocalendar`
[ https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42617: Assignee: Apache Spark > Support `isocalendar` > - > > Key: SPARK-42617 > URL: https://issues.apache.org/jira/browse/SPARK-42617 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should support `isocalendar` to match pandas behavior > (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)
[jira] [Commented] (SPARK-42617) Support `isocalendar`
[ https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700282#comment-17700282 ] Apache Spark commented on SPARK-42617: -- User 'dzhigimont' has created a pull request for this issue: https://github.com/apache/spark/pull/40420 > Support `isocalendar` > - > > Key: SPARK-42617 > URL: https://issues.apache.org/jira/browse/SPARK-42617 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > We should support `isocalendar` to match pandas behavior > (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)
[jira] [Resolved] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42770. --- Fix Version/s: 3.4.1 Resolution: Fixed Issue resolved by pull request 40395 [https://github.com/apache/spark/pull/40395] > SQLImplicitsTestSuite test failed with Java 17 > -- > > Key: SPARK-42770 > URL: https://issues.apache.org/jira/browse/SPARK-42770 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.1 > > > [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] > {code:java} > [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 > milliseconds) > 4429[info] 2023-03-02T23:00:20.404434 did not equal > 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) > 4430[info] org.scalatest.exceptions.TestFailedException: > 4431[info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 4432[info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 4433[info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 4434[info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 4435[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) > 4436[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) > 4437[info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 4443[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > 4446[info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > 4447[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 4448[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 4450[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 4451[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 4452[info] at > org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > 4453[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 4454[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) > 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > 4459[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > 4460[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > 4461[info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) > 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) > 4464[info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > 4465[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > 4466[info] at 
org.scalatest.SuperEngine.runImpl(Engine.scala:535) > 4467[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > 4468[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > 4469[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) > 4470[info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > 4471[info] at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > 4472[info] at > org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > 4473[info] at >
[jira] [Assigned] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42770: - Assignee: Yang Jie > SQLImplicitsTestSuite test failed with Java 17 > -- > > Key: SPARK-42770 > URL: https://issues.apache.org/jira/browse/SPARK-42770 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] > {code:java} > [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 > milliseconds) > 4429[info] 2023-03-02T23:00:20.404434 did not equal > 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) > 4430[info] org.scalatest.exceptions.TestFailedException: > 4431[info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 4432[info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 4433[info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 4434[info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 4435[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) > 4436[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) > 4437[info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 4443[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at 
org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > 4446[info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > 4447[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 4448[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 4450[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 4451[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 4452[info] at > org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > 4453[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 4454[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) > 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > 4459[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > 4460[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > 4461[info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) > 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) > 4464[info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > 4465[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > 4466[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > 4467[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > 4468[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > 4469[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) > 4470[info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > 4471[info] at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > 4472[info] at > org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > 4473[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34) > 4474[info] at >
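A plausible reading of the SPARK-42770 failure above: the test compares an Instant captured with Instant.now() against the same value after a round trip through Spark's microsecond-precision TimestampType, and on Java 17 the system clock can report nanosecond precision, so the trailing three digits are lost. The JDK-only sketch below reproduces the mismatch using the exact values from the failed assertion; truncating the expected value to microseconds is one way such a test can be made clock-precision independent (this is an illustration, not necessarily the actual change in pull request 40395):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class MicrosecondTruncation {
    public static void main(String[] args) {
        // The two values from the failed assertion: Instant.now() on Java 17
        // carried nanoseconds, while the value that came back through
        // Catalyst's TimestampType only kept microseconds.
        Instant captured = Instant.parse("2023-03-02T23:00:20.404434875Z");
        Instant stored   = Instant.parse("2023-03-02T23:00:20.404434Z");

        // Direct comparison fails because of the extra sub-microsecond digits.
        System.out.println(captured.equals(stored));                                // false

        // Truncating the captured value to microseconds restores equality.
        System.out.println(captured.truncatedTo(ChronoUnit.MICROS).equals(stored)); // true
    }
}
```

On Java 8 the same test passes by accident, since its system clock only reports millisecond precision and no digits are lost in the round trip.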
[jira] [Updated] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case
[ https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42785: -- Affects Version/s: 3.2.4 3.3.3 3.4.0 (was: 3.3.2) > [K8S][Core] When spark submit without --deploy-mode, will face NPE in > Kubernetes Case > - > > Key: SPARK-42785 > URL: https://issues.apache.org/jira/browse/SPARK-42785 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.4, 3.3.3, 3.4.0 >Reporter: binjie yang >Assignee: binjie yang >Priority: Major > Fix For: 3.2.4, 3.3.3, 3.4.1 > > > According to this PR > [https://github.com/apache/spark/pull/37880#issuecomment-134890,] when a > user runs spark-submit without `--deploy-mode XXX` or `--conf > spark.submit.deployMode=`, it may hit an NPE in this code: > > args.deployMode.equals("client") > >
[jira] [Resolved] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case
[ https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42785. --- Fix Version/s: 3.3.3 3.2.4 3.4.1 Resolution: Fixed Issue resolved by pull request 40414 [https://github.com/apache/spark/pull/40414] > [K8S][Core] When spark submit without --deploy-mode, will face NPE in > Kubernetes Case > - > > Key: SPARK-42785 > URL: https://issues.apache.org/jira/browse/SPARK-42785 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.2 >Reporter: binjie yang >Assignee: binjie yang >Priority: Major > Fix For: 3.3.3, 3.2.4, 3.4.1 > > > According to this PR > [https://github.com/apache/spark/pull/37880#issuecomment-134890,] when a > user runs spark-submit without `--deploy-mode XXX` or `--conf > spark.submit.deployMode=`, it may hit an NPE in this code: > > args.deployMode.equals("client") > >
[jira] [Assigned] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case
[ https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42785: - Assignee: binjie yang > [K8S][Core] When spark submit without --deploy-mode, will face NPE in > Kubernetes Case > - > > Key: SPARK-42785 > URL: https://issues.apache.org/jira/browse/SPARK-42785 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.2 >Reporter: binjie yang >Assignee: binjie yang >Priority: Major > > According to this PR > [https://github.com/apache/spark/pull/37880#issuecomment-134890,] when a > user runs spark-submit without `--deploy-mode XXX` or `--conf > spark.submit.deployMode=`, it may hit an NPE in this code: > > args.deployMode.equals("client") > >
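The NPE pattern described in SPARK-42785 is a plain null-receiver problem: with neither --deploy-mode nor spark.submit.deployMode set, args.deployMode is null, so calling .equals on it throws. The sketch below illustrates the hazard and the standard null-safe alternatives; it is illustrative only and not the actual patch from pull request 40414:

```java
import java.util.Objects;

public class DeployModeCheck {
    // Null-safe check: putting the literal on the left ("Yoda condition")
    // means the receiver can never be null. Objects.equals is equivalent.
    static boolean isClientMode(String deployMode) {
        return "client".equals(deployMode);           // safe even when deployMode == null
        // equivalently: return Objects.equals(deployMode, "client");
    }

    public static void main(String[] args) {
        String deployMode = null; // neither --deploy-mode nor spark.submit.deployMode given

        // The pattern from the ticket would throw here:
        // deployMode.equals("client");               // NullPointerException

        System.out.println(isClientMode(deployMode)); // false, no NPE
        System.out.println(isClientMode("client"));   // true
    }
}
```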