[jira] [Commented] (SPARK-42750) Support INSERT INTO by name

2023-03-14 Thread Xinsen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700509#comment-17700509
 ] 

Xinsen commented on SPARK-42750:


May I take this on? I'm interested in it. By the way, does it cover only 
inserting into Hive tables and HDFS files, or does it also include JDBC tables 
such as MySQL tables?

> Support INSERT INTO by name
> ---
>
> Key: SPARK-42750
> URL: https://issues.apache.org/jira/browse/SPARK-42750
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jose Torres
>Priority: Major
>
> In some use cases, users have incoming dataframes with fixed column names 
> which may differ from the table's canonical column order. Currently there's 
> no way to handle this easily through the INSERT INTO API - the user has to 
> make sure the columns are in the right order, as they would when inserting a 
> tuple. We should add an optional BY NAME clause, such that:
> INSERT INTO tgt BY NAME <query>
> takes each column of the source query and inserts it into the column in `tgt` 
> which has the same name according to the configured `resolver` logic.
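To illustrate the proposed by-name semantics, here is a minimal plain-Python sketch of aligning source columns to a target column order by name rather than by position (illustrative only; `align_by_name` is a hypothetical helper, not Spark code, and the case-insensitive match merely mimics Spark's default resolver):

```python
def align_by_name(target_cols, source_row):
    # source_row maps source column names to values; match each target
    # column by (case-insensitive) name instead of by position, raising
    # KeyError when a target column has no source counterpart.
    lookup = {name.lower(): value for name, value in source_row.items()}
    return [lookup[col.lower()] for col in target_cols]

# The source columns arrive in a different order than the target schema.
row = {"name": "a", "id": 1}
print(align_by_name(["id", "name"], row))  # [1, 'a']
```

With positional insertion the row above would land `"a"` in `id`; matching by name avoids that.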



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42801) Fix Flaky ClientE2ETestSuite

2023-03-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42801:
-

 Summary: Fix Flaky ClientE2ETestSuite
 Key: SPARK-42801
 URL: https://issues.apache.org/jira/browse/SPARK-42801
 Project: Spark
  Issue Type: Bug
  Components: Connect, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-42706) Document the Spark SQL error classes in user-facing documentation.

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700505#comment-17700505
 ] 

Apache Spark commented on SPARK-42706:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40433

> Document the Spark SQL error classes in user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> We need to add an error class list to the user-facing documentation.






[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42800:


Assignee: Apache Spark

> Implement ml function {array_to_vector, vector_to_array}
> 
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42800:


Assignee: (was: Apache Spark)

> Implement ml function {array_to_vector, vector_to_array}
> 
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700496#comment-17700496
 ] 

Apache Spark commented on SPARK-42800:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40432

> Implement ml function {array_to_vector, vector_to_array}
> 
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42800:
-

 Summary: Implement ml function {array_to_vector, vector_to_array}
 Key: SPARK-42800
 URL: https://issues.apache.org/jira/browse/SPARK-42800
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, ML, PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42799:
--
Affects Version/s: 3.3.1
   3.3.0
   3.4.0
   3.3.2

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0, 3.2.2, 3.3.1, 3.3.2, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Updated] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42797:
-
Fix Version/s: 3.4.1
   (was: 3.4.0)

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Allan Folting
>Priority: Major
> Fix For: 3.4.1
>
>
> Grammatical improvements; this is a follow-up to SPARK-42496 (Introducing 
> Spark Connect on the main page and adding a Spark Connect Overview page):
> https://issues.apache.org/jira/browse/SPARK-42496






[jira] [Assigned] (SPARK-42765) Enable importing `pandas_udf` from `pyspark.sql.connect.functions`

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42765:


Assignee: Xinrong Meng

> Enable importing `pandas_udf` from `pyspark.sql.connect.functions`
> --
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Resolved] (SPARK-42765) Enable importing `pandas_udf` from `pyspark.sql.connect.functions`

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42765.
--
Fix Version/s: 3.4.1
   Resolution: Fixed

Issue resolved by pull request 40388
[https://github.com/apache/spark/pull/40388]

> Enable importing `pandas_udf` from `pyspark.sql.connect.functions`
> --
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.1
>
>
> Remove the outdated import path of `pandas_udf`






[jira] [Resolved] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42797.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40428
[https://github.com/apache/spark/pull/40428]

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Allan Folting
>Priority: Major
> Fix For: 3.4.0
>
>
> Grammatical improvements; this is a follow-up to SPARK-42496 (Introducing 
> Spark Connect on the main page and adding a Spark Connect Overview page):
> https://issues.apache.org/jira/browse/SPARK-42496






[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42799:


Assignee: Apache Spark

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.2
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42797:


Assignee: Allan Folting

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Allan Folting
>Priority: Major
>
> Grammatical improvements; this is a follow-up to SPARK-42496 (Introducing 
> Spark Connect on the main page and adding a Spark Connect Overview page):
> https://issues.apache.org/jira/browse/SPARK-42496






[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42799:


Assignee: (was: Apache Spark)

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.2
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700492#comment-17700492
 ] 

Apache Spark commented on SPARK-42799:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40431

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.2
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Created] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42799:
-

 Summary: Update SBT build `xercesImpl` version to match with 
pom.xml
 Key: SPARK-42799
 URL: https://issues.apache.org/jira/browse/SPARK-42799
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.2.2
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-42666) Fix `createDataFrame` to work properly with rows and schema

2023-03-14 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-42666.
-
Fix Version/s: 3.4.0
   Resolution: Duplicate

Resolved as a duplicate of SPARK-42679

> Fix `createDataFrame` to work properly with rows and schema
> ---
>
> Key: SPARK-42666
> URL: https://issues.apache.org/jira/browse/SPARK-42666
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> The code below is not working properly in Spark Connect:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema) 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 94, in 
> __repr__
>     return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 162, in 
> dtypes
>     return [(str(f.name), f.dataType.simpleString()) for f in 
> self.schema.fields]
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1346, in 
> schema
>     self._schema = self._session.client.schema(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 614, in schema
>     proto_schema = self._analyze(method="schema", plan=plan).schema
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 755, in 
> _analyze
>     self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 894, in 
> _handle_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works properly in regular PySpark:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema).show()
> +---+
> | id|
> +---+
> |  5|
> |  6|
> |  7|
> |  8|
> |  9|
> +---+ {code}






[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42798:


Assignee: Apache Spark

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]






[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42798:


Assignee: (was: Apache Spark)

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]






[jira] [Commented] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700484#comment-17700484
 ] 

Apache Spark commented on SPARK-42798:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40430

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]






[jira] [Updated] (SPARK-42571) Provide a mode to replace Py4J for local communication

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42571:
-
Affects Version/s: 3.5.0
   (was: 3.4.0)

> Provide a mode to replace Py4J for local communication
> --
>
> Key: SPARK-42571
> URL: https://issues.apache.org/jira/browse/SPARK-42571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We can replace Py4J even when a master is specified, e.g. 
> SparkSession.builder.master("..."), and communicate with the JVM via Spark 
> Connect instead of Py4J.






[jira] [Updated] (SPARK-42729) Update Submitting Applications page for Spark Connect

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42729:
-
Affects Version/s: 3.5.0
   (was: 3.4.0)

> Update Submitting Applications page for Spark Connect
> -
>
> Key: SPARK-42729
> URL: https://issues.apache.org/jira/browse/SPARK-42729
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/submitting-applications.html
> Should we add Spark Connect application-building content here, or create a 
> separate Spark Connect application-building page?






[jira] [Updated] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42798:
-
Description: 
* [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
 * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]






[jira] [Updated] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42798:
-
Environment: (was: * 
[https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
 * https://github.com/protocolbuffers/protobuf/releases/tag/v22.2)

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Created] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Yang Jie (Jira)
Yang Jie created SPARK-42798:


 Summary: Upgrade protobuf-java to 3.22.2
 Key: SPARK-42798
 URL: https://issues.apache.org/jira/browse/SPARK-42798
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
 Environment: * 
[https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
 * https://github.com/protocolbuffers/protobuf/releases/tag/v22.2
Reporter: Yang Jie









[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-42794:


Assignee: Huanli Wang

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structure Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Huanli Wang
>Priority: Minor
>
> We are seeing query failures caused by RocksDB lock acquisition failures in 
> the retry tasks.
>  *  at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 is able to pass [*{{dataRDD.iterator(partition, 
> ctxt)}}*|https://github.com/apache/spark/blob/4db8e7b7944302a3929dd6a1197ea1385eecc46a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala#L123]
>  and acquire the RocksDB lock, as we are seeing:
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> Increase the *lockAcquireTimeoutMs* to 2 minutes, so that 4 task retries 
> give us 8 minutes to acquire the lock, which is larger than the 
> connectionTimeout with retries (3 * 120s).
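As a sanity check on the timeout arithmetic above, a small sketch (the variable names are illustrative, not actual Spark configuration keys):

```python
# With lockAcquireTimeoutMs = 2 min, the 4 task attempts wait a combined
# 8 minutes for the lock -- longer than the 3 * 120s connection timeout
# with retries, so the stale lock holder can time out before the retries
# are exhausted.
lock_acquire_timeout_s = 120            # proposed 2-minute lock timeout
task_attempts = 4                       # default number of task attempts
total_wait_s = task_attempts * lock_acquire_timeout_s
connection_timeout_with_retries_s = 3 * 120
print(total_wait_s, total_wait_s > connection_timeout_with_retries_s)  # 480 True
```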






[jira] [Resolved] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-42794.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40425
[https://github.com/apache/spark/pull/40425]

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structure Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Huanli Wang
>Priority: Minor
> Fix For: 3.5.0
>
>
> We are seeing query failures caused by RocksDB lock acquisition failures in 
> the retry tasks.
>  *  at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 is able to pass [*{{dataRDD.iterator(partition, 
> ctxt)}}*|https://github.com/apache/spark/blob/4db8e7b7944302a3929dd6a1197ea1385eecc46a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala#L123]
>  and acquire the RocksDB lock, as we are seeing:
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> Increase *lockAcquireTimeoutMs* to 2 minutes so that 4 task retries give us 
> 8 minutes to acquire the lock, which is larger than the connectionTimeout 
> with retries (3 * 120s).
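A configuration sketch of the proposed change (the RocksDB state store provider reads provider-scoped settings from the `spark.sql.streaming.stateStore.rocksdb.*` prefix; the exact key name and the 60s default should be verified against the Spark version in use, and `streaming_app.py` is a placeholder application):

```shell
spark-submit \
  --conf spark.sql.streaming.stateStore.providerClass=org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider \
  --conf spark.sql.streaming.stateStore.rocksdb.lockAcquireTimeoutMs=120000 \
  streaming_app.py
```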



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42374) User-facing documentation

2023-03-14 Thread Allan Folting (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Folting updated SPARK-42374:
--
Summary: User-facing documentation  (was: User-facing documentaiton)

> User-facing documentation
> -
>
> Key: SPARK-42374
> URL: https://issues.apache.org/jira/browse/SPARK-42374
> Project: Spark
>  Issue Type: Documentation
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Haejoon Lee
>Priority: Major
>
> Should provide user-facing documentation so end users know how to use Spark 
> Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42796:
-
Fix Version/s: 3.4.1

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42422) Upgrade `maven-shade-plugin` to 3.4.1

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42422:
--
Affects Version/s: 3.4.0
   (was: 3.5.0)

> Upgrade `maven-shade-plugin` to 3.4.1
> -
>
> Key: SPARK-42422
> URL: https://issues.apache.org/jira/browse/SPARK-42422
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> * 
> [https://github.com/apache/maven-shade-plugin/releases/tag/maven-shade-plugin-3.3.0]
>  * 
> https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.3.0...maven-shade-plugin-3.4.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700448#comment-17700448
 ] 

Apache Spark commented on SPARK-42775:
--

User 'chenhao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40429

> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in the casts can make the result decimal out of its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 1000
> spark-sql> desc select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> approx_percentile(col, 0.5, 1)decimal(19,0) 
> {code}
> The underlying result is not actually null, so the second query returns 
> false; the first query displays NULL because the value cannot fit into 
> {{decimal(19, 0)}}.
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the 
> result fits, and to actually return null or throw an exception when it does 
> not.
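The double round-trip described above can be reproduced outside Spark with Python's `decimal` module; this is a minimal sketch of the precision loss, not Spark's actual code path:

```python
from decimal import Decimal

# A 19-digit value that fits decimal(19,0).
original = Decimal("9999999999999999999")

# approx_percentile casts decimal -> double -> decimal; doubles carry only
# ~15-16 significant digits, so the round trip rounds the value up.
as_double = float(original)
back = Decimal(as_double)

assert as_double == 1e19          # rounded to the nearest representable double
assert back == Decimal(10) ** 19  # 20 digits: out of decimal(19,0) range
```

Spark hits the same rounding when it keeps the aggregation state as a double, which is why the result overflows the declared `decimal(19,0)` type.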



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42775:


Assignee: (was: Apache Spark)

> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in the casts can make the result decimal out of its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 1000
> spark-sql> desc select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> approx_percentile(col, 0.5, 1)decimal(19,0) 
> {code}
> The underlying result is not actually null, so the second query returns 
> false; the first query displays NULL because the value cannot fit into 
> {{decimal(19, 0)}}.
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the 
> result fits, and to actually return null or throw an exception when it does 
> not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42775:


Assignee: Apache Spark

> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Assignee: Apache Spark
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in the casts can make the result decimal out of its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 1000
> spark-sql> desc select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> approx_percentile(col, 0.5, 1)decimal(19,0) 
> {code}
> The underlying result is not actually null, so the second query returns 
> false; the first query displays NULL because the value cannot fit into 
> {{decimal(19, 0)}}.
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the 
> result fits, and to actually return null or throw an exception when it does 
> not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42797:


Assignee: Apache Spark

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Apache Spark
>Priority: Major
>
> Grammatical improvements, this is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> https://issues.apache.org/jira/browse/SPARK-42496



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700444#comment-17700444
 ] 

Apache Spark commented on SPARK-42797:
--

User 'allanf-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40428

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Priority: Major
>
> Grammatical improvements, this is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> https://issues.apache.org/jira/browse/SPARK-42496



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42797:


Assignee: (was: Apache Spark)

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Priority: Major
>
> Grammatical improvements, this is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> https://issues.apache.org/jira/browse/SPARK-42496



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Allan Folting (Jira)
Allan Folting created SPARK-42797:
-

 Summary: Spark Connect - Grammatical improvements to Spark 
Overview and Spark Connect Overview doc pages
 Key: SPARK-42797
 URL: https://issues.apache.org/jira/browse/SPARK-42797
 Project: Spark
  Issue Type: Documentation
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Allan Folting


Grammatical improvements, this is a follow-up to this ticket:

Introducing Spark Connect on the main page and adding Spark Connect Overview 
page
https://issues.apache.org/jira/browse/SPARK-42496



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42496) Introducing Spark Connect on the main page and adding Spark Connect Overview page

2023-03-14 Thread Allan Folting (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Folting updated SPARK-42496:
--
Summary: Introducing Spark Connect on the main page and adding Spark 
Connect Overview page  (was: Introducting Spark Connect at main page)

> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> -
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.1
>
>
> We should document the introduction of Spark Connect at PySpark main 
> documentation page to give a summary to users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42789) Rewrite multiple GetJsonObjects to a JsonTuple if their json expression is the same

2023-03-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-42789:

Description: 
Benchmark result:
{noformat}
Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 2
  Stopped after 2 iterations, 77193 ms
  Running case: Rewrite: 2
  Stopped after 2 iterations, 51699 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-----------------------------------------------------------------------------------------------------------
Default: 2                                 37914         38597        966        0.2       5244.0       1.0X
Rewrite: 2                                 24887         25850       1361        0.3       3442.2       1.5X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 3
  Stopped after 2 iterations, 110890 ms
  Running case: Rewrite: 3
  Stopped after 2 iterations, 56102 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-----------------------------------------------------------------------------------------------------------
Default: 3                                 52862         55445        NaN        0.1       7311.6       1.0X
Rewrite: 3                                 26752         28051       1837        0.3       3700.2       2.0X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 4
  Stopped after 2 iterations, 150828 ms
  Running case: Rewrite: 4
  Stopped after 2 iterations, 57110 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-----------------------------------------------------------------------------------------------------------
Default: 4                                 71680         75414        NaN        0.1       9914.4       1.0X
Rewrite: 4                                 28452         28555        145        0.3       3935.4       2.5X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 5
  Stopped after 2 iterations, 223367 ms
  Running case: Rewrite: 5
  Stopped after 2 iterations, 78193 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-----------------------------------------------------------------------------------------------------------
Default: 5                                108479        111684       1447        0.1      15004.2       1.0X
Rewrite: 5                                 36830         39097        NaN        0.2       5094.0       2.9X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 10
  Stopped after 2 iterations, 311453 ms
  Running case: Rewrite: 10
  Stopped after 2 iterations, 65873 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-----------------------------------------------------------------------------------------------------------
Default: 10                               153952        155727       2510        0.0      21293.7       1.0X
Rewrite: 10                                32436         32937        708        0.2       4486.3       4.7X

Running benchmark: Benchmark rewrite GetJsonObjects
  Running case: Default: 15
  Stopped after 2 iterations, 451911 ms
  Running case: Rewrite: 15
  Stopped after 2 iterations, 69790 ms

Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-----------------------------------------------------------------------------------------------------------
Default: 15                               224950        225956       1423        0.0      31113.6       1.0X
Rewrite: 15                                34806         34895        126        0.2       4814.2       6.5X

Running benchmark: Benchmark 
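The speedup comes from parsing each JSON document once per row instead of once per extracted path. A pure-Python sketch of that difference (the helper functions are simplified stand-ins for Spark's `get_json_object` and `json_tuple`, not their real implementations):

```python
import json

parse_count = 0

def parse(s):
    # Count how many times the JSON document is actually parsed.
    global parse_count
    parse_count += 1
    return json.loads(s)

def get_json_object(s, path):
    # get_json_object-style: every call re-parses the document.
    return parse(s)[path.removeprefix("$.")]

def json_tuple(s, *fields):
    # json_tuple-style: a single parse serves every requested field.
    obj = parse(s)
    return tuple(obj.get(f) for f in fields)

row = '{"a": 1, "b": 2, "c": 3}'

a, b = get_json_object(row, "$.a"), get_json_object(row, "$.b")
assert (a, b) == (1, 2)
assert parse_count == 2   # two paths, two parses

parse_count = 0
assert json_tuple(row, "a", "b") == (1, 2)
assert parse_count == 1   # two fields, one parse
```

The rewrite rule in this issue applies the same idea at the plan level: multiple `get_json_object` calls over the same JSON expression collapse into one `json_tuple`.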

[jira] [Resolved] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42793.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40424
[https://github.com/apache/spark/pull/40424]

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42793:


Assignee: Dongjoon Hyun

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42757) Implement textFile for DataFrameReader

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42757.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40377
[https://github.com/apache/spark/pull/40377]

> Implement textFile for DataFrameReader
> --
>
> Key: SPARK-42757
> URL: https://issues.apache.org/jira/browse/SPARK-42757
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42757) Implement textFile for DataFrameReader

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42757:


Assignee: BingKun Pan

> Implement textFile for DataFrameReader
> --
>
> Key: SPARK-42757
> URL: https://issues.apache.org/jira/browse/SPARK-42757
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42731) Update Spark Configuration

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42731.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40416
[https://github.com/apache/spark/pull/40416]

> Update Spark Configuration
> --
>
> Key: SPARK-42731
> URL: https://issues.apache.org/jira/browse/SPARK-42731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42731) Update Spark Configuration

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42731:


Assignee: Hyukjin Kwon

> Update Spark Configuration
> --
>
> Key: SPARK-42731
> URL: https://issues.apache.org/jira/browse/SPARK-42731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42508.
--
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/40097

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-03-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42508:
-
Fix Version/s: 3.5.0

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700413#comment-17700413
 ] 

Apache Spark commented on SPARK-42792:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40427

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as 
> part of the DB custom metrics. We propose to add it to the existing metrics 
> that are collected. There is no additional overhead since we are just 
> querying the internal ticker gauge, similar to other metrics.
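Once exposed, the metric would surface alongside the other RocksDB custom metrics in a streaming query's progress payload. A sketch of reading it from a progress dictionary (the payload below is hand-written and the key `rocksdbFlushWriteBytes` is a hypothetical name, not the one chosen by the implementation):

```python
# Hand-written payload mimicking StreamingQuery.lastProgress; the metric
# key "rocksdbFlushWriteBytes" is an assumption for illustration only.
progress = {
    "stateOperators": [
        {"customMetrics": {"rocksdbFlushWriteBytes": 1_048_576}},
        {"customMetrics": {}},  # an operator without the metric
    ]
}

# Sum flush-write bytes across all stateful operators, defaulting to 0.
flush_bytes = sum(
    op["customMetrics"].get("rocksdbFlushWriteBytes", 0)
    for op in progress["stateOperators"]
)
assert flush_bytes == 1_048_576
```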



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42792:


Assignee: (was: Apache Spark)

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as 
> part of the DB custom metrics. We propose to add it to the existing metrics 
> that are collected. There is no additional overhead since we are just 
> querying the internal ticker gauge, similar to other metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42792:


Assignee: Apache Spark

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Assignee: Apache Spark
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as 
> part of the DB custom metrics. We propose to add it to the existing metrics 
> that are collected. There is no additional overhead since we are just 
> querying the internal ticker gauge, similar to other metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700414#comment-17700414
 ] 

Apache Spark commented on SPARK-42796:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40426

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42796:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700412#comment-17700412
 ] 

Apache Spark commented on SPARK-42796:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40426

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42796:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Created] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42796:
--

 Summary: Support TimestampNTZ in Cached Batch
 Key: SPARK-42796
 URL: https://issues.apache.org/jira/browse/SPARK-42796
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.1
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Huanli Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huanli Wang updated SPARK-42794:

Description: 
We are seeing query failures caused by RocksDB lock acquisition failures in 
the retried tasks.
 * at t1, we shrink the cluster down to a single executor

{code:java}
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
{code}
 
 * at t1+2min, the first attempt of task 7 (i.e. task 7.0) is scheduled on the 
remaining live executor

{code:java}
23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
685) (10.166.225.249, executor 0, partition 7, ANY, {code}
 

It seems that task 7.0 passes [*{{dataRDD.iterator(partition, 
ctxt)}}*|https://github.com/apache/spark/blob/4db8e7b7944302a3929dd6a1197ea1385eecc46a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala#L123]
 and acquires the RocksDB lock, as we see:
{code:java}
23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60006 ms.
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
{code}
 
Increase *lockAcquireTimeoutMs* to 2 minutes so that 4 task retries give us 
8 minutes to acquire the lock, which is larger than the connectionTimeout with 
retries (3 * 120s).
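
A minimal configuration sketch of the timeout math above, assuming the RocksDB 
state store provider reads a `lockAcquireTimeoutMs` conf under the 
`spark.sql.streaming.stateStore.rocksdb.` prefix (conf key and provider class 
name are assumptions for illustration; verify them against `RocksDBConf` in the 
target Spark version):

```scala
// Hypothetical sketch, not the authoritative fix: raise the RocksDB instance
// lock timeout from the default 60 s to 2 min, so that four task retries
// cover up to 8 minutes of waiting -- larger than the
// connectionTimeout-with-retries budget of 3 * 120 s.
spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
spark.conf.set(
  "spark.sql.streaming.stateStore.rocksdb.lockAcquireTimeoutMs", "120000")
```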

  was:
We are seeing query failure which is caused by RocksDB acquisition failure for 
the retry tasks.
 *  at t1, we shrink the cluster to only have one executor

{code:java}
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
{code}
 
 * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the 
alive executor

{code:java}
23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
685) (10.166.225.249, executor 0, partition 7, ANY, {code}
 

It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* 
and acquires the rocksdb lock as we are seeing
{code:java}
23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60006 ms.
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
as it was not released by [ThreadId: Some(449), task: partition 

[jira] [Commented] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700402#comment-17700402
 ] 

Apache Spark commented on SPARK-42794:
--

User 'huanliwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40425

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structured Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> We are seeing query failure which is caused by RocksDB acquisition failure 
> for the retry tasks.
>  *  at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the rocksdb lock as we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries 
> will give us 8 minutes to acquire the lock and it is larger than 
> connectionTimeout with retries (3 * 120s).






[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42794:


Assignee: (was: Apache Spark)

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structured Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> We are seeing query failure which is caused by RocksDB acquisition failure 
> for the retry tasks.
>  *  at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the rocksdb lock as we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries 
> will give us 8 minutes to acquire the lock and it is larger than 
> connectionTimeout with retries (3 * 120s).






[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42794:


Assignee: Apache Spark

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structured Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Apache Spark
>Priority: Minor
>
> We are seeing query failure which is caused by RocksDB acquisition failure 
> for the retry tasks.
>  *  at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the rocksdb lock as we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries 
> will give us 8 minutes to acquire the lock and it is larger than 
> connectionTimeout with retries (3 * 120s).






[jira] [Created] (SPARK-42795) Create analyzer golden file based test suite

2023-03-14 Thread Daniel (Jira)
Daniel created SPARK-42795:
--

 Summary: Create analyzer golden file based test suite
 Key: SPARK-42795
 URL: https://issues.apache.org/jira/browse/SPARK-42795
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Daniel









[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Huanli Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huanli Wang updated SPARK-42794:

Priority: Minor  (was: Major)

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structured Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> We are seeing query failure which is caused by RocksDB acquisition failure 
> for the retry tasks.
>  *  at t1, we shrink the cluster to only have one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the rocksdb lock as we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries 
> will give us 8 minutes to acquire the lock and it is larger than 
> connectionTimeout with retries (3 * 120s).






[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700398#comment-17700398
 ] 

Apache Spark commented on SPARK-42793:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40424

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Huanli Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huanli Wang updated SPARK-42794:

Description: 
We are seeing query failures caused by RocksDB lock acquisition failures in 
the retried tasks.
 * at t1, we shrink the cluster down to a single executor

{code:java}
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
{code}
 
 * at t1+2min, the first attempt of task 7 (i.e. task 7.0) is scheduled on the 
remaining live executor

{code:java}
23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
685) (10.166.225.249, executor 0, partition 7, ANY, {code}
 

It seems that task 7.0 passes *{{dataRDD.iterator(partition, ctxt)}}* 
and acquires the RocksDB lock, as we see:
{code:java}
23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60006 ms.
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
{code}
 
Increase *lockAcquireTimeoutMs* to 2 minutes so that 4 task retries give us 
8 minutes to acquire the lock, which is larger than the connectionTimeout with 
retries (3 * 120s).

  was:
We are seeing query failure which is caused by RocksDB acquisition failure for 
the retry tasks.
 *  at t1, we shrink the cluster to only have one executor

{code:java}
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
{code}
 
 * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to the 
alive executor

{code:java}
23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
685) (10.166.225.249, executor 0, partition 7, ANY, {code}
 

It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, ctxt)}}* 
and acquires the rocksdb lock as we are seeing
{code:java}
23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60006 ms.
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
{code}
 
Increasing the 

[jira] [Updated] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread Huanli Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huanli Wang updated SPARK-42794:

Description: 
We are seeing query failures caused by RocksDB lock acquisition failures in 
the retried tasks.
 * at t1, we shrink the cluster down to a single executor

{code:java}
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
{code}
 
 * at t1+2min, the first attempt of task 7 (i.e. task 7.0) is scheduled on the 
remaining live executor

{code:java}
23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
685) (10.166.225.249, executor 0, partition 7, ANY, {code}
 

It seems that task 7.0 passes *{{dataRDD.iterator(partition, ctxt)}}* 
and acquires the RocksDB lock, as we see:
{code:java}
23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60006 ms.
23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
{code}
 
Increase the 
[lockAcquireTimeoutMs|https://src.dev.databricks.com/databricks/runtime/-/blob/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala?L927:3]
 to 2 minutes so that 4 task retries give us 8 minutes to acquire the 
lock, which is larger than the connectionTimeout with retries (3 * 120s).

  was:
We are seeing query failure which is caused by RocksDB acquisition failure for 
the retry tasks. * at t1, we shrink the cluster to only have one executor

[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42793:


Assignee: (was: Apache Spark)

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42793:


Assignee: Apache Spark

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700397#comment-17700397
 ] 

Apache Spark commented on SPARK-42793:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40424

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structured Streaming

2023-03-14 Thread Huanli Wang (Jira)
Huanli Wang created SPARK-42794:
---

 Summary: Increase the lockAcquireTimeoutMs for acquiring the 
RocksDB state store in Structured Streaming
 Key: SPARK-42794
 URL: https://issues.apache.org/jira/browse/SPARK-42794
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Huanli Wang


We are seeing query failures caused by RocksDB lock-acquisition failures for
the retry tasks.
 * at t1, we shrink the cluster to only have one executor

23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned because 
of kill request from HTTP endpoint (data migration disabled))
 * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled on the
alive executor

23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
685) (10.166.225.249, executor 0, partition 7, ANY, 
It seems that task 7.0 is able to get past *{{dataRDD.iterator(partition, ctxt)}}*
and acquire the RocksDB lock, as we are seeing:
23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.

23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 702] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60006 ms.

23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
(10.166.225.249 executor 0): java.lang.IllegalStateException: 
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
133.0, TID 685] after 60003 ms.
 
Increasing the
[lockAcquireTimeoutMs|https://src.dev.databricks.com/databricks/runtime/-/blob/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala?L927:3]
to 2 minutes gives the 4 task retries a total of 8 minutes to acquire the
lock, which is larger than the connectionTimeout with retries (3 * 120s).
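The retry arithmetic above can be sketched with a toy lock (illustrative Python, not Spark's RocksDB state store code): each retry attempt waits up to a per-attempt timeout, so the total budget is roughly timeout × max attempts, and raising the timeout only helps if that budget outlasts the stale holder.

```python
import threading
import time

# Illustrative sketch: a stale task still holds a per-partition lock while
# each retry attempt waits up to `timeout_s`, analogous to a task retry
# waiting lockAcquireTimeoutMs for the RocksDB instance lock.
lock = threading.Lock()

def stale_holder(hold_s):
    # simulates task 7.0 holding the lock after its executor was lost
    with lock:
        time.sleep(hold_s)

def run_with_retries(timeout_s, max_attempts):
    for attempt in range(max_attempts):
        if lock.acquire(timeout=timeout_s):
            lock.release()
            return attempt  # index of the attempt that finally succeeded
    raise RuntimeError("lock not acquired")  # analogous to the IllegalStateException

holder = threading.Thread(target=stale_holder, args=(0.5,))
holder.start()
time.sleep(0.05)  # let the stale holder grab the lock first

# A per-attempt timeout of 0.2s over 4 attempts gives a 0.8s total budget,
# enough to outlast the 0.5s holder; 4 x 0.1s (a 0.4s budget) would not be.
winning_attempt = run_with_retries(timeout_s=0.2, max_attempts=4)
holder.join()
```

The same proportionality drives the proposal: 4 retries × 120s = 8 minutes of total wait, versus 4 minutes with the 60s default.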






[jira] [Created] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42793:
-

 Summary: `connect` module requires `build_profile_flags`
 Key: SPARK-42793
 URL: https://issues.apache.org/jira/browse/SPARK-42793
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-42792:
--

 Summary: Add support to track FLUSH_WRITE_BYTES for RocksDB state 
store for streaming
 Key: SPARK-42792
 URL: https://issues.apache.org/jira/browse/SPARK-42792
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Anish Shrigondekar


Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

 

It's useful to get this metric for bytes written during flush from RocksDB as
part of the DB custom metrics. We propose to add this to the existing metrics
that are collected. There is no additional overhead since we are just querying
the internal ticker gauge, similar to other metrics.






[jira] [Commented] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Anish Shrigondekar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700387#comment-17700387
 ] 

Anish Shrigondekar commented on SPARK-42792:


Will send the PR soon - cc - [~kabhwan] 

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as
> part of the DB custom metrics. We propose to add this to the existing metrics
> that are collected. There is no additional overhead since we are just
> querying the internal ticker gauge, similar to other metrics.






[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700332#comment-17700332
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40423

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add in additional functionality to take in functions as well. This will 
> require us to go through the following process on each task in the executor 
> nodes:
> 1. take the input function and args and pickle them
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created by the process with partitionId
> == 0; if it has, then deserialize it and return that output through `.collect()`






[jira] [Created] (SPARK-42791) Create golden file test framework for analysis

2023-03-14 Thread Daniel (Jira)
Daniel created SPARK-42791:
--

 Summary: Create golden file test framework for analysis
 Key: SPARK-42791
 URL: https://issues.apache.org/jira/browse/SPARK-42791
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Daniel


Here we track the work to add new golden file test support for the Spark 
analyzer. Each golden file can contain a list of SQL queries followed by the 
string representations of their analyzed logical plans.
 
This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping 
after analysis and listing analyzed plans as the results instead of fully 
executing queries end-to-end. As another example, ZetaSQL has analyzer-based 
golden file testing like this as well [2].
 
This way, any changes to analysis will show up as test diffs, which are easy to 
spot in review and also easy to update automatically. This could help the 
community collectively maintain the quality of Apache Spark's query analysis.
 
[1] 
[https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala]
 
[2] 
[https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test].
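A minimal sketch of the golden-file workflow described above (illustrative Python; `check_golden` and the `REGENERATE_GOLDEN` flag are assumptions, not the proposed Spark API): each test renders a result string, diffs it against a checked-in file, and a regeneration flag rewrites the files instead of failing.

```python
import os
import pathlib
import tempfile

def check_golden(name, actual, golden_dir):
    # Compare `actual` against the stored golden file; record it when the
    # file is missing or regeneration is requested via an env flag.
    path = pathlib.Path(golden_dir) / f"{name}.golden"
    if os.environ.get("REGENERATE_GOLDEN") == "1" or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual)          # record the new expected output
        return True
    return path.read_text() == actual    # any analysis change shows as a diff

os.environ.pop("REGENERATE_GOLDEN", None)  # keep the demo deterministic
with tempfile.TemporaryDirectory() as d:
    plan = "Project [id]\n+- Range (0, 10)"        # stand-in for an analyzed plan
    first_run = check_golden("limit", plan, d)        # records the golden file
    second_run = check_golden("limit", plan, d)       # unchanged plan matches
    third_run = check_golden("limit", plan + "!", d)  # changed plan does not
```

Any intentional analyzer change then shows up as a reviewable file diff rather than a hand-edited expectation.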
 






[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700333#comment-17700333
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40423

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add in additional functionality to take in functions as well. This will 
> require us to go through the following process on each task in the executor 
> nodes:
> 1. take the input function and args and pickle them
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created by the process with partitionId
> == 0; if it has, then deserialize it and return that output through `.collect()`






[jira] [Updated] (SPARK-42774) Expose VectorTypes API for DataSourceV2 Batch Scans

2023-03-14 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated SPARK-42774:

Priority: Minor  (was: Major)

> Expose VectorTypes API for DataSourceV2 Batch Scans
> ---
>
> Key: SPARK-42774
> URL: https://issues.apache.org/jira/browse/SPARK-42774
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Micah Kornfield
>Priority: Minor
>
> SparkPlan's vectorType's attribute can be used to [specialize 
> codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151]
>  however 
> [BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala]
>  does not override it, so DSv2 sources do not get any benefit from
> concrete-class dispatch.
> This proposes adding an override to BatchScanExecBase which delegates to a 
> new default method on 
> [PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java]
>  to expose vectorTypes:
> {code:java}
> default Optional<List<String>> getVectorTypes() { return Optional.empty(); }
> {code}
>  
>  
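A Python analogue of the proposed default method (the actual proposal is a Java default method on `PartitionReaderFactory`; the class and method names below are illustrative):

```python
from typing import Optional, Sequence

class PartitionReaderFactory:
    # Sketch of the proposed default: factories that don't specialize
    # advertise nothing, and the scan exec falls back to generic dispatch.
    def get_vector_types(self) -> Optional[Sequence[str]]:
        return None

class ArrowReaderFactory(PartitionReaderFactory):
    # A source that knows its concrete column-vector classes overrides the
    # default, letting codegen dispatch on the concrete class.
    def get_vector_types(self) -> Optional[Sequence[str]]:
        return ["org.apache.spark.sql.vectorized.ArrowColumnVector"]

default_types = PartitionReaderFactory().get_vector_types()
arrow_types = ArrowReaderFactory().get_vector_types()
```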






[jira] [Commented] (SPARK-42787) in spark-py docker images, arrow keys do not work in (scala) spark-shell

2023-03-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-42787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700322#comment-17700322
 ] 

Bjørn Jørgensen commented on SPARK-42787:
-

Have a look at https://github.com/apache/spark-docker 

> in spark-py docker images, arrow keys do not work in (scala) spark-shell
> ---
>
> Key: SPARK-42787
> URL: https://issues.apache.org/jira/browse/SPARK-42787
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 3.1.3, 3.3.1
> Environment: [https://hub.docker.com/r/apache/spark-py] 3.1.3 and 
> 3.3.1  in Docker on M1 MacBook pro OSX ventura
>Reporter: Max Rieger
>Priority: Minor
>
> I tested this for 3.1.3 and 3.3.1 from
> [https://hub.docker.com/r/apache/spark-py/tags]
> While it works for pyspark, it does not for the Scala spark-shell.
> It seems this is due to the Scala REPL using {{jline}} for input management.
>  * Creating a \{{.inputrc}} file with mappings for the arrow keys didn't work.
>  * Finally, building and running from
> {{dev/create-release/spark-rm/Dockerfile}} with jline installed as of the
> Dockerfile, things worked.
> This is likely not limited to the {{spark-py}} images.
> I'd do a PR, but am unsure if this is even the right Dockerfile to contribute
> to in order to fix the Docker Hub images...
> {code:sh}
> diff --git a/dev/create-release/spark-rm/Dockerfile 
> b/dev/create-release/spark-rm/Dockerfile
> --- dev/create-release/spark-rm/Dockerfile
> +++ dev/create-release/spark-rm/Dockerfile
> @@ -71,9 +71,9 @@
>$APT_INSTALL nodejs && \
># Install needed python packages. Use pip for installing packages (for 
> consistency).
>$APT_INSTALL python-is-python3 python3-pip python3-setuptools && \
># qpdf is required for CRAN checks to pass.
> -  $APT_INSTALL qpdf jq && \
> +  $APT_INSTALL qpdf jq libjline-java && \
>pip3 install $PIP_PKGS && \
># Install R packages and dependencies used when building.
># R depends on pandoc*, libssl (which are installed above).
># Note that PySpark doc generation also needs pandoc due to nbsphinx
> {code}






[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42779:


Assignee: Apache Spark

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Apache Spark
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.
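The effect of the advisory size can be sketched with back-of-envelope arithmetic (illustrative Python, not Spark's AQE code): AQE coalesces shuffle partitions toward the advisory size, so a source that knows its columnar compression ratio can request a larger advisory size to avoid small output files.

```python
def coalesced_partitions(shuffle_bytes, advisory_partition_bytes):
    # Ceiling division: target roughly advisory_partition_bytes of shuffle
    # data per output partition, as AQE coalescing does.
    return max(1, -(-shuffle_bytes // advisory_partition_bytes))

GiB, MiB = 1024**3, 1024**2

# 10 GiB of shuffle data with the 64 MiB session default -> 160 partitions.
# If columnar compression shrinks each 64 MiB partition to ~8 MiB on disk,
# a source-requested 512 MiB advisory size yields 20 partitions instead,
# producing ~64 MiB files rather than ~8 MiB ones.
default_parts = coalesced_partitions(10 * GiB, 64 * MiB)
requested_parts = coalesced_partitions(10 * GiB, 512 * MiB)
```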






[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42779:


Assignee: (was: Apache Spark)

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.






[jira] [Commented] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700316#comment-17700316
 ] 

Apache Spark commented on SPARK-42779:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40421

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.






[jira] [Updated] (SPARK-41171) Push down filter through window when partitionSpec is empty

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-41171:
--
Affects Version/s: 3.5.0
   (was: 3.4.0)

> Push down filter through window when partitionSpec is empty
> ---
>
> Key: SPARK-41171
> URL: https://issues.apache.org/jira/browse/SPARK-41171
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> Sometimes, a filter compares a rank-like window function with a number.
> {code:java}
> SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 WHERE rn <= 5
> {code}
> We can create a Limit(5) and push it down as the child of Window.
> {code:java}
> SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER 
> BY a LIMIT 5) t
> {code}






[jira] [Comment Edited] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules

2023-03-14 Thread Timothy Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700286#comment-17700286
 ] 

Timothy Miller edited comment on SPARK-42776 at 3/14/23 4:34 PM:
-

A little more detail about the sequence of events that causes this bug:
 * org.apache.spark.sql.execution.RemoveRedundantProjects is applied
 * that causes BroadcastHashJoinExec to get created
 * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
 * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the 
hashmap object that gets broadcast
 * a few more rules are applied, followed by 
org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions
 * Only after that can I replace BroadcastHashJoinExec with a columnar 
alternative, but by then it's too late.

I can't find a way to inject extra rules into or between 
RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a 
workaround either.


was (Author: JIRAUSER287471):
A little more detail about the sequence events that cause this bug:
 * org.apache.spark.sql.execution.RemoveRedundantProjects is applied
 * that causes BroadcastHashJoinExec to get created
 * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
 * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the 
hashmap object that gets broadcast
 * a few more rules are applied, followed by 
org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions

I can't find a way to inject extra rules into or between 
RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a 
workaround either.

> BroadcastHashJoinExec.requiredChildDistribution called before columnar 
> replacement rules
> 
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.1
> Environment: I'm prototyping on a Mac, but that's not really relevant.
>Reporter: Timothy Miller
>Priority: Major
>
> I am trying to replace BroadcastHashJoinExec with a columnar equivalent. 
> However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets 
> called BEFORE the columnar replacement rules. As a result, the object that 
> gets broadcast is the plain old hashmap created from row data. By the time 
> the columnar replacement rules are applied, it's too late to get Spark to 
> broadcast any other kind of object.
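The ordering problem generalizes to any fixed rule pipeline; a toy sketch (illustrative Python, not Spark's planner internals):

```python
# An early rule eagerly materializes a value from the current node, so a
# later rule that swaps the node's implementation arrives too late.
def ensure_requirements(plan):
    # analogous to EnsureRequirements triggering requiredChildDistribution:
    # the broadcast payload is built from the row-based node here
    plan["broadcast_payload"] = plan["build_broadcast"]()
    return plan

def columnar_replacement(plan):
    # runs after ensure_requirements: swapping the builder no longer
    # affects the already-materialized payload
    plan["build_broadcast"] = lambda: "columnar-structure"
    return plan

plan = {"build_broadcast": lambda: "row-hashmap"}
for rule in [ensure_requirements, columnar_replacement]:  # fixed rule order
    plan = rule(plan)
```

With this ordering, `plan["broadcast_payload"]` remains the row-based structure, mirroring why the columnar replacement cannot change what gets broadcast.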






[jira] [Updated] (SPARK-42693) API Auditing

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42693:
--
Target Version/s: 3.4.0

> API Auditing
> 
>
> Key: SPARK-42693
> URL: https://issues.apache.org/jira/browse/SPARK-42693
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark, Spark Core, SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Blocker
>
> Audit user-facing API of Spark 3.4.






[jira] [Commented] (SPARK-42693) API Auditing

2023-03-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700300#comment-17700300
 ] 

Dongjoon Hyun commented on SPARK-42693:
---

Hi, [~XinrongM]. This JIRA is open as a `Blocker` issue, but there is no 
activity.
Could you share the progress please, [~XinrongM]?

> API Auditing
> 
>
> Key: SPARK-42693
> URL: https://issues.apache.org/jira/browse/SPARK-42693
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark, Spark Core, SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Blocker
>
> Audit user-facing API of Spark 3.4.






[jira] [Updated] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42754:
--
Target Version/s: 3.4.0

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Assignee: Linhong Liu
>Priority: Blocker
> Fix For: 3.4.1
>
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf 
> spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}}
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event it uses the "ignore missing 
> properties" Jackson deserialization option, causing the 
> {{rootExecutionId}} field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.
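The ambiguity can be sketched outside of Jackson/Scala (illustrative Python, not Spark's JsonProtocol): with a non-optional field, a missing property and a real value of 0 are indistinguishable after deserialization, while an Option-like field keeps "absent" distinct.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class SQLExecutionStart:
    execution_id: int
    root_execution_id: Optional[int] = None  # analogue of the proposed Option[Long]

def parse(event_json: str) -> SQLExecutionStart:
    d = json.loads(event_json)
    return SQLExecutionStart(
        execution_id=d["executionId"],
        root_execution_id=d.get("rootExecutionId"),  # None when an old log omits it
    )

old_log_event = parse('{"executionId": 3}')                        # Spark 3.3 log
new_log_event = parse('{"executionId": 3, "rootExecutionId": 0}')  # really rooted at 0
```

Mapping the missing property to `None` rather than a default `0` is exactly the distinction the proposed `Option[Long]` field preserves.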






[jira] [Resolved] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42754.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40403
[https://github.com/apache/spark/pull/40403]

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf 
> spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}}
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event, it uses the "ignore missing 
> properties" Jackson deserialization option, causing the 
> {{rootExecutionId}} field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.
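
The ambiguity described above can be sketched outside Spark and Jackson with a plain map lookup; the class and method names below are hypothetical illustrations, not Spark's actual JsonProtocol code. With a primitive field, a missing {{rootExecutionId}} silently becomes 0, which is also a valid execution ID; a nullable return (the analogue of Scala's {{Option[Long]}}) keeps absence distinct.

```java
import java.util.HashMap;
import java.util.Map;

public class RootIdSketch {
    // Primitive-style field: a missing key falls back to 0, which is
    // indistinguishable from a real root execution ID of 0.
    static long parsePrimitive(Map<String, Long> json) {
        return json.getOrDefault("rootExecutionId", 0L);
    }

    // Option[Long]-style field: null (Scala's None) means "absent".
    static Long parseNullable(Map<String, Long> json) {
        return json.get("rootExecutionId");
    }

    public static void main(String[] args) {
        // Simulate an event written by Spark 3.3, which has no rootExecutionId.
        Map<String, Long> oldEvent = new HashMap<>();
        oldEvent.put("executionId", 1L);
        System.out.println(parsePrimitive(oldEvent)); // 0 -- looks like a real id
        System.out.println(parseNullable(oldEvent));  // null -- absence preserved
    }
}
```

This is why the replay groups everything under execution 0: every old event deserializes to a "root" of 0, and the first query has ID 0.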






[jira] [Updated] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42754:
--
Fix Version/s: 3.4.1
   (was: 3.4.0)

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Assignee: Linhong Liu
>Priority: Blocker
> Fix For: 3.4.1
>
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> In {{./bin/spark-shell --conf spark.eventLog.enabled=true --conf 
> spark.eventLog.dir=eventlogs}}, run three non-nested SQL queries:
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event, it uses the "ignore missing 
> properties" Jackson deserialization option, causing the 
> {{rootExecutionId}} field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.






[jira] [Assigned] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42754:
-

Assignee: Linhong Liu

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Assignee: Linhong Liu
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> In {{./bin/spark-shell --conf spark.eventLog.enabled=true --conf 
> spark.eventLog.dir=eventlogs}}, run three non-nested SQL queries:
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event, it uses the "ignore missing 
> properties" Jackson deserialization option, causing the 
> {{rootExecutionId}} field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.






[jira] [Resolved] (SPARK-42782) Port the tests for get_json_object from the Apache Hive project

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42782.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40409
[https://github.com/apache/spark/pull/40409]

> Port the tests for get_json_object from the Apache Hive project
> ---
>
> Key: SPARK-42782
> URL: https://issues.apache.org/jira/browse/SPARK-42782
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFJson.java






[jira] [Commented] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules

2023-03-14 Thread Timothy Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700286#comment-17700286
 ] 

Timothy Miller commented on SPARK-42776:


A little more detail about the sequence of events that causes this bug:
 * org.apache.spark.sql.execution.RemoveRedundantProjects is applied
 * that causes BroadcastHashJoinExec to get created
 * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
 * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the 
hashmap object that gets broadcast
 * a few more rules are applied, followed by 
org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions

I can't find a way to inject extra rules into or between 
RemoveRedundantProjects and EnsureRequirements, so there doesn't seem to be a 
workaround either.

> BroadcastHashJoinExec.requiredChildDistribution called before columnar 
> replacement rules
> 
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.1
> Environment: I'm prototyping on a Mac, but that's not really relevant.
>Reporter: Timothy Miller
>Priority: Major
>
> I am trying to replace BroadcastHashJoinExec with a columnar equivalent. 
> However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets 
> called BEFORE the columnar replacement rules. As a result, the object that 
> gets broadcast is the plain old hashmap created from row data. By the time 
> the columnar replacement rules are applied, it's too late to get Spark to 
> broadcast any other kind of object.






[jira] [Updated] (SPARK-42782) Port the tests for get_json_object from the Apache Hive project

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42782:
--
Component/s: Tests

> Port the tests for get_json_object from the Apache Hive project
> ---
>
> Key: SPARK-42782
> URL: https://issues.apache.org/jira/browse/SPARK-42782
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFJson.java






[jira] [Assigned] (SPARK-42782) Port the tests for get_json_object from the Apache Hive project

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42782:
-

Assignee: Yuming Wang

> Port the tests for get_json_object from the Apache Hive project
> ---
>
> Key: SPARK-42782
> URL: https://issues.apache.org/jira/browse/SPARK-42782
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFJson.java






[jira] [Assigned] (SPARK-42617) Support `isocalendar`

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42617:


Assignee: (was: Apache Spark)

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)






[jira] [Assigned] (SPARK-42617) Support `isocalendar`

2023-03-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42617:


Assignee: Apache Spark

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)






[jira] [Commented] (SPARK-42617) Support `isocalendar`

2023-03-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700282#comment-17700282
 ] 

Apache Spark commented on SPARK-42617:
--

User 'dzhigimont' has created a pull request for this issue:
https://github.com/apache/spark/pull/40420

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)






[jira] [Resolved] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42770.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

Issue resolved by pull request 40395
[https://github.com/apache/spark/pull/40395]

> SQLImplicitsTestSuite test failed with Java 17
> --
>
> Key: SPARK-42770
> URL: https://issues.apache.org/jira/browse/SPARK-42770
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.1
>
>
> [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
> {code:java}
> [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
> milliseconds)
> 4429[info]   2023-03-02T23:00:20.404434 did not equal 
> 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
> 4430[info]   org.scalatest.exceptions.TestFailedException:
> 4431[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 4432[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 4433[info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 4434[info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 4435[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
> 4436[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
> 4437[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 4443[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> 4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> 4446[info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> 4447[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 4448[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 4450[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 4451[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 4452[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> 4453[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 4454[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> 4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
> 4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> 4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> 4459[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> 4460[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> 4461[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> 4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
> 4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> 4464[info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> 4465[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> 4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 4467[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> 4468[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> 4469[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
> 4470[info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 4471[info]   at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> 4472[info]   at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> 4473[info]   at 
> 
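
The two timestamps in the failing assertion above ({{2023-03-02T23:00:20.404434}} vs {{2023-03-02T23:00:20.404434875}}) differ only below microsecond precision: on Java 17 the JVM clock can return nanosecond-resolution instants, so a value that round-trips through microsecond precision no longer equals the original. A small standalone {{java.time}} sketch (not the suite's actual fix) reproduces the mismatch:

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class MicrosSketch {
    public static void main(String[] args) {
        // A nanosecond-precision timestamp, as a Java 17 clock can produce.
        LocalDateTime nanos  = LocalDateTime.of(2023, 3, 2, 23, 0, 20, 404_434_875);
        // What survives a round trip through microsecond precision.
        LocalDateTime micros = nanos.truncatedTo(ChronoUnit.MICROS);

        System.out.println(nanos);   // 2023-03-02T23:00:20.404434875
        System.out.println(micros);  // 2023-03-02T23:00:20.404434

        // Direct comparison fails, as in the assertion above; truncating
        // both sides to MICROS before comparing makes the check stable.
        System.out.println(nanos.equals(micros));                                 // false
        System.out.println(nanos.truncatedTo(ChronoUnit.MICROS).equals(micros)); // true
    }
}
```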

[jira] [Assigned] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42770:
-

Assignee: Yang Jie

> SQLImplicitsTestSuite test failed with Java 17
> --
>
> Key: SPARK-42770
> URL: https://issues.apache.org/jira/browse/SPARK-42770
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
> {code:java}
> [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
> milliseconds)
> 4429[info]   2023-03-02T23:00:20.404434 did not equal 
> 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
> 4430[info]   org.scalatest.exceptions.TestFailedException:
> 4431[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 4432[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 4433[info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 4434[info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 4435[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
> 4436[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
> 4437[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 4443[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> 4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> 4446[info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> 4447[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 4448[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 4450[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 4451[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 4452[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> 4453[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 4454[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> 4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
> 4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> 4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> 4459[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> 4460[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> 4461[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> 4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
> 4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> 4464[info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> 4465[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> 4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 4467[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> 4468[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> 4469[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
> 4470[info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 4471[info]   at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> 4472[info]   at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> 4473[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34)
> 4474[info]   at 
> 

[jira] [Updated] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42785:
--
Affects Version/s: 3.2.4
   3.3.3
   3.4.0
   (was: 3.3.2)

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in 
> Kubernetes Case
> -
>
> Key: SPARK-42785
> URL: https://issues.apache.org/jira/browse/SPARK-42785
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.4, 3.3.3, 3.4.0
>Reporter: binjie yang
>Assignee: binjie yang
>Priority: Major
> Fix For: 3.2.4, 3.3.3, 3.4.1
>
>
> According to this PR 
> [https://github.com/apache/spark/pull/37880#issuecomment-134890,] when a 
> user runs spark-submit without `--deploy-mode XXX` or `--conf 
> spark.submit.deployMode=`, they may hit an NPE in this code:
>  
> args.deployMode.equals("client")
>  
>  
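> 
> A minimal standalone sketch of the null-receiver problem (hypothetical 
> method names, not Spark's actual SparkSubmit code): when neither the flag 
> nor the conf is set, {{deployMode}} is null, and calling {{.equals}} on it 
> throws, whereas putting the literal on the receiver side is null-safe.
> {code:java}
> public class DeployModeSketch {
>     static boolean isClientUnsafe(String deployMode) {
>         return deployMode.equals("client"); // NPE when deployMode == null
>     }
> 
>     static boolean isClientSafe(String deployMode) {
>         return "client".equals(deployMode); // literal receiver is never null
>     }
> 
>     public static void main(String[] args) {
>         String deployMode = null; // neither --deploy-mode nor the conf was set
>         System.out.println(isClientSafe(deployMode)); // false, no crash
>         try {
>             isClientUnsafe(deployMode);
>         } catch (NullPointerException e) {
>             System.out.println("NPE, as described in the issue");
>         }
>     }
> }{code}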






[jira] [Resolved] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42785.
---
Fix Version/s: 3.3.3
   3.2.4
   3.4.1
   Resolution: Fixed

Issue resolved by pull request 40414
[https://github.com/apache/spark/pull/40414]

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in 
> Kubernetes Case
> -
>
> Key: SPARK-42785
> URL: https://issues.apache.org/jira/browse/SPARK-42785
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: binjie yang
>Assignee: binjie yang
>Priority: Major
> Fix For: 3.3.3, 3.2.4, 3.4.1
>
>
> According to this PR 
> [https://github.com/apache/spark/pull/37880#issuecomment-134890,] when a 
> user runs spark-submit without `--deploy-mode XXX` or `--conf 
> spark.submit.deployMode=`, they may hit an NPE in this code:
>  
> args.deployMode.equals("client")
>  
>  






[jira] [Assigned] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

2023-03-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42785:
-

Assignee: binjie yang

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in 
> Kubernetes Case
> -
>
> Key: SPARK-42785
> URL: https://issues.apache.org/jira/browse/SPARK-42785
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: binjie yang
>Assignee: binjie yang
>Priority: Major
>
> According to this PR 
> [https://github.com/apache/spark/pull/37880#issuecomment-134890,] when a 
> user runs spark-submit without `--deploy-mode XXX` or `--conf 
> spark.submit.deployMode=`, they may hit an NPE in this code:
>  
> args.deployMode.equals("client")
>  
>  





