[jira] [Updated] (SPARK-46674) Remove the Hive Index methods in HiveShim

2024-01-10 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-46674:
-
Summary: Remove the Hive Index methods in HiveShim  (was: Remove the Hive 
Index methods)

> Remove the Hive Index methods in HiveShim
> -
>
> Key: SPARK-46674
> URL: https://issues.apache.org/jira/browse/SPARK-46674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46667) XML: Throw error on multiple XML data source

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46667:
---
Labels: pull-request-available  (was: )

> XML: Throw error on multiple XML data source
> 
>
> Key: SPARK-46667
> URL: https://issues.apache.org/jira/browse/SPARK-46667
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46674) Remove the Hive Index methods

2024-01-10 Thread Kent Yao (Jira)
Kent Yao created SPARK-46674:


 Summary: Remove the Hive Index methods
 Key: SPARK-46674
 URL: https://issues.apache.org/jira/browse/SPARK-46674
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46673) Refine docstring `aes_encrypt\aes_decrypt\try_aes_decrypt`

2024-01-10 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46673:
---

 Summary: Refine docstring `aes_encrypt\aes_decrypt\try_aes_decrypt`
 Key: SPARK-46673
 URL: https://issues.apache.org/jira/browse/SPARK-46673
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46673) Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`

2024-01-10 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46673:

Summary: Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`  (was: 
Refine docstring `aes_encrypt\aes_decrypt\try_aes_decrypt`)

> Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`
> --
>
> Key: SPARK-46673
> URL: https://issues.apache.org/jira/browse/SPARK-46673
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46672) Upgrade log4j2 to 2.22.1

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46672:
---
Labels: pull-request-available  (was: )

> Upgrade log4j2 to 2.22.1
> 
>
> Key: SPARK-46672
> URL: https://issues.apache.org/jira/browse/SPARK-46672
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46672) Upgrade log4j2 to 2.22.1

2024-01-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-46672:


 Summary: Upgrade log4j2 to 2.22.1
 Key: SPARK-46672
 URL: https://issues.apache.org/jira/browse/SPARK-46672
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46614) Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`

2024-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46614.
--
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/44679

> Refine docstring 
> `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`
> 
>
> Key: SPARK-46614
> URL: https://issues.apache.org/jira/browse/SPARK-46614
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46614) Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`

2024-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46614:


Assignee: BingKun Pan

> Refine docstring 
> `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`
> 
>
> Key: SPARK-46614
> URL: https://issues.apache.org/jira/browse/SPARK-46614
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46666) Make lxml as an optional testing dependency in test_session

2024-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46666.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44676
[https://github.com/apache/spark/pull/44676]

> Make lxml as an optional testing dependency in test_session
> ---
>
> Key: SPARK-46666
> URL: https://issues.apache.org/jira/browse/SPARK-46666
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> Traceback (most recent call last):
>   File "<frozen runpy>", line 198, in _run_module_as_main
>   File "<frozen runpy>", line 88, in _run_code
>   File "/__w/spark/spark/python/pyspark/sql/tests/test_session.py", line 22, in <module>
> from lxml import etree
> ModuleNotFoundError: No module named 'lxml'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46666) Make lxml as an optional testing dependency in test_session

2024-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46666:


Assignee: Hyukjin Kwon

> Make lxml as an optional testing dependency in test_session
> ---
>
> Key: SPARK-46666
> URL: https://issues.apache.org/jira/browse/SPARK-46666
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> Traceback (most recent call last):
>   File "<frozen runpy>", line 198, in _run_module_as_main
>   File "<frozen runpy>", line 88, in _run_code
>   File "/__w/spark/spark/python/pyspark/sql/tests/test_session.py", line 22, in <module>
> from lxml import etree
> ModuleNotFoundError: No module named 'lxml'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46671) InferFiltersFromConstraint rule is creating a redundant filter

2024-01-10 Thread Asif (Jira)
Asif created SPARK-46671:


 Summary: InferFiltersFromConstraint rule is creating a redundant 
filter
 Key: SPARK-46671
 URL: https://issues.apache.org/jira/browse/SPARK-46671
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Asif


While bringing my old PR, which uses a different approach to the 
ConstraintPropagation algorithm 
([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync with 
the current master, I noticed a test failure in my branch for SPARK-33152.
The failing test is in
InferFiltersFromConstraintSuite:
{code}
  test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: 
Infer Filters") {
val x = testRelation.as("x")
val y = testRelation.as("y")
val z = testRelation.as("z")

// Removes EqualNullSafe when constructing candidate constraints
comparePlans(
  InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
.where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
  x.select($"x.a", $"x.a".as("xa"))
.where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" 
=== $"x.a").analyze)

// Once strategy's idempotence is not broken
val originalQuery =
  x.join(y, condition = Some($"x.a" === $"y.a"))
.select($"x.a", $"x.a".as("xa")).as("xy")
.join(z, condition = Some($"xy.a" === $"z.a")).analyze

val correctAnswer =
  x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = 
Some($"x.a" === $"y.a"))
.select($"x.a", $"x.a".as("xa")).as("xy")
.join(z.where($"a".isNotNull), condition = Some($"xy.a" === 
$"z.a")).analyze

val optimizedQuery = InferFiltersFromConstraints(originalQuery)
comparePlans(optimizedQuery, correctAnswer)
comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
  }
{code}

In the above test, I believe the assertion below is not proper: a redundant 
filter is getting created. Out of these two isNotNull constraints, only one 
should be created:

$"xa".isNotNull && $"x.a".isNotNull

because the presence of (xa#0 = a#0) automatically implies that if one attribute 
is not null, the other also has to be not null.

  // Removes EqualNullSafe when constructing candidate constraints
comparePlans(
  InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
.where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
  x.select($"x.a", $"x.a".as("xa"))
.where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" 
=== $"x.a").analyze) 

This is not a big issue, but it highlights the need to take another look at the 
ConstraintPropagation code and related code.

I am filing this Jira so that the constraint code can be tightened and made more 
robust.
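
For illustration only, here is a minimal PySpark sketch (the data and session 
setup are hypothetical, not part of the test above) showing that the equality 
predicate xa = a is already null-rejecting, so inferring IsNotNull on both sides 
duplicates the same constraint:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# A single nullable column `a`, aliased once more as `xa`.
df = spark.createDataFrame([(1,), (None,)], "a int").selectExpr("a", "a AS xa")

# Rows where `a` is NULL can never satisfy `xa = a`, so one IsNotNull
# constraint (on either attribute) already covers both; the second inferred
# IsNotNull is redundant.
df.where("xa = a").explain(extended=True)
{code}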



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46670) Make DataSourceManager isolated and self clone-able

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46670:
---
Labels: pull-request-available  (was: )

> Make DataSourceManager isolated and self clone-able 
> 
>
> Key: SPARK-46670
> URL: https://issues.apache.org/jira/browse/SPARK-46670
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> Make DataSourceManager isolated and self clone-able 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46670) Make DataSourceManager isolated and self clone-able

2024-01-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46670:


 Summary: Make DataSourceManager isolated and self clone-able 
 Key: SPARK-46670
 URL: https://issues.apache.org/jira/browse/SPARK-46670
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Make DataSourceManager isolated and self clone-able 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46668) Parallelize Sphinx build of Python API docs

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46668:
---
Labels: pull-request-available  (was: )

> Parallelize Sphinx build of Python API docs
> ---
>
> Key: SPARK-46668
> URL: https://issues.apache.org/jira/browse/SPARK-46668
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46669) Bump Kubernetes Client 6.10.0

2024-01-10 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan resolved SPARK-46669.
---
Resolution: Duplicate

> Bump Kubernetes Client 6.10.0
> -
>
> Key: SPARK-46669
> URL: https://issues.apache.org/jira/browse/SPARK-46669
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: k8s
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46669) Bump Kubernetes Client 6.10.0

2024-01-10 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-46669:
-

 Summary: Bump Kubernetes Client 6.10.0
 Key: SPARK-46669
 URL: https://issues.apache.org/jira/browse/SPARK-46669
 Project: Spark
  Issue Type: Dependency upgrade
  Components: k8s
Affects Versions: 4.0.0
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46668) Parallelize Sphinx build of Python API docs

2024-01-10 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46668:


 Summary: Parallelize Sphinx build of Python API docs
 Key: SPARK-46668
 URL: https://issues.apache.org/jira/browse/SPARK-46668
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Nicholas Chammas






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46653) Code-gen for full outer sort merge join output line by line

2024-01-10 Thread Mingliang Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Zhu updated SPARK-46653:
--
Description: Be consistent with the behavior when code-gen is turned off, and 
avoid OOM when the parent of SortMergeJoin cannot codegen and there are a large 
number of duplicate keys in a full outer sort merge join.  (was: Be consistent 
with the behavior when code-gen is turned off, and avoid OOM when there are a 
large number of duplicate keys in a full outer sort merge join.)

>  Code-gen for full outer sort merge join output line by line
> 
>
> Key: SPARK-46653
> URL: https://issues.apache.org/jira/browse/SPARK-46653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Mingliang Zhu
>Priority: Major
>  Labels: pull-request-available
>
> Be consistent with the behavior when code-gen is turned off, and avoid OOM 
> when the parent of SortMergeJoin cannot codegen and there are a large number 
> of duplicate keys in a full outer sort merge join.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46575) Make HiveThriftServer2.startWithContext DevelopApi retriable

2024-01-10 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46575:


Assignee: Kent Yao

> Make HiveThriftServer2.startWithContext DevelopApi retriable
> 
>
> Key: SPARK-46575
> URL: https://issues.apache.org/jira/browse/SPARK-46575
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46667) XML: Throw error on multiple XML data source

2024-01-10 Thread Sandip Agarwala (Jira)
Sandip Agarwala created SPARK-46667:
---

 Summary: XML: Throw error on multiple XML data source
 Key: SPARK-46667
 URL: https://issues.apache.org/jira/browse/SPARK-46667
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Sandip Agarwala






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46575) Make HiveThriftServer2.startWithContext DevelopApi retriable

2024-01-10 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46575.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44575
[https://github.com/apache/spark/pull/44575]

> Make HiveThriftServer2.startWithContext DevelopApi retriable
> 
>
> Key: SPARK-46575
> URL: https://issues.apache.org/jira/browse/SPARK-46575
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46614) Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46614:
---
Labels: pull-request-available  (was: )

> Refine docstring 
> `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`
> 
>
> Key: SPARK-46614
> URL: https://issues.apache.org/jira/browse/SPARK-46614
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46656) Split `GroupbyParitySplitApplyTests`

2024-01-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-46656:
-

Assignee: Ruifeng Zheng

> Split `GroupbyParitySplitApplyTests`
> 
>
> Key: SPARK-46656
> URL: https://issues.apache.org/jira/browse/SPARK-46656
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46656) Split `GroupbyParitySplitApplyTests`

2024-01-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46656.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44664
[https://github.com/apache/spark/pull/44664]

> Split `GroupbyParitySplitApplyTests`
> 
>
> Key: SPARK-46656
> URL: https://issues.apache.org/jira/browse/SPARK-46656
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46642) Add `getMessageTemplate` to PySpark error framework

2024-01-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-46642.
-
Resolution: Won't Fix

> Add `getMessageTemplate` to PySpark error framework
> ---
>
> Key: SPARK-46642
> URL: https://issues.apache.org/jira/browse/SPARK-46642
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We should add `getMessageTemplate` to the PySpark error framework to reach 
> feature parity with the JVM side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46638) Create API to acquire execution memory for 'eval' and 'terminate' methods

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46638:
---
Labels: pull-request-available  (was: )

> Create API to acquire execution memory for 'eval' and 'terminate' methods
> -
>
> Key: SPARK-46638
> URL: https://issues.apache.org/jira/browse/SPARK-46638
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46662) Upgrade kubernetes-client to 6.10.0

2024-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46662:
-

Assignee: Bjørn Jørgensen

> Upgrade kubernetes-client to 6.10.0
> ---
>
> Key: SPARK-46662
> URL: https://issues.apache.org/jira/browse/SPARK-46662
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> new version https://github.com/fabric8io/kubernetes-client/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46662) Upgrade kubernetes-client to 6.10.0

2024-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46662.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44672
[https://github.com/apache/spark/pull/44672]

> Upgrade kubernetes-client to 6.10.0
> ---
>
> Key: SPARK-46662
> URL: https://issues.apache.org/jira/browse/SPARK-46662
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> new version https://github.com/fabric8io/kubernetes-client/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46658) Loosen Ruby dependency specs for doc build

2024-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46658.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44667
[https://github.com/apache/spark/pull/44667]

> Loosen Ruby dependency specs for doc build
> --
>
> Key: SPARK-46658
> URL: https://issues.apache.org/jira/browse/SPARK-46658
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46658) Loosen Ruby dependency specs for doc build

2024-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46658:


Assignee: Nicholas Chammas

> Loosen Ruby dependency specs for doc build
> --
>
> Key: SPARK-46658
> URL: https://issues.apache.org/jira/browse/SPARK-46658
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46666) Make lxml as an optional testing dependency in test_session

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46666:
---
Labels: pull-request-available  (was: )

> Make lxml as an optional testing dependency in test_session
> ---
>
> Key: SPARK-46666
> URL: https://issues.apache.org/jira/browse/SPARK-46666
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> Traceback (most recent call last):
>   File "<frozen runpy>", line 198, in _run_module_as_main
>   File "<frozen runpy>", line 88, in _run_code
>   File "/__w/spark/spark/python/pyspark/sql/tests/test_session.py", line 22, in <module>
> from lxml import etree
> ModuleNotFoundError: No module named 'lxml'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46666) Make lxml as an optional testing dependency in test_session

2024-01-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46666:


 Summary: Make lxml as an optional testing dependency in 
test_session
 Key: SPARK-46666
 URL: https://issues.apache.org/jira/browse/SPARK-46666
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/__w/spark/spark/python/pyspark/sql/tests/test_session.py", line 22, in <module>
from lxml import etree
ModuleNotFoundError: No module named 'lxml'
{code}
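
For context, a minimal sketch of the optional-dependency pattern this asks for 
(the guard variable and test name below are illustrative, not the actual 
PySpark test code):

{code:python}
import unittest

# Import lxml lazily so the test module still imports when lxml is absent.
try:
    from lxml import etree  # noqa: F401
    have_lxml = True
except ImportError:
    have_lxml = False


class SparkSessionTests(unittest.TestCase):
    @unittest.skipIf(not have_lxml, "lxml is not installed")
    def test_xml_report(self):
        # Only this test touches lxml; everything else runs without it.
        self.assertIsNotNone(etree.Element("root"))


if __name__ == "__main__":
    unittest.main()
{code}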




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46665) Remove Pandas dependency for pyspark.testing

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46665:
---
Labels: pull-request-available  (was: )

> Remove Pandas dependency for pyspark.testing
> 
>
> Key: SPARK-46665
> URL: https://issues.apache.org/jira/browse/SPARK-46665
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We should not make pyspark.testing depend on Pandas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46665) Remove Pandas dependency for pyspark.testing

2024-01-10 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46665:
---

 Summary: Remove Pandas dependency for pyspark.testing
 Key: SPARK-46665
 URL: https://issues.apache.org/jira/browse/SPARK-46665
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


We should not make pyspark.testing depend on Pandas.
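
A hedged sketch of one way to avoid a hard Pandas dependency (the helper name 
below is illustrative, not the actual pyspark.testing API): defer the import to 
the call site that actually needs Pandas, so importing pyspark.testing itself 
never requires it.

{code:python}
def _require_pandas():
    # Import Pandas only when a pandas-based assertion is actually used,
    # so that `import pyspark.testing` works without Pandas installed.
    try:
        import pandas as pd
        return pd
    except ImportError as e:
        raise ImportError(
            "Pandas is required for this assertion; install it with "
            "`pip install pandas`."
        ) from e
{code}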



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46662) Upgrade kubernetes-client to 6.10.0

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46662:
---
Labels: pull-request-available  (was: )

> Upgrade kubernetes-client to 6.10.0
> ---
>
> Key: SPARK-46662
> URL: https://issues.apache.org/jira/browse/SPARK-46662
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> new version https://github.com/fabric8io/kubernetes-client/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46640) RemoveRedundantAliases does not account for SubqueryExpression when removing aliases

2024-01-10 Thread Nikhil Sheoran (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Sheoran updated SPARK-46640:
---
Fix Version/s: (was: 4.0.0)

> RemoveRedundantAliases does not account for SubqueryExpression when removing 
> aliases
> 
>
> Key: SPARK-46640
> URL: https://issues.apache.org/jira/browse/SPARK-46640
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 4.0.0
>Reporter: Nikhil Sheoran
>Priority: Minor
>  Labels: pull-request-available
>
> `RemoveRedundantAliases` does not take into account the aliases referenced by 
> the outer attributes of a `SubqueryExpression`, potentially removing them if it 
> thinks they are redundant.
> This can cause scenarios where a subquery expression has conditions like `a#x 
> = a#x`, i.e. both the attribute names and the expression IDs are the same. 
> This can then lead to a conflicting expression IDs error.
> In `RemoveRedundantAliases`, we have an excluded AttributeSet argument 
> denoting the references for which we should not remove aliases. For a query 
> with a subquery expression, adding the references of this subquery to the 
> excluded set prevents such a rewrite from happening.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46662) Upgrade kubernetes-client to 6.10.0

2024-01-10 Thread Bjørn Jørgensen (Jira)
Bjørn Jørgensen created SPARK-46662:
---

 Summary: Upgrade kubernetes-client to 6.10.0
 Key: SPARK-46662
 URL: https://issues.apache.org/jira/browse/SPARK-46662
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Bjørn Jørgensen


new version https://github.com/fabric8io/kubernetes-client/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46657) Install `lxml` in Python 3.12

2024-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46657.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44666
[https://github.com/apache/spark/pull/44666]

> Install `lxml` in Python 3.12
> -
>
> Key: SPARK-46657
> URL: https://issues.apache.org/jira/browse/SPARK-46657
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46657) Install `lxml` in Python 3.12

2024-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46657:
-

Assignee: Dongjoon Hyun

> Install `lxml` in Python 3.12
> -
>
> Key: SPARK-46657
> URL: https://issues.apache.org/jira/browse/SPARK-46657
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46660) ReattachExecute requests do not refresh aliveness of SessionHolder

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46660:
---
Labels: pull-request-available  (was: )

> ReattachExecute requests do not refresh aliveness of SessionHolder
> --
>
> Key: SPARK-46660
> URL: https://issues.apache.org/jira/browse/SPARK-46660
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>  Labels: pull-request-available
>
> In the first executePlan request, creating the {{ExecuteHolder}} triggers 
> {{getOrCreateIsolatedSession}}, which refreshes the aliveness of the 
> {{SessionHolder}}. However, in {{ReattachExecute}} we fetch the 
> {{ExecuteHolder}} directly without going through the {{SessionHolder}} (and 
> hence make it seem like the {{SessionHolder}} is idle).
>  
> This would result in long-running queries (which do not send release-execute 
> requests, since those refresh aliveness) failing because the 
> {{SessionHolder}} would expire during active query execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46661) Add customizable property spark.dynamicAllocation.lastExecutorIdleTimeout for last remaining executor, defaulting to spark.dynamicAllocation.executorIdleTimeout

2024-01-10 Thread Arnaud Nauwynck (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Nauwynck updated SPARK-46661:

Description: 
When using dynamicAllocation, the parameter 
"spark.dynamicAllocation.executorIdleTimeout" is used for every executor, 
regardless of whether it is the last running one or any other one.

However, it might be interesting to keep the last executor running longer when 
it is the last remaining one, so that any incoming new task would be processed 
immediately, instead of waiting for a complete restart of executors that may 
take >= 30 seconds.

This is particularly frequent in scenarios using Spark Streaming, when polling 
for micro-batches. Keeping 1 executor alive helps respond faster, while still 
allowing dynamic allocation of 2, 3, ..., N executors.

In practice, this would only change the following source code lines:

In 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L647-L653|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L647-L653]
to add
{code:java}
  private[spark] val DYN_ALLOCATION_LAST_EXECUTOR_IDLE_TIMEOUT =
ConfigBuilder("spark.dynamicAllocation.lastExecutorIdleTimeout")
  .version("3.6.0")
  .timeConf(TimeUnit.SECONDS)
  .checkValue(_ >= 0L, "Last Timeout must be >= 0 (and preferably >= 
spark.dynamicAllocation.executorIdleTimeout)")
  .createWithDefault(60)
{code}



In  
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L46|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L46]

{code:java}
  private val idleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
{code}
to add
{code:java}
  private val idleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
  private val lastIdleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_LAST_IDLE_TIMEOUT))
{code}


and replace (insert if-condition) in
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L573|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L573]

{code:java}
def updateTimeout(): Unit = {
...
 val timeout = Seq(_cacheTimeout, _shuffleTimeout, 
idleTimeoutNs).max
...
{code}

to be something like

{code:java}
def updateTimeout(): Unit = {
...
  val isOnlyOneLastExecutorRemaining = 
  val currIddleTimeoutNs = if (isOnlyOneLastExecutorRemaining) 
lastIdleTimeoutNs else idleTimeoutNs
  val timeout = Seq(_cacheTimeout, _shuffleTimeout, 
currIddleTimeoutNs).max
...
{code}


  was:
When using dynamicAllocation, the parameter 
"spark.dynamicAllocation.executorIdleTimeout" is used for every executor, 
regardless of whether it is the last running one or any other one.

However, it might be interesting to keep the last executor running longer when 
it is the last remaining one, so that any incoming new task would be processed 
immediately, instead of waiting for a complete restart of executors that may 
take >= 30 seconds.

This is particularly frequent in scenarios using Spark Streaming, when polling 
for micro-batches. Keeping 1 executor alive helps respond faster, while still 
allowing dynamic allocation of 2, 3, ..., N executors.

In practice, this would only change the following source code lines:

In 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L647-L653|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L647-L653]
to add
{code:java}
  private[spark] val DYN_ALLOCATION_LAST_EXECUTOR_IDLE_TIMEOUT =
ConfigBuilder("spark.dynamicAllocation.lastExecutorIdleTimeout")
  .version("3.6.0")
  .timeConf(TimeUnit.SECONDS)
  .checkValue(_ >= 0L, "Last Timeout must be >= 0 (and preferably >= 
spark.dynamicAllocation.executorIdleTimeout)")
  .createWithDefault(60)
{code}



In 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L46|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L46]

{code:java}
  private val idleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
{code}
to add
{code:java}
  private val idleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
  private val lastIdleTimeoutNs = TimeUnit.SECONDS.toNanos(

[jira] [Created] (SPARK-46661) Add customizable property spark.dynamicAllocation.lastExecutorIdleTimeout for last remaining executor, defaulting to spark.dynamicAllocation.executorIdleTimeout

2024-01-10 Thread Arnaud Nauwynck (Jira)
Arnaud Nauwynck created SPARK-46661:
---

 Summary: Add customizable property 
spark.dynamicAllocation.lastExecutorIdleTimeout for last remaining executor, 
defaulting to spark.dynamicAllocation.executorIdleTimeout
 Key: SPARK-46661
 URL: https://issues.apache.org/jira/browse/SPARK-46661
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Affects Versions: 3.5.0, 4.0.0
Reporter: Arnaud Nauwynck


When using dynamicAllocation, the parameter 
"spark.dynamicAllocation.executorIdleTimeout" is used for every executor, 
regardless of whether it is the last running one or any other one.

However, it might be interesting to keep the last executor running longer when 
it is the last remaining one, so that any incoming new task would be processed 
immediately, instead of waiting for a complete restart of executors that may 
take >= 30 seconds.

This is particularly frequent in scenarios using Spark Streaming, when polling 
for micro-batches. Keeping 1 executor alive helps respond faster, while still 
allowing dynamic allocation of 2, 3, ..., N executors.

In practice, this would only change the following source code lines:

In 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L647-L653|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L647-L653]
to add
{code:java}
  private[spark] val DYN_ALLOCATION_LAST_EXECUTOR_IDLE_TIMEOUT =
ConfigBuilder("spark.dynamicAllocation.lastExecutorIdleTimeout")
  .version("3.6.0")
  .timeConf(TimeUnit.SECONDS)
  .checkValue(_ >= 0L, "Last Timeout must be >= 0 (and preferably >= 
spark.dynamicAllocation.executorIdleTimeout)")
  .createWithDefault(60)
{code}



In 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L46|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L46]

{code:java}
  private val idleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
{code}
to add
{code:java}
  private val idleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
  private val lastIdleTimeoutNs = TimeUnit.SECONDS.toNanos(
conf.get(DYN_ALLOCATION_EXECUTOR_LAST_IDLE_TIMEOUT))
{code}


and replace (insert if-condition) in
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L573|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L573]

{code:java}
def updateTimeout(): Unit = {
...
 val timeout = Seq(_cacheTimeout, _shuffleTimeout, 
idleTimeoutNs).max
...
{code}

to be something like

{code:java}
def updateTimeout(): Unit = {
...
  val isOnlyOneLastExecutorRemaining = 
  val currIddleTimeoutNs = if (isOnlyOneLastExecutorRemaining) 
lastIdleTimeoutNs else idleTimeoutNs
  val timeout = Seq(_cacheTimeout, _shuffleTimeout, 
currIddleTimeoutNs).max
...
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46660) ReattachExecute requests do not refresh aliveness of SessionHolder

2024-01-10 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-46660:


 Summary: ReattachExecute requests do not refresh aliveness of 
SessionHolder
 Key: SPARK-46660
 URL: https://issues.apache.org/jira/browse/SPARK-46660
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Venkata Sai Akhil Gudesa


In the first executePlan request, creating the {{ExecuteHolder}} triggers 
{{getOrCreateIsolatedSession}}, which refreshes the aliveness of the 
{{SessionHolder}}. However, in {{ReattachExecute}} we fetch the 
{{ExecuteHolder}} directly without going through the {{SessionHolder}} (and 
hence make it seem like the {{SessionHolder}} is idle).

This would result in long-running queries (which do not send release-execute 
requests, since those refresh aliveness) failing because the {{SessionHolder}} 
would expire during active query execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46659) Add customizable TaskScheduling param, to avoid randomly choosing executor for tasks, and downscale on low micro-batches activity

2024-01-10 Thread Arnaud Nauwynck (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Nauwynck updated SPARK-46659:

Description: 
When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the CPU of the 
cluster.
Yet globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors) even though they are all idle 99% of the 
time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking none of 
them is ever idle for longer than 
"spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they keep receiving tasks forever, while those tasks could easily be assigned 
to some other executor (chosen non-randomly).

The proposal is therefore to add a new configuration property to suppress 
the random shuffling of assignable worker offers for tasks.

see this code 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]


{code:java}
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.  
Exposed to allow
   * overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
Random.shuffle(offers)
  }
{code}

It could be replaced simply by

{code:java}
val SKIP_RANDOMIZE_WORKER_OFFERS =  
ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
  .version("3.6.0")
  .booleanConf
  .createWithDefault(false)
..

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

..

  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
if (skipRandomizeWorkerOffers) {
   offers
} else {
   Random.shuffle(offers)
}
  }
{code}



  was:
When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the CPU of the 
cluster.
Yet globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors) even though they are all idle 99% of the 
time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking none of 
them is ever idle for longer than 
"spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they keep receiving tasks forever, while those tasks could easily be assigned 
to some other executor (chosen non-randomly).

The proposal is therefore to add a new configuration property to suppress 
the random shuffling of assignable worker offers for tasks.

see this code 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]

``` java
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.  
Exposed to allow
   * overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
Random.shuffle(offers)
  }
```

It could be replaced simply by

``` java
val SKIP_RANDOMIZE_WORKER_OFFERS =  
ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
  .version("3.6.0")
  .booleanConf
  .createWithDefault(false)
..

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

..

  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
if (skipRandomizeWorkerOffers) {
   offers
} else {
   Random.shuffle(offers)
}
  }
```



> Add customizable TaskScheduling param, to avoid randomly choosing executor 
> for tasks, and downscale on low micro-batches activity
> -
>
> Key: SPARK-46659
> URL: 

[jira] [Updated] (SPARK-46659) Add customizable TaskScheduling param, to avoid randomly choosing executor for tasks, and downscale on low micro-batches activity

2024-01-10 Thread Arnaud Nauwynck (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Nauwynck updated SPARK-46659:

Description: 
When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the CPU of the 
cluster.
Yet globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors) even though they are all idle 99% of the 
time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking none of 
them is ever idle for longer than 
"spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they keep receiving tasks forever, while those tasks could easily be assigned 
to some other executor (chosen non-randomly).

The proposal is therefore to add a new configuration property to suppress 
the random shuffling of assignable worker offers for tasks.

see this code 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]
```java
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.  
Exposed to allow
   * overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
Random.shuffle(offers)
  }
```

It could be replaced simply by
```java
val SKIP_RANDOMIZE_WORKER_OFFERS =  
ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
  .version("3.6.0")
  .booleanConf
  .createWithDefault(false)
..

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

..

  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
if (skipRandomizeWorkerOffers) {
   offers
} else {
   Random.shuffle(offers)
}
  }
```


  was:
When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the CPU of the 
cluster.
Yet globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors) even though they are all idle 99% of the 
time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking none of 
them is ever idle for longer than 
"spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they keep receiving tasks forever, while those tasks could easily be assigned 
to some other executor (chosen non-randomly).

The proposal is therefore to add a new configuration property to suppress 
the random shuffling of assignable worker offers for tasks.

see this code 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]
{{
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.  
Exposed to allow
   * overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): 
IndexedSeq[WorkerOffer] = {
Random.shuffle(offers)
  }
}}

It could be replaced with something like:
```scala
val SKIP_RANDOMIZE_WORKER_OFFERS =
  ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
    .version("3.6.0")
    .booleanConf
    .createWithDefault(false)

// ...

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

// ...

protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
  if (skipRandomizeWorkerOffers) {
    offers
  } else {
    Random.shuffle(offers)
  }
}
```



> Add customizable TaskScheduling param, to avoid randomly choosing executor 
> for tasks, and downscale on low micro-batches activity
> -
>
> Key: SPARK-46659
> URL: https://issues.apache.org/jira/browse/SPARK-46659
> 

[jira] [Updated] (SPARK-46659) Add customizable TaskScheduling param, to avoid randomly choosing executor for tasks, and downscale on low micro-batches activity

2024-01-10 Thread Arnaud Nauwynck (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Nauwynck updated SPARK-46659:

Description: 
When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the cluster's CPU.
But globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors), even though they are idle 99% of the time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking, none of 
them is ever idle for longer than "spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they continue to receive tasks forever, while those tasks could easily be 
assigned to any other executor (chosen deterministically rather than randomly).


The proposal is therefore to add a new configuration property to suppress the 
random shuffling of assignable worker offers for tasks.

See this code: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]

```scala
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.
   * Exposed to allow overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
    Random.shuffle(offers)
  }
```

It could be replaced with something like:
```scala
val SKIP_RANDOMIZE_WORKER_OFFERS =
  ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
    .version("3.6.0")
    .booleanConf
    .createWithDefault(false)

// ...

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

// ...

protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
  if (skipRandomizeWorkerOffers) {
    offers
  } else {
    Random.shuffle(offers)
  }
}
```


  was:
When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the cluster's CPU.
But globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors), even though they are idle 99% of the time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking, none of 
them is ever idle for longer than "spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they continue to receive tasks forever, while those tasks could easily be 
assigned to any other executor (chosen deterministically rather than randomly).


The proposal is therefore to add a new configuration property to suppress the 
random shuffling of assignable worker offers for tasks.

See this code: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]
```scala
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.
   * Exposed to allow overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
    Random.shuffle(offers)
  }
```

It could be replaced with something like:
```scala
val SKIP_RANDOMIZE_WORKER_OFFERS =
  ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
    .version("3.6.0")
    .booleanConf
    .createWithDefault(false)

// ...

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

// ...

protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
  if (skipRandomizeWorkerOffers) {
    offers
  } else {
    Random.shuffle(offers)
  }
}
```



> Add customizable TaskScheduling param, to avoid randomly choosing executor 
> for tasks, and downscale on low micro-batches activity
> -
>
> Key: SPARK-46659
> URL: 

[jira] [Created] (SPARK-46659) Add customizable TaskScheduling param, to avoid randomly choosing executor for tasks, and downscale on low micro-batches activity

2024-01-10 Thread Arnaud Nauwynck (Jira)
Arnaud Nauwynck created SPARK-46659:
---

 Summary: Add customizable TaskScheduling param, to avoid randomly 
choosing executor for tasks, and downscale on low micro-batches activity
 Key: SPARK-46659
 URL: https://issues.apache.org/jira/browse/SPARK-46659
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Affects Versions: 3.5.0, 3.4.0, 4.0.0
Reporter: Arnaud Nauwynck


When using dynamicAllocation (but not spark.decommission.enabled=true) with a 
micro-batch workload, very small tasks arrive at regular intervals and are 
processed extremely quickly.
The flow of events being processed may consume less than 1% of the cluster's CPU.
But globally, the number of executors stays at a high level 
(spark.dynamicAllocation.maxExecutors), even though they are idle 99% of the time.

Unfortunately, in the current code, tasks are assigned randomly to executors, so 
a constant flow of very small tasks artificially keeps all the executors in an 
"active" status: 
every executor receives tasks from time to time, so strictly speaking, none of 
them is ever idle for longer than "spark.dynamicAllocation.executorIdleTimeout".

Therefore, executors are never marked as candidates for decommissioning, and 
they continue to receive tasks forever, while those tasks could easily be 
assigned to any other executor (chosen deterministically rather than randomly).


The proposal is therefore to add a new configuration property to suppress the 
random shuffling of assignable worker offers for tasks.

See this code: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L773]
```scala
  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.
   * Exposed to allow overriding in tests, so it can be deterministic.
   */
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
    Random.shuffle(offers)
  }
```

It could be replaced with something like:
```scala
val SKIP_RANDOMIZE_WORKER_OFFERS =
  ConfigBuilder("spark.task.skipRandomizeWorkerOffers")
    .version("3.6.0")
    .booleanConf
    .createWithDefault(false)

// ...

val skipRandomizeWorkerOffers = conf.get(SKIP_RANDOMIZE_WORKER_OFFERS)

// ...

protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
  if (skipRandomizeWorkerOffers) {
    offers
  } else {
    Random.shuffle(offers)
  }
}
```




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46658) Loosen Ruby dependency specs for doc build

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46658:
---
Labels: pull-request-available  (was: )

> Loosen Ruby dependency specs for doc build
> --
>
> Key: SPARK-46658
> URL: https://issues.apache.org/jira/browse/SPARK-46658
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42199) groupByKey creates columns that may conflict with existing columns

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42199:
---
Labels: pull-request-available  (was: )

> groupByKey creates columns that may conflict with existing columns
> -
>
> Key: SPARK-42199
> URL: https://issues.apache.org/jira/browse/SPARK-42199
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0, 3.5.0
>Reporter: Enrico Minack
>Priority: Major
>  Labels: pull-request-available
>
> Calling {{ds.groupByKey(func: V => K)}} creates columns to store the key 
> value. These columns may conflict with columns that already exist in {{ds}}. 
> Function {{Dataset.groupByKey.agg}} accounts for this with a very specific 
> rule, which has some surprising weaknesses:
> {code:scala}
> spark.range(1)
>   // groupByKey adds column 'value'
>   .groupByKey(id => id)
>   // which cannot be referenced, though it is suggested
>   .agg(count("value"))
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: Column 'value' does not exist. Did 
> you mean one of the following? [value, id];
> {code}
> An existing 'value' column can be referenced:
> {code:scala}
> // dataset with column 'value'
> spark.range(1).select($"id".as("value")).as[Long]
>   // groupByKey adds another column 'value'
>   .groupByKey(id => id)
>   // agg accounts for the extra column and excludes it when resolving 'value'
>   .agg(count("value"))
>   .show()
> {code}
> {code:java}
> +---+------------+
> |key|count(value)|
> +---+------------+
> |  0|           1|
> +---+------------+
> {code}
> While column suggestion shows both 'value' columns:
> {code:scala}
> spark.range(1).select($"id".as("value")).as[Long]
>   .groupByKey(id => id)
>   .agg(count("unknown"))
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: Column 'unknown' does not exist. Did 
> you mean one of the following? [value, value]
> {code}
> However, {{mapValues}} introduces another 'value' column, which should be 
> referencable, but it breaks the exclusion introduced by {{agg}}:
> {code:scala}
> spark.range(1)
>   // groupByKey adds column 'value'
>   .groupByKey(id => id)
>   // adds another 'value' column
>   .mapValues(value => value)
>   // which cannot be referenced in agg
>   .agg(count("value"))
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: Reference 'value' is ambiguous, could 
> be: value, value.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46658) Loosen Ruby dependency specs for doc build

2024-01-10 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46658:


 Summary: Loosen Ruby dependency specs for doc build
 Key: SPARK-46658
 URL: https://issues.apache.org/jira/browse/SPARK-46658
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Nicholas Chammas






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44173) Make Spark an sbt build only project

2024-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805237#comment-17805237
 ] 

Dongjoon Hyun commented on SPARK-44173:
---

IIRC, there was a discussion about this and we decided to stick to Maven because 
its explicit dependency management was preferred at that time, [~LuciferYang].

> Make Spark an sbt build only project
> 
>
> Key: SPARK-44173
> URL: https://issues.apache.org/jira/browse/SPARK-44173
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> Supporting both Maven and SBT always brings various testing problems and 
> increases the complexity of writing test code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-44173) Make Spark an sbt build only project

2024-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805237#comment-17805237
 ] 

Dongjoon Hyun edited comment on SPARK-44173 at 1/10/24 5:40 PM:


IIRC, there was a discussion about this and we decided to stick to Maven 
because its explicit dependency management was preferred at that time, 
[~LuciferYang].


was (Author: dongjoon):
IIRC, there is a discussion about this and we decided to stick to Maven because 
its explicit dependency management was preferred at that time, [~LuciferYang].

> Make Spark an sbt build only project
> 
>
> Key: SPARK-44173
> URL: https://issues.apache.org/jira/browse/SPARK-44173
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> Supporting both Maven and SBT always brings various testing problems and 
> increases the complexity of writing test code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46657) Install `lxml` in Python 3.12

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46657:
---
Labels: pull-request-available  (was: )

> Install `lxml` in Python 3.12
> -
>
> Key: SPARK-46657
> URL: https://issues.apache.org/jira/browse/SPARK-46657
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46657) Install `lxml` in Python 3.12

2024-01-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46657:
-

 Summary: Install `lxml` in Python 3.12
 Key: SPARK-46657
 URL: https://issues.apache.org/jira/browse/SPARK-46657
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46547) Fix deadlock issue between maintenance thread and streaming agg physical operators

2024-01-10 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-46547.
--
Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44542
[https://github.com/apache/spark/pull/44542]

> Fix deadlock issue between maintenance thread and streaming agg physical 
> operators
> --
>
> Key: SPARK-46547
> URL: https://issues.apache.org/jira/browse/SPARK-46547
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0
>
>
> Fix deadlock issue between maintenance thread and streaming agg physical 
> operators



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46653) Code-gen for full outer sort merge join output line by line

2024-01-10 Thread Mingliang Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Zhu updated SPARK-46653:
--
Description: Be consistent with the behavior when code-gen is disabled, to avoid 
OOM when there are a large number of duplicate keys in full outer sort merge 
join.  (was: Be consistent with the behavior when code-gen is disabled, to avoid 
OOM when there are a large number of duplicate keys and the parent of 
SortMergeJoin cannot code-gen.)

>  Code-gen for full outer sort merge join output line by line
> 
>
> Key: SPARK-46653
> URL: https://issues.apache.org/jira/browse/SPARK-46653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Mingliang Zhu
>Priority: Major
>  Labels: pull-request-available
>
> Be consistent with the behavior when code-gen is disabled, to avoid OOM when 
> there are a large number of duplicate keys in full outer sort merge join.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46654) df.show() of pyspark displayed different results between Regular Spark and Spark Connect

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46654:
---
Labels: pull-request-available  (was: )

> df.show() of pyspark displayed different results between Regular Spark and 
> Spark Connect
> 
>
> Key: SPARK-46654
> URL: https://issues.apache.org/jira/browse/SPARK-46654
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> The following doctest will throw an error in the tests of the pyspark-connect 
> module
> {code:java}
> Example 2: Converting a complex StructType to a CSV string    
> >>> from pyspark.sql import Row, functions as sf
>     >>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>     >>> df = spark.createDataFrame(data, ("key", "value"))
>     >>> df.select(sf.to_csv(df.value)).show(truncate=False) # doctest: +SKIP
>     +-----------------------+
>     |to_csv(value)          |
>     +-----------------------+
>     |2,Alice,"[100,200,300]"|
>     +-----------------------+{code}
> {code:java}
> **********************************************************************
> File "/__w/spark/spark/python/pyspark/sql/connect/functions/builtin.py", line 2232, in pyspark.sql.connect.functions.builtin.to_csv
> Failed example:
>     df.select(sf.to_csv(df.value)).show(truncate=False)
> Expected:
>     +-----------------------+
>     |to_csv(value)          |
>     +-----------------------+
>     |2,Alice,"[100,200,300]"|
>     +-----------------------+
> Got:
>     +--------------------------------------------------------------------------+
>     |to_csv(value)                                                             |
>     +--------------------------------------------------------------------------+
>     |2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
>     +--------------------------------------------------------------------------+
> **********************************************************************
>    1 of  18 in pyspark.sql.connect.functions.builtin.to_csv
> ***Test Failed*** 1 failures. {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46656) Split `GroupbyParitySplitApplyTests`

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46656:
---
Labels: pull-request-available  (was: )

> Split `GroupbyParitySplitApplyTests`
> 
>
> Key: SPARK-46656
> URL: https://issues.apache.org/jira/browse/SPARK-46656
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46656) Split `GroupbyParitySplitApplyTests`

2024-01-10 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46656:
-

 Summary: Split `GroupbyParitySplitApplyTests`
 Key: SPARK-46656
 URL: https://issues.apache.org/jira/browse/SPARK-46656
 Project: Spark
  Issue Type: Sub-task
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46257) Upgrade Derby to 10.16.1.1

2024-01-10 Thread Laurenceau Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805059#comment-17805059
 ] 

Laurenceau Julien edited comment on SPARK-46257 at 1/10/24 10:36 AM:
-

Yes, you are right.

The only released version on Maven Central that fixes this vulnerability is 
[10.17.1.0|https://mvnrepository.com/artifact/org.apache.derby/derby/10.17.1.0].

[https://mvnrepository.com/artifact/org.apache.derby/derby]

Do you think it will be possible to upgrade to 10.17.x for Spark 4.0.0?

NB: I asked the Derby project on their ticket whether a 10.16.1.2 release fixing 
this vulnerability is planned.


was (Author: julienlau):
Yes, you are right.

The only released version on Maven Central that fixes this vulnerability is 
[10.17.1.0|https://mvnrepository.com/artifact/org.apache.derby/derby/10.17.1.0].

[https://mvnrepository.com/artifact/org.apache.derby/derby]

Do you think it will be possible to upgrade to 10.17.x for Spark 4.0.0?

> Upgrade Derby to 10.16.1.1
> --
>
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://db.apache.org/derby/releases/release-10_16_1_1.cgi



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46257) Upgrade Derby to 10.16.1.1

2024-01-10 Thread Laurenceau Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805059#comment-17805059
 ] 

Laurenceau Julien commented on SPARK-46257:
---

Yes, you are right.

The only released version on Maven Central that fixes this vulnerability is 
[10.17.1.0|https://mvnrepository.com/artifact/org.apache.derby/derby/10.17.1.0].

[https://mvnrepository.com/artifact/org.apache.derby/derby]

Do you think it will be possible to upgrade to 10.17.x for Spark 4.0.0?

> Upgrade Derby to 10.16.1.1
> --
>
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://db.apache.org/derby/releases/release-10_16_1_1.cgi



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46652) Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name

2024-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46652:
-

Assignee: Dongjoon Hyun

> Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name
> --
>
> Key: SPARK-46652
> URL: https://issues.apache.org/jira/browse/SPARK-46652
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46652) Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name

2024-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46652.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44657
[https://github.com/apache/spark/pull/44657]

> Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name
> --
>
> Key: SPARK-46652
> URL: https://issues.apache.org/jira/browse/SPARK-46652
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46650) Replace AtomicBoolean with volatile boolean

2024-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46650:
--

Assignee: (was: Apache Spark)

> Replace AtomicBoolean with volatile boolean
> ---
>
> Key: SPARK-46650
> URL: https://issues.apache.org/jira/browse/SPARK-46650
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
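For context, a minimal sketch of the kind of change the summary suggests, assuming 
the flag in question is only ever set and read (never compare-and-set); the class 
and member names below are illustrative, not actual Spark code:
```scala
import java.util.concurrent.atomic.AtomicBoolean

// Before: an AtomicBoolean used only as a simple stop flag (no compareAndSet).
class StopperWithAtomic {
  private val stopped = new AtomicBoolean(false)
  def stop(): Unit = stopped.set(true)
  def isStopped: Boolean = stopped.get()
}

// After: a volatile boolean is enough when the flag is only written and read.
class StopperWithVolatile {
  @volatile private var stopped = false
  def stop(): Unit = { stopped = true }
  def isStopped: Boolean = stopped
}
```
Where compareAndSet is actually needed for mutual exclusion, an AtomicBoolean 
would of course have to stay.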




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46635) Refine docstring of `from_csv/schema_of_csv/to_csv`

2024-01-10 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46635.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44639
[https://github.com/apache/spark/pull/44639]

> Refine docstring of `from_csv/schema_of_csv/to_csv`
> ---
>
> Key: SPARK-46635
> URL: https://issues.apache.org/jira/browse/SPARK-46635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46635) Refine docstring of `from_csv/schema_of_csv/to_csv`

2024-01-10 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-46635:


Assignee: Yang Jie

> Refine docstring of `from_csv/schema_of_csv/to_csv`
> ---
>
> Key: SPARK-46635
> URL: https://issues.apache.org/jira/browse/SPARK-46635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org