[jira] [Created] (SPARK-40662) Serialization of MapStatuses is sometimes much larger on Scala 2.13
Emil Ejbyfeldt created SPARK-40662: -- Summary: Serialization of MapStatuses is sometimes much larger on Scala 2.13 Key: SPARK-40662 URL: https://issues.apache.org/jira/browse/SPARK-40662 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.0 Reporter: Emil Ejbyfeldt We have observed a case where the same job run against Spark on Scala 2.13 fails, going out of memory because the broadcast for the MapStatuses is huge. In the logs around the time the job fails, it tries to create a broadcast of size 4.8 GiB. ``` 2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 4.8 GiB, free 12.9 GiB) ``` The same broadcast of the MapStatuses for the same job running on 2.12 is 391.5 MiB: ``` 2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 391.5 MiB, free 26.4 GiB) ``` So in this particular case the broadcast for the MapStatuses is more than 10x larger when using 2.13. This is not universal for all MapStatus broadcasts, as we have many other jobs using Scala 2.13 where the broadcast is roughly the same size. This has been observed on 3.3.0, but I also tested against 3.3.1-rc2 and a build of 3.4.0-SNAPSHOT, and both of those also reproduced the issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
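[This is not Spark code. As a hedged illustration of the kind of check used to diagnose a regression like the one above, the sketch below measures serialized size with Python's pickle and shows how the same logical data can serialize to very different sizes depending on the concrete container type, analogous to how a different internal collection representation in Scala 2.13 can inflate a broadcast of the same MapStatuses.]

```python
import pickle

def serialized_size(obj) -> int:
    """Size in bytes of the pickled representation of obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

# Same logical mapping index -> value, held in two different containers.
as_list = list(range(100_000))             # index is implicit in the position
as_dict = {i: i for i in range(100_000)}   # index stored explicitly as a key

print("list:", serialized_size(as_list), "bytes")
print("dict:", serialized_size(as_dict), "bytes")
# The dict serializes every key as well as every value, so it is much larger,
# even though both containers carry the same information.
```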
[jira] [Assigned] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40660: --- Assignee: Yuming Wang > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40660. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38106 [https://github.com/apache/spark/pull/38106] > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0 > > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40587) SELECT * shouldn't be empty project list in proto.
[ https://issues.apache.org/jira/browse/SPARK-40587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40587: --- Assignee: Rui Wang > SELECT * shouldn't be empty project list in proto. > -- > > Key: SPARK-40587 > URL: https://issues.apache.org/jira/browse/SPARK-40587 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > > The current proto uses an empty project list for `SELECT *`. However, this is > implicit, making it hard to differentiate `not set` from `set but empty`. For > longer-term proto compatibility, we should always use explicit fields for > passing through information.
[jira] [Resolved] (SPARK-40587) SELECT * shouldn't be empty project list in proto.
[ https://issues.apache.org/jira/browse/SPARK-40587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40587. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38023 [https://github.com/apache/spark/pull/38023] > SELECT * shouldn't be empty project list in proto. > -- > > Key: SPARK-40587 > URL: https://issues.apache.org/jira/browse/SPARK-40587 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > > The current proto uses an empty project list for `SELECT *`. However, this is > implicit, making it hard to differentiate `not set` from `set but empty`. For > longer-term proto compatibility, we should always use explicit fields for > passing through information.
[jira] [Commented] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
[ https://issues.apache.org/jira/browse/SPARK-40659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612849#comment-17612849 ] Raghu Angadi commented on SPARK-40659: -- For 1): No schema evolution is necessary. We keep reading old and latest messages. For 2): Schema evolution is for this case, so that we don't drop fields. Say a streaming application reads from Kafka and writes all the fields to a delta table. This pipeline keeps running for a long time. Meanwhile the customer adds a new field 'zip_code' to the schema. What should happen? * (a) Without schema evolution: the 'zip_code' field would be dropped and would not appear in the destination table. * (b) With schema evolution: we create a new column 'zip_code' and populate the column. We want (b). In terms of implementation, if we throw a specific error, Structured Streaming stops the pipeline and restarts it, which will fetch the new schema and handle 'zip_code' correctly. > Schema evolution for protobuf (and Avro too?) > - > > Key: SPARK-40659 > URL: https://issues.apache.org/jira/browse/SPARK-40659 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Protobuf & Avro should support schema evolution in streaming. We need to > throw a specific error message when we detect a newer version of the schema > in the schema registry. > A couple of options for detecting a version change at runtime: > * How do we detect a newer version from the schema registry? It is contacted only > during planning currently. > * We could detect the version id in incoming messages. > ** What if the id in the incoming message is newer than what our > schema-registry reports after the restart? > *** This indicates delayed syncs between the customer's schema-registry servers > (should be rare). We can keep erroring out until it is fixed. > *** Make sure we log the schema id used during planning.
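[A hedged sketch, not the proposed implementation: the restart-on-schema-change control flow described in the comment above, with a hypothetical in-memory registry dict standing in for a real schema registry. On a version bump the pipeline fails with a specific error, and the restart re-plans against the latest schema so new fields like 'zip_code' are picked up instead of silently dropped.]

```python
class SchemaChangedError(Exception):
    """Specific error raised when a newer schema version is detected mid-stream."""

def run_pipeline(registry: dict, batches: list) -> list:
    """Process batches against the schema version fetched at planning time."""
    planned_version = registry["version"]  # schema is fetched during planning only
    processed = []
    for batch in batches:
        if batch["schema_version"] > planned_version:
            # Fail the query with a specific error instead of dropping new fields.
            raise SchemaChangedError(batch["schema_version"])
        processed.append(batch)
    return processed

def run_with_restarts(registry: dict, batches: list, max_restarts: int = 3) -> list:
    """Restart the pipeline on schema change, refetching the schema each time."""
    for _ in range(max_restarts):
        try:
            return run_pipeline(registry, batches)
        except SchemaChangedError as e:
            registry["version"] = e.args[0]  # restart picks up the newer schema
    raise RuntimeError("schema registry kept changing")
```

Under these assumptions a batch carrying schema version 2 aborts the first run, and the second run plans against version 2 and succeeds.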
[jira] [Commented] (SPARK-40651) Drop Hadoop2 binary distribution from release process
[ https://issues.apache.org/jira/browse/SPARK-40651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612845#comment-17612845 ] Yang Jie commented on SPARK-40651: -- Is there an overall removal plan? What can I do to help? > Drop Hadoop2 binary distribution from release process > - > > Key: SPARK-40651 > URL: https://issues.apache.org/jira/browse/SPARK-40651 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Assigned] (SPARK-40661) Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914
[ https://issues.apache.org/jira/browse/SPARK-40661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40661: Assignee: (was: Apache Spark) > Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914 > -- > > Key: SPARK-40661 > URL: https://issues.apache.org/jira/browse/SPARK-40661 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40661) Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914
[ https://issues.apache.org/jira/browse/SPARK-40661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40661: Assignee: Apache Spark > Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914 > -- > > Key: SPARK-40661 > URL: https://issues.apache.org/jira/browse/SPARK-40661 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40661) Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914
[ https://issues.apache.org/jira/browse/SPARK-40661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612840#comment-17612840 ] Apache Spark commented on SPARK-40661: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38107 > Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914 > -- > > Key: SPARK-40661 > URL: https://issues.apache.org/jira/browse/SPARK-40661 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40661) Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914
BingKun Pan created SPARK-40661: --- Summary: Upgrade `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914 Key: SPARK-40661 URL: https://issues.apache.org/jira/browse/SPARK-40661 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612839#comment-17612839 ] Apache Spark commented on SPARK-40660: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/38106 > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40660: Assignee: (was: Apache Spark) > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612838#comment-17612838 ] Apache Spark commented on SPARK-40660: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/38106 > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40660: Assignee: Apache Spark > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40645) Throw exception for Collect() and recommend to use toPandas()
[ https://issues.apache.org/jira/browse/SPARK-40645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40645: Assignee: Rui Wang > Throw exception for Collect() and recommend to use toPandas() > - > > Key: SPARK-40645 > URL: https://issues.apache.org/jira/browse/SPARK-40645 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > > The current Connect `Collect()` returns a Pandas DataFrame, which does not match > the PySpark DataFrame API: > https://github.com/apache/spark/blob/ceb8527413288b4d5c54d3afd76d00c9e26817a1/python/pyspark/sql/connect/data_frame.py#L227. > The underlying implementation has been generating a Pandas DataFrame, though. In > this case, we can choose to use `toPandas()` and throw an exception for > `Collect()`.
[jira] [Resolved] (SPARK-40645) Throw exception for Collect() and recommend to use toPandas()
[ https://issues.apache.org/jira/browse/SPARK-40645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40645. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38089 [https://github.com/apache/spark/pull/38089] > Throw exception for Collect() and recommend to use toPandas() > - > > Key: SPARK-40645 > URL: https://issues.apache.org/jira/browse/SPARK-40645 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > > The current Connect `Collect()` returns a Pandas DataFrame, which does not match > the PySpark DataFrame API: > https://github.com/apache/spark/blob/ceb8527413288b4d5c54d3afd76d00c9e26817a1/python/pyspark/sql/connect/data_frame.py#L227. > The underlying implementation has been generating a Pandas DataFrame, though. In > this case, we can choose to use `toPandas()` and throw an exception for > `Collect()`.
[jira] [Updated] (SPARK-40660) Switch to XORShiftRandom to distribute elements
[ https://issues.apache.org/jira/browse/SPARK-40660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40660: Summary: Switch to XORShiftRandom to distribute elements (was: Switch XORShiftRandom to distribute elements) > Switch to XORShiftRandom to distribute elements > --- > > Key: SPARK-40660 > URL: https://issues.apache.org/jira/browse/SPARK-40660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > import java.util.Random > import org.apache.spark.util.random.XORShiftRandom > import scala.util.hashing > def distribution(count: Int, partition: Int) = { > println((1 to count).map(partitionId => new > Random(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > Random(hashing.byteswap32(partitionId)).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > println((1 to count).map(partitionId => new > XORShiftRandom(partitionId).nextInt(partition)) > .groupBy(f => f) > .map(_._2.size).mkString(". ")) > } > distribution(200, 4) > {code} > {noformat} > 200 > 50. 60. 46. 44 > 55. 48. 43. 54 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40660) Switch XORShiftRandom to distribute elements
Yuming Wang created SPARK-40660: --- Summary: Switch XORShiftRandom to distribute elements Key: SPARK-40660 URL: https://issues.apache.org/jira/browse/SPARK-40660 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Yuming Wang {code:scala} import java.util.Random import org.apache.spark.util.random.XORShiftRandom import scala.util.hashing def distribution(count: Int, partition: Int) = { println((1 to count).map(partitionId => new Random(partitionId).nextInt(partition)) .groupBy(f => f) .map(_._2.size).mkString(". ")) println((1 to count).map(partitionId => new Random(hashing.byteswap32(partitionId)).nextInt(partition)) .groupBy(f => f) .map(_._2.size).mkString(". ")) println((1 to count).map(partitionId => new XORShiftRandom(partitionId).nextInt(partition)) .groupBy(f => f) .map(_._2.size).mkString(". ")) } distribution(200, 4) {code} {noformat} 200 50. 60. 46. 44 55. 48. 43. 54 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
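[The Scala experiment above can be reproduced outside the JVM. Below is my own Python reimplementation (not Spark or JDK code) of `java.util.Random`'s seeding plus the power-of-two branch of `nextInt`, and of `scala.util.hashing.byteswap32`, showing why sequential raw seeds collapse into a single partition (the "200" line in the output above) while byteswapped seeds spread across all four. The `XORShiftRandom` variant is omitted here.]

```python
MASK48 = (1 << 48) - 1
MULT = 0x5DEECE66D  # java.util.Random's LCG multiplier

def java_first_next_int(seed: int, bound: int) -> int:
    """First draw of java.util.Random(seed).nextInt(bound), bound a power of two."""
    s = (seed ^ MULT) & MASK48            # Random(seed) scrambles the raw seed
    s = (s * MULT + 0xB) & MASK48         # one LCG step (next(31))
    return (bound * (s >> 17)) >> 31      # power-of-two branch of nextInt

def byteswap32(v: int) -> int:
    """scala.util.hashing.byteswap32: multiply, reverse bytes, multiply again."""
    hc = (v * 0x9E3775CD) & 0xFFFFFFFF
    hc = int.from_bytes(hc.to_bytes(4, "big"), "little")  # Integer.reverseBytes
    hc = (hc * 0x9E3775CD) & 0xFFFFFFFF
    return hc - (1 << 32) if hc >= (1 << 31) else hc      # back to a signed Int

def distribution(count: int, partitions: int, key=lambda s: s) -> dict:
    """How many of the seeds 1..count land in each partition."""
    buckets = {}
    for seed in range(1, count + 1):
        p = java_first_next_int(key(seed), partitions)
        buckets[p] = buckets.get(p, 0) + 1
    return buckets

print(distribution(200, 4))                  # raw sequential seeds: one bucket
print(distribution(200, 4, key=byteswap32))  # byteswapped seeds: all four buckets
```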
[jira] [Updated] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-39725: -- Fix Version/s: 3.3.2 > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612831#comment-17612831 ] Dongjoon Hyun commented on SPARK-39725: --- This landed to branch-3.3 via [https://github.com/apache/spark/pull/38098] > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40281) Memory Profiler on Executors
[ https://issues.apache.org/jira/browse/SPARK-40281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40281: - Description: Profiling is critical to performance engineering. Memory consumption is a key indicator of how efficient a PySpark program is. There is an existing effort on memory profiling of Python programs, Memory Profiler ([https://pypi.org/project/memory-profiler/]). PySpark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in the driver program. On the driver side, PySpark is a regular Python process; thus, we can profile it as a normal Python program using Memory Profiler. However, on the executor side, we are missing such a memory profiler. Since executors are distributed on different nodes in the cluster, we need to aggregate profiles. Furthermore, Python worker processes are spawned per executor for Python/Pandas UDF execution, which makes memory profiling more intricate. The umbrella proposes to implement a Memory Profiler on Executors. was: Profiling is critical to performance engineering. Memory consumption is a key indicator of how efficient a PySpark program is. There is an existing effort on memory profiling of Python programs, Memory Profiler ([https://pypi.org/project/memory-profiler/]). PySpark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in the driver program. On the driver side, PySpark is a regular Python process; thus, we can profile it as a normal Python program using Memory Profiler. However, on the executor side, we are missing such a memory profiler. Since executors are distributed on different nodes in the cluster, we need to need to aggregate profiles. Furthermore, Python worker processes are spawned per executor for Python/Pandas UDF execution, which makes memory profiling more intricate. The umbrella proposes to implement a Memory Profiler on Executors. > Memory Profiler on Executors > > > Key: SPARK-40281 > URL: https://issues.apache.org/jira/browse/SPARK-40281 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Profiling is critical to performance engineering. Memory consumption is a key > indicator of how efficient a PySpark program is. There is an existing effort > on memory profiling of Python programs, Memory Profiler > ([https://pypi.org/project/memory-profiler/]). > PySpark applications run as independent sets of processes on a cluster, > coordinated by the SparkContext object in the driver program. On the driver > side, PySpark is a regular Python process; thus, we can profile it as a > normal Python program using Memory Profiler. > However, on the executor side, we are missing such a memory profiler. Since > executors are distributed on different nodes in the cluster, we need to > aggregate profiles. Furthermore, Python worker processes are spawned per > executor for Python/Pandas UDF execution, which makes memory > profiling more intricate. > The umbrella proposes to implement a Memory Profiler on Executors.
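[Memory Profiler, linked above, is the tool named for the driver side. As a dependency-free illustration of the kind of per-step memory measurement the umbrella wants on executors, here is my own sketch using Python's stdlib tracemalloc; it is not part of the proposal, and `build_rows` is a hypothetical stand-in for a UDF body.]

```python
import tracemalloc

def profile_step(fn, *args):
    """Run fn and report its peak incremental memory use in bytes."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start()
    finally:
        tracemalloc.stop()
    return result, peak

def build_rows(n: int) -> list:
    # Stand-in for a Python/Pandas UDF body executing in a worker process.
    return [(i, str(i)) for i in range(n)]

rows, peak = profile_step(build_rows, 100_000)
print(f"peak ~{peak / 1e6:.1f} MB for {len(rows)} rows")
```

In a real executor-side profiler, numbers like `peak` would have to be collected per Python worker and aggregated across the cluster, which is exactly the intricacy the description calls out.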
[jira] [Resolved] (SPARK-40428) Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources during abnormal shutdown
[ https://issues.apache.org/jira/browse/SPARK-40428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-40428. -- Fix Version/s: 3.4.0 Assignee: Holden Karau Resolution: Fixed > Add a shutdownhook to CoarseGrained scheduler to avoid dangling resources > during abnormal shutdown > -- > > Key: SPARK-40428 > URL: https://issues.apache.org/jira/browse/SPARK-40428 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.4.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Minor > Fix For: 3.4.0 > > > Add a shutdown hook in the CoarseGrainedSchedulerBackend to call stop > since we've got zombie pods hanging around since the resource tie isn't > perfect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
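[The fix above lives in Spark's CoarseGrainedSchedulerBackend. As a general, hedged illustration of the pattern (register a hook so stop() also runs on abnormal shutdown, and make stop() idempotent so the hook plus a normal stop is harmless), here is a toy Python sketch using atexit in place of a JVM shutdown hook; the class and fields are hypothetical.]

```python
import atexit
import threading

class ToySchedulerBackend:
    """Toy backend holding external resources (think: executor pods)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = False
        self.released = 0
        # Ensure stop() also runs if the process exits without a clean stop,
        # analogous to registering a JVM shutdown hook.
        atexit.register(self.stop)

    def stop(self):
        with self._lock:
            if self._stopped:      # idempotent: hook + explicit stop is safe
                return
            self._stopped = True
            self.released += 1     # release pods/resources exactly once

backend = ToySchedulerBackend()
backend.stop()   # normal shutdown path
backend.stop()   # second call (e.g. the registered hook) is a no-op
```

The idempotence guard matters: without it, the shutdown hook firing after a normal stop would double-release resources.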
[jira] [Commented] (SPARK-40540) Migrate compilation errors onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612829#comment-17612829 ] Apache Spark commented on SPARK-40540: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38104 > Migrate compilation errors onto error classes > - > > Key: SPARK-40540 > URL: https://issues.apache.org/jira/browse/SPARK-40540 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Use temporary error classes in the compilation exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40585) Support double-quoted identifiers
[ https://issues.apache.org/jira/browse/SPARK-40585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-40585. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38022 [https://github.com/apache/spark/pull/38022] > Support double-quoted identifiers > - > > Key: SPARK-40585 > URL: https://issues.apache.org/jira/browse/SPARK-40585 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Fix For: 3.4.0 > > > In many SQL dialects, identifiers can be unquoted or quoted with double quotes. > In Spark, double-quoted literals are strings. > In this proposal we allow for a config: > double_quoted_identifiers > which, when set, switches the interpretation from string to identifier. > Note that backticks are still allowed. > Also the treatment of escapes is not changed as part of this work.
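[A toy classifier, not Spark's lexer, sketching the semantics the proposal describes: backticks always delimit identifiers, while double quotes flip between string literal and identifier under the config. The config name `double_quoted_identifiers` is taken from the issue text; the final Spark config key may differ.]

```python
def classify(token: str, double_quoted_identifiers: bool = False) -> tuple:
    """Classify a quoted SQL token as ('string', body) or ('identifier', body)."""
    if token.startswith("`") and token.endswith("`"):
        return ("identifier", token[1:-1])     # backticks: always identifiers
    if token.startswith('"') and token.endswith('"'):
        # The config switches double quotes from string literal to identifier.
        kind = "identifier" if double_quoted_identifiers else "string"
        return (kind, token[1:-1])
    return ("identifier", token)               # unquoted: a plain identifier

print(classify('"col"'))                                  # ('string', 'col')
print(classify('"col"', double_quoted_identifiers=True))  # ('identifier', 'col')
print(classify('`col`'))                                  # ('identifier', 'col')
```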
[jira] [Assigned] (SPARK-40585) Support double-quoted identifiers
[ https://issues.apache.org/jira/browse/SPARK-40585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-40585: -- Assignee: Serge Rielau > Support double-quoted identifiers > - > > Key: SPARK-40585 > URL: https://issues.apache.org/jira/browse/SPARK-40585 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > > In many SQL dialects, identifiers can be unquoted or quoted with double quotes. > In Spark, double-quoted literals imply strings. > In this proposal we allow for a config: > double_quoted_identifiers > which, when set, switches the interpretation from string to identifier. > Note that back ticks are still allowed. > Also, the treatment of escapes is not changed as part of this work. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40617) Assertion failed in ExecutorMetricsPoller "task count shouldn't below 0"
[ https://issues.apache.org/jira/browse/SPARK-40617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-40617: --- Fix Version/s: 3.2.0 > Assertion failed in ExecutorMetricsPoller "task count shouldn't below 0" > > > Key: SPARK-40617 > URL: https://issues.apache.org/jira/browse/SPARK-40617 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.2.0, 3.4.0, 3.3.1 > > > Spurious failures because of the assert: > {noformat} > 22/09/29 09:46:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in > thread Thread[Executor task launch worker for task 3063.0 in stage 1997.0 > (TID 677249),5,main] > java.lang.AssertionError: assertion failed: task count shouldn't below 0 > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.executor.ExecutorMetricsPoller.decrementCount$1(ExecutorMetricsPoller.scala:130) > at > org.apache.spark.executor.ExecutorMetricsPoller.$anonfun$onTaskCompletion$3(ExecutorMetricsPoller.scala:135) > at > java.base/java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1822) > at > org.apache.spark.executor.ExecutorMetricsPoller.onTaskCompletion(ExecutorMetricsPoller.scala:135) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:737) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > 22/09/29 09:46:24 INFO MemoryStore: MemoryStore cleared > 22/09/29 09:46:24 INFO BlockManager: BlockManager stopped > 22/09/29 09:46:24 INFO ShutdownHookManager: Shutdown hook called > 22/09/29 09:46:24 INFO ShutdownHookManager: Deleting directory > 
/mnt/yarn/usercache/hadoop/appcache/application_1664443624160_0001/spark-93efc2d4-84de-494b-a3b7-2cb1c3a45426 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
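The stack trace above shows the assertion firing inside a `computeIfPresent` that decrements a per-stage task count. A hypothetical simplification of that bookkeeping (not the actual ExecutorMetricsPoller code) shows a defensive variant that clamps at zero instead of asserting, so a spurious double-completion cannot drive the count negative:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch of per-stage task counting similar in spirit to what the
// stack trace points at; names and structure are illustrative only.
val taskCounts = new ConcurrentHashMap[(Int, Int), AtomicLong]() // (stageId, attemptId) -> running tasks

def onTaskStart(stage: (Int, Int)): Unit =
  taskCounts.computeIfAbsent(stage, _ => new AtomicLong(0)).incrementAndGet()

def onTaskCompletion(stage: (Int, Int)): Unit =
  taskCounts.computeIfPresent(stage, (_, count) => {
    // Clamp instead of asserting "task count shouldn't below 0".
    if (count.get() > 0) count.decrementAndGet()
    count
  })
```

`computeIfPresent` runs the remapping function atomically for the key, which is why the decrement and the zero guard can live together without extra locking.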
[jira] [Commented] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
[ https://issues.apache.org/jira/browse/SPARK-40659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612811#comment-17612811 ] Mohan Parthasarathy commented on SPARK-40659: - [~rangadi] A few clarifications. I am trying to understand the conditions under which the error is thrown. Using Confluent schema registry terminology, let's take a couple of examples: 1) BACKWARD: Assuming the schema has evolved as per the rules, a consumer using the latest schema can read messages written with both the old and the latest schema. 2) FORWARD: Similarly, a consumer using the older schema can read messages written with a later schema; it would just ignore the new fields. In these cases, it will continue to work. Why would we throw an error in these cases? What other cases need an error to be thrown? Could you elaborate? > Schema evolution for protobuf (and Avro too?) > - > > Key: SPARK-40659 > URL: https://issues.apache.org/jira/browse/SPARK-40659 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Protobuf & Avro should support schema evolution in streaming. We need to > throw a specific error message when we detect a newer version of the schema > in the schema registry. > A couple of options for detecting a version change at runtime: > * How do we detect a newer version from the schema registry? It is contacted only > during planning currently. > * We could detect the version id in incoming messages. > ** What if the id in the incoming message is newer than what our > schema-registry reports after the restart? > *** This indicates delayed syncs between the customer's schema-registry servers > (should be rare). We can keep erroring out until it is fixed. > *** Make sure we log the schema id used during planning. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
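The second option discussed above, detecting the version id in incoming messages, could be sketched as below. Confluent's wire format frames each message as one magic byte (0) followed by a 4-byte big-endian schema id and then the payload; the error-handling policy here is illustrative, not the issue's final design:

```scala
import java.nio.ByteBuffer

// Confluent wire format: 1 magic byte (0) + 4-byte big-endian schema id + payload.
def schemaIdOf(message: Array[Byte]): Int = {
  require(message.length >= 5 && message(0) == 0, "not a Confluent-framed message")
  ByteBuffer.wrap(message, 1, 4).getInt
}

// Hypothetical runtime check: compare against the id captured during planning,
// which the issue suggests logging for exactly this purpose.
def checkSchemaVersion(message: Array[Byte], planningTimeId: Int): Unit = {
  val id = schemaIdOf(message)
  if (id != planningTimeId)
    throw new IllegalStateException(
      s"Message uses schema id $id but the query was planned with id $planningTimeId; " +
        "restart the query to pick up the evolved schema")
}
```

This check works without contacting the registry per record, which matters since the registry is only consulted during planning.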
[jira] [Created] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
Raghu Angadi created SPARK-40659: Summary: Schema evolution for protobuf (and Avro too?) Key: SPARK-40659 URL: https://issues.apache.org/jira/browse/SPARK-40659 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi Protobuf & Avro should support schema evolution in streaming. We need to throw a specific error message when we detect a newer version of the schema in the schema registry. A couple of options for detecting a version change at runtime: * How do we detect a newer version from the schema registry? It is contacted only during planning currently. * We could detect the version id in incoming messages. ** What if the id in the incoming message is newer than what our schema-registry reports after the restart? *** This indicates delayed syncs between the customer's schema-registry servers (should be rare). We can keep erroring out until it is fixed. *** Make sure we log the schema id used during planning. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40658) Protobuf v2 & v3 support
Raghu Angadi created SPARK-40658: Summary: Protobuf v2 & v3 support Key: SPARK-40658 URL: https://issues.apache.org/jira/browse/SPARK-40658 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi We want to ensure Protobuf functions support both Protobuf version 2 and version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40657) Add support for compiled classes (Java classes)
Raghu Angadi created SPARK-40657: Summary: Add support for compiled classes (Java classes) Key: SPARK-40657 URL: https://issues.apache.org/jira/browse/SPARK-40657 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi For some users, it is more convenient to provide compiled classes rather than a descriptor file. We can support Java compiled classes. Python could also use the same, since all the processing happens in Scala. Supporting Python compiled classes is out of scope for this; it is not clear how well we could support that, short of using a Python UDF. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40654) Protobuf support MVP with descriptor files
[ https://issues.apache.org/jira/browse/SPARK-40654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40654: Assignee: Apache Spark > Protobuf support MVP with descriptor files > -- > > Key: SPARK-40654 > URL: https://issues.apache.org/jira/browse/SPARK-40654 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Assignee: Apache Spark >Priority: Major > > This is the MVP implementation of protobuf support with descriptor files. > Currently in PR https://github.com/apache/spark/pull/37972 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40654) Protobuf support MVP with descriptor files
[ https://issues.apache.org/jira/browse/SPARK-40654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40654: Assignee: (was: Apache Spark) > Protobuf support MVP with descriptor files > -- > > Key: SPARK-40654 > URL: https://issues.apache.org/jira/browse/SPARK-40654 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > This is the MVP implementation of protobuf support with descriptor files. > Currently in PR https://github.com/apache/spark/pull/37972 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40654) Protobuf support MVP with descriptor files
[ https://issues.apache.org/jira/browse/SPARK-40654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612802#comment-17612802 ] Apache Spark commented on SPARK-40654: -- User 'SandishKumarHN' has created a pull request for this issue: https://github.com/apache/spark/pull/37972 > Protobuf support MVP with descriptor files > -- > > Key: SPARK-40654 > URL: https://issues.apache.org/jira/browse/SPARK-40654 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > This is the MVP implementation of protobuf support with descriptor files. > Currently in PR https://github.com/apache/spark/pull/37972 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40655) Protobuf functions in Python
[ https://issues.apache.org/jira/browse/SPARK-40655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40655: Assignee: (was: Apache Spark) > Protobuf functions in Python > - > > Key: SPARK-40655 > URL: https://issues.apache.org/jira/browse/SPARK-40655 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Add Python support for Protobuf functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40648: - Assignee: Yang Jie (was: Apache Spark) > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > SPARK-40490 made the test cases related to `YarnShuffleIntegrationSuite` > verify the registeredExecFile reload scenario again, so we need > to add `@ExtendedLevelDBTest` to the test cases using LevelDB so that > `macOS/Apple Silicon` can skip the relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40656) Schema-registry support for Protobuf format
Raghu Angadi created SPARK-40656: Summary: Schema-registry support for Protobuf format Key: SPARK-40656 URL: https://issues.apache.org/jira/browse/SPARK-40656 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi Add support for reading protobuf schema (definition) from Confluent schema-registry. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40648: -- Fix Version/s: 3.3.2 > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > SPARK-40490 made the test cases related to `YarnShuffleIntegrationSuite` > verify the registeredExecFile reload scenario again, so we need > to add `@ExtendedLevelDBTest` to the test cases using LevelDB so that > `macOS/Apple Silicon` can skip the relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612799#comment-17612799 ] Dongjoon Hyun commented on SPARK-40648: --- This landed on branch-3.3 via [https://github.com/apache/spark/pull/38096] > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > SPARK-40490 made the test cases related to `YarnShuffleIntegrationSuite` > verify the registeredExecFile reload scenario again, so we need > to add `@ExtendedLevelDBTest` to the test cases using LevelDB so that > `macOS/Apple Silicon` can skip the relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40655) Protobuf functions in Python
[ https://issues.apache.org/jira/browse/SPARK-40655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612800#comment-17612800 ] Apache Spark commented on SPARK-40655: -- User 'SandishKumarHN' has created a pull request for this issue: https://github.com/apache/spark/pull/38100 > Protobuf functions in Python > - > > Key: SPARK-40655 > URL: https://issues.apache.org/jira/browse/SPARK-40655 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Add Python support for Protobuf functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40655) Protobuf functions in Python
[ https://issues.apache.org/jira/browse/SPARK-40655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40655: Assignee: Apache Spark > Protobuf functions in Python > - > > Key: SPARK-40655 > URL: https://issues.apache.org/jira/browse/SPARK-40655 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Assignee: Apache Spark >Priority: Major > > Add Python support for Protobuf functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40655) Protobuf functions in Python
Raghu Angadi created SPARK-40655: Summary: Protobuf functions in Python Key: SPARK-40655 URL: https://issues.apache.org/jira/browse/SPARK-40655 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi Add Python support for Protobuf functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40654) Protobuf support MVP with descriptor files
Raghu Angadi created SPARK-40654: Summary: Protobuf support MVP with descriptor files Key: SPARK-40654 URL: https://issues.apache.org/jira/browse/SPARK-40654 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi This is the MVP implementation of protobuf support with descriptor files. Currently in PR https://github.com/apache/spark/pull/37972 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40648. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38095 [https://github.com/apache/spark/pull/38095] > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > > SPARK-40490 made the test cases related to `YarnShuffleIntegrationSuite` > verify the registeredExecFile reload scenario again, so we need > to add `@ExtendedLevelDBTest` to the test cases using LevelDB so that > `macOS/Apple Silicon` can skip the relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40652) Add MASK_PHONE and TRY_MASK_PHONE functions
[ https://issues.apache.org/jira/browse/SPARK-40652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612796#comment-17612796 ] Apache Spark commented on SPARK-40652: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/38101 > Add MASK_PHONE and TRY_MASK_PHONE functions > --- > > Key: SPARK-40652 > URL: https://issues.apache.org/jira/browse/SPARK-40652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40652) Add MASK_PHONE and TRY_MASK_PHONE functions
[ https://issues.apache.org/jira/browse/SPARK-40652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40652: Assignee: (was: Apache Spark) > Add MASK_PHONE and TRY_MASK_PHONE functions > --- > > Key: SPARK-40652 > URL: https://issues.apache.org/jira/browse/SPARK-40652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40636) Fix wrong remained shuffles log in BlockManagerDecommissioner
[ https://issues.apache.org/jira/browse/SPARK-40636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40636: - Assignee: Zhongwei Zhu > Fix wrong remained shuffles log in BlockManagerDecommissioner > - > > Key: SPARK-40636 > URL: https://issues.apache.org/jira/browse/SPARK-40636 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.3.0 >Reporter: Zhongwei Zhu >Assignee: Zhongwei Zhu >Priority: Minor > > BlockManagerDecommissioner should log correct remained shuffles. > {code:java} > 4 of 24 local shuffles are added. In total, 24 shuffles are remained. > 2022-09-30 17:42:15.035 PDT > 0 of 24 local shuffles are added. In total, 24 shuffles are remained. > 2022-09-30 17:42:45.069 PDT > 0 of 24 local shuffles are added. In total, 24 shuffles are remained.{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
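The log lines quoted in the description keep reporting the full total as "remained" even as shuffles are migrated; the remaining count should be the total minus what has already been added. A hypothetical sketch of the corrected bookkeeping (not the actual BlockManagerDecommissioner code):

```scala
// Hypothetical sketch: track already-migrated shuffles so the "remained"
// figure actually shrinks between log lines.
var migrated = Set.empty[Int]

def migrationLogLine(allShuffles: Set[Int], newlyAdded: Set[Int]): String = {
  migrated ++= newlyAdded
  val remained = allShuffles.size - migrated.size
  s"${newlyAdded.size} of ${allShuffles.size} local shuffles are added. " +
    s"In total, $remained shuffles are remained."
}
```

With this, the sequence from the description would log 20 remaining after the first 4 are added, instead of repeating 24.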
[jira] [Assigned] (SPARK-40652) Add MASK_PHONE and TRY_MASK_PHONE functions
[ https://issues.apache.org/jira/browse/SPARK-40652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40652: Assignee: Apache Spark > Add MASK_PHONE and TRY_MASK_PHONE functions > --- > > Key: SPARK-40652 > URL: https://issues.apache.org/jira/browse/SPARK-40652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40652) Add MASK_PHONE and TRY_MASK_PHONE functions
[ https://issues.apache.org/jira/browse/SPARK-40652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612795#comment-17612795 ] Apache Spark commented on SPARK-40652: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/38101 > Add MASK_PHONE and TRY_MASK_PHONE functions > --- > > Key: SPARK-40652 > URL: https://issues.apache.org/jira/browse/SPARK-40652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40636) Fix wrong remained shuffles log in BlockManagerDecommissioner
[ https://issues.apache.org/jira/browse/SPARK-40636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40636. --- Fix Version/s: 3.3.2 3.2.3 3.4.0 Resolution: Fixed Issue resolved by pull request 38078 [https://github.com/apache/spark/pull/38078] > Fix wrong remained shuffles log in BlockManagerDecommissioner > - > > Key: SPARK-40636 > URL: https://issues.apache.org/jira/browse/SPARK-40636 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.3.0 >Reporter: Zhongwei Zhu >Assignee: Zhongwei Zhu >Priority: Minor > Fix For: 3.3.2, 3.2.3, 3.4.0 > > > BlockManagerDecommissioner should log correct remained shuffles. > {code:java} > 4 of 24 local shuffles are added. In total, 24 shuffles are remained. > 2022-09-30 17:42:15.035 PDT > 0 of 24 local shuffles are added. In total, 24 shuffles are remained. > 2022-09-30 17:42:45.069 PDT > 0 of 24 local shuffles are added. In total, 24 shuffles are remained.{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40653) Protobuf Support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-40653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612793#comment-17612793 ] Raghu Angadi commented on SPARK-40653: -- cc: [~sanysand...@gmail.com] , [~mparthas] > Protobuf Support in Structured Streaming > > > Key: SPARK-40653 > URL: https://issues.apache.org/jira/browse/SPARK-40653 > Project: Spark > Issue Type: Epic > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Add support for Protobuf messages in streaming sources. This would be similar > to Avro format support. This includes features like schema-registry, Python > support, schema evolution, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40653) Protobuf Support in Structured Streaming
Raghu Angadi created SPARK-40653: Summary: Protobuf Support in Structured Streaming Key: SPARK-40653 URL: https://issues.apache.org/jira/browse/SPARK-40653 Project: Spark Issue Type: Epic Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Raghu Angadi Add support for Protobuf messages in streaming sources. This would be similar to Avro format support. This includes features like schema-registry, Python support, schema evolution, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40652) Add MASK_PHONE and TRY_MASK_PHONE functions
Daniel created SPARK-40652: -- Summary: Add MASK_PHONE and TRY_MASK_PHONE functions Key: SPARK-40652 URL: https://issues.apache.org/jira/browse/SPARK-40652 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
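The ticket carries no description, but by analogy with the rest of the MASK function family the function presumably redacts the digits of a phone number while preserving its formatting. A purely hypothetical sketch (the actual MASK_PHONE and TRY_MASK_PHONE semantics are defined by the linked PR, not by this code):

```scala
// Hypothetical illustration only: replace every digit, keep punctuation.
// Spark's real MASK_PHONE behavior is defined by the implementing PR.
def maskPhone(phone: String, mask: Char = '*'): String =
  phone.map(c => if (c.isDigit) mask else c)
```

A TRY_ variant would, following the usual TRY_ convention, return null instead of raising an error on invalid input.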
[jira] [Commented] (SPARK-40651) Drop Hadoop2 binary distribution from release process
[ https://issues.apache.org/jira/browse/SPARK-40651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612764#comment-17612764 ] Apache Spark commented on SPARK-40651: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38099 > Drop Hadoop2 binary distribution from release process > - > > Key: SPARK-40651 > URL: https://issues.apache.org/jira/browse/SPARK-40651 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40651) Drop Hadoop2 binary distribution from release process
[ https://issues.apache.org/jira/browse/SPARK-40651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612762#comment-17612762 ] Apache Spark commented on SPARK-40651: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38099 > Drop Hadoop2 binary distribution from release process > - > > Key: SPARK-40651 > URL: https://issues.apache.org/jira/browse/SPARK-40651 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40651) Drop Hadoop2 binary distribution from release process
[ https://issues.apache.org/jira/browse/SPARK-40651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40651: Assignee: Apache Spark > Drop Hadoop2 binary distribution from release process > - > > Key: SPARK-40651 > URL: https://issues.apache.org/jira/browse/SPARK-40651 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40651) Drop Hadoop2 binary distribution from release process
[ https://issues.apache.org/jira/browse/SPARK-40651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40651: Assignee: (was: Apache Spark) > Drop Hadoop2 binary distribution from release process > - > > Key: SPARK-40651 > URL: https://issues.apache.org/jira/browse/SPARK-40651 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40651) Drop Hadoop2 binary distribution from release process
Dongjoon Hyun created SPARK-40651: - Summary: Drop Hadoop2 binary distribution from release process Key: SPARK-40651 URL: https://issues.apache.org/jira/browse/SPARK-40651 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40650) Infer date type for Json schema inference
Xiaonan Yang created SPARK-40650: Summary: Infer date type for Json schema inference Key: SPARK-40650 URL: https://issues.apache.org/jira/browse/SPARK-40650 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Xiaonan Yang Fix For: 3.4.0 In ticket https://issues.apache.org/jira/browse/SPARK-39469, we introduced date type support in CSV schema inference. We want to introduce similar support for the JSON data source. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40649) Infer date type for Json schema inference
[ https://issues.apache.org/jira/browse/SPARK-40649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaonan Yang updated SPARK-40649: - Fix Version/s: (was: 3.4.0) > Infer date type for Json schema inference > - > > Key: SPARK-40649 > URL: https://issues.apache.org/jira/browse/SPARK-40649 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Xiaonan Yang >Assignee: Jonathan Cui >Priority: Major > > 1. If a column contains only dates, it should be of “date” type in the > inferred schema > * If the date format and the timestamp format are identical (e.g. both are > /mm/dd), entries will default to being interpreted as Date > 2. If a column contains dates and timestamps, it should be of “timestamp” > type in the inferred schema > > A similar issue was opened in the past but was reverted due to the lack of > strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40649) Infer date type for Json schema inference
[ https://issues.apache.org/jira/browse/SPARK-40649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaonan Yang resolved SPARK-40649. -- Resolution: Duplicate > Infer date type for Json schema inference > - > > Key: SPARK-40649 > URL: https://issues.apache.org/jira/browse/SPARK-40649 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Xiaonan Yang >Assignee: Jonathan Cui >Priority: Major > Fix For: 3.4.0 > > > 1. If a column contains only dates, it should be of “date” type in the > inferred schema > * If the date format and the timestamp format are identical (e.g. both are > /mm/dd), entries will default to being interpreted as Date > 2. If a column contains dates and timestamps, it should be of “timestamp” > type in the inferred schema > > A similar issue was opened in the past but was reverted due to the lack of > strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40649) Infer date type for Json schema inference
Xiaonan Yang created SPARK-40649: Summary: Infer date type for Json schema inference Key: SPARK-40649 URL: https://issues.apache.org/jira/browse/SPARK-40649 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.1 Reporter: Xiaonan Yang Assignee: Jonathan Cui Fix For: 3.4.0 1. If a column contains only dates, it should be of “date” type in the inferred schema * If the date format and the timestamp format are identical (e.g. both are /mm/dd), entries will default to being interpreted as Date 2. If a column contains dates and timestamps, it should be of “timestamp” type in the inferred schema A similar issue was opened in the past but was reverted due to the lack of strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
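The two inference rules above can be sketched as a small, self-contained Scala example. This is an illustrative model only, not Spark's actual JSON schema-inference code; the type names, the `yyyy-MM-dd` date pattern, and ISO-instant timestamps are assumptions made for the sketch.

```scala
import java.time.{Instant, LocalDate}
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateInferenceSketch {
  private val dateFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Classify a single JSON string value; strict parsing rejects trailing text,
  // so a full timestamp never matches the date pattern.
  def inferOne(v: String): String =
    if (Try(LocalDate.parse(v, dateFmt)).isSuccess) "date"
    else if (Try(Instant.parse(v)).isSuccess) "timestamp"
    else "string"

  // Rule 1: identical types stay as they are (all dates => "date").
  // Rule 2: a mix of dates and timestamps widens to "timestamp".
  // Anything else falls back to "string".
  def merge(a: String, b: String): String = (a, b) match {
    case (x, y) if x == y                              => x
    case ("date", "timestamp") | ("timestamp", "date") => "timestamp"
    case _                                             => "string"
  }

  def inferColumn(values: Seq[String]): String =
    values.map(inferOne).reduce(merge)
}
```

Under these assumptions, a column of only `"2022-09-03"`-style values infers as date, while mixing in `"2022-09-03T12:00:00Z"` widens the column to timestamp.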
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612727#comment-17612727 ] Apache Spark commented on SPARK-39725: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/38098 > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612711#comment-17612711 ] Bjørn Jørgensen commented on SPARK-39725: - I created a branch from 3.3, but it started to build against master. https://github.com/bjornjorgensen/spark/tree/3.3_jetty_48.xx > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612707#comment-17612707 ] Sean R. Owen commented on SPARK-39725: -- [~bjornjorgensen] well, it would need to be a change vs branch-3.3, which is already on 9.4.46: https://github.com/apache/spark/blob/branch-3.3/pom.xml#L136 But it's a simple change yes. > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725 ] Bjørn Jørgensen deleted comment on SPARK-39725: - was (Author: bjornjorgensen): [~srowen] LIke [this|https://github.com/bjornjorgensen/spark/tree/3.3-etty_48.v20220622] one? > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612704#comment-17612704 ] Bjørn Jørgensen commented on SPARK-39725: - [~srowen] Like [this|https://github.com/bjornjorgensen/spark/tree/3.3-etty_48.v20220622] one? > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612689#comment-17612689 ] Sean R. Owen commented on SPARK-39725: -- I don't know if this affects Spark, but I think it's fine to back-port this update to 3.3.x. [~bjornjorgensen] are you able to do that now? we could get it in for 3.3.1. Or I can. > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612680#comment-17612680 ] Apache Spark commented on SPARK-40648: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38097 > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612681#comment-17612681 ] phoebe chen commented on SPARK-39725: - Thanks(y) > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612679#comment-17612679 ] Apache Spark commented on SPARK-40648: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38097 > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612676#comment-17612676 ] Apache Spark commented on SPARK-40648: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38096 > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612674#comment-17612674 ] Apache Spark commented on SPARK-40648: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38096 > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40648: Assignee: Apache Spark > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612653#comment-17612653 ] Apache Spark commented on SPARK-40648: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38095 > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40648: Assignee: Apache Spark > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40648: Assignee: (was: Apache Spark) > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40648: - Summary: Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module (was: Add `@ExtendedLevelDBTest` to the testing leveldb in the yarn module) > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40648) Add `@ExtendedLevelDBTest` to the testing leveldb in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40648: - Summary: Add `@ExtendedLevelDBTest` to the testing leveldb in the yarn module (was: Add `@ExtendedLevelDBTest` for the case of testing leveldb in the yarn module) > Add `@ExtendedLevelDBTest` to the testing leveldb in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Priority: Major > > SPARK-40490 make the test case related to `YarnShuffleIntegrationSuite` > starts to verify the registeredExecFile reload test scenario again,so we need > to add `@ExtendedLevelDBTest` for the test case using LevelDB so that the > `MacOs/Apple Silicon` can skip relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40648) Add `@ExtendedLevelDBTest` for the case of testing leveldb in the yarn module
Yang Jie created SPARK-40648: Summary: Add `@ExtendedLevelDBTest` for the case of testing leveldb in the yarn module Key: SPARK-40648 URL: https://issues.apache.org/jira/browse/SPARK-40648 Project: Spark Issue Type: Improvement Components: Tests, YARN Affects Versions: 3.2.2, 3.4.0, 3.3.1 Reporter: Yang Jie SPARK-40490 made the test cases related to `YarnShuffleIntegrationSuite` verify the registeredExecFile reload scenario again, so we need to add `@ExtendedLevelDBTest` to the test cases using LevelDB so that macOS/Apple Silicon machines can skip the relevant tests via `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
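Conceptually, tag-based exclusion just filters the candidate test set against the tag names passed on the command line. The following self-contained Scala sketch models that behaviour; `TestCase` and `selectTests` are hypothetical names for illustration, not Spark's or the test framework's actual API.

```scala
// A minimal model of tag-based test exclusion, in the spirit of
// -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest.
case class TestCase(name: String, tags: Set[String])

object TagFilterSketch {
  // Keep only tests carrying none of the excluded tags.
  def selectTests(all: Seq[TestCase], excluded: Set[String]): Seq[TestCase] =
    all.filterNot(t => t.tags.exists(excluded.contains))
}
```

Excluding `org.apache.spark.tags.ExtendedLevelDBTest` in this model drops the LevelDB-backed suites while leaving all other tests selected, which is the effect the issue asks for on macOS/Apple Silicon.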
[jira] [Commented] (SPARK-40618) Bug in MergeScalarSubqueries rule attempting to merge nested subquery with parent
[ https://issues.apache.org/jira/browse/SPARK-40618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612517#comment-17612517 ] Apache Spark commented on SPARK-40618: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/38093 > Bug in MergeScalarSubqueries rule attempting to merge nested subquery with > parent > - > > Key: SPARK-40618 > URL: https://issues.apache.org/jira/browse/SPARK-40618 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 3.4.0 > > > There is a bug in the `MergeScalarSubqueries` rule for queries with subquery > expressions nested inside each other, wherein the rule attempts to merge the > nested subquery with its enclosing parent subquery. The result is not a valid > plan and raises an exception in the optimizer. Here is a minimal reproducing > case: > > ``` > sql("create table test(col int) using csv") > checkAnswer(sql("select(select sum((select sum(col) from test)) from test)"), > Row(null)) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40647) DAGScheduler should fail job until all related running tasks have been killed
[ https://issues.apache.org/jira/browse/SPARK-40647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40647: Assignee: (was: Apache Spark) > DAGScheduler should fail job until all related running tasks have been killed > - > > Key: SPARK-40647 > URL: https://issues.apache.org/jira/browse/SPARK-40647 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Wechar >Priority: Major > > *Issue Description* > The staging directory within the table location is sometimes not removed when {{CTAS}} fails. > This is a problem when the new table is a managed table and we want to recreate it. > *Root Cause* > SchedulerBackend kills tasks via the {{KillTask}} message, which is asynchronous, so the job may already have failed while its tasks are still running and creating tmp files. > Even though the running tasks will eventually fail and delete the files they generated, the temporary directory is left behind. > *Solution* > Before failing a job, we should make sure that all related running tasks have been killed.
> *How to Reproduce* > Step 1: create a source table and insert data to make the file number exceed > 20 on HDFS > {code:sql} > -- create source table > CREATE TABLE IF NOT EXISTS default.test_wechar > (name string) > PARTITIONED BY (grass_date date) > STORED AS PARQUET > -- insert data 24 times > insert into default.test_wechar partition (grass_date='2022-09-03') > select uuid() > lateral view explode(sequence(1,2000)) as temp_view; > {code} > Step 2: create a new path for the new table and set its quota to 20 > {code:bash} > $hadoop fs -count -q hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp > 20 19none inf1 >0 0 > hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp > {code} > Step 3: create the new table from the source table > {code:sql} > create table if not exists default.test_wechar_tmp > location 'hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp' > as select * from default.test_wechar; > {code} > > Step 4: check the location of the new table after the job fails > {code:bash} > $hadoop fs -ls /user/weiqiang.yu/tmp/test_wechar_tmp/* > Found 1 items > drwxrwxr-x - weiqiang.yu weiqiang.yu 0 2022-10-04 12:56 > /user/weiqiang.yu/tmp/test_wechar_tmp/.hive-staging_hive_2022-10-04_12-56-21_545_2745177084386740362-1/-ext-1 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
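The race between the asynchronous kill and the job-failure cleanup can be illustrated with a small, self-contained Scala sketch. All names here are hypothetical and this is not Spark's DAGScheduler code; it only shows the proposed ordering: send the kill, block until every task has acknowledged termination, and only then run the cleanup that would otherwise race with still-running tasks recreating staging files.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object KillThenCleanupSketch {
  // Returns true if all tasks terminated within the timeout and cleanup ran.
  def failJob(runningTasks: Seq[Thread],
              allKilled: CountDownLatch,
              cleanupRan: AtomicBoolean): Boolean = {
    runningTasks.foreach(_.interrupt())             // analogue of async KillTask
    val done = allKilled.await(5, TimeUnit.SECONDS) // block until tasks exit
    if (done) cleanupRan.set(true)                  // safe: no task can recreate files
    done
  }
}
```

In this model the cleanup flag is only set after the latch confirms every task has exited, which is exactly the ordering the issue proposes.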
[jira] [Commented] (SPARK-40647) DAGScheduler should fail job until all related running tasks have been killed
[ https://issues.apache.org/jira/browse/SPARK-40647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612503#comment-17612503 ] Apache Spark commented on SPARK-40647: -- User 'wecharyu' has created a pull request for this issue: https://github.com/apache/spark/pull/38092 > DAGScheduler should fail job until all related running tasks have been killed > - > > Key: SPARK-40647 > URL: https://issues.apache.org/jira/browse/SPARK-40647 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Wechar >Priority: Major > > *Issue Description* > The staging directory within table location is not removed when {{CTAS}} > fails sometimes. > It is a trouble if the new table is a Managed Table when we want to recreate > it. > *Root Cause* > SchedulerBackend kills tasks via {{KillTask}} message which is asynchronous, > so we may have failed a job but the tasks are still running and create the > tmp file. Even if the running tasks will failed and delete the generated file > finally, but the temporary directory was left. > *Solution* > Before killing a job, we should make sure that all related running tasks have > been killed. 
> *How to Reproduce* > Step 1: create a source table and insert data to make the file number exceeds > 20 on HDFS > {code:sql} > -- create source table > CREATE TABLE IF NOT EXISTS default.test_wechar > (name string) > PARTITIONED BY (grass_date date) > STORED AS PARQUET > -- insert data 24 times > insert into default.test_wechar partition (grass_date='2022-09-03') > select uuid() > lateral view explode(sequence(1,2000)) as temp_view; > {code} > Step 2: create a new path for new table and setQuota to 20 > {code:bash} > $hadoop fs -count -q hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp > 20 19none inf1 >0 0 > hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp > {code} > Step 3: create new table from source table > {code:sql} > create table if not exists default.test_wechar_tmp > location 'hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp' > as select * from default.test_wechar; > {code} > > Step 4: check the location of new table after the job failed > {code:bash} > $hadoop fs -ls /user/weiqiang.yu/tmp/test_wechar_tmp/* > Found 1 items > drwxrwxr-x - weiqiang.yu weiqiang.yu 0 2022-10-04 12:56 > /user/weiqiang.yu/tmp/test_wechar_tmp/.hive-staging_hive_2022-10-04_12-56-21_545_2745177084386740362-1/-ext-1 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40647) DAGScheduler should fail job until all related running tasks have been killed
[ https://issues.apache.org/jira/browse/SPARK-40647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40647: Assignee: Apache Spark > DAGScheduler should fail job until all related running tasks have been killed > - > > Key: SPARK-40647 > URL: https://issues.apache.org/jira/browse/SPARK-40647 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Wechar >Assignee: Apache Spark >Priority: Major > > *Issue Description* > The staging directory within table location is not removed when {{CTAS}} > fails sometimes. > It is a trouble if the new table is a Managed Table when we want to recreate > it. > *Root Cause* > SchedulerBackend kills tasks via {{KillTask}} message which is asynchronous, > so we may have failed a job but the tasks are still running and create the > tmp file. Even if the running tasks will failed and delete the generated file > finally, but the temporary directory was left. > *Solution* > Before killing a job, we should make sure that all related running tasks have > been killed. 
> *How to Reproduce*
> Step 1: create a source table and insert data so that the file count exceeds
> 20 on HDFS
> {code:sql}
> -- create source table
> CREATE TABLE IF NOT EXISTS default.test_wechar
> (name string)
> PARTITIONED BY (grass_date date)
> STORED AS PARQUET;
>
> -- insert data 24 times
> insert into default.test_wechar partition (grass_date='2022-09-03')
> select uuid()
> lateral view explode(sequence(1,2000)) as temp_view;
> {code}
> Step 2: create a new path for the new table and set its name quota to 20
> (e.g. with {{hdfs dfsadmin -setQuota 20 <path>}})
> {code:bash}
> $ hadoop fs -count -q hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp
>           20              19            none             inf            1            0                  0 hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp
> {code}
> Step 3: create the new table from the source table
> {code:sql}
> create table if not exists default.test_wechar_tmp
> location 'hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp'
> as select * from default.test_wechar;
> {code}
> Step 4: check the location of the new table after the job fails
> {code:bash}
> $ hadoop fs -ls /user/weiqiang.yu/tmp/test_wechar_tmp/*
> Found 1 items
> drwxrwxr-x   - weiqiang.yu weiqiang.yu          0 2022-10-04 12:56 /user/weiqiang.yu/tmp/test_wechar_tmp/.hive-staging_hive_2022-10-04_12-56-21_545_2745177084386740362-1/-ext-1
> {code}
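The proposed solution (only fail the job after all related running tasks are confirmed killed) can be sketched as a toy synchronization outside Spark. {{TaskSet}}, {{on_task_killed}} and {{fail_job}} are hypothetical names for illustration, not Spark's actual classes:

```python
import threading

class TaskSet:
    """Toy model of a running task set whose kill is asynchronous."""
    def __init__(self, n_tasks):
        self.n_tasks = n_tasks
        self._remaining = n_tasks
        self._lock = threading.Lock()
        self._all_killed = threading.Event()

    def on_task_killed(self):
        # Executor-side acknowledgement that one task is really dead.
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._all_killed.set()

    def wait_all_killed(self, timeout=None):
        # The fix: block until every task has acknowledged the kill.
        return self._all_killed.wait(timeout)

def fail_job(task_set):
    # KillTask messages are asynchronous: acknowledgements arrive from
    # executor threads at arbitrary times.
    for _ in range(task_set.n_tasks):
        threading.Thread(target=task_set.on_task_killed).start()
    # Only mark the job failed once all tasks are confirmed killed, so no
    # straggler can still write into the staging directory.
    if task_set.wait_all_killed(timeout=5):
        return "job failed cleanly"
    return "timed out waiting for kills"

print(fail_job(TaskSet(3)))  # job failed cleanly
```

The barrier-style wait is the whole idea: the cleanup of the staging directory only runs after the last acknowledgement, so nothing can recreate files under it afterwards.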
[jira] [Commented] (SPARK-40647) DAGScheduler should fail job until all related running tasks have been killed
[ https://issues.apache.org/jira/browse/SPARK-40647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612504#comment-17612504 ]

Apache Spark commented on SPARK-40647:
--------------------------------------

User 'wecharyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/38092

> DAGScheduler should fail job until all related running tasks have been killed
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-40647
>                 URL: https://issues.apache.org/jira/browse/SPARK-40647
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.2
>            Reporter: Wechar
>            Priority: Major
>
> *Issue Description*
> The staging directory within the table location is sometimes not removed
> when {{CTAS}} fails.
> This is a problem when the new table is a managed table and we want to
> recreate it.
> *Root Cause*
> SchedulerBackend kills tasks via the {{KillTask}} message, which is
> asynchronous, so a job may already be marked as failed while its tasks are
> still running and creating temporary files. Even though the running tasks
> will eventually fail and delete the files they generated, the temporary
> staging directory is left behind.
> *Solution*
> Before failing a job, we should make sure that all related running tasks
> have been killed.
> *How to Reproduce*
> Step 1: create a source table and insert data so that the file count exceeds
> 20 on HDFS
> {code:sql}
> -- create source table
> CREATE TABLE IF NOT EXISTS default.test_wechar
> (name string)
> PARTITIONED BY (grass_date date)
> STORED AS PARQUET;
>
> -- insert data 24 times
> insert into default.test_wechar partition (grass_date='2022-09-03')
> select uuid()
> lateral view explode(sequence(1,2000)) as temp_view;
> {code}
> Step 2: create a new path for the new table and set its name quota to 20
> (e.g. with {{hdfs dfsadmin -setQuota 20 <path>}})
> {code:bash}
> $ hadoop fs -count -q hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp
>           20              19            none             inf            1            0                  0 hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp
> {code}
> Step 3: create the new table from the source table
> {code:sql}
> create table if not exists default.test_wechar_tmp
> location 'hdfs://Test-DMP01/user/weiqiang.yu/tmp/test_wechar_tmp'
> as select * from default.test_wechar;
> {code}
> Step 4: check the location of the new table after the job fails
> {code:bash}
> $ hadoop fs -ls /user/weiqiang.yu/tmp/test_wechar_tmp/*
> Found 1 items
> drwxrwxr-x   - weiqiang.yu weiqiang.yu          0 2022-10-04 12:56 /user/weiqiang.yu/tmp/test_wechar_tmp/.hive-staging_hive_2022-10-04_12-56-21_545_2745177084386740362-1/-ext-1
> {code}
[jira] [Commented] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails
[ https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612497#comment-17612497 ]

Apache Spark commented on SPARK-40096:
--------------------------------------

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/38091

> Finalize shuffle merge slow due to connection creation fails
> -------------------------------------------------------------
>
>                 Key: SPARK-40096
>                 URL: https://issues.apache.org/jira/browse/SPARK-40096
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Wan Kun
>            Assignee: Wan Kun
>            Priority: Major
>             Fix For: 3.4.0
>
>
> *How to reproduce this issue*
> * Enable push-based shuffle
> * Remove some merger nodes before sending the finalize RPCs
> * The driver tries to connect to those merger shuffle services and sends
> the finalize RPCs one by one; each connection creation times out after
> SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default)
>
> We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool
> and handle the connection creation exceptions there.
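The problem above is that each lost merger costs a full connection-creation timeout, paid serially. The proposed direction (issue the finalize RPCs through a thread pool and handle per-connection failures) can be sketched in Python; {{send_finalize}} and the merger names are hypothetical stand-ins, not Spark's actual RPC layer:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def send_finalize(merger):
    # Stand-in for the FinalizeShuffleMerge RPC; "lost" mergers behave
    # like a connection-creation timeout.
    if merger.startswith("lost"):
        raise TimeoutError(f"connect to {merger} timed out")
    return f"{merger}: merged"

def finalize_all(mergers, pool_size=4):
    results, failures = [], []
    # Send all finalize RPCs concurrently instead of one by one, so lost
    # mergers cost roughly one timeout in total, not one timeout each.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(send_finalize, m) for m in mergers]
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except TimeoutError as e:
                # Handle the connection-creation failure instead of
                # letting it block the driver's finalize loop.
                failures.append(str(e))
    return sorted(results), sorted(failures)

ok, bad = finalize_all(["host1", "lost-host2", "host3"])
print(ok)   # ['host1: merged', 'host3: merged']
print(bad)  # ['connect to lost-host2 timed out']
```

With the serial approach, N unreachable mergers cost roughly N x 120 s; with a pool, the timeouts overlap and the healthy mergers finalize immediately.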