[jira] [Created] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService

2018-12-05 Thread weixiuli (JIRA)
weixiuli created SPARK-26288:


 Summary: add initRegisteredExecutorsDB in ExternalShuffleService
 Key: SPARK-26288
 URL: https://issues.apache.org/jira/browse/SPARK-26288
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes, Shuffle
Affects Versions: 2.4.0
Reporter: weixiuli
 Fix For: 2.4.0


As we all know, Spark on YARN uses a DB to record RegisteredExecutors 
information, so that it can be reloaded and reused when the 
ExternalShuffleService is restarted.

Neither Spark standalone nor Spark on K8s records its RegisteredExecutors 
information in a DB or elsewhere, so that information is lost whenever the 
ExternalShuffleService restarts, which is not what we want.

This commit adds initRegisteredExecutorsDB, which can be used by both Spark 
standalone and Spark on K8s to record RegisteredExecutors information, so that 
it can be reloaded and reused when the ExternalShuffleService is restarted.
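
For reference, a minimal sketch of what a DB-backed store for this could look 
like (assuming the org.iq80.leveldb library; the class and method names here 
are hypothetical, and Spark's real recovery code lives in 
ExternalShuffleBlockResolver):

{code}
import java.io.File
import org.iq80.leveldb.Options
import org.iq80.leveldb.impl.Iq80DBFactory.{asString, bytes, factory}

// Sketch only: persist each executor registration to a local LevelDB file
// so a restarted shuffle service can reload it.
class RegisteredExecutorsDb(dbFile: File) {
  private val db = factory.open(dbFile, new Options().createIfMissing(true))

  // Persist one registration, keyed by "appId;execId".
  def record(appId: String, execId: String, shuffleInfoJson: String): Unit =
    db.put(bytes(s"$appId;$execId"), bytes(shuffleInfoJson))

  // Reload all registrations after a shuffle service restart.
  def reload(): Map[String, String] = {
    val it = db.iterator()
    try {
      it.seekToFirst()
      val buf = scala.collection.mutable.Map.empty[String, String]
      while (it.hasNext) {
        val e = it.next()
        buf += asString(e.getKey) -> asString(e.getValue)
      }
      buf.toMap
    } finally it.close()
  }

  def close(): Unit = db.close()
}
{code}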






[jira] [Updated] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService

2018-12-06 Thread weixiuli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-26288:
-
Description: 
As we all know, Spark on YARN uses a DB to record RegisteredExecutors 
information, which can be reloaded and used again when the 
ExternalShuffleService is restarted.

The RegisteredExecutors information is not recorded in either Spark standalone 
mode or Spark on K8s, so it is lost when the ExternalShuffleService is 
restarted.

To solve the problem above, a method is proposed and committed.

  was:
As we all know that spark on Yarn uses DB to record RegisteredExecutors 
information, when the ExternalShuffleService restart and it can be reloaded, 
which will be used as well .

While neither spark's standalone nor spark on k8s can record it's 
RegisteredExecutors information by db or others ,so when ExternalShuffleService 
restart ,which RegisteredExecutors information will be lost,it is't what we 
looking forward to .

This commit add initRegisteredExecutorsDB which can be used either spark 
standalone or spark on k8s to record RegisteredExecutors information , when the 
ExternalShuffleService restart and it can be reloaded, which will be used as 
well .


> add initRegisteredExecutorsDB in ExternalShuffleService
> ---
>
> Key: SPARK-26288
> URL: https://issues.apache.org/jira/browse/SPARK-26288
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Shuffle
>Affects Versions: 2.4.0
>Reporter: weixiuli
>Priority: Major
> Fix For: 2.4.0
>
>
> As we all know, Spark on YARN uses a DB to record RegisteredExecutors 
> information, which can be reloaded and used again when the 
> ExternalShuffleService is restarted.
> The RegisteredExecutors information is not recorded in either Spark 
> standalone mode or Spark on K8s, so it is lost when the 
> ExternalShuffleService is restarted.
> To solve the problem above, a method is proposed and committed.






[jira] [Updated] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService

2018-12-18 Thread weixiuli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-26288:
-
Component/s: Spark Core

> add initRegisteredExecutorsDB in ExternalShuffleService
> ---
>
> Key: SPARK-26288
> URL: https://issues.apache.org/jira/browse/SPARK-26288
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Shuffle, Spark Core
>Affects Versions: 2.4.0
>Reporter: weixiuli
>Priority: Major
>
> As we all know, Spark on YARN uses a DB to record RegisteredExecutors 
> information, which can be reloaded and used again when the 
> ExternalShuffleService is restarted.
> The RegisteredExecutors information is not recorded in either Spark 
> standalone mode or Spark on K8s, so it is lost when the 
> ExternalShuffleService is restarted.
> To solve the problem above, a method is proposed and committed.






[jira] [Created] (SPARK-29551) There is a bug with fetch failures when an executor is lost

2019-10-22 Thread weixiuli (Jira)
weixiuli created SPARK-29551:


 Summary: There is a bug with fetch failures when an executor is lost
 Key: SPARK-29551
 URL: https://issues.apache.org/jira/browse/SPARK-29551
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.3
Reporter: weixiuli
 Fix For: 2.4.3


There is a regression when an executor is lost and a 'fetch failed' error 
follows.

We can add a unit test in 'DAGSchedulerSuite.scala' to catch the above problem.

{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA1", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA2", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA1", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA2", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map(_.location.executorId).toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map(_.mapId).toSet === Set(5, 6, 7))

  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  // val initialMapStatus2 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map(_.location.executorId).toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map(_.mapId).toSet === Set(8, 9, 10))

  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from the map stage on exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"),
      null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs of both map stages from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")

  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}

The error output is:
{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:609)
Expected :1
Actual   :3

org.scalatest.exceptions.TestFailedException: 3 did not equal 1
{code}









[jira] [Updated] (SPARK-29551) There is a bug with fetch failures when an executor is lost

2019-10-22 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-29551:
-
Description: 
There is a regression when an executor is lost and a 'fetch failed' error 
follows.

When an executor is lost for some reason (e.g. the external shuffle service or 
the host it runs on is lost) and the loss happens just as the reduce stage hits 
a fetch failure from that executor, the previous logic only calls 
mapOutputTracker.unregisterMapOutput(shuffleId, mapIndex, bmAddress) to mark a 
single map output as broken. But the other map outputs on that executor are 
also unavailable, and they can only be resubmitted by a nested retry of the 
stage, which is the regression.

As we all know, the previous logic calls 
mapOutputTracker.removeOutputsOnHost(host) or 
mapOutputTracker.removeOutputsOnExecutor(execId) when the reduce stage hits a 
fetch failure while the executor is still alive, but it does NOT do so for the 
case above.

So we should distinguish the failedEpoch of 'executor lost' from the 
fetchFailedEpoch of 'fetch failed' to solve the above problem.
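
A rough sketch of the idea (with hypothetical field and method names, not the 
actual DAGScheduler code):

{code}
// Sketch only: track 'executor lost' and 'fetch failed' epochs separately,
// so a fetch failure from an already-lost executor still removes all of the
// executor's outputs exactly once.
import scala.collection.mutable.HashMap

val failedEpoch = new HashMap[String, Long]      // execId -> last ExecutorLost epoch
val fetchFailedEpoch = new HashMap[String, Long] // execId -> last FetchFailed epoch

def onFetchFailed(execId: String, currentEpoch: Long): Unit = {
  val alreadyHandled = fetchFailedEpoch.get(execId).exists(_ >= currentEpoch)
  if (!alreadyHandled) {
    fetchFailedEpoch(execId) = currentEpoch
    // Remove ALL outputs on this executor, not just the single failed map,
    // even if failedEpoch already marked the executor as lost:
    // mapOutputTracker.removeOutputsOnExecutor(execId)
  }
}
{code}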

We can add a unit test in 'DAGSchedulerSuite.scala' to catch the above problem.

{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA1", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA2", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA1", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success,
      MapStatus(
        BlockManagerId("exec-hostA2", "hostA", 12345),
        Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map(_.location.executorId).toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map(_.mapId).toSet === Set(5, 6, 7))

  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  // val initialMapStatus2 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map(_.location.executorId).toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map(_.mapId).toSet === Set(8, 9, 10))

  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from the map stage on exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"),
      null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs of both map stages from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")

  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}

The error output is:
{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:609)
Expected :1
Actual   :3

org.scalatest.exceptions.TestFailedException: 3 did not equal 1
{code}

[jira] [Commented] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-10-31 Thread weixiuli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963818#comment-16963818
 ] 

weixiuli commented on SPARK-27736:
--

https://github.com/apache/spark/pull/26206

> Improve handling of FetchFailures caused by ExternalShuffleService losing 
> track of executor registrations
> -
>
> Key: SPARK-27736
> URL: https://issues.apache.org/jira/browse/SPARK-27736
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Minor
>
> This ticket describes a fault-tolerance edge-case which can cause Spark jobs 
> to fail if a single external shuffle service process reboots and fails to 
> recover the list of registered executors (something which can happen when 
> using YARN if NodeManager recovery is disabled) _and_ the Spark job has a 
> large number of executors per host.
> I believe this problem can be worked around today via a change of 
> configurations, but I'm filing this issue to (a) better document this 
> problem, and (b) propose either a change of default configurations or 
> additional DAGScheduler logic to better handle this failure mode.
> h2. Problem description
> The external shuffle service process is _mostly_ stateless except for a map 
> tracking the set of registered applications and executors.
> When processing a shuffle fetch request, the shuffle service first checks 
> whether the requested block ID's executor is registered; if it's not 
> registered then the shuffle service throws an exception like 
> {code:java}
> java.lang.RuntimeException: Executor is not registered 
> (appId=application_1557557221330_6891, execId=428){code}
> and this exception becomes a {{FetchFailed}} error in the executor requesting 
> the shuffle block.
> In normal operation this error should not occur because executors shouldn't 
> be mis-routing shuffle fetch requests. However, this _can_ happen if the 
> shuffle service crashes and restarts, causing it to lose its in-memory 
> executor registration state. With YARN this state can be recovered from disk 
> if YARN NodeManager recovery is enabled (using the mechanism added in 
> SPARK-9439), but I don't believe that we perform state recovery in Standalone 
> and Mesos modes (see SPARK-24223).
> If state cannot be recovered then map outputs cannot be served (even though 
> the files probably still exist on disk). In theory, this shouldn't cause 
> Spark jobs to fail because we can always redundantly recompute lost / 
> unfetchable map outputs.
> However, in practice this can cause total job failures in deployments where 
> the node with the failed shuffle service was running a large number of 
> executors: by default, the DAGScheduler unregisters map outputs _only from 
> the individual executor whose shuffle blocks could not be fetched_ (see 
> [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]),
>  so it can take several rounds of failed stage attempts to fail and clear 
> output from all executors on the faulty host. If the number of executors on a 
> host is greater than the stage retry limit then this can exhaust stage retry 
> attempts and cause job failures.
> This "multiple rounds of recomputation to discover all failed executors on a 
> host" problem was addressed by SPARK-19753, which added a 
> {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which 
> promotes executor fetch failures into host-wide fetch failures (clearing 
> output from all neighboring executors upon a single failure). However, that 
> configuration is {{false}} by default.
> h2. Potential solutions
> I have a few ideas about how we can improve this situation:
>  - Update the [YARN external shuffle service 
> documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service]
>  to recommend enabling node manager recovery.
>  - Consider defaulting {{spark.files.fetchFailure.unRegisterOutputOnHost}} to 
> {{true}}. This would improve out-of-the-box resiliency for large clusters. 
> The trade-off here is a reduction of efficiency in case there are transient 
> "false positive" fetch failures, but I suspect this case may be unlikely in 
> practice (so the change of default could be an acceptable trade-off). See 
> [prior discussion on 
> GitHub|https://github.com/apache/spark/pull/18150#discussion_r119736751].
>  - Modify DAGScheduler to add special-case handling for "Executor is not 
> registered" exceptions that trigger FetchFailures: if we see this exception 
> then it implies that the shuffle service failed to recover state, implying 
> that all of its pri
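
As a stopgap, the host-wide unregistration discussed above can be enabled per 
job today; it is the same configuration exercised in the DAGSchedulerSuite 
snippets earlier in this thread. A minimal sketch:

{code}
// Sketch: opt in to clearing all map outputs on a host after a single
// fetch failure (this config defaults to false).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
{code}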

[jira] [Created] (SPARK-32170) Improve the speculation for inefficient tasks by using task metrics.

2020-07-04 Thread weixiuli (Jira)
weixiuli created SPARK-32170:


 Summary: Improve the speculation for inefficient tasks by using task metrics.
 Key: SPARK-32170
 URL: https://issues.apache.org/jira/browse/SPARK-32170
 Project: Spark
  Issue Type: Improvement
  Components: Scheduler, Spark Core
Affects Versions: 3.0.0
Reporter: weixiuli
 Fix For: 3.0.0


1) Tasks are speculated once they meet certain conditions, no matter whether 
they are inefficient or not, which can be a huge waste of cluster resources.
2) In production, a speculative copy launched for an already efficient task 
will eventually be killed, which is unnecessary and wastes cluster resources.
3) So we should first evaluate whether a task is inefficient based on the 
metrics of successful tasks, and only then decide whether to speculate it. 
Inefficient tasks get speculated and efficient ones do not, which is better 
for cluster resources.
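
A sketch of the idea (hypothetical names and threshold, not the actual 
scheduler code): compare a running task's data-processing rate with the 
average rate of already-successful tasks, and speculate only when it falls 
clearly behind.

{code}
// Sketch only: decide whether a running task is "inefficient" by comparing
// its records-per-millisecond rate with the average rate of successful tasks.
// `successRates` and `efficiencyFactor` are hypothetical.
def shouldSpeculate(
    recordsRead: Long,
    runtimeMs: Long,
    successRates: Seq[Double],
    efficiencyFactor: Double = 0.75): Boolean = {
  if (successRates.isEmpty || runtimeMs <= 0) {
    true // no baseline yet: fall back to the old time-based behavior
  } else {
    val avgRate = successRates.sum / successRates.size
    val taskRate = recordsRead.toDouble / runtimeMs
    taskRate < avgRate * efficiencyFactor // speculate only slow tasks
  }
}
{code}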







[jira] [Updated] (SPARK-32170) Improve the speculation for inefficient tasks by using task metrics.

2020-07-04 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-32170:
-
Description: 
1) Tasks are speculated once they meet certain conditions, no matter whether 
they are inefficient or not, which can be a huge waste of cluster resources.
2) In production, a speculative copy launched for an already efficient task 
will eventually be killed, which is unnecessary and wastes cluster resources.
3) So we should first evaluate whether a task is inefficient based on the 
metrics of successful tasks, and only then decide whether to speculate it. 
Inefficient tasks get speculated and efficient ones do not, which is better 
for cluster resources.


  was:
1) Tasks will be speculated when meet certain conditions no matter they are 
inefficient or not,this would be a huge waste of cluster resources.
2) In production,the speculation task from an efficient one  will be killed 
finally,which is unnecessary and will waste of cluster resources.
3) So, we should  evaluate whether the task is inefficient by success tasks 
metrics firstly, and then decide to speculate it or not. The  inefficient task 
will be speculated and efficient one will not, it better for the cluster 
resources.



> Improve the speculation for inefficient tasks by using task metrics.
> ---
>
> Key: SPARK-32170
> URL: https://issues.apache.org/jira/browse/SPARK-32170
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 3.0.0
>Reporter: weixiuli
>Priority: Major
> Fix For: 3.0.0
>
>
> 1) Tasks are speculated once they meet certain conditions, no matter whether 
> they are inefficient or not, which can be a huge waste of cluster resources.
> 2) In production, a speculative copy launched for an already efficient task 
> will eventually be killed, which is unnecessary and wastes cluster resources.
> 3) So we should first evaluate whether a task is inefficient based on the 
> metrics of successful tasks, and only then decide whether to speculate it. 
> Inefficient tasks get speculated and efficient ones do not, which is better 
> for cluster resources.






[jira] [Updated] (SPARK-32170) Improve the speculation for inefficient tasks by using task metrics.

2020-07-04 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-32170:
-
Description: 
1) Tasks are speculated once they meet certain conditions, no matter whether 
they are inefficient or not, which can be a huge waste of cluster resources.
2) In production, a speculative copy launched for an already efficient task 
will eventually be killed, which is unnecessary and wastes cluster resources.
3) So we should first evaluate whether a task is inefficient based on the 
metrics of successful tasks, and only then decide whether to speculate it. 
Inefficient tasks get speculated and efficient ones do not, which is better 
for cluster resources.


  was:
1) Tasks will be speculated when meet certain conditions no matter they are 
inefficient or not,this would be a huge waste of cluster resources.
2) In production,the speculation task from an efficient one  will be killed 
finally,which is unnecessary and will waste of cluster resources.
3) So, we should  evaluate whether the task is inefficient by success tasks 
metrics firstly, and then decide to speculate it or not. The  inefficient task 
will be speculated and efficient one will not, it is better for the cluster 
resources.



> Improve the speculation for inefficient tasks by using task metrics.
> ---
>
> Key: SPARK-32170
> URL: https://issues.apache.org/jira/browse/SPARK-32170
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 3.0.0
>Reporter: weixiuli
>Priority: Major
> Fix For: 3.0.0
>
>
> 1) Tasks are speculated once they meet certain conditions, no matter whether 
> they are inefficient or not, which can be a huge waste of cluster resources.
> 2) In production, a speculative copy launched for an already efficient task 
> will eventually be killed, which is unnecessary and wastes cluster resources.
> 3) So we should first evaluate whether a task is inefficient based on the 
> metrics of successful tasks, and only then decide whether to speculate it. 
> Inefficient tasks get speculated and efficient ones do not, which is better 
> for cluster resources.






[jira] [Created] (SPARK-33747) Avoid calling unregisterMapOutput when the map stage is being rerun.

2020-12-10 Thread weixiuli (Jira)
weixiuli created SPARK-33747:


 Summary: Avoid calling unregisterMapOutput when the map stage is being rerun.
 Key: SPARK-33747
 URL: https://issues.apache.org/jira/browse/SPARK-33747
 Project: Spark
  Issue Type: Bug
  Components: Block Manager
Affects Versions: 3.0.1, 2.4.5
Reporter: weixiuli


When a fetch failure happens, the DAGScheduler tries to unregister the 
corresponding map output. The current logic has a race condition: a new map 
stage attempt may be running while the old reduce stage attempt returns another 
fetch failure. In this case, if the map output is always unregistered, we may 
actually unregister the map output produced by the new map stage attempt.
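
A sketch of the guard (hypothetical names, not the actual fix): only 
unregister a map output when it was produced by an attempt no newer than the 
failed one, e.g. by comparing epochs.

{code}
// Sketch only: skip the FetchFailed-triggered unregister if the currently
// registered map status is newer than the failure, i.e. it was produced by
// the rerunning map stage attempt. `statusEpoch` and `failureEpoch` are
// hypothetical names.
def maybeUnregister(statusEpoch: Long, failureEpoch: Long)
                   (unregister: () => Unit): Unit = {
  if (statusEpoch <= failureEpoch) {
    unregister() // the failed output is the one currently registered
  } // else a new map stage attempt re-registered it; keep it
}
{code}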






[jira] [Updated] (SPARK-33747) Avoid calling unregisterMapOutput when the map stage is being rerun.

2020-12-10 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-33747:
-
Fix Version/s: 2.4.5
   3.0.1

> Avoid calling unregisterMapOutput when the map stage is being rerun.
> 
>
> Key: SPARK-33747
> URL: https://issues.apache.org/jira/browse/SPARK-33747
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 2.4.5, 3.0.1
>Reporter: weixiuli
>Priority: Major
> Fix For: 2.4.5, 3.0.1
>
>
> When a fetch failure happens, the DAGScheduler tries to unregister the 
> corresponding map output. The current logic has a race condition: a new map 
> stage attempt may be running while the old reduce stage attempt returns 
> another fetch failure. In this case, if the map output is always 
> unregistered, we may actually unregister the map output produced by the new 
> map stage attempt.






[jira] [Updated] (SPARK-33747) Avoid calling unregisterMapOutput when the map stage is being rerun.

2020-12-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-33747:
-
Description: When a fetch failure happens, the DAGScheduler tries to 
unregister the corresponding map output. The current logic has a race condition: 
a new map stage attempt may be running while the current reduce stage attempt 
returns another fetch failure. (Note: the current reduce stage first returns a 
fetch failure, which makes the map stage rerun; the rerunning map stage may 
then re-register a map status for the failed map ID before the current reduce 
stage, which is still the latest attempt because the new map stage has not yet 
completed, returns another fetch failure for the same map ID.) In this case, if 
the map output is always unregistered, we may actually unregister the map 
output produced by the new map stage attempt.  (was: When a fetch failure 
happened, DAGScheduler will try to unregister the corresponding map output. The 
current logic has a race condition that the new map stage attempt is running 
while the old reduce stage attempt returns another fetch failure. In this case, 
if the map output is always unregistered, it may actually unregister the map 
output from the new map stage attempt.)

> Avoid calling unregisterMapOutput when the map stage is being rerun.
> 
>
> Key: SPARK-33747
> URL: https://issues.apache.org/jira/browse/SPARK-33747
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 2.4.5, 3.0.1
>Reporter: weixiuli
>Priority: Major
> Fix For: 2.4.5, 3.0.1
>
>
> When a fetch failure happens, the DAGScheduler tries to unregister the 
> corresponding map output. The current logic has a race condition: a new map 
> stage attempt may be running while the current reduce stage attempt returns 
> another fetch failure. (Note: the current reduce stage first returns a fetch 
> failure, which makes the map stage rerun; the rerunning map stage may then 
> re-register a map status for the failed map ID before the current reduce 
> stage, which is still the latest attempt because the new map stage has not 
> yet completed, returns another fetch failure for the same map ID.) In this 
> case, if the map output is always unregistered, we may actually unregister 
> the map output produced by the new map stage attempt.






[jira] [Created] (SPARK-34834) There is a potential netty leak in TransportResponseHandler.

2021-03-23 Thread weixiuli (Jira)
weixiuli created SPARK-34834:


 Summary: There is a potential netty leak in 
TransportResponseHandler.
 Key: SPARK-34834
 URL: https://issues.apache.org/jira/browse/SPARK-34834
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.1.1, 3.1.0, 3.0.2, 2.4.7
Reporter: weixiuli


There is a potential netty leak in TransportResponseHandler.
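
For context, the typical shape of such a leak (a sketch only, not the actual 
TransportResponseHandler fix): a reference-counted Netty buffer must be 
released on every path, including error paths.

{code}
// Sketch only: release reference-counted Netty buffers on all paths.
import io.netty.buffer.ByteBuf
import io.netty.util.ReferenceCountUtil

def handleResponseBody(buf: ByteBuf): Unit = {
  try {
    // ... decode / hand off the response body ...
  } finally {
    ReferenceCountUtil.release(buf) // without this, the buffer leaks
  }
}
{code}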






[jira] [Updated] (SPARK-34834) There is a potential Netty memory leak in TransportResponseHandler.

2021-03-23 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-34834:
-
Summary: There is a potential Netty memory leak in 
TransportResponseHandler.  (was: There is a potential netty leak in 
TransportResponseHandler.)

> There is a potential Netty memory leak in TransportResponseHandler.
> ---
>
> Key: SPARK-34834
> URL: https://issues.apache.org/jira/browse/SPARK-34834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.7, 3.0.2, 3.1.0, 3.1.1
>Reporter: weixiuli
>Priority: Major
>
> There is a potential netty leak in TransportResponseHandler.






[jira] [Updated] (SPARK-34834) There is a potential Netty memory leak in TransportResponseHandler.

2021-03-23 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-34834:
-
Description: There is a potential Netty memory leak in 
TransportResponseHandler.  (was: There is a potential netty leak in 
TransportResponseHandler.)

> There is a potential Netty memory leak in TransportResponseHandler.
> ---
>
> Key: SPARK-34834
> URL: https://issues.apache.org/jira/browse/SPARK-34834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.7, 3.0.2, 3.1.0, 3.1.1
>Reporter: weixiuli
>Priority: Major
>
> There is a potential Netty memory leak in TransportResponseHandler.






[jira] [Created] (SPARK-36635) spark-sql does NOT support string literals as column aliases

2021-09-01 Thread weixiuli (Jira)
weixiuli created SPARK-36635:


 Summary: spark-sql does NOT support string literals as column aliases
 Key: SPARK-36635
 URL: https://issues.apache.org/jira/browse/SPARK-36635
 Project: Spark
  Issue Type: Bug
  Components: Block Manager
Affects Versions: 3.1.2, 3.1.0
Reporter: weixiuli


The following statement throws an exception.


{code:java}
sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)")
{code}



{code:java}
// Exception information
Error in query:
mismatched input ''a'' expecting {<EOF>, ';'}(line 1, pos 14)

== SQL ==
SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)
--^^^
{code}
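
For comparison, plain or backquoted aliases parse fine; a sketch of equivalent 
statements that work (only the single-quoted string alias is rejected):

{code:java}
sql("SELECT age as a, name as n FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)")
sql("SELECT age as `a`, name as `n` FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)")
{code}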







[jira] [Commented] (SPARK-36635) spark-sql does NOT support string literals as column aliases

2021-09-02 Thread weixiuli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409222#comment-17409222
 ] 

weixiuli commented on SPARK-36635:
--

It didn't work before either.

> spark-sql does NOT support string literals as column aliases
> 
>
> Key: SPARK-36635
> URL: https://issues.apache.org/jira/browse/SPARK-36635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> The following statement throws an exception.
> {code:java}
>  sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') 
> people(age, name)")
> {code}
> {code:java}
> // Exception information
> Error in query:
> mismatched input ''a'' expecting {<EOF>, ';'}(line 1, pos 14)
> == SQL ==
> SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') 
> people(age, name)
> --^^^
> {code}






[jira] [Updated] (SPARK-37028) Add 'kill' executors link in Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: Add a 'kill' executors link in Web UI.  (was: Add 'kill' 
executors link in Web UI.)

>  Add  'kill' executors link in Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> Add a 'kill' executors link in Web UI.






[jira] [Created] (SPARK-37028) Add 'kill' executors link in Web UI.

2021-10-16 Thread weixiuli (Jira)
weixiuli created SPARK-37028:


 Summary:  Add  'kill' executors link in Web UI.
 Key: SPARK-37028
 URL: https://issues.apache.org/jira/browse/SPARK-37028
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: weixiuli


Add 'kill' executors link in Web UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executors link in Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Summary:  Add a 'kill' executors link in Web UI.  (was:  Add  'kill' 
executors link in Web UI.)

>  Add a 'kill' executors link in Web UI.
> ---
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> Add a 'kill' executors link in Web UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Summary:  Add a 'kill' executor link in Web UI.  (was:  Add a 'kill' 
executors link in Web UI.)

>  Add a 'kill' executor link in Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> Add a 'kill' executors link in Web UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running on a bad node (e.g. the system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution. Although there are speculative mechanisms to resolve this problem, 
sometimes the speculated task may also run on a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:Add a 'kill' executors link in Web UI.


>  Add a 'kill' executor link in Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> The executor which is running on a bad node (e.g. the system is overloaded or 
> disks are busy) or has big GC overheads may affect the efficiency of job 
> execution. Although there are speculative mechanisms to resolve this problem, 
> sometimes the speculated task may also run on a bad executor.
> We should have a "kill" link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.
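
Until such a link exists, executors can already be killed programmatically; a 
minimal sketch using the existing developer API on SparkContext (an API the 
proposed UI link could build on):

{code}
// Sketch: kill executors by their IDs through the developer API.
// The executor IDs here are examples; the master/deploy mode is assumed
// to be configured externally (e.g. via spark-submit).
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("demo"))
sc.killExecutors(Seq("1", "2")) // returns true if the request was acknowledged
{code}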






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running on a bad node (e.g. the system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution. Although there are speculative mechanisms to resolve this problem, 
sometimes the speculated task may also run on a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or it has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.


>  Add a 'kill' executor link in Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> The executor which is running on a bad node (e.g. the system is overloaded or 
> disks are busy) or has big GC overheads may affect the efficiency of job 
> execution. Although there are speculative mechanisms to resolve this problem, 
> sometimes the speculated task may also run on a bad executor.
> We should have a "kill" link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Summary:  Add a 'kill' executor link in the Web UI.  (was:  Add a 'kill' 
executor link in Web UI.)

>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> The executor which is running on a bad node (e.g. the system is overloaded or 
> disks are busy) or has big GC overheads may affect the efficiency of job 
> execution. Although there are speculative mechanisms to resolve this problem, 
> sometimes the speculated task may also run on a bad executor.
> We should have a "kill" link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running on a bad node (e.g. the system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution. Although there are speculative mechanisms to resolve this problem, 
sometimes the speculated task may also run on a bad executor.
We should have a 'kill' link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.


>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> The executor which is running on a bad node (e.g. the system is overloaded or 
> disks are busy) or has big GC overheads may affect the efficiency of job 
> execution. Although there are speculative mechanisms to resolve this problem, 
> sometimes the speculated task may also run on a bad executor.
> We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running on a bad node (e.g. the system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution. Although there are speculative mechanisms to resolve this problem, 
sometimes the speculated task may also run on a bad executor.
We should have a 'kill' link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
 We should have a 'kill' link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.


>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> The executor which is running on a bad node (e.g. the system is overloaded or 
> disks are busy) or has big GC overheads may affect the efficiency of job 
> execution. Although there are speculative mechanisms to resolve this problem, 
> sometimes the speculated task may also run on a bad executor.
> We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Created] (SPARK-35200) Avoid recomputing the pending tasks in the ExecutorAllocationManager and remove unnecessary code

2021-04-22 Thread weixiuli (Jira)
weixiuli created SPARK-35200:


 Summary: Avoid recomputing the pending tasks in the 
ExecutorAllocationManager and remove unnecessary code
 Key: SPARK-35200
 URL: https://issues.apache.org/jira/browse/SPARK-35200
 Project: Spark
  Issue Type: Improvement
  Components: Scheduler, Spark Core
Affects Versions: 3.1.1, 3.1.0, 3.0.2
Reporter: weixiuli


The number of pending speculative tasks is recomputed in the 
ExecutorAllocationManager when calculating the maximum number of executors 
required, but it only needs to be computed once, which improves performance.
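
A sketch of the intent (hypothetical shape, not the actual patch): hoist the 
recomputed count into a value that is computed once and reused.

{code}
// Sketch only: compute the pending-task counts once per scheduling round
// instead of recomputing them for every use. Names are hypothetical.
def maxNumExecutorsNeeded(pendingTasks: Int, pendingSpeculativeTasks: Int,
                          tasksPerExecutor: Int): Int = {
  val totalPending = pendingTasks + pendingSpeculativeTasks // computed once
  (totalPending + tasksPerExecutor - 1) / tasksPerExecutor  // ceiling division
}
{code}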







[jira] [Updated] (SPARK-35200) Avoid recomputing the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code

2021-04-22 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35200:
-
Summary: Avoid recomputing the pending speculative tasks in the 
ExecutorAllocationManager and remove unnecessary code  (was: Avoid recomputing 
the pending tasks in the ExecutorAllocationManager and remove unnecessary code)

> Avoid recomputing the pending speculative tasks in the 
> ExecutorAllocationManager and remove unnecessary code
> -
>
> Key: SPARK-35200
> URL: https://issues.apache.org/jira/browse/SPARK-35200
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.1.1
>Reporter: weixiuli
>Priority: Major
>
> The number of pending speculative tasks is recomputed in the 
> ExecutorAllocationManager when calculating the maximum number of executors 
> required, but it only needs to be computed once, which improves performance.






[jira] [Updated] (SPARK-35200) Avoid recomputing the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code

2021-04-22 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35200:
-
Description: 
The number of pending speculative tasks is recomputed in the 
ExecutorAllocationManager when calculating the maximum number of executors 
required, but it only needs to be computed once, which improves performance.


  was:
The number of the pending speculative tasks is recomputed in the 
ExecutorAllocationManager to calculate the maximum number of executors 
required. while , it only needs to be computed once to improve  performance.



> Avoid recomputing the pending speculative tasks in the 
> ExecutorAllocationManager and remove unnecessary code
> -
>
> Key: SPARK-35200
> URL: https://issues.apache.org/jira/browse/SPARK-35200
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.1.1
>Reporter: weixiuli
>Priority: Major
>
> The number of pending speculative tasks is recomputed in the 
> ExecutorAllocationManager when calculating the maximum number of executors 
> required, but it only needs to be computed once, which improves performance.






[jira] [Updated] (SPARK-35200) Avoid recomputing the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code

2021-04-22 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35200:
-
Fix Version/s: 3.0.1
   3.0.2
   3.1.0
   3.1.1

> Avoid recomputing the pending speculative tasks in the 
> ExecutorAllocationManager and remove unnecessary code
> -
>
> Key: SPARK-35200
> URL: https://issues.apache.org/jira/browse/SPARK-35200
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.1.1
>Reporter: weixiuli
>Priority: Major
> Fix For: 3.0.1, 3.0.2, 3.1.0, 3.1.1
>
>
> The number of pending speculative tasks is recomputed in the 
> ExecutorAllocationManager when calculating the maximum number of executors 
> required, but it only needs to be computed once, which improves performance.






[jira] [Commented] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution

2021-05-10 Thread weixiuli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342266#comment-17342266
 ] 

weixiuli commented on SPARK-30186:
--

https://github.com/apache/spark/pull/31941

> support Dynamic Partition Pruning in Adaptive Execution
> ---
>
> Key: SPARK-30186
> URL: https://issues.apache.org/jira/browse/SPARK-30186
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiaoju Wu
>Priority: Major
>
> Currently Adaptive Execution cannot work if Dynamic Partition Pruning is 
> applied.
> private def supportAdaptive(plan: SparkPlan): Boolean = {
>   // TODO migrate dynamic-partition-pruning onto adaptive execution.
>   sanityCheck(plan) &&
>     !plan.logicalLink.exists(_.isStreaming) &&
>     !plan.expressions.exists(_.find(_.isInstanceOf[DynamicPruningSubquery]).isDefined) &&
>     plan.children.forall(supportAdaptive)
> }
> It means we cannot get the performance benefit of both AE and DPP.
> This ticket is target to make DPP + AE works together.






[jira] [Created] (SPARK-35424) Remove some useless code in ExternalBlockHandler

2021-05-17 Thread weixiuli (Jira)
weixiuli created SPARK-35424:


 Summary: Remove some useless code in ExternalBlockHandler
 Key: SPARK-35424
 URL: https://issues.apache.org/jira/browse/SPARK-35424
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.1.1, 3.0.2, 3.2.0
Reporter: weixiuli


There is some useless code in the ExternalBlockHandler, so we may remove it.






[jira] [Created] (SPARK-35783) Set the list of read columns in the task configuration for reducing read data for ORC

2021-06-15 Thread weixiuli (Jira)
weixiuli created SPARK-35783:


 Summary: Set the list of read columns in the task configuration 
for reducing read data for ORC
 Key: SPARK-35783
 URL: https://issues.apache.org/jira/browse/SPARK-35783
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2, 3.1.0, 3.0.1, 3.0.0
Reporter: weixiuli


Now, if the read column list is not set in the task configuration, the ORC 
reader will read all columns of the ORC table. Therefore, we should set the 
list of read columns in the task configuration to reduce reading of ORC data.
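
A sketch of the direction (assuming the standard ORC OrcConf.INCLUDE_COLUMNS 
key; the exact Spark change may differ): set the included column ids in the 
per-task Hadoop configuration before the reader is created.

{code}
// Sketch: tell the ORC reader which column ids to materialize instead of
// letting it default to all columns.
import org.apache.hadoop.conf.Configuration
import org.apache.orc.OrcConf

def setReadColumns(taskConf: Configuration, requestedColIds: Seq[Int]): Unit = {
  OrcConf.INCLUDE_COLUMNS.setString(
    taskConf, requestedColIds.filter(_ >= 0).sorted.mkString(","))
}
{code}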






[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce read data for ORC

2021-06-15 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35783:
-
Summary: Set the list of read columns in the task configuration to reduce 
read data for ORC  (was: Set the list of read columns in the task configuration 
for reducing read data for ORC)

> Set the list of read columns in the task configuration to reduce read data 
> for ORC
> --
>
> Key: SPARK-35783
> URL: https://issues.apache.org/jira/browse/SPARK-35783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> Now, if the read column list is not set in the task configuration, the ORC 
> reader will read all columns of the ORC table. Therefore, we should set the 
> list of read columns in the task configuration to reduce reading of ORC data.






[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce read data for ORC

2021-06-15 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35783:
-
Description: Now, if the read column list is not set in the task 
configuration, it will read all columns in the ORC table. Therefore, we should 
set the list of read columns in the task configuration to reduce reading of ORC 
data.  (was: Now, if the read column list is not set in the task configuration, 
it will read all columns in the ORC table. Therefore, we should set the list of 
read columns in the task configuration to reduce the ORC read data.)

> Set the list of read columns in the task configuration to reduce read data 
> for ORC
> --
>
> Key: SPARK-35783
> URL: https://issues.apache.org/jira/browse/SPARK-35783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> Now, if the read column list is not set in the task configuration, it will 
> read all columns in the ORC table. Therefore, we should set the list of read 
> columns in the task configuration to reduce reading of ORC data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce reading of ORC data.

2021-06-15 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35783:
-
Summary: Set the list of read columns in the task configuration to reduce 
reading of ORC data.  (was: Set the list of read columns in the task 
configuration to reduce read data for ORC)

> Set the list of read columns in the task configuration to reduce reading of 
> ORC data.
> -
>
> Key: SPARK-35783
> URL: https://issues.apache.org/jira/browse/SPARK-35783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> Now, if the read column list is not set in the task configuration, it will 
> read all columns in the ORC table. Therefore, we should set the list of read 
> columns in the task configuration to reduce reading of ORC data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce reading of ORC data.

2021-06-16 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-35783:
-
Description: Now, the ORC reader will read all columns of the ORC table 
when the task configuration does not set the read column list. Therefore, we 
should set the list of read columns in the task configuration to reduce reading 
of ORC data.  (was: Now, if the read column list is not set in the task 
configuration, it will read all columns in the ORC table. Therefore, we should 
set the list of read columns in the task configuration to reduce reading of ORC 
data.)

> Set the list of read columns in the task configuration to reduce reading of 
> ORC data.
> -
>
> Key: SPARK-35783
> URL: https://issues.apache.org/jira/browse/SPARK-35783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> Now, the ORC reader will read all columns of the ORC table when the task 
> configuration does not set the read column list. Therefore, we should set the 
> list of read columns in the task configuration to reduce reading of ORC data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38280) The Rank window to sort is not necessary in a query

2022-02-21 Thread weixiuli (Jira)
weixiuli created SPARK-38280:


 Summary: The Rank window to sort is not necessary in a query
 Key: SPARK-38280
 URL: https://issues.apache.org/jira/browse/SPARK-38280
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 
3.0.0
Reporter: weixiuli






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38280) The Rank windows to be ordered is not necessary in a query

2022-02-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38280:
-
Summary: The Rank windows to be ordered is not necessary in a query  (was: 
The Rank window to sort is not necessary in a query)

> The Rank windows to be ordered is not necessary in a query
> --
>
> Key: SPARK-38280
> URL: https://issues.apache.org/jira/browse/SPARK-38280
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.2.1
>Reporter: weixiuli
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38344) Avoid to submit task when there are no requests to push up in push-based shuffle

2022-02-27 Thread weixiuli (Jira)
weixiuli created SPARK-38344:


 Summary: Avoid to submit task when there are no requests to push 
up in push-based shuffle
 Key: SPARK-38344
 URL: https://issues.apache.org/jira/browse/SPARK-38344
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 3.2.1, 3.2.0
Reporter: weixiuli






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38428) Improve FetchShuffleBlocks in External shuffle service

2022-03-06 Thread weixiuli (Jira)
weixiuli created SPARK-38428:


 Summary: Improve FetchShuffleBlocks in External shuffle service
 Key: SPARK-38428
 URL: https://issues.apache.org/jira/browse/SPARK-38428
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0
Reporter: weixiuli






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38428) Check the FetchShuffleBlocks message only once to improve iteration in external shuffle service

2022-03-06 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38428:
-
Summary: Check the FetchShuffleBlocks message only once to improve 
iteration in external shuffle service   (was: Improve FetchShuffleBlocks in 
External shuffle service)

> Check the FetchShuffleBlocks message only once to improve iteration in 
> external shuffle service 
> 
>
> Key: SPARK-38428
> URL: https://issues.apache.org/jira/browse/SPARK-38428
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: weixiuli
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38555) Avoid contention and get or create clientPools quickly in the TransportClientFactory

2022-03-15 Thread weixiuli (Jira)
weixiuli created SPARK-38555:


 Summary:  Avoid contention and get or create clientPools quickly 
in the TransportClientFactory
 Key: SPARK-38555
 URL: https://issues.apache.org/jira/browse/SPARK-38555
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 
3.0.0
Reporter: weixiuli
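
The description is empty, but the summary points at the standard lock-free get-or-create pattern; a hedged sketch (ClientPool here is a stand-in type, not Spark's actual class):

{code:scala}
import java.util.concurrent.ConcurrentHashMap

final class ClientPool(val size: Int)

val clientPools = new ConcurrentHashMap[String, ClientPool]()

// computeIfAbsent creates the pool at most once per key without taking a
// global lock, so concurrent callers for different addresses do not contend.
def getOrCreate(address: String): ClientPool =
  clientPools.computeIfAbsent(address, _ => new ClientPool(2))

assert(getOrCreate("host:7337") eq getOrCreate("host:7337"))
{code}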






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38683) It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connection

2022-03-29 Thread weixiuli (Jira)
weixiuli created SPARK-38683:


 Summary: It is unnecessary to release the 
ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or 
ManagedBufferIterator buffers when the client channel's connection is terminated
 Key: SPARK-38683
 URL: https://issues.apache.org/jira/browse/SPARK-38683
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0
Reporter: weixiuli


It is unnecessary to release the ShuffleManagedBufferIterator, 
ShuffleChunkManagedBufferIterator, or ManagedBufferIterator buffers when the 
client channel's connection is terminated; skipping the release reduces I/O 
operations and improves performance for the External Shuffle Service.
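
A hedged illustration of why the release is unnecessary (toy types, not the actual iterator classes): these iterators materialize a buffer only when next() is called, so a dead connection leaves nothing to release for blocks that were never requested.

{code:scala}
trait ManagedBuffer { def release(): Unit }

// Buffers are opened on demand: before next() is called for a block id,
// no file handle or memory is held for that block.
final class LazyBufferIterator(ids: Iterator[Int], open: Int => ManagedBuffer)
    extends Iterator[ManagedBuffer] {
  override def hasNext: Boolean = ids.hasNext
  override def next(): ManagedBuffer = open(ids.next())
}

// Draining the remaining ids on connection loss would open each buffer just
// to release it, which is the extra I/O the ticket wants to avoid.
{code}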



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38805) Remove an expired indexFilePath from the ESS shuffleIndexCache or the PBS indexCache to save memory.

2022-04-06 Thread weixiuli (Jira)
weixiuli created SPARK-38805:


 Summary: Remove an expired indexFilePath from the ESS 
shuffleIndexCache or the PBS indexCache to save memory.
 Key: SPARK-38805
 URL: https://issues.apache.org/jira/browse/SPARK-38805
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0
Reporter: weixiuli


Support automatically removing an expired indexFilePath from the ESS 
shuffleIndexCache or the PBS indexCache to save memory.
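
A hedged sketch of the caching pattern involved (the TTL, size bound, and loader below are assumptions, not the actual shuffleIndexCache settings):

{code:scala}
import java.util.concurrent.TimeUnit
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

// Placeholder loader; the real cache holds parsed shuffle index data.
def loadOffsets(path: String): Array[Long] = Array(0L)

// Access-based expiry evicts index-file entries that are no longer read,
// so stale indexFilePath keys stop pinning memory.
val indexCache: LoadingCache[String, Array[Long]] =
  CacheBuilder.newBuilder()
    .maximumSize(1024)
    .expireAfterAccess(30, TimeUnit.MINUTES)
    .build(new CacheLoader[String, Array[Long]] {
      override def load(path: String): Array[Long] = loadOffsets(path)
    })
{code}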



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled

2022-04-11 Thread weixiuli (Jira)
weixiuli created SPARK-38856:


 Summary: Fix a rejectedExecutionException error when push-based 
shuffle is enabled
 Key: SPARK-38856
 URL: https://issues.apache.org/jira/browse/SPARK-38856
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.1, 3.2.0
Reporter: weixiuli






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled

2022-04-18 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38856:
-
Description: 
When push-based shuffle is enabled in our production environment, a 
RejectedExecutionException error can occur, because the shuffle pusher pool 
has been shut down before it is used.
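
A hedged, toy reproduction of the race using a plain JDK executor (not Spark's actual ShuffleBlockPusher); the stack trace below shows the real symptom:

{code:scala}
import java.util.concurrent.{Executors, RejectedExecutionException}

val pool = Executors.newFixedThreadPool(1)
pool.shutdown() // e.g. the map task finished and tore the pusher pool down

// Submitting after shutdown throws RejectedExecutionException; checking
// isShutdown first lets the pusher stop quietly instead of failing the task.
// Note the check itself can race with a concurrent shutdown, so a real fix
// still has to handle the exception.
try {
  if (!pool.isShutdown) pool.execute(() => println("push more blocks"))
  else println("pool already shut down; stop pushing")
} catch {
  case _: RejectedExecutionException => println("push task rejected")
}
{code}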

This is the RejectedExecutionException error:

{{FetchFailed(BlockManagerId(26,x.hadoop.jd.local, 7337, None), 
shuffleId=0, mapIndex=6424, mapId=4177, reduceId=1031, message=
org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1181)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:919)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:81)
at 
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.agg_doAggregateWithKeys_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:815)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.RejectedExecutionException: Task 
org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2$$Lambda$1045/583658475@3492bd6f
 rejected from java.util.concurrent.ThreadPoolExecutor@2e63bad5[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 243134]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at 
org.apache.spark.shuffle.ShuffleBlockPusher.submitTask(ShuffleBlockPusher.scala:147)
at 
org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.handleResult(ShuffleBlockPusher.scala:235)
at 
org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockPushSuccess(ShuffleBlockPusher.scala:245)
at 
org.apache.spark.network.shuffle.BlockPushingListener.onBlockTransferSuccess(BlockPushingListener.java:42)
at 
org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockTransferSuccess(ShuffleBlockPusher.scala:224)
at 
org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferSuccess(RetryingBlockTransferor.java:258)
at 
org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockPushSuccess(RetryingBlockTransferor.java:304)
at 
org.apache.spark.network.shuffle.OneForOneBlockPusher$BlockPushCallback.onSuccess(OneForOneBlockPusher.java:97)
at 
org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:197)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.c

[jira] [Commented] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled

2022-04-18 Thread weixiuli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523981#comment-17523981
 ] 

weixiuli commented on SPARK-38856:
--

OK, done.  [~srowen] 

> Fix a rejectedExecutionException error when push-based shuffle is enabled
> -
>
> Key: SPARK-38856
> URL: https://issues.apache.org/jira/browse/SPARK-38856
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Assignee: weixiuli
>Priority: Major
>
> When push-based shuffle is enabled in our production environment, a 
> RejectedExecutionException error can occur, because the shuffle pusher pool 
> has been shut down before it is used.
> This is the RejectedExecutionException error:
> {{FetchFailed(BlockManagerId(26,x.hadoop.jd.local, 7337, None), 
> shuffleId=0, mapIndex=6424, mapId=4177, reduceId=1031, message=
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1181)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:919)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:81)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.agg_doAggregateWithKeys_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:815)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:133)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.RejectedExecutionException: Task 
> org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2$$Lambda$1045/583658475@3492bd6f
>  rejected from java.util.concurrent.ThreadPoolExecutor@2e63bad5[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 243134]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>   at 
> org.apache.spark.shuffle.ShuffleBlockPusher.submitTask(ShuffleBlockPusher.scala:147)
>   at 
> org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.handleResult(ShuffleBlockPusher.scala:235)
>   at 
> org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockPushSuccess(ShuffleBlockPusher.scala:245)
>   at 
> org.apache.spark.network.shuffle.BlockPushingListener.onBlockTransferSuccess(BlockPushingListener.java:42)
>   at 
> org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockTransferSuccess(ShuffleBlockPusher.scala:224)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferSuccess(RetryingBlockTransferor.java:258)
>   at 
> org.apache.spark.network.shuffle.Retrying

[jira] [Created] (SPARK-39287) TaskSchedulerImpl should quickly ignore task finished event if its task was finished state

2022-05-25 Thread weixiuli (Jira)
weixiuli created SPARK-39287:


 Summary: TaskSchedulerImpl should quickly ignore task finished 
event if its task was  finished state
 Key: SPARK-39287
 URL: https://issues.apache.org/jira/browse/SPARK-39287
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0
Reporter: weixiuli






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37462) To avoid unnecessary flight request calculations

2021-11-25 Thread weixiuli (Jira)
weixiuli created SPARK-37462:


 Summary: To avoid unnecessary flight request calculations
 Key: SPARK-37462
 URL: https://issues.apache.org/jira/browse/SPARK-37462
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.0
Reporter: weixiuli


To avoid unnecessary flight request calculations



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37462) To avoid unnecessary calculation of outstanding fetch requests and RPCS

2021-11-25 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37462:
-
Description: To avoid unnecessary calculation of outstanding fetch requests 
and RPCS  (was: To avoid unnecessary flight request calculations)
Summary: To avoid unnecessary calculation of outstanding fetch requests 
and RPCS  (was: To avoid unnecessary flight request calculations)

> To avoid unnecessary calculation of outstanding fetch requests and RPCS
> ---
>
> Key: SPARK-37462
> URL: https://issues.apache.org/jira/browse/SPARK-37462
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> To avoid unnecessary calculation of outstanding fetch requests and RPCS



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37462) Avoid unnecessary calculating the number of outstanding fetch requests and RPCS

2021-11-25 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37462:
-
Description: It is unnecessary to calculate the number of outstanding fetch 
requests and RPCS when the IdleStateEvent is not IDLE or the last request is 
not timeout.  (was: To avoid unnecessary calculation of outstanding fetch 
requests and RPCS)
Summary:  Avoid unnecessary calculating the number of outstanding fetch 
requests and RPCS  (was: To avoid unnecessary calculation of outstanding fetch 
requests and RPCS)

>  Avoid unnecessary calculating the number of outstanding fetch requests and 
> RPCS
> 
>
> Key: SPARK-37462
> URL: https://issues.apache.org/jira/browse/SPARK-37462
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> It is unnecessary to calculate the number of outstanding fetch requests and 
> RPCs when the IdleStateEvent is not IDLE or the last request has not timed out.
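
A hedged sketch of the ordering described above (not the actual TransportChannelHandler code): run the cheap guards first, and only then do the expensive outstanding-request accounting.

{code:scala}
import io.netty.channel.{ChannelDuplexHandler, ChannelHandlerContext}
import io.netty.handler.timeout.{IdleState, IdleStateEvent}

class IdleCheckHandler(lastRequestTimedOut: () => Boolean) extends ChannelDuplexHandler {
  override def userEventTriggered(ctx: ChannelHandlerContext, evt: AnyRef): Unit =
    evt match {
      // Cheap checks first: event type, idle state, then the timeout test.
      case e: IdleStateEvent if e.state() == IdleState.ALL_IDLE && lastRequestTimedOut() =>
        // Only now count outstanding fetches/RPCs and decide whether to close.
        ctx.close()
      case _ =>
        ctx.fireUserEventTriggered(evt)
    }
}
{code}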



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37524) We should drop all tables after testing dynamic partition pruning

2021-12-02 Thread weixiuli (Jira)
weixiuli created SPARK-37524:


 Summary: We should drop all tables after testing dynamic partition 
pruning
 Key: SPARK-37524
 URL: https://issues.apache.org/jira/browse/SPARK-37524
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0
Reporter: weixiuli


We should drop all tables after testing dynamic partition pruning.
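
A hedged sketch of the cleanup pattern, in the style of Spark's test helpers (the helper name is an assumption):

{code:scala}
import org.apache.spark.sql.SparkSession

// Run the test body, then drop the listed tables even if an assertion fails,
// so one DPP suite's partitioned tables cannot leak into later suites.
def withTables(spark: SparkSession)(names: String*)(body: => Unit): Unit =
  try body
  finally names.foreach(n => spark.sql(s"DROP TABLE IF EXISTS $n"))
{code}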



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize PartitionPruning and avoid inserting unnecessary predicates 
to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rule of  
the AQE rule  or non-AQE.

We should optimize the PartitionPruning and avoid insert unnecessary  predicate 
to improve the performance.


> Improve the Dynamic partition pruning 
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize PartitionPruning and avoid inserting unnecessary 
> predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37542) Improve the Dynamic partition pruning

2021-12-03 Thread weixiuli (Jira)
weixiuli created SPARK-37542:


 Summary: Improve the Dynamic partition pruning 
 Key: SPARK-37542
 URL: https://issues.apache.org/jira/browse/SPARK-37542
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0
Reporter: weixiuli


Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize PartitionPruning and avoid inserting unnecessary predicates 
to improve performance.
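
A hedged distillation of the decision described above (toy signature; the real rule inspects the query plan rather than taking booleans and costs directly):

{code:scala}
// Keep the inserted DPP filter only when the broadcast exchange can be
// reused for free, or when the estimated pruning benefit beats the cost
// of evaluating the extra predicate.
def shouldInsertPruningPredicate(
    canReuseBroadcastExchange: Boolean,
    estimatedPruningBenefit: Double,
    estimatedFilterOverhead: Double): Boolean =
  canReuseBroadcastExchange || estimatedPruningBenefit > estimatedFilterOverhead
{code}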



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize PartitionPruning and avoid inserting unnecessary 
predicates to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rules of  
the AQE or non-AQE.

We should optimize the PartitionPruning and avoid insert unnecessary  predicate 
to improve the performance.


> Improve the Dynamic partition pruning 
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize PartitionPruning and avoid inserting unnecessary 
> predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize the dynamic partition pruning rule and avoid inserting 
unnecessary predicates to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rules of  
the AQE or non-AQE.

We should optimize the dynamic partition pruning rule  and avoid inserting 
unnecessary  predicate to improve the performance.


> Improve the Dynamic partition pruning 
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partition pruning rule and avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize the dynamic partition pruning rule and avoid inserting 
unnecessary predicates to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rules of  
the AQE or non-AQE.

We should optimize the PartitionPruning and avoid inserting unnecessary  
predicate to improve the performance.


> Improve the Dynamic partition pruning 
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partition pruning rule and avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize the dynamic partitioning prune rules to avoid inserting 
unnecessary predicates to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rules of  
the AQE or non-AQE.

We should optimize the dynamic partition pruning rule  and avoid inserting 
unnecessary predicate to improve the performance.


> Improve the Dynamic partition pruning 
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partitioning prune rules to avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Summary: optimize the dynamic partitioning prune rules to avoid inserting 
unnecessary predicates to improve performance  (was: Improve the Dynamic 
partition pruning )

> optimize the dynamic partitioning prune rules to avoid inserting unnecessary 
> predicates to improve performance
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partitioning prune rules to avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Summary: Optimize the dynamic partitioning prune rules to avoid inserting 
unnecessary predicates to improve performance  (was: optimize the dynamic 
partitioning prune rules to avoid inserting unnecessary predicates to improve 
performance)

> Optimize the dynamic partitioning prune rules to avoid inserting unnecessary 
> predicates to improve performance
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partitioning prune rules to avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize the dynamic partitioning pruning rule to avoid inserting 
unnecessary predicates to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rules of  
the AQE or non-AQE.

We should optimize the dynamic partitioning prune rules to avoid inserting 
unnecessary predicates to improve performance.


> Optimize the dynamic partitioning prune rules to avoid inserting unnecessary 
> predicates to improve performance
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partitioning pruning rule to avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37542) Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance

2021-12-03 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37542:
-
Description: 
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning, and the predicate will be re-optimized by the 
AQE or non-AQE optimizer.

But sometimes the predicate may be unnecessary, if the join cannot reuse the 
broadcastExchange or the pruning is not beneficial, and it will be dropped by 
the AQE or non-AQE rules.

We should optimize the dynamic partitioning pruning rule to avoid inserting 
unnecessary predicates to improve performance.

  was:
Currently, the dynamic partition pruning rule will insert a predicate on the 
filterable table using the filter from the other side of the join and a custom 
wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE 
or non-AQE.

But, some time the predicate may be unnecessary if the join can NOT reuse 
broadcastExchange or it is not benefit,and it will be dropped by the rules of  
the AQE or non-AQE.

We should optimize the dynamic partitioning pruning rule to avoid inserting 
unnecessary predicates to improve performance.


> Optimize the dynamic partitioning prune rules to avoid inserting unnecessary 
> predicates to improve performance
> --
>
> Key: SPARK-37542
> URL: https://issues.apache.org/jira/browse/SPARK-37542
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Currently, the dynamic partition pruning rule will insert a predicate on the 
> filterable table using the filter from the other side of the join and a 
> custom wrapper called DynamicPruning, and the predicate will be re-optimized 
> by the AQE or non-AQE optimizer.
> But sometimes the predicate may be unnecessary, if the join cannot reuse the 
> broadcastExchange or the pruning is not beneficial, and it will be dropped by 
> the AQE or non-AQE rules.
> We should optimize the dynamic partitioning pruning rule to avoid inserting 
> unnecessary predicates to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37616) Support pushing down a dynamic partition pruning from one join to other joins

2021-12-12 Thread weixiuli (Jira)
weixiuli created SPARK-37616:


 Summary: Support pushing down a dynamic partition pruning from one 
join to other joins
 Key: SPARK-37616
 URL: https://issues.apache.org/jira/browse/SPARK-37616
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1
Reporter: weixiuli


Support pushing down a dynamic partition pruning from one join to other joins



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37674) Reduce the output partition of output stage to avoid producing small files.

2021-12-17 Thread weixiuli (Jira)
weixiuli created SPARK-37674:


 Summary: Reduce the output partition of output stage to avoid 
producing small files.
 Key: SPARK-37674
 URL: https://issues.apache.org/jira/browse/SPARK-37674
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0, 3.1.1, 3.0.3, 3.0.2, 3.0.0
Reporter: weixiuli


The partition size of the finalStage with `DataWritingCommand` or 
`V2TableWriteExec` may use ADVISORY_PARTITION_SIZE_IN_BYTES, which is the 
smaller value and can produce small files; this is bad for production. We 
should use a new partition size for the finalStage with `DataWritingCommand` 
or `V2TableWriteExec` to avoid small files.
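
A hedged sketch of the sizing idea (the values below are assumptions; the real knobs are Spark SQL configs):

{code:scala}
// Intermediate shuffles coalesce toward the advisory size, while the final
// write stage targets a larger size so each task emits fewer, bigger files.
val advisoryPartitionBytes: Long   = 64L * 1024 * 1024   // e.g. ADVISORY_PARTITION_SIZE_IN_BYTES
val finalWritePartitionBytes: Long = 256L * 1024 * 1024  // assumed write-stage target

def targetPartitionSize(isFinalWriteStage: Boolean): Long =
  if (isFinalWriteStage) finalWritePartitionBytes else advisoryPartitionBytes
{code}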



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37978) Remove the useless ChunkFetchFailureException class

2022-01-20 Thread weixiuli (Jira)
weixiuli created SPARK-37978:


 Summary: Remove the useless ChunkFetchFailureException class
 Key: SPARK-37978
 URL: https://issues.apache.org/jira/browse/SPARK-37978
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.1, 3.0.3, 3.0.1, 3.0.0
Reporter: weixiuli


Remove the useless ChunkFetchFailureException class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class

2022-01-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37978:
-
Summary: Remove the unnecessary ChunkFetchFailureException class  (was: 
Remove the useless ChunkFetchFailureException class)

> Remove the unnecessary ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Remove the useless ChunkFetchFailureException class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class

2022-01-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37978:
-
Description: The ChunkFetchFailureException is unnecessary and can be 
replaced by RuntimeException.  (was: Remove the useless 
ChunkFetchFailureException class)

> Remove the unnecessary ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> The ChunkFetchFailureException is unnecessary and can be replaced by 
> RuntimeException.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37984) Avoid computing all outstanding requests to improve performance.

2022-01-21 Thread weixiuli (Jira)
weixiuli created SPARK-37984:


 Summary: Avoid computing all outstanding requests to improve 
performance.
 Key: SPARK-37984
 URL: https://issues.apache.org/jira/browse/SPARK-37984
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.2, 3.1.0, 3.0.3, 3.0.1, 3.0.0
Reporter: weixiuli


Avoid computing all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.

2022-01-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37984:
-
Description: Avoid calculating all outstanding requests to improve 
performance.  (was: Avoid computing all outstanding requests to improve 
performance.)
Summary: Avoid calculating all outstanding requests to improve 
performance.  (was: Avoid computing all outstanding requests to improve 
performance.)

> Avoid calculating all outstanding requests to improve performance.
> --
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37993) Avoid multiple calls to conf parameter values

2022-01-23 Thread weixiuli (Jira)
weixiuli created SPARK-37993:


 Summary: Avoid multiple calls to conf parameter values
 Key: SPARK-37993
 URL: https://issues.apache.org/jira/browse/SPARK-37993
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.2, 3.1.0, 3.0.3, 3.0.2, 3.0.0
Reporter: weixiuli


Avoid multiple calls to conf parameter values
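
A hedged sketch of the pattern the summary implies (toy configuration type; the key is only illustrative):

{code:scala}
final class Fetcher(conf: Map[String, String]) {
  // Read the value once at construction instead of parsing it again on
  // every hot-path call.
  private val maxRetries: Int =
    conf.getOrElse("spark.shuffle.io.maxRetries", "3").toInt

  def shouldRetry(attempt: Int): Boolean = attempt < maxRetries
}
{code}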



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37993) Avoid multiple calls to configuration parameter values

2022-01-23 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37993:
-
Description: Avoid multiple calls to configuration parameter values  (was: 
Avoid multiple calls to conf parameter values)
Summary: Avoid multiple calls to configuration parameter values  (was: 
Avoid multiple calls to conf parameter values)

> Avoid multiple calls to configuration parameter values
> --
>
> Key: SPARK-37993
> URL: https://issues.apache.org/jira/browse/SPARK-37993
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.2, 3.0.3, 3.1.0, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Avoid multiple calls to configuration parameter values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38008) Fix the description of refill method

2022-01-24 Thread weixiuli (Jira)
weixiuli created SPARK-38008:


 Summary: Fix the description of refill method
 Key: SPARK-38008
 URL: https://issues.apache.org/jira/browse/SPARK-38008
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0
Reporter: weixiuli


Fix the description of refill method.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38008) Fix the method description of refill

2022-01-24 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38008:
-
Description: Fix the method description of refill  (was: Fix the 
description of refill method.)
Summary: Fix the method description of refill  (was: Fix the 
description of refill method)

> Fix the method description of refill
> 
>
> Key: SPARK-38008
> URL: https://issues.apache.org/jira/browse/SPARK-38008
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Fix the method description of refill



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38129) Adaptive enable timeout for BroadcastQueryStageExec

2022-02-07 Thread weixiuli (Jira)
weixiuli created SPARK-38129:


 Summary: Adaptive enable timeout for BroadcastQueryStageExec
 Key: SPARK-38129
 URL: https://issues.apache.org/jira/browse/SPARK-38129
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1, 3.2.0
Reporter: weixiuli


We should disable the timeout for BroadcastQueryStageExec when it comes from 
shuffle query stages, whose runtime statistics are usually accurate in AQE, 
but enable the timeout when it comes from other plans, whose statistics may 
be inaccurate, keeping the behavior the same as in non-AQE mode.
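
A hedged sketch of that decision follows, with assumed stand-in types
(BroadcastInput, FromShuffleQueryStage) rather than Spark's actual AQE
internals: apply the configured broadcast timeout only when the broadcast's
input does not come from a finished shuffle stage.

{code:scala}
// Illustrative only: stand-ins for where a broadcast's input comes from.
sealed trait BroadcastInput
case object FromShuffleQueryStage extends BroadcastInput // runtime stats, usually accurate
case object FromOtherPlan extends BroadcastInput         // estimated stats, may be wrong

object AdaptiveTimeoutSketch {
  // None means "no timeout", mirroring the behavior described above.
  def timeoutSeconds(input: BroadcastInput, configured: Long): Option[Long] =
    input match {
      case FromShuffleQueryStage => None             // trust runtime statistics
      case FromOtherPlan         => Some(configured) // keep the non-AQE safety net
    }
}
{code}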



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38129) Adaptively enable timeout for BroadcastQueryStageExec

2022-02-07 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38129:
-
Summary: Adaptively enable timeout for BroadcastQueryStageExec  (was: 
Adaptive enable timeout for BroadcastQueryStageExec)

> Adaptively enable timeout for BroadcastQueryStageExec
> -
>
> Key: SPARK-38129
> URL: https://issues.apache.org/jira/browse/SPARK-38129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Priority: Major
>
> We should disable the timeout for BroadcastQueryStageExec when it comes from 
> shuffle query stages, whose runtime statistics are usually accurate in AQE, 
> but enable the timeout when it comes from other plans, whose statistics may 
> be inaccurate, keeping the behavior the same as in non-AQE mode.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38129) Adaptively enable timeout for BroadcastQueryStageExec

2022-02-07 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38129:
-
Fix Version/s: 3.2.1
   3.2.0

> Adaptively enable timeout for BroadcastQueryStageExec
> -
>
> Key: SPARK-38129
> URL: https://issues.apache.org/jira/browse/SPARK-38129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Priority: Major
> Fix For: 3.2.0, 3.2.1
>
>
> We should disable the timeout for BroadcastQueryStageExec when it comes from 
> shuffle query stages, whose runtime statistics are usually accurate in AQE, 
> but enable the timeout when it comes from other plans, whose statistics may 
> be inaccurate, keeping the behavior the same as in non-AQE mode.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38129) Adaptively enable timeout for BroadcastQueryStageExec

2022-02-07 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-38129:
-
Parent: SPARK-33828
Issue Type: Sub-task  (was: Bug)

> Adaptively enable timeout for BroadcastQueryStageExec
> -
>
> Key: SPARK-38129
> URL: https://issues.apache.org/jira/browse/SPARK-38129
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Priority: Major
> Fix For: 3.2.0, 3.2.1
>
>
> We should disable the timeout for BroadcastQueryStageExec when it comes from 
> shuffle query stages, whose runtime statistics are usually accurate in AQE, 
> but enable the timeout when it comes from other plans, whose statistics may 
> be inaccurate, keeping the behavior the same as in non-AQE mode.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38191) The staging directory of a write job only needs to be initialized once in HadoopMapReduceCommitProtocol.

2022-02-11 Thread weixiuli (Jira)
weixiuli created SPARK-38191:


 Summary: The staging directory of a write job only needs to be 
initialized once in HadoopMapReduceCommitProtocol.
 Key: SPARK-38191
 URL: https://issues.apache.org/jira/browse/SPARK-38191
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 
3.0.0
Reporter: weixiuli
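
The issue body is empty, so as a rough sketch under assumed names
(CommitProtocolSketch is hypothetical and only loosely modeled on
HadoopMapReduceCommitProtocol), the "initialize once" idea can be expressed
by memoizing the staging directory with a lazy val, so repeated accesses
reuse one Path instead of rebuilding it each time:

{code:scala}
import org.apache.hadoop.fs.Path

// Hypothetical sketch; not Spark's actual commit protocol code.
class CommitProtocolSketch(path: String, jobId: String) {
  // lazy val: computed on first access, then cached for every later use.
  lazy val stagingDir: Path = new Path(path, s".spark-staging-$jobId")
}
{code}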






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39950) It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage would NOT time out in AQE.

2022-08-02 Thread weixiuli (Jira)
weixiuli created SPARK-39950:


 Summary: It's necessary to materialize BroadcastQueryStage first, 
because the BroadcastQueryStage would NOT time out in AQE. 
 Key: SPARK-39950
 URL: https://issues.apache.org/jira/browse/SPARK-39950
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.2, 3.3.0, 3.2.1
Reporter: weixiuli
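
The issue body is empty; a minimal sketch of the ordering idea, with assumed
stand-in types (StageSketch, BroadcastStage) rather than Spark's real query
stage classes: submit broadcast stages before other stages, since a
late-started broadcast stage has no timeout to act as a safety net.

{code:scala}
// Illustrative stand-ins for AQE query stages; not Spark internals.
sealed trait StageSketch { def materialize(): Unit }
final case class BroadcastStage(id: Int) extends StageSketch {
  def materialize(): Unit = println(s"broadcast stage $id started")
}
final case class ShuffleStage(id: Int) extends StageSketch {
  def materialize(): Unit = println(s"shuffle stage $id started")
}

object MaterializeOrderSketch {
  // Kick off broadcast stages first, then everything else.
  def submit(stages: Seq[StageSketch]): Unit = {
    val (broadcasts, others) = stages.partition(_.isInstanceOf[BroadcastStage])
    (broadcasts ++ others).foreach(_.materialize())
  }
}
{code}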






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39950) It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage does not time out in AQE.

2022-08-02 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-39950:
-
Summary: It's necessary to materialize BroadcastQueryStage first, because 
the BroadcastQueryStage does not time out in AQE.  (was: It's necessary to 
materialize BroadcastQueryStage first, because the BroadcastQueryStage would 
NOT time out in AQE.)

> It's necessary to materialize BroadcastQueryStage first, because the 
> BroadcastQueryStage does not time out in AQE. 
> ---
>
> Key: SPARK-39950
> URL: https://issues.apache.org/jira/browse/SPARK-39950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.2.2
>Reporter: weixiuli
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org