[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-669596854


   Merged to the dev branch :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-669595846


   Since it's approved and passing jenkins & GHA I'm going to merge this, but 
I'll add the comment about the time in the follow up I'll rebase on top of this 
after merge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-669389554


   The only outstanding point of discussion is a test timeout length, which I 
don't believe is critical to address. If CI passes I intend to merge this PR 
and rebase the next PR which brings the feature full circle.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-03 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-668340555


   I've added a bunch of tests around the timestamp stuff. I think it's got all 
of the cases folks were asking for covered, and in the discussion on the PR to 
this PR it seems like we're in agreement that while the timestamps are complex, 
it beats the alternative of maybe having a race condition and trying to 
mitigate with a thread sleep.
   
   With that in mind, if there is no other active discussion I'll consider this 
to be a lazy consensus.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-31 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-667163174


   So we don’t reject tasks sent to us and an executor can start
   decommissioning without the driver knowing yet so it is possible (although
   unlikely) to get a new task after we’ve started decommissioning.
   
   
   
   On Fri, Jul 31, 2020 at 6:58 AM Attila Zsolt Piros 
   wrote:
   
   > *@attilapiros* commented on this pull request.
   > --
   >
   > In
   > 
core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
   > :
   >
   > > @@ -327,4 +354,28 @@ private[storage] class BlockManagerDecommissioner(
   >  }
   >  logInfo("Stopped storage decommissioner")
   >}
   > +
   > +  /*
   > +   *  Returns the last migration time and a boolean for if all blocks 
have been migrated.
   > +   *  If there are any tasks running since that time the boolean may be 
incorrect.
   > +   */
   > +  private[storage] def lastMigrationInfo(): (Long, Boolean) = {
   > +if (stopped || (stoppedRDD && stoppedShuffle)) {
   > +  (System.nanoTime(), true)
   > +} else {
   > +  // Chose the min of the running times.
   > +  val lastMigrationTime = if (
   > +conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED) &&
   > +conf.get(config.STORAGE_DECOMMISSION_RDD_BLOCKS_ENABLED)) {
   > +Math.min(lastRDDMigrationTime, lastShuffleMigrationTime)
   >
   > Now I am starting to get this part. Can we try to simplify this? :)
   >
   > On a decommissioned executor there will be no new tasks started, right?
   > (Theoretically one task could be started as the scheduler does not
   > processed the DecommissionExecutor message but let's take this corner
   > case out as we can have this first 1 sec sleep which more or less handles
   > this).
   >
   > So when executor.numRunningTasks will be 0 it stays 0. As I remember
   > caching is part of task running so no new cached blocks will be created
   > when numRunningTasks=0.
   >
   > So would it work if we first wait to reach numRunningTasks==0 in a sleep
   > check loop and then we would check for the migration finished flag without
   > using the time part of lastMigrationInfo?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   >
   -- 
   Cell : 425-233-8271
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-29 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-665379526


   So it seems without the thread start, we end up in the situation where there 
is an orphan worker (see 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126735/console
 ) so I think that's covered. I'll double-check that this goes away once we put 
the thread start back in, otherwise I'll do some more poking.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-24 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-663678743


   Sure, I can hold off on this one then @attilapiros :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-24 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-663650784


   All of the github actions pass, if no one else wants more time to comment on 
this, I'll merge this today.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-23 Thread GitBox


holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-663263429


   cc @attilapiros & @agrawaldevesh 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org