[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2022-01-12 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-1011570638


   I checked two open-source projects, Uber Zeus and Tencent Firestorm, they 
both did not implement `ShuffleBlockResolver `, see firestorm 
[RssShuffleManager#shuffleBlockResolver()](https://github.com/Tencent/Firestorm/blob/0cac0134da6c791c3077e683b2ac6b02e6546b06/client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java#L470-L473),
 and Zeus's 
[RssShuffleBlockResolver](https://github.com/uber/RemoteShuffleService/blob/607358d97dc17e1d4e8d29b145e43248ac9ef6dd/src/main/scala/org/apache/spark/shuffle/RssShuffleBlockResolver.scala#L20).
 They have both implemented ShuffleManager, and their own shuffle read/writer.  
So if we start the refactoring from this level, that might not help, what do 
you think, @attilapiros ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2022-01-11 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-1010677544


   Thank you all, @tgravescs , @attilapiros , @hiboyang , @mridulm 
   I had a conversation with @attilapiros , I will work with him and see if 
there will be any way to solve this in a better way. Will come back and update 
this thread once we find out more. Thank you all for the feedback and 
suggestions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2022-01-08 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-1007823318


   hi @tgravescs thanks for getting back on this one : ). 
   
   I think @attilapiros already mentioned 
   
   > But it got sidetracked and become stale.
   Based on this experience I am afraid the ideal solution is a bit far away in 
the future.
   
   @attilapiros please share your thoughts.
   
   I understand this may not be the best/elegant solution, but it is simple and 
it works.  I do not think hacking more things in the remote shuffle service 
side is better, this is a common problem that should be handled on the Spark 
side. If there is a better, actionable solution, I would love to explore more.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2022-01-07 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-1007823318


   hi @tgravescs thanks for getting back on this one : ). 
   
   I think @attilapiros already mentioned 
   
   > But it got sidetracked and become stale.
   Based on this experience I am afraid the ideal solution is a bit far away in 
the future.
   
   @attilapiros please share your thoughts.
   
   I understand this may not be the best/elegant solution, but it is simple and 
it works.  I do not think hacking more things in the remote shuffle service 
side is better, this is a common problem that should be handled on the Spark 
side. If there is a better, actionable solution, I would love to explore more.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2021-12-02 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-984948459


   hi @tgravescs , @mridulm understand your concerns, but we need a solution to 
get this work for today's RSS. How about adding a config like:
   
   ```
 private[spark] val SHUFFLE_REGISTRATION_ENABLED =
   ConfigBuilder("spark.shuffle.registration.enabled")
 .doc("Enable the executors to register with the local external shuffle 
service. When " +
   "`spark.shuffle.service.enabled` is true and a local external 
shuffle service is used, " +
   "it must be set to true; if the local shuffle service is not usd, 
set this value to " +
   "false to skip the registration.")
 .version("3.3.0")
 .booleanConf
 .createWithDefault(true)
   ```
   
   this way at least we have a config property to set for RSS, in order to skip 
the registration step currently being hardcoded.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2021-11-30 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-983067546


   > so I'm a bit on the fence about this. I hesitate to change this API and 
cause folks to recompile without really more investigation to properly make 
shuffle pluggable. The shuffleManager here is private[spark] so the fact we are 
adding a function for 3rd party libraries doesn't really match well with that 
by itself. Ideally we really make api's pluggable and classes developers apis 
for 3rd parties but I also realize that is a much larger task to find how to do 
that properly. I just hate to change the api multiple times if we are planning 
on doing more work there. At the same time, I think other attempts at that have 
stalled, so open here to thoughts from others?
   
   Thanks for sharing your thoughts @tgravescs . By far, there are several 3rd 
shuffle service implementations out there, and it seems the existing APIs are 
pluggable enough. Well, I may miss some facts, but at least we are pretty happy 
to run with Uber's RSS with minimal configs. This is the only issue we found 
that we cannot bypass, we need to have some changes in the Spark in order not 
to "always" register with the local shuffle service. This approach is the 
simplest and safest solution I can think of. I would be loving to hear thoughts 
from others as well.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2021-11-29 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-982363102


   @mridulm , @attilapiros , @tgravescs  could you pls help to review the 
changes again?
   Per @attilapiros 's suggestion, I have added a method in the ShuffleManager 
trait and this is allowed to be overridden when needed. The default returns 
true, so there is no behavior change. I have also updated the "how this was 
tested" with more details about the tests I've done locally.
   
   Note, this is still an "incompatible" change to the 3rd party shuffle 
service implementations. Adding a method with a default implementation in a 
trait will require a re-compile of the RSS's server/client library.
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2021-11-22 Thread GitBox


yangwwei commented on pull request #34672:
URL: https://github.com/apache/spark/pull/34672#issuecomment-976225163


   Thank you @HyukjinKwon , @attilapiros , @tgravescs 
   
   >What about extending ShuffleManager trait with a new method indicating 
whether this shuffle manager implementation works with the external shuffle 
manager or not. It can have a default implementation giving back true and only 
needed to be overridden when the external shuffle manager is not supported.
   
   I really like this idea, thank you @attilapiros. How about adding a new 
method:  `supportExternalShuffleService()`. This method gives each shuffle 
manager implementation a way to tell if the external shuffle service is needed 
for this shuffle manager to work. Default it returns true, and then the block 
manager will register with the external shuffle server; otherwise, that 
registration can be skipped.
   
   >So my first reaction is: you have a 3rd party shuffle manager that is an 
external shuffle service because it supports dynamic allocation, then why is it 
failing... is it because you didn't override something, or because you couldn't 
override something? In this case it's creating a ExternalBlockStoreClient, 
which I think isn't setup to be overridden. I think it comes down to we just 
haven't really added support to allow this.
   
   We actually found this issue while using [Uber's Remote Shuffle 
Service](https://github.com/uber/RemoteShuffleService) with DA enabled. This is 
due to [this part of 
code](https://github.com/apache/spark/blob/5d3a6573a56f9c00ccc513c8131c037de7d29000/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L532-L534)
 being hardcoded to register with the external shuffle service even a 3rd party 
shuffle service is used. We will need a more general way to handle this. Please 
let me know your thoughts, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org