[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-1011570638 I checked two open-source projects, Uber Zeus and Tencent Firestorm, they both did not implement `ShuffleBlockResolver `, see firestorm [RssShuffleManager#shuffleBlockResolver()](https://github.com/Tencent/Firestorm/blob/0cac0134da6c791c3077e683b2ac6b02e6546b06/client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java#L470-L473), and Zeus's [RssShuffleBlockResolver](https://github.com/uber/RemoteShuffleService/blob/607358d97dc17e1d4e8d29b145e43248ac9ef6dd/src/main/scala/org/apache/spark/shuffle/RssShuffleBlockResolver.scala#L20). They have both implemented ShuffleManager, and their own shuffle read/writer. So if we start the refactoring from this level, that might not help, what do you think, @attilapiros ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-1010677544 Thank you all, @tgravescs , @attilapiros , @hiboyang , @mridulm I had a conversation with @attilapiros , I will work with him and see if there will be any way to solve this in a better way. Will come back and update this thread once we find out more. Thank you all for the feedback and suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-1007823318 hi @tgravescs thanks for getting back on this one : ). I think @attilapiros already mentioned > But it got sidetracked and become stale. Based on this experience I am afraid the ideal solution is a bit far away in the future. @attilapiros please share your thoughts. I understand this may not be the best/elegant solution, but it is simple and it works. I do not think hacking more things in the remote shuffle service side is better, this is a common problem that should be handled on the Spark side. If there is a better, actionable solution, I would love to explore more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-1007823318 hi @tgravescs thanks for getting back on this one : ). I think @attilapiros already mentioned > But it got sidetracked and become stale. Based on this experience I am afraid the ideal solution is a bit far away in the future. @attilapiros please share your thoughts. I understand this may not be the best/elegant solution, but it is simple and it works. I do not think hacking more things in the remote shuffle service side is better, this is a common problem that should be handled on the Spark side. If there is a better, actionable solution, I would love to explore more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-984948459 hi @tgravescs , @mridulm understand your concerns, but we need a solution to get this work for today's RSS. How about adding a config like: ``` private[spark] val SHUFFLE_REGISTRATION_ENABLED = ConfigBuilder("spark.shuffle.registration.enabled") .doc("Enable the executors to register with the local external shuffle service. When " + "`spark.shuffle.service.enabled` is true and a local external shuffle service is used, " + "it must be set to true; if the local shuffle service is not usd, set this value to " + "false to skip the registration.") .version("3.3.0") .booleanConf .createWithDefault(true) ``` this way at least we have a config property to set for RSS, in order to skip the registration step currently being hardcoded. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-983067546 > so I'm a bit on the fence about this. I hesitate to change this API and cause folks to recompile without really more investigation to properly make shuffle pluggable. The shuffleManager here is private[spark] so the fact we are adding a function for 3rd party libraries doesn't really match well with that by itself. Ideally we really make api's pluggable and classes developers apis for 3rd parties but I also realize that is a much larger task to find how to do that properly. I just hate to change the api multiple times if we are planning on doing more work there. At the same time, I think other attempts at that have stalled, so open here to thoughts from others? Thanks for sharing your thoughts @tgravescs . By far, there are several 3rd shuffle service implementations out there, and it seems the existing APIs are pluggable enough. Well, I may miss some facts, but at least we are pretty happy to run with Uber's RSS with minimal configs. This is the only issue we found that we cannot bypass, we need to have some changes in the Spark in order not to "always" register with the local shuffle service. This approach is the simplest and safest solution I can think of. I would be loving to hear thoughts from others as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-982363102 @mridulm , @attilapiros , @tgravescs could you pls help to review the changes again? Per @attilapiros 's suggestion, I have added a method in the ShuffleManager trait and this is allowed to be overridden when needed. The default returns true, so there is no behavior change. I have also updated the "how this was tested" with more details about the tests I've done locally. Note, this is still an "incompatible" change to the 3rd party shuffle service implementations. Adding a method with a default implementation in a trait will require a re-compile of the RSS's server/client library. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yangwwei commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured
yangwwei commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-976225163 Thank you @HyukjinKwon , @attilapiros , @tgravescs >What about extending ShuffleManager trait with a new method indicating whether this shuffle manager implementation works with the external shuffle manager or not. It can have a default implementation giving back true and only needed to be overridden when the external shuffle manager is not supported. I really like this idea, thank you @attilapiros. How about adding a new method: `supportExternalShuffleService()`. This method gives each shuffle manager implementation a way to tell if the external shuffle service is needed for this shuffle manager to work. Default it returns true, and then the block manager will register with the external shuffle server; otherwise, that registration can be skipped. >So my first reaction is: you have a 3rd party shuffle manager that is an external shuffle service because it supports dynamic allocation, then why is it failing... is it because you didn't override something, or because you couldn't override something? In this case it's creating a ExternalBlockStoreClient, which I think isn't setup to be overridden. I think it comes down to we just haven't really added support to allow this. We actually found this issue while using [Uber's Remote Shuffle Service](https://github.com/uber/RemoteShuffleService) with DA enabled. This is due to [this part of code](https://github.com/apache/spark/blob/5d3a6573a56f9c00ccc513c8131c037de7d29000/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L532-L534) being hardcoded to register with the external shuffle service even a 3rd party shuffle service is used. We will need a more general way to handle this. Please let me know your thoughts, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org