Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56506066

@pwendell This is not Hadoop RDD specific functionality; it is a general requirement that any RDD in Spark can leverage, and Hadoop RDD currently happens to have a use case for it when DFS caching is used. The limitation here may be that a preferred location is currently a String, so extending it to a URI or anything else adds overhead (including in the current patch). For example: an RDD that pulls data from Tachyon or another distributed memory store, loading data into accelerator cards, and specifying process-local locality for a block are all, in my opinion, uses of the same functionality.

If this is not addressed properly, then when the next similar requirement comes along we will either be rewriting this code or adding more surgical hacks along the same lines. If the expectation is that Spark won't need to support these other requirements [1], then we can definitely punt on a proper design change. Given this is not a user-facing change (right?), we can take the current approach and replace it later, or do a more principled solution upfront.

@kayousterhout @markhamstra @mateiz any thoughts, given that this modifies TaskSetManager to add this feature?

[1] Which is unlikely given MLlib's rapid pace of development; the need to support accelerator cards sooner rather than later is fairly inevitable, at least given the arc of our past efforts with ML on Spark.
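To make the String limitation concrete: a richer location type could carry process-level or storage-specific locality hints that a bare hostname cannot, while still accepting the legacy host strings. The sketch below is purely illustrative and is not Spark's actual API; the type names and the `executor_host_id` string convention are assumptions for the example.

```scala
// Hypothetical sketch: a typed location hierarchy instead of a bare String.
// None of these names are Spark's actual API; they only illustrate the idea.
sealed trait PreferredLocation { def host: String }

// Plain host-level locality: what a bare String expresses today.
case class HostLocation(host: String) extends PreferredLocation

// Process-local locality, e.g. a block cached in a specific executor,
// an accelerator card, or an external store such as Tachyon.
case class ExecutorLocation(host: String, executorId: String) extends PreferredLocation

object PreferredLocation {
  // Backward-compatible parsing: a legacy "host" string still works,
  // while an "executor_host_id" string (an assumed convention) carries
  // the extra process-local detail without changing existing call sites.
  def parse(s: String): PreferredLocation = s.split("_") match {
    case Array("executor", host, execId) => ExecutorLocation(host, execId)
    case _                               => HostLocation(s)
  }
}
```

With an abstraction like this, TaskSetManager could match on the location subtype to pick a locality level, rather than growing case-by-case string parsing for each new requirement.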