Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/5294#issuecomment-88254374

Yes, that is basically the scenario, although I would expect it to start out with Spark packaged against hadoopA and running on hadoopA; then hadoopB is deployed, and Spark built with hadoopA continues to run just fine on hadoopB. This allows for separate deployments of Hadoop and Spark. Otherwise you have to make sure Spark and Hadoop get deployed everywhere at the same time and that everyone upgrades to the new version of Spark.

Yes, it did happen, which is what led me to file this JIRA and to plan on changing how we package Spark internally. I don't think it will happen very often, but I also don't want it to cause an issue on a production system. MapReduce has this same problem, and we actually package it fully separately to prevent it. With Hadoop now supporting rolling upgrades, this is more of a concern.

Personally, I see things moving toward more isolated environments, where we aren't forcing Hadoop and its dependencies to be included in everything that runs on YARN. Many users have issues with dependencies, and having this config should at least give them the option.
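As one illustration of this kind of separation (a sketch of Spark's documented "Hadoop free" build approach, not necessarily what this particular PR implements), Spark can be built without bundled Hadoop jars and pointed at the cluster's own Hadoop classpath at runtime:

```shell
# Build a Spark distribution that omits Hadoop jars.
# Depending on the Spark version, the script lives at
# ./make-distribution.sh or ./dev/make-distribution.sh.
./make-distribution.sh --name hadoop-free -Phadoop-provided

# At runtime, pick up the cluster's own Hadoop jars,
# e.g. in conf/spark-env.sh:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

With this layout, upgrading Hadoop on the cluster does not require rebuilding or redeploying Spark, which is the decoupling described above.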