Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/5294#issuecomment-88254374
  
    Yes, that is basically the scenario, although I would expect it to start out 
packaged against hadoopA with Spark running on hadoopA; then hadoopB is deployed and 
the Spark built against hadoopA keeps running just fine on hadoopB.
    
    This allows for separate deployments of hadoop and spark. Otherwise you 
have to make sure spark and hadoop get deployed everywhere at the same time and that 
everyone upgrades to the new version of spark.
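    To illustrate what I mean by separate deployments, roughly something like the 
sketch below (just the general idea of a hadoop-provided Spark build pointed at 
whatever Hadoop the cluster currently has, not necessarily what this patch wires up):

        # build Spark without bundling Hadoop classes (hadoop-provided profile)
        ./make-distribution.sh --name hadoop-free --tgz -Phadoop-provided -Pyarn

        # conf/spark-env.sh on the gateway: pick up whatever Hadoop the cluster
        # currently has installed (hadoopA today, hadoopB after a rolling upgrade)
        export SPARK_DIST_CLASSPATH=$(hadoop classpath)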
    
    Yes, it did happen, which is what led me to file this jira and to plan on 
changing how we package spark internally. I don't think it will happen very 
often, but I also don't want it to cause an issue on a production system.  
MapReduce has this same issue and we actually package it fully separately to 
prevent this.  With Hadoop now supporting rolling upgrades this is more of a 
concern.  
    
    Personally I see things moving toward more isolated environments where we 
aren't forcing Hadoop and its dependencies to be included in everything that 
runs on YARN.  Many users have issues with dependency conflicts, and having 
this config should at least give them the option.

