As you know, the Hadoop version and related dependency versions are defined in the Spark build files; IIRC the top-level pom.xml has all the Maven variables for versions. So I think if you just build Hadoop locally (i.e. version it as something like 2.2.1234-SNAPSHOT and mvn install it), you should be able to point the corresponding variable in the top-level Spark pom.xml at your build.
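Roughly, the flow would look something like this (a sketch only: the 2.2.1234-SNAPSHOT version string is just an example, the paths are placeholders, and the exact profiles/flags depend on your Spark and Hadoop versions):

    # In your modified Hadoop checkout: stamp a custom version across the
    # build (or edit the poms by hand) and install into your local ~/.m2
    cd /path/to/hadoop
    mvn versions:set -DnewVersion=2.2.1234-SNAPSHOT
    mvn install -DskipTests

    # In the Spark checkout: override the hadoop.version Maven property,
    # either by editing the top-level pom.xml or on the command line
    cd /path/to/spark
    mvn -Pyarn -Dhadoop.version=2.2.1234-SNAPSHOT -DskipTests clean package

Since the custom artifacts only exist in your local repository, Spark's build will resolve them from ~/.m2 rather than Maven Central.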
Of course, this is a Pandora's box: now you also need to deploy your custom YARN on your cluster, make sure it matches the Spark target, and so on (if you're running Spark on YARN). RPM and DEB packages tend to be useful for this kind of thing, since you can easily sync the /etc/ config files and uniformly manage/upgrade versions.

Thus, if you're really serious about building a custom distribution, mixing and matching Hadoop components separately, you might want to consider using Apache BigTop; just bring this up on that mailing list. We curate a Hadoop distribution "builder" that builds Spark, Hadoop, Hive, Ignite, Kafka, ZooKeeper, HBase, and so on. Since BigTop has all the tooling necessary to fully build, test, and deploy your Hadoop bits on VMs/containers, it might make your life a little easier.

On Tue, Jul 21, 2015 at 11:11 PM, Dogtail Ray <spark.ru...@gmail.com> wrote:

> Hi,
>
> I have modified some Hadoop code, and want to build Spark with the
> modified version of Hadoop. Do I need to change the compilation dependency
> files? How to then? Great thanks!

--
jay vyas