GitHub user pwendell opened a pull request:
https://github.com/apache/spark/pull/286
Merge Hadoop Into Spark
This patch merges the Hadoop 0.20.2 source code into the Spark project.
I've thought about this a bunch and this will provide us with several benefits:
### More source code
Let's be honest, to be taken seriously as a project Spark needs to have
_way more_ lines of code. Spark is currently 70,000 lines of Scala code - this
patch adds 452,000 lines of XML alone (!) This will make our github stats look
great!
### Seamless builds
Sometimes users stumble trying to build Spark against Hadoop. Not anymore!!
With Hadoop inside of Spark this won't be a problem at all. I mean, there's
basically only one Hadoop version, right? So this should work for pretty much
everyone. Wait, hold on, is hadoop-0.20.2 the same as hadoop-2.2.0? I'm
assuming it's the same because they both have the same number of "2"s.
### Your favorite old configs
This patch will give users access to some of their favorite old configs
from Hadoop. Did you just figure out what
`dfs.namenode.path.based.cache.block.map.allocation.percent` was?! Now you can
use it in Spark!! Pining for your old friend
`mapreduce.map.skip.proc.count.autoincr`... fuggedabodit - we got ya!
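(For the record, you don't need to vendor Hadoop's source tree to reach its configs from Spark: any property prefixed with `spark.hadoop.` gets copied into the job's Hadoop `Configuration` with the prefix stripped. A minimal Python sketch of that prefix-stripping convention — illustrative only, not Spark's actual implementation:)

```python
def hadoop_conf_from_spark_conf(spark_conf):
    """Mimic how 'spark.hadoop.*'-prefixed Spark properties are copied
    into the underlying Hadoop Configuration, with the prefix stripped.
    Illustrative sketch only.
    """
    prefix = "spark.hadoop."
    return {
        key[len(prefix):]: value
        for key, value in spark_conf.items()
        if key.startswith(prefix)
    }


conf = {
    "spark.app.name": "my-app",
    "spark.hadoop.dfs.replication": "2",
    "spark.hadoop.mapreduce.map.skip.proc.count.autoincr": "false",
}
print(hadoop_conf_from_spark_conf(conf))
# {'dfs.replication': '2', 'mapreduce.map.skip.proc.count.autoincr': 'false'}
```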
I plan to contribute tests and docs in a subsequent patch. Please merge
this ASAP and include in Spark 0.9.1.
**NB: This diff is too large for github to render. Users will have to
download and play with this on their own.**
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pwendell/spark 4-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/286.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #286
----
commit 2ce4bb837bd551501542683f20cf15bb6f574efc
Author: Patrick Wendell <[email protected]>
Date: 2014-04-01T18:10:12Z
Merge Hadoop Into Spark
This patch merges the Hadoop 0.20.2 source code into the Spark project.
I've thought about this a bunch and this will provide us with several benefits:
Let's be honest, to be taken seriously as a project Spark needs to have
_way more_ lines of code. Spark is currently 70,000 lines of Scala code - this
patch adds 452,000 lines of XML alone (!) This will make our github stats look
great!
Sometimes users stumble trying to build Spark against Hadoop. Not anymore!!
With Hadoop inside of Spark this won't be a problem at all. I mean, there's
basically only one Hadoop version, right? So this should work for pretty much
everyone. Wait, hold on, is hadoop-0.20.2 the same as hadoop-2.2.0? I'm
assuming it's the same because they both have the same number of "2"s.
This patch will give users access to some of their favorite old configs
from Hadoop. Did you just figure out what
`dfs.namenode.path.based.cache.block.map.allocation.percent` was?! Now you can
use it in Spark!! Pining for your old friend
`mapreduce.map.skip.proc.count.autoincr`... fuggedabodit - we got ya!
I plan to contribute tests and docs in a subsequent patch. Please merge
this ASAP and include in Spark 0.9.1.
----