GitHub user pwendell opened a pull request:
https://github.com/apache/spark/pull/286
Merge Hadoop Into Spark
This patch merges the Hadoop 0.20.2 source code into the Spark project.
I've thought about this a bunch and this will provide us with several benefits:
### More source code
Let's be honest, to be taken seriously as a project Spark needs to have
_way more_ lines of code. Spark is currently 70,000 lines of Scala code - this
patch adds 452,000 lines of XML alone (!) This will make our github stats look
great!
### Seamless builds
Sometimes users stumble trying to build Spark against Hadoop. Not anymore!!
With Hadoop inside of Spark this won't be a problem at all. I mean, there's
basically only one Hadoop version, right? So this should work for pretty much
everyone. Wait, hold on, is hadoop-0.20.2 the same as hadoop-2.2.0? I'm
assuming it's the same because they both have the same number of "2"s.
### Your favorite old configs
This patch will give users access to some of their favorite old configs
from Hadoop. Did you just figure out what
`dfs.namenode.path.based.cache.block.map.allocation.percent` was?! Now you can
use it in Spark!! Pining for your old friend
`mapreduce.map.skip.proc.count.autoincr`... fuggedabodit - we got ya!
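(For the record, you don't need to vendor Hadoop's source tree to reach its configs from Spark: any property prefixed with `spark.hadoop.` gets copied into the job's Hadoop `Configuration` with the prefix stripped. A minimal Python sketch of that prefix-stripping convention — illustrative only, not Spark's actual implementation:)

```python
def hadoop_conf_from_spark_conf(spark_conf):
    """Mimic how 'spark.hadoop.*'-prefixed Spark properties are copied
    into the underlying Hadoop Configuration, with the prefix stripped.
    Illustrative sketch only.
    """
    prefix = "spark.hadoop."
    return {
        key[len(prefix):]: value
        for key, value in spark_conf.items()
        if key.startswith(prefix)
    }


conf = {
    "spark.app.name": "my-app",
    "spark.hadoop.dfs.replication": "2",
    "spark.hadoop.mapreduce.map.skip.proc.count.autoincr": "false",
}
print(hadoop_conf_from_spark_conf(conf))
# {'dfs.replication': '2', 'mapreduce.map.skip.proc.count.autoincr': 'false'}
```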
I plan to contribute tests and docs in a subsequent patch. Please merge
this ASAP and include in Spark 0.9.1.
**NB: This diff is too large for github to render. Users will have to
download and play with this on their own.**
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pwendell/spark 4-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/286.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #286
----
commit 2ce4bb837bd551501542683f20cf15bb6f574efc
Author: Patrick Wendell <[email protected]>
Date: 2014-04-01T18:10:12Z
Merge Hadoop Into Spark
This patch merges the Hadoop 0.20.2 source code into the Spark project.
I've thought about this a bunch and this will provide us with several benefits:
Let's be honest, to be taken seriously as a project Spark needs to have
_way more_ lines of code. Spark is currently 70,000 lines of Scala code - this
patch adds 452,000 lines of XML alone (!) This will make our github stats look
great!
Sometimes users stumble trying to build Spark against Hadoop. Not anymore!!
With Hadoop inside of Spark this won't be a problem at all. I mean, there's
basically only one Hadoop version, right? So this should work for pretty much
everyone. Wait, hold on, is hadoop-0.20.2 the same as hadoop-2.2.0? I'm
assuming it's the same because they both have the same number of "2"s.
This patch will give users access to some of their favorite old configs
from Hadoop. Did you just figure out what
`dfs.namenode.path.based.cache.block.map.allocation.percent` was?! Now you can
use it in Spark!! Pining for your old friend
`mapreduce.map.skip.proc.count.autoincr`... fuggedabodit - we got ya!
I plan to contribute tests and docs in a subsequent patch. Please merge
this ASAP and include in Spark 0.9.1.
----