Add Dumbo to contrib
--------------------
Key: HADOOP-4304
URL: https://issues.apache.org/jira/browse/HADOOP-4304
Project: Hadoop Core
Issue Type: New Feature
Reporter: Klaas Bosteels
Priority: Minor
Dumbo started out as a simple Python module developed at Last.fm to make
writing and running Hadoop Streaming programs easy. It now also includes some
(so far unreleased) Java helper code, although it can still be used without
it. We propose adding Dumbo to "src/contrib" so that the Java classes get
built and installed together with the rest of Hadoop, while the Python module
can be installed separately at will. A tar.gz of the directory that would have
to be added to "src/contrib" is available at
http://static.last.fm/dumbo/dumbo-contrib.tar.gz
and more info about Dumbo can be found here:
* Basic documentation: http://github.com/klbostee/dumbo/wikis
* Presentation at HUG (where it was first suggested to add Dumbo to contrib):
http://skillsmatter.com/podcast/home/dumbo-hadoop-streaming-made-elegant-and-easy
* Initial announcement:
http://blog.last.fm/2008/05/29/python-hadoop-flying-circus-elephant
For some of the more advanced features of Dumbo (in particular the ones for
which the Java classes are needed) there is no public documentation yet, but we
could easily fill that gap by moving some of the internal Last.fm documentation
to the Hadoop wiki.
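
For readers unfamiliar with Hadoop Streaming, here is a minimal sketch of the
kind of generator-style Python program that a Streaming helper is meant to
simplify. It is an illustration only: the word-count task, the function names,
and the in-process driver "run_local" are assumptions made for this example
and are not taken from Dumbo or from the tarball above.

```python
from collections import defaultdict


def mapper(key, value):
    # Emit (word, 1) for every whitespace-separated word in the input line.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum the counts collected for one word.
    yield key, sum(values)


def run_local(mapper, reducer, lines):
    # Hypothetical in-process stand-in for the map, shuffle and reduce
    # phases. This is NOT Dumbo's API; it only mimics the data flow.
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper(offset, line):
            grouped[k].append(v)
    results = []
    for k in sorted(grouped):
        results.extend(reducer(k, grouped[k]))
    return results


if __name__ == "__main__":
    for word, count in run_local(mapper, reducer, ["a b a", "b c"]):
        print(word, count)
```

On a real cluster the grouping and sorting step would be done by Hadoop
itself, with the mapper and reducer exchanging records over stdin/stdout.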