Re: CDH2 or Apache Hadoop - Official Debian packages

2010-02-25 Thread Thomas Koch
Allen, 
 For all intents and purposes, the Debian package sounds just like a
 re-packaging of the Apache distribution in .deb form.
You're perfectly right. Most Debian packages are just a re-packaging of the 
upstream projects, but with additional management information and logic to 
ease the installation and make them work well on the platform and together 
with other programs.
It's the beautiful world of package management:
apt-get install hadoop
less /usr/share/doc/hadoop/README
... Have fun with hadoop

  - no version namespace, everything is called just hadoop, not
  hadoop-0.18 or hadoop-0.20 as in the cloudera package
 
 ... and thus making upgrades really hard and not suitable for anything
 real.
Actually my hope is in the plan of hadoop to once establish a stable API (as 
planned) so that an upgrade will be backwards compatible.
As long as that isn't the case, the Debian package is intended only for three 
audiences:
- People who are willing to deal with any upgrade hassles for the benefit of 
an official Debian package
- People who'd like to try out and learn hadoop with an easily installable 
package
- Me

That said, I'm going to use the Debian package on a tiny production cluster of 
5 machines.
 
Thomas Koch, http://www.koch.ro


Re: CDH2 or Apache Hadoop - Official Debian packages

2010-02-25 Thread Owen O'Malley


On Feb 25, 2010, at 10:20 AM, Allen Wittenauer wrote:

Actually my hope is in the plan of hadoop to once establish a stable API (as
planned) so that an upgrade will be backwards compatible.


History shows you are in for a long wait.


I hope not and I'm trying to make sure that isn't true. At this point,  
we have a lot of customers inside Yahoo who yell at our SVP when  
anyone breaks API compatibility with the previous release.


My hope is to get to the point where we do one major release a year and
each major release is backwards compatible with the previous major
release (as in you don't need to recompile your code). Bonus points if
we can get a minor release out at the half-year point. And of course
bug fix releases as needed...


-- Owen


Re: CDH2 or Apache Hadoop - Official Debian packages

2010-02-25 Thread Allen Wittenauer



On 2/25/10 8:39 AM, Thomas Koch tho...@koch.ro wrote:
 - no version namespace, everything is called just hadoop, not
 hadoop-0.18 or hadoop-0.20 as in the cloudera package
 
 ... and thus making upgrades really hard and not suitable for anything
 real.
 Actually my hope is in the plan of hadoop to once establish a stable API (as
 planned) so that an upgrade will be backwards compatible.

History shows you are in for a long wait.

It is also worth pointing out that API compat is only part of the issue.
Without ABI compat, it is still a very rough road.  [A point lost on way too
many in the Hadoop community; too many devs, not enough ops.]
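
To make the API versus ABI distinction concrete for Java code, here is a
minimal sketch. RecordWriterV1 and RecordWriterV2 are hypothetical classes
standing in for two releases of a library; they are not Hadoop classes. The
exact-signature lookup via reflection is only an approximation of how the JVM
links already-compiled callers. It shows a change that keeps callers
compiling unmodified but would still break jars built against the older
release:

```java
import java.lang.reflect.Method;

public class AbiCompatSketch {
    // Hypothetical "release 1" of a library class: write takes an int.
    public static class RecordWriterV1 {
        public String write(int value) { return "int:" + value; }
    }

    // Hypothetical "release 2": the parameter type was widened to long.
    public static class RecordWriterV2 {
        public String write(long value) { return "long:" + value; }
    }

    // Approximates binary linking: compiled callers refer to a method by its
    // exact name and parameter types, so the lookup does no widening.
    static boolean hasExactMethod(Class<?> type, String name, Class<?>... params) {
        try {
            Method m = type.getMethod(name, params);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Source (API) compatibility: the same call compiles against both
        // releases, because the compiler widens the int argument to long.
        System.out.println(new RecordWriterV1().write(5));
        System.out.println(new RecordWriterV2().write(5));

        // Binary (ABI) compatibility: a caller compiled against release 1
        // recorded write(int), which release 2 no longer provides.
        System.out.println(hasExactMethod(RecordWriterV1.class, "write", int.class));
        System.out.println(hasExactMethod(RecordWriterV2.class, "write", int.class));
    }
}
```

In this sketch the last lookup fails, which corresponds to a
NoSuchMethodError at run time for code that was not recompiled against the
newer release.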




Re: CDH2 or Apache Hadoop - Official Debian packages

2010-02-24 Thread Thomas Koch
Ananth,
 Just wanted to get the groups general feelings on what the preferred distro
 is and why? Obviously assuming one didn't have a service agreement with
 cloudera.
There'll shortly be a third alternative: The debian package of hadoop is in 
the Debian new queue[1] and will hopefully pass it in a couple of days to 
enter debian unstable. A preview is available from the unofficial repository 
of the Debian-Java Team.[2][3]
The Debian package took the cloudera packaging as model, with some slight 
changes:

- no version namespace, everything is called just hadoop, not hadoop-0.18 
or hadoop-0.20 as in the cloudera package

- some contributions are missing due to lack of manpower or missing 
dependencies in Debian

- the native C++ hadoop code is not in the package due to lack of manpower

The advantage of the Debian packages is a more standards-conformant 
integration into Debian.

[1] http://ftp-master.debian.org/new.html
[2] put this in /etc/apt/sources.list:
deb http://pkg-java.alioth.debian.org unstable/all/
[3] http://wiki.debian.org/Teams/JavaPackaging

Best regards,

Thomas Koch, http://www.koch.ro


Re: CDH2 or Apache Hadoop - Official Debian packages

2010-02-24 Thread Allen Wittenauer



On 2/24/10 4:45 AM, Thomas Koch tho...@koch.ro wrote:
 There'll shortly be a third alternative:

There are already three:

- Apache
- Cloudera
- Yahoo!

with several others in development.

For all intents and purposes, the Debian package sounds just like a
re-packaging of the Apache distribution in .deb form.


 - no version namespace, everything is called just hadoop, not hadoop-0.18
 or hadoop-0.20 as in the cloudera package

... and thus making upgrades really hard and not suitable for anything
real.