Re: CDH2 or Apache Hadoop - Official Debian packages
Allen,

> For all intents and purposes, the Debian package sounds just like a re-packaging of the Apache distribution in .deb form.

You're perfectly right. Most Debian packages are just a re-packaging of the upstream projects, but with additional management information and logic to ease the installation and make them work well on the platform and together with other programs. It's the beautiful world of package management:

    apt-get install hadoop
    less /usr/share/doc/hadoop/README
    ... Have fun with hadoop

>> - no version namespace, everything is called just hadoop, not hadoop-0.18 or hadoop-0.20 as in the cloudera package
> ... and thus making upgrades really hard and not suitable for anything real.

Actually my hope is in the plan of hadoop to once establish a stable API (as planned) so that an upgrade will be backwards compatible. As long as that isn't the case, the Debian package is intended for only three audiences:

- People who are willing to deal with any upgrade hassles for the benefit of an official Debian package
- People who'd like to try out and learn hadoop with an easily installable package
- Me

That said, I'm going to use the Debian package on a tiny production cluster of 5 machines.

Thomas Koch, http://www.koch.ro
Re: CDH2 or Apache Hadoop - Official Debian packages
On Feb 25, 2010, at 10:20 AM, Allen Wittenauer wrote:

>> Actually my hope is in the plan of hadoop to once establish a stable API (as planned) so that an upgrade will be backwards compatible.
> History shows you are in for a long wait.

I hope not, and I'm trying to make sure that isn't true. At this point, we have a lot of customers inside Yahoo who yell at our SVP when anyone breaks API compatibility with the previous release.

My hope is to get to the point where we do one major release a year and each major release is backwards compatible with the previous major release (as in, you don't need to recompile your code). Bonus points if we can get a minor release out at the half-year point. And of course bug fix releases as needed...

-- Owen
Re: CDH2 or Apache Hadoop - Official Debian packages
On 2/25/10 8:39 AM, Thomas Koch tho...@koch.ro wrote:

>>> - no version namespace, everything is called just hadoop, not hadoop-0.18 or hadoop-0.20 as in the cloudera package
>> ... and thus making upgrades really hard and not suitable for anything real.
> Actually my hope is in the plan of hadoop to once establish a stable API (as planned) so that an upgrade will be backwards compatible.

History shows you are in for a long wait.

It is also worth pointing out that API compat is only part of the issue. Without ABI compat, it is still a very rough road. [A point lost on way too many in the Hadoop community; too many devs, not enough ops.]
Re: CDH2 or Apache Hadoop - Official Debian packages
Ananth,

> Just wanted to get the group's general feelings on what the preferred distro is and why? Obviously assuming one didn't have a service agreement with cloudera.

There'll shortly be a third alternative: the Debian package of hadoop is in the Debian new queue [1] and will hopefully pass it in a couple of days to enter Debian unstable. A preview is available from the unofficial repository of the Debian-Java Team. [2][3]

The Debian package took the cloudera packaging as its model, with some slight changes:

- no version namespace, everything is called just hadoop, not hadoop-0.18 or hadoop-0.20 as in the cloudera package
- some contributions are missing due to lack of manpower or missing dependencies in Debian
- the native C++ hadoop code is not in the package due to lack of manpower

The advantage of the Debian packages is a more standards-conformant integration into Debian.

[1] http://ftp-master.debian.org/new.html
[2] put this in /etc/apt/sources.list:
    deb http://pkg-java.alioth.debian.org unstable/all/
[3] http://wiki.debian.org/Teams/JavaPackaging

Best regards,
Thomas Koch, http://www.koch.ro
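Taken together, footnote [2] and the package name amount to a short setup procedure. The following is only a rough sketch of it, not something taken from the thread: the helper name add_preview_repo and the file name hadoop-preview.list are hypothetical, the message itself suggests editing /etc/apt/sources.list directly, and nothing here establishes that the preview repository is still being served.

```shell
#!/bin/sh
# Sketch only: register the preview repository quoted in footnote [2].
# The repository line is copied verbatim from the message.
REPO_LINE='deb http://pkg-java.alioth.debian.org unstable/all/'

# Hypothetical helper: append the repository line to the sources file
# given as $1, unless that exact line is already present.
add_preview_repo() {
    list_file="$1"
    touch "$list_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    grep -qxF "$REPO_LINE" "$list_file" || printf '%s\n' "$REPO_LINE" >> "$list_file"
}

# Intended use, as root on a Debian system (file name is an assumption):
#   add_preview_repo /etc/apt/sources.list.d/hadoop-preview.list
#   apt-get update
#   apt-get install hadoop
```

Keeping the line in its own file under /etc/apt/sources.list.d/ rather than in the main sources.list makes the preview repository easy to remove once the package reaches Debian unstable.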
Re: CDH2 or Apache Hadoop - Official Debian packages
On 2/24/10 4:45 AM, Thomas Koch tho...@koch.ro wrote:

> There'll shortly be a third alternative:

There are already three:

- Apache
- Cloudera
- Yahoo!

with several others in development.

For all intents and purposes, the Debian package sounds just like a re-packaging of the Apache distribution in .deb form.

> - no version namespace, everything is called just hadoop, not hadoop-0.18 or hadoop-0.20 as in the cloudera package

... and thus making upgrades really hard and not suitable for anything real.