On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <omal...@apache.org> wrote:
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.

Indeed it creates confusion, but in my opinion it has been very successful
modulo that confusion. In particular, Kevin and I (who each have a repo on
github but basically co-maintain a branch) have done about 8 bugfix releases
of LZO in the last year. The ability to take a bug and turn it around into a
release within a few days has been very beneficial to the users. If it were
part of core Hadoop, people would be forced to live with these blocker bugs
for months at a time between dot releases.

IMO the more we can take non-core components and move them to separate
release timelines, the better. Yes, it is harder for users, but it is also
easier for them when they hit a bug - they don't have to wait months for a
wholesale upgrade which might contain hundreds of other changes to core
components.

I think this will also help the situation where people have set up shop on
branches -- a lot of the value of these branches comes from the frequency of
backports and bugfixes to "non-core" components. If the non-core stuff were
on a faster timeline upstream, we could maintain core stability while also
offering people the latest and greatest libraries, tools, codecs, etc.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera