On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <omal...@apache.org> wrote:

>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never do the work to figure out
> which fork and branch of hadoop-gpl-compression works with the version of
> Hadoop they installed.
>
>
Indeed it creates confusion, but in my opinion the out-of-tree approach has
been very successful modulo that confusion.

In particular, Kevin and I (who each have a repo on GitHub but basically
co-maintain a single branch) have done about 8 bugfix releases of LZO in the
last year. The ability to take a bug report and turn it around into a release
within a few days has been very beneficial to users. If the codec were part of
core Hadoop, people would be forced to live with these blocker bugs for months
at a time between dot releases.
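
As a concrete illustration (just a minimal sketch, not code from any of the
forks): what makes the independent release cadence workable is that Hadoop
only couples to the CompressionCodec interface and loads codec classes by
name from configuration, so a new hadoop-lzo jar can be dropped in without
rebuilding core. Something like the following, assuming a jar providing
com.hadoop.compression.lzo.LzopCodec is on the classpath and listed in
io.compression.codecs (the class/file names here are only for illustration):

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class LzoReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The out-of-tree codec is wired in purely through configuration;
    // core Hadoop never needs to know about the LZO classes at build time.
    conf.set("io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec,"
        + "com.hadoop.compression.lzo.LzopCodec");

    Path input = new Path(args[0]);  // e.g. an .lzo file on HDFS
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(input);  // matched by .lzo suffix

    FileSystem fs = input.getFileSystem(conf);
    InputStream in = codec.createInputStream(fs.open(input));
    // ... read decompressed bytes from 'in' ...
    System.out.println("Opened " + input + " with " + codec.getClass().getName());
    in.close();
  }
}

Since the only contract is that interface, upgrading the codec is a matter of
swapping the jar, which is why a bugfix can ship in days rather than waiting
on a core release.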

IMO, the more we can take non-core components and move them to separate
release timelines, the better. Yes, it is harder for users, but it is also
easier for them when they hit a bug -- they don't have to wait months for a
wholesale upgrade that might contain hundreds of other changes to core
components. I think this will also help the situation where people have set
up shop on branches -- a lot of the value of those branches comes from the
frequency of backports and bugfixes to "non-core" components. If the
non-core stuff were on a faster release timeline upstream, we could maintain
core stability while also offering people the latest and greatest libraries,
tools, codecs, etc.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
