On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy <a...@yahoo-inc.com> wrote:

>> Since this could be applied as a linear set of patches instead of a big
>> lump, would there be interest in using this as the 0.20.>100 Apache
>> release?
>> I can take the time to remove any patches that are Cloudera-specific or
>> not yet applied upstream.
>>
> Interesting discussion, thanks.
>
> I'm sure it took you a fair amount of work to squash patches (which I tried
> too, btw).


Yep, I had a great summer ;-)


> That, plus the fact that we would need to do a similar amount of work for
> the 10 or so releases we have done after 0.20.100.3 scares me.
>

Sorry, I actually meant 0.20.104.3. Have there been many releases since
then? That's the last version available on the Yahoo github, and that's the
version we incorporated/linearized.

If there is a large sequence of patches after this that you're planning on
including, it would be good to see them in your git repo.



> As Nigel and I discussed here, the jumbo patch and an up-to-date
> CHANGES.txt provide almost all of the benefits we seek and allow all of us
> to get this done very quickly to focus on hadoop-0.22 and beyond.
>
>
In my opinion here are the downsides to this plan:

- a mondo "merge" patch is a big pain when trying to do debugging. It may be
sufficient for a user to look at CHANGES.txt, but I find myself using
blame/log/etc on individual files to understand code lineage on a daily
basis. If all of the merge shows up as a big patch it will be very difficult
(at least the way I work with code) to help users debug issues or understand
which JIRA a certain regression may have come from.

- CHANGES.txt traditionally doesn't reference which patch file from a JIRA
was checked in. So we may know that a given JIRA has been included, but
often there are several revisions of patches on the JIRA and it's difficult
to be sure that we have the most up-to-date version. By looking at the change
history it's usually easy to pick this out, but if everything is applied as
one giant patch, this isn't possible.

- the proposal to use the YDH distro certainly solves the Security issue,
but doesn't help out HBase at all. Given HBase has been asking for a long
time to get a real release of the append branch, I think it would be better
to have one 20-based release which has both of these features, rather than
further fragmenting the community into 0.20.2, 0.20.2+security,
0.20.2+append.

I think the first two points could be addressed if you push your git tree
either to github or an apache-hosted git, and then include it in SVN as a
mondo patch. It's not ideal, but at least when trying to debug issues and
understand the history of this branch there will be a publicly available
change history to reference.

To clarify my position a bit here - I definitely appreciate your
volunteering to do the work, and wouldn't *block* the proposal as you've put
it forth. I just think it will have limited utility for the community by
being opaque (if contributed as a giant patch) and by not including the sync
feature which is critical for a large segment of users. Given those
downsides I'd rather see the effort diverted towards making a killer 0.22
release that we can all jump on.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
