On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy <a...@yahoo-inc.com> wrote:
>> Since this could be applied as a linear set of patches instead of a big
>> lump, would there be interest in using this as the 0.20.>100 Apache
>> release?
>> I can take the time to remove any patches that are cloudera specific or
>> not yet applied upstream.
>>
>
> Interesting discussion, thanks.
>
> I'm sure it took you a fair amount of work to squash patches (which I tried
> too, btw).

Yep, I had a great summer ;-)

> That, plus the fact that we would need to do a similar amount of work for
> the 10 or so releases we have done after 0.20.100.3 scares me.

Sorry, I actually meant 0.20.104.3. Have there been many releases since
then? That's the last version available on the Yahoo github, and that's the
version we incorporated/linearized. If there is a large sequence of patches
after this that you're planning on including, it would be good to see them
in your git repo.

> As we Nigel and I discussed here, the jumbo patch and an up-to-date
> CHANGES.txt provides almost all of the benefits we seek and allows all of us
> to get this done very quickly to focus on hadoop-0.22 and beyond.

In my opinion here are the downsides to this plan:

- a mondo "merge" patch is a big pain when trying to do debugging. It may be
  sufficient for a user to look at CHANGES.txt, but I find myself using
  blame/log/etc on individual files to understand code lineage on a daily
  basis. If all of the merge shows up as a big patch it will be very
  difficult (at least the way I work with code) to help users debug issues
  or understand which JIRA a certain regression may have come from.
- CHANGES.txt traditionally doesn't reference which patch file from a JIRA
  was checked in. So we may know that a given JIRA has been included, but
  often there are several revisions of patches on the JIRA and it's
  difficult to be sure that we have the most up-to-date version. By looking
  at change history it's usually easy to pick this out, but if it's one
  giant patch apply, this isn't possible.
- the proposal to use the YDH distro certainly solves the Security issue,
  but doesn't help out HBase at all. Given HBase has been asking for a long
  time to get a real release of the append branch, I think it would be
  better to have one 20-based release which has both of these features,
  rather than further fragmenting the community into 0.20.2,
  0.20.2+security, 0.20.2+append.

I think the first two points could be addressed if you push your git tree
either to github or an apache-hosted git, and then include in SVN as a mondo
patch. It's not ideal, but at least when trying to debug issues and
understand the history of this branch there will be a publicly available
change history to reference.

To clarify my position a bit here - I definitely appreciate your
volunteering to do the work, and wouldn't *block* the proposal as you've put
it forth. I just think it will have limited utility for the community by
being opaque (if contributed as a giant patch) and by not including the sync
feature which is critical for a large segment of users. Given those
downsides I'd rather see the effort diverted towards making a killer 0.22
release that we can all jump on.

Thanks
-Todd

--
Todd Lipcon
Software Engineer, Cloudera