Re: JIRAs post-"unsplit"

2011-06-13 Thread Todd Lipcon
On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik  wrote:

> I tend to agree: JIRA separation was the benefit of the split.
>
> I'd rather keep the current JIRA split in effect (e.g. separate JIRA
> projects
> for separate Hadoop components; don't recombine them) and file patches in
> the
> same way (for common, hdfs, mapreduce). If a cross component patch is
> needed
> then HADOOP project JIRA can be used for tracking, patches, etc.
>

Yea, perhaps we just need the QA bot to be smart enough that it could handle
a cross-project patch attached to HADOOP? Maybe we do something crazy and
make a new HADOOPCROSS jira for patches that affect multiple projects? (just
brainstorming here...)


> Tree-based watch-list seems like a great idea, but won't it narrow the
> scope
> somehow? Are you saying that if I am interested in say
> hdfs/src/c++/libhdfs,
> but a JIRA is open which affects libhdfs and something else (e.g. NameNode)
> I
> will still get the notification?
>

Right, that's the idea. You'd be added as a watcher (and get notified) for
any patch that touches the area you care about, regardless of whether it
also touches some other areas.
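
To make the idea concrete, here is a minimal sketch of the matching logic
such a bot might use, assuming it has already pulled the changed file paths
out of an attached patch (e.g. from its +++ headers). All names below are
made up:
-- 8< --
import fnmatch

# Hypothetical watch lists (user -> subtree patterns); illustrative only.
WATCH_LISTS = {
    "cos":  ["hdfs/src/c++/libhdfs/*"],
    "todd": ["mapreduce/*"],
}

def watchers_for(changed_paths):
    """Users whose watch list matches ANY changed path, so a patch touching
    libhdfs plus the NameNode still notifies the libhdfs watcher."""
    hits = set()
    for user, patterns in WATCH_LISTS.items():
        if any(fnmatch.fnmatch(path, pat)
               for path in changed_paths for pat in patterns):
            hits.add(user)
    return hits

# A patch spanning libhdfs and the NameNode still notifies the libhdfs watcher:
print(watchers_for([
    "hdfs/src/c++/libhdfs/hdfs.c",
    "hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java",
]))  # -> {'cos'}
-- 8< --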

-Todd

On Mon, Jun 13, 2011 at 11:28AM, Todd Lipcon wrote:
> > After the "project unsplit" this weekend, we're now back to a place where
> we
> > have a single SVN/git tree that encompasses all of the subprojects. This
> > opens up the next question: should we merge the JIRAs and allow a single
> > issue to have a patch which spans projects?
> >
> > My thoughts are:
> > - the biggest pain point with the project split is dealing with
> > cross-project patches
> > - one of the biggest reasons we did the project split was that the
> combined
> > traffic from the HADOOP JIRA was hard to follow for people who really
> care
> > about certain subprojects.
> > - the jira split is a coarse-grained way of allowing people to watch just
> > the sub-areas they care about.
> >
> > So, I was thinking the following... what if there were a way to watch
> JIRAs
> > based on subtrees? I'm imagining a web page where any community user
> could
> > have an account and manage a "watch list" of subtrees. If you want to
> watch
> > all MR jiras, you could simply watch mapreduce/*. If you care only about
> > libhdfs, you could watch hdfs/src/c++/libhdfs, etc. Then a bot would
> watch
> > all patches attached to JIRA, and any time a patch is uploaded that
> touches
> > something on your watch list, it automatically adds you as a watcher on
> the
> > ticket and sends you a notification via email. It would also be easy to
> set
> > up a watch based on patch size, for example.
> >
> > I think even if we don't recombine the JIRAs, this might be a handy way
> to
> > cut down on mailing list traffic for contributors who have a more narrow
> > focus on certain areas of the code.
> >
> > Does this sound useful? I don't know if/when I'd have time to build such
> a
> > thing, but if the community thinks it would be really helpful, I might
> > become inspired.
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HADOOP-7106 (project unsplit) this weekend

2011-06-13 Thread Todd Lipcon
On Sun, Jun 12, 2011 at 4:38 PM, Todd Lipcon  wrote:

> If you want to be able to have git follow renames all the way through the
> project split back to the beginning of time, put the following in
> hadoop-common/.git/info/grafts:
> 5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c
> 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c
> 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c
>
>
ATM has pointed out that the above contents line-wrapped in some people's
email inboxes. I've put the correct information on the GitAndHadoop wiki
page here:
http://wiki.apache.org/hadoop/GitAndHadoop

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: JIRAs post-"unsplit"

2011-06-13 Thread Todd Lipcon
On Mon, Jun 13, 2011 at 4:54 PM, Konstantin Boudnik  wrote:

> On Mon, Jun 13, 2011 at 01:37PM, Todd Lipcon wrote:
> > On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik 
> wrote:
> >
> > > I tend to agree: JIRA separation was the benefit of the split.
> > >
> > > I'd rather keep the current JIRA split in effect (e.g. separate JIRA
> > > projects
> > > for separate Hadoop components; don't recombine them) and file patches
> in
> > > the
> > > same way (for common, hdfs, mapreduce). If a cross component patch is
> > > needed
> > > then HADOOP project JIRA can be used for tracking, patches, etc.
> > >
> >
> > Yea, perhaps we just need the QA bot to be smart enough that it could
> handle
> > a cross-project patch attached to HADOOP? Maybe we do something crazy and
> > make a new HADOOPCROSS jira for patches that affect multiple projects?
> (just
> > brainstorming here...)
>
> Correct me if I'm wrong but in the new structure cross-component patch
> differs
> from a component one by a patch level (i.e. p0 vs p1 if looked from
> common/trunk), right? I guess the bot can be hacked to use this distinction
> thus saving us an extra JIRA project which will merely serve the purpose of
> meta-project.
>
>
Yes, I am about to commit HADOOP-7384 which can at least deal with patches
relative to either trunk/ or a subproject directory. But it will also
detect a cross-project patch and barf.

It could certainly be extended to apply and test a cross-project patch,
though it would be substantially more work.
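
Cos's patch-level observation suggests a simple way to detect how a patch
was generated: dry-run it at each level and see where it applies cleanly. A
toy sketch of that idea (the in-tree scripts are shell; this Python version
and its names are illustrative only):
-- 8< --
import subprocess

def patch_level(patch_file, tree="."):
    """Dry-run the patch at -p0, then -p1, from the tree root; the level at
    which it applies cleanly tells us how the patch was generated."""
    for level in ("-p0", "-p1"):
        rc = subprocess.call(
            ["patch", level, "--dry-run", "-f", "-i", patch_file],
            cwd=tree, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if rc == 0:
            return level
    raise ValueError("applies at neither -p0 nor -p1: " + patch_file)
-- 8< --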

The advantage of a separate HADOOPX jira would be to allow people to notice
cross-project patches. For example, a dev who primarily works on HDFS may
not subscribe to mapreduce-dev or mapreduce-issues, but if an MR issue is
going to modify something in the HDFS codebase, he or she will certainly
want to be aware of it.

-Todd


> > > Tree-based watch-list seems like a great idea, but won't it narrow the
> > > scope
> > > somehow? Are you saying that if I am interested in say
> > > hdfs/src/c++/libhdfs,
> > > but a JIRA is open which affects libhdfs and something else (e.g.
> NameNode)
> > > I
> > > will still get the notification?
> > >
> >
> > Right, that's the idea. You'd be added as a watcher (and get notified)
> for
> > any patch that touches the area you care about, regardless of whether it
> > also touches some other areas.
> >
> > -Todd
> >
> > On Mon, Jun 13, 2011 at 11:28AM, Todd Lipcon wrote:
> > > > After the "project unsplit" this weekend, we're now back to a place
> where
> > > we
> > > > have a single SVN/git tree that encompasses all of the subprojects.
> This
> > > > opens up the next question: should we merge the JIRAs and allow a
> single
> > > > issue to have a patch which spans projects?
> > > >
> > > > My thoughts are:
> > > > - the biggest pain point with the project split is dealing with
> > > > cross-project patches
> > > > - one of the biggest reasons we did the project split was that the
> > > combined
> > > > traffic from the HADOOP JIRA was hard to follow for people who really
> > > care
> > > > about certain subprojects.
> > > > - the jira split is a coarse-grained way of allowing people to watch
> just
> > > > the sub-areas they care about.
> > > >
> > > > So, I was thinking the following... what if there were a way to watch
> > > JIRAs
> > > > based on subtrees? I'm imagining a web page where any community user
> > > could
> > > > have an account and manage a "watch list" of subtrees. If you want to
> > > watch
> > > > all MR jiras, you could simply watch mapreduce/*. If you care only
> about
> > > > libhdfs, you could watch hdfs/src/c++/libhdfs, etc. Then a bot would
> > > watch
> > > > all patches attached to JIRA, and any time a patch is uploaded that
> > > touches
> > > > something on your watch list, it automatically adds you as a watcher
> on
> > > the
> > > > ticket and sends you a notification via email. It would also be easy
> to
> > > set
> > > > up a watch based on patch size, for example.
> > > >
> > > > I think even if we don't recombine the JIRAs, this might be a handy
> way
> > > to
> > > > cut down on mailing list traffic for contributors who have a more
> narrow
> > > > focus on certain areas of the code.
> > > >
> > > > Does this sound useful? I don't know if/when I'd have time to build
> such
> > > a
> > > > thing, but if the community thinks it would be really helpful, I
> might
> > > > become inspired.
> > > >
> > > > -Todd
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HADOOP-7106 (project unsplit) this weekend

2011-06-13 Thread Todd Lipcon
On Mon, Jun 13, 2011 at 11:42 AM, Tsz Wo (Nicholas), Sze <
s29752-hadoopgene...@yahoo.com> wrote:

> A few minor problems:
> (1) I had committed MAPREDUCE-2588; however, a commit email was sent to
> both
> common-commits@ and mapreduce-commits@.
>

I put up a patch to fix this on HADOOP-7106. Unfortunately I think Doug and
Ian are the only ones who can commit it, and they're both traveling at the
moment. So we may have to endure dup emails for a day or two. Sorry for the
mailbox noise.

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: JIRAs post-"unsplit"

2011-06-14 Thread Todd Lipcon
On Tue, Jun 14, 2011 at 9:35 AM, Rottinghuis, Joep wrote:

> Project un-split definitely simplifies things.
>
> Todd, if people add a watch based on patches, would they not miss
> notifications for those entries in an earlier phase of their lifecycle?
> For example when issues are just reported, discussed and assigned, but no
> patch has been attached yet?
>
>
Another thought that Alejandro just suggested offline is to use JIRA
components rather than just the file paths. So, assuming there is a bot that
watches the JIRA, it would be easy enough to allow you to permawatch a
component (JIRA itself doesn't give this option).

Then, assuming the issue is assigned the right components, it will be seen
early on by the people who care. If it isn't given the right components, it
will still be seen once you upload a patch.



> A separate HADOOPX Jira project would eliminate such issues.
>
> It does raise another question though: What happens if an issue starts out
> in one area, and then turns out to require changes in other areas?
> Would one then first create a HADOOP-x, a HDFS-y, or MAPREDUCE-z and then
> when it turns out other components are involved a new HADOOPX- referring to
> such earlier Jira?
>
> Cheers,
>
> Joep
>
> 
> From: Todd Lipcon [t...@cloudera.com]
> Sent: Monday, June 13, 2011 1:37 PM
> To: general@hadoop.apache.org
> Subject: Re: JIRAs post-"unsplit"
>
> On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik 
> wrote:
>
> > I tend to agree: JIRA separation was the benefit of the split.
> >
> > I'd rather keep the current JIRA split in effect (e.g. separate JIRA
> > projects
> > for separate Hadoop components; don't recombine them) and file patches in
> > the
> > same way (for common, hdfs, mapreduce). If a cross component patch is
> > needed
> > then HADOOP project JIRA can be used for tracking, patches, etc.
> >
>
> Yea, perhaps we just need the QA bot to be smart enough that it could
> handle
> a cross-project patch attached to HADOOP? Maybe we do something crazy and
> make a new HADOOPCROSS jira for patches that affect multiple projects?
> (just
> brainstorming here...)
>
>
> > Tree-based watch-list seems like a great idea, but won't it narrow the
> > scope
> > somehow? Are you saying that if I am interested in say
> > hdfs/src/c++/libhdfs,
> > but a JIRA is open which affects libhdfs and something else (e.g.
> NameNode)
> > I
> > will still get the notification?
> >
>
> Right, that's the idea. You'd be added as a watcher (and get notified) for
> any patch that touches the area you care about, regardless of whether it
> also touches some other areas.
>
> -Todd
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Powered by Logo

2011-06-14 Thread Todd Lipcon
Who is allowed to vote in this? Committers? PMC? Everyone?

My vote: 5, 2, 6, 3, 1, 4

On Tue, Jun 14, 2011 at 8:19 PM, Owen O'Malley  wrote:

> All,
>   We've had a wide range of entries for a powered by logo. I've put them
> all on a page, here:
>
> http://people.apache.org/~omalley/hadoop-powered-by/
>
> Since there are a lot of contenders and we only want a single round of
> voting, let's use single transferable vote ( STV
> http://en.wikipedia.org/wiki/Single_transferable_vote). The important
> thing is to pick the images *IN ORDER* that you would like them.
>
> My vote (in order of course): 4, 1, 2, 3, 5, 6.
>
> In other words, I want option 4 most and option 6 least. With STV, you
> don't need to worry about voting for an unpopular choice since your vote
> will automatically roll over to your next choice.
>
> -- Owen
>
>
>
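
For anyone unfamiliar with how the roll-over works: with a single winner,
STV reduces to instant-runoff counting. A toy illustration (not the official
tally procedure; tie-breaking here is naive):
-- 8< --
from collections import Counter

def irv_winner(ballots):
    """Repeatedly eliminate the option with the fewest first-choice votes;
    each affected ballot rolls over to its next surviving choice."""
    ballots = [list(b) for b in ballots]
    while True:
        firsts = Counter(b[0] for b in ballots if b)
        top, votes = firsts.most_common(1)[0]
        if 2 * votes > sum(firsts.values()):
            return top                       # an option has a majority
        loser = min(firsts, key=firsts.get)  # naive tie-breaking
        ballots = [[c for c in b if c != loser] for b in ballots]

# Two of the three ballots below rank option 4 first, so 4 wins outright:
print(irv_winner([[5, 2, 6, 3, 1, 4], [4, 1, 2, 3, 5, 6], [4, 5, 1, 2, 3, 6]]))
-- 8< --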


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Shall we adopt the "Defining Hadoop" page

2011-06-15 Thread Todd Lipcon
On Wed, Jun 15, 2011 at 7:19 PM, Craig L Russell
wrote:

> There's no ambiguity. Either you ship the bits that the Apache PMC has
> voted on as a release, or you change it (one bit) and it is no longer what
> the PMC has voted on. It's a derived work.
>
> The rules for voting in Apache require that if you change a bit in an
> artifact, you can no longer count votes for the previous artifact. Because
> the new work is different. A new vote is required.
>

Sorry, but this is just silly. Are you telling me that the httpd package in
Ubuntu isn't Apache httpd? It has 43 patches applied. Tomcat6 has 17. I'm
sure every other commonly used piece of software bundled with ubuntu has
been patched, too. I don't see them calling their packages "Ubuntu HTTP
server powered by Apache HTTPD". It's just httpd.

The httpd in RHEL 5 is the same way. In fact they even provide some nice
metadata in their patches, for example:
httpd-2.0.48-release.patch:Upstream-Status: vendor-specific change
httpd-2.1.10-apctl.patch:Upstream-Status: Vendor-specific changes for better initscript integration

To me, this is a good thing: allowing vendors to redistribute the software
with some modifications makes it much more accessible to users and
businesses alike, and that's part of why Hadoop has had so much success. So
long as we require the vendors to upstream those modifications back to the
ASF, we get the benefits of these contributions back in the community and
everyone should be happy.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HADOOP-7106 (project unsplit) this weekend

2011-06-16 Thread Todd Lipcon
On Thu, Jun 16, 2011 at 2:27 AM, Ranjit Mathew  wrote:

> On 06/13/2011 09:33 PM, Todd Lipcon wrote:
>
>>> Oops, sorry about that one. I will take care of that in about 30 minutes
>>> (just headed out the door now to catch a train). If someone else with
>>> commit access wants to, you just need to propset the externals to point
>>> to the new common/trunk/common/src/test/bin instead of the old location.
>>
>> Fixed the svn:externals
>
> Does it make sense for "test-patch.sh" and "smart-apply-patch.sh" to
> remain "external" now that they're within the same project?
>

Probably not long-term... but right now the various subproject hudson builds
only check out the hdfs/ or mapreduce/ subtree, so rooting the test scripts
above that level would require a bit of reconfiguration.


>
> Currently on every "svn up" for trunk I get:
> -- 8< --
> $ svn up
>
> Fetching external item into 'hdfs/src/test/bin'
> External at revision 1136333.
>
>
> Fetching external item into 'mapreduce/src/test/bin'
> External at revision 1136333.
>
> At revision 1136333.
> -- 8< --
>
> after a little pause.
>
> Ranjit
>
> PS: I'm using "svn, version 1.6.16 (r1073529)" on Fedora 14/x86-32.
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Thinking about the next hadoop mainline release

2011-06-17 Thread Todd Lipcon
On Fri, Jun 17, 2011 at 7:30 AM, Brian Bockelman  wrote:
>
> Hi Ryan, Eric,
>
> Just looked at those two for the first time in awhile.
> - HDFS-918 (now 1323?) doesn't seem like it's too controversial, but does 
> seem like there's a bit of validation left.

Yes, 1323 and also 1148 would be "nice to haves", but neither is ready
to go yet. Though I really want to improve HBase performance, I also
tend to be fairly conservative on how much testing these things should
need before getting checked in (unless they can be completely
pluggable).

The good news is we did get 941 in last week, and that's a real nice
improvement.

> - HDFS-347 has a long, contentious history.  However, it seems that most of 
> the strong objections have been cleared up.  Is there anyone left who objects 
> to it, now that it doesn't appear to bypass security?

It still has a way to go to be pushed over the finish line. I don't
foresee it happening for this release.

> Finally, I see Todd has posted HDFS-2080 claiming some sizable performance 
> improvements.  Would it be possible that could finish in time for release?

HDFS-2080 has very good bang-for-the-buck in the gains-per-complexity
ratio, especially compared to 347. It could also be made completely
pluggable, since it's just a new implementation of BlockReader. So it
might be feasible to include but not enabled by default.
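
"Pluggable but off by default" usually amounts to a config-gated factory
behind a common interface. A generic sketch of that shape, with made-up
names rather than HDFS's actual classes or config keys:
-- 8< --
class DefaultBlockReader:
    def read(self, block):
        return "read %s the usual way" % block

class ExperimentalBlockReader:          # stand-in for the faster path
    def read(self, block):
        return "read %s via the experimental path" % block

def create_reader(conf):
    # The experimental implementation is used only when explicitly enabled,
    # so shipping the code does not change a release's default behavior.
    if conf.get("reader.experimental.enabled", False):
        return ExperimentalBlockReader()
    return DefaultBlockReader()

print(create_reader({}).read("blk_1"))  # defaults to the usual path
-- 8< --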

But, I wouldn't block the 0.23 (or any other) release on including
these things. If they're done and look low-risk at an early enough
date, I'll do my best to convince the RM to include them, but if they
haven't had enough testing, then off to the next release with em.

-Todd

>
> On Jun 17, 2011, at 2:36 AM, Ryan Rawson wrote:
>
> > HDFS-918 and HDFS-347 are absolutely critical for random read
> > performance.  The smarter sites are already running HDFS-347 (I guess
> > they aren't running "Hadoop" then?), and soon they will be testing and
> > running HDFS-918 as well.  Opening 1 socket for every read just isn't
> > really scalable.
> >
> > -ryan
> >
> > On Fri, Jun 17, 2011 at 12:17 AM, Eric Baldeschwieler
> >  wrote:
> >> Hi Folks,
> >>
> >> I'd like to start a conversation on mainline planning and the next release 
> >> of Apache Hadoop beyond 0.22.
> >>
> >> The Yahoo! Hadoop team has been working hard to complete several big 
> >> Hadoop projects, including:
> >>
> >> - HDFS Federation [HDFS-1052]
> >>  - Already merged into trunk
> >>
> >> - Next Generation Map-Reduce [MR-279]
> >>  - Passing most tests now and discussing merging into trunk
> >>
> >> - The merging of our previous work on Hadoop with security into mainline 
> >> [http://yhoo.it/i9Ww8W]
> >>  - This is mostly done, but owen and others are doing a scrub to close out 
> >> the remaining issues
> >>
> >> All of these projects are now reaching a place where we would like to 
> >> combine them with the good work already in 0.22 and put out a new apache 
> >> release, perhaps 0.23.  We think the best way to accomplish that is to 
> >> finish the merge in the next few weeks and then cut a release from trunk.
> >>
> >> Yahoo stands ready to help us (the Apache Hadoop Community) turn this new 
> >> release into a stable release by running it through its 9-month test and
> >> burn-in process.  The result of that will be another stable release such
> >> as 0.18, 0.20 or 0.20.203 (hadoop with security).  We have Yahoo!'s support
> >> for this substantial investment because this new release will have a great 
> >> combination of new features for small and very large sites alike:
> >>  - New Write Pipeline - HBase support [also in 0.21 & 0.22]
> >>  - Federation - Scale up to larger clusters and the ability to experiment 
> >> with new namenode approaches
> >>  - Next Gen MapReduce - Scaleup, performance improvements, ability to 
> >> experiment with new processing frameworks
> >>
> >> I think this effort will produce a great new Apache Hadoop release for the 
> >> community.  I'm starting this thread to collect feedback and hopefully 
> >> folks' endorsement for merging in MR-279 and putting together this new 
> >> release.  Feedback please?
> >>
> >> Thanks,
> >>
> >> E14
> >>
> >>
>



--
Todd Lipcon
Software Engineer, Cloudera


Re: Thinking about the next hadoop mainline release

2011-06-17 Thread Todd Lipcon
On Fri, Jun 17, 2011 at 7:15 AM, Arun C Murthy  wrote:
> I volunteer to be the RM for the release since I've been leading the NG MR 
> effort.
>
> Are folks ok with this?

+1. It would be an honor to fix bugs for you, Arun.

-Todd


> Sent from my iPhone
>
> On Jun 17, 2011, at 1:45 PM, "Ted Dunning"  wrote:
>
>> NG map reduce is a huge deal both in terms of making things better for
>> users, but also in terms of unblocking the Hadoop development process.
>>
>> On Fri, Jun 17, 2011 at 9:36 AM, Ryan Rawson  wrote:
>>
>>>> - Next Generation Map-Reduce [MR-279]
>>>> - Passing most tests now and discussing merging into trunk
>>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Thinking about the next hadoop mainline release

2011-06-24 Thread Todd Lipcon
On Fri, Jun 24, 2011 at 5:28 PM, Arun C Murthy  wrote:

> Thanks Suresh!
>
> Todd - I'd appreciate if you could help on some of the HBase/Performance
> jiras... thanks!
>
>
Sure thing.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Hoping to merge HDFS-1073 branch soon

2011-06-30 Thread Todd Lipcon
Hey all,

Work on the HDFS-1073 branch has been progressing steadily, and I believe
we're coming close to the point where it can be merged. To briefly summarize
the status:
- NameNode and SecondaryNameNode are both fully working and have undergone
some stress/fault testing in addition to over 3000 lines' worth of new unit
tests.
- Most of the existing unit tests have been updated, though a few more need
some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am
working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably
be done before release, but I think could be done either before or after
merge (eg HDFS-2104)

Given this, I am expecting that we can merge this into trunk by the end of
July if not earlier, as soon as the BN/CN work is complete. If you are
hoping to review the code or tests before merge time, this is your early
warning! Please do so now!

Thanks!

-Todd
P.S. I will also be giving a short talk about the motivations and current
status of this project at Friday's contributor meeting, for those who are
able to attend. If we're lucky, maybe even a demo!
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hadoop Java Versions

2011-06-30 Thread Todd Lipcon
On Thu, Jun 30, 2011 at 5:16 PM, Ted Dunning  wrote:

> You have to consider the long-term reliability as well.
>
> Losing an entire set of 10 or 12 disks at once makes the overall
> reliability
> of a large cluster very suspect.  This is because it becomes entirely too
> likely that two additional drives will fail before the data on the off-line
> node can be replicated.  For 100 nodes, that can decrease the average time
> to data loss down to less than a year.  This can only be mitigated in stock
> hadoop by keeping the number of drives relatively low.  MapR avoids this by
> not failing nodes for trivial problems.
>

I'd advise you to look at "stock hadoop" again. This used to be true, but
was fixed a long while back by HDFS-457 and several followup JIRAs.

If MapR does something fancier, I'm sure we'd be interested to hear about it
so we can compare the approaches.

-Todd


>
> On Thu, Jun 30, 2011 at 4:18 PM, Aaron Eng  wrote:
>
> > >Keeping the amount of disks per node low and the amount of nodes high
> > should keep the impact of dead nodes in control.
> >
> > It keeps the impact of dead nodes in control but I don't think thats
> > long-term cost efficient.  As prices of 10GbE go down, the "keep the node
> > small" arguement seems less fitting.  And on another note, most servers
> > manufactured in the last 10 years have dual 1GbE network interfaces.  If
> > one
> > were to go by these calcs:
> >
> > >150 nodes with four 2TB disks each, with HDFS 60% full, it takes around
> > ~32
> > minutes to recover
> >
> > It seems like that assumes a single 1GbE interface; why not leverage the
> > second?
> >
> > On Thu, Jun 30, 2011 at 2:31 PM, Evert Lammerts  > >wrote:
> >
> > > > You can get 12-24 TB in a server today, which means the loss of a
> > server
> > > > generates a lot of traffic -which argues for 10 Gbe.
> > > >
> > > > But
> > > >   -big increase in switch cost, especially if you (CoI warning) go
> with
> > > > Cisco
> > > >   -there have been problems with things like BIOS PXE and lights out
> > > > management on 10 Gbe -probably due to the NICs being things the BIOS
> > > > wasn't expecting and off the mainboard. This should improve.
> > > >   -I don't know how well linux works with ether that fast (field
> > reports
> > > > useful)
> > > >   -the big threat is still ToR switch failure, as that will trigger a
> > > > re-replication of every block in the rack.
> > >
> > > Keeping the amount of disks per node low and the amount of nodes high
> > > should keep the impact of dead nodes in control. A ToR switch failing
> is
> > > different - missing 30 nodes (~120TB) at once cannot be fixed by adding
> > more
> > > nodes; that actually increases ToR switch failure. Although such
> failure
> > is
> > > quite rare to begin with, I guess. The back-of-the-envelope-calculation
> I
> > > made suggests that ~150 (1U) nodes should be fine with 1Gb ethernet.
> > (e.g.,
> > > when 6 nodes fail in a cluster with 150 nodes with four 2TB disks each,
> > with
> > > HDFS 60% full, it takes around ~32 minutes to recover. 2 nodes failing
> > > should take around 640 seconds. Also see the attached spreadsheet.)
> This
> > > doesn't take ToR switch failure in account though. On the other hand -
> > 150
> > > nodes is only ~5 racks - in such a scenario you might rather want to
> shut
> > > the system down completely rather than letting it replicate 20% of all
> > data.
> > >
> > > Cheers,
> > > Evert
> >
>
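
For reference, Evert's figures above can be reproduced with a few lines of
arithmetic. This is a sketch under assumptions inferred from the quoted
numbers: each surviving node sustains roughly 100 MB/s of re-replication
traffic on 1 GbE, and block data is spread evenly across the cluster.
-- 8< --
TB = 1e12
nodes, disks, tb_per_disk, fill = 150, 4, 2, 0.60
bytes_per_node = disks * tb_per_disk * TB * fill     # ~4.8 TB of blocks/node

def recovery_secs(failed, rate_per_survivor=100e6):  # ~100 MB/s per survivor
    lost = failed * bytes_per_node
    return lost / ((nodes - failed) * rate_per_survivor)

print(recovery_secs(2))        # ~649 s, matching "around 640 seconds"
print(recovery_secs(6) / 60)   # ~33 min, matching "around ~32 minutes"
-- 8< --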



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hoping to merge HDFS-1073 branch soon

2011-07-06 Thread Todd Lipcon
Hi all,

Just an update on this project:
- The current list of uncommitted patches up for review is:

1bea9d3 HDFS-1979. Fix BackupNode and CheckpointNode
32db384 Amend HDFS-2011. Fix TestCheckpoint test for double close/abort of
ELFOS
b6a55a4 HDFS-2101. Update remaining unit tests for new layout
ca0ace6 HDFS-2133. Address TODOs left in code
b46825d HDFS-1780. reduce need to rewrite fsimage on startup
30c858d HDFS-2104. Add flag to SecondaryNameNode to format it during startup
942eaef HDFS-2135. Fix regression of HDFS-1955 in branch

I believe Eli is going to work on reviewing these this week.

- I've set up a Hudson job for the branch here:
https://builds.apache.org/job/Hadoop-Hdfs-1073-branch/
It's currently failing because it's missing some of the patches above. After
the above patches go in, I expect a pretty clean build, modulo maybe one or
two things that are environment issues, which I'll tackle later this week.

- BackupNode and CheckpointNode are working. I've done some basic functional
testing by pounding edits into the NN while both a 2NN and a BN are
checkpointing every 2 seconds.
- I merged with trunk as of this morning, so I think we should be up-to-date
with trunk patches. Aaron was very helpful and went through all NN-related
patches in trunk from the last 3 months to make sure we didn't inadvertently
regress anything - he discovered one bug but everything else looks good.

Once the above patches are in the branch, I would like to merge. So, if you
plan on reviewing pre-merge, please do so *this week*. Of course, if you
don't have time and you find issues post-merge, I absolutely plan on fixing
them ASAP ;-)

Thanks
-Todd

On Thu, Jun 30, 2011 at 12:11 AM, Todd Lipcon  wrote:

> Hey all,
>
> Work on the HDFS-1073 branch has been progressing steadily, and I believe
> we're coming close to the point where it can be merged. To briefly summarize
> the status:
> - NameNode and SecondaryNameNode are both fully working and have undergone
> some stress/fault testing in addition to over 3000 lines' worth of new unit
> tests.
> - Most of the existing unit tests have been updated, though a few more need
> some small tweaks (HDFS-2101)
> - The BackupNode and CheckpointNode are not currently working, though I am
> working on it locally and making good progress (HDFS-1979)
> - There are a few various and sundry small improvements that should
> probably be done before release, but I think could be done either before or
> after merge (eg HDFS-2104)
>
> Given this, I am expecting that we can merge this into trunk by the end of
> July if not earlier, as soon as the BN/CN work is complete. If you are
> hoping to review the code or tests before merge time, this is your early
> warning! Please do so now!
>
> Thanks!
>
> -Todd
> P.S. I will also be giving a short talk about the motivations and current
> status of this project at Friday's contributor meeting, for those who are
> able to attend. If we're lucky, maybe even a demo!
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS-1623 branching strategy

2011-07-07 Thread Todd Lipcon
Sounds good to me. I think this strategy has worked well on the HDFS-1073
branch -- allowed development to be quite rapid, and at this point all but a
couple trivial patches have been explicitly reviewed by a committer (and the
others implicitly reviewed since later patches touched the same code area).

+1.

-Todd

On Thu, Jul 7, 2011 at 1:43 PM, Aaron T. Myers  wrote:

> Hello everyone,
>
> This has been informally mentioned before, but I think it's best to be
> completely transparent/explicit about this.
>
> We (Sanjay, Suresh, Todd, Eli, myself, and anyone else who wants to help)
> intend to do the work for HDFS-1623 (High Availability Framework for HDFS
> NN) on a development branch off of trunk. The work in the HDFS-1073
> development branch is necessary to complete HDFS-1623. As such, we're
> waiting for the work in HDFS-1073 to be merged into trunk before creating a
> branch for HDFS-1623.
>
> Once this branch is created, I'd like to use a similar modified
> commit-then-review policy for this branch as was done in the HDFS-1073
> branch, which I think worked very well. To review, this was:
>
> {quote}
> - A patch will be uploaded to the JIRA for review like usual
> - If another committer provides a +1, it may be committed at that
> point, just like usual.
> - If no committer provides +1 (or a review asking for changes) within
> 24 business hours, it will be committed to the branch under "commit then
> review" policy.Of course if any committer feels that code needs to be
> amended, he or she should feel free to open a new JIRA against the branch
> including the review comments, and they will be addressed before the merge
> into trunk. And just like with any branch merge, ample time will be given
> for the community to review both the large merge commit as well as the
> individual historical commits of the branch, before it goes into trunk.
> {quote}
>
> I'm also volunteering to keep the HDFS-1623 development branch up to date
> with respect to merging the concurrent changes which go into trunk into
> this
> development branch to make sure the merge back into trunk is as painless as
> possible.
>
> Comments are certainly welcome on this strategy.
>
> Thanks a lot,
> Aaron
>
> --
> Aaron T. Myers
>  Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Change bylaws to require 3 binding +1s for branch merge

2011-07-11 Thread Todd Lipcon
To clarify, is there any restriction on who may give the +1s? For example,
if a branch has a group of 5 committers primarily authoring the patches, can
the three +1s be made by a subset of those committers?

-Todd

On Mon, Jul 11, 2011 at 5:11 PM, Jakob Homan  wrote:

> As discussed in the recent thread on HDFS-1623 branching models, I'd
> like to amend the bylaws to provide that branches should get a minimum
> of three committer +1s before being merged to trunk.
>
> The rationale:
> Feature branches are often created in order that developers can
> iterate quickly without the review then commit requirements of trunk.
> Branches' commit requirements are determined by the branch maintainer
> and in this situation are often set up as commit-then-review.  As
> such, there is no way to guarantee that the entire changeset offered
> for trunk merge has had a second pair of eyes on it.  Therefore, it is
> prudent to give that final merge heightened scrutiny, particularly
> since these branches often extensively affect critical parts of the
> system.  Requiring three binding +1s does not slow down the branch
> development process, but does provide a better chance of catching bugs
> before they make their way to trunk.
>
> Specifically, under the Actions subsection, this vote would add a new
> bullet item:
> * Branch merge: A feature branch that does not require the same
> criteria for code to be committed to trunk will require three binding
> +1s before being merged into trunk.
>
> The last bylaw change required lazy majority of PMC and ran for 7
> days, which I believe would apply to this one as well.  That would
> have this vote ending 5pm PST July 18.
> -Jakob
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Change bylaws to require 3 binding +1s for branch merge

2011-07-11 Thread Todd Lipcon
Sounds fine to me. +1

On Mon, Jul 11, 2011 at 9:30 PM, Mahadev Konar wrote:

> +1
>
> mahadev
>
> On Mon, Jul 11, 2011 at 9:26 PM, Arun C Murthy 
> wrote:
> > +1
> >
> > Arun
> >
> > On Jul 11, 2011, at 5:11 PM, Jakob Homan wrote:
> >
> >> As discussed in the recent thread on HDFS-1623 branching models, I'd
> >> like to amend the bylaws to provide that branches should get a minimum
> >> of three committer +1s before being merged to trunk.
> >>
> >> The rationale:
> >> Feature branches are often created in order that developers can
> >> iterate quickly without the review then commit requirements of trunk.
> >> Branches' commit requirements are determined by the branch maintainer
> >> and in this situation are often set up as commit-then-review.  As
> >> such, there is no way to guarantee that the entire changeset offered
> >> for trunk merge has had a second pair of eyes on it.  Therefore, it is
> >> prudent to give that final merge heightened scrutiny, particularly
> >> since these branches often extensively affect critical parts of the
> >> system.  Requiring three binding +1s does not slow down the branch
> >> development process, but does provide a better chance of catching bugs
> >> before they make their way to trunk.
> >>
> >> Specifically, under the Actions subsection, this vote would add a new
> >> bullet item:
> >> * Branch merge: A feature branch that does not require the same
> >> criteria for code to be committed to trunk will require three binding
> >> +1s before being merged into trunk.
> >>
> >> The last bylaw change required lazy majority of PMC and ran for 7
> >> days, which I believe would apply to this one as well.  That would
> >> have this vote ending 5pm PST July 18.
> >> -Jakob
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hoping to merge HDFS-1073 branch soon

2011-07-12 Thread Todd Lipcon
On Tue, Jul 12, 2011 at 10:38 AM, sanjay Radia wrote:

> We can merge 1580  after 1073  is merged in.
>
> Looks like the biggest thing in  your 1073  list  is the Backup NN related
> changes.
>

The BN-related changes are done and just awaiting code review. See
HDFS-1979. The current list of patches awaiting review are: HDFS-1979,
HDFS-2101, HDFS-2133, HDFS-1780, HDFS-2104, HDFS-2135.


> Are you shooting for end of this month?
>

I'm hoping as early as next week, assuming folks feel the branch is in good
shape. If all goes well, I'll have code reviews back for the above in the
next day or two, can respond to review comments and commit over the weekend,
and call a vote to merge early next week.

Thanks
-Todd


> On Jul 6, 2011, at 8:03 PM, Todd Lipcon wrote:
>
> > Hi all,
> >
> > Just an update on this project:
> > - The current list of uncommitted patches up for review is:
> >
> > 1bea9d3 HDFS-1979. Fix BackupNode and CheckpointNode
> > 32db384 Amend HDFS-2011. Fix TestCheckpoint test for double close/abort
> of
> > ELFOS
> > b6a55a4 HDFS-2101. Update remaining unit tests for new layout
> > ca0ace6 HDFS-2133. Address TODOs left in code
> > b46825d HDFS-1780. reduce need to rewrite fsimage on startup
> > 30c858d HDFS-2104. Add flag to SecondaryNameNode to format it during
> startup
> > 942eaef HDFS-2135. Fix regression of HDFS-1955 in branch
> >
> > I believe Eli is going to work on reviewing these this week.
> >
> > - I've set up a Hudson job for the branch here:
> > https://builds.apache.org/job/Hadoop-Hdfs-1073-branch/
> > It's currently failing because it's missing some of the patches above.
> After
> > the above patches go in, I expect a pretty clean build, modulo maybe one
> or
> > two things that are environment issues, which I'll tackle later this
> week.
> >
> > - BackupNode and CheckpointNode are working. I've done some basic
> functional
> > testing by pounding edits into the NN while both a 2NN and a BN are
> > checkpointing every 2 seconds.
> > - I merged with trunk as of this morning, so I think we should be
> up-to-date
> > with trunk patches. Aaron was very helpful and went through all
> NN-related
> > patches in trunk from the last 3 months to make sure we didn't
> inadvertently
> > regress anything - he discovered one bug but everything else looks good.
> >
> > Once the above patches are in the branch, I would like to merge. So, if
> you
> > plan on reviewing pre-merge, please do so *this week*. Of course, if you
> > don't have time and you find issues post-merge, I absolutely plan on
> fixing
> > them ASAP ;-)
> >
> > Thanks
> > -Todd
> >
> > On Thu, Jun 30, 2011 at 12:11 AM, Todd Lipcon  wrote:
> >
> >> Hey all,
> >>
> >> Work on the HDFS-1073 branch has been progressing steadily, and I
> believe
> >> we're coming close to the point where it can be merged. To briefly
> summarize
> >> the status:
> >> - NameNode and SecondaryNameNode are both fully working and have
> undergone
> >> some stress/fault testing in addition to over 3000 lines' worth of new
> unit
> >> tests.
> >> - Most of the existing unit tests have been updated, though a few more
> need
> >> some small tweaks (HDFS-2101)
> >> - The BackupNode and CheckpointNode are not currently working, though I
> am
> >> working on it locally and making good progress (HDFS-1979)
> >> - There are a few various and sundry small improvements that should
> >> probably be done before release, but I think could be done either before
> or
> >> after merge (eg HDFS-2104)
> >>
> >> Given this, I am expecting that we can merge this into trunk by the end
> of
> >> July if not earlier, as soon as the BN/CN work is complete. If you are
> >> hoping to review the code or tests before merge time, this is your early
> >> warning! Please do so now!
> >>
> >> Thanks!
> >>
> >> -Todd
> >> P.S. I will also be giving a short talk about the motivations and
> current
> >> status of this project at Friday's contributor meeting, for those who
> are
> >> able to attend. If we're lucky, maybe even a demo!
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Vote to merge HDFS-1073 into trunk

2011-07-19 Thread Todd Lipcon
Hi all,

HDFS-1073 is now complete and ready to be merged. Many thanks to those who
helped review in the last two weeks.

Hudson test-patch results are available on HDFS-1073 JIRA - please see the
recent comments there for explanations.

A few notes that may help you vote:

- I have run the NNThroughputBenchmark and seen just a small regression in
logging performance due to the inclusion of a txid with every edit for
increased robustness (a conceptual sketch follows these notes).
- The NN read path and the read/write IO paths are entirely untouched by
these changes.
- Image and edit load time were benchmarked throughout development of the
branch and no significant regressions have been seen.
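
To illustrate what the per-edit txid buys, here is a conceptual sketch (not
the actual NameNode code or on-disk format): replay can detect gaps and
duplicates instead of silently mis-applying edits.
-- 8< --
def apply_edit(op):
    """Placeholder for applying one edit to the namespace."""
    pass

def replay(records, next_txid):
    """records: (txid, op) pairs read from an edit log segment."""
    for txid, op in records:
        if txid < next_txid:
            continue         # duplicate, e.g. from an overlapping segment
        if txid > next_txid:
            raise IOError("gap in edit log: expected txid %d, found %d"
                          % (next_txid, txid))
        apply_edit(op)
        next_txid += 1
    return next_txid
-- 8< --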

Since this is a code change, all committers should feel free to vote. The
voting requires three committer +1s and no -1s to pass. I will not vote
since I contributed the majority of the code in the branch, though obviously
I'm +1 :)

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Vote to merge HDFS-1073 into trunk

2011-07-29 Thread Todd Lipcon
Thanks for the votes. The vote has passed and I committed a merge to trunk
just now. If anything breaks, don't hesitate to drop me a mail.

-Todd

On Thu, Jul 28, 2011 at 12:27 PM, Matt Foley  wrote:

> +1 for the merge. I've read a majority of the code changes, excluding the
> BNN and 2NN, approaching from the "big diff" rather than individual
> patches,
> and starting with the files most changed from both current trunk and the
> 1073 branchpoint.  I've found almost nothing to comment on.  It looks like
> a
> solid job, it is a significant simplification of FSEditLog, and I have
> become confident that the merge should proceed.
> --Matt
>
>
> From: Eli Collins
> Date: Tue, 19 Jul 2011 18:43:58 -0700
>
> +1 for the merge.  I've reviewed all but a handful of the 50+
> individual patches, also looked at the merge patch for sanity and it
> looks good.
>
> From: Jitendra Pandey
> Date: Tue, 19 Jul 2011 18:23:39 -0700
>
> +1 for the merge. I haven't looked at BackupNode changes in much detail,
> but apart from that the patch looks good.
>
> On Tue, Jul 19, 2011 at 6:12 PM, Todd Lipcon  wrote:
>
> > Hi all,
> >
> > HDFS-1073 is now complete and ready to be merged. Many thanks to those who
> > helped review in the last two weeks.
> >
> > Hudson test-patch results are available on HDFS-1073 JIRA - please see the
> > recent comments there for explanations.
> >
> > A few notes that may help you vote:
> >
> > - I have run the NNThroughputBenchmark and seen just a small regression in
> > logging performance due to the inclusion of a txid with every edit for
> > increased robustness.
> > - The NN read path and the read/write IO paths are entirely untouched by
> > these changes.
> > - Image and edit load time were benchmarked throughout development of the
> > branch and no significant regressions have been seen.
> >
> > Since this is a code change, all committers should feel free to vote. The
> > voting requires three committer +1s and no -1s to pass. I will not vote
> > since I contributed the majority of the code in the branch, though
> > obviously I'm +1 :)
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: hadoop-0.23

2011-08-18 Thread Todd Lipcon
On Thu, Aug 18, 2011 at 9:36 AM, Arun C Murthy  wrote:
> Good morning!
>
> On Jul 13, 2011, at 3:39 PM, Arun C Murthy wrote:
>
>> It's looking like trunk is moving along rapidly - it's about time to start 
>> thinking of the next release to unlock all of the goodies there.
>>
>> As the RM, my current thinking is that after we merge NextGen MR (MR-279) 
>> and the HDFS-1073 branch into trunk we should be good to create the 
>> hadoop-0.23 branch.
>
> Since the last time we spoke (actually, since last night, in fact!) the world 
> (trunk) has changed to accommodate our wishes... *smile*
>
> HDFS-1073 and MAPREDUCE-279 are presently in trunk and I think it's time to 
> cut the 0.23 branch so that we can focus on testing and stabilizing a 
> hadoop-0.23 release off that branch.
>
> I propose to do it noon of the coming Monday (Aug 22).
>
> Thoughts?

I assume we will make sure the HDFS mavenization is in before then?
Tom said he intends to commit it tomorrow, but if something comes up
and it's not committed, let's make sure mavenization happens before we
branch.

Also, what will be the guidelines for committing a change to 0.23
branch? Is it "bug fix only" or are we still allowing improvements?
Given how recently MR2 was merged, I imagine there will be a lot of
things that aren't strictly bugs that we will really want to have in
our next release. I also have a couple of HDFS patches (eg the new
faster CRC on-by-default) that I'd like to get into 23.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Add Append-HBase support in upcoming 20.205

2011-08-31 Thread Todd Lipcon
On Wed, Aug 31, 2011 at 3:07 PM,   wrote:
> FWIW, Stack has already done the work needed to make sure that Hbase works
> with Hadoop 0.22 branch, and I suppose if
> https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it
> removes the last blocker from 0.22.0, so that it can be released.

The 0.22 implementation "works" but there are certainly still bugs in it.

If other HDFS committers familiar with the new append could help here,
that would be very much appreciated.

For example, https://issues.apache.org/jira/browse/HDFS-2288 can cause
HBase to fail to recover its WAL during a crash scenario. There are
some others that I'll be likely working through in the coming months.

-Todd

>
> I am cc'ng hbase-dev, since this is relevant to them as well.
>
> - Milind
>
> On 8/31/11 11:41 AM, "sanjay Radia"  wrote:
>
>>
>>I propose that the 20-append patches (details below)  be included in
>>20.205 which will become the first official Apache
>>release of Hadoop that supports Append and HBase.
>>
>>Background:
>>There hasn't been an official Apache release that supports HBase.
>>The HBase community have instead been using the 20-append branch; the
>>patches were contributed by the HBase community including Facebook. The
>>Cloudera distribution has also included these patches.
>>Andrew Purtell has ported these patches to 20-security branch.
>>
>>Risk Level:
>>These patches have been used and tested on large HBase clusters by FB ,
>>by those who use 20-append branch directly (various users including a 500
>>node HBase cluster at Yahoo) and by those that use the Cloudera
>>distribution. We have reviewed the patches and have conducted further
>>tests; testing and validation continues.
>>
>>
>>Patches:
>>HDFS-200. Support append and sync for hadoop 0.20 branch.
>>HDFS-142. Blocks that are being written by a client are stored in the
>>blocksBeingWritten directory.
>>HDFS-1057.  Concurrent readers hit ChecksumExceptions if following a
>>writer to very end of file
>>HDFS-724.  Use a bidirectional heartbeat to detect stuck pipeline.
>>HDFS-895. Allow hflush/sync to occur in parallel with new writes to the
>>file.
>>HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease
>>recovery.
>>HDFS-1555. Disallow pipeline recovery if a file is already being lease
>>recovered.
>>HDFS-1554. New semantics for recoverLease.
>>HDFS-988. Fix bug where saveNamespace can corrupt the edits log.
>>HDFS-826. Allow a mechanism for an application to detect that datanode(s)
>>have died in the write pipeline.
>>HDFS-630. Client can exclude specific nodes in the write pipeline.
>>HDFS-1141. completeFile does not check lease ownership.
>>HDFS-1204. Lease expiration should recover single files, not entire lease
>>holder
>>HDFS-1254. Support append/sync via the default configuration.
>>HDFS-1346. DFSClient receives out of order packet ack.
>>HDFS-1054. remove sleep before retry for allocating a block.
>>
>>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Add Append-HBase support in upcoming 20.205

2011-09-02 Thread Todd Lipcon
The following other JIRAs have been committed in CDH for 18 months or
so, for the purpose of HBase. You may want to consider backporting
them as well - many were never committed to 0.20-append due to lack of
reviews by HDFS committers at the time.

HDFS-1056. Fix possible multinode deadlocks during block recovery
when using ephemeral dataxceiver ports

Description: Fixes the logic by which datanodes identify local RPC targets
 during block recovery for the case when the datanode
 is configured with an ephemeral data transceiver port.
Reason: Potential internode deadlock for clusters using ephemeral ports


HADOOP-6722. Workaround a TCP spec quirk by not allowing
NetUtils.connect to connect to itself

Description: TCP's ephemeral port assignment results in the possibility
 that a client can connect back to its own outgoing socket,
 resulting in failed RPCs or datanode transfers.
Reason: Fixes intermittent errors in cluster testing with ephemeral
IPC/transceiver ports on datanodes. (See the sketch after this list.)

HDFS-1122. Don't allow client verification to prematurely add
inprogress blocks to DataBlockScanner

Description: When a client reads a block that is also open for writing,
 it should not add it to the datanode block scanner.
 If it does, the block scanner can incorrectly mark the
 block as corrupt, causing data loss.
Reason: Potential dataloss with concurrent writer-reader case.

HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch

Description: Miscellaneous code cleanup and logging changes, including:
 - Slight cleanup to recoverFile() function in TestFileAppend4
 - Improve error messages on OP_READ_BLOCK
 - Some comment cleanup in FSNamesystem
 - Remove toInodeUnderConstruction (was not used)
 - Add some checks for null blocks in FSNamesystem to avoid a possible NPE
 - Only log "inconsistent size" warnings at WARN level for
non-under-construction blocks.
 - Redundant addStoredBlock calls are also not worthy of WARN level
 - Add some extra information to a warning in ReplicationTargetChooser
Reason: Improves diagnosis of error cases and clarity of code


HDFS-1242. Add unit test for the appendFile race condition /
synchronization bug fixed in HDFS-142

Reason: Test coverage for previously applied patch.

HDFS-1218. Replicas that are recovered during DN startup should
not be allowed to truncate better replicas.

Description: If a datanode loses power and then recovers, its replicas
 may be truncated due to the recovery of the local FS
 journal. This patch ensures that a replica truncated by
 a power loss does not truncate the block on HDFS.
Reason: Potential dataloss bug uncovered by power failure simulation

HDFS-915. Write pipeline hangs for too long when ResponseProcessor
hits timeout

Description: Previously, the write pipeline would hang for the entire write
 timeout when it encountered a read timeout (eg due to a
 network connectivity issue). This patch interrupts the writing
 thread when a read error occurs.
Reason: Faster recovery from pipeline failure for HBase and other
interactive applications.


HDFS-1186. Writers should be interrupted when recovery is started,
not when it's completed.

Description: When the write pipeline recovery process is initiated, this
 interrupts any concurrent writers to the block under recovery.
 This prevents a case where some edits may be lost if the
 writer has lost its lease but continues to write (eg due to
 a garbage collection pause)
Reason: Fixes a potential dataloss bug


commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
Author: Todd Lipcon 
Date:   Sun Jun 13 23:02:38 2010 -0700

HDFS-1197. Received blocks should not be added to block map
prematurely for under construction files

Description: Fixes a possible dataloss scenario when using append() on
 real-life clusters. Also augments unit tests to uncover
 similar bugs in the future by simulating latency when
 reporting blocks received by datanodes.
Reason: Append support dataloss bug
Author: Todd Lipcon


HDFS-1260. tryUpdateBlock should do validation before renaming meta file

Description: Solves bug where block became inaccessible in certain failure
 conditions (particularly network partitions). Observed under
 HBase workload at user site.
Reason: Potential loss of synced data when write pipeline fails
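
Regarding the HADOOP-6722 entry above: TCP's ephemeral port assignment means
a client dialing a local address can be handed the very port it is dialing,
and so connect to its own socket. A sketch of that kind of guard
(illustrative; not the actual NetUtils code):
-- 8< --
import socket

def connect_checked(host, port, timeout=10):
    s = socket.create_connection((host, port), timeout)
    # Self-connect case: our ephemeral source port equals the destination
    # port on the same address, so we have connected to our own socket.
    if s.getsockname() == s.getpeername():
        s.close()
        raise ConnectionError("self-connection to %s:%d detected" % (host, port))
    return s
-- 8< --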


On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas  wrote:
> I also propose following jiras, which are non append related bug fixes from
> 0.20-append branch:
>

Re: 0.20.205 Sustaining Release branch plan and content plan

2011-09-09 Thread Todd Lipcon
On Fri, Sep 9, 2011 at 4:56 PM, Matt Foley  wrote:
> If I read the jira correctly, this is a workaround for RHEL6.0 that is no
> longer needed for RHEL6.1.
> Is that correct?  If so, would it be no longer needed?

Yes, it's fixed in RHEL 6.1. Also, since the uid caching is enabled in
the 20x series, it's less important, since the race is much, much rarer.
So I'm +/- 0 (doesn't seem urgent but shouldn't hurt things).

-Todd

>
> On Fri, Sep 9, 2011 at 3:45 AM, Steve Loughran  wrote:
>
>>
>> What about RHEL6.1 workarounds?
>> https://issues.apache.org/jira/browse/HADOOP-7156
>>
>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Welcoming Harsh J as a Hadoop committer

2011-09-15 Thread Todd Lipcon
On behalf of the PMC, I am pleased to announce that Harsh J Chouraria
has been elected a committer in the Apache Hadoop Common, HDFS, and
MapReduce projects. Anyone subscribed to the mailing list or JIRA will
undoubtedly recognize Harsh's name as one of the most helpful
community members and an author of increasingly many code
contributions. The Hadoop PMC and community appreciates Harsh's
involvement and looks forward to continuing contributions!

Welcome, Harsh!

-Todd and the Hadoop Project Management Committee


Re: Update on hadoop-0.23

2011-09-27 Thread Todd Lipcon
Hi all,

Just an update from the HBase side: I've run some cluster tests on
HDFS 0.23 (as of about a month ago) and it generally works well.
Performance for some workloads is ~2x better due to HDFS-941, and can be
improved a bit more if I finish HDFS-2080 in time. I did not do
extensive failure testing (to stress the new append/sync code) but I
do plan to do that in the coming months.

HBase trunk can compile against 0.23 by using -Dhadoop23 on the maven
build. Currently some 15 or so tests are failing - the following HBase
JIRA tracks those issues:
https://issues.apache.org/jira/browse/HBASE-4254

(these may be indicative of HDFS side bugs)

Any help there from the community would be appreciated!

-Todd

On Tue, Sep 27, 2011 at 12:24 PM, Roman Shaposhnik  wrote:
> Hi Arun!
>
> Thanks for the quick reply!
>
> I'm sorry if I had too many questions in my original email, but I can't find
> an answer to my "integration tests" question. Could you, please, share
> a URL with us where I can find out more about them?
>
> On Mon, Sep 26, 2011 at 11:20 PM, Arun C Murthy  wrote:
>> # We made changes to Pig - rather we got help from the Pig team, 
>> particularly Daniel.
>>
>> So, we plan to work through the rest of the stack - Hive, Oozie etc. very 
>> soon and we'll
>> depend on updated releases from the individual projects.
>
> Do we have any kinds of commitment from downstream projects as far as those
> updates are concerned? Are they targeting these changes as part of point 
> (patch)
> release of an already released version (like Pig 0.9.X for example) or
> will it be
> part of a brand new major release?
>
> Thanks,
> Roman.
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Update on hadoop-0.23

2011-09-30 Thread Todd Lipcon
On Fri, Sep 30, 2011 at 11:44 AM, Roman Shaposhnik  wrote:
> I apologize if my level of institutional knowledge of these things is
> lacking, but do you have any
> benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking
> is twofold -- I really
> would like to see an objective numbers qualifying the viability of
> 0.22 from the performance stand point,
> but more importantly I would really like to include the benchmarking
> code into Bigtop.

0.22 currently suffers from MAPREDUCE-2266, which, last time I
benchmarked it, caused a significant slowdown. IIRC, a terasort ran
something like twice as slow on my test cluster due to this bug.
0.23/MR2 doesn't suffer from this bug.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Java Versions and Hadoop

2011-10-08 Thread Todd Lipcon
I think requiring Java 7 is years off... I think most people have
doubts as to Java 7's stability until it's been adopted by a majority
of applications, and the new features aren't compelling enough to jump
ship, IMO.

-Todd

On Fri, Oct 7, 2011 at 3:33 PM,   wrote:
> Hi Folks,
>
> While I have seen the wiki on which java versions to use currently to run
> Hadoop, I have not seen any discussion about the roadmap of java version
> compatibility with future hadoop versions.
>
> Recently, Oracle retired the "Operating System Distributor License for
> Java" (DLJ) [http://robilad.livejournal.com/90792.html,
> http://jdk-distros.java.net/] and Linux vendors have started making
> OpenJDK (6/7) as the default java version bundled with their OSs
> [http://www.java7developer.com/blog/?p=361]. Also, all future Java SE
> updates will be delivered through OpenJDK updates project.
>
> I see that OpenJDK6 (6b20pre) cannot be used to compile hadoop trunk. Has
> anyone tried OpenJDK7 ?
>
> Additionally, I have a few small projects in mind which can really make
> use of the new (esp I/O) features of Java 7.
>
> What, if any, timeline do hadoop developers have in mind to make Java 7
> required (and tested with OpenJDK 7)?
>
> Thanks,
>
> - milind
>
> ---
> Milind Bhandarkar
> Greenplum Labs, EMC
> (Disclaimer: Opinions expressed in this email are those of the author, and
> do not necessarily represent the views of any organization, past or
> present, the author might be affiliated with.)
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Update on hadoop-0.23

2011-10-18 Thread Todd Lipcon
On Tue, Oct 18, 2011 at 4:36 AM, Steve Loughran  wrote:
>
> One more thing: are the ProtocolBuffers needed for all installations, or is
> that a compile-time requirement? If the binaries are going to be required,
> there's going to have to be one built for the various platforms, and
> source.deb/RPM files to build themselves on Linux. I'd rather avoid all that
> work

The protobuf java jar is required at runtime. protoc (native) is only
required at compile time.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Todd Lipcon
On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran  wrote:
> On 15/11/11 06:07, Dhruba Borthakur wrote:
>>
>> +1 to making the upcoming 0.23 release as 2.0.
>>
>
> +1
>
> And leave the 0.20.20x chain as is, just because people are used to it
>

+1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
Though it's weird to never have a 1.0, the "0.20" name is well
ingrained, and I think renaming it at this point will cause a lot of
confusion (plus cause problems for downstream projects like Hive and
HBase which use regexes against the version string in various shim
layers)

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads up - 0.23.1

2011-11-18 Thread Todd Lipcon
Sounds good to me. Let's continue to bill it as "alpha" though - we've
come a long way, but we've still got some work to do ahead of us.

-Todd

On Fri, Nov 18, 2011 at 9:55 AM, Arun C Murthy  wrote:
> Folks,
>
>  I'm considering cutting a 0.23.1 RC early December. Some things I'd like to 
> see in it (ideally):
>
>  a) Complete mavenization (thanks to all of Tucu's help so far!)
>  b) Oozie fixes for security mode
>  c) A bunch of bug-fixes and perf fixes.
>
> thanks,
> Arun



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-0.23.1-rc0

2012-02-09 Thread Todd Lipcon
I just committed HDFS-2923 to branch-0.23. This bug can cause big
performance issues on the NN, since the number of IPC handlers defaults
to a value that is way too low and can't be changed via the expected config.
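
For illustration, a minimal sketch of this class of bug, assuming (per the
HDFS-2923 analysis) that the NameNode sized its IPC server from the
DataNode's handler-count key; the key names are real HDFS configuration
keys, but the code below is a simplification, not the actual NameNode
source:

    import org.apache.hadoop.conf.Configuration;

    public class HandlerCountSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // What an operator sets, expecting to size the NN's IPC pool:
        conf.setInt("dfs.namenode.handler.count", 64);
        // What the buggy code path effectively read instead:
        int handlers = conf.getInt("dfs.datanode.handler.count", 10);
        // Prints 10 -- the NN comes up with far too few handler threads.
        System.out.println("NN IPC handlers: " + handlers);
      }
    }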

Since there's a workaround, it's not a regression since 0.23.0, and
0.23.1 is still going to be labeled alpha/beta, it's up to you whether
you want to spin an rc1 on account of just this bug. If there are
other issues, though, definitely worth including this in rc1.

-Todd

On Wed, Feb 8, 2012 at 1:33 AM, Arun C Murthy  wrote:
> I've created a release candidate for hadoop-0.23.1 that I would like to 
> release.
>
> It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc0/
>
> Some highlights:
> # Since hadoop-0.23.0 in November there has been significant progress in 
> branch-0.23 with nearly 400 jiras committed to it (68 in Common, 78 in HDFS 
> and 242 in MapReduce).
> # An important aspect is that we've done a lot of performance related work 
> and hadoop-0.23.1 matches or exceeds performance of hadoop-1 in pretty much 
> every aspect of HDFS & MapReduce.
> # Also, several downstream projects (HBase, Pig, Oozie, Hive etc.)  seem to 
> be playing nicely with hadoop-0.23.1.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release Apache Hadoop 0.23.1-rc2

2012-02-22 Thread Todd Lipcon
-1, unfortunately. HDFS-2991 is a blocker regression introduced in
0.23.1. See the JIRA for instructions on how to reproduce on the rc2
build.

-Todd

On Fri, Feb 17, 2012 at 11:23 PM, Arun C Murthy  wrote:
> I've created another release candidate for hadoop-0.23.1 that I would like to 
> release.
>
> It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc2/
> The hadoop-0.23.1-rc2 svn tag: 
> https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.1-rc2
> The maven artifacts for hadoop-0.23.1-rc2 are also available at 
> repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release Apache Hadoop 0.23.1-rc2

2012-02-22 Thread Todd Lipcon
On Wed, Feb 22, 2012 at 7:51 PM, Vinod Kumar Vavilapalli
 wrote:
> Todd,
>
> From your analysis at HDFS-2991, looks like this was there in 0.23
> too. Also, seems this happens only at scale, and only (paraphrasing
> you) "when the file is reopened for append on an exact block
> boundary".

Let me clarify: HDFS-2991 basically has two halves:
First half (present "forever"): when we append() on a block
boundary, we don't log an OP_ADD.
Second half (new due to HDFS-2718): if we get an OP_CLOSE for a file
we haven't OP_ADDed, we'll get a ClassCastException on startup.

So even though the first half isn't a regression, the regression in
the second half means that this longstanding bug will now actually
prevent startup.
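
To make that failure concrete, here is a heavily simplified, hypothetical
sketch of the second half; the classes below are toy stand-ins, not the
real NameNode types, but the shape of the crash is the same:

    class INode {}
    class INodeFile extends INode {}
    class INodeFileUnderConstruction extends INodeFile {}

    public class OpCloseReplaySketch {
      public static void main(String[] args) {
        // Because the append() on an exact block boundary never logged an
        // OP_ADD, replay still sees a plain (closed) file in the namespace:
        INode node = new INodeFile();
        // Replaying the OP_CLOSE assumes the file is under construction,
        // so this cast throws ClassCastException, aborting NN startup:
        INodeFileUnderConstruction pending = (INodeFileUnderConstruction) node;
      }
    }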

Also, there's nothing related to scale here. I happened to run into it
doing scale tests, but it turned out to not be relevant. You'll see it
if you run TestDFSIO with standard parameters on trunk or 23.1 (that's
how I discovered it).

>
> Agree it is a critical fix, but given above, can we proceed along with
> 0.23.1? Anyways, 0.23.1 is still an alpha (albeit of next level), so
> I'd think we can get that in for 0.23.2.

Alright, consider me -0, though it's pretty nasty once you run into
it. The only way I could start my NN again without losing data was to
recompile with the fix in place.

-Todd

>
> On Wed, Feb 22, 2012 at 6:43 PM, Todd Lipcon  wrote:
>> -1, unfortunately. HDFS-2991 is a blocker regression introduced in
>> 0.23.1. See the JIRA for instructions on how to reproduce on the rc2
>> build.
>>
>> -Todd
>>
>> On Fri, Feb 17, 2012 at 11:23 PM, Arun C Murthy  wrote:
>>> I've created another release candidate for hadoop-0.23.1 that I would like 
>>> to release.
>>>
>>> It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc2/
>>> The hadoop-0.23.1-rc2 svn tag: 
>>> https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.1-rc2
>>> The maven artifacts for hadoop-0.23.1-rc2 are also available at 
>>> repository.apache.org.
>>>
>>> Please try the release and vote; the vote will run for the usual 7 days.
>>>
>>> thanks,
>>> Arun
>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Naming of Hadoop releases

2012-03-19 Thread Todd Lipcon
On Mon, Mar 19, 2012 at 2:56 PM, Doug Cutting  wrote:
> On 03/19/2012 02:47 PM, Arun C Murthy wrote:
>> This is against the Apache Hadoop release policy on major releases i.e. only 
>> features deprecated for at least one release can be removed.
>
> In many cases the reason this happened was that features were backported
> from trunk to 0.20 but not to 0.22.  In other words, it's no fault of the
> folks who were working on branch 0.22.

I agree that it's no fault of the folks on 0.22.

>  So a related policy we might add
> to prevent such situations in the future would be that if you backport
> something from branch n to n-2 then you ought to also be required to
> backport it to branch n-1 and in general to all intervening branches.
> Does that seem sensible?

-1 on this requirement. Otherwise the cost of backporting something to
the stable line becomes really high, and we'll end up with
distributors just maintaining their own branches outside of Apache
(the state we were in with 0.20.x).

On the other hand, it does suck for users if they update from "1.x" to
"2.x" and they end up losing some bug fixes or features they
previously were running.

Unfortunately, I don't have a better solution in mind that resolves
the above problems - I just don't think it's tenable to combine a
policy like "anyone may make a release branch off trunk and claim a
major version number" with another policy like "you have to port a fix
to all intermediate versions in order to port a fix to any of them".
If a group of committers wants to make a release branch, then the
maintenance of that branch should be up to them.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Rename hadoop branches post hadoop-1.x

2012-03-19 Thread Todd Lipcon
My vote: 3,2,1,4,5, in order of preference. (binding)

-Todd

On Mon, Mar 19, 2012 at 5:44 PM, Brock Noland  wrote:
> 2
> 1
> 3
> 4
> 5
>
> (non-binding)
>
> On Mon, Mar 19, 2012 at 5:41 PM, Shaneal Manek  wrote:
>> 2
>> 1
>> 3
>> 4
>> 5
>>
>> My vote is non-binding, fwiw.
>>
>> -Shaneal
>>
>> On Mon, Mar 19, 2012 at 5:34 PM, Arun C Murthy  wrote:
>>>
>>> On Mar 19, 2012, at 5:23 PM, Roman Shaposhnik wrote:
>>>
>>>> On Mon, Mar 19, 2012 at 5:13 PM, Arun C Murthy  
>>>> wrote:
>>>>> We've discussed several options:
>>>>>
>>>>> (1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
>>>>> (2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a 
>>>>> hole.
>>>>> (3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
>>>>> (4) If security is fixed in branch-0.22 within a short time-frame i.e. 2 
>>>>> months then we get option 1, else we get option 2. Effectively postpone 
>>>>> discussion by 2 months, start a timer now.
>>>>> (5) Do nothing, keep branch-0.22 and branch-0.23 as-is.
>>>>>
>>>>> Let's do an STV [1] to reach consensus.
>>>>>
>>>>> Please vote by listing the options above in order of your preferences.
>>>>
>>>> Not sure whether this vote is open to all community members, committers or 
>>>> PMC
>>>> (Arun, could you, please, clarify?) but here's my vote:
>>>
>>> As always, everyone is welcome to vote. PMC votes are binding.
>>>
>>> Forgot to add, the vote will run the normal 7 days.
>>>
>>> Thanks for voting Roman.
>>>
>>> Arun
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Rename hadoop branches post hadoop-1.x

2012-03-19 Thread Todd Lipcon
My vote remains the same: (binding)
(3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
(2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a hole.
(1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
(4) If security is fixed in branch-0.22 within a short time-frame i.e.
2 months then we get option 1, else we get option 3. Effectively
postpone discussion by 2 months, start a timer now.
(5) Do nothing, keep branch-0.22 and branch-0.23 as-is.


On Mon, Mar 19, 2012 at 6:06 PM, Arun C Murthy  wrote:
> We've discussed several options:
>
> (1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
> (2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a hole.
> (3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
> (4) If security is fixed in branch-0.22 within a short time-frame i.e. 2 
> months then we get option 1, else we get option 3. Effectively postpone 
> discussion by 2 months, start a timer now.
> (5) Do nothing, keep branch-0.22 and branch-0.23 as-is.
>
> Let's do an STV [1] to reach consensus.
>
> Please vote by listing the options above in order of your preferences.
>
> My vote is 3, 4, 2, 1, 5 in order (binding).
>
> The vote will run the normal 7 days.
>
> thanks,
> Arun
>
> [1] http://en.wikipedia.org/wiki/Single_transferable_vote
>
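
For readers unfamiliar with STV: in the single-winner case it reduces to
instant-runoff voting, which is easy to sketch. The tally below is purely
illustrative (the ballots are made up) and is not how the actual vote was
counted:

    import java.util.*;

    public class IrvTallySketch {
      // Each ballot lists option numbers, most-preferred first.
      static String winner(List<List<String>> ballots) {
        Set<String> alive = new HashSet<>();
        ballots.forEach(alive::addAll);
        while (true) {
          Map<String, Integer> counts = new HashMap<>();
          alive.forEach(o -> counts.put(o, 0));
          for (List<String> b : ballots) {
            // Credit each ballot to its highest-ranked surviving option.
            b.stream().filter(alive::contains).findFirst()
                .ifPresent(o -> counts.merge(o, 1, Integer::sum));
          }
          Map.Entry<String, Integer> top =
              Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
          if (top.getValue() * 2 > ballots.size()) return top.getKey();
          // No majority: eliminate the option with the fewest votes.
          alive.remove(Collections.min(counts.entrySet(),
              Map.Entry.comparingByValue()).getKey());
        }
      }

      public static void main(String[] args) {
        List<List<String>> ballots = Arrays.asList(
            Arrays.asList("3", "2", "1", "4", "5"),
            Arrays.asList("2", "1", "3", "4", "5"),
            Arrays.asList("3", "4", "2", "1", "5"));
        System.out.println(winner(ballots)); // "3"
      }
    }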



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Naming of Hadoop releases

2012-03-20 Thread Todd Lipcon
On Mon, Mar 19, 2012 at 11:02 PM, Konstantin Shvachko
 wrote:
> Feature freeze has been broken so many times for the .20 branch that
> it became the norm for the entire project rather than the exception it
> was in the past.

I agree we should be stricter about what feature backports we allow
into "stable" branches. Security and hflush were both necessary evils
- I'm glad now that we have them, but we should try to stay out of
these types of situations in the future where we feel compelled to
backport (or re-do in the case of hflush/sync) such large items.

>
> I don't understand this constant discrimination against Hadoop .22. It is
> a perfectly usable version of Hadoop. It would be a waste not to have it
> released. Very glad that universities adopted it. If somebody needs
> security there are a number of choices, Hadoop-1 being the first. But
> if you cannot afford stand-alone HBase clusters or need to combine
> general Hadoop and HBase loads there is nothing else but Hadoop 0.22
> at this point.

I don't see what HBase has to do with it. In fact HBase runs way
better on 1.x compared to 0.22. The tests don't even pass on 0.22 due
to differences in the append semantics in 0.21+ compared to 0.20.
Every production HBase deploy I know about runs on a 1.x-based
distribution. You could argue this is selection bias by nature of my
employer, but the same is true based on emails to the hbase-user
lists, etc. This is orthogonal to the discussion at hand, I just
wanted to correct this lest any users get the wrong perception and
migrate their HBase clusters to a version which is rarely used and
strictly inferior for this use case.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-0.23.2-rc0

2012-04-19 Thread Todd Lipcon
On Thu, Apr 19, 2012 at 12:26 PM, Eli Collins  wrote:

> On Thu, Apr 19, 2012 at 11:45 AM, Arun C Murthy 
> wrote:
> > Yep, makes sense - I'll roll an rc0 for 2.0 after.
> >
> > However, we should consider whether HDFS protocols are 'ready' for us to
> commit to them for the foreseeable future, my sense is that it's a tad
> early - particularly with auto-failover not complete.
> >
> > Thus, we have a couple of options:
> > a) Call the first release here as *2.0.0-alpha* version (lots of ASF
> projects do this).
> > b) Just go with 2.0.0 and deem 2.0.x or 2.1.x as the first stable
> release and fwd-compatible release later.
> >
> > Given this is a major release (unlike something obscure like
> hadoop-0.23.0) I'm inclined to go with a) i.e. hadoop-2.0.0-alpha.
> >
> > Thoughts?
> >
>
> Agree that we're a little too early on the HDFS protocol side, think
> MR2 is probably in a similar boat wrt stability as well.
>
> +1 to option a, calling it hadoop-2.0.0-alpha seems most appropriate.
>

Regarding protocols:
+1 to _not_ locking down "cluster-internal" wire compatibility at this
point, i.e. we can break DN<->NN, or NN<->SBN, or Admin command -> NN
compatibility still.
+1 to locking down client wire compatibility with the release of 2.0. After
2.0 is released I would like to see all 2.0.x clients continue to be
compatible. Now that we are protobuf-ified, I think this is doable.
Should we open a separate discussion thread for the above?
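
As a toy demonstration of why protobuf makes this tractable, the snippet
below uses protobuf-java's UnknownFieldSet directly (these are not actual
Hadoop RPC message types) to show that a parser simply retains fields it
doesn't recognize instead of failing:

    import com.google.protobuf.ByteString;
    import com.google.protobuf.UnknownFieldSet;

    public class WireCompatSketch {
      public static void main(String[] args) throws Exception {
        // Simulate a "newer server" payload: field 1 is known to old
        // clients, field 99 was added in a later release.
        UnknownFieldSet newer = UnknownFieldSet.newBuilder()
            .addField(1, UnknownFieldSet.Field.newBuilder()
                .addVarint(42).build())
            .addField(99, UnknownFieldSet.Field.newBuilder()
                .addLengthDelimited(ByteString.copyFromUtf8("added-later"))
                .build())
            .build();
        byte[] wire = newer.toByteArray();

        // An "old" parser still accepts the payload; the unrecognized
        // field 99 is kept around rather than breaking the parse.
        UnknownFieldSet parsed = UnknownFieldSet.parseFrom(wire);
        System.out.println(parsed.hasField(99)); // true, no exception
      }
    }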

Regarding version numbering: either of the proposals seems fine by me.

-Todd

 > Arun
> >
> > On Apr 19, 2012, at 12:24 AM, Eli Collins wrote:
> >
> >> Hey Arun,
> >>
> >> This vote passed a week or so ago, let's make it official?
> >>
> >> Also, are you still planning to roll a hadoop-2.0.0-rc0 of branch-2
> >> this week?  I think we should do that soon, if you're not planning to
> >> do this holler and I'd be happy to.  There's only 1 blocker left
> >> (http://bit.ly/I55LAd) and it's patch available, I think we should
> >> roll an rc from branch-2 when it's merged.
> >>
> >> Thanks,
> >> Eli
> >>
> >> On Thu, Mar 29, 2012 at 4:07 PM, Arun C Murthy 
> wrote:
> >>> 0.23.2 is just  a small set of bug-fixes on top of 0.23.1 and doesn't
> have NN HA etc.
> >>>
> >>> As I've noted separately, I plan to put out a hadoop-2.0.0-rc0 in a
> couple weeks with NN HA, PB for HDFS etc.
> >>>
> >>> thanks,
> >>> Arun
> >>>
> >>> On Mar 29, 2012, at 3:55 PM, Ted Yu wrote:
> >>>
> >>>> What are the issues fixed / features added in 0.23.2 compared to
> 0.23.1 ?
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Thu, Mar 29, 2012 at 3:45 PM, Arun C Murthy 
> wrote:
> >>>>
> >>>>> I've created a release candidate for hadoop-0.23.2 that I would like
> to
> >>>>> release.
> >>>>>
> >>>>> It is available at:
> http://people.apache.org/~acmurthy/hadoop-0.23.2-rc0/
> >>>>>
> >>>>> The maven artifacts are available via repository.apache.org.
> >>>>>
> >>>>> Please try the release and vote; the vote will run for the usual 7
> days.
> >>>>>
> >>>>> thanks,
> >>>>> Arun
> >>>>>
> >>>>> --
> >>>>> Arun C. Murthy
> >>>>> Hortonworks Inc.
> >>>>> http://hortonworks.com/
> >>>>>
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> Arun C. Murthy
> >>> Hortonworks Inc.
> >>> http://hortonworks.com/
> >>>
> >>>
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-0.23.2-rc0

2012-04-20 Thread Todd Lipcon
On Fri, Apr 20, 2012 at 7:28 AM, Daryn Sharp  wrote:

> I believe it's premature to release a non-alpha. branch-2.0 does not
> contain the full working implementation of host-based tokens that was
> introduced in 1.x (yes, it was done out of order...).  This is a very
> important feature that prevents tokens from being invalidated when a host's
> IP changes.  The token implementation in 2.0 requires daemons to be
> restarted and/or jobs resubmitted when the IP of an NN is changed (e.g. an
> upgrade).  Host-based tokens prevent the need to restart all clusters that
> access a remote cluster that is upgraded.
>

I think blocking the release of a non-alpha on this feature is a bit nutty.
Not to undermine the work, but it only affects users who run security and
want to be able to move a NN/JT from one IP to another without killing
currently running jobs. Only a small fraction of the user base enables
security at all, and an even smaller fraction regularly wants to migrate a
master to a new IP mid-job.


>
> I've been actively working on the yarn side and am close to completion.
>  Until that's complete, I feel we should consider 2.x an alpha so that a
> major feature isn't omitted.
>

I'd call it a minor feature.

-Todd


>
> On Apr 19, 2012, at 1:45 PM, Arun C Murthy wrote:
>
> > Yep, makes sense - I'll roll an rc0 for 2.0 after.
> >
> > However, we should consider whether HDFS protocols are 'ready' for us to
> commit to them for the foreseeable future, my sense is that it's a tad
> early - particularly with auto-failover not complete.
> >
> > Thus, we have a couple of options:
> > a) Call the first release here as *2.0.0-alpha* version (lots of ASF
> projects do this).
> > b) Just go with 2.0.0 and deem 2.0.x or 2.1.x as the first stable
> release and fwd-compatible release later.
> >
> > Given this is a major release (unlike something obscure like
> hadoop-0.23.0) I'm inclined to go with a) i.e. hadoop-2.0.0-alpha.
> >
> > Thoughts?
> >
> > Arun
> >
> > On Apr 19, 2012, at 12:24 AM, Eli Collins wrote:
> >
> >> Hey Arun,
> >>
> >> This vote passed a week or so ago, let's make it official?
> >>
> >> Also, are you still planning to roll a hadoop-2.0.0-rc0 of branch-2
> >> this week?  I think we should do that soon, if you're not planning to
> >> do this holler and I'd be happy to.  There's only 1 blocker left
> >> (http://bit.ly/I55LAd) and it's patch available, I think we should
> >> roll an rc from branch-2 when it's merged.
> >>
> >> Thanks,
> >> Eli
> >>
> >> On Thu, Mar 29, 2012 at 4:07 PM, Arun C Murthy 
> wrote:
> >>> 0.23.2 is just  a small set of bug-fixes on top of 0.23.1 and doesn't
> have NN HA etc.
> >>>
> >>> As I've noted separately, I plan to put out a hadoop-2.0.0-rc0 in a
> couple weeks with NN HA, PB for HDFS etc.
> >>>
> >>> thanks,
> >>> Arun
> >>>
> >>> On Mar 29, 2012, at 3:55 PM, Ted Yu wrote:
> >>>
> >>>> What are the issues fixed / features added in 0.23.2 compared to
> 0.23.1 ?
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Thu, Mar 29, 2012 at 3:45 PM, Arun C Murthy 
> wrote:
> >>>>
> >>>>> I've created a release candidate for hadoop-0.23.2 that I would like
> to
> >>>>> release.
> >>>>>
> >>>>> It is available at:
> http://people.apache.org/~acmurthy/hadoop-0.23.2-rc0/
> >>>>>
> >>>>> The maven artifacts are available via repository.apache.org.
> >>>>>
> >>>>> Please try the release and vote; the vote will run for the usual 7
> days.
> >>>>>
> >>>>> thanks,
> >>>>> Arun
> >>>>>
> >>>>> --
> >>>>> Arun C. Murthy
> >>>>> Hortonworks Inc.
> >>>>> http://hortonworks.com/
> >>>>>
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> Arun C. Murthy
> >>> Hortonworks Inc.
> >>> http://hortonworks.com/
> >>>
> >>>
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-2.0.0-alpha

2012-05-09 Thread Todd Lipcon
Hi Andrew,

Have you seen the new MiniMRClientCluster class? It's meant to be what
you describe - a minicluster which only exposes "external" APIs --
most importantly a way of getting at a JobClient to submit jobs. We
have it implemented in both 1.x and 2.x at this point, though I don't
recall if it's in the 1.0.x releases or if it's only slated for 1.1+

-Todd

On Wed, May 9, 2012 at 6:05 PM, Andrew Purtell  wrote:
> Hi Suresh,
>
> The unstable designation makes sense.  As would one for MiniMRCluster.
>
> I was over the top initially out of surprise. I'm sure the MR minicluster seems a 
> minor detail.
>
> Maybe it's worth thinking about the miniclusters differently? Please pardon 
> if I am rehashing an old discussion.
>
> Things like MRUnit for applications and BigTop for full cluster tests can
> help, but, as mentioned in the annotation below, Pig, Hive, HBase, and
> other parts of the stack use miniclusters for local end-to-end testing in
> unit tests. As the complexity of the stack increases and we consider
> cross-version support, unit tests on miniclusters will, I think, have no
> substitute.
>
> As Hadoop 2 has been evolving there has been some difficulty keeping up with 
> minicluster changes. This makes sense. The attention to the stability of
> client APIs and such, and the lack thereof for the minicluster, is I think
> self-evident. But the need to fix up tests unpredictably introduces some
> friction that perhaps need not be there.
>
> Would a JIRA to discuss defining a subset of the minicluster interfaces as 
> more stable be worthwhile?
>
> Best regards,
>
>    - Andy
>
>
> On May 9, 2012, at 1:45 PM, Suresh Srinivas  wrote:
>
>> For this reason, in HDFS, we change MiniDFSCluster to LimitedPrivate and
>> not treat it as such:
>>
>> @InterfaceAudience.LimitedPrivate({"HBase", "HDFS", "Hive", "MapReduce",
>> "Pig"})
>> @InterfaceStability.Unstable
>> public class MiniDFSCluster { ...}
>>
>> On Wed, May 9, 2012 at 11:33 AM, Andrew Purtell  wrote:
>>
>>> Sounds good Arun.
>>>
>>> How should we consider the suitability and stability of MiniMRCluster
>>> for downstream projects?
>>>
>>> On Wed, May 9, 2012 at 11:30 AM, Arun C Murthy 
>>> wrote:
>>>> No worries Andy. I can spin an rc1 once we can pin-point the bug.
>>>>
>>>> thanks,
>>>> Arun
>>>>
>>>> On May 9, 2012, at 10:17 AM, Andrew Purtell wrote:
>>>>
>>>>> -1 (nonbinding), we are currently facing a minicluster semantic change
>>>>> of some kind, or more than one:
>>>>>
>>>>>   https://issues.apache.org/jira/browse/HBASE-5966
>>>>>
>>>>> There are other HBase JIRAs related to 2.0.0-alpha that we are working
>>>>> on, but I'd claim those are all our fault for breaking abstractions to
>>>>> solve issues. In one case there's a new helpful 2.x API
>>>>> (ShutdownHookManager, thank you!) that we can eventually move to.
>>>>>
>>>>> However, the minicluster changes are causing us some repeated
>>>>> discomfort. It will break, we'll get some help fixing up our tests for
>>>>> that, then some time later it will break again, repeat. Perhaps we
>>>>> have no right to complain, the minicluster isn't meant to be used by
>>>>> downstream projects. If so then please disregard the complaint, but
>>>>> your assistance in helping to fix the breakage again would be much
>>>>> appreciated. And, if so, perhaps we can discuss what makes sense in
>>>>> terms of a stable minicluster consumable for downstream projects?
>>>>>
>>>>> Best regards,
>>>>>
>>>>>   - Andy
>>>>>
>>>>> On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy 
>>> wrote:
>>>>>> I've created a release candidate for hadoop-2.0.0-alpha that I would
>>> like to release.
>>>>>>
>>>>>> It is available at:
>>> http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/
>>>>>>
>>>>>> The maven artifacts are available via repository.apache.org.
>>>>>>
>>>>>> Please try the release and vote; the vote will run for the usual 7
>>> days.
>>>>>>
>>>>>> This is a big milestone for the Apache Hadoop community -
>>> congratulations and thanks for all the contributions!
>>>>>>
>>>>>> thanks,
>>>>>> Arun
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>>   - Andy
>>>>>
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>> Hein (via Tom White)
>>>>
>>>> --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>>   - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein (via Tom White)
>>>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: MiniMRCluster usage in dependent projects

2012-05-10 Thread Todd Lipcon
[changing thread name to not hijack the vote thread]

On Thu, May 10, 2012 at 11:23 AM, Andrew Purtell  wrote:
> Hi Todd,
>
>> Have you seen the new MiniMRClientCluster class? It's meant to be what
>> you describe - a minicluster which only exposes "external" APIs --
>> most importantly a way of getting at a JobClient to submit jobs. We
>> have it implemented in both 1.x and 2.x at this point, though I don't
>> recall if it's in the 1.0.x releases or if it's only slated for 1.1+
>
> Do you mean the below?
>
>    /*
>     * A simple interface for a client MR cluster used for testing.
> This interface
>     * provides basic methods which are independent of the underlying
> Mini Cluster (
>     * either through MR1 or MR2).
>     */
>    public interface MiniMRClientCluster {
>      public void start() throws IOException;
>      public void stop() throws IOException;
>      public Configuration getConfig() throws IOException;
>    }
>
> This doesn't sufficiently encapsulate the mini MR cluster for the
> purposes of a test rig. The issues we've seen are variations in what
> configuration variables are required: their names, and their
> semantics, for finding information about how the cluster is set up.
> Let's take one basic case, how does one find the address of the job
> tracker in a version agnostic way? For example, perhaps:
>
>    public InetSocketAddress getJobTrackerAddress();

The issue is that MR2 doesn't have a JobTracker address. Neither does
it have TaskTrackers. So there is no real way to expose this.

I don't see any reason that HBase should need to get these things --
so long as it can get a Configuration, it should be able to submit
jobs.
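
For reference, a rough sketch of that pattern using the
MiniMRClientClusterFactory that ships with the Hadoop test artifacts;
treat the exact signatures as approximate, and note the job setup is
elided:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.MiniMRClientCluster;
    import org.apache.hadoop.mapred.MiniMRClientClusterFactory;
    import org.apache.hadoop.mapreduce.Job;

    public class MiniClusterSketch {
      public static void main(String[] args) throws Exception {
        // Spin up a 2-node mini MR cluster (MR1 or MR2 under the hood).
        MiniMRClientCluster cluster = MiniMRClientClusterFactory.create(
            MiniClusterSketch.class, 2, new Configuration());
        try {
          // The returned Configuration is all a client needs to submit
          // work; no JobTracker/TaskTracker addresses are exposed.
          Job job = Job.getInstance(cluster.getConfig(), "smoke-test");
          // ... set mapper/reducer/input/output paths here ...
          job.waitForCompletion(true);
        } finally {
          cluster.stop();
        }
      }
    }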

>
> or at a higher level of abstraction:
>
>    public JobTrackerInfo getJobTracker();
>
>    public TaskTrackerInfo[] getTaskTrackers();
>
> and, since this is a test rig, we'd like to terminate, perhaps abruptly,
> a task tracker, or launch replacements, or launch new ones.
>
>    public boolean stopTaskTracker(TaskTrackerInfo tracker, boolean force);
>
>    public TaskTrackerInfo startTaskTracker(... /* some universal
> public parameters TBD */);

The above should only be useful for system-testing MR itself. But for
dependent projects (eg HBase/Hive/etc) what's the use case?

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Heads up: merge HDFS-3042 branch next week?

2012-05-12 Thread Todd Lipcon
Hi all,

I'd like to merge the HDFS-3042 (auto-failover) branch back to trunk
next week. There are a few improvements still up on JIRA against the
branch, but it's mostly done and we have been QAing it for over a
month, including some testing at customer sites.

Over the next day or two, I'm going to make sure all the test cases
pass and there are no new findbugs warnings introduced by the branch.
Meanwhile, I wanted to give the heads up that I plan to call a vote
really soon.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-2.0.0-alpha

2012-05-12 Thread Todd Lipcon
Looking at the release tag vs the current state of branch-2, I have
two concerns from the point of view of HDFS:

1) We reverted HDFS-3157 in branch-2 because it sends deletions for
corrupt replicas without properly going through the "corrupt block"
path. We saw this cause data loss in TestPipelinesFailover. So, I'm
nervous about putting it in a release, even labeled as alpha.

2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
envelope in branch-2, but didn't make it into this rc. So, that would
mean that future alphas would not be protocol-compatible with this
alpha. Per a discussion a few weeks ago, I think we all were in
agreement that, if possible, we'd like all 2.x to be compatible for
client-server communication, at least (even if we don't support
cross-version for the intra-cluster protocols)

Do other folks think it's worth rolling an rc1? I would propose either:
a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on
branch-2.0.0-alpha, so these are the only changes since rc0. Roll a
new rc1 from here.
or:
b) Discard the current branch-2.0.0-alpha and re-branch from the
current state of branch-2.

-Todd

On Fri, May 11, 2012 at 7:19 PM, Eli Collins  wrote:
> +1  I installed the build on a 6 node cluster and kicked the tires,
> didn't find any blocking issues.
>
> Btw, in the future it's better to build from the svn repo so the revision is
> an svn rev from the release branch, e.g. 1336254 instead of 40e90d3c7,
> which is from the git mirror; this way we're consistent across
> releases.
>
> hadoop-2.0.0-alpha $ ./bin/hadoop version
> Hadoop 2.0.0-alpha
> Subversion 
> git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common
> -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55
> Compiled by hortonmu on Wed May  9 16:19:55 UTC 2012
> From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888
>
>
> On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy  wrote:
>> I've created a release candidate for hadoop-2.0.0-alpha that I would like to 
>> release.
>>
>> It is available at: 
>> http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>> This is a big milestone for the Apache Hadoop community - congratulations 
>> and thanks for all the contributions!
>>
>> thanks,
>> Arun
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-2.0.0-alpha

2012-05-14 Thread Todd Lipcon
Hey Arun,

One more thing on the rc tarball: the source artifact doesn't appear
to be an exact svn export, based on a diff. For example, it includes
the README, NOTICE, and LICENSE files, as well as a few other things
which appear to be build artifacts (eg
hadoop-hdfs-project/hadoop-hdfs/downloads,
hadoop-hdfs-project/hadoop-hdfs/test_edit_log, etc).

It seems like we _should_ have the various README style files, but we
shouldn't have the test artifacts in our source release.

In order to get our source release to match svn, perhaps we should
move NOTICE, README, LICENSE, etc to the top level of our svn repo,
such that a pure svn export would be a releaseable source artifact?

-Todd



On Mon, May 14, 2012 at 2:14 PM, Siddharth Seth
 wrote:
> Do we want to get MAPREDUCE-4067 in as well ? It affects folks who may be
> writing their own AMs. Shouldn't affect MR clients though. I believe 2.0
> alpha doesn't freeze the Yarn protocols for the 2.0 branch, so probably not
> critical.
>
> Thanks
> - Sid
>
> On Mon, May 14, 2012 at 1:32 PM, Eli Collins  wrote:
>
>> As soon as jira is back up and I can post an updated patch I'll merge
>> HDFS-3418 (also incompatible).
>>
>>
>> On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze  wrote:
>> > I have just merged HADOOP-8285 and HADOOP-8366.  I have also merged
>> HDFS-3211 since it is an incompatible protocol change (without it,
>> 2.0.0-alpha and 2.0.0 will be incompatible.)
>> >
>> > Tsz-Wo
>> >
>> >
>> >
>> > - Original Message -
>> > From: Tsz Wo Sze 
>> > To: "general@hadoop.apache.org" 
>> > Cc:
>> > Sent: Monday, May 14, 2012 11:07 AM
>> > Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
>> >
>> > Let me merge HADOOP-8285 and HADOOP-8366.  Thanks.
>> > Tsz-Wo
>> >
>> >
>> >
>> > - Original Message -
>> > From: Uma Maheswara Rao G 
>> > To: "general@hadoop.apache.org" 
>> > Cc:
>> > Sent: Monday, May 14, 2012 10:56 AM
>> > Subject: RE: [VOTE] Release hadoop-2.0.0-alpha
>> >
>> >> a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on
>> >> branch-2.0.0-alpha, so these are the only changes since rc0. Roll a
>> >> new rc1 from here.
>> > I have merged HDFS-3157 revert.
>> > Do you mind taking a look at HADOOP-8285 and HADOOP-8366?
>> >
>> > Thanks,
>> > Uma
>> > 
>> > From: Arun C Murthy [a...@hortonworks.com]
>> > Sent: Monday, May 14, 2012 10:24 PM
>> > To: general@hadoop.apache.org
>> > Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
>> >
>> > Todd,
>> >
>> > Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll
>> RC1.
>> >
>> > thanks,
>> > Arun
>> >
>> > On May 12, 2012, at 10:05 PM, Todd Lipcon wrote:
>> >
>> >> Looking at the release tag vs the current state of branch-2, I have
>> >> two concerns from the point of view of HDFS:
>> >>
>> >> 1) We reverted HDFS-3157 in branch-2 because it sends deletions for
>> >> corrupt replicas without properly going through the "corrupt block"
>> >> path. We saw this cause data loss in TestPipelinesFailover. So, I'm
>> >> nervous about putting it in a release, even labeled as alpha.
>> >>
>> >> 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
>> >> envelope in branch-2, but didn't make it into this rc. So, that would
>> >> mean that future alphas would not be protocol-compatible with this
>> >> alpha. Per a discussion a few weeks ago, I think we all were in
>> >> agreement that, if possible, we'd like all 2.x to be compatible for
>> >> client-server communication, at least (even if we don't support
>> >> cross-version for the intra-cluster protocols)
>> >>
>> >> Do other folks think it's worth rolling an rc1? I would propose either:
>> >> a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on
>> >> branch-2.0.0-alpha, so these are the only changes since rc0. Roll a
>> >> new rc1 from here.
>> >> or:
>> >> b) Discard the current branch-2.0.0-alpha and re-branch from the
>> >> current state of branch-2.
>> >>
>> >> -Todd
>> >>
>> >> On Fri, May 11, 2012 at 7:19 PM, Eli Collins  wrote:
>> >>> +1  I ins

Re: [VOTE] Release hadoop-2.0.0-alpha

2012-05-15 Thread Todd Lipcon
Hi Kumar,

It looks like that patch was only committed to trunk, not branch-2.

IMO we should keep the new changes for 2.0.0-alpha to a minimum (just
things that impact client-server wire compatibility) and then plan a
2.0.1-alpha ASAP following this release, where we can pull in everything
else that went into branch-2 in the last couple weeks since the 2.0.0-alpha
branch was cut.

Arun: do you have time today to roll a new RC? If not, I am happy to do so.

Does that sound reasonable?
-Todd

On Tue, May 15, 2012 at 8:51 AM, Kumar Ravi  wrote:

> Hi,
>
>  Can HDFS-3265 be included too?
> It seems like this was marked for inclusion but I can't seem to find the
> patch in the branch-2.0.0-alpha tree.
>
> Thanks,
> Kumar
>
> Kumar Ravi
>
>
> From: Todd Lipcon
> To: general@hadoop.apache.org
> Date: 05/14/2012 11:21 PM
> Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
>
>
> Hey Arun,
>
> One more thing on the rc tarball: the source artifact doesn't appear
> to be an exact svn export, based on a diff. For example, it includes
> the README, NOTICE, and LICENSE files, as well as a few other things
> which appear to be build artifacts (eg
> hadoop-hdfs-project/hadoop-hdfs/downloads,
> hadoop-hdfs-project/hadoop-hdfs/test_edit_log, etc).
>
> It seems like we _should_ have the various README style files, but we
> shouldn't have the test artifacts in our source release.
>
> In order to get our source release to match svn, perhaps we should
> move NOTICE, README, LICENSE, etc to the top level of our svn repo,
> such that a pure svn export would be a releaseable source artifact?
>
> -Todd
>
>
>
> On Mon, May 14, 2012 at 2:14 PM, Siddharth Seth
>  wrote:
> > Do we want to get MAPREDUCE-4067 in as well ? It affects folks who may be
> > writing their own AMs. Shouldn't affect MR clients though. I believe 2.0
> > alpha doesn't freeze the Yarn protocols for the 2.0 branch, so probably
> not
> > critical.
> >
> > Thanks
> > - Sid
> >
> > On Mon, May 14, 2012 at 1:32 PM, Eli Collins  wrote:
> >
> >> As soon as jira is back up and I can post an updated patch I'll merge
> >> HDFS-3418 (also incompatible).
> >>
> >>
> >> On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze 
> wrote:
> >> > I have just merged HADOOP-8285 and HADOOP-8366.  I have also merged
> >> HDFS-3211 since it is an incompatible protocol change (without it,
> >> 2.0.0-alpha and 2.0.0 will be incompatible.)
> >> >
> >> > Tsz-Wo
> >> >
> >> >
> >> >
> >> > - Original Message -
> >> > From: Tsz Wo Sze 
> >> > To: "general@hadoop.apache.org" 
> >> > Cc:
> >> > Sent: Monday, May 14, 2012 11:07 AM
> >> > Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
> >> >
> >> > Let me merge HADOOP-8285 and HADOOP-8366.  Thanks.
> >> > Tsz-Wo
> >> >
> >> >
> >> >
> >> > - Original Message -
> >> > From: Uma Maheswara Rao G 
> >> > To: "general@hadoop.apache.org" 
> >> > Cc:
> >> > Sent: Monday, May 14, 2012 10:56 AM
> >> > Subject: RE: [VOTE] Release hadoop-2.0.0-alpha
> >> >
> >> >> a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on
> >> >> branch-2.0.0-alpha, so these are the only changes since rc0. Roll a
> >> >> new rc1 from here.
> >> > I have merged HDFS-3157 revert.
> >> > Do you mind taking a look at HADOOP-8285 and HADOOP-8366?
> >> >
> >> > Thanks,
> >> > Uma
> >> > 
> >> > From: Arun C Murthy [a...@hortonworks.com]
> >> > Sent: Monday, May 14, 2012 10:24 PM
> >> > To: general@hadoop.apache.org
> >> > Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
> >> >
> >> > Todd,
> >> >
> >> > Please go ahead and merge changes into branch-2.0.0-alpha and I'll
> roll
> >> RC1.
> >> >
> >> > thanks,
> >> > Arun
> >> >
> >> > On May 12, 2012, at 10:05 PM, Todd Lipcon wrote:
> >> >
> >> >> Looking at the release tag vs the current state of branch-2, I hav

Re: [VOTE] Release hadoop-2.0.0-alpha

2012-05-15 Thread Todd Lipcon
On Tue, May 15, 2012 at 11:10 AM, Arun C Murthy  wrote:
> Any more HDFS related merges before I roll RC1?

I'm good as is. Thanks!

>
> On May 15, 2012, at 10:05 AM, Arun C Murthy wrote:
>
>> Eli, is this done so I can roll rc1?
>>
>> On May 14, 2012, at 1:32 PM, Eli Collins wrote:
>>
>>> As soon as jira is back up and I can post an updated patch I'll merge
>>> HDFS-3418 (also incompatible).
>>>
>>>
>>> On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze  wrote:
>>>> I have just merged HADOOP-8285 and HADOOP-8366.  I have also merged 
>>>> HDFS-3211 since it is an incompatible protocol change (without it, 
>>>> 2.0.0-alpha and 2.0.0 will be incompatible.)
>>>>
>>>> Tsz-Wo
>>>>
>>>>
>>>>
>>>> - Original Message -
>>>> From: Tsz Wo Sze 
>>>> To: "general@hadoop.apache.org" 
>>>> Cc:
>>>> Sent: Monday, May 14, 2012 11:07 AM
>>>> Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
>>>>
>>>> Let me merge HADOOP-8285 and HADOOP-8366.  Thanks.
>>>> Tsz-Wo
>>>>
>>>>
>>>>
>>>> - Original Message -
>>>> From: Uma Maheswara Rao G 
>>>> To: "general@hadoop.apache.org" 
>>>> Cc:
>>>> Sent: Monday, May 14, 2012 10:56 AM
>>>> Subject: RE: [VOTE] Release hadoop-2.0.0-alpha
>>>>
>>>>> a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on
>>>>> branch-2.0.0-alpha, so these are the only changes since rc0. Roll a
>>>>> new rc1 from here.
>>>> I have merged HDFS-3157 revert.
>>>> Do you mind taking a look at HADOOP-8285 and HADOOP-8366?
>>>>
>>>> Thanks,
>>>> Uma
>>>> 
>>>> From: Arun C Murthy [a...@hortonworks.com]
>>>> Sent: Monday, May 14, 2012 10:24 PM
>>>> To: general@hadoop.apache.org
>>>> Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
>>>>
>>>> Todd,
>>>>
>>>> Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll 
>>>> RC1.
>>>>
>>>> thanks,
>>>> Arun
>>>>
>>>> On May 12, 2012, at 10:05 PM, Todd Lipcon wrote:
>>>>
>>>>> Looking at the release tag vs the current state of branch-2, I have
>>>>> two concerns from the point of view of HDFS:
>>>>>
>>>>> 1) We reverted HDFS-3157 in branch-2 because it sends deletions for
>>>>> corrupt replicas without properly going through the "corrupt block"
>>>>> path. We saw this cause data loss in TestPipelinesFailover. So, I'm
>>>>> nervous about putting it in a release, even labeled as alpha.
>>>>>
>>>>> 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
>>>>> envelope in branch-2, but didn't make it into this rc. So, that would
>>>>> mean that future alphas would not be protocol-compatible with this
>>>>> alpha. Per a discussion a few weeks ago, I think we all were in
>>>>> agreement that, if possible, we'd like all 2.x to be compatible for
>>>>> client-server communication, at least (even if we don't support
>>>>> cross-version for the intra-cluster protocols)
>>>>>
>>>>> Do other folks think it's worth rolling an rc1? I would propose either:
>>>>> a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on
>>>>> branch-2.0.0-alpha, so these are the only changes since rc0. Roll a
>>>>> new rc1 from here.
>>>>> or:
>>>>> b) Discard the current branch-2.0.0-alpha and re-branch from the
>>>>> current state of branch-2.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Fri, May 11, 2012 at 7:19 PM, Eli Collins  wrote:
>>>>>> +1  I installed the build on a 6 node cluster and kicked the tires,
>>>>>> didn't find any blocking issues.
>>>>>>
>>>>>> Btw, in the future it's better to build from the svn repo so the revision is
>>>>>> an svn rev from the release branch, e.g. 1336254 instead of 40e90d3c7,
>>>>>> which is from the git mirror; this way we're consistent across
>>>>>> releases.
>>>>>>
>>>>>

Re: [VOTE] Release hadoop-2.0.0-alpha-rc1

2012-05-15 Thread Todd Lipcon
Thanks for posting the new RC. Will take a look tomorrow. Meanwhile,
I'm going through CHANGES.txt and JIRA and moving things that didn't
make the 2.0.0 cut to 2.0.1.

So, if folks commit things tomorrow, please check to put it in the
right spot in CHANGES.txt and in JIRA. I'll take care of anything
committed tonight that would conflict with my change.

-Todd

On Tue, May 15, 2012 at 7:20 PM, Arun C Murthy  wrote:
> I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would 
> like to release.
>
> It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-2.0.0-alpha-rc1

2012-05-15 Thread Todd Lipcon
OK, the fixes to CHANGES.txt and JIRA are complete. Sorry for the mail bomb ;-)

-Todd

On Tue, May 15, 2012 at 10:30 PM, Todd Lipcon  wrote:
> Thanks for posting the new RC. Will take a look tomorrow. Meanwhile,
> I'm going through CHANGES.txt and JIRA and moving things that didn't
> make the 2.0.0 cut to 2.0.1.
>
> So, if folks commit things tomorrow, please check to put it in the
> right spot in CHANGES.txt and in JIRA. I'll take care of anything
> committed tonight that would conflict with my change.
>
> -Todd
>
> On Tue, May 15, 2012 at 7:20 PM, Arun C Murthy  wrote:
>> I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would 
>> like to release.
>>
>> It is available at: 
>> http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>> thanks,
>> Arun
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Release hadoop-2.0.0-alpha-rc1

2012-05-16 Thread Todd Lipcon
+1. I also tried it in pseudo-distributed mode, ran some uber-jobs, etc.

I verified the contents of the -src tarball matched svn (modulo the
license/readme/text files)

The configs in etc/hadoop could do with some improvement, but didn't
seem like a blocker.

Here is my signature for the src tarball:

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEABECAAYFAk+0K64ACgkQXkPKua7Hfq9gzQCffzIhMOBgC1T4/tziiFzOMYPC
piAAoOYO5aDE06cWP50T07Hkxr7f64aq
=ostO
-END PGP SIGNATURE-

Thanks
-Todd

On Wed, May 16, 2012 at 12:04 AM, Robert Evans  wrote:
> +1 I downloaded the binary, ran a single-node cluster, and kicked the tires a 
> bit with a few MR jobs.  Everything worked.
>
>
> On 5/15/12 9:20 PM, "Arun C Murthy"  wrote:
>
> I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would 
> like to release.
>
> It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Merge *-user@hadoop lists?

2012-07-20 Thread Todd Lipcon
Sure, +1. I already subscribe to all and filter into the same mailbox anyway :)

-Todd

On Fri, Jul 20, 2012 at 11:34 AM, Mahadev Konar  wrote:
> +1 .
>
>
> On Fri, Jul 20, 2012 at 10:48 AM, Jitendra Pandey
>  wrote:
>> +1 for merging.
>>
>> On Thu, Jul 19, 2012 at 11:25 PM, Arun C Murthy  wrote:
>>
>>> I've been thinking that we currently have too many *-user@ lists
>>> (common, hdfs, mapreduce), which confuse folks all the time, resulting in
>>> too many cross-posts etc., particularly from new users. Basically, it's too
>>> unwieldy and tedious.
>>>
>>> How about simplifying things by having a single user@hadoop.apache.org list
>>> by merging all of them?
>>>
>>> Thoughts?
>>>
>>> Arun
>>
>>
>>
>>
>> --
>> <http://hortonworks.com/download/>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-29 Thread Todd Lipcon
Have we not learned our lessons from the last attempts to split?

The issues in our community, which I think Chris is referring to, do
not generally revolve around project boundaries. It's not the case
that the HDFS community wants to go one way and the MR/YARN community
wants to go another, and we get into a conflict around it. If it were,
then splitting into separate TLPs would make a ton of sense.

Instead, the issues are usually _within_ a component. So, if we split
into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
contentious as before.

Let's just embrace contention as a fact of life on a high-profile
high-stakes project and get back to work.

I wasted nearly a month undoing the mess of the last attempt, and I
don't see why this time it would go any better. -1 from my perspective
on splitting again at this point. Perhaps if we get to the point where
we're never making cross-project commits it would make sense, but we're
still not there.

-Todd

On Wed, Aug 29, 2012 at 1:40 PM, Alejandro Abdelnur  wrote:
> I volunteer to help cleanup/normalize Maven stuff.
>
> Thx
>
> On Wed, Aug 29, 2012 at 1:34 PM, Tom White  wrote:
>> Eric - I agree with Common being included in HDFS. That's what I meant
>> by Common not having a clear enough mission to be a TLP by itself.
>>
>> Arun - I'm happy to RM some of the upcoming MR releases too. Also to
>> help out with the work on audience annotations and compatibility.
>>
>> Cheers,
>> Tom
>>
>> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy  wrote:
>>> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
>>>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
>>>>>
>>>>> Robert and Alejandro have brought up good questions. Here are my thoughts:
>>>>> - For first one or two releases all the projects can coordinate and do the
>>>>> releases together. This should help simplify the immediate work needed.
>>>>> This should also help in us meeting the release timelines that we are
>>>>> working towards. As the split makes progress, this cross project
>>>>> coordination will no longer be necessary. I volunteer to RM these releases
>>>>> and do the needed co-ordination from HDFS.
>>>>
>>>>
>>>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.
>>>
>>> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.
>>>
>>> I volunteer to RM for MR/YARN releases and work with Suresh.
>>>
>>> Arun
>>>
>
>
>
> --
> Alejandro



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-29 Thread Todd Lipcon
On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)
 wrote:
> Arun, great work below. Concrete, and an actual proposal of PMC lists.
>
> What do folks think?

I already expressed my opinion above on the thread that the whole idea of
splitting is crazy. But I'll comment on some specifics of the
proposal as well:

>>
>> I think the simplest way is to have all existing HDFS committers be 
>> committers and PMC members of the new project. That list is found in the 
>> asf-authorization-template which has:

Why? If we were to do this, why not take the opportunity to narrow
down to the people who are actually active contributors to the
project? (per your reasoning on the YARN thread)

>>
>> hadoop-hdfs = 
>> acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao

Of these, only the following people have actually contributed more
than 5 patches to common and HDFS in the last year:
Hairong Kuang (7):
Vinod Kumar Vavilapalli (7):
Daryn Sharp (8):
Matthew J. Foley (10):
Devaraj Das (11):
Mahadev Konar (15):
Eric Yang (18):
Sanjay Radia (18):
Thomas Graves (18):
Thomas White (21):
Konstantin Shvachko (23):
Steve Loughran (24):
Arun Murthy (32):
Uma Maheswara Rao G (36):
Jitendra Nath Pandey (51):
Harsh J (68):
Robert Joseph Evans (71):
Alejandro Abdelnur (106):
Suresh Srinivas (107):
Aaron Twining Myers (171):
Tsz-wo Sze (184):
Eli Collins (252):
Todd Lipcon (286):

So I would propose:
atm,daryn,ddas,eli,eyang,hairong,harsh,jitendra,mahadev,mattf,shv,sradia,stevel,suresh,szetszwo,todd,tomwhite,tucu,umamahesh

and listing the others as Emeritus, who could easily regain committer
status if they started contributing again.

>>
>>
>> 
>>
>>
>> Proposal: Apache Hadoop MapReduce as a TLP
>>
>> I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.
>>
>> I think the simplest way is to have all existing MR committers be committers 
>> and PMC members of the new project. That list is found in the 
>> asf-authorization-template which has:
>>
>> hadoop-mapreduce = 
>> acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao
>>

Applying the same criteria, the list would be:

Suresh Srinivas (6):
Aaron Twining Myers (7):
Steve Loughran (7):
Ravi  Gummadi (9):
Konstantin Shvachko (11):
Todd Lipcon (12):
Tsz-wo Sze (16):
Amar Kamat (17):
Harsh J (20):
Eli Collins (21):
Thomas White (27):
Siddharth Seth (46):
Thomas Graves (60):
Alejandro Abdelnur (71):
Robert Joseph Evans (107):
Mahadev Konar (118):
Vinod Kumar Vavilapalli (164):
Arun Murthy (209):

(this is based on git shortlog on the directories in the repository)


But I still think this discussion is silly, and we're not ready to do it.

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-29 Thread Todd Lipcon
...between Jenkins
builds, etc. You can say these aren't technical issues, but if you're
not dealing with the project on a technical basis, I don't think
you're well qualified to judge. I certainly appreciate the work you've
done way back in the Nutch days and your continued evangelism, but
this whole thread just seems like it's stirring up trouble and not
going to accomplish anything except a bunch of wasted man-hours. (I've
already wasted about 45 minutes today on it, oops!)

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-29 Thread Todd Lipcon
On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik  wrote:
> I am curious where the arbitrary number 5 is coming from: is it reflected in
> the bylaws?

Nope, I picked it based on Arun's earlier picking of the same number
in the YARN thread. We have no bylaws about what would happen in the
eventual TLP-ification of subcomponents, of course.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-29 Thread Todd Lipcon
> P.S. I appreciate you and am still one of your biggest fans. Just trying to
> help you see the bigger picture here and to wear your Apache hat.

Thanks for that. As for Apache vs Cloudera hat: I think they're well
aligned here. Both hats want the project to be easy for people to
contribute to, and want to avoid a bunch of wasted time spent on new
technical issues that this would create. I want to spend that time
making the product better, for our users benefit. Whether the users
are Apache community users, or Cloudera customers, or Facebook's data
scientists, they all are going to be happier if I spend a month
improving our HA support compared to spending a month figuring out how
to release three separate projects which somehow stitch together in a
reasonable way at runtime without jar conflicts, tons of duplicate
configuration work, byzantine version dependencies, etc.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads up: next hadoop-2 release

2012-08-31 Thread Todd Lipcon
Should we consider re-branching 2.1.0-alpha off the current branch-2
code, then? Given there's been lots of testing on tip of branch-2, and
not much in there that I consider "scary", it might be easier to do
that than to cherry-pick all of the changes individually?

Just a thought.

-Todd

On Fri, Aug 31, 2012 at 9:31 AM, Eli Collins  wrote:
> On Fri, Aug 31, 2012 at 8:48 AM, Arun C Murthy  wrote:
>> Eli,
>>
>>  Good point. Looks like both HDFS & MR committers have been lax in 
>> maintaining branch-2.1.0-alpha. Lots of stabilization has occurred via 
>> branch-2/branch-0.23.
>>
>>  Since branch-0.23 is nearly there, I'll go ahead and cherry-pick to 
>> branch-2.1.0-alpha - do you mind doing the same for HDFS from branch-2 also?
>
> Will do, I probably won't get to this today; would early next week, with a
> plan to spin an RC maybe by the end of the week, work?
>
> Thanks,
> Eli
>
>>
>> thanks,
>> Arun
>>
>> On Aug 30, 2012, at 5:43 PM, Eli Collins wrote:
>>
>>> Hey Arun,
>>>
>>> Are you still planning on releasing this branch?  It's been a couple
>>> of months and branch-2.1.0-alpha wasn't compiling for a while (had the
>>> wrong mvn versions, and YARN-36) so sounds like this branch isn't
>>> actually being stabilized?
>>>
>>> Thanks,
>>> Eli
>>>
>>>
>>> On Mon, Jul 9, 2012 at 1:31 PM, Arun C Murthy  wrote:
>>>> I'll try and shoot for end of month, but it will depend on performance work 
>>>> and how much we'll need to get to parity with hadoop-1.x on branch-2 etc., 
>>>> even though hadoop-0.23 was there. Fingers crossed.
>>>>
>>>> thanks,
>>>> Arun
>>>>
>>>> On Jul 9, 2012, at 11:59 AM, Eli Collins wrote:
>>>>
>>>>> Thanks Arun, what do you think the ETA is for the first 2.1.0-alpha RC?
>>>>>
>>>>> On Mon, Jul 9, 2012 at 11:55 AM, Arun C Murthy  
>>>>> wrote:
>>>>>> I'm about to create a hadoop-2.0.1-alpha release with a couple of 
>>>>>> security fixes.
>>>>>>
>>>>>> Also, I plan to create a branch-2.1.0-alpha for the next release in the 
>>>>>> hadoop-2 line. So, this way, I can start stabilizing that release 
>>>>>> (performance, bug-fixes etc.) - please be careful, henceforth, while 
>>>>>> committing to branch-2.1.0-alpha.
>>>>>>
>>>>>> thanks,
>>>>>> Arun
>>>>>>
>>>>>> On Jun 8, 2012, at 12:08 AM, Arun C Murthy wrote:
>>>>>>
>>>>>>> Folks,
>>>>>>>
>>>>>>> I'm considering cutting a hadoop-2.0.1-alpha release within the next 
>>>>>>> four weeks or so with some more major add-ons I think we can get done 
>>>>>>> soon:
>>>>>>>
>>>>>>> # Auto NN Failover (thanks for the HDFS-3042 merge today Todd) and 
>>>>>>> follow-ons as necessary.
>>>>>>> # Pending security work for YARN (anything for HDFS w.r.t HA?)
>>>>>>> # RM Restart (MAPREDUCE-4326)
>>>>>>> # Container re-use (MAPREDUCE-3902)
>>>>>>> # Multi-resource scheduling for CS (MAPREDUCE-4327).
>>>>>>>
>>>>>>> If you would like others please set the 'Target Version' and I'll watch 
>>>>>>> that list.
>>>>>>>
>>>>>>> thanks,
>>>>>>> Arun
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Arun C. Murthy
>>>>>>> Hortonworks Inc.
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-31 Thread Todd Lipcon
On Thu, Aug 30, 2012 at 11:50 PM, Mattmann, Chris A (388J)
 wrote:
> Hi Andrew,
>
> On Aug 30, 2012, at 11:42 PM, Andrew Purtell wrote:
>
>> If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical
>> to develop end applications or downstream projects on, the community will
>> disappear.
>
> Sure, the end-user community might disappear, but the point I'm trying to 
> make is
> that the community is more than that. It's developers that build code together
> ("community over code"); it's folks who write documentation who are part of 
> the
> project's committee of folks working together to develop software for the 
> public
> good at this Foundation. It's folks who write unit tests as part of that.  
> It's also people
> that fly by on the lists and that need help; or that may throw up a patch, or
> whatever. It's other members of the Apache Software Foundation that are
> charged with caring and giving a rip about the Foundation's projects.

Well, speaking as a member of the developer community who hasn't been a
traditional user of Hadoop since my previous job in 2008: if the end
user community started to languish, I (and 80% of the other most
involved contributors) would probably stop working on the project
pretty quickly. We're here because a user community exists, which
funds our employers, who fund us.

Another point I'll make is that I've talked to a number of former
contributors (from the 0.20 days) who pretty much stopped contributing
because of the code base churn around the prior project split. It
became too much effort to forward and back port patches from their
internal branches, so their cost/reward tradeoff dipped negative. So
there are real community costs associated with what seem like
"technical" changes.

I don't know who came up with the original "community over code"
mantra, or whether the ASF truly thinks these are hard and fast rules
rather than principles and guidelines. But, if I may be so bold, I
would much prefer the mantra of "community around code". Without the
code at the center of any project, we'd just be a bunch of nerds
shooting the shit. The code's what ties us together, and the pressure
of keeping a centralized codebase that we can all feel good about
shipping is what allows us to get past our differences and produce
high quality software.

The best reference I can find on apache.org is the Committer's FAQ:
http://www.apache.org/dev/committers.html where it says explicitly:

> Note: While there is not an official list, the following six principles have 
> been cited as the core beliefs of The Apache Way:
> - collaborative software development
> - commercial-friendly standard license
> - consistently high quality software
> - respectful, honest, technical-based interaction
> - faithful implementation of standards
> - security as a mandatory feature

Maybe you disagree, but from my perspective, we're doing reasonably
well on all of them. You may not think there's much collaboration, but
in the last 2-3 weeks, I have collaborated on Hadoop-related work with
developers from Trend Micro, Facebook, Calxeda, Hortonworks, and
interacted with users from a much wider variety of organizations.

As Andrew said, I thought we were going along pretty well before this thread.


As for technical things we need to do to get to a feasible split: big
+1 that classpath pollution issues are near the top of the list. We need a
reasonable classloader strategy, and I think Tom's OSGi stuff is a
good start in that direction. But it's going to be quite some time
before that's all integrated and pulled into dependent projects, etc.
So let's work on it but not be rash in our decisions.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

2012-08-31 Thread Todd Lipcon
>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can
>>>be discussed and consensus
>>> can be reached (just a thought experiment). VOTE if necessary.
>>>
>>> 3. [VOTE] thread for 
>>>
>>> 4. Create Project:
>>>   a. paste resolution from #0 to board@ or;
>>>   b. go to general@incubator and start new Incubator project.
>>>
>>> 5. infrastructure set up.
>>>MLs moving; new UNIX groups; website setup;
>>>SVN setup like this:
>>>
>>> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/
>>>https://svn.apache.org/repos/asf/; or
>>> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/
>>>https://svn.apache.org/repos/asf/; or
>>> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/
>>>https://svn.apache.org/repos/asf/
>>>
>>> After all 3 have been created run:
>>>
>>> svn remove -m "Remove Hadoop umbrella TLP. Split into separate
>>>projects." https://svn.apache.org/repos/asf/hadoop
>>>
>>> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate
>>>as distinct communities, and try to solve the code duplication/dependency
>>> issues from there.
>>>
>>> 7. If 4b; then graduate as TLP from Incubator.
>>>
>>> -snip
>>>
>>> So that's my proposal.
>>>
>>> Thanks guys.
>>>
>>> Cheers,
>>> Chris
>>>
>>> ++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattm...@nasa.gov
>>> WWW:   http://sunset.usc.edu/~mattmann/
>>> ++
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++
>>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads up: next hadoop-2 release

2012-09-01 Thread Todd Lipcon
On Fri, Aug 31, 2012 at 1:15 PM, Eli Collins  wrote:

> Yea, I think we should nuke 2.1.0-alpha and re-create when we're
> actually going to do a release. On the HDFS side there's quite a few
> things already waiting to get out; if it's going to take another 4 or
> so weeks then it would be great to shoot for getting HDFS-3077 in.

Seems doable to me. I'm in the "finishing touches" stage now, and
feeling pretty confident about the basic protocol after a few
machine-years of fault injection testing, plus some early test results
on a 100 node QA setup. After the current round of open JIRAs goes in,
I'll start a sweep for findbugs warnings, removing TODOs, and adding a few
more stress tests. Then I think it will be a good time to propose a
merge.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Large feature development

2012-09-01 Thread Todd Lipcon
Thanks for starting this thread, Steve. I think your points below are
good. I've snipped most of your comment and will reply inline to one
bit below:

On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
 wrote:

> Of the big changes that have worked, they are
>
>
>1. HDFS 2's HA and ongoing improvements: collaborative dev on the list
>with incremental changes going on in trunk, RTC with lots of tests. This
>isn't finished, and the test problem there is that functional testing of
>all failure modes requires software-controlled fencing devices and switches
-and tests to generate the expected failure space.

Actually, most of the HDFS HA code has been done on branches. The
first work that led towards HA was the redesign of the edits logging
infrastrucutre -- HDFS-1073. This was a feature branch with about 60
patches on it. Then HDFS-1623, the main manual-failover HA
development, had close to 150 patches on the branch. Automatic HA
(HDFS-3042) was some 15-20 patches. The current work (removing
dependency on NAS) is around 35 patches in so far and getting close to
merge.

In these various branches, we've experimented with a few policies
which have differed from trunk. In particular:
- HDFS-1073 had a "modified review then commit" policy, which was
that, if a patch sat without a review for more than 24hrs, we
committed it with the restriction that there would be a post-commit
review before the branch was merged.
- All of the branches have done away with the requirement of running
the full QA suite, findbugs, etc. prior to commit. This means that the
branches at times have broken tests checked in, but also makes it
quicker to iterate on the new feature. Again, the assumption is that
these requirements are met before merge.
- In all cases there has been a design doc and some good design
discussion up front before substantial code was written. This made it
easier to forge ahead on the branch with good confidence that the
community was on-board with the idea.

Given my experiences, I think all of the above are useful to follow.
It means development can happen quickly, but ensures that when the
merge is proposed, people feel like the quality meets our normal
standards.

>2. YARN: Arun on his own branch, CTR, merge once mostly stable, and
>completely replacing MRv1.

I'd actually contend that YARN was merged too early. I have yet to see
anyone running YARN in production, and it's holding up the "Stable"
moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
I'm seeing fewer issues in our customers running Hadoop HDFS 2
compared to Hadoop 1-derived code.

>
> How then do we get (a) more dev projects working and integrated by the
> current committers, and (b) a process in which people who are not yet
> contributors/committers can develop non-trivial changes to the project in a
> way that it is done with the knowledge, support and mentorship of the rest
> of the community?

Here's one proposal, making use of git as an easy way to allow
non-committers to "commit" code while still tracking development in
the usual places:
- Upon anyone's request, we create a new "Version" tag in JIRA.
- The developers create an umbrella JIRA for the project, and file the
individual work items as subtasks (either up front, or as they are
developed if using a more iterative model)
- On the umbrella, they add a pointer to a git branch to be used as
the staging area for the branch. As they develop each subtask, they
can use the JIRA to discuss the development like they would with a
normally committed JIRA, but when they feel it is ready to go (not
requiring a +1 from any committer) they commit to their git branch
instead of the SVN repo.
- When the branch is ready to merge, they can call a merge vote, which
requires +1 from 3 committers, same as a branch being proposed by an
existing committer. A committer would then use git-svn to merge their
branch commit-by-commit, or if it is less extensive, simply generate a
single big patch to commit into SVN.

My thinking is that this would provide a low-friction way for people
to collaborate with the community and develop in the open, without
having to work closely with any committer to review every individual
subtask.
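
To make that concrete, the non-committer side might look something like
the following (the mirror URL, JIRA key, and branch names here are
purely illustrative):

  # clone the read-only git mirror of the Apache SVN repo
  git clone git://git.apache.org/hadoop-common.git
  cd hadoop-common
  # branch from trunk and develop each subtask as its own commit
  git checkout -b HDFS-9999-feature origin/trunk
  # ... commit work, referencing the JIRA subtasks in commit messages ...
  # publish the branch somewhere public and link it from the umbrella JIRA
  git push git@github.com:somebody/hadoop-common.git HDFS-9999-feature

At merge time a committer would then replay those commits into SVN via
git-svn, or squash them into one big patch, as described above.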

Another alternative, if people are reluctant to use git, would be to
add a "sandbox/" repository inside our SVN, and hand out commit bit to
branches inside there without any PMC vote. Anyone interested in
contributing could request a branch in the sandbox, and be granted
access as soon as they get an apache SVN account.
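
Mechanically, that could be as simple as something like this (the path
is purely illustrative):

  svn mkdir -m "Create sandbox branch for feature X" \
      https://svn.apache.org/repos/asf/hadoop/common/sandbox/feature-x
  # infra then grants the contributor commit access scoped to sandbox/*
  # in the authorization file - no PMC vote needed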

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Large feature development

2012-09-02 Thread Todd Lipcon
Hey Arun,

First, let me apologize if my email came off as a personal "snipe"
against the project or anyone working on it. I know the team has been
hard at work for multiple years now on the project, and I certainly
don't mean to denigrate the work anyone has done. I also agree that
the improvements made possible by YARN are tremendously important, and
I've expressed this opinion both online and in interviews with
analysts, etc.

But, I'll stand by my point that YARN is at this point more "alpha"
than HDFS2. You brought up two bugs in the HDFS2 code base as examples
of HDFS 2 not being high quality. The first, HDFS-3626, was indeed a
messy bug, but had nothing to do with HA, the edit log rewrite, or any
other of the changes being discussed in the thread. In fact, the bug
has been there since the "beginning of time", and is present
in Hadoop 1.0.x as well (which is why the JIRA is still open). You
simply need to pass a non-canonicalized path via the Path(URI)
constructor, and you'll see the same behavior in every release
including 1.0.x, 0.20.x, or earlier. The reason it shows up more often
in Hadoop 2 is actually due to the FsShell rewrite -- not any changes
in HDFS itself, and certainly not related to HA as you've implied
here.

The other bug causes blocksBeingWritten to disappear upon upgrade.
This, also, had nothing to do with any of the features being discussed
in this thread, and in fact only impacts a cluster which is taken down
_uncleanly_ prior to an upgrade. Upon starting the upgraded cluster,
the user would be alerted to the missing blocks and could rollback
with no lost data. So, while it should be fixed (and has been), I
wouldn't consider it particularly frightening. Most users I am aware
of do a "clean" shutdown of services like HBase before trying to
upgrade their cluster, and, worst case, they would see the issue
immediately after the upgrade and perform a rollback with no adverse
effects.

In branch-1, however, I've seen other bugs that I'd consider much more
scary. Two in particular come to mind and together represent the vast
majority of cases in which we've seen customers experience data
corruption: HDFS-3652 and HDFS-2305. These two bugs were branch-1
only, and never present in Hadoop 2 due to the "edit log rewrite"
project (HDFS-1073).

So, at the risk of this thread just becoming a laundry list of bugs that
have existed in HDFS, or a list of bugs in YARN, I'll summarize: I
still think that YARN is "alpha" and HDFS 2 is at least as "stable" as
Hadoop 1.0. We have customers running it for production workloads, in
multi-rack clusters, with great success. But this has nothing to do
with this thread at hand, so I'll raise the question of
alpha/beta/stable labeling in the context of our next release vote,
and hope we can go back to the more fruitful discussion of how to
encourage large feature development while maintaining stability.

Thanks
-Todd

On Sun, Sep 2, 2012 at 3:11 PM, Arun Murthy  wrote:
> Eli,
>
> On Sep 2, 2012, at 1:01 PM, Eli Collins  wrote:
>
>> On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy  wrote:
>>> Todd,
>>>
>>> On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
>>>
>>>> I'd actually contend that YARN was merged too early. I have yet to see
>>>> anyone running YARN in production, and it's holding up the "Stable"
>>>> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
>>>> I'm seeing fewer issues in our customers running Hadoop HDFS 2
>>>> compared to Hadoop 1-derived code.
>>>
>>> You know I respect you a ton, but I'm very saddened to see you perpetuate 
>>> this FUD on our public lists. I expected better, particularly when everyone 
>>> is working towards the same goals of advancing Hadoop-2. This sniping on 
>>> other members doing work is, um, I'll just stop here rather than regret 
>>> later.
>> 2. HDFS is more mature than YARN. Not a surprise given that we all
>> agree YARN is alpha, and a much newer project than HDFS that hasn't
>> been deployed in production environments yet (to my knowledge).
>
> Let's focus on the ground reality here.
>
> Please read my (or Rajiv's) message again about YARN's current
> stability and how much it's baked, its deployment plans to a very
> large cluster in a few *days*. Or, talk to the people developing,
> testing and supporting these customers and clusters.
>
> I'll repeat - YARN has clearly baked much more than HDFS HA given
> the basic bugs (upgrade, edit logs corruption etc.) we've seen after
> being declared *done*; but then we just disagree since clearly I'm
> more conservative. Also, we need to be more conservative wrt HDFS -
> but then what would I know...
>
> I'll admit it's hard to discuss with someone (or a collective) who
> just repeat themselves. Plus, I broke my own rule about email this
> weekend - so, I'll try harder.
>
> Arun



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Large feature development

2012-09-03 Thread Todd Lipcon
On Mon, Sep 3, 2012 at 12:05 AM, Arun C Murthy  wrote:
>>
>> But, I'll stand by my point that YARN is at this point more "alpha"
>> than HDFS2.
>
> It's unfair to tag-team me while consistently ignoring what I write.

I'm not sure I ignored what you wrote. I understand that Yahoo is
deploying soon on one of their clusters. That's great news. My
original point was about the state of YARN when it was merged, and the
comment about its current state was more of an aside. Hardly worth
debating further. Best of luck with the deployment next week - I look
forward to reading about how it goes on the list.

>> You brought up two bugs in the HDFS2 code base as examples
>> of HDFS 2 not being high quality.
>
> Through a lot of words you just agreed with what I said - if people didn't 
> upgrade to HDFS2 (not just HA) they wouldn't hit any of these: HDFS-3626,

You could hit this on Hadoop 1; it was just harder to hit.

> HDFS-3731 etc.

The details of this bug have to do with the upgrade/snapshot behavior
of the blocksBeingWritten directory which was added in branch-1. In
fact, the same basic bug continues to exist in branch-1. If you
perform an upgrade, it doesn't hard-link the blocks into the new
"current" directory. Hence, if the upgraded cluster exits safe mode
(causing lease recovery of those blocks), and then the user issues a
rollback, the blocks will have been deleted from the pre-upgrade
image. This broken branch-1 behavior carried over into branch-2 as
well, but it's not a new bug, as I said before.
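
To spell out the failure sequence I'm describing (sketched with the
standard branch-1 daemon commands; many details elided):

  # 1. cluster goes down uncleanly while blocks are mid-write, then:
  hadoop-daemon.sh start namenode -upgrade
  # datanodes snapshot their storage on upgrade, but blocksBeingWritten/
  # is not hard-linked into the new current/ directory (the bug)
  # 2. the NN leaves safe mode, and lease recovery finalizes those blocks
  # 3. the admin later decides to roll back:
  hadoop-daemon.sh start namenode -rollback
  # the restored pre-upgrade image no longer has those block files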

> There are more, for e.g. how do folks work around Secondary NN not starting 
> up on upgrades from hadoop-1 (HDFS-3597)? They just copy multiple PBs over to 
> a new hadoop-2 cluster, or patch SNN themselves post HDFS-1073?

No, they rm -Rf the contents of the 2NN directory, which is completely
safe and doesn't cause data loss in any way. In fact, the bug fix is exactly
that -- it just does the rm -Rf itself, automatically. It's a trivial
workaround similar to how other bugs in the Hadoop 1 branch have
required workarounds in the past. Certainly no data movement or local
patching. The SNN is transient state and can always be cleared.
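
In other words, the workaround amounts to the following (where the
directory is whatever fs.checkpoint.dir points at in your
configuration):

  hadoop-daemon.sh stop secondarynamenode
  rm -Rf ${FS_CHECKPOINT_DIR}/*   # transient checkpoint state only
  hadoop-daemon.sh start secondarynamenode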

If you have any questions about other bugs in the 2.x line, feel free
to ask on the relevant JIRAs. I'm still perfectly confident in the
stability of HDFS 2 vs HDFS 1. In fact my cell phone is likely the one
that would ring if any of these production HDFS 2 clusters had an
issue, and I'll offer the same publicly to anyone on this list. If you
experience a corruption or data loss issue on the tip of branch-2
HDFS, email me off-list and I'll personally diagnose the issue. I
would not make that same offer for branch-1 due to the fundamentally
less robust design which has caused a lot of subtle bugs over the past
several years.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads Up - hadoop-2.0.3 release

2012-11-16 Thread Todd Lipcon
+1 from me, too. I wanted to let it sit in trunk for a few weeks to see if
anyone found issues, but it's now been a bit over a month, and all the feedback
I've gotten so far has been good, tests have been stable, etc.

Unless anyone votes otherwise, I'll start backporting the patches into
branch-2.

Todd

On Fri, Nov 16, 2012 at 12:58 PM, lohit  wrote:

> +1 on having QJM in hadoop-2.0.3. Any rough estimate when this is targeted
> for?
>
> 2012/11/15 Arun C Murthy 
>
> > On the heels of the planned 0.23.5 release (thanks Bobby & Thomas) I want
> > to rollout a hadoop-2.0.3 release to reflect the growing stability of
> YARN.
> >
> > I'm hoping we can also release the QJM along-with; hence I'd love to know
> > an ETA - Todd? Sanjay? Suresh?
> >
> > One other thing which would be nice henceforth is to better reflect
> > release content for end-users in release-notes etc.; thus, can I ask
> > committers to start paying closer attention to bug classification such as
> > Blocker/Critical/Major/Minor etc.? This way, as we get closer to stable
> > hadoop-2 releases, we can do a better job communicating content and its
> > criticality.
> >
> > thanks,
> > Arun
> >
> >
>
>
> --
> Have a Nice Day!
> Lohit
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads Up - hadoop-2.0.3 release

2012-11-16 Thread Todd Lipcon
Here's a git branch with the backported changes in case anyone has time to
take a look this weekend:

https://github.com/toddlipcon/hadoop-common/tree/branch-2-QJM

There were a few conflicts due to patches committed in different orders,
and I had to pull in a couple of other JIRAs along the way, but it is passing
its tests. If it looks good I'll start putting up the patches on JIRA and
committing next week.
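
If you want to inspect it locally, something like the following should
work (assuming you already have the Apache mirror as a remote named
"apache"; the remote names are just examples):

  git remote add todd https://github.com/toddlipcon/hadoop-common.git
  git fetch todd
  # commits on the backport branch that aren't in Apache branch-2:
  git log --oneline apache/branch-2..todd/branch-2-QJM
  # the aggregate diff the merge would introduce:
  git diff apache/branch-2...todd/branch-2-QJM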

-Todd

On Fri, Nov 16, 2012 at 1:14 PM, Todd Lipcon  wrote:

> +1 from me, too. I wanted to let it sit in trunk for a few weeks to see if
> anyone found issues, but it's now been a bit over a month, and all the feedback
> I've gotten so far has been good, tests have been stable, etc.
>
> Unless anyone votes otherwise, I'll start backporting the patches into
> branch-2.
>
> Todd
>
> On Fri, Nov 16, 2012 at 12:58 PM, lohit wrote:
>
>> +1 on having QJM in hadoop-2.0.3. Any rough estimate when this is targeted
>> for?
>>
>> 2012/11/15 Arun C Murthy 
>>
>> > On the heels of the planned 0.23.5 release (thanks Bobby & Thomas) I
>> want
>> > to rollout a hadoop-2.0.3 release to reflect the growing stability of
>> YARN.
>> >
>> > I'm hoping we can also release the QJM along-with; hence I'd love to
>> know
>> > an ETA - Todd? Sanjay? Suresh?
>> >
>> > One other thing which would be nice henceforth is to better reflect
>> > release content for end-users in release-notes etc.; thus, can I ask
>> > committers to start paying closer attention to bug classification such
>> as
>> > Blocker/Critical/Major/Minor etc.? This way, as we get closer to stable
>> > hadoop-2 releases, we can do a better job communicating content and its
>> > criticality.
>> >
>> > thanks,
>> > Arun
>> >
>> >
>>
>>
>> --
>> Have a Nice Day!
>> Lohit
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads Up - hadoop-2.0.3 release

2012-12-04 Thread Todd Lipcon
Hey Arun,

I put up patches for the QJM backport merge yesterday. Aaron said he'd
take a look at reviewing them, so I anticipate they'll be finished
"real soon now". Sorry for the delay.

-Todd

On Tue, Dec 4, 2012 at 6:09 AM, Arun C Murthy  wrote:
> Lohit,
>
>  There are some outstanding blockers and I'm still awaiting the QJM merge.
>
>  Feel free to watch the blocker list:
>  http://s.apache.org/e1J
>
> Arun
>
> On Dec 3, 2012, at 10:02 AM, lohit wrote:
>
>> Hello Hadoop Release managers,
>> Any update on this?
>>
>> Thanks,
>> Lohit
>>
>> 2012/11/20 Tom White 
>>
>>> On Mon, Nov 19, 2012 at 6:09 PM, Siddharth Seth
>>>  wrote:
>>>> YARN-142/MAPREDUCE-4067 should ideally be fixed before we commit to API
>>>> backward compatibility. Also, from the recent YARN meetup - there seemed
>>> to
>>>> be a requirement to change the AM-RM protocol for container requests. In
>>>> this case, I believe it's OK to not have all functionality implemented,
>>> as
>>>> long as the protocol itself can represent the requirements.
>>>
>>> I agree. Do you think we can make these changes before removing the
>>> 'alpha' label, i.e. in 2.0.3? If that's not possible for the container
>>> requests change, then we could mark AMRMProtocol (or related classes)
>>> as @Evolving. Another alternative would be to introduce a new
>>> interface.
>>>
>>>> However, as
>>>> Bobby pointed out, given the current adoption by other projects -
>>>> incompatible changes at this point can be problematic and needs to be
>>>> figured out.
>>>
>>> We have a mechanism for this already. If something is marked as
>>> @Evolving it can change incompatibly between minor versions - e.g.
>>> 2.0.x to 2.1.0. If it is @Stable then it can only change on major
>>> versions, e.g. 2.x.y to 3.0.0. Let's make sure we are happy with the
>>> annotations - and willing to support them at the indicated level -
>>> before we remove the 'alpha' label. Of course, we strive not to change
>>> APIs without a very good reason, but if we do we should do so within
>>> the guidelines so that users know what to expect.
>>>
>>> Cheers,
>>> Tom
>>>
>>>>
>>>> Thanks
>>>> - Sid
>>>>
>>>>
>>>> On Mon, Nov 19, 2012 at 8:22 AM, Robert Evans 
>>> wrote:
>>>>
>>>>> I am OK with removing the alpha assuming that we think that the APIs are
>>>>> stable enough that we are willing to truly start maintaining backwards
>>>>> compatibility on them within 2.X. From what I have seen I think that
>>> they
>>>>> are fairly stable and I think there is enough adoption by other projects
>>>>> right now that breaking backwards compatibility would be problematic.
>>>>>
>>>>> --Bobby Evans
>>>>>
>>>>> On 11/16/12 11:34 PM, "Stack"  wrote:
>>>>>
>>>>>> On Fri, Nov 16, 2012 at 3:38 PM, Aaron T. Myers 
>>> wrote:
>>>>>>> Hi Arun,
>>>>>>>
>>>>>>> Given that the 2.0.3 release is intended to reflect the growing
>>>>>>> stability
>>>>>>> of YARN, and the QJM work will be included in 2.0.3 which provides a
>>>>>>> complete HDFS HA solution, I think it's time we consider removing the
>>>>>>> "-alpha" label from the release version. My preference would be to
>>>>>>> remove
>>>>>>> the label entirely, but we could also perhaps call it "-beta" or
>>>>>>> something.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>
>>>>>> I think it fine after two minor releases undoing the '-alpha' suffix.
>>>>>>
>>>>>> If folks insist we next go to '-beta', I'd hope we'd travel all
>>>>>> remaining 22 letters of the greek alphabet before we 2.0.x.
>>>>>>
>>>>>> St.Ack
>>>>>
>>>>>
>>>
>>
>>
>>
>> --
>> Have a Nice Day!
>> Lohit
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads Up - hadoop-2.0.3 release

2012-12-05 Thread Todd Lipcon
OK, QJM is now in branch-2. I also merged all the follow-up patches I
could find so that branch-2 and trunk should be equivalent with regard
to QJM functionality at this point. If anyone sees anything I missed,
feel free to give me a holler.
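
An easy way to double-check the parity claim is to diff the QJM package
between the two branches - a sketch (the module path here is from
memory, so adjust as needed):

  git diff origin/trunk origin/branch-2 -- \
      hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal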

Thanks
Todd

On Tue, Dec 4, 2012 at 6:56 PM, Arun Murthy  wrote:
> Thanks Todd!
>
>
> On Dec 4, 2012, at 6:04 PM, Todd Lipcon  wrote:
>
>> Hey Arun,
>>
>> I put up patches for the QJM backport merge yesterday. Aaron said he'd
>> take a look at reviewing them, so I anticipate they'll be finished
>> "real soon now". Sorry for the delay.
>>
>> -Todd
>>
>> On Tue, Dec 4, 2012 at 6:09 AM, Arun C Murthy  wrote:
>>> Lohit,
>>>
>>> There are some outstanding blockers and I'm still awaiting the QJM merge.
>>>
>>> Feel free to watch the blocker list:
>>> http://s.apache.org/e1J
>>>
>>> Arun
>>>
>>> On Dec 3, 2012, at 10:02 AM, lohit wrote:
>>>
>>>> Hello Hadoop Release managers,
>>>> Any update on this?
>>>>
>>>> Thanks,
>>>> Lohit
>>>>
>>>> 2012/11/20 Tom White 
>>>>
>>>>> On Mon, Nov 19, 2012 at 6:09 PM, Siddharth Seth
>>>>>  wrote:
>>>>>> YARN-142/MAPREDUCE-4067 should ideally be fixed before we commit to API
>>>>>> backward compatibility. Also, from the recent YARN meetup - there seemed
>>>>> to
>>>>>> be a requirement to change the AM-RM protocol for container requests. In
>>>>>> this case, I believe it's OK to not have all functionality implemented,
>>>>> as
>>>>>> long as the protocol itself can represent the requirements.
>>>>>
>>>>> I agree. Do you think we can make these changes before removing the
>>>>> 'alpha' label, i.e. in 2.0.3? If that's not possible for the container
>>>>> requests change, then we could mark AMRMProtocol (or related classes)
>>>>> as @Evolving. Another alternative would be to introduce a new
>>>>> interface.
>>>>>
>>>>>> However, as
>>>>>> Bobby pointed out, given the current adoption by other projects -
>>>>>> incompatible changes at this point can be problematic and needs to be
>>>>>> figured out.
>>>>>
>>>>> We have a mechanism for this already. If something is marked as
>>>>> @Evolving it can change incompatibly between minor versions - e.g.
>>>>> 2.0.x to 2.1.0. If it is @Stable then it can only change on major
>>>>> versions, e.g. 2.x.y to 3.0.0. Let's make sure we are happy with the
>>>>> annotations - and willing to support them at the indicated level -
>>>>> before we remove the 'alpha' label. Of course, we strive not to change
>>>>> APIs without a very good reason, but if we do we should do so within
>>>>> the guidelines so that users know what to expect.
>>>>>
>>>>> Cheers,
>>>>> Tom
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> - Sid
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 19, 2012 at 8:22 AM, Robert Evans 
>>>>> wrote:
>>>>>>
>>>>>>> I am OK with removing the alpha assuming that we think that the APIs are
>>>>>>> stable enough that we are willing to truly start maintaining backwards
>>>>>>> compatibility on them within 2.X. From what I have seen I think that
>>>>> they
>>>>>>> are fairly stable and I think there is enough adoption by other projects
>>>>>>> right now that breaking backwards compatibility would be problematic.
>>>>>>>
>>>>>>> --Bobby Evans
>>>>>>>
>>>>>>> On 11/16/12 11:34 PM, "Stack"  wrote:
>>>>>>>
>>>>>>>> On Fri, Nov 16, 2012 at 3:38 PM, Aaron T. Myers 
>>>>> wrote:
>>>>>>>>> Hi Arun,
>>>>>>>>>
>>>>>>>>> Given that the 2.0.3 release is intended to reflect the growing
>>>>>>>>> stability
>>>>>>>>> of YARN, and the QJM work will be included in 2.0.3 which provides a
>>>>>>>>> complete HDFS HA solution, I think it's time we consider removing the
>>>>>>>>> "-alpha" label from the release version. My preference would be to
>>>>>>>>> remove
>>>>>>>>> the label entirely, but we could also perhaps call it "-beta" or
>>>>>>>>> something.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think it fine after two minor releases undoing the '-alpha' suffix.
>>>>>>>>
>>>>>>>> If folks insist we next go to '-beta', I'd hope we'd travel all
>>>>>>>> remaining 22 letters of the greek alphabet before we 2.0.x.
>>>>>>>>
>>>>>>>> St.Ack
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Have a Nice Day!
>>>> Lohit
>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads Up - hadoop-2.0.3 release

2012-12-18 Thread Todd Lipcon
Any news on how this is progressing? Some folks in this thread below
inquired about getting this release out around the New Year timeframe,
but it looks like YARN-117 subtasks have gone pretty quiet. We all
know how long lifecycle changes can take to get pushed through ;-)

-Todd

On Mon, Nov 19, 2012 at 12:41 PM, Steve Loughran
 wrote:
> I want to make some changes to the lifecycle of a yarn service (in a
> backwards compatible way).
>
> https://issues.apache.org/jira/browse/YARN-117
>
>
>1. formal state machine model with stop state idempotent and entry-able
>from any state
>2. waiting/blocked state a service can enter when waiting for something
>else
>3. an alternate base class that does the state model checks before
>    executing any state change functions - currently it's done at
>end-of-operation in the super() calls.
>4. gradual move of services to the stricter base class.
>
> With a new base class nothing will break (as the move can be done
> case-by-case, leaving the heavily subclassed ones alone); the state model
> extensions & formalisation would be visible but not used.
>
> I don't want to hold anything up, because I need more testing of things
> before this is ready for review. I just want to get the fixes in before it
> ships.
>
> On 19 November 2012 16:22, Robert Evans  wrote:
>
>> I am OK with removing the alpha assuming that we think that the APIs are
>> stable enough that we are willing to truly start maintaining backwards
>> compatibility on them within 2.X. From what I have seen I think that they
>> are fairly stable and I think there is enough adoption by other projects
>> right now that breaking backwards compatibility would be problematic.
>>
>> --Bobby Evans
>>
>> On 11/16/12 11:34 PM, "Stack"  wrote:
>>
>> >On Fri, Nov 16, 2012 at 3:38 PM, Aaron T. Myers  wrote:
>> >> Hi Arun,
>> >>
>> >> Given that the 2.0.3 release is intended to reflect the growing
>> >>stability
>> >> of YARN, and the QJM work will be included in 2.0.3 which provides a
>> >> complete HDFS HA solution, I think it's time we consider removing the
>> >> "-alpha" label from the release version. My preference would be to
>> >>remove
>> >> the label entirely, but we could also perhaps call it "-beta" or
>> >>something.
>> >>
>> >> Thoughts?
>> >>
>> >
>> >I think it fine after two minor releases undoing the '-alpha' suffix.
>> >
>> >If folks insist we next go to '-beta', I'd hope we'd travel all
>> >remaining 22 letters of the greek alphabet before we 2.0.x.
>> >
>> >St.Ack
>>
>>



-- 
Todd Lipcon
Software Engineer, Cloudera

