Re: Heads Up - hadoop-2.0.3 release
Hey Arun, I put up patches for the QJM backport merge yesterday. Aaron said he'd take a look at reviewing them, so I expect that to be finished soon. Sorry for the delay. -Todd

On Tue, Dec 4, 2012 at 6:09 AM, Arun C Murthy a...@hortonworks.com wrote:
Lohit, There are some outstanding blockers and I'm still awaiting the QJM merge. Feel free to watch the blocker list: http://s.apache.org/e1J Arun

On Dec 3, 2012, at 10:02 AM, lohit wrote:
Hello Hadoop Release managers, Any update on this? Thanks, Lohit

2012/11/20 Tom White t...@cloudera.com:
On Mon, Nov 19, 2012 at 6:09 PM, Siddharth Seth seth.siddha...@gmail.com wrote:
YARN-142/MAPREDUCE-4067 should ideally be fixed before we commit to API backward compatibility. Also, from the recent YARN meetup there seemed to be a requirement to change the AM-RM protocol for container requests. In this case, I believe it's OK not to have all functionality implemented, as long as the protocol itself can represent the requirements.

I agree. Do you think we can make these changes before removing the 'alpha' label, i.e. in 2.0.3? If that's not possible for the container requests change, then we could mark AMRMProtocol (or related classes) as @Evolving. Another alternative would be to introduce a new interface.

However, as Bobby pointed out, given the current adoption by other projects, incompatible changes at this point can be problematic and need to be figured out.

We have a mechanism for this already. If something is marked as @Evolving it can change incompatibly between minor versions, e.g. 2.0.x to 2.1.0. If it is @Stable then it can only change on major versions, e.g. 2.x.y to 3.0.0. Let's make sure we are happy with the annotations, and willing to support them at the indicated level, before we remove the 'alpha' label. Of course, we strive not to change APIs without a very good reason, but if we do we should do so within the guidelines so that users know what to expect.
Cheers, Tom

Thanks - Sid

On Mon, Nov 19, 2012 at 8:22 AM, Robert Evans ev...@yahoo-inc.com wrote:
I am OK with removing the alpha label, assuming we think the APIs are stable enough that we are willing to truly start maintaining backwards compatibility on them within 2.X. From what I have seen they are fairly stable, and I think there is enough adoption by other projects right now that breaking backwards compatibility would be problematic. --Bobby Evans

On 11/16/12 11:34 PM, Stack st...@duboce.net wrote:
On Fri, Nov 16, 2012 at 3:38 PM, Aaron T. Myers a...@cloudera.com wrote:
Hi Arun, Given that the 2.0.3 release is intended to reflect the growing stability of YARN, and that the QJM work included in 2.0.3 provides a complete HDFS HA solution, I think it's time we consider removing the -alpha label from the release version. My preference would be to remove the label entirely, but we could also perhaps call it -beta or something. Thoughts?

I think it's fine, after two minor releases, to drop the '-alpha' suffix. If folks insist we next go to '-beta', I'd hope we'd travel all remaining 22 letters of the Greek alphabet before we release 2.0.x. St.Ack

-- Have a Nice Day! Lohit

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

-- Todd Lipcon Software Engineer, Cloudera
Re: Heads Up - hadoop-2.0.3 release
+1 from me, too. I wanted to let it sit in trunk for a few weeks to see if anyone found issues, but it's now been a bit over a month: all the feedback I've gotten so far has been good, tests have been stable, etc. Unless anyone votes otherwise, I'll start backporting the patches into branch-2. Todd

On Fri, Nov 16, 2012 at 12:58 PM, lohit lohit.vijayar...@gmail.com wrote:
+1 on having QJM in hadoop-2.0.3. Any rough estimate when this is targeted for?

2012/11/15 Arun C Murthy a...@hortonworks.com:
On the heels of the planned 0.23.5 release (thanks Bobby Thomas), I want to roll out a hadoop-2.0.3 release to reflect the growing stability of YARN. I'm hoping we can also release the QJM along with it; hence I'd love to know an ETA - Todd? Sanjay? Suresh?

One other thing which would be nice henceforth is to better reflect release content for end-users in release notes etc.; thus, can I ask committers to start paying closer attention to bug classification such as Blocker/Critical/Major/Minor? This way, as we get closer to stable hadoop-2 releases, we can do a better job communicating content and its criticality. thanks, Arun

-- Have a Nice Day! Lohit

-- Todd Lipcon Software Engineer, Cloudera
Re: Heads Up - hadoop-2.0.3 release
Here's a git branch with the backported changes, in case anyone has time to take a look this weekend: https://github.com/toddlipcon/hadoop-common/tree/branch-2-QJM

There were a few conflicts due to patches committed in different orders, and I had to pull in a couple of other JIRAs along the way, but it is passing its tests. If it looks good I'll start putting up the patches on JIRA and committing next week. -Todd

On Fri, Nov 16, 2012 at 1:14 PM, Todd Lipcon t...@cloudera.com wrote:
+1 from me, too. I wanted to let it sit in trunk for a few weeks to see if anyone found issues, but it's now been a bit over a month: all the feedback I've gotten so far has been good, tests have been stable, etc. Unless anyone votes otherwise, I'll start backporting the patches into branch-2. Todd

On Fri, Nov 16, 2012 at 12:58 PM, lohit lohit.vijayar...@gmail.com wrote:
+1 on having QJM in hadoop-2.0.3. Any rough estimate when this is targeted for?

2012/11/15 Arun C Murthy a...@hortonworks.com:
On the heels of the planned 0.23.5 release (thanks Bobby Thomas), I want to roll out a hadoop-2.0.3 release to reflect the growing stability of YARN. I'm hoping we can also release the QJM along with it; hence I'd love to know an ETA - Todd? Sanjay? Suresh?

One other thing which would be nice henceforth is to better reflect release content for end-users in release notes etc.; thus, can I ask committers to start paying closer attention to bug classification such as Blocker/Critical/Major/Minor? This way, as we get closer to stable hadoop-2 releases, we can do a better job communicating content and its criticality. thanks, Arun

-- Have a Nice Day! Lohit

-- Todd Lipcon Software Engineer, Cloudera
Re: Large feature development
On Mon, Sep 3, 2012 at 12:05 AM, Arun C Murthy a...@hortonworks.com wrote:
But, I'll stand by my point that YARN is at this point more alpha than HDFS2.

It's unfair to tag-team me while consistently ignoring what I write.

I'm not sure I ignored what you wrote. I understand that Yahoo is deploying soon on one of their clusters. That's great news. My original point was about the state of YARN when it was merged, and the comment about its current state was more of an aside. Hardly worth debating further. Best of luck with the deployment next week - I look forward to reading about how it goes on the list.

You brought up two bugs in the HDFS2 code base as examples of HDFS 2 not being high quality. Through a lot of words you just agreed with what I said - if people didn't upgrade to HDFS2 (not just HA) they wouldn't hit any of these: HDFS-3626,

You could hit this on Hadoop 1; it was just harder to hit.

HDFS-3731 etc.

The details of this bug have to do with the upgrade/snapshot behavior of the blocksBeingWritten directory, which was added in branch-1. In fact, the same basic bug continues to exist in branch-1. If you perform an upgrade, it doesn't hard-link the blocks into the new current directory. Hence, if the upgraded cluster exits safe mode (causing lease recovery of those blocks), and then the user issues a rollback, the blocks will have been deleted from the pre-upgrade image. This broken branch-1 behavior carried over into branch-2 as well, but it's not a new bug, as I said before.

There are more, e.g. how do folks work around the Secondary NN not starting up on upgrades from hadoop-1 (HDFS-3597)? They just copy multiple PBs over to a new hadoop-2 cluster, or patch the SNN themselves post HDFS-1073?

No, they rm -Rf the contents of the 2NN directory, which is completely safe and doesn't cause data loss in any way. In fact, the bug fix is exactly that -- it just does the rm -Rf itself, automatically.
It's a trivial workaround, similar to how other bugs in the Hadoop 1 branch have required workarounds in the past. Certainly no data movement or local patching. The SNN holds transient state and can always be cleared. If you have any questions about other bugs in the 2.x line, feel free to ask on the relevant JIRAs.

I'm still perfectly confident in the stability of HDFS 2 vs HDFS 1. In fact, my cell phone is likely the one that would ring if any of these production HDFS 2 clusters had an issue, and I'll offer the same publicly to anyone on this list: if you experience a corruption or data loss issue on the tip of branch-2 HDFS, email me off-list and I'll personally diagnose the issue. I would not make that same offer for branch-1, due to the fundamentally less robust design which has caused a lot of subtle bugs over the past several years.

Thanks -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: Large feature development
Hey Arun, First, let me apologize if my email came off as a personal snipe against the project or anyone working on it. I know the team has been hard at work for multiple years now on the project, and I certainly don't mean to denigrate the work anyone has done. I also agree that the improvements made possible by YARN are tremendously important, and I've expressed this opinion both online and in interviews with analysts, etc. But, I'll stand by my point that YARN is at this point more alpha than HDFS2.

You brought up two bugs in the HDFS2 code base as examples of HDFS 2 not being high quality. The first, HDFS-3626, was indeed a messy bug, but it had nothing to do with HA, the edit log rewrite, or any of the other changes being discussed in this thread. In fact, the bug has been there since the beginning of time, and is present in Hadoop 1.0.x as well (which is why the JIRA is still open). You simply need to pass a non-canonicalized path to the Path(URI) constructor, and you'll see the same behavior in every release, including 1.0.x, 0.20.x, or earlier. The reason it shows up more often in Hadoop 2 was actually the FsShell rewrite -- not any changes in HDFS itself, and certainly not related to HA as you've implied here.

The other bug causes blocksBeingWritten to disappear upon upgrade. This, also, had nothing to do with any of the features being discussed in this thread, and in fact only impacts a cluster which is taken down _uncleanly_ prior to an upgrade. Upon starting the upgraded cluster, the user would be alerted to the missing blocks and could roll back with no lost data. So, while it should be fixed (and has been), I wouldn't consider it particularly frightening. Most users I am aware of do a clean shutdown of services like HBase before trying to upgrade their cluster, and, in the worst case, they would see the issue immediately after the upgrade and perform a rollback with no adverse effects.
In branch-1, however, I've seen other bugs that I'd consider much more scary. Two in particular come to mind, and together they represent the vast majority of cases in which we've seen customers experience data corruption: HDFS-3652 and HDFS-2305. These two bugs were branch-1 only, and never present in Hadoop 2, due to the edit log rewrite project (HDFS-1073).

So, at risk of this thread just becoming a laundry list of bugs that have existed in HDFS, or a list of bugs in YARN, I'll summarize: I still think that YARN is alpha and HDFS 2 is at least as stable as Hadoop 1.0. We have customers running it for production workloads, in multi-rack clusters, with great success. But this has nothing to do with the thread at hand, so I'll raise the question of alpha/beta/stable labeling in the context of our next release vote, and hope we can go back to the more fruitful discussion of how to encourage large feature development while maintaining stability. Thanks -Todd

On Sun, Sep 2, 2012 at 3:11 PM, Arun Murthy a...@hortonworks.com wrote:
Eli,

On Sep 2, 2012, at 1:01 PM, Eli Collins e...@cloudera.com wrote:
On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy a...@hortonworks.com wrote:
Todd,

On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
I'd actually contend that YARN was merged too early. I have yet to see anyone running YARN in production, and it's holding up the Stable moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable, and I'm seeing fewer issues in our customers running Hadoop HDFS 2 compared to Hadoop 1-derived code.

You know I respect you a ton, but I'm very saddened to see you perpetuate this FUD on our public lists. I expected better, particularly when everyone is working towards the same goals of advancing Hadoop-2. This sniping at other members doing work is, um, I'll just stop here rather than regret it later.

2. HDFS is more mature than YARN.
Not a surprise, given that we all agree YARN is alpha, and a much newer project than HDFS that hasn't been deployed in production environments yet (to my knowledge).

Let's focus on the ground reality here. Please read my (or Rajiv's) message again about YARN's current stability, how well it's baked, and its planned deployment to a very large cluster in a few *days*. Or, talk to the people developing, testing and supporting these customers and clusters.

I'll repeat - YARN has clearly baked much more than HDFS HA, given the basic bugs (upgrade, edit log corruption etc.) we've seen after it was declared *done*; but then we just disagree, since clearly I'm more conservative. Also, we need to be more conservative wrt HDFS - but then what would I know...

I'll admit it's hard to discuss with someone (or a collective) who just repeats themselves. Plus, I broke my own rule about email this weekend - so, I'll try harder. Arun

-- Todd Lipcon Software Engineer, Cloudera
Re: Heads up: next hadoop-2 release
On Fri, Aug 31, 2012 at 1:15 PM, Eli Collins e...@cloudera.com wrote:
Yea, I think we should nuke 2.1.0-alpha and re-create it when we're actually going to do a release. On the HDFS side there are quite a few things already waiting to get out; if it's going to take another 4 or so weeks, then it would be great to shoot for getting HDFS-3077 in.

Seems doable to me. I'm in the finishing-touches stage now, and feeling pretty confident about the basic protocol after a few machine-years of fault injection testing, plus some early test results on a 100 node QA setup. After the current round of open JIRAs goes in, I'll start a sweep for findbugs, removing TODOs, and adding a few more stress tests. Then I think it will be a good time to propose a merge. -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: Large feature development
Thanks for starting this thread, Steve. I think your points below are good. I've snipped most of your comment and will reply inline to one bit below:

On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran steve.lough...@gmail.com wrote:
Of the big changes that have worked, they are:

1. HDFS 2's HA and ongoing improvements: collaborative dev on the list with incremental changes going on in trunk, RTC with lots of tests. This isn't finished, and the test problem there is that functional testing of all failure modes requires software-controlled fencing devices and switches - and tests to generate the expected failure space.

Actually, most of the HDFS HA code has been done on branches. The first work that led towards HA was the redesign of the edits logging infrastructure -- HDFS-1073. This was a feature branch with about 60 patches on it. Then HDFS-1623, the main manual-failover HA development, had close to 150 patches on the branch. Automatic HA (HDFS-3042) was some 15-20 patches. The current work (removing the dependency on NAS) is around 35 patches in so far and getting close to merge.

In these various branches, we've experimented with a few policies which have differed from trunk. In particular:

- HDFS-1073 had a modified review-then-commit policy: if a patch sat without a review for more than 24 hours, we committed it, with the restriction that there would be a post-commit review before the branch was merged.

- All of the branches have done away with the requirement of running the full QA suite, findbugs, etc. prior to commit. This means that the branches at times have had broken tests checked in, but it also makes it quicker to iterate on the new feature. Again, the assumption is that these requirements are met before merge.

- In all cases there has been a design doc and some good design discussion up front, before substantial code was written. This made it easier to forge ahead on the branch with good confidence that the community was on board with the idea.
Given my experiences, I think all of the above are useful to follow. It means development can happen quickly, but ensures that when the merge is proposed, people feel like the quality meets our normal standards.

2. YARN: Arun on his own branch, CTR, merge once mostly stable, and completely replacing MRv1.

I'd actually contend that YARN was merged too early. I have yet to see anyone running YARN in production, and it's holding up the Stable moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable, and I'm seeing fewer issues in our customers running Hadoop HDFS 2 compared to Hadoop 1-derived code.

How then do we get (a) more dev projects working and integrated by the current committers, and (b) a process in which people who are not yet contributors/committers can develop non-trivial changes to the project in a way that is done with the knowledge, support and mentorship of the rest of the community?

Here's one proposal, making use of git as an easy way to allow non-committers to commit code while still tracking development in the usual places:

- Upon anyone's request, we create a new Version tag in JIRA.

- The developers create an umbrella JIRA for the project, and file the individual work items as subtasks (either up front, or as they are developed, if using a more iterative model).

- On the umbrella, they add a pointer to a git branch to be used as the staging area for the branch. As they develop each subtask, they can use the JIRA to discuss the development like they would with a normally committed JIRA, but when they feel it is ready to go (not requiring a +1 from any committer) they commit to their git branch instead of the SVN repo.

- When the branch is ready to merge, they can call a merge vote, which requires +1 from 3 committers, the same as a branch being proposed by an existing committer. A committer would then use git-svn to merge their branch commit-by-commit, or, if it is less extensive, simply generate a single big patch to commit into SVN.
My thinking is that this would provide a low-friction way for people to collaborate with the community and develop in the open, without having to work closely with a committer to review every individual subtask.

Another alternative, if people are reluctant to use git, would be to add a sandbox/ repository inside our SVN, and hand out the commit bit for branches in there without any PMC vote. Anyone interested in contributing could request a branch in the sandbox, and be granted access as soon as they get an Apache SVN account. -Todd

-- Todd Lipcon Software Engineer, Cloudera
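The git-based flow proposed above can be sketched in a few commands. This is purely illustrative: the repository is a throwaway stand-in for a real Apache mirror checkout, and the JIRA and branch names (HDFS-9999, etc.) are made up for the example.

```shell
# Hypothetical sketch of the proposed non-committer flow. A throwaway local
# repo stands in for a clone of the Apache git mirror; JIRA numbers are fake.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Example Dev"

# Simulate the upstream trunk the contributor branched from.
echo base > file.txt && git add file.txt && git commit -qm "trunk baseline"
git branch -m trunk

# Start a feature branch named for the umbrella JIRA.
git checkout -qb HDFS-9999-feature

# Develop each subtask as its own commit, referencing its subtask JIRA.
# Per the proposal, no committer +1 is needed before committing here.
echo change >> file.txt && git add file.txt
git commit -qm "HDFS-9999.1: first subtask"

# At merge time, a committer can replay commits via git-svn, or collapse
# the whole branch into a single patch to commit into SVN:
git format-patch trunk --stdout > HDFS-9999-branch.patch
```

The single-patch route loses per-subtask history in SVN, which is why the commit-by-commit git-svn replay is suggested for larger branches.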
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
, operate as distinct communities, and try to solve the code duplication/dependency issues from there.

7. If 4b, then graduate as TLP from Incubator. -snip

So that's my proposal. Thanks guys. Cheers, Chris

++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++

-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
Arun, great work below. Concrete, and an actual proposal of PMC lists. What do folks think?

I already expressed my opinion above on the thread that the whole idea of splitting is crazy. But I'll comment on some specifics of the proposal as well:

I think the simplest way is to have all existing HDFS committers be committers and PMC members of the new project. That list is found in the asf-authorization-template, which has:

Why? If we were to do this, why not take the opportunity to narrow it down to the people who are actually active contributors to the project? (per your reasoning on the YARN thread)

hadoop-hdfs = acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao

Of these, only the following people have actually contributed more than 5 patches to common and HDFS in the last year:

Hairong Kuang (7):
Vinod Kumar Vavilapalli (7):
Daryn Sharp (8):
Matthew J. Foley (10):
Devaraj Das (11):
Mahadev Konar (15):
Eric Yang (18):
Sanjay Radia (18):
Thomas Graves (18):
Thomas White (21):
Konstantin Shvachko (23):
Steve Loughran (24):
Arun Murthy (32):
Uma Maheswara Rao G (36):
Jitendra Nath Pandey (51):
Harsh J (68):
Robert Joseph Evans (71):
Alejandro Abdelnur (106):
Suresh Srinivas (107):
Aaron Twining Myers (171):
Tsz-wo Sze (184):
Eli Collins (252):
Todd Lipcon (286):

So I would propose: atm,daryn,ddas,eli,eyang,hairong,harsh,jitendra,mahadev,mattf,shv,sradia,stevel,suresh,szetszwo,todd,tomwhite,tucu,umamahesh and listing the others as Emeritus, who could easily regain committer status if they started contributing again.

Proposal: Apache Hadoop MapReduce as a TLP

I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.
I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. That list is found in the asf-authorization-template, which has:

hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao

Applying the same criteria, the list would be:

Suresh Srinivas (6):
Aaron Twining Myers (7):
Steve Loughran (7):
Ravi Gummadi (9):
Konstantin Shvachko (11):
Todd Lipcon (12):
Tsz-wo Sze (16):
Amar Kamat (17):
Harsh J (20):
Eli Collins (21):
Thomas White (27):
Siddharth Seth (46):
Thomas Graves (60):
Alejandro Abdelnur (71):
Robert Joseph Evans (107):
Mahadev Konar (118):
Vinod Kumar Vavilapalli (164):
Arun Murthy (209):

(this is based on git shortlog on the directories in the repository)

But I still think this discussion is silly, and we're not ready to do it.

-- Todd Lipcon Software Engineer, Cloudera
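The "more than 5 patches" cut above was computed with git shortlog. A minimal sketch of that counting step is below; it runs against a throwaway repo as a stand-in, since the real command would run inside a Hadoop checkout, and the exact paths and date range used for the actual numbers are not given in the thread.

```shell
# Sketch of counting per-author commits with git shortlog, as used for the
# proposed committer cut. The repo is a throwaway stand-in for a Hadoop
# checkout; the threshold of 5 matches the proposal in the thread.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# One prolific author...
git config user.email alice@example.com
git config user.name "Alice Active"
for i in 1 2 3 4 5 6 7; do
  echo "$i" > f.txt && git add f.txt && git commit -qm "commit $i"
done

# ...and one occasional author.
git config user.email bob@example.com
git config user.name "Bob Occasional"
echo done > g.txt && git add g.txt && git commit -qm "one-off commit"

# -n sorts by count, -s prints summary counts only; keep authors with > 5.
# (A real run would add e.g. --since="1 year ago" -- <project directories>.)
git shortlog -ns HEAD | awk -F'\t' '$1 > 5 {print $2}'
# prints: Alice Active
```

Since shortlog groups by author name, a real run would also want a .mailmap to fold together the multiple addresses some contributors commit under.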
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik c...@apache.org wrote:
I am curious where the arbitrary number 5 is coming from: is it reflected in the bylaws?

Nope, I picked it based on Arun's earlier use of the same number in the YARN thread. We have no bylaws about what would happen in the eventual TLP-ification of subcomponents, of course. -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
technical issues that this would create. I want to spend that time making the product better, for our users' benefit. Whether the users are Apache community users, or Cloudera customers, or Facebook's data scientists, they all are going to be happier if I spend a month improving our HA support than if I spend a month figuring out how to release three separate projects which somehow stitch together in a reasonable way at runtime without jar conflicts, tons of duplicate configuration work, byzantine version dependencies, etc. -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Merge *-user@hadoop lists?
Sure, +1. I already subscribe to all and filter into the same mailbox anyway :) -Todd

On Fri, Jul 20, 2012 at 11:34 AM, Mahadev Konar maha...@hortonworks.com wrote:
+1.

On Fri, Jul 20, 2012 at 10:48 AM, Jitendra Pandey jiten...@hortonworks.com wrote:
+1 for merging.

On Thu, Jul 19, 2012 at 11:25 PM, Arun C Murthy a...@hortonworks.com wrote:
I've been thinking that we currently have too many *-user@ lists (common, hdfs, mapreduce); they confuse folks all the time, resulting in too many cross-posts etc., particularly for new users. Basically, it's too unwieldy and tedious. How about simplifying things by having a single user@hadoop.apache.org list, by merging all of them? Thoughts? Arun

-- http://hortonworks.com/download/

-- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha-rc1
OK, the fixes to CHANGES.txt and JIRA are complete. Sorry for the mail bomb ;-) -Todd

On Tue, May 15, 2012 at 10:30 PM, Todd Lipcon t...@cloudera.com wrote:
Thanks for posting the new RC. Will take a look tomorrow. Meanwhile, I'm going through CHANGES.txt and JIRA and moving things that didn't make the 2.0.0 cut to 2.0.1. So, if folks commit things tomorrow, please check to put them in the right spot in CHANGES.txt and in JIRA. I'll take care of anything committed tonight that would conflict with my change. -Todd

On Tue, May 15, 2012 at 7:20 PM, Arun C Murthy a...@hortonworks.com wrote:
I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

-- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha
Hi Kumar, It looks like that patch was only committed to trunk, not branch-2. IMO we should keep the new changes for 2.0.0-alpha to a minimum (just things that impact client-server wire compatibility) and then plan a 2.0.1-alpha ASAP following this release, where we can pull in everything else that went into branch-2 in the couple of weeks since the 2.0.0-alpha branch was cut.

Arun: do you have time today to roll a new RC? If not, I am happy to do so. Does that sound reasonable? -Todd

On Tue, May 15, 2012 at 8:51 AM, Kumar Ravi kum...@us.ibm.com wrote:
Hi, Can HDFS-3265 be included too? It seems like this was marked for inclusion, but I can't seem to find the patch in the branch-2.0.0-alpha tree. Thanks, Kumar

From: Todd Lipcon t...@cloudera.com To: general@hadoop.apache.org Date: 05/14/2012 11:21 PM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha

Hey Arun, One more thing on the rc tarball: the source artifact doesn't appear to be an exact svn export, based on a diff. For example, it includes the README, NOTICE, and LICENSE files, as well as a few other things which appear to be build artifacts (e.g. hadoop-hdfs-project/hadoop-hdfs/downloads, hadoop-hdfs-project/hadoop-hdfs/test_edit_log, etc). It seems like we _should_ have the various README-style files, but we shouldn't have the test artifacts in our source release. In order to get our source release to match svn, perhaps we should move NOTICE, README, LICENSE, etc. to the top level of our svn repo, such that a pure svn export would be a releasable source artifact? -Todd

On Mon, May 14, 2012 at 2:14 PM, Siddharth Seth seth.siddha...@gmail.com wrote:
Do we want to get MAPREDUCE-4067 in as well? It affects folks who may be writing their own AMs. Shouldn't affect MR clients though.
I believe 2.0 alpha doesn't freeze the YARN protocols for the 2.0 branch, so probably not critical. Thanks - Sid

On Mon, May 14, 2012 at 1:32 PM, Eli Collins e...@cloudera.com wrote:
As soon as JIRA is back up and I can post an updated patch, I'll merge HDFS-3418 (also incompatible).

On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze szets...@yahoo.com wrote:
I have just merged HADOOP-8285 and HADOOP-8366. I have also merged HDFS-3211, since it is an incompatible protocol change (without it, 2.0.0-alpha and 2.0.0 will be incompatible). Tsz-Wo

----- Original Message ----- From: Tsz Wo Sze szets...@yahoo.com To: general@hadoop.apache.org Sent: Monday, May 14, 2012 11:07 AM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
Let me merge HADOOP-8285 and HADOOP-8366. Thanks. Tsz-Wo

----- Original Message ----- From: Uma Maheswara Rao G mahesw...@huawei.com To: general@hadoop.apache.org Sent: Monday, May 14, 2012 10:56 AM Subject: RE: [VOTE] Release hadoop-2.0.0-alpha
a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here.

I have merged the HDFS-3157 revert. Do you mind taking a look at HADOOP-8285 and HADOOP-8366? Thanks, Uma

From: Arun C Murthy [a...@hortonworks.com] Sent: Monday, May 14, 2012 10:24 PM To: general@hadoop.apache.org Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
Todd, Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll RC1. thanks, Arun

On May 12, 2012, at 10:05 PM, Todd Lipcon wrote:
Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS:

1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. So, I'm nervous about putting it in a release, even one labeled as alpha.

2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. That would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we were all in agreement that, if possible, we'd like all 2.x releases to be compatible for client-server communication, at least (even if we don't support cross-version compatibility for the intra-cluster protocols).

Do other folks think it's worth rolling an rc1? I would propose either:

a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here.

or:

b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2.
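The tarball check Todd describes in this thread, comparing the release source artifact against a pristine SCM export, amounts to a recursive diff. The sketch below builds two throwaway directories as stand-ins (the real comparison would be between `svn export` output and the unpacked release tarball); all paths and file names here are illustrative.

```shell
# Sketch of comparing a source tarball's contents against a clean SCM export.
# Directory names are stand-ins; test_edit_log mirrors the stray build
# artifact mentioned in the thread.
set -e
work=$(mktemp -d)
mkdir -p "$work/svn-export" "$work/release-src"

# Files present in both trees compare clean.
echo "same contents" > "$work/svn-export/README.txt"
echo "same contents" > "$work/release-src/README.txt"

# A build artifact present only in the tarball shows up in the report.
echo "junk" > "$work/release-src/test_edit_log"

# -r recurses into subdirectories; -q reports only which files differ or
# are missing. diff exits nonzero when trees differ, so tolerate that here.
diff -rq "$work/svn-export" "$work/release-src" || true
```

An empty report would mean the tarball is an exact export, which is the property the thread argues a source release should have.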
Re: [VOTE] Release hadoop-2.0.0-alpha
On Tue, May 15, 2012 at 11:10 AM, Arun C Murthy a...@hortonworks.com wrote: Any more HDFS related merges before I roll RC1? I'm good as is. Thanks! On May 15, 2012, at 10:05 AM, Arun C Murthy wrote: Eli, is this done so I can roll rc1? On May 14, 2012, at 1:32 PM, Eli Collins wrote: As soon as jira is back up and I can post an updated patch I'll merge HDFS-3418 (also incompatible). On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze szets...@yahoo.com wrote: I just have merged HADOOP-8285 and HADOOP-8366. I also have merged HDFS-3211 since it is an incompatible protocol change (without it, 2.0.0-alphaand 2.0.0 will be incompatible.) Tsz-Wo - Original Message - From: Tsz Wo Sze szets...@yahoo.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 11:07 AM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Let me merge HADOOP-8285 and HADOOP-8366. Thanks. Tsz-Wo - Original Message - From: Uma Maheswara Rao G mahesw...@huawei.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 10:56 AM Subject: RE: [VOTE] Release hadoop-2.0.0-alpha a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. I have merged HDFS-3157 revert. Do you mind taking a look at HADOOP-8285 and HADOOP-8366? Thanks, Uma From: Arun C Murthy [a...@hortonworks.com] Sent: Monday, May 14, 2012 10:24 PM To: general@hadoop.apache.org Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Todd, Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll RC1. thanks, Arun On May 12, 2012, at 10:05 PM, Todd Lipcon wrote: Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS: 1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. 
So, I'm nervous about putting it in a release, even labeled as alpha. 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. So, that would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we all were in agreement that, if possible, we'd like all 2.x to be compatible for client-server communication, at least (even if we don't support cross-version for the intra-cluster protocols) Do other folks think it's worth rolling an rc1? I would propose either: a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. or: b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2. -Todd On Fri, May 11, 2012 at 7:19 PM, Eli Collins e...@cloudera.com wrote: +1 I installed the build on a 6 node cluster and kicked the tires, didn't find any blocking issues. Btw in the future better to build from the svn repo so the revision is an svn rev from the release branch. Eg 1336254 instead of 40e90d3c7 which is from the git mirror, this way we're consistent across releases. hadoop-2.0.0-alpha $ ./bin/hadoop version Hadoop 2.0.0-alpha Subversion git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55 Compiled by hortonmu on Wed May 9 16:19:55 UTC 2012 From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888 On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. 
This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
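Todd's second concern in this thread is about the RPC envelope rather than any one protocol: once the envelope's byte layout changes, every client built against the old layout stops interoperating, regardless of what the payload is. A toy illustration of that failure mode (this is an invented field layout, not Hadoop's actual wire format):

```python
import struct

# Hypothetical "old" RPC envelope: 4-byte call id + 2-byte method code.
def encode_old(call_id, method):
    return struct.pack("!IH", call_id, method)

# Hypothetical "new" envelope: a 1-byte version field inserted in front
# (the kind of change HADOOP-8285/HADOOP-8366 made to the real envelope).
ENVELOPE_VERSION = 1

def encode_new(call_id, method):
    return struct.pack("!BIH", ENVELOPE_VERSION, call_id, method)

def decode_new(data):
    version, call_id, method = struct.unpack("!BIH", data)
    if version != ENVELOPE_VERSION:
        raise ValueError("unsupported envelope version: %d" % version)
    return call_id, method

# A client built against the old format cannot talk to a server that
# expects the new one: the byte layouts simply differ (6 vs 7 bytes here),
# which is why all alphas within 2.x needed the same envelope.
old_bytes = encode_old(42, 7)
new_bytes = encode_new(42, 7)
assert len(old_bytes) != len(new_bytes)
```

This is why getting the envelope change into the first 2.0.0 alpha mattered more than any individual protocol change: the envelope is shared by every RPC.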
Re: [VOTE] Release hadoop-2.0.0-alpha-rc1
Thanks for posting the new RC. Will take a look tomorrow. Meanwhile, I'm going through CHANGES.txt and JIRA and moving things that didn't make the 2.0.0 cut to 2.0.1. So, if folks commit things tomorrow, please check to put it in the right spot in CHANGES.txt and in JIRA. I'll take care of anything committed tonight that would conflict with my change. -Todd On Tue, May 15, 2012 at 7:20 PM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha
Hey Arun, One more thing on the rc tarball: the source artifact doesn't appear to be an exact svn export, based on a diff. For example, it includes the README, NOTICE, and LICENSE files, as well as a few other things which appear to be build artifacts (e.g. hadoop-hdfs-project/hadoop-hdfs/downloads, hadoop-hdfs-project/hadoop-hdfs/test_edit_log, etc). It seems like we _should_ have the various README style files, but we shouldn't have the test artifacts in our source release. In order to get our source release to match svn, perhaps we should move NOTICE, README, LICENSE, etc to the top level of our svn repo, such that a pure svn export would be a releasable source artifact? -Todd On Mon, May 14, 2012 at 2:14 PM, Siddharth Seth seth.siddha...@gmail.com wrote: Do we want to get MAPREDUCE-4067 in as well? It affects folks who may be writing their own AMs. Shouldn't affect MR clients though. I believe 2.0 alpha doesn't freeze the Yarn protocols for the 2.0 branch, so probably not critical. Thanks - Sid On Mon, May 14, 2012 at 1:32 PM, Eli Collins e...@cloudera.com wrote: As soon as jira is back up and I can post an updated patch I'll merge HDFS-3418 (also incompatible). On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze szets...@yahoo.com wrote: I have just merged HADOOP-8285 and HADOOP-8366. I also have merged HDFS-3211 since it is an incompatible protocol change (without it, 2.0.0-alpha and 2.0.0 will be incompatible.) Tsz-Wo - Original Message - From: Tsz Wo Sze szets...@yahoo.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 11:07 AM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Let me merge HADOOP-8285 and HADOOP-8366. Thanks.
Tsz-Wo - Original Message - From: Uma Maheswara Rao G mahesw...@huawei.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 10:56 AM Subject: RE: [VOTE] Release hadoop-2.0.0-alpha a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. I have merged HDFS-3157 revert. Do you mind taking a look at HADOOP-8285 and HADOOP-8366? Thanks, Uma From: Arun C Murthy [a...@hortonworks.com] Sent: Monday, May 14, 2012 10:24 PM To: general@hadoop.apache.org Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Todd, Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll RC1. thanks, Arun On May 12, 2012, at 10:05 PM, Todd Lipcon wrote: Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS: 1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. So, I'm nervous about putting it in a release, even labeled as alpha. 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. So, that would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we all were in agreement that, if possible, we'd like all 2.x to be compatible for client-server communication, at least (even if we don't support cross-version for the intra-cluster protocols) Do other folks think it's worth rolling an rc1? I would propose either: a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. or: b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2. 
-Todd On Fri, May 11, 2012 at 7:19 PM, Eli Collins e...@cloudera.com wrote: +1 I installed the build on a 6 node cluster and kicked the tires, didn't find any blocking issues. Btw in the future better to build from the svn repo so the revision is an svn rev from the release branch. Eg 1336254 instead of 40e90d3c7 which is from the git mirror, this way we're consistent across releases. hadoop-2.0.0-alpha $ ./bin/hadoop version Hadoop 2.0.0-alpha Subversion git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55 Compiled by hortonmu on Wed May 9 16:19:55 UTC 2012 From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888 On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun
Re: [VOTE] Release hadoop-2.0.0-alpha
Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS: 1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. So, I'm nervous about putting it in a release, even labeled as alpha. 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. So, that would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we all were in agreement that, if possible, we'd like all 2.x to be compatible for client-server communication, at least (even if we don't support cross-version for the intra-cluster protocols) Do other folks think it's worth rolling an rc1? I would propose either: a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. or: b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2. -Todd On Fri, May 11, 2012 at 7:19 PM, Eli Collins e...@cloudera.com wrote: +1 I installed the build on a 6 node cluster and kicked the tires, didn't find any blocking issues. Btw in the future better to build from the svn repo so the revision is an svn rev from the release branch. Eg 1336254 instead of 40e90d3c7 which is from the git mirror, this way we're consistent across releases. 
hadoop-2.0.0-alpha $ ./bin/hadoop version Hadoop 2.0.0-alpha Subversion git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55 Compiled by hortonmu on Wed May 9 16:19:55 UTC 2012 From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888 On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha
Hi Andrew, Have you seen the new MiniMRClientCluster class? It's meant to be what you describe - a minicluster which only exposes external APIs -- most importantly a way of getting at a JobClient to submit jobs. We have it implemented in both 1.x and 2.x at this point, though I don't recall if it's in the 1.0.x releases or if it's only slated for 1.1+ -Todd On Wed, May 9, 2012 at 6:05 PM, Andrew Purtell andrew.purt...@gmail.com wrote: Hi Suresh, The unstable designation makes sense. As would one for MiniMRCluster. I was over the top initially to surprise. I'm sure the MR minicluster seems a minor detail. Maybe it's worth thinking about the miniclusters differently? Please pardon if I am rehashing an old discussion. Things like MRUnit for applications and BigTop for full cluster tests can help, but, as mentioned in the annotation below, Pig, Hive, HBase, and other parts of the stack use miniclusters for local end to end testing in unit tests. As the complexity of the stack increases and we consider cross version support, unit tests on miniclusters I think will have no substitute. As Hadoop 2 has been evolving there has been some difficulty keeping up with minicluster changes. This makes sense. The attention to stability of client APIs and such, and the lack thereof to the minicluster, I think is self evident. But the need to fix up tests unpredictably introduces some friction that perhaps need not be there. Would a JIRA to discuss defining a subset of the minicluster interfaces as more stable be worthwhile? Best regards, - Andy On May 9, 2012, at 1:45 PM, Suresh Srinivas sur...@hortonworks.com wrote: For this reason, in HDFS, we change MiniDFSCluster to LimitedPrivate and not treat it as such: @InterfaceAudience.LimitedPrivate({"HBase", "HDFS", "Hive", "MapReduce", "Pig"}) @InterfaceStability.Unstable public class MiniDFSCluster { ...} On Wed, May 9, 2012 at 11:33 AM, Andrew Purtell apurt...@apache.org wrote: Sounds good Arun.
How should we consider the suitability and stability of MiniMRCluster for downstream projects? On Wed, May 9, 2012 at 11:30 AM, Arun C Murthy a...@hortonworks.com wrote: No worries Andy. I can spin an rc1 once we can pin-point the bug. thanks, Arun On May 9, 2012, at 10:17 AM, Andrew Purtell wrote: -1 (nonbinding), we are currently facing a minicluster semantic change of some kind, or more than one: https://issues.apache.org/jira/browse/HBASE-5966 There are other HBase JIRAs related to 2.0.0-alpha that we are working on, but I'd claim those are all our fault for breaking abstractions to solve issues. In one case there's a new helpful 2.x API (ShutdownHookManager, thank you!) that we can eventually move to. However, the minicluster changes are causing us some repeated discomfort. It will break, we'll get some help fixing up our tests for that, then some time later it will break again, repeat. Perhaps we have no right to complain, the minicluster isn't meant to be used by downstream projects. If so then please disregard the complaint, but your assistance in helping to fix the breakage again would be much appreciated. And, if so, perhaps we can discuss what makes sense in terms of a stable minicluster consumable for downstream projects? Best regards, - Andy On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Arun C. Murthy Hortonworks Inc. 
http://hortonworks.com/ -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Todd Lipcon Software Engineer, Cloudera
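The ask in this thread - a minicluster surface that downstream projects can safely code against - is essentially a facade: one deliberately small class that tests depend on, while everything behind it stays free to churn between releases. That is the pattern MiniMRClientCluster follows; a language-neutral sketch of the idea in Python (class and method names invented, not Hadoop's):

```python
class UnstableMiniCluster:
    """Stand-in for an internal minicluster whose API may change freely
    between releases (the situation HBase kept running into)."""
    def __init__(self):
        self._conf = {"fs.defaultFS": "hdfs://localhost:8020"}

    def internal_start_daemons(self):  # internal detail; may be renamed any time
        self.running = True

    def internal_conf(self):
        return self._conf

class MiniClusterFacade:
    """The only surface downstream projects should code against.
    Its methods stay fixed even when UnstableMiniCluster changes."""
    def __init__(self):
        self._impl = UnstableMiniCluster()

    def start(self):
        self._impl.internal_start_daemons()

    def get_config(self):
        # Hand back a copy so callers cannot grow dependencies on
        # internal mutable state.
        return dict(self._impl.internal_conf())

cluster = MiniClusterFacade()
cluster.start()
assert "fs.defaultFS" in cluster.get_config()
```

The design trade-off is the one Suresh's annotation encodes: the facade gets a stability promise, the implementation behind it explicitly does not.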
Re: [VOTE] Release hadoop-0.23.2-rc0
On Thu, Apr 19, 2012 at 12:26 PM, Eli Collins e...@cloudera.com wrote: On Thu, Apr 19, 2012 at 11:45 AM, Arun C Murthy a...@hortonworks.com wrote: Yep, makes sense - I'll roll an rc0 for 2.0 after. However, we should consider whether HDFS protocols are 'ready' for us to commit to them for the foreseeable future, my sense is that it's a tad early - particularly with auto-failover not complete. Thus, we have a couple of options: a) Call the first release here as *2.0.0-alpha* version (lots of ASF projects do this). b) Just go with 2.0.0 and deem 2.0.x or 2.1.x as the first stable release and fwd-compatible release later. Given this is a major release (unlike something obscure like hadoop-0.23.0) I'm inclined to go with a) i.e. hadoop-2.0.0-alpha. Thoughts? Agree that we're a little too early on the HDFS protocol side, think MR2 is probably in a similar boat wrt stability as well. +1 to option a, calling it hadoop-2.0.0-alpha seems most appropriate. Regarding protocols: +1 to _not_ locking down cluster-internal wire compatibility at this point. i.e. we can break DN-NN, or NN-SBN, or Admin command - NN compatibility still. +1 to locking down client wire compatibility with the release of 2.0. After 2.0 is released I would like to see all 2.0.x clients continue to be compatible. Now that we are protobuf-ified, I think this is doable. Should we open a separate discussion thread for the above? Regarding version numbering: either of the proposals seems fine by me. -Todd Arun On Apr 19, 2012, at 12:24 AM, Eli Collins wrote: Hey Arun, This vote passed a week or so ago, let's make it official? Also, are you still planning to roll a hadoop-2.0.0-rc0 of branch-2 this week? I think we should do that soon, if you're not planning to do this holler and I'd be happy to. There's only 1 blocker left (http://bit.ly/I55LAd) and it's patch available, I think we should roll an rc from branch-2 when it's merged.
Thanks, Eli On Thu, Mar 29, 2012 at 4:07 PM, Arun C Murthy a...@hortonworks.com wrote: 0.23.2 is just a small set of bug-fixes on top of 0.23.1 and doesn't have NN HA etc. As I've noted separately, I plan to put out a hadoop-2.0.0-rc0 in a couple weeks with NN HA, PB for HDFS etc. thanks, Arun On Mar 29, 2012, at 3:55 PM, Ted Yu wrote: What are the issues fixed / features added in 0.23.2 compared to 0.23.1? Thanks On Thu, Mar 29, 2012 at 3:45 PM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-0.23.2 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.2-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
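Todd's remark that client wire compatibility is "doable" now that the protocols are protobuf-ified rests on one property of protobuf: decoders identify fields by number and silently skip numbers they don't recognize, so a newer server can add optional fields without breaking older clients. A simplified model of that rule (plain dicts stand in for the real varint wire encoding):

```python
def decode(fields, known):
    """Decode a message given {field_number: value}, keeping only fields the
    reader knows about. Mirrors protobuf's unknown-field rule: new optional
    fields added by one side don't break the other side's decoder."""
    return {known[num]: val for num, val in fields.items() if num in known}

# A hypothetical 2.0.0 client knows fields 1 and 2 of some response message.
OLD_SCHEMA = {1: "path", 2: "offset"}

# A later server adds field 3; the old client simply skips it.
msg_from_new_server = {1: "/user/todd/file", 2: 4096, 3: "checksum-v2"}
assert decode(msg_from_new_server, OLD_SCHEMA) == {
    "path": "/user/todd/file",
    "offset": 4096,
}
```

The flip side is why the thread still carves out the intra-cluster protocols: renumbering or removing a field breaks this guarantee, so "we can break DN-NN" really means the project reserves the right to make exactly those non-additive changes internally.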
Re: Naming of Hadoop releases
On Mon, Mar 19, 2012 at 11:02 PM, Konstantin Shvachko shv.had...@gmail.com wrote: Feature freeze has been broken so many times for the .20 branch, so that it became a norm for the entire project rather than an exception, which we had in the past. I agree we should be stricter about what feature backports we allow into stable branches. Security and hflush were both necessary evils - I'm glad now that we have them, but we should try to stay out of these types of situations in the future where we feel compelled to backport (or re-do in the case of hflush/sync) such large items. I don't understand this constant segregation against Hadoop .22. It is a perfectly usable version of Hadoop. It would be a waste not to have it released. Very glad that universities adopted it. If somebody needs security there is a number of choices, Hadoop-1 being the first. But if you cannot afford stand-alone HBase clusters or need to combine general Hadoop and HBase loads there is nothing else but Hadoop 0.22 at this point. I don't see what HBase has to do with it. In fact HBase runs way better on 1.x compared to 0.22. The tests don't even pass on 0.22 due to differences in the append semantics in 0.21+ compared to 0.20. Every production HBase deploy I know about runs on a 1.x-based distribution. You could argue this is selection bias by nature of my employer, but the same is true based on emails to the hbase-user lists, etc. This is orthogonal to the discussion at hand, I just wanted to correct this lest any users get the wrong perception and migrate their HBase clusters to a version which is rarely used and strictly inferior for this use case. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Naming of Hadoop releases
On Mon, Mar 19, 2012 at 2:56 PM, Doug Cutting cutt...@apache.org wrote: On 03/19/2012 02:47 PM, Arun C Murthy wrote: This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed. In many cases the reason this happened was that features were backported from trunk to 0.20 but not to 0.22. In other words, it's no fault of the folks who were working on branch 0.22. I agree that it's no fault of the folks on 0.22. So a related policy we might add to prevent such situations in the future might be that if you backport something from branch n to n-2 then you ought to also be required to backport it to branch n-1 and in general to all intervening branches. Does that seem sensible? -1 on this requirement. Otherwise the cost of backporting something to the stable line becomes really high, and we'll end up with distributors just maintaining their own branches outside of Apache (the state we were in with 0.20.x). On the other hand, it does suck for users if they update from 1.x to 2.x and they end up losing some bug fixes or features they previously were running. Unfortunately, I don't have a better solution in mind that resolves the above problems - I just don't think it's tenable to combine a policy like "anyone may make a release branch off trunk and claim a major version number" with another policy like "you have to port a fix to all intermediate versions in order to port a fix to any of them". If a group of committers wants to make a release branch, then the maintenance of that branch should be up to them. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Rename hadoop branches post hadoop-1.x
My vote remains the same (binding):
(3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
(2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a hole.
(1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
(4) If security is fixed in branch-0.22 within a short time-frame i.e. 2 months then we get option 1, else we get option 3. Effectively postpone discussion by 2 months, start a timer now.
(5) Do nothing, keep branch-0.22 and branch-0.23 as-is.
On Mon, Mar 19, 2012 at 6:06 PM, Arun C Murthy a...@hortonworks.com wrote: We've discussed several options:
(1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
(2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a hole.
(3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
(4) If security is fixed in branch-0.22 within a short time-frame i.e. 2 months then we get option 1, else we get option 3. Effectively postpone discussion by 2 months, start a timer now.
(5) Do nothing, keep branch-0.22 and branch-0.23 as-is.
Let's do a STV [1] to reach consensus. Please vote by listing the options above in order of your preferences. My vote is 3, 4, 2, 1, 5 in order (binding). The vote will run the normal 7 days. thanks, Arun [1] http://en.wikipedia.org/wiki/Single_transferable_vote -- Todd Lipcon Software Engineer, Cloudera
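For a single-winner contest like this one, the STV procedure Arun links reduces to instant-runoff counting: tally first preferences, and if no option holds a majority, eliminate the weakest option and transfer its ballots to each voter's next surviving choice. A compact sketch of that counting rule (the two sample ballots are the ones cast in this thread; real STV for multiple seats also involves a quota and surplus transfers, which are omitted here):

```python
from collections import Counter

def stv_winner(ballots):
    """Single-winner STV (instant runoff). Each ballot is a list of options
    in preference order; ties on elimination are broken arbitrarily."""
    ballots = [list(b) for b in ballots]
    while True:
        tallies = Counter(b[0] for b in ballots if b)
        total = sum(tallies.values())
        leader, votes = tallies.most_common(1)[0]
        if votes * 2 > total:          # strict majority of active ballots
            return leader
        loser = min(tallies, key=lambda option: tallies[option])
        # Transfer the loser's ballots to each voter's next preference.
        ballots = [[o for o in b if o != loser] for b in ballots]

# The two preference orderings posted in this thread (options 1-5):
ballots = [[3, 4, 2, 1, 5],   # Arun's ballot
           [3, 2, 1, 4, 5]]   # Todd's ballot
assert stv_winner(ballots) == 3
```

With both first preferences on option 3, no transfers are needed; the elimination path only matters when first preferences split.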
Re: [VOTE] Release Apache Hadoop 0.23.1-rc2
-1, unfortunately. HDFS-2991 is a blocker regression introduced in 0.23.1. See the JIRA for instructions on how to reproduce on the rc2 build. -Todd On Fri, Feb 17, 2012 at 11:23 PM, Arun C Murthy a...@hortonworks.com wrote: I've created another release candidate for hadoop-0.23.1 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc2/ The hadoop-0.23.1-rc2 svn tag: https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.1-rc2 The maven artifacts for hadoop-0.23.1-rc2 are also available at repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release Apache Hadoop 0.23.1-rc2
On Wed, Feb 22, 2012 at 7:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Todd, From your analysis at HDFS-2991, looks like this was there in 0.23 too. Also, seems this happens only at scale, and only (paraphrasing you) when the file is reopened for append on an exact block boundary. Let me clarify: HDFS-2991 basically has two halves: First half (been present forever): when we append() on a block boundary, we don't log an OP_ADD. Second half (new due to HDFS-2718): if we get an OP_CLOSE for a file we haven't OP_ADDed, we'll get a ClassCastException on startup. So even though the first half isn't a regression, the regression in the second half means that this longstanding bug will now actually prevent startup. Also, there's nothing related to scale here. I happened to run into it doing scale tests, but it turned out to not be relevant. You'll see it if you run TestDFSIO with standard parameters on trunk or 23.1 (that's how I discovered it). Agree it is a critical fix, but given above, can we proceed along with 0.23.1? Anyways, 0.23.1 is still an alpha (albeit of next level), so I'd think we can get that in for 0.23.2. Alright, consider me -0, though it's pretty nasty once you run into it. The only way I could start my NN again without losing data was to recompile with the fix in place. -Todd On Wed, Feb 22, 2012 at 6:43 PM, Todd Lipcon t...@cloudera.com wrote: -1, unfortunately. HDFS-2991 is a blocker regression introduced in 0.23.1. See the JIRA for instructions on how to reproduce on the rc2 build. -Todd On Fri, Feb 17, 2012 at 11:23 PM, Arun C Murthy a...@hortonworks.com wrote: I've created another release candidate for hadoop-0.23.1 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc2/ The hadoop-0.23.1-rc2 svn tag: https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.1-rc2 The maven artifacts for hadoop-0.23.1-rc2 are also available at repository.apache.org.
Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-0.23.1-rc0
I just committed HDFS-2923 to branch-0.23. This bug can cause big performance issues on the NN since the number of IPC handlers will default way too low and won't be changed with the expected config. Since there's a workaround, it's not a regression since 0.23.0, and 0.23.1 is still going to be labeled alpha/beta, so it's up to you whether you want to spin an rc1 on account of just this bug. If there are other issues, though, definitely worth including this in rc1. -Todd On Wed, Feb 8, 2012 at 1:33 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-0.23.1 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc0/ Some highlights: # Since hadoop-0.23.0 in November there has been significant progress in branch-0.23 with nearly 400 jiras committed to it (68 in Common, 78 in HDFS and 242 in MapReduce). # An important aspect is that we've done a lot of performance related work and hadoop-0.23.1 matches or exceeds performance of hadoop-1 in pretty much every aspect of HDFS and MapReduce. # Also, several downstream projects (HBase, Pig, Oozie, Hive etc.) seem to be playing nicely with hadoop-0.23.1. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
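For context on the tuning knob involved: the NameNode's IPC handler pool is sized from hdfs-site.xml, and the "workaround" Todd mentions amounts to setting the key the broken code actually reads. The fragment below shows the general shape of such an override; treat the key name and value as illustrative and consult HDFS-2923 itself for the exact key affected and the exact workaround.

```xml
<!-- hdfs-site.xml: size of the NameNode's RPC handler pool.
     The value is illustrative; large clusters typically raise the small default. -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
</property>
```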
Re: [DISCUSS] Apache Hadoop 1.0?
On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran ste...@apache.org wrote: On 15/11/11 06:07, Dhruba Borthakur wrote: +1 to making the upcoming 0.23 release as 2.0. +1 And leave the 0.20.20x chain as is, just because people are used to it +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. Though it's weird to never have a 1.0, the 0.20 name is well ingrained, and I think renaming it at this point will cause a lot of confusion (plus cause problems for downstream projects like Hive and HBase which use regexes against the version string in various shim layers) -Todd -- Todd Lipcon Software Engineer, Cloudera
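The shim breakage Todd mentions is concrete: Hive and HBase dispatch to version-specific compatibility layers by matching Hadoop's version string, so renaming the 0.20 line to 1.0 would invalidate every existing pattern at once. A toy model of such a shim selector (the patterns and shim names here are invented, not the actual Hive/HBase code):

```python
import re

# Order matters: the first matching pattern wins.
SHIMS = [
    (re.compile(r"^0\.20\."), "Shims20"),
    (re.compile(r"^0\.23\."), "Shims23"),
]

def pick_shim(version):
    """Select a compatibility layer from the Hadoop version string."""
    for pattern, shim in SHIMS:
        if pattern.match(version):
            return shim
    raise RuntimeError("no shim for Hadoop version %s" % version)

assert pick_shim("0.20.205.0") == "Shims20"

# Renaming the release line to 1.0.x matches no pattern, so every
# downstream build against the old table fails at runtime:
try:
    pick_shim("1.0.0")
    assert False, "expected no shim to match"
except RuntimeError:
    pass
```

This is why keeping the ingrained 0.20 name, rather than retroactively renaming it, was the lower-risk choice for downstream projects.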
Re: Update on hadoop-0.23
On Tue, Oct 18, 2011 at 4:36 AM, Steve Loughran ste...@apache.org wrote: One more thing: are the ProtocolBuffers needed for all installations, or is that a compile-time requirement? If the binaries are going to be required, there's going to have to be one built for the various platforms, and source.deb/RPM files to build themselves on Linux. I'd rather avoid all that work The protobuf java jar is required at runtime. protoc (native) is only required at compile time. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Java Versions and Hadoop
I think requiring Java 7 is years off... I think most people have doubts as to Java 7's stability until it's been adopted by a majority of applications, and the new features aren't compelling enough to jump ship, IMO. -Todd On Fri, Oct 7, 2011 at 3:33 PM, milind.bhandar...@emc.com wrote: Hi Folks, While I have seen the wiki on which java versions to use currently to run Hadoop, I have not seen any discussion about the roadmap of java version compatibility with future hadoop versions. Recently, Oracle retired the Operating System Distributor License for Java (DLJ) [http://robilad.livejournal.com/90792.html, http://jdk-distros.java.net/] and Linux vendors have started making OpenJDK (6/7) as the default java version bundled with their OSs [http://www.java7developer.com/blog/?p=361]. Also, all future Java SE updates will be delivered through OpenJDK updates project. I see that OpenJDK6 (6b20pre) cannot be used to compile hadoop trunk. Has anyone tried OpenJDK7 ? Additionally, I have a few small projects in mind which can really make use of the new (esp I/O) features of Java 7. What, if any, timeline do hadoop developers have in mind to make Java 7 as required (and tested with OpenJDK 7) ? Thanks, - milind --- Milind Bhandarkar Greenplum Labs, EMC (Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.) -- Todd Lipcon Software Engineer, Cloudera
Re: Update on hadoop-0.23
On Fri, Sep 30, 2011 at 11:44 AM, Roman Shaposhnik r...@apache.org wrote: I apologize if my level of institutional knowledge of these things is lacking, but do you have any benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking is twofold -- I really would like to see objective numbers qualifying the viability of 0.22 from the performance standpoint, but more importantly I would really like to include the benchmarking code into Bigtop. 0.22 currently suffers from MAPREDUCE-2266, which, last time I benchmarked it, caused a significant slowdown. iirc a terasort ran something like twice as slow on my test cluster due to this bug. 0.23/MR2 doesn't suffer from this bug. -Todd -- Todd Lipcon Software Engineer, Cloudera
Welcoming Harsh J as a Hadoop committer
On behalf of the PMC, I am pleased to announce that Harsh J Chouraria has been elected a committer in the Apache Hadoop Common, HDFS, and MapReduce projects. Anyone subscribed to the mailing list or JIRA will undoubtedly recognize Harsh's name as one of the most helpful community members and an author of increasingly many code contributions. The Hadoop PMC and community appreciates Harsh's involvement and looks forward to continuing contributions! Welcome, Harsh! -Todd and the Hadoop Project Management Committee
Re: Add Append-HBase support in upcoming 20.205
The following other JIRAs have been committed in CDH for 18 months or so, for the purpose of HBase. You may want to consider backporting them as well - many were never committed to 0.20-append due to lack of reviews by HDFS committers at the time. HDFS-1056. Fix possible multinode deadlocks during block recovery when using ephemeral dataxceiv Description: Fixes the logic by which datanodes identify local RPC targets during block recovery for the case when the datanode is configured with an ephemeral data transceiver port. Reason: Potential internode deadlock for clusters using ephemeral ports HADOOP-6722. Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself Description: TCP's ephemeral port assignment results in the possibility that a client can connect back to its own outgoing socket, resulting in failed RPCs or datanode transfers. Reason: Fixes intermittent errors in cluster testing with ephemeral IPC/transceiver ports on datanodes. HDFS-1122. Don't allow client verification to prematurely add inprogress blocks to DataBlockScanner Description: When a client reads a block that is also open for writing, it should not add it to the datanode block scanner. If it does, the block scanner can incorrectly mark the block as corrupt, causing data loss. Reason: Potential dataloss with concurrent writer-reader case. HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch Description: Miscellaneous code cleanup and logging changes, including: - Slight cleanup to recoverFile() function in TestFileAppend4 - Improve error messages on OP_READ_BLOCK - Some comment cleanup in FSNamesystem - Remove toInodeUnderConstruction (was not used) - Add some checks for null blocks in FSNamesystem to avoid a possible NPE - Only log inconsistent size warnings at WARN level for non-under-construction blocks. 
- Redundant addStoredBlock calls are also not worthy of WARN level
- Add some extra information to a warning in ReplicationTargetChooser
Reason: Improves diagnosis of error cases and clarity of code

HDFS-1242. Add unit test for the appendFile race condition / synchronization bug fixed in HDFS-142
Reason: Test coverage for previously applied patch.

HDFS-1218. Replicas that are recovered during DN startup should not be allowed to truncate better replicas.
Description: If a datanode loses power and then recovers, its replicas may be truncated due to the recovery of the local FS journal. This patch ensures that a replica truncated by a power loss does not truncate the block on HDFS.
Reason: Potential dataloss bug uncovered by power failure simulation

HDFS-915. Write pipeline hangs for too long when ResponseProcessor hits timeout
Description: Previously, the write pipeline would hang for the entire write timeout when it encountered a read timeout (eg due to a network connectivity issue). This patch interrupts the writing thread when a read error occurs.
Reason: Faster recovery from pipeline failure for HBase and other interactive applications.

HDFS-1186. Writers should be interrupted when recovery is started, not when it's completed.
Description: When the write pipeline recovery process is initiated, this interrupts any concurrent writers to the block under recovery. This prevents a case where some edits may be lost if the writer has lost its lease but continues to write (eg due to a garbage collection pause)
Reason: Fixes a potential dataloss bug

commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
Author: Todd Lipcon t...@lipcon.org
Date: Sun Jun 13 23:02:38 2010 -0700

HDFS-1197. Received blocks should not be added to block map prematurely for under construction files
Description: Fixes a possible dataloss scenario when using append() on real-life clusters.
Also augments unit tests to uncover similar bugs in the future by simulating latency when reporting blocks received by datanodes.
Reason: Append support dataloss bug
Author: Todd Lipcon

HDFS-1260. tryUpdateBlock should do validation before renaming meta file
Description: Solves bug where block became inaccessible in certain failure conditions (particularly network partitions). Observed under HBase workload at user site.
Reason: Potential loss of synced data when write pipeline fails

On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas sur...@hortonworks.com wrote: I also propose the following JIRAs, which are non-append-related bug fixes from the 0.20-append branch
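The TCP quirk behind HADOOP-6722 above can be checked for directly: when a client binds an ephemeral port and connects to that same address, TCP simultaneous open can leave the socket connected to itself. A minimal sketch of such a guard in Python (my illustration only, not the actual NetUtils.connect code):

```python
import socket

def is_self_connection(sock):
    """Return True if a connected socket ended up connected to itself
    (local address == peer address), as can happen via TCP simultaneous
    open when ephemeral ports are in use."""
    return sock.getsockname() == sock.getpeername()

# Demonstrate the check on an ordinary client/server pair.
server = socket.socket()
server.bind(("127.0.0.1", 0))          # kernel picks an ephemeral port
server.listen(1)
client = socket.socket()
client.connect(server.getsockname())
assert not is_self_connection(client)  # a real connection passes the check
client.close()
server.close()
```

The actual fix rejects the connection (closes the socket and raises an error) when the condition is detected, so the caller retries from a fresh ephemeral port.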
Re: hadoop-0.23
On Thu, Aug 18, 2011 at 9:36 AM, Arun C Murthy a...@hortonworks.com wrote: Good morning! On Jul 13, 2011, at 3:39 PM, Arun C Murthy wrote: It's looking like trunk is moving along rapidly - it's about time to start thinking of the next release to unlock all of the goodies there. As the RM, my current thinking is that after we merge NextGen MR (MR-279) and the HDFS-1073 branch into trunk we should be good to create the hadoop-0.23 branch. Since the last time we spoke (actually, since last night, in fact!) the world (trunk) has changed to accommodate our wishes... *smile* HDFS-1073 and MAPREDUCE-279 are presently in trunk and I think it's time to cut the 0.23 branch so that we can focus on testing and stabilizing a hadoop-0.23 release off that branch. I propose to do it noon of the coming Monday (Aug 22). Thoughts? I assume we will make sure the HDFS mavenization is in before then? Tom said he intends to commit it tomorrow, but if something comes up and it's not committed, let's make sure mavenization happens before we branch. Also, what will be the guidelines for committing a change to 0.23 branch? Is it bug fix only or are we still allowing improvements? Given how recently MR2 was merged, I imagine there will be a lot of things that aren't strictly bugs that we will really want to have in our next release. I also have a couple of HDFS patches (eg the new faster CRC on-by-default) that I'd like to get into 23. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Vote to merge HDFS-1073 into trunk
Thanks for the votes. The vote has passed and I committed a merge to trunk just now. If anything breaks, don't hesitate to drop me a mail. -Todd On Thu, Jul 28, 2011 at 12:27 PM, Matt Foley mfo...@hortonworks.com wrote: +1 for the merge. I've read a majority of the code changes, excluding the BNN and 2NN, approaching from the big diff rather than individual patches, and starting with the files most changed from both current trunk and the 1073 branchpoint. I've found almost nothing to comment on. It looks like a solid job, it is a significant simplification of FSEditLog, and I have become confident that the merge should proceed. --Matt From: Eli Collins e...@cloudera.com Date: Tue, 19 Jul 2011 18:43:58 -0700 +1 for the merge. I've reviewed all but a handful of the 50+ individual patches, also looked at the merge patch for sanity and it looks good. From: Jitendra Pandey jiten...@hortonworks.com Date: Tue, 19 Jul 2011 18:23:39 -0700 +1 for the merge. I haven't looked at BackupNode changes in much detail, but apart from that the patch looks good. On Tue, Jul 19, 2011 at 6:12 PM, Todd Lipcon t...@cloudera.com wrote: Hi all, HDFS-1073 is now complete and ready to be merged. Many thanks to those who helped review in the last two weeks. Hudson test-patch results are available on HDFS-1073 JIRA - please see the recent comments there for explanations. A few notes that may help you vote: - I have run the NNThroughputBenchmark and seen just a small regression in logging performance due to the inclusion of a txid with every edit for increased robustness. - The NN read path and the read/write IO paths are entirely untouched by these changes. - Image and edit load time were benchmarked throughout development of the branch and no significant regressions have been seen. Since this is a code change, all committers should feel free to vote. The voting requires three committer +1s and no -1s to pass. 
I will not vote since I contributed the majority of the code in the branch, though obviously I'm +1 :) -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Vote to merge HDFS-1073 into trunk
Hi all, HDFS-1073 is now complete and ready to be merged. Many thanks to those who helped review in the last two weeks. Hudson test-patch results are available on HDFS-1073 JIRA - please see the recent comments there for explanations. A few notes that may help you vote: - I have run the NNThroughputBenchmark and seen just a small regression in logging performance due to the inclusion of a txid with every edit for increased robustness. - The NN read path and the read/write IO paths are entirely untouched by these changes. - Image and edit load time were benchmarked throughout development of the branch and no significant regressions have been seen. Since this is a code change, all committers should feel free to vote. The voting requires three committer +1s and no -1s to pass. I will not vote since I contributed the majority of the code in the branch, though obviously I'm +1 :) -Todd -- Todd Lipcon Software Engineer, Cloudera
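The "txid with every edit" mentioned in the benchmark note is what buys the extra robustness: a reader can detect a dropped or duplicated edit by checking that transaction IDs are gap-free and strictly sequential. A toy sketch of that validation (illustrative only; the real FSEditLog format is binary and considerably more involved):

```python
def validate_txids(txids, expected_first):
    """Check that a stream of edit-log transaction IDs is gap-free and
    strictly sequential starting at expected_first. Returns the last
    txid seen, or raises ValueError at the first inconsistency."""
    expected = expected_first
    for txid in txids:
        if txid != expected:
            raise ValueError(f"expected txid {expected}, found {txid}")
        expected += 1
    return expected - 1

# A clean log segment validates; a gap (e.g. a lost edit) is caught.
assert validate_txids([7, 8, 9], 7) == 9
```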
Re: Hoping to merge HDFS-1073 branch soon
On Tue, Jul 12, 2011 at 10:38 AM, sanjay Radia san...@hortonworks.com wrote: We can merge 1580 after 1073 is merged in. Looks like the biggest thing in your 1073 list is the Backup NN related changes. The BN-related changes are done and just awaiting code review. See HDFS-1979. The current list of patches awaiting review is: HDFS-1979, HDFS-2101, HDFS-2133, HDFS-1780, HDFS-2104, HDFS-2135. Are you shooting for end of this month? I'm hoping as early as next week, assuming folks feel the branch is in good shape. If all goes well, I'll have code reviews back for the above in the next day or two, can respond to review comments and commit over the weekend, and call a vote to merge early next week. Thanks -Todd On Jul 6, 2011, at 8:03 PM, Todd Lipcon wrote: Hi all, Just an update on this project:
- The current list of uncommitted patches up for review is:
1bea9d3 HDFS-1979. Fix BackupNode and CheckpointNode
32db384 Amend HDFS-2011. Fix TestCheckpoint test for double close/abort of ELFOS
b6a55a4 HDFS-2101. Update remaining unit tests for new layout
ca0ace6 HDFS-2133. Address TODOs left in code
b46825d HDFS-1780. Reduce need to rewrite fsimage on startup
30c858d HDFS-2104. Add flag to SecondaryNameNode to format it during startup
942eaef HDFS-2135. Fix regression of HDFS-1955 in branch
I believe Eli is going to work on reviewing these this week.
- I've set up a Hudson job for the branch here: https://builds.apache.org/job/Hadoop-Hdfs-1073-branch/ It's currently failing because it's missing some of the patches above. After the above patches go in, I expect a pretty clean build, modulo maybe one or two things that are environment issues, which I'll tackle later this week.
- BackupNode and CheckpointNode are working. I've done some basic functional testing by pounding edits into the NN while both a 2NN and a BN are checkpointing every 2 seconds.
- I merged with trunk as of this morning, so I think we should be up-to-date with trunk patches.
Aaron was very helpful and went through all NN-related patches in trunk from the last 3 months to make sure we didn't inadvertently regress anything - he discovered one bug but everything else looks good. Once the above patches are in the branch, I would like to merge. So, if you plan on reviewing pre-merge, please do so *this week*. Of course, if you don't have time and you find issues post-merge, I absolutely plan on fixing them ASAP ;-) Thanks -Todd On Thu, Jun 30, 2011 at 12:11 AM, Todd Lipcon t...@cloudera.com wrote: Hey all, Work on the HDFS-1073 branch has been progressing steadily, and I believe we're coming close to the point where it can be merged. To briefly summarize the status:
- NameNode and SecondaryNameNode are both fully working and have undergone some stress/fault testing in addition to over 3,000 lines' worth of new unit tests.
- Most of the existing unit tests have been updated, though a few more need some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably be done before release, but I think could be done either before or after merge (eg HDFS-2104)
Given this, I am expecting that we can merge this into trunk by the end of July if not earlier, as soon as the BN/CN work is complete. If you are hoping to review the code or tests before merge time, this is your early warning! Please do so now! Thanks! -Todd P.S. I will also be giving a short talk about the motivations and current status of this project at Friday's contributor meeting, for those who are able to attend. If we're lucky, maybe even a demo! -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Change bylaws to require 3 binding +1s for branch merge
To clarify, is there any restriction on who may give the +1s? For example, if a branch has a group of 5 committers primarily authoring the patches, can the three +1s be made by a subset of those committers? -Todd On Mon, Jul 11, 2011 at 5:11 PM, Jakob Homan jgho...@gmail.com wrote: As discussed in the recent thread on HDFS-1623 branching models, I'd like to amend the bylaws to provide that branches should get a minimum of three committer +1s before being merged to trunk. The rationale: Feature branches are often created in order that developers can iterate quickly without the review then commit requirements of trunk. Branches' commit requirements are determined by the branch maintainer and in this situation are often set up as commit-then-review. As such, there is no way to guarantee that the entire changeset offered for trunk merge has had a second pair of eyes on it. Therefore, it is prudent to give that final merge heightened scrutiny, particularly since these branches often extensively affect critical parts of the system. Requiring three binding +1s does not slow down the branch development process, but does provide a better chance of catching bugs before they make their way to trunk. Specifically, under the Actions subsection, this vote would add a new bullet item: * Branch merge: A feature branch that does not require the same criteria for code to be committed to trunk will require three binding +1s before being merged into trunk. The last bylaw change required lazy majority of PMC and ran for 7 days, which I believe would apply to this one as well. That would have this vote ending 5pm PST July 18. -Jakob -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Change bylaws to require 3 binding +1s for branch merge
Sounds fine to me. +1 On Mon, Jul 11, 2011 at 9:30 PM, Mahadev Konar maha...@hortonworks.com wrote: +1 mahadev On Mon, Jul 11, 2011 at 9:26 PM, Arun C Murthy a...@hortonworks.com wrote: +1 Arun On Jul 11, 2011, at 5:11 PM, Jakob Homan wrote: As discussed in the recent thread on HDFS-1623 branching models, I'd like to amend the bylaws to provide that branches should get a minimum of three committer +1s before being merged to trunk. The rationale: Feature branches are often created in order that developers can iterate quickly without the review then commit requirements of trunk. Branches' commit requirements are determined by the branch maintainer and in this situation are often set up as commit-then-review. As such, there is no way to guarantee that the entire changeset offered for trunk merge has had a second pair of eyes on it. Therefore, it is prudent to give that final merge heightened scrutiny, particularly since these branches often extensively affect critical parts of the system. Requiring three binding +1s does not slow down the branch development process, but does provide a better chance of catching bugs before they make their way to trunk. Specifically, under the Actions subsection, this vote would add a new bullet item: * Branch merge: A feature branch that does not require the same criteria for code to be committed to trunk will require three binding +1s before being merged into trunk. The last bylaw change required lazy majority of PMC and ran for 7 days, which I believe would apply to this one as well. That would have this vote ending 5pm PST July 18. -Jakob -- Todd Lipcon Software Engineer, Cloudera
Re: HDFS-1623 branching strategy
Sounds good to me. I think this strategy has worked well on the HDFS-1073 branch -- allowed development to be quite rapid, and at this point all but a couple of trivial patches have been explicitly reviewed by a committer (and the others implicitly reviewed since later patches touched the same code area). +1. -Todd On Thu, Jul 7, 2011 at 1:43 PM, Aaron T. Myers a...@cloudera.com wrote: Hello everyone, This has been informally mentioned before, but I think it's best to be completely transparent/explicit about this. We (Sanjay, Suresh, Todd, Eli, myself, and anyone else who wants to help) intend to do the work for HDFS-1623 (High Availability Framework for HDFS NN) on a development branch off of trunk. The work in the HDFS-1073 development branch is necessary to complete HDFS-1623. As such, we're waiting for the work in HDFS-1073 to be merged into trunk before creating a branch for HDFS-1623. Once this branch is created, I'd like to use a similar modified commit-then-review policy for this branch as was done in the HDFS-1073 branch, which I think worked very well. To review, this was: {quote}
- A patch will be uploaded to the JIRA for review like usual
- If another committer provides a +1, it may be committed at that point, just like usual.
- If no committer provides +1 (or a review asking for changes) within 24 business hours, it will be committed to the branch under the commit-then-review policy.
Of course if any committer feels that code needs to be amended, he or she should feel free to open a new JIRA against the branch including the review comments, and they will be addressed before the merge into trunk. And just like with any branch merge, ample time will be given for the community to review both the large merge commit as well as the individual historical commits of the branch, before it goes into trunk.
{quote} I'm also volunteering to keep the HDFS-1623 development branch up to date with respect to merging the concurrent changes which go into trunk into this development branch to make sure the merge back into trunk is as painless as possible. Comments are certainly welcome on this strategy. Thanks a lot, Aaron -- Aaron T. Myers Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Hoping to merge HDFS-1073 branch soon
Hi all, Just an update on this project:
- The current list of uncommitted patches up for review is:
1bea9d3 HDFS-1979. Fix BackupNode and CheckpointNode
32db384 Amend HDFS-2011. Fix TestCheckpoint test for double close/abort of ELFOS
b6a55a4 HDFS-2101. Update remaining unit tests for new layout
ca0ace6 HDFS-2133. Address TODOs left in code
b46825d HDFS-1780. Reduce need to rewrite fsimage on startup
30c858d HDFS-2104. Add flag to SecondaryNameNode to format it during startup
942eaef HDFS-2135. Fix regression of HDFS-1955 in branch
I believe Eli is going to work on reviewing these this week.
- I've set up a Hudson job for the branch here: https://builds.apache.org/job/Hadoop-Hdfs-1073-branch/ It's currently failing because it's missing some of the patches above. After the above patches go in, I expect a pretty clean build, modulo maybe one or two things that are environment issues, which I'll tackle later this week.
- BackupNode and CheckpointNode are working. I've done some basic functional testing by pounding edits into the NN while both a 2NN and a BN are checkpointing every 2 seconds.
- I merged with trunk as of this morning, so I think we should be up-to-date with trunk patches.
Aaron was very helpful and went through all NN-related patches in trunk from the last 3 months to make sure we didn't inadvertently regress anything - he discovered one bug but everything else looks good. Once the above patches are in the branch, I would like to merge. So, if you plan on reviewing pre-merge, please do so *this week*. Of course, if you don't have time and you find issues post-merge, I absolutely plan on fixing them ASAP ;-) Thanks -Todd On Thu, Jun 30, 2011 at 12:11 AM, Todd Lipcon t...@cloudera.com wrote: Hey all, Work on the HDFS-1073 branch has been progressing steadily, and I believe we're coming close to the point where it can be merged.
To briefly summarize the status:
- NameNode and SecondaryNameNode are both fully working and have undergone some stress/fault testing in addition to over 3,000 lines' worth of new unit tests.
- Most of the existing unit tests have been updated, though a few more need some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably be done before release, but I think could be done either before or after merge (eg HDFS-2104)
Given this, I am expecting that we can merge this into trunk by the end of July if not earlier, as soon as the BN/CN work is complete. If you are hoping to review the code or tests before merge time, this is your early warning! Please do so now! Thanks! -Todd P.S. I will also be giving a short talk about the motivations and current status of this project at Friday's contributor meeting, for those who are able to attend. If we're lucky, maybe even a demo! -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Hoping to merge HDFS-1073 branch soon
Hey all, Work on the HDFS-1073 branch has been progressing steadily, and I believe we're coming close to the point where it can be merged. To briefly summarize the status:
- NameNode and SecondaryNameNode are both fully working and have undergone some stress/fault testing in addition to over 3,000 lines' worth of new unit tests.
- Most of the existing unit tests have been updated, though a few more need some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably be done before release, but I think could be done either before or after merge (eg HDFS-2104)
Given this, I am expecting that we can merge this into trunk by the end of July if not earlier, as soon as the BN/CN work is complete. If you are hoping to review the code or tests before merge time, this is your early warning! Please do so now! Thanks! -Todd P.S. I will also be giving a short talk about the motivations and current status of this project at Friday's contributor meeting, for those who are able to attend. If we're lucky, maybe even a demo! -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop Java Versions
On Thu, Jun 30, 2011 at 5:16 PM, Ted Dunning tdunn...@maprtech.com wrote: You have to consider the long-term reliability as well. Losing an entire set of 10 or 12 disks at once makes the overall reliability of a large cluster very suspect. This is because it becomes entirely too likely that two additional drives will fail before the data on the off-line node can be replicated. For 100 nodes, that can decrease the average time to data loss down to less than a year. This can only be mitigated in stock hadoop by keeping the number of drives relatively low. MapR avoids this by not failing nodes for trivial problems. I'd advise you to look at stock hadoop again. This used to be true, but was fixed a long while back by HDFS-457 and several followup JIRAs. If MapR does something fancier, I'm sure we'd be interested to hear about it so we can compare the approaches. -Todd On Thu, Jun 30, 2011 at 4:18 PM, Aaron Eng a...@maprtech.com wrote: Keeping the amount of disks per node low and the amount of nodes high should keep the impact of dead nodes in control. It keeps the impact of dead nodes in control but I don't think that's long-term cost-efficient. As prices of 10GbE go down, the "keep the node small" argument seems less fitting. And on another note, most servers manufactured in the last 10 years have dual 1GbE network interfaces. If one were to go by these calcs: 150 nodes with four 2TB disks each, with HDFS 60% full, it takes around ~32 minutes to recover. It seems like that assumes a single 1GbE interface, why not leverage the second? On Thu, Jun 30, 2011 at 2:31 PM, Evert Lammerts evert.lamme...@sara.nl wrote: You can get 12-24 TB in a server today, which means the loss of a server generates a lot of traffic - which argues for 10GbE.
But
- big increase in switch cost, especially if you (CoI warning) go with Cisco
- there have been problems with things like BIOS PXE and lights out management on 10GbE - probably due to the NICs being things the BIOS wasn't expecting and off the mainboard. This should improve.
- I don't know how well linux works with ether that fast (field reports useful)
- the big threat is still ToR switch failure, as that will trigger a re-replication of every block in the rack.
Keeping the number of disks per node low and the number of nodes high should keep the impact of dead nodes in control. A ToR switch failing is different - missing 30 nodes (~120TB) at once cannot be fixed by adding more nodes; adding nodes actually increases the chance of a ToR switch failure. Although such failure is quite rare to begin with, I guess. The back-of-the-envelope calculation I made suggests that ~150 (1U) nodes should be fine with 1Gb ethernet. (e.g., when 6 nodes fail in a cluster with 150 nodes with four 2TB disks each, with HDFS 60% full, it takes around ~32 minutes to recover. 2 nodes failing should take around 640 seconds. Also see the attached spreadsheet.) This doesn't take ToR switch failure into account though. On the other hand - 150 nodes is only ~5 racks - in such a scenario you might rather want to shut the system down completely rather than letting it replicate 20% of all data. Cheers, Evert -- Todd Lipcon Software Engineer, Cloudera
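Evert's back-of-the-envelope numbers can be reproduced with a simple model: the data to re-replicate is the failed nodes' used capacity, and the surviving nodes share that work in parallel at some effective per-node rate. A sketch (my own model, not his spreadsheet; it matches his ~32-minute and ~640-second figures if one assumes roughly 100 MB/s of effective re-replication throughput per surviving 1GbE node):

```python
def recovery_time_s(total_nodes, failed_nodes, disks_per_node,
                    disk_tb, utilization, node_rate_mbps=100.0):
    """Estimate the seconds needed to re-replicate the data that lived
    on the failed nodes, assuming survivors share the work evenly at
    node_rate_mbps MB/s each (an assumed effective rate for one 1GbE
    link once replication overhead is accounted for)."""
    lost_bytes = failed_nodes * disks_per_node * disk_tb * 1e12 * utilization
    aggregate_rate = (total_nodes - failed_nodes) * node_rate_mbps * 1e6
    return lost_bytes / aggregate_rate

# 6 of 150 nodes fail; four 2TB disks each; HDFS 60% full -> ~33 minutes
print(round(recovery_time_s(150, 6, 4, 2, 0.6) / 60))  # ~33
# 2 nodes fail -> ~649 seconds, close to the quoted ~640
print(round(recovery_time_s(150, 2, 4, 2, 0.6)))       # ~649
```

The model ignores ToR switch failure, just as the original calculation does: losing a whole rack removes both the data and a slice of the recovery bandwidth at once.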
Re: Thinking about the next hadoop mainline release
On Fri, Jun 24, 2011 at 5:28 PM, Arun C Murthy ar...@yahoo-inc.com wrote: Thanks Suresh! Todd - I'd appreciate if you could help on some of the HBase/Performance jiras... thanks! Sure thing. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Thinking about the next hadoop mainline release
On Fri, Jun 17, 2011 at 7:15 AM, Arun C Murthy ar...@yahoo-inc.com wrote: I volunteer to be the RM for the release since I've been leading the NG MR effort. Are folks ok with this? +1. It would be an honor to fix bugs for you, Arun. -Todd Sent from my iPhone On Jun 17, 2011, at 1:45 PM, Ted Dunning tdunn...@maprtech.com wrote: NG map reduce is a huge deal both in terms of making things better for users, but also in terms of unblocking the Hadoop development process. On Fri, Jun 17, 2011 at 9:36 AM, Ryan Rawson ryano...@gmail.com wrote: - Next Generation Map-Reduce [MR-279] - Passing most tests now and discussing merging into trunk -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Shall we adopt the Defining Hadoop page
On Wed, Jun 15, 2011 at 7:19 PM, Craig L Russell craig.russ...@oracle.com wrote: There's no ambiguity. Either you ship the bits that the Apache PMC has voted on as a release, or you change it (one bit) and it is no longer what the PMC has voted on. It's a derived work. The rules for voting in Apache require that if you change a bit in an artifact, you can no longer count votes for the previous artifact. Because the new work is different. A new vote is required. Sorry, but this is just silly. Are you telling me that the httpd package in Ubuntu isn't Apache httpd? It has 43 patches applied. Tomcat6 has 17. I'm sure every other commonly used piece of software bundled with ubuntu has been patched, too. I don't see them calling their packages Ubuntu HTTP server powered by Apache HTTPD. It's just httpd. The httpd in RHEL 5 is the same way. In fact they even provide some nice metadata in their patches, for example: httpd-2.0.48-release.patch:Upstream-Status: vendor-specific change httpd-2.1.10-apctl.patch:Upstream-Status: Vendor-specific changes for better initscript integration To me, this is a good thing: allowing vendors to redistribute the software with some modifications makes it much more accessible to users and businesses alike, and that's part of why Hadoop has had so much success. So long as we require the vendors to upstream those modifications back to the ASF, we get the benefits of these contributions back in the community and everyone should be happy. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: JIRAs post-unsplit
On Tue, Jun 14, 2011 at 9:35 AM, Rottinghuis, Joep jrottingh...@ebay.com wrote: Project un-split definitely simplifies things. Todd, if people add a watch based on patches, would they not miss notifications for those entries in an earlier phase of their lifecycle? For example when issues are just reported, discussed and assigned, but no patch has been attached yet? Another thought that Alejandro just suggested offline is to use JIRA components rather than just the file paths. So, assuming there is a bot that watches the JIRA, it would be easy enough to allow you to permawatch a component (JIRA itself doesn't give this option). Then, assuming the patch is assigned the right components, it will be seen by people who care early on. If it's not given the right components, then it will be seen once you upload a patch. A separate HADOOPX Jira project would eliminate such issues. It does raise another question though: What happens if an issue starts out in one area, and then turns out to require changes in other areas? Would one then first create a HADOOP-x, a HDFS-y, or MAPREDUCE-z and then when it turns out other components are involved a new HADOOPX- referring to such earlier Jira? Cheers, Joep From: Todd Lipcon [t...@cloudera.com] Sent: Monday, June 13, 2011 1:37 PM To: general@hadoop.apache.org Subject: Re: JIRAs post-unsplit On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik c...@apache.org wrote: I tend to agree: JIRA separation was the benefit of the split. I'd rather keep the current JIRA split in effect (e.g. separate JIRA projects for separate Hadoop components; don't recombine them) and file patches in the same way (for common, hdfs, mapreduce). If a cross component patch is needed then HADOOP project JIRA can be used for tracking, patches, etc. Yea, perhaps we just need the QA bot to be smart enough that it could handle a cross-project patch attached to HADOOP?
Maybe we do something crazy and make a new HADOOPCROSS jira for patches that affect multiple projects? (just brainstorming here...) Tree-based watch-list seems like a great idea, but won't it narrow the scope somehow? Are you saying that if I am interested in say hdfs/src/c++/libhdfs, but a JIRA is open which affects libhdfs and something else (e.g. NameNode) I will still get the notification? Right, that's the idea. You'd be added as a watcher (and get notified) for any patch that touches the area you care about, regardless of whether it also touches some other areas. -Todd -- Todd Lipcon Software Engineer, Cloudera
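The path-based watch idea discussed above - a bot that scans a patch's touched files and notifies anyone watching those subtrees - is simple to prototype. A hypothetical sketch (the `WATCHERS` table and `notify_list` helper are my own invention, not an actual Apache bot):

```python
# Map of subscribed path prefixes -> interested users (hypothetical data).
WATCHERS = {
    "hdfs/src/c++/libhdfs": ["cos"],
    "hdfs/src/java/org/apache/hadoop/hdfs/server/namenode": ["todd"],
}

def notify_list(touched_paths):
    """Return the users to notify: anyone watching a prefix of any file
    the patch touches, regardless of what else it touches."""
    users = set()
    for path in touched_paths:
        for prefix, subscribers in WATCHERS.items():
            if path.startswith(prefix):
                users.update(subscribers)
    return sorted(users)

# A patch touching both libhdfs and the NameNode notifies both watchers.
print(notify_list([
    "hdfs/src/c++/libhdfs/hdfs.c",
    "hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java",
]))  # ['cos', 'todd']
```

This captures the behavior Todd describes: watching libhdfs still triggers a notification even when the JIRA also touches the NameNode, which pure per-component watching would handle only if the components were assigned correctly.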
Re: [VOTE] Powered by Logo
Who is allowed to vote in this? Committers? PMC? Everyone? My vote: 5, 2, 6, 3, 1, 4 On Tue, Jun 14, 2011 at 8:19 PM, Owen O'Malley omal...@apache.org wrote: All, We've had a wide range of entries for a powered by logo. I've put them all on a page, here: http://people.apache.org/~omalley/hadoop-powered-by/ Since there are a lot of contenders and we only want a single round of voting, let's use single transferable vote ( STV http://en.wikipedia.org/wiki/Single_transferable_vote). The important thing is to pick the images *IN ORDER* that you would like them. My vote (in order of course): 4, 1, 2, 3, 5, 6. In other words, I want option 4 most and option 6 least. With STV, you don't need to worry about voting for an unpopular choice since your vote will automatically roll over to your next choice. -- Owen -- Todd Lipcon Software Engineer, Cloudera
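With a single winner, STV as described here reduces to instant-runoff voting: count first preferences, repeatedly eliminate the weakest option, and let each affected ballot transfer to its next surviving choice. A minimal illustrative tally (my own sketch, not any official ASF counting tool; ties are broken arbitrarily):

```python
from collections import Counter

def irv_winner(ballots):
    """Single-winner STV (instant runoff). Each ballot is a list of
    options in preference order. Repeatedly eliminate the option with
    the fewest first-choice votes until one holds a majority."""
    remaining = {opt for b in ballots for opt in b}
    while True:
        # Each ballot counts for its highest-ranked surviving option.
        counts = Counter(next(opt for opt in b if opt in remaining)
                         for b in ballots
                         if any(opt in remaining for opt in b))
        top, votes = counts.most_common(1)[0]
        if votes * 2 > sum(counts.values()):
            return top
        # Eliminate the weakest; its ballots transfer to the next choice.
        remaining.discard(min(counts, key=counts.get))

# Five ballots: 1 is eliminated first and its ballot transfers to 4,
# giving 4 a majority.
print(irv_winner([[4, 2], [4, 1], [2, 1], [2, 4], [1, 4]]))  # 4
```

The transfer step is exactly why Owen notes that voting for an unpopular option is safe: an eliminated ballot is not wasted, it simply rolls over.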
Re: HADOOP-7106 (project unsplit) this weekend
Hey Robert, It seems to be working for me... what URL are you trying to check out? I moved aside my ~/.subversion dir, and then did: $ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/ Thanks -Todd On Mon, Jun 13, 2011 at 7:05 AM, Robert Evans ev...@yahoo-inc.com wrote: Could someone unlock some of these branches for anonymous read only checkout? At least with MR-279 I get a 403 forbidden error when I try to check out. --Bobby On 6/12/11 6:38 PM, Todd Lipcon t...@cloudera.com wrote: OK, this seems to have succeeded without any big problems! I've re-enabled the git mirrors and the hudson builds. Feel free to commit to the new trees. Here are some instructions for the migration: === SVN users === Next time you svn up in your common working directory you'll end up seeing the combined tree - ie a mapreduce/, hdfs/, and common/ subdirectory. This is probably the easiest place from which to work, now. The URLs for the combined SVN trees are: trunk: https://svn.apache.org/repos/asf/hadoop/common/trunk/ branch-0.22: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22 branch-0.21: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 yahoo-merge: http://svn.apache.org/repos/asf/hadoop/common/branches/yahoo-merge (this one has the yahoo-merge branches from common, hdfs, and mapred) MR-279: http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279 (this one has the yahoo-merge common and hdfs, and the MR-279 mapred) The same kind of thing happened for HDFS-1073 and branch-0.21-old. Pre-project-split branches like branch-0.20 should have remained untouched. You can proceed to delete your checkouts of the individual mapred and hdfs trees, since they exist within the combined trees above. 
If for some reason you prefer to 'svn switch' an old MR or HDFS-specific checkout to point to its new location, you can use the following incantation: svn sw $(svn info | grep URL | awk '{print $2}' | sed 's,\(hdfs\|mapreduce\|common\)/\(.*\),common/\2/\1,') === Git Users === The git mirrors of the above 7 branches should now have a set of 4 commits near the top that look like this: Merge: 928d485 cd66945 77f628f Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:28 2011 +0000 HADOOP-7106. Reorganize SVN layout to combine HDFS, Common, and MR in a single tree (project unsplit) git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1134994 13f79535-47bb-0310-9956-ffa450edef68 commit 77f628ff5925c25ba2ee4ce14590789eb2e7b85b Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:27 2011 +0000 Relocate mapreduce into mapreduce/ commit cd66945f62635f589ff93468e94c0039684a8b6d Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:26 2011 +0000 Relocate hdfs into hdfs/ commit 928d485e2743115fe37f9d123ce9a635c5afb91a Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:25 2011 +0000 Relocate common into common/ The first of these 4 is a 3-parent octopus merge commit of the pre-project-unsplit branches. In theory, git is smart enough to track changes through this merge, so long as you pass the right flags (eg --follow). For example: todd@todd-w510:~/git/hadoop-common$ git log --pretty=oneline --abbrev-commit --follow mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java | head -10 77f628f Relocate mapreduce into mapreduce/ 90df0cb MAPREDUCE-2455. Remove deprecated JobTracker.State in favour of JobTrackerStatus. ca2aba0 MAPREDUCE-2490. Add logging to graylist and blacklist activity to aid diagnosis of related issues. Contributed by Jonathan Eagles 32aaa2a MAPREDUCE-2515. MapReduce code references some deprecated options. Contributed by Ari Rabkin.
If you want to be able to have git follow renames all the way through the project split back to the beginning of time, put the following in hadoop-common/.git/info/grafts: 5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c In terms of rebasing git branches, git is actually pretty smart. For example, I have a local HDFS-1073 branch in my hdfs repo. To transition it to the new combined repo, I did the following: # Add my project-split hdfs git repo as a remote: git remote add splithdfs /home/todd/git/hadoop-hdfs/ git fetch splithdfs # Checkout a branch in my combined repo git checkout -b HDFS-1073 splithdfs/HDFS-1073 # Rebase it on the combined 1073 branch git rebase origin/HDFS-1073 ...and it actually applies my patches inside the appropriate subdirectory (I was surprised and impressed by this!) If the branch you're rebasing has added or moved files, it might not be smart enough and you'll have to manually rename them in your branch inside of the appropriate subtree.. but for simple patches this seems to work.
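The sed expression in the 'svn switch' one-liner just moves the project component from the middle of the URL to the end. A quick way to sanity-check what it does, as an illustrative re-implementation (not part of the migration scripts):

```python
import re

def unsplit_url(url):
    """Move the project component (hdfs, mapreduce, or common) from the
    middle of a project-split SVN URL to the end of the combined one,
    mirroring the sed expression in the svn switch one-liner."""
    return re.sub(r'(hdfs|mapreduce|common)/(.*)', r'common/\2/\1', url)

print(unsplit_url('https://svn.apache.org/repos/asf/hadoop/hdfs/trunk'))
# -> https://svn.apache.org/repos/asf/hadoop/common/trunk/hdfs
```

That matches the combined layout described above, where each old project tree becomes a subdirectory of the combined common/ tree.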
Re: HADOOP-7106 (project unsplit) this weekend
Hmm, as I got farther down my email this morning, I saw some complaints that HBase's SVN was giving 403s as well, this morning. Perhaps there is some ASF-wide issue going on, completely unrelated to the Hadoop changes over the weekend? -Todd On Mon, Jun 13, 2011 at 7:52 AM, Todd Lipcon t...@cloudera.com wrote: Hey Robert, It seems to be working for me... what URL are you trying to check out? I moved aside my ~/.subversion dir, and then did: $ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/ Thanks -Todd On Mon, Jun 13, 2011 at 7:05 AM, Robert Evans ev...@yahoo-inc.com wrote: Could someone unlock some of these branches for anonymous read only checkout? At least with MR-279 I get a 403 forbidden error when I try to check out. --Bobby On 6/12/11 6:38 PM, Todd Lipcon t...@cloudera.com wrote: OK, this seems to have succeeded without any big problems! I've re-enabled the git mirrors and the hudson builds. Feel free to commit to the new trees. Here are some instructions for the migration: === SVN users === Next time you svn up in your common working directory you'll end up seeing the combined tree - ie a mapreduce/, hdfs/, and common/ subdirectory. This is probably the easiest place from which to work, now. 
Re: HADOOP-7106 (project unsplit) this weekend
On Mon, Jun 13, 2011 at 8:14 AM, Todd Lipcon t...@cloudera.com wrote: Oops, sorry about that one. I will take care of that in about 30 minutes (just headed out the door now to catch a train). If someone else with commit access wants to, you just need to propset the externals to point to the new common/trunk/common/src/test/bin instead of the old location. Fixed the svn:externals. Also, the ant eclipse targets seem to be broken now. It seems like various parts of the eclipse target need to be commonized now (the .eclipse-templates stuff and .classpath, .launches, etc.) Will look into this as well. Can you explain further what's broken? Are you trying to make a project that's rooted in the directory that contains common/, mapreduce/, and hdfs/? I can imagine that wouldn't work, but I'm not sure why it wouldn't work to continue having three separate projects. Are you using some kind of SVN integration with Eclipse? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106 (project unsplit) this weekend
On Mon, Jun 13, 2011 at 9:24 AM, Jeffrey Naisbitt jnais...@yahoo-inc.com wrote: As you say though, I would think it should still work with three separate projects, so I'll just go back to that for now. It would be nice to be able to build the whole thing as a single project though (since it's a single repository in svn now), but that would probably take some extra work - and would probably make sense in a separate Jira. Yep - I think the idea is this will happen once we mavenize everything. Maven apparently supports the idea of modules - basically a recursive structure in which a top level pom file can build sub-poms, but the sub-poms could also continue to be built independently if necessary. I think the top-level pom would live in the new root. -Todd -- Todd Lipcon Software Engineer, Cloudera
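For the curious, the module layout described above would look roughly like this at the new repository root. This is a hypothetical sketch — the artifact IDs and version are made up, since the mavenization work hasn't landed yet:

```xml
<!-- hypothetical top-level pom.xml at the combined repository root -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-main</artifactId>
  <version>0.23.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <!-- Each module stays buildable on its own, either with
       `mvn -pl <module>` from the root or by running mvn
       from inside the subdirectory. -->
  <modules>
    <module>common</module>
    <module>hdfs</module>
    <module>mapreduce</module>
  </modules>
</project>
```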
Re: HADOOP-7106 (project unsplit) this weekend
On Mon, Jun 13, 2011 at 11:42 AM, Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com wrote: Todd, Great work! A few minor problems: (1) I had committed MAPREDUCE-2588, however, a commit email was sent to both common-commits@ and mapreduce-commits@. (2) hadoop/site becomes an empty directory. (3) There are svn properties in hadoop/common/trunk/ (2) and (3) are simple. I will remove hadoop/site and the svn properties for hadoop/common/trunk/. Does anyone know how to fix (1)? Ah, we need to update the mailer config. This is the remaining task I mentioned on the HADOOP-7106 JIRA. We had updated it to include both the old and new spots for the sake of the transition. I'll ping Ian about this - I think he's the only one with access. A separate question: Strictly speaking, this is not project unsplit since we are going to submit patches for individual sub-projects as before, i.e. we keep generating and committing patches to common/trunk/[common/hdfs/mapreduce] but not common/trunk; hudson will pick up patches from individual sub-projects, etc. Am I correct? Just want to make sure that everyone is on the same page. :) Correct, that's where we're at for today. I opened HADOOP-7384 which will allow patches generated against the full repo to apply to an individual project, to make life easier for git users, but right now we still need separate JIRAs and patches. See my other thread from this morning about some ideas how we might be able to do cross-project patches in a reasonable way. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: JIRAs post-unsplit
On Mon, Jun 13, 2011 at 4:54 PM, Konstantin Boudnik c...@apache.org wrote: On Mon, Jun 13, 2011 at 01:37PM, Todd Lipcon wrote: On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik c...@apache.org wrote: I tend to agree: JIRA separation was the benefit of the split. I'd rather keep the current JIRA split in effect (e.g. separate JIRA projects for separate Hadoop components; don't recombine them) and file patches in the same way (for common, hdfs, mapreduce). If a cross component patch is needed then HADOOP project JIRA can be used for tracking, patches, etc. Yea, perhaps we just need the QA bot to be smart enough that it could handle a cross-project patch attached to HADOOP? Maybe we do something crazy and make a new HADOOPCROSS jira for patches that affect multiple projects? (just brainstorming here...) Correct me if I'm wrong but in the new structure cross-component patch differs from a component one by a patch level (i.e. p0 vs p1 if looked from common/trunk), right? I guess the bot can be hacked to use this distinction thus saving us an extra JIRA project which will merely serve the purpose of meta-project. Yes, I am about to commit HADOOP-7384 which can at least deal with patches relative to either trunk/ or trunk/project. But, it will also detect a cross-project patch and barf. It could certainly be extended to apply and test a cross-project patch, though it would be substantially more work. The advantage of a separate HADOOPX jira would be to allow people to notice cross-project patches. For example, a dev who primarily works on HDFS may not subscribe to mapreduce-dev or mapreduce-issues, but if an MR issue is going to modify something in the HDFS codebase, he or she will certainly want to be aware of it. -Todd Tree-based watch-list seems like a great idea, but won't it narrow the scope somehow? Are you saying that if I am interested in say hdfs/src/c++/libhdfs, but a JIRA is open which affects libhdfs and something else (e.g. 
NameNode) I will still get the notification? Right, that's the idea. You'd be added as a watcher (and get notified) for any patch that touches the area you care about, regardless of whether it also touches some other areas. -Todd On Mon, Jun 13, 2011 at 11:28AM, Todd Lipcon wrote: After the project unsplit this weekend, we're now back to a place where we have a single SVN/git tree that encompasses all of the subprojects. This opens up the next question: should we merge the JIRAs and allow a single issue to have a patch which spans projects? My thoughts are: - the biggest pain point with the project split is dealing with cross-project patches - one of the biggest reasons we did the project split was that the combined traffic from the HADOOP JIRA was hard to follow for people who really care about certain subprojects. - the jira split is a coarse-grained way of allowing people to watch just the sub-areas they care about. So, I was thinking the following... what if there were a way to watch JIRAs based on subtrees? I'm imagining a web page where any community user could have an account and manage a watch list of subtrees. If you want to watch all MR jiras, you could simply watch mapreduce/*. If you care only about libhdfs, you could watch hdfs/src/c++/libhdfs, etc. Then a bot would watch all patches attached to JIRA, and any time a patch is uploaded that touches something on your watch list, it automatically adds you as a watcher on the ticket and sends you a notification via email. It would also be easy to set up a watch based on patch size, for example. I think even if we don't recombine the JIRAs, this might be a handy way to cut down on mailing list traffic for contributors who have a more narrow focus on certain areas of the code. Does this sound useful? I don't know if/when I'd have time to build such a thing, but if the community thinks it would be really helpful, I might become inspired. 
-Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
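As a concrete sketch of both ideas in this thread — detecting whether a patch is rooted at trunk/ vs. trunk/&lt;project&gt; (the p0 vs. p1 distinction), and matching touched paths against per-user subtree watch lists — here is a rough prototype. The diff parsing and the watch-list format are my own assumptions, not what HADOOP-7384 or any existing bot implements:

```python
import re

PROJECTS = ('common', 'hdfs', 'mapreduce')

def touched_paths(patch_text):
    """Pull the target path out of each '+++' header of a unified diff,
    stripping the git-style b/ prefix and skipping deletions."""
    paths = []
    for line in patch_text.splitlines():
        m = re.match(r'\+\+\+ (?:b/)?(\S+)', line)
        if m and m.group(1) != '/dev/null':
            paths.append(m.group(1))
    return paths

def classify(paths):
    """Return ('root', projects) if every path carries a project prefix
    (i.e. the diff was taken from trunk/), else ('project', set()) for a
    diff made inside a single project tree. More than one prefix in the
    'root' case means a cross-project patch."""
    prefixes = {p.split('/', 1)[0] for p in paths}
    if prefixes <= set(PROJECTS):
        return 'root', prefixes
    return 'project', set()

def watchers(paths, watch_lists):
    """watch_lists maps user -> list of subtree prefixes; return every
    user whose watched subtree is touched by some path in the patch."""
    return {user for user, subtrees in watch_lists.items()
            if any(p.startswith(s) for p in paths for s in subtrees)}

paths = touched_paths("+++ b/hdfs/src/c++/libhdfs/hdfs.c\n"
                      "+++ b/mapreduce/src/java/Foo.java")
print(classify(paths))  # a root-level, cross-project patch
print(watchers(paths, {'alice': ['hdfs/src/c++/libhdfs']}))
```

A bot along these lines could add the returned users as JIRA watchers, so someone watching only hdfs/src/c++/libhdfs would still be notified when a cross-project patch touches it.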
Re: HADOOP-7106 (project unsplit) this weekend
OK, this seems to have succeeded without any big problems! I've re-enabled the git mirrors and the hudson builds. Feel free to commit to the new trees. Here are some instructions for the migration: === SVN users === Next time you svn up in your common working directory you'll end up seeing the combined tree - ie a mapreduce/, hdfs/, and common/ subdirectory. This is probably the easiest place from which to work, now. The URLs for the combined SVN trees are: trunk: https://svn.apache.org/repos/asf/hadoop/common/trunk/ branch-0.22: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22 branch-0.21: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 yahoo-merge: http://svn.apache.org/repos/asf/hadoop/common/branches/yahoo-merge (this one has the yahoo-merge branches from common, hdfs, and mapred) MR-279: http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279 (this one has the yahoo-merge common and hdfs, and the MR-279 mapred) The same kind of thing happened for HDFS-1073 and branch-0.21-old. Pre-project-split branches like branch-0.20 should have remained untouched. You can proceed to delete your checkouts of the individual mapred and hdfs trees, since they exist within the combined trees above. If for some reason you prefer to 'svn switch' an old MR or HDFS-specific checkout to point to its new location, you can use the following incantation: svn sw $(svn info | grep URL | awk '{print $2}' | sed 's,\(hdfs\|mapreduce\|common\)/\(.*\),common/\2/\1,') === Git Users === The git mirrors of the above 7 branches should now have a set of 4 commits near the top that look like this: Merge: 928d485 cd66945 77f628f Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:28 2011 + HADOOP-7106. 
Reorganize SVN layout to combine HDFS, Common, and MR in a single tree (project unsplit) git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@113499413f79535-47bb-0310-9956-ffa450edef68 commit 77f628ff5925c25ba2ee4ce14590789eb2e7b85b Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:27 2011 + Relocate mapreduce into mapreduce/ commit cd66945f62635f589ff93468e94c0039684a8b6d Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:26 2011 + Relocate hdfs into hdfs/ commit 928d485e2743115fe37f9d123ce9a635c5afb91a Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:25 2011 + Relocate common into common/ The first of these 4 is a 3-parent octopus merge commit of the pre-project-unsplit branches. In theory, git is smart enough to track changes through this merge, so long as you pass the right flags (eg --follow). For example: todd@todd-w510:~/git/hadoop-common$ git log --pretty=oneline --abbrev-commit --follow mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java | head -10 77f628f Relocate mapreduce into mapreduce/ 90df0cb MAPREDUCE-2455. Remove deprecated JobTracker.State in favour of JobTrackerStatus. ca2aba0 MAPREDUCE-2490. Add logging to graylist and blacklist activity to aid diagnosis of related issues. Contributed by Jonathan Eagles 32aaa2a MAPREDUCE-2515. MapReduce code references some deprecated options. Contributed by Ari Rabkin. If you want to be able to have git follow renames all the way through the project split back to the beginning of time, put the following in hadoop-common/.git/info/grafts: 5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c In terms of rebasing git branches, git is actually pretty smart. For example, I have a local HDFS-1073 branch in my hdfs repo. 
To transition it to the new combined repo, I did the following: # Add my project-split hdfs git repo as a remote: git remote add splithdfs /home/todd/git/hadoop-hdfs/ git fetch splithdfs # Checkout a branch in my combined repo git checkout -b HDFS-1073 splithdfs/HDFS-1073 # Rebase it on the combined 1073 branch git rebase origin/HDFS-1073 ...and it actually applies my patches inside the appropriate subdirectory (I was surprised and impressed by this!) If the branch you're rebasing has added or moved files, it might not be smart enough and you'll have to manually rename them in your branch inside of the appropriate subtree.. but for simple patches this seems to work. For less simple things, the best bet may be to use git filter-branch on the patch series to relocate it inside a subdirectory, and then try to rebase. Let me know if you need a hand with any git cleanup, happy to help. == Outstanding issues == The one outstanding issue I'm aware of is that the test-patch builds should be smart enough to be able to deal with patches that are relative to the combined root instead of the original project. Right now, if you export a diff from git, it will include hdfs/ or mapreduce/ in the changed file names, and the QA bot won't be able to apply them.
Re: HADOOP-7106 (project unsplit) this weekend
Hi all, I'm figuring out one more small nit I noticed in my testing this evening. Hopefully I will figure out what's going wrong and be ready to press the big button tomorrow. Assuming I don't have to abort mission, my hope is to do this at around 3PM PST tomorrow (Sunday). I'll send out a message asking folks to please hold commits to all branches while the move is in progress. Thanks -Todd On Fri, Jun 10, 2011 at 11:20 AM, Todd Lipcon t...@cloudera.com wrote: Hi all, Pending any unforeseen issues, I am planning on committing HADOOP-7106 this weekend. I have the credentials from Jukka to take care of the git trees as well, and have done a practice move several times on a local mirror of the svn. I'll send out an announcement of the exact time in advance of when I actually do the commit. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
HADOOP-7106 (project unsplit) this weekend
Hi all, Pending any unforeseen issues, I am planning on committing HADOOP-7106 this weekend. I have the credentials from Jukka to take care of the git trees as well, and have done a practice move several times on a local mirror of the svn. I'll send out an announcement of the exact time in advance of when I actually do the commit. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: LimitedPrivate and HBase
On Mon, Jun 6, 2011 at 9:45 AM, Allen Wittenauer a...@apache.org wrote: I have some concerns over the recent usage of LimitedPrivate being opened up to HBase. Shouldn't HBase really be sticking to public APIs rather than poking through some holes? If HBase needs an API, wouldn't other clients as well? IMO LimitedPrivate can be used to open an API for a specific project when it's not clear that the API is generally useful, and/or we anticipate the API might be pretty unstable. Marking it LimitedPrivate to HBase gives us the opportunity to talk to the HBase team and say hey, we want to rename this without @Deprecation or hey, we're going to kill this, is that OK? Making it truly public, even if we call it Unstable, is a bit harder to move. I agree that most of these things in the long run would be determined generally useful and made public. Do you have a specific thing in mind? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: LimitedPrivate and HBase
On Mon, Jun 6, 2011 at 6:05 PM, Allen Wittenauer a...@apache.org wrote: On Jun 6, 2011, at 5:56 PM, Todd Lipcon wrote: Or because this is the sort of thing that could take weeks of discussion or just 5 minutes to unblock HBase from moving on to trunk. I'd rather have the weeks of discussion *after* the 5 minute patch, so people can continue to make progress. We've moved too slowly for too long. I didn't realize trunk was coming out as a release next month. If all goes well, 0.22 will come out as a release some time in that timeframe. Stack has been getting HBase running on it. This patch was to fix 0.22. Let's face it: this happened because it was HBase. If it was almost anyone else, it would have sat there and *that's* the point where I'm mainly concerned. If you want to feel better, take a look at HDFS-941, HDFS-347, and HDFS-918 - these are patches that HBase has been asking for for nearly 2 years in some cases and haven't gone in. Satisfied? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Update on 0.22
On Wed, Jun 1, 2011 at 11:32 PM, Konstantin Shvachko shv.had...@gmail.com wrote: I can see them well. I think Suresh's point is that non-blockers are going into 0.22. Nigel, do you have full control over it? Of course it's up to Nigel to decide, but here's my personal opinion: One of the reasons we had a lot of divergence (read: external branches/forks/whatever) off of 0.20 is that the commit rules on the branch were held pretty strictly. So, if you wanted a non-critical bug fix or a small improvement, the only option was to do such things on an external fork. 0.20 was branched in December '08 and not released until mid April '09. In 4 months a fair number of bug fixes and small improvements go in. 0.22 has been around even longer. If we were to keep it to *only* blockers, then again it would be a fairly useless release due to the number of non-blocker bugs. Clearly there's a balance and a judgment call when moving things back to a branch. But at this point I'd consider small improvements and pretty much any bug fix to be reasonable, so long as it doesn't involve major reworking of components. Nigel: if this assumption doesn't jive (ha ha, get it?) with what you're thinking, please let me know :) -Todd On Wed, Jun 1, 2011 at 1:50 PM, Eric Baldeschwieler eri...@yahoo-inc.com wrote: makes sense to me, but it might be good to work to make these decisions visible so folks can understand what is happening. On Jun 1, 2011, at 1:46 PM, Owen O'Malley wrote: On Jun 1, 2011, at 1:27 PM, Suresh Srinivas wrote: I see that there are several non blockers being promoted to 0.22 from trunk. From my understanding, any non blocker change to 0.22 should be approved by vote. Is this correct? No, the Release Manager has full control over what goes into a release. The PMC votes on it once there is a release candidate. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: Update on 0.22
On Thu, Jun 2, 2011 at 11:06 AM, Konstantin Shvachko shv.had...@gmail.com wrote: I propose just to make them blockers before committing to attract attention of the release manager and get his approval. Imho, even small changes, like HDFS-1954 are blockers, because a vague UI message is a bug and bugs are blockers. Bugs are blockers? Then we'll never release! Let's hear from Nigel what he thinks. It's his branch, if he's upset about the way it's being handled, he can deal with it as he sees fit. -Todd -- Todd Lipcon Software Engineer, Cloudera
Spam on wiki
FYI, I've filed the following ticket with ASF Infrastructure to see if we can get a CAPTCHA set up on our wiki: https://issues.apache.org/jira/browse/INFRA-3670 In the meantime, we've been doing a decent job of policing, let's keep it up! -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1
On Tue, May 10, 2011 at 12:41 PM, Scott Carey sc...@richrelevance.com wrote: As an observer, this is a very important observation. Sure, the default is that dot releases are bugfix-only. But exceptions to these rules are sometimes required and often beneficial to the health of the project. Performance enhancements, minor features, and other items are sometimes very low risk and the barrier to getting them to users earlier should be lower. I agree whole-heartedly. These issues are the sort of things that get into non-Apache releases quickly and drive the community away from the Apache release. It's been well proven through those vehicles that back-porting minor features and improvements from trunk to an old release can be done safely. However, one shouldn't understate the difficulty of agreeing on the risk-reward tradeoff here. While risk is mostly technical, reward may vary widely based on the userbase or organization. For example, everyone would agree that security was a very risky feature to add to 20, with known backward incompatibilities and a lot of fallout. For some people (both CDH and YDH), the security features were an absolute necessity on a tight timeline, so the risk-reward decision was clear -- I've heard from many users, though, that they saw none of the reward from security and wished they hadn't had to endure the resulting changes and bugs within the 0.20 series. Another example is the 0.20-append patch series, which is indispensable for the HBase community but seen as overly risky by those who do not use HBase. So, while I'm in favor of sustaining release series like 0.20-security in theory, I also think we need clear inclusion criteria for such branches. As I said in a previous email, the criteria used to be low risk compatible bug fixes only with a vote process for any exceptions. 0.20-security is obviously entirely different, but as yet remains undefined (it's way more than just security). -Todd -- Todd Lipcon Software Engineer, Cloudera
newbie label on JIRA
Hi all, I spent this afternoon looking through JIRA to identify some issues that I think would be good for new contributors to try their hand at. In my mind, the qualities of such an issue are: - fairly straightforward issue to solve (an experienced contributor would be able to address it in 30-60 minutes) - fairly tight scope (doesn't require understanding of a lot of different moving pieces) - easy to write a unit test for (so we get new contributors on the right path of testing their changes) - not likely to be controversial among contributors I came up with about 25 of these from looking through the 0.22 and 0.23 Affects Version lists: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+in+(%22HADOOP%22,+%22MAPREDUCE%22,+%22HDFS%22)+and+labels+%3D+%22newbie%22 I'd like to encourage others to look through any JIRAs that they think fit the bill, and add the same label. Then, we can point new contributors at this list of JIRAs -- hopefully this will get them on the right path towards understanding our project's workflow and give some nice positive reinforcement since they should be easy to review and commit quickly. Thanks! -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
Hey folks, FYI I'm in the process of loading one of the SVN dumps onto a server here. MAN is it slow. Going maybe 2 revisions/sec so I should have an SVN replica in somewhere around a week to test with. @Infra: I don't suppose it's possible to get a writable snapshot mounted somehow? If I recall correctly, ZFS supports this and svn.apache.org runs ZFS? -Todd On Fri, Apr 29, 2011 at 1:16 PM, Nigel Daley nda...@mac.com wrote: I can't do this at 2pm now. Todd, I suspect you want more time to try out the svn/git test anyways. Let's shoot for next Wednesday at 2pm. Ian should be back by then too. Any objections? Cheers, Nige On Apr 29, 2011, at 11:36 AM, Owen O'Malley wrote: On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote: Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with? It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 April 2011. You should be able to use those to setup a local subversion. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
-1 for the same reasons I outlined in my email yesterday. This is not a community artifact following the community's processes, and thus should not be an official release until those issues are addressed. On Wed, May 4, 2011 at 3:17 PM, Doug Cutting cutt...@apache.org wrote: -1 This candidate has lots of patches that are not in trunk, potentially adding regressions to 0.22 and 0.23. This should be addressed before we release from 0.20-security. We should also not move to four-component version numbering. A release from the 0.20-security branch should perhaps be called 0.20.100. Doug On 05/04/2011 10:31 AM, Owen O'Malley wrote: Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ Please download it, inspect it, compile it, and test it. Clearly, I'm +1. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
With Cloudera hat on, I agree with Eli's assessment. With Apache hat on, I don't see how this is at all relevant to the task at hand. I would make the same arguments against taking CDH3 and releasing it as an ASF artifact -- we'd also have a certain amount of work to do to make sure that all of the patches are in trunk, first. Additionally, I'd want to outline what the inclusion criteria would be for that branch. -Todd On Wed, May 4, 2011 at 3:24 PM, Eli Collins e...@cloudera.com wrote: With my Cloudera hat on.. When we went through the 10x and 20x patches we only pulled a subset of them, primarily for security and the general improvements that we thought were good. We found both incompatible changes and some sketchy changes that we did not pull in from a quality perspective. There is a big difference between a patch set that's acceptable for Yahoo!'s user base and one that's a more general artifact. When we evaluated the YDH patch sets we were using that frame of mind. I'm now looking at it in terms of an Apache release. And the place to review changes for an Apache release is on jira. CDH3 is based on the latest stable Apache release (20.2) so it doesn't regress against it. I'm nervous about rebasing future releases on 203 because of the compatibility and quality implications. Thanks, Eli On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas sures...@yahoo-inc.com wrote: Eli, How many of these patches that you find troublesome are in CDH already? Regards, Suresh On 5/4/11 3:03 PM, Eli Collins e...@cloudera.com wrote: On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley omal...@apache.org wrote: Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
-- Owen While rc2 is an improvement on rc1, I am -1 on this particular rc. Rationale: This rc contains many patches not yet committed to trunk. This would cause the next major release (0.22) to be a feature regression against our latest stable release (203), were 0.22 released soon. This rc contains many patches not yet reviewed by the community via the normal process (jira, patch against trunk, merge to a release branch). I think we should respect the existing community process that has been used for all previous releases. This rc introduces a new development and branching model (new feature development outside trunk) and Hadoop versioning scheme without sufficient discussion or proposal of these changes with the community. We should establish new process before the release; a release is not the appropriate mechanism for changing our review and development process or versioning. I do support a release from branch-0.20-security that follows the existing, established community process. Thanks, Eli -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy a...@yahoo-inc.com wrote: On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: The list seems highly inaccurate. Checked the first few N/A items. All are false positives. Also, can you please provide a list on features which are not related to gridmix benchmarks or herriot tests? Here are a few I quickly pulled up: MAPREDUCE-2316 (docs for improved capacity scheduler) MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) BZ-4182948. Add statistics logging to Fred for better visibility into startup time costs. (Matt Foley) - I believe I saw a note from Matt on the JIRA yesterday about this feature, where he decided that the version done in 203 wasn't a good approach, and it's done differently in trunk (not sure if done yet). MAPREDUCE-2364 (important bug fix for localization) - in fact most of localization is different in this branch compared to trunk due to inclusion of MAPREDUCE-2378, the trunk version of which is still on the yahoo-merge branch. New counters for FileInput/OutputFormat. New Counter MAP_OUTPUT_MATERIALIZED_BYTES. Related bugs: 4241034, 3418543, 4217546 - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not committed. - MAPREDUCE-1904, committed without JIRA as: . Reducing new Path(), RawFileStatus() creation overhead in LocalDirAllocator not in trunk +BZ4101537 . When a queue is built without any access rights we explain the +problem. (dking, rvw ramach) [attachment of 2010-11-24] seems to be on trunk as MR-2411, but not committed, best I can tell, despite the JIRA there being resolved (based on looking at QueueManager in trunk) . Remove unnecessary reference to user configuration from TaskDistributedCacheManager causing memory leaks Not in trunk, not sure which JIRA it might be.. probably part of 2178.
Major new feature: MAPREDUCE-323 - very large rework of how job history files are managed Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though probably will be attacked by different JIRAs Major new ops-visible feature: metrics2 system Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from a separate server Major new set of user-visible configurations: MAPREDUCE-1943 and friends which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) I have code to work on, so I won't keep going, but this is from looking at the last couple months of 203. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc0
on a wiki page (or web site page) regarding the currently active branches? Thanks -Todd On Tue, May 3, 2011 at 10:02 AM, Eli Collins e...@cloudera.com wrote: I think we still need to incorporate the patches currently checked into branch 0.20. For example, Owen identified a major bug (BooleanWritable's comparator is broken) and filed a jira (HADOOP-6928) to put it in branch-0.20, where I reviewed it and checked it in, so this bug would be fixed in the next stable release. However this change is not in branch-0.20-security-203. Unless we put the delta from branch-0.20 into this release, it is missing important bug fixes that will cause it to regress against 20.3 (if it ever is released). I am also nervous about changes like the one identified by HADOOP-7255. It looks like this change caused a significant regression in TestDFSIO throughput. It changes the core Task class, the commit log is a single line, and as far as I can tell it was not discussed or reviewed by anyone in the community. Don't changes like this at least deserve a jira before we release them? Thanks, Eli On Tue, May 3, 2011 at 1:39 AM, Konstantin Shvachko shv.had...@gmail.com wrote: I think its a good idea to release hadoop-0.20.203. It moves Apache Hadoop a step forward. Looks like the technical difficulties are resolved now with latest Arun's commits. Being a superset of hadoop-0.20.2 it can be considered based on one of the official Apache releases. I don't think there was a lack of discussions on the lists about the issues included in the release candidate. Todd did a thorough review of the entire security branch. Many developers participated in discussions. Agreeing with Stack I wish HBase was considered a primary target for Hadoop support. But it is not realistic to have it in hadoop-0.20.203. I have some experience running a version of this release candidate on a large cluster. It works. I would add a couple of patches, which make it run on Windows for me like HADOOP-7110, HADOOP-7126. 
But those are not blockers. Thanks, --Konstantin On Mon, May 2, 2011 at 5:12 PM, Ian Holsman had...@holsman.net wrote: On May 3, 2011, at 9:58 AM, Arun C Murthy wrote: Owen, Suresh and I have committed everything on this list except HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/necessary, I'll check with Cos. Other than that hadoop-0.20.203 is now a superset of hadoop-0.20.2. Missed adding HADOOP-5759 to that list, I'll check with Amareshwari before committing. Arun Thanks for doing this so fast Arun. -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
On Thu, Apr 28, 2011 at 10:06 PM, Nigel Daley nda...@mac.com wrote: As announced last week, I'm planning to do this at 2pm PDT tomorrow (Friday) April 29. Suresh, when do you plan to commit HDFS-1052? That should be done first. Owen or Todd, did you want to follow Paul's advice: If you're really wanting to make sure to keep the history in Git intact my suggestion would be to setup a temporary svn server locally and test our mirroring scripts against the commands you intend to run. If so, how much more time do you need? Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with? -Todd On Apr 20, 2011, at 9:42 PM, Nigel Daley wrote: Owen, I'll admit I'm not familiar with all the git details/issues in your proposal, but I think the layout change you propose is fine and seems to solve the git issues with very minimal impact on the layout. Let's shoot for doing this next Friday, April 29 at 2pm PDT. I'll update the patch and send out a reminder about this later next week. Thanks, Nige On Apr 20, 2011, at 8:00 AM, Owen O'Malley wrote: On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote: On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon t...@cloudera.com wrote: I'm currently looking into how the git mirrors are setup in Apache-land. Uh, why isn't infra-dev on this thread? For those on infra-dev, the context is that Nigel is trying to merge together the source trees of the Hadoop sub-projects that were split apart 2 years ago. So he is taking: prefix = http://svn.apache.org/repos/asf/hadoop/ $prefix/common/trunk -> $prefix/trunk/common $prefix/hdfs/trunk -> $prefix/trunk/hdfs $prefix/mapreduce/trunk -> $prefix/trunk/mapreduce and playing similar games with the rest of the branches and tags. For more details look at HADOOP-7106. From the project split, subversion was able to track the history across the subversion moves between projects, but not git. Four questions: 1.
Is there anything we can do to minimize the history loss in git? 2. Are we going to be able to preserve our sha's or are they going to change again? 3. What changes do we need to make to the subversion notification file? 4. Are there any other changes that need to be coordinated? After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in: $prefix/common/trunk/* -> $prefix/common/trunk/common/* $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
On Tue, Apr 19, 2011 at 10:02 PM, Nigel Daley nda...@mac.com wrote: I'm still planning to make this SVN change on Thursday this week. Ian, Owen, Todd, note the questions I ask you below. Can you help with these on Thursday? Unfortunately I'm out of the office most of the day on Thursday with a customer. I'll be available Thursday evening, though, to help with any cleanup/etc. I'm currently looking into how the git mirrors are setup in Apache-land. My guess is that there will be some disturbance to developers on Thurs afternoon / Friday as this gets sorted out, even if we try to plan as much as possible. Would it be better to do this on Friday so that we have the weekend to fix up broken pieces before people get to work on Monday? -Todd On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote: All, As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106. I propose Thursday April 21 at 11am PDT. - I will send out reminders leading up to that date. - I will announce on IRC when I'm about to start the changes. - I will run the script to make the changes. - Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time? - Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?) More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon t...@cloudera.com wrote: I'm currently looking into how the git mirrors are setup in Apache-land. Git-wise, I think we have two options: Option 1) - Create a new git mirror for the new hadoop/ tree. This will have no history. - On the Apache side, fetch the split-project git mirrors into the combined git mirror as branches - eg hadoop-hdfs.git:trunk becomes a branch named something like pre-HADOOP-7106/hdfs/trunk. Thus, when any user fetches, he'll get all the git objects from prehistory as well without having to add separate remotes. - Add a script or README file explaining how to set up git grafts on the combined hadoop.git so that the new combination branch foo looks like a merge of pre-HADOOP-7106/{hdfs,common,mapred}/foo. Since git grafts are local constructs, each git user would have to run this script once after checking out the git tree, after which the history would be healed. Pros: - all existing sha1s stay the same. - Any local branches people might have for works in progress should continue to refer to proper SHA1s and should rebase relatively easily onto the combined trunk - Should be reasonably simple to implement Cons: - users have to run a script upon checkout in order to graft back together history Option 2) - Use git-filter-branch on the split repos to rewrite them as if they always took place in their new subdirectories. - Fetch these repos into the merged repo - Set up grafts in the merged repo - Run git-filter-branch --all in the merged repo, which will make the grafts permanent - May have to run git-filter-branch to rewrite some of the git-svn-id: commit messages to trick git-svn. This option basically rewrites history so that it looks like the original project split did what we're planning to do now.
Pros: - we have a single cohesive git repo with no need to have users set up grafts Cons: - all of our SHA1s between the original split and now would change (making it harder to rebase local branches for example) - way more opportunity for error, I think. I'm leaning towards option 1 above, and happy to write the script which installs the grafts into the user's local repo. -Todd On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote: All, As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106. I propose Thursday April 21 at 11am PDT. - I will send out reminders leading up to that date. - I will announce on IRC when I'm about to start the changes. - I will run the script to make the changes. - Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time? - Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?) More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
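For anyone curious what the graft-installing script from Option 1 would amount to: `.git/info/grafts` is just a plain-text file where each line names a commit followed by the parents git should pretend it has. A minimal sketch of that idea follows -- the SHA1s are placeholders rather than real Hadoop commits, and the helper name is mine:

```python
import os

# Placeholder SHA1s for illustration only. In the real script these would
# be the first commit of the combined trunk and the heads of the
# pre-HADOOP-7106 common/hdfs/mapreduce mirror branches.
MERGE_ROOT = "a" * 40
OLD_HEADS = ["b" * 40, "c" * 40, "d" * 40]

def install_graft(git_dir=".git"):
    """Append one graft line of the form: <commit> <parent1> <parent2> ..."""
    info_dir = os.path.join(git_dir, "info")
    os.makedirs(info_dir, exist_ok=True)
    line = " ".join([MERGE_ROOT] + OLD_HEADS) + "\n"
    with open(os.path.join(info_dir, "grafts"), "a") as f:
        f.write(line)

install_graft()
```

Because the grafts file lives under .git/ and is never pushed, it is purely local -- which is both why each user has to run the script once, and why no existing sha1s change.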
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing The Yahoo Distribution of Hadoop
Is there a list available of which patches you've made this decision about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR security in trunk has a serious vulnerability. Do we plan on fixing it, or will the answer be that, if anyone needs security, they must update to MR Next Gen? -Todd On Thu, Apr 7, 2011 at 3:52 PM, Arun C Murthy a...@yahoo-inc.com wrote: On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote: As the final installment in this process, I've started a discussion on us contributing a re-factor of Map-Reduce in https://issues.apache.org/jira/browse/MAPREDUCE-279 . Hi Folks, We wanted to share our thoughts around the co-development of the NextGen MapReduce branch (Jira MR-279), maintaining the branch-0.20-security and merging the work on the security branch with trunk. We've concluded that it does not make sense for us to port a very small subset of the work from the branch-0.20-security to the Hadoop mainline. The JIRAs we don't plan to port all affect areas of the mainline that are going to be replaced by work in the NextGen MapReduce branch ( http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/). We've been working on the NextGen MapReduce branch (MAPREDUCE-279) within Apache for a while now and are excited about its progress. We think that this branch will be a huge improvement in scalability, performance and functionality. We are now confident that we can get it ready for release in the next few months. We believe that the next major release of Apache Hadoop we will test at Yahoo will include the work in this branch and we are committed to merging the NextGen branch into the mainline after the PMC approves the merge. Meanwhile, we have continued to find and fix bugs on branch-0.20-security and have been working to port that work into the Hadoop mainline.
Most of this work is done and we've also brought all the patches in from our github branch into apache subversion, so that it is easy for everyone to see the work remaining. What we've found is that some of the work in branch-0.20-security is in code sections that have been completely replaced / refactored in the NextGen MapReduce branch. Since we are committed to the NextGen branch, we don't think there is any upside in porting this code into portions of mainline we expect to discard. All of these JIRAs will be fixed in the NextGen MapReduce branch and through there ultimately in trunk (assuming the PMC approves the merge). So at this point it is our intent to not port the JIRAs listed above to trunk, but to wait until we merge NextGen into trunk to resolve these issues there. If you are interested in seeing these issues ported to mainline, let us know. We are happy to help review your patches and explain context to anyone who is interested in doing this work. Arun and Eric -- Todd Lipcon Software Engineer, Cloudera
Re: Proposal: Further Project Split(s)
+4.01. This is a terrific idea. On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers a...@cloudera.com wrote: Hello Hadoop Community, Given the tremendous positive feedback we've all had regarding the HDFS, MapReduce, and Common project split, I'd like to propose we take the next step and further separate the existing projects. I propose we begin by splitting the MapReduce project into separate Map and Reduce sub-projects. This will provide us the opportunity to tease out the complex interdependencies between map and reduce that exist today, to encourage us to write more modular and isolated code, which should speed releases. This will also aid our users who exclusively run map-only or reduce-only jobs. These are important use-cases, and so should be given high priority. Given that these two portions of the existing MapReduce project share a great deal of code, we will likely need to release these two new projects concurrently at first, but the eventual goal should certainly be to be able to release Map and Reduce independently. This seems intuitive to me, given the remarkable recent advancements in the academic community regarding reduce, while the research coming out of the map academics has largely stagnated of late. If this proposal is accepted, and it has the success I think it will, then we should strongly consider splitting the other two projects as well. My gut instinct is that we should split HDFS into HD and FS sub-projects, and simply rename the Common project to C'Mon. We can think about the details of what exactly these project splits mean later. Please let me know what you think. Best, Aaron -- Todd Lipcon Software Engineer, Cloudera
Maintenance of Hadoop 0.21 branch?
Hi all, Some recent discussion on HDFS-1786 has raised an interesting question: does anyone plan on maintaining the 0.21 branch and eventually releasing an 0.21.1? Should we bother to commit bug fixes to this branch? It seems to me that our time would be better spent getting 0.22 and trunk back to a green state so we can talk about releasing them, rather than applying patches to a branch with no releases planned. Of course a decision now doesn't preclude anyone from stepping up later, backporting patches to 0.21, and releasing an 0.21.1. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Maintenance of Hadoop 0.21 branch?
On Wed, Mar 30, 2011 at 2:50 PM, Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com wrote: I recall that some users are using 0.21. Should we discuss this on the users mailing lists? I thought -general was considered a user mailing list for hadoop-wide discussions like this? We can CC all the user lists, or just common-user which most people are on, if you think that's better? -Todd From: Todd Lipcon t...@cloudera.com To: general@hadoop.apache.org Sent: Wed, March 30, 2011 2:41:35 PM Subject: Maintenance of Hadoop 0.21 branch? Hi all, Some recent discussion on HDFS-1786 has raised an interesting question: does anyone plan on maintaining the 0.21 branch and eventually releasing an 0.21.1? Should we bother to commit bug fixes to this branch? It seems to me that our time would be better spent getting 0.22 and trunk back to a green state so we can talk about releasing them, rather than applying patches to a branch with no releases planned. Of course a decision now doesn't preclude anyone from stepping up later, backporting patches to 0.21, and releasing an 0.21.1. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Abandon hod Common contrib
Can any committer with knowledge of HOD please review this patch? If there are no committers with such knowledge, I would encourage us to either (a) add a committer to maintain hod, or (b) reconsider the vote to abandon it as an official contrib. Perhaps Simone and Gianluigi could move it to a separate incubator project? -Todd On Fri, Feb 18, 2011 at 6:40 AM, Simone Leo simone@crs4.it wrote: I am the co-author (with Gianluigi Zanetti) of HADOOP-6369 -- add Grid Engine support to HOD. At CRS4 we've been using (our patched version of) HOD since 2008 and we still use it in production. We use Hadoop 0.20.2 since it was released one year ago. Simone On 02/12/11 06:15, Owen O'Malley wrote: On Feb 11, 2011, at 6:17 PM, Nigel Daley wrote: a) I don't think hod is actually part of any unit tests, so including it would likely only be a burden on the tarball size. Not true. HOD has python unit tests and is the reason our builds have dependencies on python. But Allen's point is that I don't recall ever seeing HOD test failures causing the build to fail. b) The edu community uses this quite extensively, evidenced by the topic coming up on the mailing lists at least once every two months or so and has for years. Can't say that about the other contrib modules other than the schedulers and streaming. Then they are using old version of Hadoop. AFAICT HOD does not work with 0.20 or beyond. Out of curiosity, what goes wrong? Clearly nothing major has changed in starting up a mapreduce cluster in a very long time. c) The community that does use it has even submitted a patch that we've ignored. Which means the committers of this project gave up on it long ago. There are also some patches on core Hadoop that have been sitting for a long time, so I don't think that is a valid inference. I would love to hear some of the people who are using HOD speak up and give us their feedback. 
-- Owen -- Simone Leo Data Fusion - Distributed Computing CRS4 POLARIS - Building #1 Piscina Manna I-09010 Pula (CA) - Italy e-mail: simone@crs4.it http://www.crs4.it -- Todd Lipcon Software Engineer, Cloudera
[ANN] HBase 0.90.1 available for download
The Apache HBase team is happy to announce the general availability of HBase 0.90.1, available from your Apache mirror of choice: http://www.apache.org/dyn/closer.cgi/hbase/ [at the time of this writing, not all mirrors have updated yet -- please pick a different mirror if your first choice does not show 0.90.1] HBase 0.90.1 is a maintenance release that fixes several important bugs since version 0.90.0, while retaining API and data compatibility. The release notes may be found on the Apache JIRA: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12315548 Users upgrading from HBase 0.90.0 may upgrade clients and servers separately, though it is recommended that both be upgraded. If upgrading from a version of HBase prior to 0.90.0, please read the notes accompanying that release: http://osdir.com/ml/general-hadoop-apache/2011-01/msg00208.html As always, many thanks to those who contributed to this release! -The HBase Team
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer awittena...@linkedin.com wrote: So is the expectation that users would have to follow bread crumbs to the github dumping ground, then try to figure out which repo is the 'better' choice for their usage? Using LZO as an example, it appears we have a choice of kevin's, yours, or the master without even taking into consideration any tags. That sounds like a recipe for disaster that's even worse than what we have today. Kevin's and mine are currently identical (0e7005136e4160ed4cc157c4ddd7f4f1c6e11ffa) Not sure who the master is -- maybe you're referring to the Google Code repo? The reason we started working on github over a year ago is that the bugs we reported (and provided diffs for) in the Google Code project were ignored. For example: http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=17 In fact this repo hasn't been updated since Sep '09: http://code.google.com/p/hadoop-gpl-compression/source/list Github provided an excellent place to collaborate on the project, make progress, fix bugs, and provide a better product for the users. As for dumping ground, I don't quite follow your point - we develop in the open, accept pull requests from users, and code review each others' changes. Since October every commit has either been contributed by or fixes a bug reported by a user completely outside of the organizations where Kevin and I work. I agree that it's a bit of breadcrumb following to find the repo, though. We do at least have a link on the wiki: http://wiki.apache.org/hadoop/UsingLzoCompression which points to Kevin's repo. Perhaps the best solution here is to add a page to the official Hadoop site (not just the wiki) with links to actively maintained contrib projects? IMO the more we can take non-core components and move them to separate release timelines, the better.
Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I'd agree except for one thing: even when users do provide patches to contrib components we ignore them. How long have those patches for HOD been sitting there in the patch queue? So of course they wait months/years--because we seemingly ignore anything that isn't important to us. Unfortunately, that covers a large chunk of contrib. :( True - we ignore them because the core contributors generally have little clue about the contrib components, so don't feel qualified to review. I'll happily admit that I've never run failmon, index, dynamic-scheduler, eclipse-plugin, data_join, mumak, or vertica contribs. Wouldn't you rather these components lived on github so the people who wrote them could update them as they wished without having to wait on committers who have little to no clue about how to evaluate the changes? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
On Tue, Feb 1, 2011 at 9:37 AM, Tom White t...@cloudera.com wrote: HBase moved all its contrib components out of the main tree a few months back - can anyone comment how that worked out? Sure. For each contrib: ec2: no longer exists, and now has been integrated into Whirr and much improved. Whirr has made several releases in the time that HBase has made one. The whirr contributors know way more about cloud deployment than the HBase contributors (except where they happen to overlap). Strong net positive. mdc_replication: pulled into core since it's developed by core committers and also needs a fair amount of tight integration with core components stargate: pulled into core - it was only in contrib as a sort of staging ground - it's really an improved/new version of the rest interface we already had in core. transactional: moved to github - this has languished a bit on github because only one person was actively maintaining it. However, it had already been languishing as part of contrib - even though it compiled, it never really worked very well in HBase trunk. So, moving it to a place where it's languished has just made it more obvious what was already true - that it isn't a well supported component (yet). Recently it's been taken back up by the author of it - if it develops a large user base it can move quickly and evolve without waiting on our release. Net: probably a wash So, overall, I'd say it was a good decision. Though we never had the same number of contribs that Hadoop seems to have sprouted. -Todd On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer awittena...@linkedin.com wrote: On Jan 31, 2011, at 3:23 PM, Todd Lipcon wrote: On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley omal...@apache.org wrote: Also note that pushing code out of Hadoop has a high cost. There are at least 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion for the users. 
A lot of users never go to the work to figure out which fork and branch of hadoop-gpl-compression work with the version of Hadoop they installed. Indeed it creates confusion, but in my opinion it has been very successful modulo that confusion. I'm not sure how the above works with what you wrote below: In particular, Kevin and I (who each have a repo on github but basically co-maintain a branch) have done about 8 bugfix releases of LZO in the last year. The ability to take a bug and turn it around into a release within a few days has been very beneficial to the users. If it were part of core Hadoop, people would be forced to live with these blocker bugs for months at a time between dot releases. So is the expectation that users would have to follow bread crumbs to the github dumping ground, then try to figure out which repo is the 'better' choice for their usage? Using LZO as an example, it appears we have a choice of kevin's, yours, or the master without even taking into consideration any tags. That sounds like a recipe for disaster that's even worse than what we have today. IMO the more we can take non-core components and move them to separate release timelines, the better. Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I'd agree except for one thing: even when users do provide patches to contrib components we ignore them. How long have those patches for HOD been sitting there in the patch queue? So of course they wait months/years--because we seemingly ignore anything that isn't important to us. Unfortunately, that covers a large chunk of contrib. :( -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop-common-trunk-Commit is failing since 01/19/2011
On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko shv.had...@gmail.com wrote: Anybody with gcc active, could you please verify if the problem is caused by HADOOP-6864. I can build common trunk just fine on CentOS 5.5 including native. I think the issue is somehow isolated to the build machines. Anyone know what OS they've got? Or can I swing an account on the box where the failures are happening? -Todd On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning tdunn...@maprtech.com wrote: There has been a problem with more than one build failing (Mahout is the one that I saw first) due to a change in maven version which meant that the clover license isn't being found properly. At least, that is the tale I heard from infra. On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins e...@cloudera.com wrote: Hey Konstantin, The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290, which was fixed. Trees from trunk are compiling against each other for me (eg each installed to a local maven repo), perhaps the upstream maven repo hasn't been updated with the latest bits yet. Thanks, Eli On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko shv.had...@gmail.com wrote: Sending this to general to attract urgent attention. Both HDFS and MapReduce are not compiling since HADOOP-6904 and its hdfs and MR counterparts were committed. The problem is not with this patch as described below, but I think those commits should be reversed if Common integration build cannot be restored promptly. Thanks, --Konstantin On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko shv.had...@gmail.com wrote: I see Hadoop-common-trunk-Commit is failing and not sending any emails. It times out on native compilation and aborts. Therefore changes are not integrated, and now it has led to hdfs and mapreduce both not compiling. Can somebody please take a look at this. The last few lines of the build are below.
Thanks --Konstantin [javah] [Loaded /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class] [javah] [Loaded /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)] [javah] [Forcefully writing file /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h] [exec] checking for gcc... gcc [exec] checking whether the C compiler works... yes [exec] checking for C compiler default output file name... a.out [exec] checking for suffix of executables... Build timed out. Aborting Build was aborted [FINDBUGS] Skipping publisher since build result is ABORTED Publishing Javadoc Archiving artifacts Recording test results No test report files were found. Configuration error? Recording fingerprints [exec] Terminated Publishing Clover coverage report... No Clover report will be published due to a Build Failure No emails were triggered. Finished: ABORTED -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley omal...@apache.org wrote: Also note that pushing code out of Hadoop has a high cost. There are at least 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion for the users. A lot of users never go to the work to figure out which fork and branch of hadoop-gpl-compression work with the version of Hadoop they installed. Indeed it creates confusion, but in my opinion it has been very successful modulo that confusion. In particular, Kevin and I (who each have a repo on github but basically co-maintain a branch) have done about 8 bugfix releases of LZO in the last year. The ability to take a bug and turn it around into a release within a few days has been very beneficial to the users. If it were part of core Hadoop, people would be forced to live with these blocker bugs for months at a time between dot releases. IMO the more we can take non-core components and move them to separate release timelines, the better. Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I think this will also help the situation where people have set up shop on branches -- a lot of the value of these branches comes from the frequency of backports and bugfixes to non-core components. If the non-core stuff were on a faster timeline upstream, we could maintain core stability while also offering people the latest and greatest libraries, tools, codecs, etc. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Patch testing
On Wed, Jan 26, 2011 at 10:05 AM, Nigel Daley nda...@mac.com wrote: raid (contrib) test hanging: TestBlockFixer I forced 2 thread dumps. Both hung in the same place. Filed https://issues.apache.org/jira/browse/MAPREDUCE-2283 This is a blocker for turning on MR precommit. Since this is contrib, I'd like to suggest just disabling this test temporarily. We can re-enable it once it's fixed. Not having MR pre-commit working has been pretty painful. -Todd On Jan 25, 2011, at 11:19 PM, Nigel Daley wrote: Started another trial run of MR precommit testing: https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/ Let's see if 17th time is a charm... Nige On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote: On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley nda...@mac.com wrote: Hrm, the MR precommit test I'm running has hung (been running for 14 hours so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be the NFS mounts on the machines. I forced a thread dump which you can see in the console: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console Strange, haven't seen a hang like that before in handleConnectionFailure. It should retry for 15 minutes max in that loop. Any other ideas why these might be hanging? There is an HDFS bug right now that can cause hangs on some tests - HDFS-1529 - would appreciate if someone can take a look. But I don't think this is responsible for the MR hang above. -Todd On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley nda...@mac.com wrote: Thanks for looking into it Todd. Let's first see if you think it can be fixed quickly. Let me know. No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes this test timeout for me. 
-Todd On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley nda...@mac.com wrote: Todd, would love to get https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since this is failing every night on trunk. What if we disable that test, move that issue to 0.22 blocker, and then enable the test-patch? I'll also look into that one today, but if it's something that will take a while to fix, I don't think we should hold off the useful testing for all the other patches. -Todd On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote: Hi Nigel, MAPREDUCE-2172 has been fixed for a while. Are there any other particular JIRAs you think need to be fixed before the MR test-patch queue gets enabled? I have a lot of outstanding patches and doing all the test-patch turnaround manually on 3 different boxes is a real headache. Thanks -Todd On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley nda...@mac.com wrote: Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30 Patch Available HDFS issues. Nige On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote: I committed HDFS-1511 this morning. We should be good to go. I can haz snooty robot butler? On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik c...@apache.org wrote: Thanks Jacob. I am wasted already but I can do it on Sun, I think, unless it is done earlier. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 19:41, Jakob Homan jgho...@gmail.com wrote: Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to whip one up tonight. On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley nda...@mac.com wrote: I agree with Cos on fixing HDFS-1511 first. Once that is done I'll enable hdfs patch testing. 
Cheers, Nige Sent from my iPhone4 On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik c...@apache.org wrote: One more issue needs to be addressed before test-patch is turned on for HDFS is https://issues.apache.org/jira/browse/HDFS-1511 -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik c...@apache.org wrote: Considering that because of these 4 faulty cases every patch will be -1'ed a patch author will still have to look at it and make a comment why this particular -1 isn't valid. Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel like there's a better way. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 15:55, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to. On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur
Re: triggering automated precommit testing
Hey Nigel, Would there be any way to add a feature where we can make some special comment on the JIRA that would trigger a Hudson retest? There are a lot of really old patches out on the JIRA that would be worth re-testing against trunk, and it's a pain to download and re-attach. I'm thinking a comment with a special token like @Hudson.Test. Failing that, Ian, can you add me to the Hudson list? -Todd On Wed, Jan 12, 2011 at 4:33 PM, Nigel Daley nda...@mac.com wrote: Jakob Homan commented on HDFS-884: -- Konstantin, if you're trying to kick a new patch build for this you no longer move it to Open and back to Patch Available. Instead, you must upload a new patch. Or, if you have permission, you can kick https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/ and enter the issue number. That makes me sad. Is this a new feature or regression? [For everyone's benefit, moving this to general@] Jakob, I referenced the change here: http://tinyurl.com/4crxlvy The new system is much more robust, partly because it no longer relies on watching Jira-generated emails to determine when issues move into Patch Available state. There is limited info I can get from the Jira API, thus the triggering mechanism had to change. Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Hi Arun, all, When we merged YDH and CDH for CDH3b3, we went through the effort of linearizing all of the YDH patches and squashing multiple commits into single ones corresponding to a single JIRA where possible. So, we have a 100% linear set of patches that applies on top of the 0.20.2 source tree and includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append and a number of other backports. Since this could be applied as a linear set of patches instead of a big lump, would there be interest in using this as the 0.20.100 Apache release? I can take the time to remove any patches that are Cloudera-specific or not yet applied upstream. Thanks -Todd On Wed, Jan 12, 2011 at 11:07 PM, Arun C Murthy a...@yahoo-inc.com wrote: On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote: +1 for 0.20.x, where x = 100. I agree that the 1.0 moniker would involve more discussion. Ok, seems like we are converging; we can continue talking. I've created the branch to get the ball rolling. Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious. I'm afraid that the svn log of the github Y! branch is fairly useless since a single JIRA might have multiple commits in the Y! branch (bugfix on top of a bugfix). We have done that in several cases (but the patches committed to trunk have a single patch which is the result of forward-porting a complete feature/bugfix). IAC, this branch and 0.22 have diverged so much that almost no non-trivial patch would apply without a significant amount of work. Thus, I think a jumbo patch should suffice. It will also ensure this can be done quickly so that the community can then concentrate on 0.22 and beyond. However, I will (manually) ensure all relevant jiras are referenced in the CHANGES.txt and Release Notes for folks to see the contents of the release. This is the hardest part of the exercise. Also, this ensures that we can track these jiras for 0.22 as Eli suggested.
Does that seem like a reasonable way forward? I'm happy to brainstorm. thanks, Arun -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy a...@yahoo-inc.com wrote: Since this could be applied as a linear set of patches instead of a big lump, would there be interest in using this as the 0.20.100 Apache release? I can take the time to remove any patches that are Cloudera-specific or not yet applied upstream. Interesting discussion, thanks. I'm sure it took you a fair amount of work to squash patches (which I tried too, btw). Yep, I had a great summer ;-) That, plus the fact that we would need to do a similar amount of work for the 10 or so releases we have done after 0.20.100.3 scares me. Sorry, I actually meant 0.20.104.3. Have there been many releases since then? That's the last version available on the Yahoo github, and that's the version we incorporated/linearized. If there is a large sequence of patches after this that you're planning on including, it would be good to see them in your git repo. As Nigel and I discussed here, the jumbo patch and an up-to-date CHANGES.txt provide almost all of the benefits we seek and allow all of us to get this done very quickly to focus on hadoop-0.22 and beyond. In my opinion, here are the downsides to this plan: - a mondo merge patch is a big pain when trying to do debugging. It may be sufficient for a user to look at CHANGES.txt, but I find myself using blame/log/etc on individual files to understand code lineage on a daily basis. If all of the merge shows up as a big patch it will be very difficult (at least the way I work with code) to help users debug issues or understand which JIRA a certain regression may have come from. - CHANGES.txt traditionally doesn't reference which patch file from a JIRA was checked in. So we may know that a given JIRA has been included, but often there are several revisions of patches on the JIRA and it's difficult to be sure that we have the most up-to-date version.
By looking at change history it's usually easy to pick this out, but if it's one giant patch apply, this isn't possible. - the proposal to use the YDH distro certainly solves the Security issue, but doesn't help out HBase at all. Given HBase has been asking for a long time to get a real release of the append branch, I think it would be better to have one 20-based release which has both of these features, rather than further fragmenting the community into 0.20.2, 0.20.2+security, 0.20.2+append. I think the first two points could be addressed if you push your git tree either to github or an apache-hosted git, and then include in SVN as a mondo patch. It's not ideal, but at least when trying to debug issues and understand the history of this branch there will be a publicly available change history to reference. To clarify my position a bit here - I definitely appreciate your volunteering to do the work, and wouldn't *block* the proposal as you've put it forth. I just think it will have limited utility for the community by being opaque (if contributed as a giant patch) and by not including the sync feature which is critical for a large segment of users. Given those downsides I'd rather see the effort diverted towards making a killer 0.22 release that we can all jump on. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Move project split down a level
Big +1. Curious how this will map to git, though - do we go back to one git repo? When we have a patch that is mainly HDFS or MR focused but will need changes across projects, can we just put up one patch in HDFS/MR or do we still need to open a parallel common JIRA? On Thu, Jan 13, 2011 at 11:25 PM, Eric Baldeschwieler eri...@yahoo-inc.com wrote: +1 Death to the project split! Or short of that, anything to tame it. On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote: Folks, As I look more at the impact of the common/MR/HDFS project split on what and how we release Hadoop, I feel like the split needs an adjustment. Many folks I've talked to agree that the project split has caused us a splitting headache. I think 1 relatively small change could alleviate some of that. CURRENT SVN REPO: hadoop / [common, mapreduce, hdfs] / trunk hadoop / [common, mapreduce, hdfs] / branches PROPOSAL: hadoop / trunk / [common, mapreduce, hdfs] hadoop / branches / [common, mapreduce, hdfs] We're a long way from releasing these 3 projects independently. Given that, they should be branched and released as a unit. This SVN structure enforces that and provides a more natural place to keep top-level build and pkg scripts that operate across all 3 projects. Thoughts? Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera
Re: Patch testing
On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley nda...@mac.com wrote: Hrm, the MR precommit test I'm running has hung (been running for 14 hours so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be the NFS mounts on the machines. I forced a thread dump which you can see in the console: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console Strange, haven't seen a hang like that before in handleConnectionFailure. It should retry for 15 minutes max in that loop. Any other ideas why these might be hanging? There is an HDFS bug right now that can cause hangs on some tests - HDFS-1529 - would appreciate if someone can take a look. But I don't think this is responsible for the MR hang above. -Todd On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley nda...@mac.com wrote: Thanks for looking into it Todd. Let's first see if you think it can be fixed quickly. Let me know. No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes this test timeout for me. -Todd On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley nda...@mac.com wrote: Todd, would love to get https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since this is failing every night on trunk. What if we disable that test, move that issue to 0.22 blocker, and then enable the test-patch? I'll also look into that one today, but if it's something that will take a while to fix, I don't think we should hold off the useful testing for all the other patches. -Todd On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote: Hi Nigel, MAPREDUCE-2172 has been fixed for a while. Are there any other particular JIRAs you think need to be fixed before the MR test-patch queue gets enabled? I have a lot of outstanding patches and doing all the test-patch turnaround manually on 3 different boxes is a real headache. 
Thanks -Todd On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley nda...@mac.com wrote: Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30 Patch Available HDFS issues. Nige On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote: I committed HDFS-1511 this morning. We should be good to go. I can haz snooty robot butler? On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik c...@apache.org wrote: Thanks Jacob. I am wasted already but I can do it on Sun, I think, unless it is done earlier. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 19:41, Jakob Homan jgho...@gmail.com wrote: Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to whip one up tonight. On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley nda...@mac.com wrote: I agree with Cos on fixing HDFS-1511 first. Once that is done I'll enable hdfs patch testing. Cheers, Nige Sent from my iPhone4 On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik c...@apache.org wrote: One more issue needs to be addressed before test-patch is turned on for HDFS is https://issues.apache.org/jira/browse/HDFS-1511 -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik c...@apache.org wrote: Considering that because of these 4 faulty cases every patch will be -1'ed a patch author will still have to look at it and make a comment why this particular -1 isn't valid. Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel like there's a better way. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 15:55, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to.
On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur dhr...@gmail.com wrote: +1, thanks for doing this. On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan jgho...@gmail.com wrote: So, with test-patch updated to show the failing tests, saving the developers the need to go and verify that the failed tests are all known, how do people feel about turning on test-patch again for HDFS and mapred? I think it'll help prevent any more tests from entering the yeah, we know category. Thanks, jg On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan jho...@yahoo-inc.com wrote: True, each patch would get a -1 and the failing tests would need to be verified as those known bad (BTW, it would be great if Hudson could list which tests failed in the message it posts to JIRA). But that's still quite a bit less error-prone
Re: Patch testing
On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley nda...@mac.com wrote: Todd, would love to get https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since this is failing every night on trunk. What if we disable that test, move that issue to 0.22 blocker, and then enable the test-patch? I'll also look into that one today, but if it's something that will take a while to fix, I don't think we should hold off the useful testing for all the other patches. -Todd On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote: Hi Nigel, MAPREDUCE-2172 has been fixed for a while. Are there any other particular JIRAs you think need to be fixed before the MR test-patch queue gets enabled? I have a lot of outstanding patches and doing all the test-patch turnaround manually on 3 different boxes is a real headache. Thanks -Todd On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley nda...@mac.com wrote: Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30 Patch Available HDFS issues. Nige On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote: I committed HDFS-1511 this morning. We should be good to go. I can haz snooty robot butler? On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik c...@apache.org wrote: Thanks Jacob. I am wasted already but I can do it on Sun, I think, unless it is done earlier. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 19:41, Jakob Homan jgho...@gmail.com wrote: Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to whip one up tonight. On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley nda...@mac.com wrote: I agree with Cos on fixing HDFS-1511 first. Once that is done I'll enable hdfs patch testing. 
Cheers, Nige Sent from my iPhone4 On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik c...@apache.org wrote: One more issue needs to be addressed before test-patch is turned on for HDFS is https://issues.apache.org/jira/browse/HDFS-1511 -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik c...@apache.org wrote: Considering that because of these 4 faulty cases every patch will be -1'ed a patch author will still have to look at it and make a comment why this particular -1 isn't valid. Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel like there's a better way. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 15:55, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to. On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur dhr...@gmail.com wrote: +1, thanks for doing this. On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan jgho...@gmail.com wrote: So, with test-patch updated to show the failing tests, saving the developers the need to go and verify that the failed tests are all known, how do people feel about turning on test-patch again for HDFS and mapred? I think it'll help prevent any more tests from entering the "yeah, we know" category. Thanks, jg On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan jho...@yahoo-inc.com wrote: True, each patch would get a -1 and the failing tests would need to be verified as those known bad (BTW, it would be great if Hudson could list which tests failed in the message it posts to JIRA). But that's still quite a bit less error-prone work than if the developer runs the tests and test-patch themselves.
Also, with 22 being cut, there are a lot of patches up in the air and several developers are juggling multiple patches. The more automation we can have, even if it's not perfect, will decrease errors we may make. -jg Nigel Daley wrote: On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote: It's also ready to run on MapReduce and HDFS but we won't turn it on until these projects build and test cleanly. Looks like both these projects currently have test failures. Assuming the projects are compiling and building, is there a reason to not turn it on despite the test failures? Hudson is invaluable to developers who then don't have to run the tests and test-patch themselves. We didn't turn Hudson off when it was working previously and there were known failures. I think one of the reasons we have more failing tests now is the higher cost of doing Hudson's work (not a great excuse I know). This is particularly true now because several of the failing tests involve tests timing out, making the whole testing regime even longer. Every single patch would get a -1
Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?
On Thu, Dec 23, 2010 at 10:15 AM, M. C. Srivas mcsri...@gmail.com wrote: Regardless, there will still be 2 incompatible branches. And that is only the beginning. Some future features will be done only on branch 1 (since company 1 uses that), and other features on branch 2 (by company 2, since they prefer branch 2), thereby further separating the two branches. If the goal is to avoid the split, then there are only 2 choices: (a) merge both (b) abandon one or the other. The 0.20 append solution has never been seen as a fork. It's a stop-gap fixup of the 0.20 append feature, but we don't intend to forward-port that append implementation into trunk. From an API perspective it's very close to the 0.22 version, and I think everyone fully intends to abandon the 0.20-append work once 0.22 append has been heavily tested for HBase workloads. The Promised Land that we say we're all trying to get to is regular, timely, feature-complete, tested, innovative but stable releases of new versions of Apache Hadoop. Missing any one of those criteria will continue (and has continued) the current situation where quasi-official branches and outside distributions fill the void such a release should fill. The effort to maintain this official branch and fix the bugs that will be discovered could be better spent moving us closer to that goal. +1. Interestingly, the work on 0.20-append uncovered a number of bugs that also will apply to 0.22's implementation. So it wasn't all a wasted effort ;-) -- Todd Lipcon Software Engineer, Cloudera
Re: namenode doesn't start after reboot
On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle bjo...@schiessle.org wrote: 1. I have set up a second dfs.name.dir which is stored at another computer (mounted by sshfs) I would strongly discourage the use of sshfs for the name dir. For one, it's slow, and for two, I've seen it have some really weird semantics where it's doing write-back caching. Just take a look at its manpage and you should get scared about using it for a critical mount point like this. A soft interruptible NFS mount is a much safer bet. -Todd -- Todd Lipcon Software Engineer, Cloudera
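As a concrete sketch of the setup Todd recommends above (hostnames, paths, and mount options here are illustrative, not from the thread), the second metadata directory would sit on a soft, interruptible NFS mount:

```
# /etc/fstab: mount the remote copy soft and interruptible, so a dead
# NFS server fails I/O after a few retries instead of hanging the NameNode
nnbackup:/export/nn  /mnt/nnbackup  nfs  soft,intr,timeo=30,retrans=3  0  0
```

```xml
<!-- hdfs-site.xml: one local copy of the NN metadata plus one on the NFS mount -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nnbackup/dfs/nn</value>
</property>
```

The soft,intr options are the point of the advice: they bound how long a dead remote mount can stall the NameNode, which sshfs's caching and hang behavior does not.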
Re: namenode doesn't start after reboot
On Thu, Dec 23, 2010 at 12:47 PM, Jakob Homan jgho...@gmail.com wrote: Please move discussions of CDH issues to Cloudera's lists. Thanks. Hi Jakob, These bugs are clearly not CDH-specific. NameNode corruption bugs, and best practices with regard to the storage of NN metadata, are clearly applicable to any version of Hadoop that users may run, be it Apache, Yahoo, Facebook, 0.20, 0.21, or trunk. If you have reason to believe my suggestion you quoted below is somehow not relevant to the larger community I would love to hear it. My understanding of the ASF goals is that we should encourage a cohesive community. Asking users of CDH to move general Hadoop questions off of ASF mailing lists just because of their choice in distros encourages a fractured community rather than a cohesive one. Clearly, if a user has a question specifically about Cloudera packaging they should be directed to the CDH lists so as not to clutter non-CDH users' inboxes with irrelevant questions. I think if you browse the archives you'll find that Cloudera employees have been consistent about doing this since we started the cdh-user list several months ago. But if an issue is a bug that is likely to occur in trunk, it makes sense to me to leave it on the list associated with the core project. Personally I do my best to answer questions on the ASF lists regardless of which distro the person is using - though our distros have some divergence in backported patch sets, it's rare that a bug found in one distro doesn't also let us fix a bug in trunk. I can readily pull up several recent examples of this, and I'm surprised that there isn't more concern in the general community about bugs that may result in NN metadata corruption. Thanks, -Todd On Thu, Dec 23, 2010 at 12:02 PM, Todd Lipcon t...@cloudera.com wrote: On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle bjo...@schiessle.org wrote: 1.
I have set up a second dfs.name.dir which is stored at another computer (mounted by sshfs) I would strongly discourage the use of sshfs for the name dir. For one, it's slow, and for two, I've seen it have some really weird semantics where it's doing write-back caching. Just take a look at its manpage and you should get scared about using it for a critical mount point like this. A soft interruptible NFS mount is a much safer bet. -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Patch testing
On Fri, Dec 17, 2010 at 3:55 PM, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to. I agree with Jakob. I've had to run and re-run the test-patch and unit tests probably 30 times over the last two weeks, and it takes a lot of effort, since my own infrastructure for doing this is a bit messy. I'd much rather just reply to the Hudson comments saying these are known issues than have to run the tests, check the results, copy and paste them and *then* say these are known issues anyway! On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur dhr...@gmail.com wrote: +1, thanks for doing this. On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan jgho...@gmail.com wrote: So, with test-patch updated to show the failing tests, saving the developers the need to go and verify that the failed tests are all known, how do people feel about turning on test-patch again for HDFS and mapred? I think it'll help prevent any more tests from entering the yeah, we know category. Thanks, jg On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan jho...@yahoo-inc.com wrote: True, each patch would get a -1 and the failing tests would need to be verified as those known bad (BTW, it would be great if Hudson could list which tests failed in the message it posts to JIRA). But that's still quite a bit less error-prone work than if the developer runs the tests and test-patch themselves. Also, with 22 being cut, there are a lot of patches up in the air and several developers are juggling multiple patches. The more automation we can have, even if it's not perfect, will decrease errors we may make. 
-jg Nigel Daley wrote: On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote: It's also ready to run on MapReduce and HDFS but we won't turn it on until these projects build and test cleanly. Looks like both these projects currently have test failures. Assuming the projects are compiling and building, is there a reason to not turn it on despite the test failures? Hudson is invaluable to developers who then don't have to run the tests and test-patch themselves. We didn't turn Hudson off when it was working previously and there were known failures. I think one of the reasons we have more failing tests now is the higher cost of doing Hudson's work (not a great excuse, I know). This is particularly true now because several of the failing tests involve tests timing out, making the whole testing regime even longer. Every single patch would get a -1 and need investigation. Currently, that would be about 83 investigations between the MR and HDFS issues that are in the patch-available state. Shouldn't we focus on getting these tests fixed or removed? Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as well) before I turn this on. Cheers, Nige -- Connect to me at http://www.facebook.com/dhruba -- Todd Lipcon Software Engineer, Cloudera
hadoop.job.ugi backwards compatibility
Hi all, I wanted to start a (hopefully short) discussion around the treatment of the hadoop.job.ugi configuration in Hadoop 0.22 and beyond (as well as the secure 0.20 branch). In the current security implementation, the following incompatible changes have been made even for users who are sticking with simple security. 1) Groups resolution happens on the server side, where it used to happen on the client. Thus, all Hadoop users must exist on the NN/JT machines in order for group mapping to succeed (or the user must write a custom group mapper). 2) The hadoop.job.ugi parameter is ignored - instead the user has to use the new UGI.createRemoteUser(foo).doAs() API, even in simple security. I'm curious whether the general user community feels these are acceptable breaking changes. The potential solutions I can see are: For 1) Add a configuration like hadoop.security.simple.groupmappinglocation - client or server. If it's set to client, the group mapping would continue to happen as it does in prior versions on the client side. For 2) If security is simple, we can have the FileSystem and JobClient constructors check for this parameter. If it's set, and there is no Subject object associated with the current AccessControlContext, wrap the creation of the RPC proxy with the correct doAs() call. Although security is obviously an absolute necessity for many organizations, I know of a lot of people who have small clusters and small teams who don't have any plans to deploy it. For these people, I imagine the above backward-compatibility layer may be very helpful as they adopt the next releases of Hadoop. If we don't want to support these options going forward, we can of course emit deprecation warnings when they are in effect and remove the compatibility layer in the next major release. Any thoughts here? Do people often make use of the hadoop.job.ugi variable to such an extent that this breaking change would block your organization from upgrading? 
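For reference, the doAs() pattern described above looks roughly like this. This is a sketch rather than a runnable program (it needs the Hadoop jars of that era on the classpath), and the user name "foo" is a placeholder:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiExample {
  public static void main(String[] args) throws Exception {
    // Replaces setting hadoop.job.ugi in the Configuration: create a UGI
    // for the desired user explicitly...
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("foo");
    // ...and perform all filesystem/JobClient operations inside doAs(),
    // so the RPC proxies are created as that user.
    FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      public FileSystem run() throws Exception {
        return FileSystem.get(new Configuration());
      }
    });
  }
}
```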
Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Why single thread for HDFS?
On Mon, Jul 5, 2010 at 5:08 AM, elton sky eltonsky9...@gmail.com wrote: Segel, Jay Thanks for the reply! Your parallelism comes from multiple tasks running on different nodes within the cloud. By default you get one map/reduce job per block. You can write your own splitter to increase this and then get more parallelism. sounds like an elegant solution. We can modify 'distcp', using a simple MR job, to make it block-based rather than file-based. There's actually an open ticket somewhere to make distcp do this using the new concat() API in the NameNode. concat() allows several files to be combined into one file at the metadata level, so long as a number of restrictions are met. The work hasn't been done yet, but the concat() call is there and waiting for a user. -Todd in practice, you very rarely know how big your output is going to be before it's produced, so this doesn't really work I think you got the point of why Yahoo made this design decision. Multithreading is only applicable when you know the size of the file, as when copying existing files, so you can split them and feed the pieces to different threads. On Sat, Jul 3, 2010 at 1:24 AM, Jay Booth jaybo...@gmail.com wrote: Yeah, a good way to think of it is that parallelism is achieved at the application level. On the input side, you can process multiple files in parallel, or one file in parallel by logically splitting it and opening multiple readers of the same file at multiple points. Each of these readers is single-threaded, because, well, you're returning a stream of bytes in order. It's inherently serial. On the reduce side, multiple reduces run, writing to multiple files in the same directory. Again, you can't really write to a single file in parallel effectively -- you can't write byte 26 before byte 25, because the file's not that long yet.
Theoretically, maybe you could have all reduces write to the same file by allocating some amount of space ahead of time and writing to the blocks in parallel - in practice, you very rarely know how big your output is going to be before it's produced, so this doesn't really work. Multiple files in the same directory achieves the same goal much more elegantly, without exposing a bunch of internal details of the filesystem to user space. Does that make sense? On Fri, Jul 2, 2010 at 9:26 AM, Segel, Mike mse...@navteq.com wrote: Actually they also listen here, and this is a basic question... I'm not an expert, but how does having multiple threads really help this problem? I'm assuming you're talking about a map/reduce job and not some specific client code which is being run on a client outside of the cloud/cluster. I wasn't aware that you could easily synchronize threads running on different JVMs. ;-) Your parallelism comes from multiple tasks running on different nodes within the cloud. By default you get one map/reduce job per block. You can write your own splitter to increase this and then get more parallelism. HTH -Mike -Original Message- From: Hemanth Yamijala [mailto:yhema...@gmail.com] Sent: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi, Can you please post this on hdfs-...@hadoop.apache.org? I suspect the most qualified people to answer this question would all be on that list. Hemanth On Fri, Jul 2, 2010 at 11:43 AM, elton sky eltonsky9...@gmail.com wrote: I guess this question was ignored, so I just post it again. From my understanding, HDFS uses a single thread to do reads and writes. Since a file is composed of many blocks and each block is stored as a file in the underlying FS, we can do some parallelism on a per-block basis. When reading across multiple blocks, threads can be used to read all blocks. When writing, we can calculate the offset of each block and write to all of them simultaneously. Is this right?
-- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop versions distributions
On Mon, Jul 5, 2010 at 1:12 AM, Evert Lammerts evert.lamme...@sara.nl wrote: There are a number of different versions and distributions of Hadoop which, as far as I understand, all differ from each other. I know that in the 0.20-append branch, files in HDFS can be appended, and that the Y! distribution (0.20.S) implements security features through Kerberos. And then there are the 0.20.3 and 0.22.0 branches. And trunk of course, which I guess is 0.20.2 nowadays? In addition to that there are distributions by Cloudera (CDH2/CDH3 beta) and IBM (IDAH). From my perspective, setting up a pilot cluster for a small number of users from different institutes, security (0.20.S) is very attractive – scientists like the idea of shielding their data and logic from other users. But what will I miss if I choose Y!'s distribution over all of these other options? Hi Evert, Y!'s distribution does contain a good set of patches, and we at Cloudera are always keeping track of the ydist git repository to incorporate those changes into CDH. Currently, ydist contains the security patch series, but doesn't include the recent append work. CDH3b2 includes the append work, but not security as of yet -- we are currently integrating security and it should be available in the next beta. Aside from the specific patches included, it's worth noting that the Y! dist is a git repository, rather than a full binary-and-source distribution of Hadoop and related tools. CDH includes not just the core Hadoop components but also integrates many other important ecosystem components including Pig, Hive, Oozie, HBase, ZooKeeper, Flume, etc. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Can we modify files in HDFS?
Hi Elton, Typically, large data sets are of the sort that continuously grow, and are not edited or amended. For example, a common Hadoop use case is the analysis of log data or other instrumentation from web or application servers. In these cases, files are simply added, but there is no need to go back and change entries. For the ability to have more table-like random-access storage on top of Hadoop, I would encourage you to look into HBase. It supports random read/write access with low latency. -Todd On Mon, Jun 28, 2010 at 9:48 PM, elton sky eltonsky9...@gmail.com wrote: thanks Jeff, So... it is a significant drawback. As a matter of fact, there are many cases where we need to modify files. I don't understand why Yahoo didn't provide that functionality. And as far as I know, no one else is working on this. Why is that? -- Todd Lipcon Software Engineer, Cloudera
Re: datanode goes down, maybe due to Unexpected problem in creating temporary file
Have you disabled the statechange log on the NN? This block has to be in there. Also, are you by any chance running with append enabled on unpatched 0.20? -Todd On Mon, May 17, 2010 at 12:40 PM, Ted Yu yuzhih...@gmail.com wrote: That blk doesn't appear in NameNode log. For datanode, 2010-05-15 00:09:31,023 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_926027507678171558_3620 src: /10.32.56.170:49172 dest: / 10.32.56.171:50010 2010-05-15 00:09:31,024 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_926027507678171558_3620 received exception java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 2010-05-15 00:09:31,024 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-5814095875968936685_2910 received exception java.io.IOException: Unexpected problem in creating temporary file for blk_-5814095875968936685_2910. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_-5814095875968936685 should not be present, but is. 2010-05-15 00:09:31,025 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.32.56.171:50010, storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398) at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376) at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133) at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:98) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2010-05-15 00:09:31,025 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.32.56.171:50010, storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075, ipcPort=50020):DataXceiver 2010-05-15 00:19:28,334 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_926027507678171558_3620 src: /10.32.56.170:36887 dest: / 10.32.56.171:50010 2010-05-15 00:19:28,334 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_926027507678171558_3620 received exception java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 2010-05-15 00:19:28,334 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.32.56.171:50010, storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398) at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376) at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133) at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:98) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2010-05-15 00:29:25,635 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_926027507678171558_3620 src: /10.32.56.170:34823 dest: / 10.32.56.171:50010 On Mon, May 17, 2010 at 11:43 AM, Todd Lipcon t...@cloudera.com wrote: Hi Ted, Can you please grep your NN and DN logs for blk_926027507678171558 and pastebin the results? -Todd On Mon, May 17, 2010 at 9:57 AM, Ted Yu yuzhih...@gmail.com wrote: Hi, We use CDH2 hadoop-0.20.2+228 which crashed on datanode smsrv10.ciq.com I found this in datanode log: 2010-05-15 07:37
Re: Hadoop support for hbase
On Sat, May 8, 2010 at 9:59 AM, Thomas Koch tho...@koch.ro wrote: I'm a little confused and concerned now that I learn that hbase uses a patched hadoop. For Debian I use plain hadoop under hbase and it seems to work in testing environments. - Are these patches necessary to run HBase? It will work unless you have failures, in which case it will lose edits. HBase relies on the hflush API (called sync in 0.20), which does not work properly in 0.20 without significant patching. Without this patch series, HBase will certainly run, but I could never recommend running it in a production environment where data loss is a show-stopper. - Where can I find these patches? Currently they're in various places on the JIRA - HDFS-200, HDFS-142, HDFS-826, HDFS-561, etc. I have a github branch up which contains them all applied, but I haven't tested it beyond unit tests - my testing is all happening in our CDH3 tree, and afaik Dhruba's testing is on their FB internal tree. - Why aren't these patches included in hadoop? Are they too unstable? Yes, the policy is not to make such significant changes in patch releases, so they would need to be voted into the 0.20 series. It's not that they're entirely unstable, it's just that the code is very tricky and still under development. The upcoming 0.21 release has a *different* implementation of append, which also hasn't been tested significantly in real-life failure scenarios, but it's important that we keep the stable release stable. - If they're unstable, does this mean HBase is unstable? Again, I would not say it's terribly unstable - but it's nowhere near the level of stability that Hadoop is at. Should I worry at all about these patches for the Debian packages? If you expect that people might actually want to run a production HBase, they should have the patches. If you expect people to just be playing around on single-node clusters where failures aren't an issue, best to skip them.
Of course, even for production usage, I wouldn't recommend running what we've got now - wait a month or two and it should be one more notch up the stability/testing scale. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop support for hbase
I have a few questions about this proposal: 1) Will we open new JIRAs separately for each change we want to commit, and go through the normal review process? Currently the 20-append work has been mostly going on under HDFS-142 for whatever reason, with ancillary issues only for bugs that also exist in trunk. 2) Do we plan to do a release off the branch, or is it meant only as a repository for sharing patches and a tree? 3) If we do a release, what version number would we give it and how would it be presented on the download/release pages? I'm certainly not against the idea, just would like to open discussion on the above points. The other alternative as I see it is to have those working on this branch do so somewhere like github - the advantages to that would be (a) it provides a more open way for non-committers to contribute, which is important since we're working closely with the HBase team on this, and (b) it doesn't add confusion to the main Hadoop jira and download pages. The disadvantage of course is that it fragments the code repository and we can't really do a release as easily. Thanks -Todd On Fri, May 7, 2010 at 10:34 AM, Dhruba Borthakur dhr...@gmail.com wrote: Hi folks, I would like to open a discussion on how we can make HBase work well with a supported/released version of Hadoop. HBase currently ships with a hadoop jar and that hadoop jar is from hadoop 0.20 + a set of ten/twenty patches. Most of these patches are focussed on HDFS append support in hadoop 0.20. These cannot be ported back to the 0.20 branch without affecting stability of the hadoop 0.20 branch. On the other hand, it is premature for hbase deployments to use hadoop 0.21 because hadoop 0.21 is still under testing and will take some time to stabilize. My proposal is to create a new branch off the hadoop 0.20 branch and name it branch-0.20-hbase. It will have support for append/sync and will be API compatible with the hadoop 0.20 branch. 
However, this branch will be marked experimental and API compatibility is subject to change. This branch will contain all of hdfs/mapreduce/core. If the community likes this idea, I will volunteer myself to be the release manager for this new branch and will propose a formal vote. comments/feedback/questions are most welcome. dhruba -- Connect to me at http://www.facebook.com/dhruba -- Todd Lipcon Software Engineer, Cloudera