Re: Heads Up - hadoop-2.0.3 release
Hey Arun, I put up patches for the QJM backport merge yesterday. Aaron said he'd take a look at reviewing them, so I expect that to be finished soon. Sorry for the delay. -Todd

On Tue, Dec 4, 2012 at 6:09 AM, Arun C Murthy a...@hortonworks.com wrote:
Lohit, There are some outstanding blockers and I'm still awaiting the QJM merge. Feel free to watch the blocker list: http://s.apache.org/e1J Arun

On Dec 3, 2012, at 10:02 AM, lohit wrote:
Hello Hadoop Release managers, Any update on this? Thanks, Lohit

2012/11/20 Tom White t...@cloudera.com:
On Mon, Nov 19, 2012 at 6:09 PM, Siddharth Seth seth.siddha...@gmail.com wrote:
YARN-142/MAPREDUCE-4067 should ideally be fixed before we commit to API backward compatibility. Also, from the recent YARN meetup there seemed to be a requirement to change the AM-RM protocol for container requests. In this case, I believe it's OK not to have all functionality implemented, as long as the protocol itself can represent the requirements.

I agree. Do you think we can make these changes before removing the 'alpha' label, i.e. in 2.0.3? If that's not possible for the container requests change, then we could mark AMRMProtocol (or related classes) as @Evolving. Another alternative would be to introduce a new interface.

However, as Bobby pointed out, given the current adoption by other projects, incompatible changes at this point can be problematic and need to be figured out.

We have a mechanism for this already. If something is marked as @Evolving it can change incompatibly between minor versions, e.g. 2.0.x to 2.1.0. If it is @Stable then it can only change on major versions, e.g. 2.x.y to 3.0.0. Let's make sure we are happy with the annotations, and willing to support them at the indicated level, before we remove the 'alpha' label. Of course, we strive not to change APIs without a very good reason, but if we do we should do so within the guidelines so that users know what to expect.
Cheers, Tom

Thanks - Sid

On Mon, Nov 19, 2012 at 8:22 AM, Robert Evans ev...@yahoo-inc.com wrote:
I am OK with removing the alpha label, assuming we think the APIs are stable enough that we are willing to truly start maintaining backwards compatibility on them within 2.X. From what I have seen they are fairly stable, and I think there is enough adoption by other projects right now that breaking backwards compatibility would be problematic. --Bobby Evans

On 11/16/12 11:34 PM, Stack st...@duboce.net wrote:
On Fri, Nov 16, 2012 at 3:38 PM, Aaron T. Myers a...@cloudera.com wrote:
Hi Arun, Given that the 2.0.3 release is intended to reflect the growing stability of YARN, and that the QJM work included in 2.0.3 provides a complete HDFS HA solution, I think it's time we consider removing the -alpha label from the release version. My preference would be to remove the label entirely, but we could also perhaps call it -beta or something. Thoughts?

I think it's fine, after two minor releases, to drop the '-alpha' suffix. If folks insist we next go to '-beta', I'd hope we'd travel all remaining 22 letters of the Greek alphabet before we release 2.0.x. St.Ack

-- Have a Nice Day! Lohit

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

-- Todd Lipcon Software Engineer, Cloudera
Re: Heads Up - hadoop-2.0.3 release
+1 from me, too. I wanted to let it sit in trunk for a few weeks to see if anyone found issues, but it's now been a bit over a month: all the feedback I've gotten so far has been good, tests have been stable, etc. Unless anyone votes otherwise, I'll start backporting the patches into branch-2. Todd

On Fri, Nov 16, 2012 at 12:58 PM, lohit lohit.vijayar...@gmail.com wrote:
+1 on having QJM in hadoop-2.0.3. Any rough estimate when this is targeted for?

2012/11/15 Arun C Murthy a...@hortonworks.com:
On the heels of the planned 0.23.5 release (thanks Bobby Thomas), I want to roll out a hadoop-2.0.3 release to reflect the growing stability of YARN. I'm hoping we can also release the QJM along with it; hence I'd love to know an ETA - Todd? Sanjay? Suresh?

One other thing which would be nice henceforth is to better reflect release content for end-users in release notes etc.; thus, can I ask committers to start paying closer attention to bug classification such as Blocker/Critical/Major/Minor? This way, as we get closer to stable hadoop-2 releases, we can do a better job communicating content and its criticality. thanks, Arun

-- Have a Nice Day! Lohit

-- Todd Lipcon Software Engineer, Cloudera
Re: Heads Up - hadoop-2.0.3 release
Here's a git branch with the backported changes, in case anyone has time to take a look this weekend: https://github.com/toddlipcon/hadoop-common/tree/branch-2-QJM

There were a few conflicts due to patches committed in different orders, and I had to pull in a couple of other JIRAs along the way, but it is passing its tests. If it looks good I'll start putting up the patches on JIRA and committing next week. -Todd

On Fri, Nov 16, 2012 at 1:14 PM, Todd Lipcon t...@cloudera.com wrote:
+1 from me, too. I wanted to let it sit in trunk for a few weeks to see if anyone found issues, but it's now been a bit over a month: all the feedback I've gotten so far has been good, tests have been stable, etc. Unless anyone votes otherwise, I'll start backporting the patches into branch-2. Todd

On Fri, Nov 16, 2012 at 12:58 PM, lohit lohit.vijayar...@gmail.com wrote:
+1 on having QJM in hadoop-2.0.3. Any rough estimate when this is targeted for?

2012/11/15 Arun C Murthy a...@hortonworks.com:
On the heels of the planned 0.23.5 release (thanks Bobby Thomas), I want to roll out a hadoop-2.0.3 release to reflect the growing stability of YARN. I'm hoping we can also release the QJM along with it; hence I'd love to know an ETA - Todd? Sanjay? Suresh?

One other thing which would be nice henceforth is to better reflect release content for end-users in release notes etc.; thus, can I ask committers to start paying closer attention to bug classification such as Blocker/Critical/Major/Minor? This way, as we get closer to stable hadoop-2 releases, we can do a better job communicating content and its criticality. thanks, Arun

-- Have a Nice Day! Lohit

-- Todd Lipcon Software Engineer, Cloudera
Re: Large feature development
On Mon, Sep 3, 2012 at 12:05 AM, Arun C Murthy a...@hortonworks.com wrote:
But, I'll stand by my point that YARN is at this point more alpha than HDFS2.

It's unfair to tag-team me while consistently ignoring what I write.

I'm not sure I ignored what you wrote. I understand that Yahoo is deploying soon on one of their clusters. That's great news. My original point was about the state of YARN when it was merged, and the comment about its current state was more of an aside. Hardly worth debating further. Best of luck with the deployment next week - I look forward to reading about how it goes on the list.

You brought up two bugs in the HDFS2 code base as examples of HDFS 2 not being high quality. Through a lot of words you just agreed with what I said - if people didn't upgrade to HDFS2 (not just HA) they wouldn't hit any of these: HDFS-3626,

You could hit this on Hadoop 1; it was just harder to hit.

HDFS-3731 etc.

The details of this bug have to do with the upgrade/snapshot behavior of the blocksBeingWritten directory, which was added in branch-1. In fact, the same basic bug continues to exist in branch-1. If you perform an upgrade, it doesn't hard-link the blocks into the new current directory. Hence, if the upgraded cluster exits safe mode (causing lease recovery of those blocks), and then the user issues a rollback, the blocks will have been deleted from the pre-upgrade image. This broken branch-1 behavior carried over into branch-2 as well, but it's not a new bug, as I said before.

There are more, e.g. how do folks work around the Secondary NN not starting up on upgrades from hadoop-1 (HDFS-3597)? They just copy multiple PBs over to a new hadoop-2 cluster, or patch the SNN themselves post HDFS-1073?

No, they rm -Rf the contents of the 2NN directory, which is completely safe and doesn't cause data loss in any way. In fact, the bug fix is exactly that -- it just does the rm -Rf itself, automatically.
It's a trivial workaround, similar to how other bugs in the Hadoop 1 branch have required workarounds in the past. Certainly no data movement or local patching. The SNN holds transient state and can always be cleared. If you have any questions about other bugs in the 2.x line, feel free to ask on the relevant JIRAs.

I'm still perfectly confident in the stability of HDFS 2 vs HDFS 1. In fact, my cell phone is likely the one that would ring if any of these production HDFS 2 clusters had an issue, and I'll offer the same publicly to anyone on this list: if you experience a corruption or data loss issue on the tip of branch-2 HDFS, email me off-list and I'll personally diagnose the issue. I would not make that same offer for branch-1, due to the fundamentally less robust design which has caused a lot of subtle bugs over the past several years.

Thanks -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: Large feature development
Hey Arun, First, let me apologize if my email came off as a personal snipe against the project or anyone working on it. I know the team has been hard at work for multiple years now on the project, and I certainly don't mean to denigrate the work anyone has done. I also agree that the improvements made possible by YARN are tremendously important, and I've expressed this opinion both online and in interviews with analysts, etc. But, I'll stand by my point that YARN is at this point more alpha than HDFS2.

You brought up two bugs in the HDFS2 code base as examples of HDFS 2 not being high quality. The first, HDFS-3626, was indeed a messy bug, but it had nothing to do with HA, the edit log rewrite, or any of the other changes being discussed in this thread. In fact, the bug has been there since the beginning of time, and is present in Hadoop 1.0.x as well (which is why the JIRA is still open). You simply need to pass a non-canonicalized path to the Path(URI) constructor, and you'll see the same behavior in every release, including 1.0.x, 0.20.x, or earlier. The reason it shows up more often in Hadoop 2 was actually the FsShell rewrite -- not any changes in HDFS itself, and certainly not related to HA as you've implied here.

The other bug causes blocksBeingWritten to disappear upon upgrade. This, also, had nothing to do with any of the features being discussed in this thread, and in fact only impacts a cluster which is taken down _uncleanly_ prior to an upgrade. Upon starting the upgraded cluster, the user would be alerted to the missing blocks and could roll back with no lost data. So, while it should be fixed (and has been), I wouldn't consider it particularly frightening. Most users I am aware of do a clean shutdown of services like HBase before trying to upgrade their cluster, and, in the worst case, they would see the issue immediately after the upgrade and perform a rollback with no adverse effects.
In branch-1, however, I've seen other bugs that I'd consider much more scary. Two in particular come to mind, and together they represent the vast majority of cases in which we've seen customers experience data corruption: HDFS-3652 and HDFS-2305. These two bugs were branch-1 only, and never present in Hadoop 2, due to the edit log rewrite project (HDFS-1073).

So, at risk of this thread just becoming a laundry list of bugs that have existed in HDFS, or a list of bugs in YARN, I'll summarize: I still think that YARN is alpha and HDFS 2 is at least as stable as Hadoop 1.0. We have customers running it for production workloads, in multi-rack clusters, with great success. But this has nothing to do with the thread at hand, so I'll raise the question of alpha/beta/stable labeling in the context of our next release vote, and hope we can go back to the more fruitful discussion of how to encourage large feature development while maintaining stability. Thanks -Todd

On Sun, Sep 2, 2012 at 3:11 PM, Arun Murthy a...@hortonworks.com wrote:
Eli,

On Sep 2, 2012, at 1:01 PM, Eli Collins e...@cloudera.com wrote:
On Sat, Sep 1, 2012 at 12:47 PM, Arun C Murthy a...@hortonworks.com wrote:
Todd,

On Sep 1, 2012, at 1:20 AM, Todd Lipcon wrote:
I'd actually contend that YARN was merged too early. I have yet to see anyone running YARN in production, and it's holding up the Stable moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable, and I'm seeing fewer issues in our customers running Hadoop HDFS 2 compared to Hadoop 1-derived code.

You know I respect you a ton, but I'm very saddened to see you perpetuate this FUD on our public lists. I expected better, particularly when everyone is working towards the same goals of advancing Hadoop-2. This sniping at other members doing work is, um, I'll just stop here rather than regret it later.

2. HDFS is more mature than YARN.
Not a surprise, given that we all agree YARN is alpha, and a much newer project than HDFS that hasn't been deployed in production environments yet (to my knowledge).

Let's focus on the ground reality here. Please read my (or Rajiv's) message again about YARN's current stability, how well it's baked, and its planned deployment to a very large cluster in a few *days*. Or, talk to the people developing, testing and supporting these customers and clusters.

I'll repeat - YARN has clearly baked much more than HDFS HA, given the basic bugs (upgrade, edit log corruption etc.) we've seen after it was declared *done*; but then we just disagree, since clearly I'm more conservative. Also, we need to be more conservative wrt HDFS - but then what would I know...

I'll admit it's hard to discuss with someone (or a collective) who just repeats themselves. Plus, I broke my own rule about email this weekend - so, I'll try harder. Arun

-- Todd Lipcon Software Engineer, Cloudera
Re: Heads up: next hadoop-2 release
On Fri, Aug 31, 2012 at 1:15 PM, Eli Collins e...@cloudera.com wrote:
Yea, I think we should nuke 2.1.0-alpha and re-create it when we're actually going to do a release. On the HDFS side there are quite a few things already waiting to get out; if it's going to take another 4 or so weeks, then it would be great to shoot for getting HDFS-3077 in.

Seems doable to me. I'm in the finishing-touches stage now, and feeling pretty confident about the basic protocol after a few machine-years of fault injection testing, plus some early test results on a 100 node QA setup. After the current round of open JIRAs goes in, I'll start a sweep for findbugs, removing TODOs, and adding a few more stress tests. Then I think it will be a good time to propose a merge. -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: Large feature development
Thanks for starting this thread, Steve. I think your points below are good. I've snipped most of your comment and will reply inline to one bit below:

On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran steve.lough...@gmail.com wrote:
Of the big changes that have worked, they are:

1. HDFS 2's HA and ongoing improvements: collaborative dev on the list with incremental changes going on in trunk, RTC with lots of tests. This isn't finished, and the test problem there is that functional testing of all failure modes requires software-controlled fencing devices and switches - and tests to generate the expected failure space.

Actually, most of the HDFS HA code has been done on branches. The first work that led towards HA was the redesign of the edits logging infrastructure -- HDFS-1073. This was a feature branch with about 60 patches on it. Then HDFS-1623, the main manual-failover HA development, had close to 150 patches on the branch. Automatic HA (HDFS-3042) was some 15-20 patches. The current work (removing the dependency on NAS) is around 35 patches in so far and getting close to merge.

In these various branches, we've experimented with a few policies which have differed from trunk. In particular:

- HDFS-1073 had a modified review-then-commit policy: if a patch sat without a review for more than 24 hours, we committed it, with the restriction that there would be a post-commit review before the branch was merged.

- All of the branches have done away with the requirement of running the full QA suite, findbugs, etc. prior to commit. This means that the branches at times have had broken tests checked in, but it also makes it quicker to iterate on the new feature. Again, the assumption is that these requirements are met before merge.

- In all cases there has been a design doc and some good design discussion up front, before substantial code was written. This made it easier to forge ahead on the branch with good confidence that the community was on board with the idea.
Given my experiences, I think all of the above are useful to follow. It means development can happen quickly, but ensures that when the merge is proposed, people feel like the quality meets our normal standards.

2. YARN: Arun on his own branch, CTR, merge once mostly stable, and completely replacing MRv1.

I'd actually contend that YARN was merged too early. I have yet to see anyone running YARN in production, and it's holding up the Stable moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable, and I'm seeing fewer issues in our customers running Hadoop HDFS 2 compared to Hadoop 1-derived code.

How then do we get (a) more dev projects working and integrated by the current committers, and (b) a process in which people who are not yet contributors/committers can develop non-trivial changes to the project in a way that is done with the knowledge, support and mentorship of the rest of the community?

Here's one proposal, making use of git as an easy way to allow non-committers to commit code while still tracking development in the usual places:

- Upon anyone's request, we create a new Version tag in JIRA.

- The developers create an umbrella JIRA for the project, and file the individual work items as subtasks (either up front, or as they are developed, if using a more iterative model).

- On the umbrella, they add a pointer to a git branch to be used as the staging area for the branch. As they develop each subtask, they can use the JIRA to discuss the development like they would with a normally committed JIRA, but when they feel it is ready to go (not requiring a +1 from any committer) they commit to their git branch instead of the SVN repo.

- When the branch is ready to merge, they can call a merge vote, which requires +1 from 3 committers, the same as a branch being proposed by an existing committer. A committer would then use git-svn to merge their branch commit-by-commit, or, if it is less extensive, simply generate a single big patch to commit into SVN.
My thinking is that this would provide a low-friction way for people to collaborate with the community and develop in the open, without having to work closely with a committer to review every individual subtask.

Another alternative, if people are reluctant to use git, would be to add a sandbox/ repository inside our SVN, and hand out the commit bit for branches in there without any PMC vote. Anyone interested in contributing could request a branch in the sandbox, and be granted access as soon as they get an Apache SVN account. -Todd

-- Todd Lipcon Software Engineer, Cloudera
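The git-based flow proposed above can be sketched in a few commands. This is purely illustrative: the repository is a throwaway stand-in for a real Apache mirror checkout, and the JIRA and branch names (HDFS-9999, etc.) are made up for the example.

```shell
# Hypothetical sketch of the proposed non-committer flow. A throwaway local
# repo stands in for a clone of the Apache git mirror; JIRA numbers are fake.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Example Dev"

# Simulate the upstream trunk the contributor branched from.
echo base > file.txt && git add file.txt && git commit -qm "trunk baseline"
git branch -m trunk

# Start a feature branch named for the umbrella JIRA.
git checkout -qb HDFS-9999-feature

# Develop each subtask as its own commit, referencing its subtask JIRA.
# Per the proposal, no committer +1 is needed before committing here.
echo change >> file.txt && git add file.txt
git commit -qm "HDFS-9999.1: first subtask"

# At merge time, a committer can replay commits via git-svn, or collapse
# the whole branch into a single patch to commit into SVN:
git format-patch trunk --stdout > HDFS-9999-branch.patch
```

The single-patch route loses per-subtask history in SVN, which is why the commit-by-commit git-svn replay is suggested for larger branches.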
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
, operate as distinct communities, and try to solve the code duplication/dependency issues from there.

7. If 4b, then graduate as TLP from Incubator. -snip

So that's my proposal. Thanks guys. Cheers, Chris

++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++

-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
Arun, great work below. Concrete, and an actual proposal of PMC lists. What do folks think?

I already expressed my opinion above on the thread that the whole idea of splitting is crazy. But I'll comment on some specifics of the proposal as well:

I think the simplest way is to have all existing HDFS committers be committers and PMC members of the new project. That list is found in the asf-authorization-template, which has:

Why? If we were to do this, why not take the opportunity to narrow it down to the people who are actually active contributors to the project? (per your reasoning on the YARN thread)

hadoop-hdfs = acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao

Of these, only the following people have actually contributed more than 5 patches to common and HDFS in the last year:

Hairong Kuang (7):
Vinod Kumar Vavilapalli (7):
Daryn Sharp (8):
Matthew J. Foley (10):
Devaraj Das (11):
Mahadev Konar (15):
Eric Yang (18):
Sanjay Radia (18):
Thomas Graves (18):
Thomas White (21):
Konstantin Shvachko (23):
Steve Loughran (24):
Arun Murthy (32):
Uma Maheswara Rao G (36):
Jitendra Nath Pandey (51):
Harsh J (68):
Robert Joseph Evans (71):
Alejandro Abdelnur (106):
Suresh Srinivas (107):
Aaron Twining Myers (171):
Tsz-wo Sze (184):
Eli Collins (252):
Todd Lipcon (286):

So I would propose: atm,daryn,ddas,eli,eyang,hairong,harsh,jitendra,mahadev,mattf,shv,sradia,stevel,suresh,szetszwo,todd,tomwhite,tucu,umamahesh and listing the others as Emeritus, who could easily regain committer status if they started contributing again.

Proposal: Apache Hadoop MapReduce as a TLP

I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.
I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. That list is found in the asf-authorization-template, which has:

hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao

Applying the same criteria, the list would be:

Suresh Srinivas (6):
Aaron Twining Myers (7):
Steve Loughran (7):
Ravi Gummadi (9):
Konstantin Shvachko (11):
Todd Lipcon (12):
Tsz-wo Sze (16):
Amar Kamat (17):
Harsh J (20):
Eli Collins (21):
Thomas White (27):
Siddharth Seth (46):
Thomas Graves (60):
Alejandro Abdelnur (71):
Robert Joseph Evans (107):
Mahadev Konar (118):
Vinod Kumar Vavilapalli (164):
Arun Murthy (209):

(this is based on git shortlog on the directories in the repository)

But I still think this discussion is silly, and we're not ready to do it.

-- Todd Lipcon Software Engineer, Cloudera
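The "more than 5 patches" cut above was computed with git shortlog. A minimal sketch of that counting step is below; it runs against a throwaway repo as a stand-in, since the real command would run inside a Hadoop checkout, and the exact paths and date range used for the actual numbers are not given in the thread.

```shell
# Sketch of counting per-author commits with git shortlog, as used for the
# proposed committer cut. The repo is a throwaway stand-in for a Hadoop
# checkout; the threshold of 5 matches the proposal in the thread.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# One prolific author...
git config user.email alice@example.com
git config user.name "Alice Active"
for i in 1 2 3 4 5 6 7; do
  echo "$i" > f.txt && git add f.txt && git commit -qm "commit $i"
done

# ...and one occasional author.
git config user.email bob@example.com
git config user.name "Bob Occasional"
echo done > g.txt && git add g.txt && git commit -qm "one-off commit"

# -n sorts by count, -s prints summary counts only; keep authors with > 5.
# (A real run would add e.g. --since="1 year ago" -- <project directories>.)
git shortlog -ns HEAD | awk -F'\t' '$1 > 5 {print $2}'
# prints: Alice Active
```

Since shortlog groups by author name, a real run would also want a .mailmap to fold together the multiple addresses some contributors commit under.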
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik c...@apache.org wrote:
I am curious where the arbitrary number 5 is coming from: is it reflected in the bylaws?

Nope, I picked it based on Arun's earlier use of the same number in the YARN thread. We have no bylaws about what would happen in the eventual TLP-ification of subcomponents, of course. -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
technical issues that this would create. I want to spend that time making the product better, for our users' benefit. Whether the users are Apache community users, or Cloudera customers, or Facebook's data scientists, they all are going to be happier if I spend a month improving our HA support than if I spend a month figuring out how to release three separate projects which somehow stitch together in a reasonable way at runtime without jar conflicts, tons of duplicate configuration work, byzantine version dependencies, etc. -Todd

-- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Merge *-user@hadoop lists?
Sure, +1. I already subscribe to all and filter into the same mailbox anyway :) -Todd

On Fri, Jul 20, 2012 at 11:34 AM, Mahadev Konar maha...@hortonworks.com wrote:
+1.

On Fri, Jul 20, 2012 at 10:48 AM, Jitendra Pandey jiten...@hortonworks.com wrote:
+1 for merging.

On Thu, Jul 19, 2012 at 11:25 PM, Arun C Murthy a...@hortonworks.com wrote:
I've been thinking that we currently have too many *-user@ lists (common, hdfs, mapreduce); they confuse folks all the time, resulting in too many cross-posts etc., particularly for new users. Basically, it's too unwieldy and tedious. How about simplifying things by having a single user@hadoop.apache.org list, by merging all of them? Thoughts? Arun

-- http://hortonworks.com/download/

-- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha-rc1
OK, the fixes to CHANGES.txt and JIRA are complete. Sorry for the mail bomb ;-) -Todd

On Tue, May 15, 2012 at 10:30 PM, Todd Lipcon t...@cloudera.com wrote:
Thanks for posting the new RC. Will take a look tomorrow. Meanwhile, I'm going through CHANGES.txt and JIRA and moving things that didn't make the 2.0.0 cut to 2.0.1. So, if folks commit things tomorrow, please check to put them in the right spot in CHANGES.txt and in JIRA. I'll take care of anything committed tonight that would conflict with my change. -Todd

On Tue, May 15, 2012 at 7:20 PM, Arun C Murthy a...@hortonworks.com wrote:
I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

-- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha
Hi Kumar, It looks like that patch was only committed to trunk, not branch-2. IMO we should keep the new changes for 2.0.0-alpha to a minimum (just things that impact client-server wire compatibility) and then plan a 2.0.1-alpha ASAP following this release, where we can pull in everything else that went into branch-2 in the couple of weeks since the 2.0.0-alpha branch was cut.

Arun: do you have time today to roll a new RC? If not, I am happy to do so. Does that sound reasonable? -Todd

On Tue, May 15, 2012 at 8:51 AM, Kumar Ravi kum...@us.ibm.com wrote:
Hi, Can HDFS-3265 be included too? It seems like this was marked for inclusion, but I can't seem to find the patch in the branch-2.0.0-alpha tree. Thanks, Kumar

From: Todd Lipcon t...@cloudera.com To: general@hadoop.apache.org Date: 05/14/2012 11:21 PM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha

Hey Arun, One more thing on the rc tarball: the source artifact doesn't appear to be an exact svn export, based on a diff. For example, it includes the README, NOTICE, and LICENSE files, as well as a few other things which appear to be build artifacts (e.g. hadoop-hdfs-project/hadoop-hdfs/downloads, hadoop-hdfs-project/hadoop-hdfs/test_edit_log, etc). It seems like we _should_ have the various README-style files, but we shouldn't have the test artifacts in our source release. In order to get our source release to match svn, perhaps we should move NOTICE, README, LICENSE, etc. to the top level of our svn repo, such that a pure svn export would be a releasable source artifact? -Todd

On Mon, May 14, 2012 at 2:14 PM, Siddharth Seth seth.siddha...@gmail.com wrote:
Do we want to get MAPREDUCE-4067 in as well? It affects folks who may be writing their own AMs. Shouldn't affect MR clients though.
I believe 2.0 alpha doesn't freeze the YARN protocols for the 2.0 branch, so probably not critical. Thanks - Sid

On Mon, May 14, 2012 at 1:32 PM, Eli Collins e...@cloudera.com wrote:
As soon as JIRA is back up and I can post an updated patch, I'll merge HDFS-3418 (also incompatible).

On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze szets...@yahoo.com wrote:
I have just merged HADOOP-8285 and HADOOP-8366. I have also merged HDFS-3211, since it is an incompatible protocol change (without it, 2.0.0-alpha and 2.0.0 will be incompatible). Tsz-Wo

----- Original Message ----- From: Tsz Wo Sze szets...@yahoo.com To: general@hadoop.apache.org Sent: Monday, May 14, 2012 11:07 AM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
Let me merge HADOOP-8285 and HADOOP-8366. Thanks. Tsz-Wo

----- Original Message ----- From: Uma Maheswara Rao G mahesw...@huawei.com To: general@hadoop.apache.org Sent: Monday, May 14, 2012 10:56 AM Subject: RE: [VOTE] Release hadoop-2.0.0-alpha
a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here.

I have merged the HDFS-3157 revert. Do you mind taking a look at HADOOP-8285 and HADOOP-8366? Thanks, Uma

From: Arun C Murthy [a...@hortonworks.com] Sent: Monday, May 14, 2012 10:24 PM To: general@hadoop.apache.org Subject: Re: [VOTE] Release hadoop-2.0.0-alpha
Todd, Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll RC1. thanks, Arun

On May 12, 2012, at 10:05 PM, Todd Lipcon wrote:
Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS:

1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. So, I'm nervous about putting it in a release, even one labeled as alpha.

2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. That would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we were all in agreement that, if possible, we'd like all 2.x releases to be compatible for client-server communication, at least (even if we don't support cross-version compatibility for the intra-cluster protocols).

Do other folks think it's worth rolling an rc1? I would propose either:

a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here.

or:

b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2.
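The tarball check Todd describes in this thread, comparing the release source artifact against a pristine SCM export, amounts to a recursive diff. The sketch below builds two throwaway directories as stand-ins (the real comparison would be between `svn export` output and the unpacked release tarball); all paths and file names here are illustrative.

```shell
# Sketch of comparing a source tarball's contents against a clean SCM export.
# Directory names are stand-ins; test_edit_log mirrors the stray build
# artifact mentioned in the thread.
set -e
work=$(mktemp -d)
mkdir -p "$work/svn-export" "$work/release-src"

# Files present in both trees compare clean.
echo "same contents" > "$work/svn-export/README.txt"
echo "same contents" > "$work/release-src/README.txt"

# A build artifact present only in the tarball shows up in the report.
echo "junk" > "$work/release-src/test_edit_log"

# -r recurses into subdirectories; -q reports only which files differ or
# are missing. diff exits nonzero when trees differ, so tolerate that here.
diff -rq "$work/svn-export" "$work/release-src" || true
```

An empty report would mean the tarball is an exact export, which is the property the thread argues a source release should have.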
Re: [VOTE] Release hadoop-2.0.0-alpha
On Tue, May 15, 2012 at 11:10 AM, Arun C Murthy a...@hortonworks.com wrote: Any more HDFS related merges before I roll RC1? I'm good as is. Thanks! On May 15, 2012, at 10:05 AM, Arun C Murthy wrote: Eli, is this done so I can roll rc1? On May 14, 2012, at 1:32 PM, Eli Collins wrote: As soon as jira is back up and I can post an updated patch I'll merge HDFS-3418 (also incompatible). On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze szets...@yahoo.com wrote: I just have merged HADOOP-8285 and HADOOP-8366. I also have merged HDFS-3211 since it is an incompatible protocol change (without it, 2.0.0-alphaand 2.0.0 will be incompatible.) Tsz-Wo - Original Message - From: Tsz Wo Sze szets...@yahoo.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 11:07 AM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Let me merge HADOOP-8285 and HADOOP-8366. Thanks. Tsz-Wo - Original Message - From: Uma Maheswara Rao G mahesw...@huawei.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 10:56 AM Subject: RE: [VOTE] Release hadoop-2.0.0-alpha a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. I have merged HDFS-3157 revert. Do you mind taking a look at HADOOP-8285 and HADOOP-8366? Thanks, Uma From: Arun C Murthy [a...@hortonworks.com] Sent: Monday, May 14, 2012 10:24 PM To: general@hadoop.apache.org Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Todd, Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll RC1. thanks, Arun On May 12, 2012, at 10:05 PM, Todd Lipcon wrote: Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS: 1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. 
So, I'm nervous about putting it in a release, even labeled as alpha. 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. So, that would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we all were in agreement that, if possible, we'd like all 2.x to be compatible for client-server communication, at least (even if we don't support cross-version for the intra-cluster protocols) Do other folks think it's worth rolling an rc1? I would propose either: a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. or: b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2. -Todd On Fri, May 11, 2012 at 7:19 PM, Eli Collins e...@cloudera.com wrote: +1 I installed the build on a 6 node cluster and kicked the tires, didn't find any blocking issues. Btw in the future better to build from the svn repo so the revision is an svn rev from the release branch. Eg 1336254 instead of 40e90d3c7 which is from the git mirror, this way we're consistent across releases. hadoop-2.0.0-alpha $ ./bin/hadoop version Hadoop 2.0.0-alpha Subversion git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55 Compiled by hortonmu on Wed May 9 16:19:55 UTC 2012 From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888 On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. 
This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
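Todd's second concern in this thread is about the RPC envelope rather than any one protocol: once the envelope's byte layout changes, every client built against the old layout stops interoperating, regardless of what the payload is. A toy illustration of that failure mode (this is an invented field layout, not Hadoop's actual wire format):

```python
import struct

# Hypothetical "old" RPC envelope: 4-byte call id + 2-byte method code.
def encode_old(call_id, method):
    return struct.pack("!IH", call_id, method)

# Hypothetical "new" envelope: a 1-byte version field inserted in front
# (the kind of change HADOOP-8285/HADOOP-8366 made to the real envelope).
ENVELOPE_VERSION = 1

def encode_new(call_id, method):
    return struct.pack("!BIH", ENVELOPE_VERSION, call_id, method)

def decode_new(data):
    version, call_id, method = struct.unpack("!BIH", data)
    if version != ENVELOPE_VERSION:
        raise ValueError("unsupported envelope version: %d" % version)
    return call_id, method

# A client built against the old format cannot talk to a server that
# expects the new one: the byte layouts simply differ (6 vs 7 bytes here),
# which is why all alphas within 2.x needed the same envelope.
old_bytes = encode_old(42, 7)
new_bytes = encode_new(42, 7)
assert len(old_bytes) != len(new_bytes)
```

This is why getting the envelope change into the first 2.0.0 alpha mattered more than any individual protocol change: the envelope is shared by every RPC.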
Re: [VOTE] Release hadoop-2.0.0-alpha-rc1
Thanks for posting the new RC. Will take a look tomorrow. Meanwhile, I'm going through CHANGES.txt and JIRA and moving things that didn't make the 2.0.0 cut to 2.0.1. So, if folks commit things tomorrow, please check to put it in the right spot in CHANGES.txt and in JIRA. I'll take care of anything committed tonight that would conflict with my change. -Todd On Tue, May 15, 2012 at 7:20 PM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate (rc1) for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc1/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha
Hey Arun, One more thing on the rc tarball: the source artifact doesn't appear to be an exact svn export, based on a diff. For example, it includes the README, NOTICE, and LICENSE files, as well as a few other things which appear to be build artifacts (e.g. hadoop-hdfs-project/hadoop-hdfs/downloads, hadoop-hdfs-project/hadoop-hdfs/test_edit_log, etc). It seems like we _should_ have the various README style files, but we shouldn't have the test artifacts in our source release. In order to get our source release to match svn, perhaps we should move NOTICE, README, LICENSE, etc to the top level of our svn repo, such that a pure svn export would be a releasable source artifact? -Todd On Mon, May 14, 2012 at 2:14 PM, Siddharth Seth seth.siddha...@gmail.com wrote: Do we want to get MAPREDUCE-4067 in as well? It affects folks who may be writing their own AMs. Shouldn't affect MR clients though. I believe 2.0 alpha doesn't freeze the Yarn protocols for the 2.0 branch, so probably not critical. Thanks - Sid On Mon, May 14, 2012 at 1:32 PM, Eli Collins e...@cloudera.com wrote: As soon as jira is back up and I can post an updated patch I'll merge HDFS-3418 (also incompatible). On Mon, May 14, 2012 at 12:16 PM, Tsz Wo Sze szets...@yahoo.com wrote: I have just merged HADOOP-8285 and HADOOP-8366. I also have merged HDFS-3211 since it is an incompatible protocol change (without it, 2.0.0-alpha and 2.0.0 will be incompatible.) Tsz-Wo - Original Message - From: Tsz Wo Sze szets...@yahoo.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 11:07 AM Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Let me merge HADOOP-8285 and HADOOP-8366. Thanks.
Tsz-Wo - Original Message - From: Uma Maheswara Rao G mahesw...@huawei.com To: general@hadoop.apache.org general@hadoop.apache.org Cc: Sent: Monday, May 14, 2012 10:56 AM Subject: RE: [VOTE] Release hadoop-2.0.0-alpha a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. I have merged HDFS-3157 revert. Do you mind taking a look at HADOOP-8285 and HADOOP-8366? Thanks, Uma From: Arun C Murthy [a...@hortonworks.com] Sent: Monday, May 14, 2012 10:24 PM To: general@hadoop.apache.org Subject: Re: [VOTE] Release hadoop-2.0.0-alpha Todd, Please go ahead and merge changes into branch-2.0.0-alpha and I'll roll RC1. thanks, Arun On May 12, 2012, at 10:05 PM, Todd Lipcon wrote: Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS: 1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. So, I'm nervous about putting it in a release, even labeled as alpha. 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. So, that would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we all were in agreement that, if possible, we'd like all 2.x to be compatible for client-server communication, at least (even if we don't support cross-version for the intra-cluster protocols) Do other folks think it's worth rolling an rc1? I would propose either: a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. or: b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2. 
-Todd On Fri, May 11, 2012 at 7:19 PM, Eli Collins e...@cloudera.com wrote: +1 I installed the build on a 6 node cluster and kicked the tires, didn't find any blocking issues. Btw in the future better to build from the svn repo so the revision is an svn rev from the release branch. Eg 1336254 instead of 40e90d3c7 which is from the git mirror, this way we're consistent across releases. hadoop-2.0.0-alpha $ ./bin/hadoop version Hadoop 2.0.0-alpha Subversion git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55 Compiled by hortonmu on Wed May 9 16:19:55 UTC 2012 From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888 On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun
Re: [VOTE] Release hadoop-2.0.0-alpha
Looking at the release tag vs the current state of branch-2, I have two concerns from the point of view of HDFS: 1) We reverted HDFS-3157 in branch-2 because it sends deletions for corrupt replicas without properly going through the corrupt block path. We saw this cause data loss in TestPipelinesFailover. So, I'm nervous about putting it in a release, even labeled as alpha. 2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC envelope in branch-2, but didn't make it into this rc. So, that would mean that future alphas would not be protocol-compatible with this alpha. Per a discussion a few weeks ago, I think we all were in agreement that, if possible, we'd like all 2.x to be compatible for client-server communication, at least (even if we don't support cross-version for the intra-cluster protocols) Do other folks think it's worth rolling an rc1? I would propose either: a) Revert HDFS-3157 and commit HADOOP-8285 and HADOOP-8366 on branch-2.0.0-alpha, so these are the only changes since rc0. Roll a new rc1 from here. or: b) Discard the current branch-2.0.0-alpha and re-branch from the current state of branch-2. -Todd On Fri, May 11, 2012 at 7:19 PM, Eli Collins e...@cloudera.com wrote: +1 I installed the build on a 6 node cluster and kicked the tires, didn't find any blocking issues. Btw in the future better to build from the svn repo so the revision is an svn rev from the release branch. Eg 1336254 instead of 40e90d3c7 which is from the git mirror, this way we're consistent across releases. 
hadoop-2.0.0-alpha $ ./bin/hadoop version Hadoop 2.0.0-alpha Subversion git://devadm900.cc1.ygridcore.net/grid/0/dev/acm/hadoop-trunk/hadoop-common-project/hadoop-common -r 40e90d3c7e5d71aedcdc2d9cc55d078e78944c55 Compiled by hortonmu on Wed May 9 16:19:55 UTC 2012 From source with checksum 3d9a13a31ef3a9ab4b5cba1f982ab888 On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-2.0.0-alpha
Hi Andrew, Have you seen the new MiniMRClientCluster class? It's meant to be what you describe - a minicluster which only exposes external APIs -- most importantly a way of getting at a JobClient to submit jobs. We have it implemented in both 1.x and 2.x at this point, though I don't recall if it's in the 1.0.x releases or if it's only slated for 1.1+ -Todd On Wed, May 9, 2012 at 6:05 PM, Andrew Purtell andrew.purt...@gmail.com wrote: Hi Suresh, The unstable designation makes sense. As would one for MiniMRCluster. I was over the top initially to surprise. I'm sure the MR minicluster seems a minor detail. Maybe it's worth thinking about the miniclusters differently? Please pardon if I am rehashing an old discussion. Things like MRUnit for applications and BigTop for full cluster tests can help, but, as mentioned in the annotation below, Pig, Hive, HBase, and other parts of the stack use miniclusters for local end to end testing in unit tests. As the complexity of the stack increases and we consider cross version support, unit tests on miniclusters I think will have no substitute. As Hadoop 2 has been evolving there has been some difficulty keeping up with minicluster changes. This makes sense. The attention to stability of client APIs and such, and the lack thereof to the minicluster, I think is self evident. But the need to fix up tests unpredictably introduces some friction that perhaps need not be there. Would a JIRA to discuss defining a subset of the minicluster interfaces as more stable be worthwhile? Best regards, - Andy On May 9, 2012, at 1:45 PM, Suresh Srinivas sur...@hortonworks.com wrote: For this reason, in HDFS, we change MiniDFSCluster to LimitedPrivate and not treat it as such: @InterfaceAudience.LimitedPrivate({"HBase", "HDFS", "Hive", "MapReduce", "Pig"}) @InterfaceStability.Unstable public class MiniDFSCluster { ...} On Wed, May 9, 2012 at 11:33 AM, Andrew Purtell apurt...@apache.org wrote: Sounds good Arun.
How should we consider the suitability and stability of MiniMRCluster for downstream projects? On Wed, May 9, 2012 at 11:30 AM, Arun C Murthy a...@hortonworks.com wrote: No worries Andy. I can spin an rc1 once we can pin-point the bug. thanks, Arun On May 9, 2012, at 10:17 AM, Andrew Purtell wrote: -1 (nonbinding), we are currently facing a minicluster semantic change of some kind, or more than one: https://issues.apache.org/jira/browse/HBASE-5966 There are other HBase JIRAs related to 2.0.0-alpha that we are working on, but I'd claim those are all our fault for breaking abstractions to solve issues. In one case there's a new helpful 2.x API (ShutdownHookManager, thank you!) that we can eventually move to. However, the minicluster changes are causing us some repeated discomfort. It will break, we'll get some help fixing up our tests for that, then some time later it will break again, repeat. Perhaps we have no right to complain, the minicluster isn't meant to be used by downstream projects. If so then please disregard the complaint, but your assistance in helping to fix the breakage again would be much appreciated. And, if so, perhaps we can discuss what makes sense in terms of a stable minicluster consumable for downstream projects? Best regards, - Andy On Wed, May 9, 2012 at 9:58 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-2.0.0-alpha that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-2.0.0-alpha-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. This is a big milestone for the Apache Hadoop community - congratulations and thanks for all the contributions! thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Arun C. Murthy Hortonworks Inc. 
http://hortonworks.com/ -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Todd Lipcon Software Engineer, Cloudera
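The ask in this thread - a minicluster surface that downstream projects can safely code against - is essentially a facade: one deliberately small class that tests depend on, while everything behind it stays free to churn between releases. That is the pattern MiniMRClientCluster follows; a language-neutral sketch of the idea in Python (class and method names invented, not Hadoop's):

```python
class UnstableMiniCluster:
    """Stand-in for an internal minicluster whose API may change freely
    between releases (the situation HBase kept running into)."""
    def __init__(self):
        self._conf = {"fs.defaultFS": "hdfs://localhost:8020"}

    def internal_start_daemons(self):  # internal detail; may be renamed any time
        self.running = True

    def internal_conf(self):
        return self._conf

class MiniClusterFacade:
    """The only surface downstream projects should code against.
    Its methods stay fixed even when UnstableMiniCluster changes."""
    def __init__(self):
        self._impl = UnstableMiniCluster()

    def start(self):
        self._impl.internal_start_daemons()

    def get_config(self):
        # Hand back a copy so callers cannot grow dependencies on
        # internal mutable state.
        return dict(self._impl.internal_conf())

cluster = MiniClusterFacade()
cluster.start()
assert "fs.defaultFS" in cluster.get_config()
```

The design trade-off is the one Suresh's annotation encodes: the facade gets a stability promise, the implementation behind it explicitly does not.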
Re: [VOTE] Release hadoop-0.23.2-rc0
On Thu, Apr 19, 2012 at 12:26 PM, Eli Collins e...@cloudera.com wrote: On Thu, Apr 19, 2012 at 11:45 AM, Arun C Murthy a...@hortonworks.com wrote: Yep, makes sense - I'll roll an rc0 for 2.0 after. However, we should consider whether HDFS protocols are 'ready' for us to commit to them for the foreseeable future, my sense is that it's a tad early - particularly with auto-failover not complete. Thus, we have a couple of options: a) Call the first release here as *2.0.0-alpha* version (lots of ASF projects do this). b) Just go with 2.0.0 and deem 2.0.x or 2.1.x as the first stable release and fwd-compatible release later. Given this is a major release (unlike something obscure like hadoop-0.23.0) I'm inclined to go with a) i.e. hadoop-2.0.0-alpha. Thoughts? Agree that we're a little too early on the HDFS protocol side, think MR2 is probably in a similar boat wrt stability as well. +1 to option a, calling it hadoop-2.0.0-alpha seems most appropriate. Regarding protocols: +1 to _not_ locking down cluster-internal wire compatibility at this point. i.e. we can break DN-NN, or NN-SBN, or Admin command - NN compatibility still. +1 to locking down client wire compatibility with the release of 2.0. After 2.0 is released I would like to see all 2.0.x clients continue to be compatible. Now that we are protobuf-ified, I think this is doable. Should we open a separate discussion thread for the above? Regarding version numbering: either of the proposals seems fine by me. -Todd Arun On Apr 19, 2012, at 12:24 AM, Eli Collins wrote: Hey Arun, This vote passed a week or so ago, let's make it official? Also, are you still planning to roll a hadoop-2.0.0-rc0 of branch-2 this week? I think we should do that soon, if you're not planning to do this holler and I'd be happy to. There's only 1 blocker left (http://bit.ly/I55LAd) and it's patch available, I think we should roll an rc from branch-2 when it's merged.
Thanks, Eli On Thu, Mar 29, 2012 at 4:07 PM, Arun C Murthy a...@hortonworks.com wrote: 0.23.2 is just a small set of bug-fixes on top of 0.23.1 and doesn't have NN HA etc. As I've noted separately, I plan to put out a hadoop-2.0.0-rc0 in a couple weeks with NN HA, PB for HDFS etc. thanks, Arun On Mar 29, 2012, at 3:55 PM, Ted Yu wrote: What are the issues fixed / features added in 0.23.2 compared to 0.23.1? Thanks On Thu, Mar 29, 2012 at 3:45 PM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-0.23.2 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.2-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
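Todd's remark that client wire compatibility is "doable" now that the protocols are protobuf-ified rests on one property of protobuf: decoders identify fields by number and silently skip numbers they don't recognize, so a newer server can add optional fields without breaking older clients. A simplified model of that rule (plain dicts stand in for the real varint wire encoding):

```python
def decode(fields, known):
    """Decode a message given {field_number: value}, keeping only fields the
    reader knows about. Mirrors protobuf's unknown-field rule: new optional
    fields added by one side don't break the other side's decoder."""
    return {known[num]: val for num, val in fields.items() if num in known}

# A hypothetical 2.0.0 client knows fields 1 and 2 of some response message.
OLD_SCHEMA = {1: "path", 2: "offset"}

# A later server adds field 3; the old client simply skips it.
msg_from_new_server = {1: "/user/todd/file", 2: 4096, 3: "checksum-v2"}
assert decode(msg_from_new_server, OLD_SCHEMA) == {
    "path": "/user/todd/file",
    "offset": 4096,
}
```

The flip side is why the thread still carves out the intra-cluster protocols: renumbering or removing a field breaks this guarantee, so "we can break DN-NN" really means the project reserves the right to make exactly those non-additive changes internally.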
Re: Naming of Hadoop releases
On Mon, Mar 19, 2012 at 11:02 PM, Konstantin Shvachko shv.had...@gmail.com wrote: Feature freeze has been broken so many times for the .20 branch, so that it became a norm for the entire project rather than an exception, which we had in the past. I agree we should be stricter about what feature backports we allow into stable branches. Security and hflush were both necessary evils - I'm glad now that we have them, but we should try to stay out of these types of situations in the future where we feel compelled to backport (or re-do in the case of hflush/sync) such large items. I don't understand this constant segregation against Hadoop .22. It is a perfectly usable version of Hadoop. It would be a waste not to have it released. Very glad that universities adopted it. If somebody needs security there is a number of choices, Hadoop-1 being the first. But if you cannot afford stand-alone HBase clusters or need to combine general Hadoop and HBase loads there is nothing else but Hadoop 0.22 at this point. I don't see what HBase has to do with it. In fact HBase runs way better on 1.x compared to 0.22. The tests don't even pass on 0.22 due to differences in the append semantics in 0.21+ compared to 0.20. Every production HBase deploy I know about runs on a 1.x-based distribution. You could argue this is selection bias by nature of my employer, but the same is true based on emails to the hbase-user lists, etc. This is orthogonal to the discussion at hand, I just wanted to correct this lest any users get the wrong perception and migrate their HBase clusters to a version which is rarely used and strictly inferior for this use case. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Naming of Hadoop releases
On Mon, Mar 19, 2012 at 2:56 PM, Doug Cutting cutt...@apache.org wrote: On 03/19/2012 02:47 PM, Arun C Murthy wrote: This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed. In many cases the reason this happened was that features were backported from trunk to 0.20 but not to 0.22. In other words, it's no fault of the folks who were working on branch 0.22. I agree that it's no fault of the folks on 0.22. So a related policy we might add to prevent such situations in the future might be that if you backport something from branch n to n-2 then you ought to also be required to backport it to branch n-1 and in general to all intervening branches. Does that seem sensible? -1 on this requirement. Otherwise the cost of backporting something to the stable line becomes really high, and we'll end up with distributors just maintaining their own branches outside of Apache (the state we were in with 0.20.x). On the other hand, it does suck for users if they update from 1.x to 2.x and they end up losing some bug fixes or features they previously were running. Unfortunately, I don't have a better solution in mind that resolves the above problems - I just don't think it's tenable to combine a policy like "anyone may make a release branch off trunk and claim a major version number" with another policy like "you have to port a fix to all intermediate versions in order to port a fix to any of them". If a group of committers wants to make a release branch, then the maintenance of that branch should be up to them. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Rename hadoop branches post hadoop-1.x
My vote remains the same (binding):
(3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
(2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a hole.
(1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
(4) If security is fixed in branch-0.22 within a short time-frame i.e. 2 months then we get option 1, else we get option 3. Effectively postpone discussion by 2 months, start a timer now.
(5) Do nothing, keep branch-0.22 and branch-0.23 as-is.
On Mon, Mar 19, 2012 at 6:06 PM, Arun C Murthy a...@hortonworks.com wrote: We've discussed several options:
(1) Rename branch-0.22 to branch-2, rename branch-0.23 to branch-3.
(2) Rename branch-0.23 to branch-3, keep branch-0.22 as-is i.e. leave a hole.
(3) Rename branch-0.23 to branch-2, keep branch-0.22 as-is.
(4) If security is fixed in branch-0.22 within a short time-frame i.e. 2 months then we get option 1, else we get option 3. Effectively postpone discussion by 2 months, start a timer now.
(5) Do nothing, keep branch-0.22 and branch-0.23 as-is.
Let's do a STV [1] to reach consensus. Please vote by listing the options above in order of your preferences. My vote is 3, 4, 2, 1, 5 in order (binding). The vote will run the normal 7 days. thanks, Arun [1] http://en.wikipedia.org/wiki/Single_transferable_vote -- Todd Lipcon Software Engineer, Cloudera
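For a single-winner contest like this one, the STV procedure Arun links reduces to instant-runoff counting: tally first preferences, and if no option holds a majority, eliminate the weakest option and transfer its ballots to each voter's next surviving choice. A compact sketch of that counting rule (the two sample ballots are the ones cast in this thread; real STV for multiple seats also involves a quota and surplus transfers, which are omitted here):

```python
from collections import Counter

def stv_winner(ballots):
    """Single-winner STV (instant runoff). Each ballot is a list of options
    in preference order; ties on elimination are broken arbitrarily."""
    ballots = [list(b) for b in ballots]
    while True:
        tallies = Counter(b[0] for b in ballots if b)
        total = sum(tallies.values())
        leader, votes = tallies.most_common(1)[0]
        if votes * 2 > total:          # strict majority of active ballots
            return leader
        loser = min(tallies, key=lambda option: tallies[option])
        # Transfer the loser's ballots to each voter's next preference.
        ballots = [[o for o in b if o != loser] for b in ballots]

# The two preference orderings posted in this thread (options 1-5):
ballots = [[3, 4, 2, 1, 5],   # Arun's ballot
           [3, 2, 1, 4, 5]]   # Todd's ballot
assert stv_winner(ballots) == 3
```

With both first preferences on option 3, no transfers are needed; the elimination path only matters when first preferences split.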
Re: [VOTE] Release Apache Hadoop 0.23.1-rc2
-1, unfortunately. HDFS-2991 is a blocker regression introduced in 0.23.1. See the JIRA for instructions on how to reproduce on the rc2 build. -Todd On Fri, Feb 17, 2012 at 11:23 PM, Arun C Murthy a...@hortonworks.com wrote: I've created another release candidate for hadoop-0.23.1 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc2/ The hadoop-0.23.1-rc2 svn tag: https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.1-rc2 The maven artifacts for hadoop-0.23.1-rc2 are also available at repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release Apache Hadoop 0.23.1-rc2
On Wed, Feb 22, 2012 at 7:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Todd, From your analysis at HDFS-2991, looks like this was there in 0.23 too. Also, seems this happens only at scale, and only (paraphrasing you) when the file is reopened for append on an exact block boundary. Let me clarify: HDFS-2991 basically has two halves: First half (been present forever): when we append() on a block boundary, we don't log an OP_ADD. Second half (new due to HDFS-2718): if we get an OP_CLOSE for a file we haven't OP_ADDed, we'll get a ClassCastException on startup. So even though the first half isn't a regression, the regression in the second half means that this longstanding bug will now actually prevent startup. Also, there's nothing related to scale here. I happened to run into it doing scale tests, but it turned out to not be relevant. You'll see it if you run TestDFSIO with standard parameters on trunk or 23.1 (that's how I discovered it). Agree it is a critical fix, but given above, can we proceed along with 0.23.1? Anyways, 0.23.1 is still an alpha (albeit of next level), so I'd think we can get that in for 0.23.2. Alright, consider me -0, though it's pretty nasty once you run into it. The only way I could start my NN again without losing data was to recompile with the fix in place. -Todd On Wed, Feb 22, 2012 at 6:43 PM, Todd Lipcon t...@cloudera.com wrote: -1, unfortunately. HDFS-2991 is a blocker regression introduced in 0.23.1. See the JIRA for instructions on how to reproduce on the rc2 build. -Todd On Fri, Feb 17, 2012 at 11:23 PM, Arun C Murthy a...@hortonworks.com wrote: I've created another release candidate for hadoop-0.23.1 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc2/ The hadoop-0.23.1-rc2 svn tag: https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.1-rc2 The maven artifacts for hadoop-0.23.1-rc2 are also available at repository.apache.org.
Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release hadoop-0.23.1-rc0
I just committed HDFS-2923 to branch-0.23. This bug can cause big performance issues on the NN since the number of IPC handlers will default way too low and won't be changed with the expected config. Since there's a workaround, it's not a regression since 0.23.0, and 0.23.1 is still going to be labeled alpha/beta, so it's up to you whether you want to spin an rc1 on account of just this bug. If there are other issues, though, definitely worth including this in rc1. -Todd On Wed, Feb 8, 2012 at 1:33 AM, Arun C Murthy a...@hortonworks.com wrote: I've created a release candidate for hadoop-0.23.1 that I would like to release. It is available at: http://people.apache.org/~acmurthy/hadoop-0.23.1-rc0/ Some highlights: # Since hadoop-0.23.0 in November there has been significant progress in branch-0.23 with nearly 400 jiras committed to it (68 in Common, 78 in HDFS and 242 in MapReduce). # An important aspect is that we've done a lot of performance related work and hadoop-0.23.1 matches or exceeds performance of hadoop-1 in pretty much every aspect of HDFS and MapReduce. # Also, several downstream projects (HBase, Pig, Oozie, Hive etc.) seem to be playing nicely with hadoop-0.23.1. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- Todd Lipcon Software Engineer, Cloudera
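For context on the tuning knob involved: the NameNode's IPC handler pool is sized from hdfs-site.xml, and the "workaround" Todd mentions amounts to setting the key the broken code actually reads. The fragment below shows the general shape of such an override; treat the key name and value as illustrative and consult HDFS-2923 itself for the exact key affected and the exact workaround.

```xml
<!-- hdfs-site.xml: size of the NameNode's RPC handler pool.
     The value is illustrative; large clusters typically raise the small default. -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
</property>
```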
Re: [DISCUSS] Apache Hadoop 1.0?
On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran ste...@apache.org wrote: On 15/11/11 06:07, Dhruba Borthakur wrote: +1 to making the upcoming 0.23 release as 2.0. +1 And leave the 0.20.20x chain as is, just because people are used to it +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. Though it's weird to never have a 1.0, the 0.20 name is well ingrained, and I think renaming it at this point will cause a lot of confusion (plus cause problems for downstream projects like Hive and HBase which use regexes against the version string in various shim layers) -Todd -- Todd Lipcon Software Engineer, Cloudera
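The shim breakage Todd mentions is concrete: Hive and HBase dispatch to version-specific compatibility layers by matching Hadoop's version string, so renaming the 0.20 line to 1.0 would invalidate every existing pattern at once. A toy model of such a shim selector (the patterns and shim names here are invented, not the actual Hive/HBase code):

```python
import re

# Order matters: the first matching pattern wins.
SHIMS = [
    (re.compile(r"^0\.20\."), "Shims20"),
    (re.compile(r"^0\.23\."), "Shims23"),
]

def pick_shim(version):
    """Select a compatibility layer from the Hadoop version string."""
    for pattern, shim in SHIMS:
        if pattern.match(version):
            return shim
    raise RuntimeError("no shim for Hadoop version %s" % version)

assert pick_shim("0.20.205.0") == "Shims20"

# Renaming the release line to 1.0.x matches no pattern, so every
# downstream build against the old table fails at runtime:
try:
    pick_shim("1.0.0")
    assert False, "expected no shim to match"
except RuntimeError:
    pass
```

This is why keeping the ingrained 0.20 name, rather than retroactively renaming it, was the lower-risk choice for downstream projects.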
Re: Update on hadoop-0.23
On Tue, Oct 18, 2011 at 4:36 AM, Steve Loughran ste...@apache.org wrote: One more thing: are the ProtocolBuffers needed for all installations, or is that a compile-time requirement? If the binaries are going to be required, there's going to have to be one built for the various platforms, and source.deb/RPM files to build themselves on Linux. I'd rather avoid all that work The protobuf java jar is required at runtime. protoc (native) is only required at compile time. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Java Versions and Hadoop
I think requiring Java 7 is years off... I think most people have doubts as to Java 7's stability until it's been adopted by a majority of applications, and the new features aren't compelling enough to jump ship, IMO. -Todd On Fri, Oct 7, 2011 at 3:33 PM, milind.bhandar...@emc.com wrote: Hi Folks, While I have seen the wiki on which java versions to use currently to run Hadoop, I have not seen any discussion about the roadmap of java version compatibility with future hadoop versions. Recently, Oracle retired the Operating System Distributor License for Java (DLJ) [http://robilad.livejournal.com/90792.html, http://jdk-distros.java.net/] and Linux vendors have started making OpenJDK (6/7) as the default java version bundled with their OSs [http://www.java7developer.com/blog/?p=361]. Also, all future Java SE updates will be delivered through OpenJDK updates project. I see that OpenJDK6 (6b20pre) cannot be used to compile hadoop trunk. Has anyone tried OpenJDK7 ? Additionally, I have a few small projects in mind which can really make use of the new (esp I/O) features of Java 7. What, if any, timeline do hadoop developers have in mind to make Java 7 as required (and tested with OpenJDK 7) ? Thanks, - milind --- Milind Bhandarkar Greenplum Labs, EMC (Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.) -- Todd Lipcon Software Engineer, Cloudera
Re: Update on hadoop-0.23
On Fri, Sep 30, 2011 at 11:44 AM, Roman Shaposhnik r...@apache.org wrote: I apologize if my level of institutional knowledge of these things is lacking, but do you have any benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking is twofold -- I really would like to see objective numbers qualifying the viability of 0.22 from the performance standpoint, but more importantly I would really like to include the benchmarking code into Bigtop. 0.22 currently suffers from MAPREDUCE-2266, which, last time I benchmarked it, caused a significant slowdown. iirc a terasort ran something like twice as slow on my test cluster due to this bug. 0.23/MR2 doesn't suffer from this bug. -Todd -- Todd Lipcon Software Engineer, Cloudera
Welcoming Harsh J as a Hadoop committer
On behalf of the PMC, I am pleased to announce that Harsh J Chouraria has been elected a committer in the Apache Hadoop Common, HDFS, and MapReduce projects. Anyone subscribed to the mailing list or JIRA will undoubtedly recognize Harsh's name as one of the most helpful community members and an author of increasingly many code contributions. The Hadoop PMC and community appreciates Harsh's involvement and looks forward to continuing contributions! Welcome, Harsh! -Todd and the Hadoop Project Management Committee
Re: Add Append-HBase support in upcoming 20.205
The following other JIRAs have been committed in CDH for 18 months or so, for the purpose of HBase. You may want to consider backporting them as well - many were never committed to 0.20-append due to lack of reviews by HDFS committers at the time. HDFS-1056. Fix possible multinode deadlocks during block recovery when using ephemeral dataxceiv Description: Fixes the logic by which datanodes identify local RPC targets during block recovery for the case when the datanode is configured with an ephemeral data transceiver port. Reason: Potential internode deadlock for clusters using ephemeral ports HADOOP-6722. Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself Description: TCP's ephemeral port assignment results in the possibility that a client can connect back to its own outgoing socket, resulting in failed RPCs or datanode transfers. Reason: Fixes intermittent errors in cluster testing with ephemeral IPC/transceiver ports on datanodes. HDFS-1122. Don't allow client verification to prematurely add inprogress blocks to DataBlockScanner Description: When a client reads a block that is also open for writing, it should not add it to the datanode block scanner. If it does, the block scanner can incorrectly mark the block as corrupt, causing data loss. Reason: Potential dataloss with concurrent writer-reader case. HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch Description: Miscellaneous code cleanup and logging changes, including: - Slight cleanup to recoverFile() function in TestFileAppend4 - Improve error messages on OP_READ_BLOCK - Some comment cleanup in FSNamesystem - Remove toInodeUnderConstruction (was not used) - Add some checks for null blocks in FSNamesystem to avoid a possible NPE - Only log inconsistent size warnings at WARN level for non-under-construction blocks. 
- Redundant addStoredBlock calls are also not worthy of WARN level
- Add some extra information to a warning in ReplicationTargetChooser
Reason: Improves diagnosis of error cases and clarity of code

HDFS-1242. Add unit test for the appendFile race condition / synchronization bug fixed in HDFS-142
Reason: Test coverage for previously applied patch.

HDFS-1218. Replicas that are recovered during DN startup should not be allowed to truncate better replicas.
Description: If a datanode loses power and then recovers, its replicas may be truncated due to the recovery of the local FS journal. This patch ensures that a replica truncated by a power loss does not truncate the block on HDFS.
Reason: Potential dataloss bug uncovered by power failure simulation

HDFS-915. Write pipeline hangs for too long when ResponseProcessor hits timeout
Description: Previously, the write pipeline would hang for the entire write timeout when it encountered a read timeout (eg due to a network connectivity issue). This patch interrupts the writing thread when a read error occurs.
Reason: Faster recovery from pipeline failure for HBase and other interactive applications.

HDFS-1186. Writers should be interrupted when recovery is started, not when it's completed.
Description: When the write pipeline recovery process is initiated, this interrupts any concurrent writers to the block under recovery. This prevents a case where some edits may be lost if the writer has lost its lease but continues to write (eg due to a garbage collection pause)
Reason: Fixes a potential dataloss bug

commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
Author: Todd Lipcon t...@lipcon.org
Date: Sun Jun 13 23:02:38 2010 -0700

HDFS-1197. Received blocks should not be added to block map prematurely for under construction files
Description: Fixes a possible dataloss scenario when using append() on real-life clusters.
Also augments unit tests to uncover similar bugs in the future by simulating latency when reporting blocks received by datanodes.
Reason: Append support dataloss bug
Author: Todd Lipcon

HDFS-1260. tryUpdateBlock should do validation before renaming meta file
Description: Solves bug where block became inaccessible in certain failure conditions (particularly network partitions). Observed under HBase workload at user site.
Reason: Potential loss of synced data when write pipeline fails

On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas sur...@hortonworks.com wrote: I also propose the following JIRAs, which are non-append-related bug fixes from the 0.20-append branch
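The TCP quirk behind HADOOP-6722 above can be checked for directly: when a client binds an ephemeral port and connects to that same address, TCP simultaneous open can leave the socket connected to itself. A minimal sketch of such a guard in Python (my illustration only, not the actual NetUtils.connect code):

```python
import socket

def is_self_connection(sock):
    """Return True if a connected socket ended up connected to itself
    (local address == peer address), as can happen via TCP simultaneous
    open when ephemeral ports are in use."""
    return sock.getsockname() == sock.getpeername()

# Demonstrate the check on an ordinary client/server pair.
server = socket.socket()
server.bind(("127.0.0.1", 0))          # kernel picks an ephemeral port
server.listen(1)
client = socket.socket()
client.connect(server.getsockname())
assert not is_self_connection(client)  # a real connection passes the check
client.close()
server.close()
```

The actual fix rejects the connection (closes the socket and raises an error) when the condition is detected, so the caller retries from a fresh ephemeral port.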
Re: hadoop-0.23
On Thu, Aug 18, 2011 at 9:36 AM, Arun C Murthy a...@hortonworks.com wrote: Good morning! On Jul 13, 2011, at 3:39 PM, Arun C Murthy wrote: It's looking like trunk is moving along rapidly - it's about time to start thinking of the next release to unlock all of the goodies there. As the RM, my current thinking is that after we merge NextGen MR (MR-279) and the HDFS-1073 branch into trunk we should be good to create the hadoop-0.23 branch. Since the last time we spoke (actually, since last night, in fact!) the world (trunk) has changed to accommodate our wishes... *smile* HDFS-1073 and MAPREDUCE-279 are presently in trunk and I think it's time to cut the 0.23 branch so that we can focus on testing and stabilizing a hadoop-0.23 release off that branch. I propose to do it noon of the coming Monday (Aug 22). Thoughts? I assume we will make sure the HDFS mavenization is in before then? Tom said he intends to commit it tomorrow, but if something comes up and it's not committed, let's make sure mavenization happens before we branch. Also, what will be the guidelines for committing a change to 0.23 branch? Is it bug fix only or are we still allowing improvements? Given how recently MR2 was merged, I imagine there will be a lot of things that aren't strictly bugs that we will really want to have in our next release. I also have a couple of HDFS patches (eg the new faster CRC on-by-default) that I'd like to get into 23. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Vote to merge HDFS-1073 into trunk
Thanks for the votes. The vote has passed and I committed a merge to trunk just now. If anything breaks, don't hesitate to drop me a mail. -Todd On Thu, Jul 28, 2011 at 12:27 PM, Matt Foley mfo...@hortonworks.com wrote: +1 for the merge. I've read a majority of the code changes, excluding the BNN and 2NN, approaching from the big diff rather than individual patches, and starting with the files most changed from both current trunk and the 1073 branchpoint. I've found almost nothing to comment on. It looks like a solid job, it is a significant simplification of FSEditLog, and I have become confident that the merge should proceed. --Matt From: Eli Collins e...@cloudera.com Date: Tue, 19 Jul 2011 18:43:58 -0700 +1 for the merge. I've reviewed all but a handful of the 50+ individual patches, also looked at the merge patch for sanity and it looks good. From: Jitendra Pandey jiten...@hortonworks.com Date: Tue, 19 Jul 2011 18:23:39 -0700 +1 for the merge. I haven't looked at BackupNode changes in much detail, but apart from that the patch looks good. On Tue, Jul 19, 2011 at 6:12 PM, Todd Lipcon t...@cloudera.com wrote: Hi all, HDFS-1073 is now complete and ready to be merged. Many thanks to those who helped review in the last two weeks. Hudson test-patch results are available on HDFS-1073 JIRA - please see the recent comments there for explanations. A few notes that may help you vote: - I have run the NNThroughputBenchmark and seen just a small regression in logging performance due to the inclusion of a txid with every edit for increased robustness. - The NN read path and the read/write IO paths are entirely untouched by these changes. - Image and edit load time were benchmarked throughout development of the branch and no significant regressions have been seen. Since this is a code change, all committers should feel free to vote. The voting requires three committer +1s and no -1s to pass. 
I will not vote since I contributed the majority of the code in the branch, though obviously I'm +1 :) -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Vote to merge HDFS-1073 into trunk
Hi all, HDFS-1073 is now complete and ready to be merged. Many thanks to those who helped review in the last two weeks. Hudson test-patch results are available on HDFS-1073 JIRA - please see the recent comments there for explanations. A few notes that may help you vote: - I have run the NNThroughputBenchmark and seen just a small regression in logging performance due to the inclusion of a txid with every edit for increased robustness. - The NN read path and the read/write IO paths are entirely untouched by these changes. - Image and edit load time were benchmarked throughout development of the branch and no significant regressions have been seen. Since this is a code change, all committers should feel free to vote. The voting requires three committer +1s and no -1s to pass. I will not vote since I contributed the majority of the code in the branch, though obviously I'm +1 :) -Todd -- Todd Lipcon Software Engineer, Cloudera
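The "txid with every edit" mentioned in the benchmark note is what buys the extra robustness: a reader can detect a dropped or duplicated edit by checking that transaction IDs are gap-free and strictly sequential. A toy sketch of that validation (illustrative only; the real FSEditLog format is binary and considerably more involved):

```python
def validate_txids(txids, expected_first):
    """Check that a stream of edit-log transaction IDs is gap-free and
    strictly sequential starting at expected_first. Returns the last
    txid seen, or raises ValueError at the first inconsistency."""
    expected = expected_first
    for txid in txids:
        if txid != expected:
            raise ValueError(f"expected txid {expected}, found {txid}")
        expected += 1
    return expected - 1

# A clean log segment validates; a gap (e.g. a lost edit) is caught.
assert validate_txids([7, 8, 9], 7) == 9
```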
Re: Hoping to merge HDFS-1073 branch soon
On Tue, Jul 12, 2011 at 10:38 AM, sanjay Radia san...@hortonworks.com wrote: We can merge 1580 after 1073 is merged in. Looks like the biggest thing in your 1073 list is the Backup NN related changes. The BN-related changes are done and just awaiting code review. See HDFS-1979. The current list of patches awaiting review is: HDFS-1979, HDFS-2101, HDFS-2133, HDFS-1780, HDFS-2104, HDFS-2135. Are you shooting for end of this month? I'm hoping as early as next week, assuming folks feel the branch is in good shape. If all goes well, I'll have code reviews back for the above in the next day or two, can respond to review comments and commit over the weekend, and call a vote to merge early next week. Thanks -Todd On Jul 6, 2011, at 8:03 PM, Todd Lipcon wrote: Hi all, Just an update on this project:
- The current list of uncommitted patches up for review is:
1bea9d3 HDFS-1979. Fix BackupNode and CheckpointNode
32db384 Amend HDFS-2011. Fix TestCheckpoint test for double close/abort of ELFOS
b6a55a4 HDFS-2101. Update remaining unit tests for new layout
ca0ace6 HDFS-2133. Address TODOs left in code
b46825d HDFS-1780. Reduce need to rewrite fsimage on startup
30c858d HDFS-2104. Add flag to SecondaryNameNode to format it during startup
942eaef HDFS-2135. Fix regression of HDFS-1955 in branch
I believe Eli is going to work on reviewing these this week.
- I've set up a Hudson job for the branch here: https://builds.apache.org/job/Hadoop-Hdfs-1073-branch/ It's currently failing because it's missing some of the patches above. After the above patches go in, I expect a pretty clean build, modulo maybe one or two things that are environment issues, which I'll tackle later this week.
- BackupNode and CheckpointNode are working. I've done some basic functional testing by pounding edits into the NN while both a 2NN and a BN are checkpointing every 2 seconds.
- I merged with trunk as of this morning, so I think we should be up-to-date with trunk patches.
Aaron was very helpful and went through all NN-related patches in trunk from the last 3 months to make sure we didn't inadvertently regress anything - he discovered one bug but everything else looks good. Once the above patches are in the branch, I would like to merge. So, if you plan on reviewing pre-merge, please do so *this week*. Of course, if you don't have time and you find issues post-merge, I absolutely plan on fixing them ASAP ;-) Thanks -Todd On Thu, Jun 30, 2011 at 12:11 AM, Todd Lipcon t...@cloudera.com wrote: Hey all, Work on the HDFS-1073 branch has been progressing steadily, and I believe we're coming close to the point where it can be merged. To briefly summarize the status:
- NameNode and SecondaryNameNode are both fully working and have undergone some stress/fault testing in addition to over 3,000 lines' worth of new unit tests.
- Most of the existing unit tests have been updated, though a few more need some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably be done before release, but I think could be done either before or after merge (eg HDFS-2104)
Given this, I am expecting that we can merge this into trunk by the end of July if not earlier, as soon as the BN/CN work is complete. If you are hoping to review the code or tests before merge time, this is your early warning! Please do so now! Thanks! -Todd P.S. I will also be giving a short talk about the motivations and current status of this project at Friday's contributor meeting, for those who are able to attend. If we're lucky, maybe even a demo! -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Change bylaws to require 3 binding +1s for branch merge
To clarify, is there any restriction on who may give the +1s? For example, if a branch has a group of 5 committers primarily authoring the patches, can the three +1s be made by a subset of those committers? -Todd On Mon, Jul 11, 2011 at 5:11 PM, Jakob Homan jgho...@gmail.com wrote: As discussed in the recent thread on HDFS-1623 branching models, I'd like to amend the bylaws to provide that branches should get a minimum of three committer +1s before being merged to trunk. The rationale: Feature branches are often created in order that developers can iterate quickly without the review then commit requirements of trunk. Branches' commit requirements are determined by the branch maintainer and in this situation are often set up as commit-then-review. As such, there is no way to guarantee that the entire changeset offered for trunk merge has had a second pair of eyes on it. Therefore, it is prudent to give that final merge heightened scrutiny, particularly since these branches often extensively affect critical parts of the system. Requiring three binding +1s does not slow down the branch development process, but does provide a better chance of catching bugs before they make their way to trunk. Specifically, under the Actions subsection, this vote would add a new bullet item: * Branch merge: A feature branch that does not require the same criteria for code to be committed to trunk will require three binding +1s before being merged into trunk. The last bylaw change required lazy majority of PMC and ran for 7 days, which I believe would apply to this one as well. That would have this vote ending 5pm PST July 18. -Jakob -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Change bylaws to require 3 binding +1s for branch merge
Sounds fine to me. +1 On Mon, Jul 11, 2011 at 9:30 PM, Mahadev Konar maha...@hortonworks.com wrote: +1 mahadev On Mon, Jul 11, 2011 at 9:26 PM, Arun C Murthy a...@hortonworks.com wrote: +1 Arun On Jul 11, 2011, at 5:11 PM, Jakob Homan wrote: As discussed in the recent thread on HDFS-1623 branching models, I'd like to amend the bylaws to provide that branches should get a minimum of three committer +1s before being merged to trunk. The rationale: Feature branches are often created in order that developers can iterate quickly without the review then commit requirements of trunk. Branches' commit requirements are determined by the branch maintainer and in this situation are often set up as commit-then-review. As such, there is no way to guarantee that the entire changeset offered for trunk merge has had a second pair of eyes on it. Therefore, it is prudent to give that final merge heightened scrutiny, particularly since these branches often extensively affect critical parts of the system. Requiring three binding +1s does not slow down the branch development process, but does provide a better chance of catching bugs before they make their way to trunk. Specifically, under the Actions subsection, this vote would add a new bullet item: * Branch merge: A feature branch that does not require the same criteria for code to be committed to trunk will require three binding +1s before being merged into trunk. The last bylaw change required lazy majority of PMC and ran for 7 days, which I believe would apply to this one as well. That would have this vote ending 5pm PST July 18. -Jakob -- Todd Lipcon Software Engineer, Cloudera
Re: HDFS-1623 branching strategy
Sounds good to me. I think this strategy has worked well on the HDFS-1073 branch -- allowed development to be quite rapid, and at this point all but a couple of trivial patches have been explicitly reviewed by a committer (and the others implicitly reviewed since later patches touched the same code area). +1. -Todd On Thu, Jul 7, 2011 at 1:43 PM, Aaron T. Myers a...@cloudera.com wrote: Hello everyone, This has been informally mentioned before, but I think it's best to be completely transparent/explicit about this. We (Sanjay, Suresh, Todd, Eli, myself, and anyone else who wants to help) intend to do the work for HDFS-1623 (High Availability Framework for HDFS NN) on a development branch off of trunk. The work in the HDFS-1073 development branch is necessary to complete HDFS-1623. As such, we're waiting for the work in HDFS-1073 to be merged into trunk before creating a branch for HDFS-1623. Once this branch is created, I'd like to use a similar modified commit-then-review policy for this branch as was done in the HDFS-1073 branch, which I think worked very well. To review, this was: {quote}
- A patch will be uploaded to the JIRA for review like usual
- If another committer provides a +1, it may be committed at that point, just like usual.
- If no committer provides +1 (or a review asking for changes) within 24 business hours, it will be committed to the branch under the commit-then-review policy.
Of course if any committer feels that code needs to be amended, he or she should feel free to open a new JIRA against the branch including the review comments, and they will be addressed before the merge into trunk. And just like with any branch merge, ample time will be given for the community to review both the large merge commit as well as the individual historical commits of the branch, before it goes into trunk.
{quote} I'm also volunteering to keep the HDFS-1623 development branch up to date with respect to merging the concurrent changes which go into trunk into this development branch to make sure the merge back into trunk is as painless as possible. Comments are certainly welcome on this strategy. Thanks a lot, Aaron -- Aaron T. Myers Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Hoping to merge HDFS-1073 branch soon
Hi all, Just an update on this project:
- The current list of uncommitted patches up for review is:
1bea9d3 HDFS-1979. Fix BackupNode and CheckpointNode
32db384 Amend HDFS-2011. Fix TestCheckpoint test for double close/abort of ELFOS
b6a55a4 HDFS-2101. Update remaining unit tests for new layout
ca0ace6 HDFS-2133. Address TODOs left in code
b46825d HDFS-1780. Reduce need to rewrite fsimage on startup
30c858d HDFS-2104. Add flag to SecondaryNameNode to format it during startup
942eaef HDFS-2135. Fix regression of HDFS-1955 in branch
I believe Eli is going to work on reviewing these this week.
- I've set up a Hudson job for the branch here: https://builds.apache.org/job/Hadoop-Hdfs-1073-branch/ It's currently failing because it's missing some of the patches above. After the above patches go in, I expect a pretty clean build, modulo maybe one or two things that are environment issues, which I'll tackle later this week.
- BackupNode and CheckpointNode are working. I've done some basic functional testing by pounding edits into the NN while both a 2NN and a BN are checkpointing every 2 seconds.
- I merged with trunk as of this morning, so I think we should be up-to-date with trunk patches.
Aaron was very helpful and went through all NN-related patches in trunk from the last 3 months to make sure we didn't inadvertently regress anything - he discovered one bug but everything else looks good. Once the above patches are in the branch, I would like to merge. So, if you plan on reviewing pre-merge, please do so *this week*. Of course, if you don't have time and you find issues post-merge, I absolutely plan on fixing them ASAP ;-) Thanks -Todd On Thu, Jun 30, 2011 at 12:11 AM, Todd Lipcon t...@cloudera.com wrote: Hey all, Work on the HDFS-1073 branch has been progressing steadily, and I believe we're coming close to the point where it can be merged.
To briefly summarize the status:
- NameNode and SecondaryNameNode are both fully working and have undergone some stress/fault testing in addition to over 3,000 lines' worth of new unit tests.
- Most of the existing unit tests have been updated, though a few more need some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably be done before release, but I think could be done either before or after merge (eg HDFS-2104)
Given this, I am expecting that we can merge this into trunk by the end of July if not earlier, as soon as the BN/CN work is complete. If you are hoping to review the code or tests before merge time, this is your early warning! Please do so now! Thanks! -Todd P.S. I will also be giving a short talk about the motivations and current status of this project at Friday's contributor meeting, for those who are able to attend. If we're lucky, maybe even a demo! -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Hoping to merge HDFS-1073 branch soon
Hey all, Work on the HDFS-1073 branch has been progressing steadily, and I believe we're coming close to the point where it can be merged. To briefly summarize the status:
- NameNode and SecondaryNameNode are both fully working and have undergone some stress/fault testing in addition to over 3,000 lines' worth of new unit tests.
- Most of the existing unit tests have been updated, though a few more need some small tweaks (HDFS-2101)
- The BackupNode and CheckpointNode are not currently working, though I am working on it locally and making good progress (HDFS-1979)
- There are a few various and sundry small improvements that should probably be done before release, but I think could be done either before or after merge (eg HDFS-2104)
Given this, I am expecting that we can merge this into trunk by the end of July if not earlier, as soon as the BN/CN work is complete. If you are hoping to review the code or tests before merge time, this is your early warning! Please do so now! Thanks! -Todd P.S. I will also be giving a short talk about the motivations and current status of this project at Friday's contributor meeting, for those who are able to attend. If we're lucky, maybe even a demo! -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop Java Versions
On Thu, Jun 30, 2011 at 5:16 PM, Ted Dunning tdunn...@maprtech.com wrote: You have to consider the long-term reliability as well. Losing an entire set of 10 or 12 disks at once makes the overall reliability of a large cluster very suspect. This is because it becomes entirely too likely that two additional drives will fail before the data on the off-line node can be replicated. For 100 nodes, that can decrease the average time to data loss down to less than a year. This can only be mitigated in stock hadoop by keeping the number of drives relatively low. MapR avoids this by not failing nodes for trivial problems. I'd advise you to look at stock hadoop again. This used to be true, but was fixed a long while back by HDFS-457 and several followup JIRAs. If MapR does something fancier, I'm sure we'd be interested to hear about it so we can compare the approaches. -Todd On Thu, Jun 30, 2011 at 4:18 PM, Aaron Eng a...@maprtech.com wrote: Keeping the amount of disks per node low and the amount of nodes high should keep the impact of dead nodes in control. It keeps the impact of dead nodes in control but I don't think that's long-term cost-efficient. As prices of 10GbE go down, the "keep the node small" argument seems less fitting. And on another note, most servers manufactured in the last 10 years have dual 1GbE network interfaces. If one were to go by these calcs: 150 nodes with four 2TB disks each, with HDFS 60% full, it takes around ~32 minutes to recover. It seems like that assumes a single 1GbE interface, why not leverage the second? On Thu, Jun 30, 2011 at 2:31 PM, Evert Lammerts evert.lamme...@sara.nl wrote: You can get 12-24 TB in a server today, which means the loss of a server generates a lot of traffic - which argues for 10GbE.
But
- big increase in switch cost, especially if you (CoI warning) go with Cisco
- there have been problems with things like BIOS PXE and lights out management on 10GbE - probably due to the NICs being things the BIOS wasn't expecting and off the mainboard. This should improve.
- I don't know how well linux works with ether that fast (field reports useful)
- the big threat is still ToR switch failure, as that will trigger a re-replication of every block in the rack.
Keeping the number of disks per node low and the number of nodes high should keep the impact of dead nodes in control. A ToR switch failing is different - missing 30 nodes (~120TB) at once cannot be fixed by adding more nodes; adding nodes actually increases the chance of a ToR switch failure. Although such failure is quite rare to begin with, I guess. The back-of-the-envelope calculation I made suggests that ~150 (1U) nodes should be fine with 1Gb ethernet. (e.g., when 6 nodes fail in a cluster with 150 nodes with four 2TB disks each, with HDFS 60% full, it takes around ~32 minutes to recover. 2 nodes failing should take around 640 seconds. Also see the attached spreadsheet.) This doesn't take ToR switch failure into account though. On the other hand - 150 nodes is only ~5 racks - in such a scenario you might rather want to shut the system down completely rather than letting it replicate 20% of all data. Cheers, Evert -- Todd Lipcon Software Engineer, Cloudera
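Evert's back-of-the-envelope numbers can be reproduced with a simple model: the data to re-replicate is the failed nodes' used capacity, and the surviving nodes share that work in parallel at some effective per-node rate. A sketch (my own model, not his spreadsheet; it matches his ~32-minute and ~640-second figures if one assumes roughly 100 MB/s of effective re-replication throughput per surviving 1GbE node):

```python
def recovery_time_s(total_nodes, failed_nodes, disks_per_node,
                    disk_tb, utilization, node_rate_mbps=100.0):
    """Estimate the seconds needed to re-replicate the data that lived
    on the failed nodes, assuming survivors share the work evenly at
    node_rate_mbps MB/s each (an assumed effective rate for one 1GbE
    link once replication overhead is accounted for)."""
    lost_bytes = failed_nodes * disks_per_node * disk_tb * 1e12 * utilization
    aggregate_rate = (total_nodes - failed_nodes) * node_rate_mbps * 1e6
    return lost_bytes / aggregate_rate

# 6 of 150 nodes fail; four 2TB disks each; HDFS 60% full -> ~33 minutes
print(round(recovery_time_s(150, 6, 4, 2, 0.6) / 60))  # ~33
# 2 nodes fail -> ~649 seconds, close to the quoted ~640
print(round(recovery_time_s(150, 2, 4, 2, 0.6)))       # ~649
```

The model ignores ToR switch failure, just as the original calculation does: losing a whole rack removes both the data and a slice of the recovery bandwidth at once.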
Re: Thinking about the next hadoop mainline release
On Fri, Jun 24, 2011 at 5:28 PM, Arun C Murthy ar...@yahoo-inc.com wrote: Thanks Suresh! Todd - I'd appreciate if you could help on some of the HBase/Performance jiras... thanks! Sure thing. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Thinking about the next hadoop mainline release
On Fri, Jun 17, 2011 at 7:15 AM, Arun C Murthy ar...@yahoo-inc.com wrote: I volunteer to be the RM for the release since I've been leading the NG MR effort. Are folks ok with this? +1. It would be an honor to fix bugs for you, Arun. -Todd Sent from my iPhone On Jun 17, 2011, at 1:45 PM, Ted Dunning tdunn...@maprtech.com wrote: NG map reduce is a huge deal both in terms of making things better for users, but also in terms of unblocking the Hadoop development process. On Fri, Jun 17, 2011 at 9:36 AM, Ryan Rawson ryano...@gmail.com wrote: - Next Generation Map-Reduce [MR-279] - Passing most tests now and discussing merging into trunk -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Shall we adopt the Defining Hadoop page
On Wed, Jun 15, 2011 at 7:19 PM, Craig L Russell craig.russ...@oracle.com wrote: There's no ambiguity. Either you ship the bits that the Apache PMC has voted on as a release, or you change it (one bit) and it is no longer what the PMC has voted on. It's a derived work. The rules for voting in Apache require that if you change a bit in an artifact, you can no longer count votes for the previous artifact. Because the new work is different. A new vote is required. Sorry, but this is just silly. Are you telling me that the httpd package in Ubuntu isn't Apache httpd? It has 43 patches applied. Tomcat6 has 17. I'm sure every other commonly used piece of software bundled with ubuntu has been patched, too. I don't see them calling their packages Ubuntu HTTP server powered by Apache HTTPD. It's just httpd. The httpd in RHEL 5 is the same way. In fact they even provide some nice metadata in their patches, for example: httpd-2.0.48-release.patch:Upstream-Status: vendor-specific change httpd-2.1.10-apctl.patch:Upstream-Status: Vendor-specific changes for better initscript integration To me, this is a good thing: allowing vendors to redistribute the software with some modifications makes it much more accessible to users and businesses alike, and that's part of why Hadoop has had so much success. So long as we require the vendors to upstream those modifications back to the ASF, we get the benefits of these contributions back in the community and everyone should be happy. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: JIRAs post-unsplit
On Tue, Jun 14, 2011 at 9:35 AM, Rottinghuis, Joep jrottingh...@ebay.com wrote: Project un-split definitely simplifies things. Todd, if people add a watch based on patches, would they not miss notifications for those entries in an earlier phase of their lifecycle? For example when issues are just reported, discussed and assigned, but no patch has been attached yet? Another thought that Alejandro just suggested offline is to use JIRA components rather than just the file paths. So, assuming there is a bot that watches the JIRA, it would be easy enough to allow you to permawatch a component (JIRA itself doesn't give this option). Then, assuming the patch is assigned the right components, it will be seen by people who care early on. If it's not given the right components, then it will be seen once you upload a patch. A separate HADOOPX Jira project would eliminate such issues. It does raise another question though: What happens if an issue starts out in one area, and then turns out to require changes in other areas? Would one then first create a HADOOP-x, a HDFS-y, or MAPREDUCE-z and then when it turns out other components are involved a new HADOOPX- referring to such earlier Jira? Cheers, Joep From: Todd Lipcon [t...@cloudera.com] Sent: Monday, June 13, 2011 1:37 PM To: general@hadoop.apache.org Subject: Re: JIRAs post-unsplit On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik c...@apache.org wrote: I tend to agree: JIRA separation was the benefit of the split. I'd rather keep the current JIRA split in effect (e.g. separate JIRA projects for separate Hadoop components; don't recombine them) and file patches in the same way (for common, hdfs, mapreduce). If a cross component patch is needed then HADOOP project JIRA can be used for tracking, patches, etc. Yea, perhaps we just need the QA bot to be smart enough that it could handle a cross-project patch attached to HADOOP?
Maybe we do something crazy and make a new HADOOPCROSS jira for patches that affect multiple projects? (just brainstorming here...) Tree-based watch-list seems like a great idea, but won't it narrow the scope somehow? Are you saying that if I am interested in say hdfs/src/c++/libhdfs, but a JIRA is open which affects libhdfs and something else (e.g. NameNode) I will still get the notification? Right, that's the idea. You'd be added as a watcher (and get notified) for any patch that touches the area you care about, regardless of whether it also touches some other areas. -Todd -- Todd Lipcon Software Engineer, Cloudera
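The path-based watch idea discussed above - a bot that scans a patch's touched files and notifies anyone watching those subtrees - is simple to prototype. A hypothetical sketch (the `WATCHERS` table and `notify_list` helper are my own invention, not an actual Apache bot):

```python
# Map of subscribed path prefixes -> interested users (hypothetical data).
WATCHERS = {
    "hdfs/src/c++/libhdfs": ["cos"],
    "hdfs/src/java/org/apache/hadoop/hdfs/server/namenode": ["todd"],
}

def notify_list(touched_paths):
    """Return the users to notify: anyone watching a prefix of any file
    the patch touches, regardless of what else it touches."""
    users = set()
    for path in touched_paths:
        for prefix, subscribers in WATCHERS.items():
            if path.startswith(prefix):
                users.update(subscribers)
    return sorted(users)

# A patch touching both libhdfs and the NameNode notifies both watchers.
print(notify_list([
    "hdfs/src/c++/libhdfs/hdfs.c",
    "hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java",
]))  # ['cos', 'todd']
```

This captures the behavior Todd describes: watching libhdfs still triggers a notification even when the JIRA also touches the NameNode, which pure per-component watching would handle only if the components were assigned correctly.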
Re: [VOTE] Powered by Logo
Who is allowed to vote in this? Committers? PMC? Everyone? My vote: 5, 2, 6, 3, 1, 4 On Tue, Jun 14, 2011 at 8:19 PM, Owen O'Malley omal...@apache.org wrote: All, We've had a wide range of entries for a powered by logo. I've put them all on a page, here: http://people.apache.org/~omalley/hadoop-powered-by/ Since there are a lot of contenders and we only want a single round of voting, let's use single transferable vote ( STV http://en.wikipedia.org/wiki/Single_transferable_vote). The important thing is to pick the images *IN ORDER* that you would like them. My vote (in order of course): 4, 1, 2, 3, 5, 6. In other words, I want option 4 most and option 6 least. With STV, you don't need to worry about voting for an unpopular choice since your vote will automatically roll over to your next choice. -- Owen -- Todd Lipcon Software Engineer, Cloudera
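With a single winner, STV as described here reduces to instant-runoff voting: count first preferences, repeatedly eliminate the weakest option, and let each affected ballot transfer to its next surviving choice. A minimal illustrative tally (my own sketch, not any official ASF counting tool; ties are broken arbitrarily):

```python
from collections import Counter

def irv_winner(ballots):
    """Single-winner STV (instant runoff). Each ballot is a list of
    options in preference order. Repeatedly eliminate the option with
    the fewest first-choice votes until one holds a majority."""
    remaining = {opt for b in ballots for opt in b}
    while True:
        # Each ballot counts for its highest-ranked surviving option.
        counts = Counter(next(opt for opt in b if opt in remaining)
                         for b in ballots
                         if any(opt in remaining for opt in b))
        top, votes = counts.most_common(1)[0]
        if votes * 2 > sum(counts.values()):
            return top
        # Eliminate the weakest; its ballots transfer to the next choice.
        remaining.discard(min(counts, key=counts.get))

# Five ballots: 1 is eliminated first and its ballot transfers to 4,
# giving 4 a majority.
print(irv_winner([[4, 2], [4, 1], [2, 1], [2, 4], [1, 4]]))  # 4
```

The transfer step is exactly why Owen notes that voting for an unpopular option is safe: an eliminated ballot is not wasted, it simply rolls over.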
Re: HADOOP-7106 (project unsplit) this weekend
Hey Robert, It seems to be working for me... what URL are you trying to check out? I moved aside my ~/.subversion dir, and then did: $ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/ Thanks -Todd On Mon, Jun 13, 2011 at 7:05 AM, Robert Evans ev...@yahoo-inc.com wrote: Could someone unlock some of these branches for anonymous read only checkout? At least with MR-279 I get a 403 forbidden error when I try to check out. --Bobby On 6/12/11 6:38 PM, Todd Lipcon t...@cloudera.com wrote: OK, this seems to have succeeded without any big problems! I've re-enabled the git mirrors and the hudson builds. Feel free to commit to the new trees. Here are some instructions for the migration: === SVN users === Next time you svn up in your common working directory you'll end up seeing the combined tree - ie a mapreduce/, hdfs/, and common/ subdirectory. This is probably the easiest place from which to work, now. The URLs for the combined SVN trees are: trunk: https://svn.apache.org/repos/asf/hadoop/common/trunk/ branch-0.22: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22 branch-0.21: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 yahoo-merge: http://svn.apache.org/repos/asf/hadoop/common/branches/yahoo-merge (this one has the yahoo-merge branches from common, hdfs, and mapred) MR-279: http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279 (this one has the yahoo-merge common and hdfs, and the MR-279 mapred) The same kind of thing happened for HDFS-1073 and branch-0.21-old. Pre-project-split branches like branch-0.20 should have remained untouched. You can proceed to delete your checkouts of the individual mapred and hdfs trees, since they exist within the combined trees above. 
If for some reason you prefer to 'svn switch' an old MR or HDFS-specific checkout to point to its new location, you can use the following incantation: svn sw $(svn info | grep URL | awk '{print $2}' | sed 's,\(hdfs\|mapreduce\|common\)/\(.*\),common/\2/\1,') === Git Users === The git mirrors of the above 7 branches should now have a set of 4 commits near the top that look like this: Merge: 928d485 cd66945 77f628f Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:28 2011 +0000 HADOOP-7106. Reorganize SVN layout to combine HDFS, Common, and MR in a single tree (project unsplit) git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1134994 13f79535-47bb-0310-9956-ffa450edef68 commit 77f628ff5925c25ba2ee4ce14590789eb2e7b85b Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:27 2011 +0000 Relocate mapreduce into mapreduce/ commit cd66945f62635f589ff93468e94c0039684a8b6d Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:26 2011 +0000 Relocate hdfs into hdfs/ commit 928d485e2743115fe37f9d123ce9a635c5afb91a Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:25 2011 +0000 Relocate common into common/ The first of these 4 is a 3-parent octopus merge commit of the pre-project-unsplit branches. In theory, git is smart enough to track changes through this merge, so long as you pass the right flags (eg --follow). For example: todd@todd-w510:~/git/hadoop-common$ git log --pretty=oneline --abbrev-commit --follow mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java | head -10 77f628f Relocate mapreduce into mapreduce/ 90df0cb MAPREDUCE-2455. Remove deprecated JobTracker.State in favour of JobTrackerStatus. ca2aba0 MAPREDUCE-2490. Add logging to graylist and blacklist activity to aid diagnosis of related issues. Contributed by Jonathan Eagles 32aaa2a MAPREDUCE-2515. MapReduce code references some deprecated options. Contributed by Ari Rabkin.
If you want to be able to have git follow renames all the way through the project split back to the beginning of time, put the following in hadoop-common/.git/info/grafts: 5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c In terms of rebasing git branches, git is actually pretty smart. For example, I have a local HDFS-1073 branch in my hdfs repo. To transition it to the new combined repo, I did the following: # Add my project-split hdfs git repo as a remote: git remote add splithdfs /home/todd/git/hadoop-hdfs/ git fetch splithdfs # Checkout a branch in my combined repo git checkout -b HDFS-1073 splithdfs/HDFS-1073 # Rebase it on the combined 1073 branch git rebase origin/HDFS-1073 ...and it actually applies my patches inside the appropriate subdirectory (I was surprised and impressed by this!) If the branch you're rebasing has added or moved files, it might not be smart enough and you'll have to manually rename them in your branch inside of the appropriate subtree.. but for simple patches this seems to work.
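The sed expression in the 'svn switch' one-liner just moves the project component from the middle of the URL to the end. A quick way to sanity-check what it does, as an illustrative re-implementation (not part of the migration scripts):

```python
import re

def unsplit_url(url):
    """Move the project component (hdfs, mapreduce, or common) from the
    middle of a project-split SVN URL to the end of the combined one,
    mirroring the sed expression in the svn switch one-liner."""
    return re.sub(r'(hdfs|mapreduce|common)/(.*)', r'common/\2/\1', url)

print(unsplit_url('https://svn.apache.org/repos/asf/hadoop/hdfs/trunk'))
# -> https://svn.apache.org/repos/asf/hadoop/common/trunk/hdfs
```

That matches the combined layout described above, where each old project tree becomes a subdirectory of the combined common/ tree.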
Re: HADOOP-7106 (project unsplit) this weekend
Hmm, as I got farther down my email this morning, I saw some complaints that HBase's SVN was giving 403s as well, this morning. Perhaps there is some ASF-wide issue going on, completely unrelated to the Hadoop changes over the weekend? -Todd On Mon, Jun 13, 2011 at 7:52 AM, Todd Lipcon t...@cloudera.com wrote: Hey Robert, It seems to be working for me... what URL are you trying to check out? I moved aside my ~/.subversion dir, and then did: $ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/ Thanks -Todd On Mon, Jun 13, 2011 at 7:05 AM, Robert Evans ev...@yahoo-inc.com wrote: Could someone unlock some of these branches for anonymous read only checkout? At least with MR-279 I get a 403 forbidden error when I try to check out. --Bobby On 6/12/11 6:38 PM, Todd Lipcon t...@cloudera.com wrote: OK, this seems to have succeeded without any big problems! I've re-enabled the git mirrors and the hudson builds. Feel free to commit to the new trees. Here are some instructions for the migration: === SVN users === Next time you svn up in your common working directory you'll end up seeing the combined tree - ie a mapreduce/, hdfs/, and common/ subdirectory. This is probably the easiest place from which to work, now. 
Re: HADOOP-7106 (project unsplit) this weekend
On Mon, Jun 13, 2011 at 8:14 AM, Todd Lipcon t...@cloudera.com wrote: Oops, sorry about that one. I will take care of that in about 30 minutes (just headed out the door now to catch a train). If someone else with commit access wants to, you just need to propset the externals to point to the new common/trunk/common/src/test/bin instead of the old location. Fixed the svn:externals. Also, the ant eclipse targets seem to be broken now. It seems like various parts of the eclipse target need to be commonized now (the .eclipse-templates stuff and .classpath, .launches, etc.) Will look into this as well. Can you explain further what's broken? Are you trying to make a project that's rooted in the directory that contains common/, mapreduce/, and hdfs/? I can imagine that wouldn't work, but I'm not sure why it wouldn't work to continue having three separate projects. Are you using some kind of SVN integration with Eclipse? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106 (project unsplit) this weekend
On Mon, Jun 13, 2011 at 9:24 AM, Jeffrey Naisbitt jnais...@yahoo-inc.com wrote: As you say though, I would think it should still work with three separate projects, so I'll just go back to that for now. It would be nice to be able to build the whole thing as a single project though (since it's a single repository in svn now), but that would probably take some extra work - and would probably make sense in a separate Jira. Yep - I think the idea is this will happen once we mavenize everything. Maven apparently supports the idea of modules - basically a recursive structure in which a top level pom file can build sub-poms, but the sub-poms could also continue to be built independently if necessary. I think the top-level pom would live in the new root. -Todd -- Todd Lipcon Software Engineer, Cloudera
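For the curious, the module layout described above would look roughly like this at the new repository root. This is a hypothetical sketch — the artifact IDs and version are made up, since the mavenization work hasn't landed yet:

```xml
<!-- hypothetical top-level pom.xml at the combined repository root -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-main</artifactId>
  <version>0.23.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <!-- Each module stays buildable on its own, either with
       `mvn -pl <module>` from the root or by running mvn
       from inside the subdirectory. -->
  <modules>
    <module>common</module>
    <module>hdfs</module>
    <module>mapreduce</module>
  </modules>
</project>
```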
Re: HADOOP-7106 (project unsplit) this weekend
On Mon, Jun 13, 2011 at 11:42 AM, Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com wrote: Todd, Great work! A few minor problems: (1) I had committed MAPREDUCE-2588, however, a commit email was sent to both common-commits@ and mapreduce-commits@. (2) hadoop/site becomes an empty directory. (3) There are svn properties in hadoop/common/trunk/ (2) and (3) are simple. I will remove hadoop/site and the svn properties for hadoop/common/trunk/. Does anyone know how to fix (1)? Ah, we need to update the mailer config. This is the remaining task I mentioned on the HADOOP-7106 JIRA. We had updated it to include both the old and new spots for the sake of the transition. I'll ping Ian about this - I think he's the only one with access. A separate question: Strictly speaking, this is not project unsplit since we are going to submit patches for individual sub-projects as before, i.e. we keep generating and committing patches to common/trunk/[common/hdfs/mapreduce] but not common/trunk; hudson will pick up patches from individual sub-projects, etc. Am I correct? Just want to make sure that everyone is on the same page. :) Correct, that's where we're at for today. I opened HADOOP-7384 which will allow patches generated against the full repo to apply to an individual project, to make life easier for git users, but right now we still need separate JIRAs and patches. See my other thread from this morning about some ideas how we might be able to do cross-project patches in a reasonable way. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: JIRAs post-unsplit
On Mon, Jun 13, 2011 at 4:54 PM, Konstantin Boudnik c...@apache.org wrote: On Mon, Jun 13, 2011 at 01:37PM, Todd Lipcon wrote: On Mon, Jun 13, 2011 at 11:51 AM, Konstantin Boudnik c...@apache.org wrote: I tend to agree: JIRA separation was the benefit of the split. I'd rather keep the current JIRA split in effect (e.g. separate JIRA projects for separate Hadoop components; don't recombine them) and file patches in the same way (for common, hdfs, mapreduce). If a cross component patch is needed then HADOOP project JIRA can be used for tracking, patches, etc. Yea, perhaps we just need the QA bot to be smart enough that it could handle a cross-project patch attached to HADOOP? Maybe we do something crazy and make a new HADOOPCROSS jira for patches that affect multiple projects? (just brainstorming here...) Correct me if I'm wrong but in the new structure cross-component patch differs from a component one by a patch level (i.e. p0 vs p1 if looked from common/trunk), right? I guess the bot can be hacked to use this distinction thus saving us an extra JIRA project which will merely serve the purpose of meta-project. Yes, I am about to commit HADOOP-7384 which can at least deal with patches relative to either trunk/ or trunk/project. But, it will also detect a cross-project patch and barf. It could certainly be extended to apply and test a cross-project patch, though it would be substantially more work. The advantage of a separate HADOOPX jira would be to allow people to notice cross-project patches. For example, a dev who primarily works on HDFS may not subscribe to mapreduce-dev or mapreduce-issues, but if an MR issue is going to modify something in the HDFS codebase, he or she will certainly want to be aware of it. -Todd Tree-based watch-list seems like a great idea, but won't it narrow the scope somehow? Are you saying that if I am interested in say hdfs/src/c++/libhdfs, but a JIRA is open which affects libhdfs and something else (e.g. 
NameNode) I will still get the notification? Right, that's the idea. You'd be added as a watcher (and get notified) for any patch that touches the area you care about, regardless of whether it also touches some other areas. -Todd On Mon, Jun 13, 2011 at 11:28AM, Todd Lipcon wrote: After the project unsplit this weekend, we're now back to a place where we have a single SVN/git tree that encompasses all of the subprojects. This opens up the next question: should we merge the JIRAs and allow a single issue to have a patch which spans projects? My thoughts are: - the biggest pain point with the project split is dealing with cross-project patches - one of the biggest reasons we did the project split was that the combined traffic from the HADOOP JIRA was hard to follow for people who really care about certain subprojects. - the jira split is a coarse-grained way of allowing people to watch just the sub-areas they care about. So, I was thinking the following... what if there were a way to watch JIRAs based on subtrees? I'm imagining a web page where any community user could have an account and manage a watch list of subtrees. If you want to watch all MR jiras, you could simply watch mapreduce/*. If you care only about libhdfs, you could watch hdfs/src/c++/libhdfs, etc. Then a bot would watch all patches attached to JIRA, and any time a patch is uploaded that touches something on your watch list, it automatically adds you as a watcher on the ticket and sends you a notification via email. It would also be easy to set up a watch based on patch size, for example. I think even if we don't recombine the JIRAs, this might be a handy way to cut down on mailing list traffic for contributors who have a more narrow focus on certain areas of the code. Does this sound useful? I don't know if/when I'd have time to build such a thing, but if the community thinks it would be really helpful, I might become inspired. 
-Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
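As a concrete sketch of both ideas in this thread — detecting whether a patch is rooted at trunk/ vs. trunk/&lt;project&gt; (the p0 vs. p1 distinction), and matching touched paths against per-user subtree watch lists — here is a rough prototype. The diff parsing and the watch-list format are my own assumptions, not what HADOOP-7384 or any existing bot implements:

```python
import re

PROJECTS = ('common', 'hdfs', 'mapreduce')

def touched_paths(patch_text):
    """Pull the target path out of each '+++' header of a unified diff,
    stripping the git-style b/ prefix and skipping deletions."""
    paths = []
    for line in patch_text.splitlines():
        m = re.match(r'\+\+\+ (?:b/)?(\S+)', line)
        if m and m.group(1) != '/dev/null':
            paths.append(m.group(1))
    return paths

def classify(paths):
    """Return ('root', projects) if every path carries a project prefix
    (i.e. the diff was taken from trunk/), else ('project', set()) for a
    diff made inside a single project tree. More than one prefix in the
    'root' case means a cross-project patch."""
    prefixes = {p.split('/', 1)[0] for p in paths}
    if prefixes <= set(PROJECTS):
        return 'root', prefixes
    return 'project', set()

def watchers(paths, watch_lists):
    """watch_lists maps user -> list of subtree prefixes; return every
    user whose watched subtree is touched by some path in the patch."""
    return {user for user, subtrees in watch_lists.items()
            if any(p.startswith(s) for p in paths for s in subtrees)}

paths = touched_paths("+++ b/hdfs/src/c++/libhdfs/hdfs.c\n"
                      "+++ b/mapreduce/src/java/Foo.java")
print(classify(paths))  # a root-level, cross-project patch
print(watchers(paths, {'alice': ['hdfs/src/c++/libhdfs']}))
```

A bot along these lines could add the returned users as JIRA watchers, so someone watching only hdfs/src/c++/libhdfs would still be notified when a cross-project patch touches it.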
Re: HADOOP-7106 (project unsplit) this weekend
OK, this seems to have succeeded without any big problems! I've re-enabled the git mirrors and the hudson builds. Feel free to commit to the new trees. Here are some instructions for the migration: === SVN users === Next time you svn up in your common working directory you'll end up seeing the combined tree - ie a mapreduce/, hdfs/, and common/ subdirectory. This is probably the easiest place from which to work, now. The URLs for the combined SVN trees are: trunk: https://svn.apache.org/repos/asf/hadoop/common/trunk/ branch-0.22: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22 branch-0.21: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 yahoo-merge: http://svn.apache.org/repos/asf/hadoop/common/branches/yahoo-merge (this one has the yahoo-merge branches from common, hdfs, and mapred) MR-279: http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279 (this one has the yahoo-merge common and hdfs, and the MR-279 mapred) The same kind of thing happened for HDFS-1073 and branch-0.21-old. Pre-project-split branches like branch-0.20 should have remained untouched. You can proceed to delete your checkouts of the individual mapred and hdfs trees, since they exist within the combined trees above. If for some reason you prefer to 'svn switch' an old MR or HDFS-specific checkout to point to its new location, you can use the following incantation: svn sw $(svn info | grep URL | awk '{print $2}' | sed 's,\(hdfs\|mapreduce\|common\)/\(.*\),common/\2/\1,') === Git Users === The git mirrors of the above 7 branches should now have a set of 4 commits near the top that look like this: Merge: 928d485 cd66945 77f628f Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:28 2011 + HADOOP-7106. 
Reorganize SVN layout to combine HDFS, Common, and MR in a single tree (project unsplit) git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@113499413f79535-47bb-0310-9956-ffa450edef68 commit 77f628ff5925c25ba2ee4ce14590789eb2e7b85b Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:27 2011 + Relocate mapreduce into mapreduce/ commit cd66945f62635f589ff93468e94c0039684a8b6d Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:26 2011 + Relocate hdfs into hdfs/ commit 928d485e2743115fe37f9d123ce9a635c5afb91a Author: Todd Lipcon t...@apache.org Date: Sun Jun 12 22:53:25 2011 + Relocate common into common/ The first of these 4 is a 3-parent octopus merge commit of the pre-project-unsplit branches. In theory, git is smart enough to track changes through this merge, so long as you pass the right flags (eg --follow). For example: todd@todd-w510:~/git/hadoop-common$ git log --pretty=oneline --abbrev-commit --follow mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java | head -10 77f628f Relocate mapreduce into mapreduce/ 90df0cb MAPREDUCE-2455. Remove deprecated JobTracker.State in favour of JobTrackerStatus. ca2aba0 MAPREDUCE-2490. Add logging to graylist and blacklist activity to aid diagnosis of related issues. Contributed by Jonathan Eagles 32aaa2a MAPREDUCE-2515. MapReduce code references some deprecated options. Contributed by Ari Rabkin. If you want to be able to have git follow renames all the way through the project split back to the beginning of time, put the following in hadoop-common/.git/info/grafts: 5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c In terms of rebasing git branches, git is actually pretty smart. For example, I have a local HDFS-1073 branch in my hdfs repo. 
To transition it to the new combined repo, I did the following: # Add my project-split hdfs git repo as a remote: git remote add splithdfs /home/todd/git/hadoop-hdfs/ git fetch splithdfs # Checkout a branch in my combined repo git checkout -b HDFS-1073 splithdfs/HDFS-1073 # Rebase it on the combined 1073 branch git rebase origin/HDFS-1073 ...and it actually applies my patches inside the appropriate subdirectory (I was surprised and impressed by this!) If the branch you're rebasing has added or moved files, it might not be smart enough and you'll have to manually rename them in your branch inside of the appropriate subtree.. but for simple patches this seems to work. For less simple things, the best bet may be to use git filter-branch on the patch series to relocate it inside a subdirectory, and then try to rebase. Let me know if you need a hand with any git cleanup, happy to help. == Outstanding issues == The one outstanding issue I'm aware of is that the test-patch builds should be smart enough to be able to deal with patches that are relative to the combined root instead of the original project. Right now, if you export a diff from git, it will include hdfs/ or mapreduce/ in the changed file names, and the QA bot won't be able to apply them.
Re: HADOOP-7106 (project unsplit) this weekend
Hi all, I'm figuring out one more small nit I noticed in my testing this evening. Hopefully I will figure out what's going wrong and be ready to press the big button tomorrow. Assuming I don't have to abort mission, my hope is to do this at around 3PM PST tomorrow (Sunday). I'll send out a message asking folks to please hold commits to all branches while the move is in progress. Thanks -Todd On Fri, Jun 10, 2011 at 11:20 AM, Todd Lipcon t...@cloudera.com wrote: Hi all, Pending any unforeseen issues, I am planning on committing HADOOP-7106 this weekend. I have the credentials from Jukka to take care of the git trees as well, and have done a practice move several times on a local mirror of the svn. I'll send out an announcement of the exact time in advance of when I actually do the commit. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
HADOOP-7106 (project unsplit) this weekend
Hi all, Pending any unforeseen issues, I am planning on committing HADOOP-7106 this weekend. I have the credentials from Jukka to take care of the git trees as well, and have done a practice move several times on a local mirror of the svn. I'll send out an announcement of the exact time in advance of when I actually do the commit. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: LimitedPrivate and HBase
On Mon, Jun 6, 2011 at 9:45 AM, Allen Wittenauer a...@apache.org wrote: I have some concerns over the recent usage of LimitedPrivate being opened up to HBase. Shouldn't HBase really be sticking to public APIs rather than poking through some holes? If HBase needs an API, wouldn't other clients as well? IMO LimitedPrivate can be used to open an API for a specific project when it's not clear that the API is generally useful, and/or we anticipate the API might be pretty unstable. Marking it LimitedPrivate to HBase gives us the opportunity to talk to the HBase team and say hey, we want to rename this without @Deprecation or hey, we're going to kill this, is that OK? Making it truly public, even if we call it Unstable, is a bit harder to move. I agree that most of these things in the long run would be determined generally useful and made public. Do you have a specific thing in mind? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: LimitedPrivate and HBase
On Mon, Jun 6, 2011 at 6:05 PM, Allen Wittenauer a...@apache.org wrote: On Jun 6, 2011, at 5:56 PM, Todd Lipcon wrote: Or because this is the sort of thing that could take weeks of discussion or just 5 minutes to unblock HBase from moving on to trunk. I'd rather have the weeks of discussion *after* the 5 minute patch, so people can continue to make progress. We've moved too slowly for too long. I didn't realize trunk was coming out as a release next month. If all goes well, 0.22 will come out as a release some time in that timeframe. Stack has been getting HBase running on it. This patch was to fix 0.22. Let's face it: this happened because it was HBase. If it was almost anyone else, it would have sat there and *that's* the point where I'm mainly concerned. If you want to feel better, take a look at HDFS-941, HDFS-347, and HDFS-918 - these are patches that HBase has been asking for for nearly 2 years in some cases and haven't gone in. Satisfied? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Update on 0.22
On Wed, Jun 1, 2011 at 11:32 PM, Konstantin Shvachko shv.had...@gmail.com wrote: I can see them well. I think Suresh's point is that non-blockers are going into 0.22. Nigel, do you have full control over it? Of course it's up to Nigel to decide, but here's my personal opinion: One of the reasons we had a lot of divergence (read: external branches/forks/whatever) off of 0.20 is that the commit rules on the branch were held pretty strictly. So, if you wanted a non-critical bug fix or a small improvement, the only option was to do such things on an external fork. 0.20 was branched in December '08 and not released until mid April '09. In 4 months a fair number of bug fixes and small improvements go in. 0.22 has been around even longer. If we were to keep it to *only* blockers, then again it would be a fairly useless release due to the number of non-blocker bugs. Clearly there's a balance and a judgment call when moving things back to a branch. But at this point I'd consider small improvements and pretty much any bug fix to be reasonable, so long as it doesn't involve major reworking of components. Nigel: if this assumption doesn't jive (ha ha, get it?) with what you're thinking, please let me know :) -Todd On Wed, Jun 1, 2011 at 1:50 PM, Eric Baldeschwieler eri...@yahoo-inc.com wrote: makes sense to me, but it might be good to work to make these decisions visible so folks can understand what is happening. On Jun 1, 2011, at 1:46 PM, Owen O'Malley wrote: On Jun 1, 2011, at 1:27 PM, Suresh Srinivas wrote: I see that there are several non blockers being promoted to 0.22 from trunk. From my understanding, any non blocker change to 0.22 should be approved by vote. Is this correct? No, the Release Manager has full control over what goes into a release. The PMC votes on it once there is a release candidate. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: Update on 0.22
On Thu, Jun 2, 2011 at 11:06 AM, Konstantin Shvachko shv.had...@gmail.com wrote: I propose just to make them blockers before committing to attract attention of the release manager and get his approval. Imho, even small changes, like HDFS-1954 are blockers, because a vague UI message is a bug and bugs are blockers. Bugs are blockers? Then we'll never release! Let's hear from Nigel what he thinks. It's his branch, if he's upset about the way it's being handled, he can deal with it as he sees fit. -Todd -- Todd Lipcon Software Engineer, Cloudera
Spam on wiki
FYI, I've filed the following ticket with ASF Infrastructure to see if we can get a CAPTCHA set up on our wiki: https://issues.apache.org/jira/browse/INFRA-3670 In the meantime, we've been doing a decent job of policing, let's keep it up! -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1
On Tue, May 10, 2011 at 12:41 PM, Scott Carey sc...@richrelevance.com wrote: As an observer, this is a very important observation. Sure, the default is that dot releases are bugfix-only. But exceptions to these rules are sometimes required and often beneficial to the health of the project. Performance enhancements, minor features, and other items are sometimes very low risk and the barrier to getting them to users earlier should be lower. I agree whole-heartedly. These issues are the sort of things that get into non-Apache releases quickly and drive the community away from the Apache release. It's been well proven through those vehicles that back-porting minor features and improvements from trunk to an old release can be done safely. However, one shouldn't understate the difficulty of agreeing on the risk-reward tradeoff here. While risk is mostly technical, reward may vary widely based on the userbase or organization. For example, everyone would agree that security was a very risky feature to add to 20, with known backward incompatibilities and a lot of fallout. For some people (both CDH and YDH), the security features were an absolute necessity on a tight timeline, so the risk-reward decision was clear -- I've heard from many users, though, that they saw none of the reward from security and wished they hadn't had to endure the resulting changes and bugs within the 0.20 series. Another example is the 0.20-append patch series, which is indispensable for the HBase community but seen as overly risky by those who do not use HBase. So, while I'm in favor of sustaining release series like 0.20-security in theory, I also think we need clear inclusion criteria for such branches. As I said in a previous email, the criteria used to be low risk compatible bug fixes only with a vote process for any exceptions. 0.20-security is obviously entirely different, but as yet remains undefined (it's way more than just security). -Todd -- Todd Lipcon Software Engineer, Cloudera
newbie label on JIRA
Hi all, I spent this afternoon looking through JIRA to identify some issues that I think would be good for new contributors to try their hand at. In my mind, the qualities of such an issue are: - fairly straightforward issue to solve (an experienced contributor would be able to address it in 30-60 minutes) - fairly tight scope (doesn't require understanding of a lot of different moving pieces) - easy to write a unit test for (so we get new contributors on the right path of testing their changes) - not likely to be controversial among contributors I came up with about 25 of these from looking through the 0.22 and 0.23 Affects Version lists: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+in+(%22HADOOP%22,+%22MAPREDUCE%22,+%22HDFS%22)+and+labels+%3D+%22newbie%22 I'd like to encourage others to look through any JIRAs that they think fit the bill, and add the same label. Then, we can point new contributors at this list of JIRAs -- hopefully this will get them on the right path towards understanding our project's workflow and give some nice positive reinforcement since they should be easy to review and commit quickly. Thanks! -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
Hey folks, FYI I'm in the process of loading one of the SVN dumps onto a server here. MAN is it slow. Going maybe 2 revisions/sec so I should have an SVN replica in somewhere around a week to test with. @Infra: I don't suppose it's possible to get a writable snapshot mounted somehow? If I recall correctly, ZFS supports this and svn.apache.org runs ZFS? -Todd On Fri, Apr 29, 2011 at 1:16 PM, Nigel Daley nda...@mac.com wrote: I can't do this at 2pm now. Todd, I suspect you want more time to try out the svn/git test anyways. Let's shoot for next Wednesday at 2pm. Ian should be back by then too. Any objections? Cheers, Nige On Apr 29, 2011, at 11:36 AM, Owen O'Malley wrote: On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote: Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with? It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 April 2011. You should be able to use those to setup a local subversion. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
-1 for the same reasons I outlined in my email yesterday. This is not a community artifact following the community's processes, and thus should not be an official release until those issues are addressed. On Wed, May 4, 2011 at 3:17 PM, Doug Cutting cutt...@apache.org wrote: -1 This candidate has lots of patches that are not in trunk, potentially adding regressions to 0.22 and 0.23. This should be addressed before we release from 0.20-security. We should also not move to four-component version numbering. A release from the 0.20-security branch should perhaps be called 0.20.100. Doug On 05/04/2011 10:31 AM, Owen O'Malley wrote: Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ Please download it, inspect it, compile it, and test it. Clearly, I'm +1. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
With Cloudera hat on, I agree with Eli's assessment. With Apache hat on, I don't see how this is at all relevant to the task at hand. I would make the same arguments against taking CDH3 and releasing it as an ASF artifact -- we'd also have a certain amount of work to do to make sure that all of the patches are in trunk, first. Additionally, I'd want to outline what the inclusion criteria would be for that branch. -Todd On Wed, May 4, 2011 at 3:24 PM, Eli Collins e...@cloudera.com wrote: With my Cloudera hat on.. When we went through the 10x and 20x patches we only pulled a subset of them, primarily for security and the general improvements that we thought were good. We found both incompatible changes and some sketchy changes that we did not pull in from a quality perspective. There is a big difference between a patch set that's acceptable for Yahoo!'s user base and one that's a more general artifact. When we evaluated the YDH patch sets we were using that frame of mind. I'm now looking at it in terms of an Apache release. And the place to review changes for an Apache release is on jira. CDH3 is based on the latest stable Apache release (20.2) so it doesn't regress against it. I'm nervous about rebasing future releases on 203 because of the compatibility and quality implications. Thanks, Eli On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas sures...@yahoo-inc.com wrote: Eli, How many of these patches that you find troublesome are in CDH already? Regards, Suresh On 5/4/11 3:03 PM, Eli Collins e...@cloudera.com wrote: On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley omal...@apache.org wrote: Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
-- Owen While rc2 is an improvement on rc1, I am -1 on this particular rc. Rationale: This rc contains many patches not yet committed to trunk. This would cause the next major release (0.22) to be a feature regression against our latest stable release (203), were 0.22 released soon. This rc contains many patches not yet reviewed by the community via the normal process (jira, patch against trunk, merge to a release branch). I think we should respect the existing community process that has been used for all previous releases. This rc introduces a new development and branching model (new feature development outside trunk) and Hadoop versioning scheme without sufficient discussion or proposal of these changes with the community. We should establish new process before the release; a release is not the appropriate mechanism for changing our review and development process or versioning. I do support a release from branch-0.20-security that follows the existing, established community process. Thanks, Eli -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy a...@yahoo-inc.com wrote: On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: The list seems highly inaccurate. Checked the first few N/A items. All are false positives. Also, can you please provide a list on features which are not related to gridmix benchmarks or herriot tests? Here are a few I quickly pulled up: MAPREDUCE-2316 (docs for improved capacity scheduler) MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) BZ-4182948. Add statistics logging to Fred for better visibility into startup time costs. (Matt Foley) - I believe I saw a note from Matt on the JIRA yesterday about this feature, where he decided that the version done in 203 wasn't a good approach, and it's done differently in trunk (not sure if done yet). MAPREDUCE-2364 (important bug fix for localization) - in fact most of localization is different in this branch compared to trunk due to inclusion of MAPREDUCE-2378, the trunk version of which is still on the yahoo-merge branch. New counters for FileInput/OutputFormat. New Counter MAP_OUTPUT_MATERIALIZED_BYTES. Related bugs: 4241034, 3418543, 4217546 - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not committed. - MAPREDUCE-1904, committed without JIRA as: . Reducing new Path(), RawFileStatus() creation overhead in LocalDirAllocator not in trunk +BZ4101537 . When a queue is built without any access rights we explain the +problem. (dking, rvw ramach) [attachment of 2010-11-24] seems to be on trunk as MR-2411, but not committed, best I can tell, despite the JIRA there being resolved (based on looking at QueueManager in trunk) . Remove unnecessary reference to user configuration from TaskDistributedCacheManager causing memory leaks Not in trunk, not sure which JIRA it might be.. probably part of 2178.
Major new feature: MAPREDUCE-323 - very large rework of how job history files are managed Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though probably will be attacked by different JIRAs Major new ops-visible feature: metrics2 system Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from a separate server Major new set of user-visible configurations: MAPREDUCE-1943 and friends which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) I have code to work on, so I won't keep going, but this is from looking at the last couple months of 203. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc0
on a wiki page (or web site page) regarding the currently active branches? Thanks -Todd On Tue, May 3, 2011 at 10:02 AM, Eli Collins e...@cloudera.com wrote: I think we still need to incorporate the patches currently checked into branch 0.20. For example, Owen identified a major bug (BooleanWritable's comparator is broken) and filed a jira (HADOOP-6928) to put it in branch-0.20, where I reviewed it and checked it in, so this bug would be fixed in the next stable release. However this change is not in branch-0.20-security-203. Unless we put the delta from branch-0.20 into this release, it is missing important bug fixes that will cause it to regress against 20.3 (if it ever is released). I am also nervous about changes like the one identified by HADOOP-7255. It looks like this change caused a significant regression in TestDFSIO throughput. It changes the core Task class, the commit log is a single line, and as far as I can tell it was not discussed or reviewed by anyone in the community. Don't changes like this at least deserve a jira before we release them? Thanks, Eli On Tue, May 3, 2011 at 1:39 AM, Konstantin Shvachko shv.had...@gmail.com wrote: I think its a good idea to release hadoop-0.20.203. It moves Apache Hadoop a step forward. Looks like the technical difficulties are resolved now with latest Arun's commits. Being a superset of hadoop-0.20.2 it can be considered based on one of the official Apache releases. I don't think there was a lack of discussions on the lists about the issues included in the release candidate. Todd did a thorough review of the entire security branch. Many developers participated in discussions. Agreeing with Stack I wish HBase was considered a primary target for Hadoop support. But it is not realistic to have it in hadoop-0.20.203. I have some experience running a version of this release candidate on a large cluster. It works. I would add a couple of patches, which make it run on Windows for me like HADOOP-7110, HADOOP-7126. 
But those are not blockers. Thanks, --Konstantin On Mon, May 2, 2011 at 5:12 PM, Ian Holsman had...@holsman.net wrote: On May 3, 2011, at 9:58 AM, Arun C Murthy wrote: Owen, Suresh and I have committed everything on this list except HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/necessary, I'll check with Cos. Other than that hadoop-0.20.203 is now a superset of hadoop-0.20.2. Missed adding HADOOP-5759 to that list, I'll check with Amareshwari before committing. Arun Thanks for doing this so fast Arun. -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
On Thu, Apr 28, 2011 at 10:06 PM, Nigel Daley nda...@mac.com wrote: As announced last week, I'm planning to do this at 2pm PDT tomorrow (Friday) April 29. Suresh, when do you plan to commit HDFS-1052? That should be done first. Owen or Todd, did you want to follow Paul's advice: If you're really wanting to make sure to keep the history in Git intact my suggestion would be to setup a temporary svn server locally and test our mirroring scripts against the commands you intend to run. If so, how much more time do you need? Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with? -Todd On Apr 20, 2011, at 9:42 PM, Nigel Daley wrote: Owen, I'll admit I'm not familiar with all the git details/issues in your proposal, but I think the layout change you propose is fine and seems to solve the git issues with very minimal impact on the layout. Let's shoot for doing this next Friday, April 29 at 2pm PDT. I'll update the patch and send out a reminder about this later next week. Thanks, Nige On Apr 20, 2011, at 8:00 AM, Owen O'Malley wrote: On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote: On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon t...@cloudera.com wrote: I'm currently looking into how the git mirrors are setup in Apache-land. Uh, why isn't infra-dev on this thread? For those on infra-dev, the context is that Nigel is trying to merge together the source trees of the Hadoop sub-projects that were split apart 2 years ago. So he is taking: prefix = http://svn.apache.org/repos/asf/hadoop/ $prefix/common/trunk -> $prefix/trunk/common $prefix/hdfs/trunk -> $prefix/trunk/hdfs $prefix/mapreduce/trunk -> $prefix/trunk/mapreduce and playing similar games with the rest of the branches and tags. For more details look at HADOOP-7106. From the project split, subversion was able to track the history across the subversion moves between projects, but not git. Four questions: 1.
Is there anything we can do to minimize the history loss in git? 2. Are we going to be able to preserve our sha's or are they going to change again? 3. What changes do we need to make to the subversion notification file? 4. Are there any other changes that need to be coordinated? After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in: $prefix/common/trunk/* -> $prefix/common/trunk/common/* $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone. -- Owen -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
On Tue, Apr 19, 2011 at 10:02 PM, Nigel Daley nda...@mac.com wrote: I'm still planning to make this SVN change on Thursday this week. Ian, Owen, Todd, note the questions I ask you below. Can you help with these on Thursday? Unfortunately I'm out of the office most of the day on Thursday with a customer. I'll be available Thursday evening, though, to help with any cleanup/etc. I'm currently looking into how the git mirrors are setup in Apache-land. My guess is that there will be some disturbance to developers on Thurs afternoon / Friday as this gets sorted out, even if we try to plan as much as possible. Would it be better to do this on Friday so that we have the weekend to fix up broken pieces before people get to work on Monday? -Todd On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote: All, As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106. I propose Thursday April 21 at 11am PDT. - I will send out reminders leading up to that date. - I will announce on IRC when I'm about to start the changes. - I will run the script to make the changes. - Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time? - Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?) More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera
Re: HADOOP-7106: Re-organize hadoop subversion layout
On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon t...@cloudera.com wrote: I'm currently looking into how the git mirrors are setup in Apache-land. Git-wise, I think we have two options: Option 1) - Create a new git mirror for the new hadoop/ tree. This will have no history. - On the Apache side, fetch the split-project git mirrors into the combined git mirror as branches - eg hadoop-hdfs.git:trunk becomes a branch named something like pre-HADOOP-7106/hdfs/trunk. Thus, when any user fetches, he'll get all the git objects from prehistory as well without having to add separate remotes. - Add a script or README file explaining how to set up git grafts on the combined hadoop.git so that the new combination branch foo looks like a merge of pre-HADOOP-7106/{hdfs,common,mapred}/foo. Since git grafts are local constructs, each git user would have to run this script once after checking out the git tree, after which the history would be healed. Pros: - all existing sha1s stay the same. - Any local branches people might have for works in progress should continue to refer to proper SHA1s and should rebase relatively easily onto the combined trunk - Should be reasonably simple to implement Cons: - users have to run a script upon checkout in order to graft back together history Option 2) - Use git-filter-branch on the split repos to rewrite them as if they always took place in their new subdirectories. - Fetch these repos into the merged repo - Set up grafts in the merged repo - Run git-filter-branch --all in the merged repo, which will make the grafts permanent - May have to run git-filter-branch to rewrite some of the git-svn-id: commit messages to trick git-svn. This option basically rewrites history so that it looks like the original project split did what we're planning to do now.
Pros: - we have a single cohesive git repo with no need to have users set up grafts Cons: - all of our SHA1s between the original split and now would change (making it harder to rebase local branches for example) - way more opportunity for error, I think. I'm leaning towards option 1 above, and happy to write the script which installs the grafts into the user's local repo. -Todd On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote: All, As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106. I propose Thursday April 21 at 11am PDT. - I will send out reminders leading up to that date. - I will announce on IRC when I'm about to start the changes. - I will run the script to make the changes. - Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time? - Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?) More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
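For anyone curious what the graft-installing script from Option 1 would amount to: `.git/info/grafts` is just a plain-text file where each line names a commit followed by the parents git should pretend it has. A minimal sketch of that idea follows -- the SHA1s are placeholders rather than real Hadoop commits, and the helper name is mine:

```python
import os

# Placeholder SHA1s for illustration only. In the real script these would
# be the first commit of the combined trunk and the heads of the
# pre-HADOOP-7106 common/hdfs/mapreduce mirror branches.
MERGE_ROOT = "a" * 40
OLD_HEADS = ["b" * 40, "c" * 40, "d" * 40]

def install_graft(git_dir=".git"):
    """Append one graft line of the form: <commit> <parent1> <parent2> ..."""
    info_dir = os.path.join(git_dir, "info")
    os.makedirs(info_dir, exist_ok=True)
    line = " ".join([MERGE_ROOT] + OLD_HEADS) + "\n"
    with open(os.path.join(info_dir, "grafts"), "a") as f:
        f.write(line)

install_graft()
```

Because the grafts file lives under .git/ and is never pushed, it is purely local -- which is both why each user has to run the script once, and why no existing sha1s change.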
Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing The Yahoo Distribution of Hadoop
Is there a list available of which patches you've made this decision about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR security in trunk has a serious vulnerability. Do we plan on fixing it, or will the answer be that, if anyone needs security, they must update to MR Next Gen? -Todd On Thu, Apr 7, 2011 at 3:52 PM, Arun C Murthy a...@yahoo-inc.com wrote: On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote: As the final installment in this process, I've started a discussion on us contributing a re-factor of Map-Reduce in https://issues.apache.org/jira/browse/MAPREDUCE-279 . Hi Folks, We wanted to share our thoughts around the co-development of the NextGen MapReduce branch (Jira MR-279), maintaining the branch-0.20-security and merging the work on the security branch with trunk. We've concluded that it does not make sense for us to port a very small subset of the work from the branch-0.20-security to the Hadoop mainline. The JIRAs we don't plan to port all affect areas of the mainline that are going to be replaced by work in the NextGen MapReduce branch ( http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/). We've been working on the NextGen MapReduce branch (MAPREDUCE-279) within Apache for a while now and are excited about its progress. We think that this branch will be a huge improvement in scalability, performance and functionality. We are now confident that we can get it ready for release in the next few months. We believe that the next major release of Apache Hadoop we will test at Yahoo will include the work in this branch and we are committed to merging the NextGen branch into the mainline after the PMC approves the merge. Meanwhile, we have continued to find and fix bugs on branch-0.20-security and have been working to port that work into the Hadoop mainline.
Most of this work is done and we've also brought all the patches in from our github branch into apache subversion, so that it is easy for everyone to see the work remaining. What we've found is that some of the work in branch-0.20-security is in code sections that have been completely replaced / refactored in the NextGen MapReduce branch. Since we are committed to the NextGen branch, we don't think there is any upside in porting this code into portions of mainline we expect to discard. All of these JIRAs will be fixed in the NextGen MapReduce branch and through there ultimately in trunk (assuming the PMC approves the merge). So at this point it is our intent to not port the JIRAs listed above to trunk, but to wait until we merge NextGen into trunk to resolve these issues there. If you are interested in seeing these issues ported to mainline, let us know. We are happy to help review your patches and explain context to anyone who is interested in doing this work. Arun and Eric -- Todd Lipcon Software Engineer, Cloudera
Re: Proposal: Further Project Split(s)
+4.01. This is a terrific idea. On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers a...@cloudera.com wrote: Hello Hadoop Community, Given the tremendous positive feedback we've all had regarding the HDFS, MapReduce, and Common project split, I'd like to propose we take the next step and further separate the existing projects. I propose we begin by splitting the MapReduce project into separate Map and Reduce sub-projects. This will provide us the opportunity to tease out the complex interdependencies between map and reduce that exist today, to encourage us to write more modular and isolated code, which should speed releases. This will also aid our users who exclusively run map-only or reduce-only jobs. These are important use-cases, and so should be given high priority. Given that these two portions of the existing MapReduce project share a great deal of code, we will likely need to release these two new projects concurrently at first, but the eventual goal should certainly be to be able to release Map and Reduce independently. This seems intuitive to me, given the remarkable recent advancements in the academic community regarding reduce, while the research coming out of the map academics has largely stagnated of late. If this proposal is accepted, and it has the success I think it will, then we should strongly consider splitting the other two projects as well. My gut instinct is that we should split HDFS into HD and FS sub-projects, and simply rename the Common project to C'Mon. We can think about the details of what exactly these project splits mean later. Please let me know what you think. Best, Aaron -- Todd Lipcon Software Engineer, Cloudera
Maintenance of Hadoop 0.21 branch?
Hi all, Some recent discussion on HDFS-1786 has raised an interesting question: does anyone plan on maintaining the 0.21 branch and eventually releasing an 0.21.1? Should we bother to commit bug fixes to this branch? It seems to me that our time would be better spent getting 0.22 and trunk back to a green state so we can talk about releasing them, rather than applying patches to a branch with no releases planned. Of course a decision now doesn't preclude anyone from stepping up later, backporting patches to 0.21, and releasing an 0.21.1. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Maintenance of Hadoop 0.21 branch?
On Wed, Mar 30, 2011 at 2:50 PM, Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com wrote: I recall that some users are using 0.21. Should we discuss this on the users mailing lists? I thought -general was considered a user mailing list for hadoop-wide discussions like this? We can CC all the user lists, or just common-user which most people are on, if you think that's better? -Todd From: Todd Lipcon t...@cloudera.com To: general@hadoop.apache.org Sent: Wed, March 30, 2011 2:41:35 PM Subject: Maintenance of Hadoop 0.21 branch? Hi all, Some recent discussion on HDFS-1786 has raised an interesting question: does anyone plan on maintaining the 0.21 branch and eventually releasing an 0.21.1? Should we bother to commit bug fixes to this branch? It seems to me that our time would be better spent getting 0.22 and trunk back to a green state so we can talk about releasing them, rather than applying patches to a branch with no releases planned. Of course a decision now doesn't preclude anyone from stepping up later, backporting patches to 0.21, and releasing an 0.21.1. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Abandon hod Common contrib
Can any committer with knowledge of HOD please review this patch? If there are no committers with such knowledge, I would encourage us to either (a) add a committer to maintain hod, or (b) reconsider the vote to abandon it as an official contrib. Perhaps Simone and Gianluigi could move it to a separate incubator project? -Todd On Fri, Feb 18, 2011 at 6:40 AM, Simone Leo simone@crs4.it wrote: I am the co-author (with Gianluigi Zanetti) of HADOOP-6369 -- add Grid Engine support to HOD. At CRS4 we've been using (our patched version of) HOD since 2008 and we still use it in production. We use Hadoop 0.20.2 since it was released one year ago. Simone On 02/12/11 06:15, Owen O'Malley wrote: On Feb 11, 2011, at 6:17 PM, Nigel Daley wrote: a) I don't think hod is actually part of any unit tests, so including it would likely only be a burden on the tarball size. Not true. HOD has python unit tests and is the reason our builds have dependencies on python. But Allen's point is that I don't recall ever seeing HOD test failures causing the build to fail. b) The edu community uses this quite extensively, evidenced by the topic coming up on the mailing lists at least once every two months or so and has for years. Can't say that about the other contrib modules other than the schedulers and streaming. Then they are using old version of Hadoop. AFAICT HOD does not work with 0.20 or beyond. Out of curiosity, what goes wrong? Clearly nothing major has changed in starting up a mapreduce cluster in a very long time. c) The community that does use it has even submitted a patch that we've ignored. Which means the committers of this project gave up on it long ago. There are also some patches on core Hadoop that have been sitting for a long time, so I don't think that is a valid inference. I would love to hear some of the people who are using HOD speak up and give us their feedback. 
-- Owen -- Simone Leo Data Fusion - Distributed Computing CRS4 POLARIS - Building #1 Piscina Manna I-09010 Pula (CA) - Italy e-mail: simone@crs4.it http://www.crs4.it -- Todd Lipcon Software Engineer, Cloudera
[ANN] HBase 0.90.1 available for download
The Apache HBase team is happy to announce the general availability of HBase 0.90.1, available from your Apache mirror of choice: http://www.apache.org/dyn/closer.cgi/hbase/ [at the time of this writing, not all mirrors have updated yet -- please pick a different mirror if your first choice does not show 0.90.1] HBase 0.90.1 is a maintenance release that fixes several important bugs since version 0.90.0, while retaining API and data compatibility. The release notes may be found on the Apache JIRA: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12315548 Users upgrading from HBase 0.90.0 may upgrade clients and servers separately, though it is recommended that both be upgraded. If upgrading from a version of HBase prior to 0.90.0, please read the notes accompanying that release: http://osdir.com/ml/general-hadoop-apache/2011-01/msg00208.html As always, many thanks to those who contributed to this release! -The HBase Team
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer awittena...@linkedin.com wrote: So is the expectation that users would have to follow bread crumbs to the github dumping ground, then try to figure out which repo is the 'better' choice for their usage? Using LZO as an example, it appears we have a choice of kevin's, yours, or the master without even taking into consideration any tags. That sounds like a recipe for disaster that's even worse than what we have today. Kevin's and mine are currently identical (0e7005136e4160ed4cc157c4ddd7f4f1c6e11ffa) Not sure who the master is -- maybe you're referring to the Google Code repo? The reason we started working on github over a year ago is that the bugs we reported (and provided diffs for) in the Google Code project were ignored. For example: http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=17 In fact this repo hasn't been updated since Sep '09: http://code.google.com/p/hadoop-gpl-compression/source/list Github provided an excellent place to collaborate on the project, make progress, fix bugs, and provide a better product for the users. As for dumping ground, I don't quite follow your point - we develop in the open, accept pull requests from users, and code review each others' changes. Since October every commit has either been contributed by or fixes a bug reported by a user completely outside of the organizations where Kevin and I work. I agree that it's a bit of breadcrumb following to find the repo, though. We do at least have a link on the wiki: http://wiki.apache.org/hadoop/UsingLzoCompression which points to Kevin's repo. Perhaps the best solution here is to add a page to the official Hadoop site (not just the wiki) with links to actively maintained contrib projects? IMO the more we can take non-core components and move them to separate release timelines, the better.
Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I'd agree except for one thing: even when users do provide patches to contrib components we ignore them. How long have those patches for HOD been sitting there in the patch queue? So of course they wait months/years--because we seemingly ignore anything that isn't important to us. Unfortunately, that covers a large chunk of contrib. :( True - we ignore them because the core contributors generally have little clue about the contrib components, so don't feel qualified to review. I'll happily admit that I've never run failmon, index, dynamic-scheduler, eclipse-plugin, data_join, mumak, or vertica contribs. Wouldn't you rather these components lived on github so the people who wrote them could update them as they wished without having to wait on committers who have little to no clue about how to evaluate the changes? -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
On Tue, Feb 1, 2011 at 9:37 AM, Tom White t...@cloudera.com wrote: HBase moved all its contrib components out of the main tree a few months back - can anyone comment how that worked out? Sure. For each contrib: ec2: no longer exists, and now has been integrated into Whirr and much improved. Whirr has made several releases in the time that HBase has made one. The whirr contributors know way more about cloud deployment than the HBase contributors (except where they happen to overlap). Strong net positive. mdc_replication: pulled into core since it's developed by core committers and also needs a fair amount of tight integration with core components stargate: pulled into core - it was only in contrib as a sort of staging ground - it's really an improved/new version of the rest interface we already had in core. transactional: moved to github - this has languished a bit on github because only one person was actively maintaining it. However, it had already been languishing as part of contrib - even though it compiled, it never really worked very well in HBase trunk. So, moving it to a place where it's languished has just made it more obvious what was already true - that it isn't a well supported component (yet). Recently it's been taken back up by the author of it - if it develops a large user base it can move quickly and evolve without waiting on our release. Net: probably a wash So, overall, I'd say it was a good decision. Though we never had the same number of contribs that Hadoop seems to have sprouted. -Todd On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer awittena...@linkedin.com wrote: On Jan 31, 2011, at 3:23 PM, Todd Lipcon wrote: On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley omal...@apache.org wrote: Also note that pushing code out of Hadoop has a high cost. There are at least 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion for the users. 
A lot of users never go to the work to figure out which fork and branch of hadoop-gpl-compression work with the version of Hadoop they installed. Indeed it creates confusion, but in my opinion it has been very successful modulo that confusion. I'm not sure how the above works with what you wrote below: In particular, Kevin and I (who each have a repo on github but basically co-maintain a branch) have done about 8 bugfix releases of LZO in the last year. The ability to take a bug and turn it around into a release within a few days has been very beneficial to the users. If it were part of core Hadoop, people would be forced to live with these blocker bugs for months at a time between dot releases. So is the expectation that users would have to follow bread crumbs to the github dumping ground, then try to figure out which repo is the 'better' choice for their usage? Using LZO as an example, it appears we have a choice of kevin's, yours, or the master without even taking into consideration any tags. That sounds like a recipe for disaster that's even worse than what we have today. IMO the more we can take non-core components and move them to separate release timelines, the better. Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I'd agree except for one thing: even when users do provide patches to contrib components we ignore them. How long have those patches for HOD been sitting there in the patch queue? So of course they wait months/years--because we seemingly ignore anything that isn't important to us. Unfortunately, that covers a large chunk of contrib. :( -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop-common-trunk-Commit is failing since 01/19/2011
On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko shv.had...@gmail.com wrote: Anybody with gcc active, could you please verify if the problem is caused by HADOOP-6864. I can build common trunk just fine on CentOS 5.5 including native. I think the issue is somehow isolated to the build machines. Anyone know what OS they've got? Or can I swing an account on the box where the failures are happening? -Todd On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning tdunn...@maprtech.com wrote: There has been a problem with more than one build failing (Mahout is the one that I saw first) due to a change in maven version which meant that the clover license isn't being found properly. At least, that is the tale I heard from infra. On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins e...@cloudera.com wrote: Hey Konstantin, The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290, which was fixed. Trees from trunk are compiling against each other for me (eg each installed to a local maven repo), perhaps the upstream maven repo hasn't been updated with the latest bits yet. Thanks, Eli On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko shv.had...@gmail.com wrote: Sending this to general to attract urgent attention. Both HDFS and MapReduce are not compiling since HADOOP-6904 and its hdfs and MR counterparts were committed. The problem is not with this patch as described below, but I think those commits should be reversed if Common integration build cannot be restored promptly. Thanks, --Konstantin On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko shv.had...@gmail.com wrote: I see Hadoop-common-trunk-Commit is failing and not sending any emails. It times out on native compilation and aborts. Therefore changes are not integrated, and now it has led to hdfs and mapreduce both not compiling. Can somebody please take a look at this. The last few lines of the build are below.
Thanks --Konstantin [javah] [Loaded /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class] [javah] [Loaded /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)] [javah] [Forcefully writing file /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h] [exec] checking for gcc... gcc [exec] checking whether the C compiler works... yes [exec] checking for C compiler default output file name... a.out [exec] checking for suffix of executables... Build timed out. Aborting Build was aborted [FINDBUGS] Skipping publisher since build result is ABORTED Publishing Javadoc Archiving artifacts Recording test results No test report files were found. Configuration error? Recording fingerprints [exec] Terminated Publishing Clover coverage report... No Clover report will be published due to a Build Failure No emails were triggered. Finished: ABORTED -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley omal...@apache.org wrote: Also note that pushing code out of Hadoop has a high cost. There are at least 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion for the users. A lot of users never go to the work to figure out which fork and branch of hadoop-gpl-compression work with the version of Hadoop they installed. Indeed it creates confusion, but in my opinion it has been very successful modulo that confusion. In particular, Kevin and I (who each have a repo on github but basically co-maintain a branch) have done about 8 bugfix releases of LZO in the last year. The ability to take a bug and turn it around into a release within a few days has been very beneficial to the users. If it were part of core Hadoop, people would be forced to live with these blocker bugs for months at a time between dot releases. IMO the more we can take non-core components and move them to separate release timelines, the better. Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I think this will also help the situation where people have set up shop on branches -- a lot of the value of these branches comes from the frequency of backports and bugfixes to non-core components. If the non-core stuff were on a faster timeline upstream, we could maintain core stability while also offering people the latest and greatest libraries, tools, codecs, etc. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Patch testing
On Wed, Jan 26, 2011 at 10:05 AM, Nigel Daley nda...@mac.com wrote: raid (contrib) test hanging: TestBlockFixer I forced 2 thread dumps. Both hung in the same place. Filed https://issues.apache.org/jira/browse/MAPREDUCE-2283 This is a blocker for turning on MR precommit. Since this is contrib, I'd like to suggest just disabling this test temporarily. We can re-enable it once it's fixed. Not having MR pre-commit working has been pretty painful. -Todd On Jan 25, 2011, at 11:19 PM, Nigel Daley wrote: Started another trial run of MR precommit testing: https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/ Let's see if 17th time is a charm... Nige On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote: On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley nda...@mac.com wrote: Hrm, the MR precommit test I'm running has hung (been running for 14 hours so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be the NFS mounts on the machines. I forced a thread dump which you can see in the console: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console Strange, haven't seen a hang like that before in handleConnectionFailure. It should retry for 15 minutes max in that loop. Any other ideas why these might be hanging? There is an HDFS bug right now that can cause hangs on some tests - HDFS-1529 - would appreciate if someone can take a look. But I don't think this is responsible for the MR hang above. -Todd On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley nda...@mac.com wrote: Thanks for looking into it Todd. Let's first see if you think it can be fixed quickly. Let me know. No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes this test timeout for me. 
-Todd On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley nda...@mac.com wrote: Todd, would love to get https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since this is failing every night on trunk. What if we disable that test, move that issue to 0.22 blocker, and then enable the test-patch? I'll also look into that one today, but if it's something that will take a while to fix, I don't think we should hold off the useful testing for all the other patches. -Todd On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote: Hi Nigel, MAPREDUCE-2172 has been fixed for a while. Are there any other particular JIRAs you think need to be fixed before the MR test-patch queue gets enabled? I have a lot of outstanding patches and doing all the test-patch turnaround manually on 3 different boxes is a real headache. Thanks -Todd On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley nda...@mac.com wrote: Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30 Patch Available HDFS issues. Nige On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote: I committed HDFS-1511 this morning. We should be good to go. I can haz snooty robot butler? On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik c...@apache.org wrote: Thanks Jacob. I am wasted already but I can do it on Sun, I think, unless it is done earlier. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 19:41, Jakob Homan jgho...@gmail.com wrote: Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to whip one up tonight. On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley nda...@mac.com wrote: I agree with Cos on fixing HDFS-1511 first. Once that is done I'll enable hdfs patch testing. 
Cheers, Nige Sent from my iPhone4 On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik c...@apache.org wrote: One more issue needs to be addressed before test-patch is turned on for HDFS is https://issues.apache.org/jira/browse/HDFS-1511 -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik c...@apache.org wrote: Considering that because of these 4 faulty cases every patch will be -1'ed a patch author will still have to look at it and make a comment why this particular -1 isn't valid. Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel like there's a better way. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 15:55, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to. On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur
Re: triggering automated precommit testing
Hey Nigel, Would there be any way to add a feature where we can make some special comment on the JIRA that would trigger a Hudson retest? There are a lot of really old patches out on the JIRA that would be worth re-testing against trunk, and it's a pain to download and re-attach. I'm thinking a comment with a special token like @Hudson.Test. Failing that, Ian, can you add me to the Hudson list? -Todd On Wed, Jan 12, 2011 at 4:33 PM, Nigel Daley nda...@mac.com wrote: Jakob Homan commented on HDFS-884: -- Konstantin, if you're trying to kick a new patch build for this you no longer move it to Open and back to Patch Available. Instead, you must upload a new patch. Or, if you have permission, you can kick https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/ and enter the issue number. That makes me sad. Is this a new feature or regression? [For everyone's benefit, moving this to general@] Jakob, I referenced the change here: http://tinyurl.com/4crxlvy The new system is much more robust, partly because it no longer relies on watching Jira-generated emails to determine when issues move into Patch Available state. There is limited info I can get from the Jira API, thus the triggering mechanism had to change. Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Hi Arun, all, When we merged YDH and CDH for CDH3b3, we went through the effort of linearizing all of the YDH patches and squashing multiple commits into single ones corresponding to a single JIRA where possible. So, we have a 100% linear set of patches that applies on top of the 0.20.2 source tree and includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append and a number of other backports. Since this could be applied as a linear set of patches instead of a big lump, would there be interest in using this as the 0.20.100 Apache release? I can take the time to remove any patches that are Cloudera-specific or not yet applied upstream. Thanks -Todd On Wed, Jan 12, 2011 at 11:07 PM, Arun C Murthy a...@yahoo-inc.com wrote: On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote: +1 for 0.20.x, where x = 100. I agree that the 1.0 moniker would involve more discussion. Ok, seems like we are converging; we can continue talking. I've created the branch to get the ball rolling. Will this be a jumbo patch attached to a Jira and then committed to the branch? Just curious. I'm afraid that the svn log of the github Y! branch is fairly useless since a single JIRA might have multiple commits in the Y! branch (bugfix on top of a bugfix). We have done that in several cases (but the patches committed to trunk have a single patch which is the result of forward-porting a complete feature/bugfix). IAC, this branch and 0.22 have diverged so much that almost no non-trivial patch would apply without a significant amount of work. Thus, I think a jumbo patch should suffice. It will also ensure this can be done quickly so that the community can then concentrate on 0.22 and beyond. However, I will (manually) ensure all relevant jiras are referenced in the CHANGES.txt and Release Notes for folks to see the contents of the release. This is the hardest part of the exercise. Also, this ensures that we can track these jiras for 0.22 as Eli suggested.
Does that seem like a reasonable way forward? I'm happy to brainstorm. thanks, Arun -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy a...@yahoo-inc.com wrote: Since this could be applied as a linear set of patches instead of a big lump, would there be interest in using this as the 0.20.100 Apache release? I can take the time to remove any patches that are Cloudera-specific or not yet applied upstream. Interesting discussion, thanks. I'm sure it took you a fair amount of work to squash patches (which I tried too, btw). Yep, I had a great summer ;-) That, plus the fact that we would need to do a similar amount of work for the 10 or so releases we have done after 0.20.100.3 scares me. Sorry, I actually meant 0.20.104.3. Have there been many releases since then? That's the last version available on the Yahoo github, and that's the version we incorporated/linearized. If there is a large sequence of patches after this that you're planning on including, it would be good to see them in your git repo. As Nigel and I discussed here, the jumbo patch and an up-to-date CHANGES.txt provide almost all of the benefits we seek and allow all of us to get this done very quickly to focus on hadoop-0.22 and beyond. In my opinion, here are the downsides to this plan: - a mondo merge patch is a big pain when trying to do debugging. It may be sufficient for a user to look at CHANGES.txt, but I find myself using blame/log/etc on individual files to understand code lineage on a daily basis. If all of the merge shows up as a big patch it will be very difficult (at least the way I work with code) to help users debug issues or understand which JIRA a certain regression may have come from. - CHANGES.txt traditionally doesn't reference which patch file from a JIRA was checked in. So we may know that a given JIRA has been included, but often there are several revisions of patches on the JIRA and it's difficult to be sure that we have the most up-to-date version.
By looking at change history it's usually easy to pick this out, but if it's one giant patch apply, this isn't possible. - the proposal to use the YDH distro certainly solves the Security issue, but doesn't help out HBase at all. Given HBase has been asking for a long time to get a real release of the append branch, I think it would be better to have one 20-based release which has both of these features, rather than further fragmenting the community into 0.20.2, 0.20.2+security, 0.20.2+append. I think the first two points could be addressed if you push your git tree either to github or an apache-hosted git, and then include in SVN as a mondo patch. It's not ideal, but at least when trying to debug issues and understand the history of this branch there will be a publicly available change history to reference. To clarify my position a bit here - I definitely appreciate your volunteering to do the work, and wouldn't *block* the proposal as you've put it forth. I just think it will have limited utility for the community by being opaque (if contributed as a giant patch) and by not including the sync feature which is critical for a large segment of users. Given those downsides I'd rather see the effort diverted towards making a killer 0.22 release that we can all jump on. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [DISCUSS] Move project split down a level
Big +1. Curious how this will map to git, though - do we go back to one git repo? When we have a patch that is mainly HDFS or MR focused but will need changes across projects, can we just put up one patch in HDFS/MR or do we still need to open a parallel common JIRA? On Thu, Jan 13, 2011 at 11:25 PM, Eric Baldeschwieler eri...@yahoo-inc.com wrote: +1 Death to the project split! Or short of that, anything to tame it. On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote: Folks, As I look more at the impact of the common/MR/HDFS project split on what and how we release Hadoop, I feel like the split needs an adjustment. Many folks I've talked to agree that the project split has caused us a splitting headache. I think 1 relatively small change could alleviate some of that. CURRENT SVN REPO: hadoop / [common, mapreduce, hdfs] / trunk hadoop / [common, mapreduce, hdfs] / branches PROPOSAL: hadoop / trunk / [common, mapreduce, hdfs] hadoop / branches / [common, mapreduce, hdfs] We're a long way from releasing these 3 projects independently. Given that, they should be branched and released as a unit. This SVN structure enforces that and provides a more natural place to keep top-level build and pkg scripts that operate across all 3 projects. Thoughts? Cheers, Nige -- Todd Lipcon Software Engineer, Cloudera
Re: Patch testing
On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley nda...@mac.com wrote: Hrm, the MR precommit test I'm running has hung (been running for 14 hours so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be the NFS mounts on the machines. I forced a thread dump which you can see in the console: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console Strange, haven't seen a hang like that before in handleConnectionFailure. It should retry for 15 minutes max in that loop. Any other ideas why these might be hanging? There is an HDFS bug right now that can cause hangs on some tests - HDFS-1529 - would appreciate if someone can take a look. But I don't think this is responsible for the MR hang above. -Todd On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley nda...@mac.com wrote: Thanks for looking into it Todd. Let's first see if you think it can be fixed quickly. Let me know. No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes this test timeout for me. -Todd On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote: On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley nda...@mac.com wrote: Todd, would love to get https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since this is failing every night on trunk. What if we disable that test, move that issue to 0.22 blocker, and then enable the test-patch? I'll also look into that one today, but if it's something that will take a while to fix, I don't think we should hold off the useful testing for all the other patches. -Todd On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote: Hi Nigel, MAPREDUCE-2172 has been fixed for a while. Are there any other particular JIRAs you think need to be fixed before the MR test-patch queue gets enabled? I have a lot of outstanding patches and doing all the test-patch turnaround manually on 3 different boxes is a real headache. 
Thanks -Todd On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley nda...@mac.com wrote: Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30 Patch Available HDFS issues. Nige On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote: I committed HDFS-1511 this morning. We should be good to go. I can haz snooty robot butler? On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik c...@apache.org wrote: Thanks Jacob. I am wasted already but I can do it on Sun, I think, unless it is done earlier. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 19:41, Jakob Homan jgho...@gmail.com wrote: Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to whip one up tonight. On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley nda...@mac.com wrote: I agree with Cos on fixing HDFS-1511 first. Once that is done I'll enable hdfs patch testing. Cheers, Nige Sent from my iPhone4 On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik c...@apache.org wrote: One more issue needs to be addressed before test-patch is turned on for HDFS is https://issues.apache.org/jira/browse/HDFS-1511 -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik c...@apache.org wrote: Considering that because of these 4 faulty cases every patch will be -1'ed a patch author will still have to look at it and make a comment why this particular -1 isn't valid. Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel like there's a better way. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 15:55, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to.
On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur dhr...@gmail.com wrote: +1, thanks for doing this. On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan jgho...@gmail.com wrote: So, with test-patch updated to show the failing tests, saving the developers the need to go and verify that the failed tests are all known, how do people feel about turning on test-patch again for HDFS and mapred? I think it'll help prevent any more tests from entering the yeah, we know category. Thanks, jg On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan jho...@yahoo-inc.com wrote: True, each patch would get a -1 and the failing tests would need to be verified as those known bad (BTW, it would be great if Hudson could list which tests failed in the message it posts to JIRA). But that's still quite a bit less error-prone
Re: Patch testing
On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley nda...@mac.com wrote: Todd, would love to get https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since this is failing every night on trunk. What if we disable that test, move that issue to 0.22 blocker, and then enable the test-patch? I'll also look into that one today, but if it's something that will take a while to fix, I don't think we should hold off the useful testing for all the other patches. -Todd On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote: Hi Nigel, MAPREDUCE-2172 has been fixed for a while. Are there any other particular JIRAs you think need to be fixed before the MR test-patch queue gets enabled? I have a lot of outstanding patches and doing all the test-patch turnaround manually on 3 different boxes is a real headache. Thanks -Todd On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley nda...@mac.com wrote: Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30 Patch Available HDFS issues. Nige On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote: I committed HDFS-1511 this morning. We should be good to go. I can haz snooty robot butler? On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik c...@apache.org wrote: Thanks Jacob. I am wasted already but I can do it on Sun, I think, unless it is done earlier. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 19:41, Jakob Homan jgho...@gmail.com wrote: Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to whip one up tonight. On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley nda...@mac.com wrote: I agree with Cos on fixing HDFS-1511 first. Once that is done I'll enable hdfs patch testing. 
Cheers, Nige Sent from my iPhone4 On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik c...@apache.org wrote: One more issue needs to be addressed before test-patch is turned on for HDFS is https://issues.apache.org/jira/browse/HDFS-1511 -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik c...@apache.org wrote: Considering that because of these 4 faulty cases every patch will be -1'ed a patch author will still have to look at it and make a comment why this particular -1 isn't valid. Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel like there's a better way. -- Take care, Konstantin (Cos) Boudnik On Fri, Dec 17, 2010 at 15:55, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to. On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur dhr...@gmail.com wrote: +1, thanks for doing this. On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan jgho...@gmail.com wrote: So, with test-patch updated to show the failing tests, saving the developers the need to go and verify that the failed tests are all known, how do people feel about turning on test-patch again for HDFS and mapred? I think it'll help prevent any more tests from entering the "yeah, we know" category. Thanks, jg On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan jho...@yahoo-inc.com wrote: True, each patch would get a -1 and the failing tests would need to be verified as those known bad (BTW, it would be great if Hudson could list which tests failed in the message it posts to JIRA). But that's still quite a bit less error-prone work than if the developer runs the tests and test-patch themselves.
Also, with 22 being cut, there are a lot of patches up in the air and several developers are juggling multiple patches. The more automation we can have, even if it's not perfect, will decrease errors we may make. -jg Nigel Daley wrote: On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote: It's also ready to run on MapReduce and HDFS but we won't turn it on until these projects build and test cleanly. Looks like both these projects currently have test failures. Assuming the projects are compiling and building, is there a reason to not turn it on despite the test failures? Hudson is invaluable to developers who then don't have to run the tests and test-patch themselves. We didn't turn Hudson off when it was working previously and there were known failures. I think one of the reasons we have more failing tests now is the higher cost of doing Hudson's work (not a great excuse I know). This is particularly true now because several of the failing tests involve tests timing out, making the whole testing regime even longer. Every single patch would get a -1
Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?
On Thu, Dec 23, 2010 at 10:15 AM, M. C. Srivas mcsri...@gmail.com wrote: Regardless, there will still be 2 incompatible branches. And that is only the beginning. Some future features will be done only on branch 1 (since company 1 uses that), and other features on branch 2 (by company 2, since they prefer branch 2), thereby further separating the two branches. If the goal is to avoid the split, then there are only 2 choices: (a) merge both (b) abandon one or the other. The 0.20 append solution has never been seen as a fork. It's a stop-gap fixup of the 0.20 append feature, but we don't intend to forward-port that append implementation into trunk. From an API perspective it's very close to the 0.22 version, and I think everyone fully intends to abandon the 0.20-append work once 0.22 append has been heavily tested for HBase workloads. The Promised Land that we say we're all trying to get to is regular, timely, feature-complete, tested, innovative but stable releases of new versions of Apache Hadoop. Missing any one of those criteria will continue (and has continued) the current situation where quasi-official branches and outside distributions fill the void such a release should fill. The effort to maintain this official branch and fix the bugs that will be discovered could be better spent moving us closer to that goal. +1. Interestingly, the work on 0.20-append uncovered a number of bugs that also will apply to 0.22's implementation. So it wasn't all a wasted effort ;-) -- Todd Lipcon Software Engineer, Cloudera
Re: namenode doesn't start after reboot
On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle bjo...@schiessle.org wrote: 1. I have set up a second dfs.name.dir which is stored at another computer (mounted by sshfs) I would strongly discourage the use of sshfs for the name dir. For one, it's slow, and for two, I've seen it have some really weird semantics where it's doing write-back caching. Just take a look at its manpage and you should get scared about using it for a critical mount point like this. A soft interruptible NFS mount is a much safer bet. -Todd -- Todd Lipcon Software Engineer, Cloudera
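As a concrete sketch of the setup Todd recommends above (hostnames, paths, and mount options here are illustrative, not from the thread), the second metadata directory would sit on a soft, interruptible NFS mount:

```
# /etc/fstab: mount the remote copy soft and interruptible, so a dead
# NFS server fails I/O after a few retries instead of hanging the NameNode
nnbackup:/export/nn  /mnt/nnbackup  nfs  soft,intr,timeo=30,retrans=3  0  0
```

```xml
<!-- hdfs-site.xml: one local copy of the NN metadata plus one on the NFS mount -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nnbackup/dfs/nn</value>
</property>
```

The soft,intr options are the point of the advice: they bound how long a dead remote mount can stall the NameNode, which sshfs's caching and hang behavior does not.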
Re: namenode doesn't start after reboot
On Thu, Dec 23, 2010 at 12:47 PM, Jakob Homan jgho...@gmail.com wrote: Please move discussions of CDH issues to Cloudera's lists. Thanks. Hi Jakob, These bugs are clearly not CDH-specific. NameNode corruption bugs, and best practices with regard to the storage of NN metadata, are clearly applicable to any version of Hadoop that users may run, be it Apache, Yahoo, Facebook, 0.20, 0.21, or trunk. If you have reason to believe my suggestion you quoted below is somehow not relevant to the larger community I would love to hear it. My understanding of the ASF goals is that we should encourage a cohesive community. Asking users of CDH to move general Hadoop questions off of ASF mailing lists just because of their choice in distros encourages a fractured community rather than a cohesive one. Clearly, if a user has a question specifically about Cloudera packaging they should be directed to the CDH lists so as not to clutter non-CDH users' inboxes with irrelevant questions. I think if you browse the archives you'll find that Cloudera employees have been consistent about doing this since we started the cdh-user list several months ago. But if an issue is a bug that is likely to occur in trunk, it makes sense to me to leave it on the list associated with the core project. Personally I do my best to answer questions on the ASF lists regardless of which distro the person is using - though our distros have some divergence in backported patch sets, it's rare that a bug found in one distro doesn't also let us fix a bug in trunk. I can readily pull up several recent examples of this, and I'm surprised that there isn't more concern in the general community about bugs that may result in NN metadata corruption. Thanks, -Todd On Thu, Dec 23, 2010 at 12:02 PM, Todd Lipcon t...@cloudera.com wrote: On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle bjo...@schiessle.org wrote: 1.
I have set up a second dfs.name.dir which is stored at another computer (mounted by sshfs) I would strongly discourage the use of sshfs for the name dir. For one, it's slow, and for two, I've seen it have some really weird semantics where it's doing write-back caching. Just take a look at its manpage and you should get scared about using it for a critical mount point like this. A soft interruptible NFS mount is a much safer bet. -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Patch testing
On Fri, Dec 17, 2010 at 3:55 PM, Jakob Homan jgho...@gmail.com wrote: If HDFS is added to the test-patch queue right now we get nothing but dozens of -1'ed patches. There aren't dozens of patches being submitted currently. The -1 isn't the important thing, it's the grunt work of actually running (and waiting) for the tests, test-patch, etc. that Hudson does so that the developer doesn't have to. I agree with Jakob. I've had to run and re-run the test-patch and unit tests probably 30 times over the last two weeks, and it takes a lot of effort, since my own infrastructure for doing this is a bit messy. I'd much rather just reply to the Hudson comments saying these are known issues than have to run the tests, check the results, copy and paste them and *then* say these are known issues anyway! On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur dhr...@gmail.com wrote: +1, thanks for doing this. On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan jgho...@gmail.com wrote: So, with test-patch updated to show the failing tests, saving the developers the need to go and verify that the failed tests are all known, how do people feel about turning on test-patch again for HDFS and mapred? I think it'll help prevent any more tests from entering the yeah, we know category. Thanks, jg On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan jho...@yahoo-inc.com wrote: True, each patch would get a -1 and the failing tests would need to be verified as those known bad (BTW, it would be great if Hudson could list which tests failed in the message it posts to JIRA). But that's still quite a bit less error-prone work than if the developer runs the tests and test-patch themselves. Also, with 22 being cut, there are a lot of patches up in the air and several developers are juggling multiple patches. The more automation we can have, even if it's not perfect, will decrease errors we may make. 
-jg Nigel Daley wrote: On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote: It's also ready to run on MapReduce and HDFS but we won't turn it on until these projects build and test cleanly. Looks like both these projects currently have test failures. Assuming the projects are compiling and building, is there a reason to not turn it on despite the test failures? Hudson is invaluable to developers who then don't have to run the tests and test-patch themselves. We didn't turn Hudson off when it was working previously and there were known failures. I think one of the reasons we have more failing tests now is the higher cost of doing Hudson's work (not a great excuse, I know). This is particularly true now because several of the failing tests involve tests timing out, making the whole testing regime even longer. Every single patch would get a -1 and need investigation. Currently, that would be about 83 investigations between the MR and HDFS issues that are in the patch-available state. Shouldn't we focus on getting these tests fixed or removed? Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as well) before I turn this on. Cheers, Nige -- Connect to me at http://www.facebook.com/dhruba -- Todd Lipcon Software Engineer, Cloudera
hadoop.job.ugi backwards compatibility
Hi all, I wanted to start a (hopefully short) discussion around the treatment of the hadoop.job.ugi configuration in Hadoop 0.22 and beyond (as well as the secure 0.20 branch). In the current security implementation, the following incompatible changes have been made even for users who are sticking with simple security. 1) Groups resolution happens on the server side, where it used to happen on the client. Thus, all Hadoop users must exist on the NN/JT machines in order for group mapping to succeed (or the user must write a custom group mapper). 2) The hadoop.job.ugi parameter is ignored - instead the user has to use the new UGI.createRemoteUser(foo).doAs() API, even in simple security. I'm curious whether the general user community feels these are acceptable breaking changes. The potential solutions I can see are: For 1) Add a configuration like hadoop.security.simple.groupmappinglocation - client or server. If it's set to client, the group mapping would continue to happen as it does in prior versions on the client side. For 2) If security is simple, we can have the FileSystem and JobClient constructors check for this parameter. If it's set, and there is no Subject object associated with the current AccessControlContext, wrap the creation of the RPC proxy with the correct doAs() call. Although security is obviously an absolute necessity for many organizations, I know of a lot of people who have small clusters and small teams who don't have any plans to deploy it. For these people, I imagine the above backward-compatibility layer may be very helpful as they adopt the next releases of Hadoop. If we don't want to support these options going forward, we can of course emit deprecation warnings when they are in effect and remove the compatibility layer in the next major release. Any thoughts here? Do people often make use of the hadoop.job.ugi variable to such an extent that this breaking change would block your organization from upgrading? 
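For reference, the doAs() pattern described above looks roughly like this. This is a sketch rather than a runnable program (it needs the Hadoop jars of that era on the classpath), and the user name "foo" is a placeholder:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiExample {
  public static void main(String[] args) throws Exception {
    // Replaces setting hadoop.job.ugi in the Configuration: create a UGI
    // for the desired user explicitly...
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("foo");
    // ...and perform all filesystem/JobClient operations inside doAs(),
    // so the RPC proxies are created as that user.
    FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      public FileSystem run() throws Exception {
        return FileSystem.get(new Configuration());
      }
    });
  }
}
```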
Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Why single thread for HDFS?
On Mon, Jul 5, 2010 at 5:08 AM, elton sky eltonsky9...@gmail.com wrote: Segel, Jay Thanks for the reply! Your parallelism comes from multiple tasks running on different nodes within the cloud. By default you get one map/reduce job per block. You can write your own splitter to increase this and then get more parallelism. sounds like an elegant solution. We can modify 'distcp', using a simple MR job, to make it block-based rather than file-based. There's actually an open ticket somewhere to make distcp do this using the new concat() API in the NameNode. concat() allows several files to be combined into one file at the metadata level, so long as a number of restrictions are met. The work hasn't been done yet, but the concat() call is there and waiting for a user. -Todd in practice, you very rarely know how big your output is going to be before it's produced, so this doesn't really work I think you got the point of why Yahoo made this design decision. Multithreading is only applicable when you know the size of the file, as when copying existing files, so you can split them and feed the pieces to different threads. On Sat, Jul 3, 2010 at 1:24 AM, Jay Booth jaybo...@gmail.com wrote: Yeah, a good way to think of it is that parallelism is achieved at the application level. On the input side, you can process multiple files in parallel, or one file in parallel by logically splitting it and opening multiple readers of the same file at multiple points. Each of these readers is single-threaded, because, well, you're returning a stream of bytes in order. It's inherently serial. On the reduce side, multiple reduces run, writing to multiple files in the same directory. Again, you can't really write to a single file in parallel effectively -- you can't write byte 26 before byte 25, because the file's not that long yet.
Theoretically, maybe you could have all reduces write to the same file by allocating some amount of space ahead of time and writing to the blocks in parallel - in practice, you very rarely know how big your output is going to be before it's produced, so this doesn't really work. Multiple files in the same directory achieves the same goal much more elegantly, without exposing a bunch of internal details of the filesystem to user space. Does that make sense? On Fri, Jul 2, 2010 at 9:26 AM, Segel, Mike mse...@navteq.com wrote: Actually they also listen here, and this is a basic question... I'm not an expert, but how does having multiple threads really help this problem? I'm assuming you're talking about a map/reduce job and not some specific client code which is being run on a client outside of the cloud/cluster. I wasn't aware that you could easily synchronize threads running on different JVMs. ;-) Your parallelism comes from multiple tasks running on different nodes within the cloud. By default you get one map/reduce job per block. You can write your own splitter to increase this and then get more parallelism. HTH -Mike -Original Message- From: Hemanth Yamijala [mailto:yhema...@gmail.com] Sent: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi, Can you please post this on hdfs-...@hadoop.apache.org? I suspect the most qualified people to answer this question would all be on that list. Hemanth On Fri, Jul 2, 2010 at 11:43 AM, elton sky eltonsky9...@gmail.com wrote: I guess this question was ignored, so I just post it again. From my understanding, HDFS uses a single thread to do reads and writes. Since a file is composed of many blocks and each block is stored as a file in the underlying FS, we can do some parallelism on a per-block basis. When reading across multiple blocks, threads can be used to read all blocks. When writing, we can calculate the offset of each block and write to all of them simultaneously. Is this right?
-- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop versions distributions
On Mon, Jul 5, 2010 at 1:12 AM, Evert Lammerts evert.lamme...@sara.nl wrote: There are a number of different versions and distributions of Hadoop which, as far as I understand, all differ from each other. I know that in the 0.20-append branch, files in HDFS can be appended, and that the Y! distribution (0.20.S) implements security features through Kerberos. And then there are the 0.20.3 and 0.22.0 branches. And trunk of course, which I guess is 0.20.2 nowadays? In addition to that there are distributions by Cloudera (CDH2/CDH3 beta) and IBM (IDAH). From my perspective, setting up a pilot cluster for a small number of users from different institutes, security (0.20.S) is very attractive – scientists like the idea of shielding their data and logic from other users. But what will I miss if I choose Y!'s distribution over all of these other options? Hi Evert, Y!'s distribution does contain a good set of patches, and we at Cloudera are always keeping track of the ydist git repository to incorporate those changes into CDH. Currently, ydist contains the security patch series, but doesn't include the recent append work. CDH3b2 includes the append work, but not security as of yet -- we are currently integrating security and it should be available in the next beta. Aside from the specific patches included, it's worth noting that the Y! dist is a git repository, rather than a full binary-and-source distribution of Hadoop and related tools. CDH includes not just the core Hadoop components but also integrates many other important ecosystem components including Pig, Hive, Oozie, HBase, ZooKeeper, Flume, etc. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Can we modify files in HDFS?
Hi Elton, Typically, large data sets are of the sort that continuously grow, and are not edited or amended. For example, a common Hadoop use case is the analysis of log data or other instrumentation from web or application servers. In these cases, files are simply added, but there is no need to go back and change entries. For the ability to have more table-like random-access storage on top of Hadoop, I would encourage you to look into HBase. It supports random read/write access with low latency. -Todd On Mon, Jun 28, 2010 at 9:48 PM, elton sky eltonsky9...@gmail.com wrote: thanks Jeff, So... it is a significant drawback. As a matter of fact, there are many cases where we need to modify files. I don't understand why Yahoo didn't provide that functionality. And as far as I know, no one else is working on this. Why is that? -- Todd Lipcon Software Engineer, Cloudera
Re: datanode goes down, maybe due to Unexpected problem in creating temporary file
Have you disabled the statechange log on the NN? This block has to be in there. Also, are you by any chance running with append enabled on unpatched 0.20? -Todd On Mon, May 17, 2010 at 12:40 PM, Ted Yu yuzhih...@gmail.com wrote: That blk doesn't appear in NameNode log. For datanode, 2010-05-15 00:09:31,023 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_926027507678171558_3620 src: /10.32.56.170:49172 dest: / 10.32.56.171:50010 2010-05-15 00:09:31,024 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_926027507678171558_3620 received exception java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 2010-05-15 00:09:31,024 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-5814095875968936685_2910 received exception java.io.IOException: Unexpected problem in creating temporary file for blk_-5814095875968936685_2910. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_-5814095875968936685 should not be present, but is. 2010-05-15 00:09:31,025 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.32.56.171:50010, storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398) at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376) at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133) at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:98) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2010-05-15 00:09:31,025 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.32.56.171:50010, storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075, ipcPort=50020):DataXceiver 2010-05-15 00:19:28,334 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_926027507678171558_3620 src: /10.32.56.170:36887 dest: / 10.32.56.171:50010 2010-05-15 00:19:28,334 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_926027507678171558_3620 received exception java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 2010-05-15 00:19:28,334 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.32.56.171:50010, storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Unexpected problem in creating temporary file for blk_926027507678171558_3620. File /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558 should not be present, but is. 
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398) at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376) at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133) at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:98) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2010-05-15 00:29:25,635 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_926027507678171558_3620 src: /10.32.56.170:34823 dest: / 10.32.56.171:50010 On Mon, May 17, 2010 at 11:43 AM, Todd Lipcon t...@cloudera.com wrote: Hi Ted, Can you please grep your NN and DN logs for blk_926027507678171558 and pastebin the results? -Todd On Mon, May 17, 2010 at 9:57 AM, Ted Yu yuzhih...@gmail.com wrote: Hi, We use CDH2 hadoop-0.20.2+228 which crashed on datanode smsrv10.ciq.com I found this in datanode log: 2010-05-15 07:37
Re: Hadoop support for hbase
On Sat, May 8, 2010 at 9:59 AM, Thomas Koch tho...@koch.ro wrote: I'm a little confused and concerned now that I learn that hbase uses a patched hadoop. For Debian I use plain hadoop under hbase and it seems to work in testing environments. - Are these patches necessary to run HBase? It will work unless you have failures, in which case it will lose edits. HBase relies on the hflush API (called sync in 0.20), which does not work properly in 0.20 without significant patching. Without this patch series, HBase will certainly run, but I could never recommend running it in a production environment where data loss is a show-stopper. - Where can I find these patches? Currently they're in various places on the JIRA - HDFS-200, HDFS-142, HDFS-826, HDFS-561, etc. I have a github branch up which contains them all applied, but I haven't tested it beyond unit tests - my testing is all happening in our CDH3 tree, and afaik Dhruba's testing is on their FB internal tree. - Why aren't these patches included in hadoop? Are they too unstable? Yes, the policy is not to make such significant changes in patch releases, so they would need to be voted into the 0.20 series. It's not that they're entirely unstable, it's just that the code is very tricky and still under development. The upcoming 0.21 release has a *different* implementation of append, which also hasn't been tested significantly in real-life failure scenarios, but it's important that we keep the stable release stable. - If they're unstable, does this mean HBase is unstable? Again, I would not say it's terribly unstable - but it's nowhere near the level of stability that Hadoop is at. Should I worry at all about these patches for the Debian packages? If you expect that people might actually want to run a production HBase, they should have the patches. If you expect people to just be playing around on single-node clusters where failures aren't an issue, best to skip them.
Of course, even for production usage, I wouldn't recommend running what we've got now - wait a month or two and it should be one more notch up the stability/testing scale. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop support for hbase
I have a few questions about this proposal: 1) Will we open new JIRAs separately for each change we want to commit, and go through the normal review process? Currently the 20-append work has been mostly going on under HDFS-142 for whatever reason, with ancillary issues only for bugs that also exist in trunk. 2) Do we plan to do a release off the branch, or is it meant only as a repository for sharing patches and a tree? 3) If we do a release, what version number would we give it and how would it be presented on the download/release pages? I'm certainly not against the idea, just would like to open discussion on the above points. The other alternative as I see it is to have those working on this branch do so somewhere like github - the advantages to that would be (a) it provides a more open way for non-committers to contribute, which is important since we're working closely with the HBase team on this, and (b) it doesn't add confusion to the main Hadoop jira and download pages. The disadvantage of course is that it fragments the code repository and we can't really do a release as easily. Thanks -Todd On Fri, May 7, 2010 at 10:34 AM, Dhruba Borthakur dhr...@gmail.com wrote: Hi folks, I would like to open a discussion on how we can make HBase work well with a supported/released version of Hadoop. HBase currently ships with a hadoop jar and that hadoop jar is from hadoop 0.20 + a set of ten/twenty patches. Most of these patches are focussed on HDFS append support in hadoop 0.20. These cannot be ported back to the 0.20 branch without affecting stability of the hadoop 0.20 branch. On the other hand, it is premature for hbase deployments to use hadoop 0.21 because hadoop 0.21 is still under testing and will take some time to stabilize. My proposal is to create a new branch off the hadoop 0.20 branch and name it branch-0.20-hbase. It will have support for append/sync and will be API compatible with the hadoop 0.20 branch. 
However, this branch will be marked experimental and API compatibility is subject to change. This branch will contain all of hdfs/mapreduce/core. If the community likes this idea, I will volunteer myself to be the release manager for this new branch and will propose a formal vote. comments/feedback/questions are most welcome. dhruba -- Connect to me at http://www.facebook.com/dhruba -- Todd Lipcon Software Engineer, Cloudera