Re: reviewboard vs. gerrit
Hi Thomas, we (apache project) don't control the infrastructure supported by Apache, we are merely users. Support for git is a long running discussion at Apache, I suspect you should direct these types of issues to infrastruct...@. Also, I did a quick search on the INFRA jira (you might look at the GIT component there) and found the following already in place: https://issues.apache.org/jira/browse/INFRA-2205 Regards, Patrick
On Wed, Jan 12, 2011 at 8:23 AM, Thomas Koch tho...@koch.ro wrote:
Hi, I'm currently proposing gerrit as a tool to host GIT repositories at Apache. For evaluation I installed a gerrit instance at http://koch.ro:8080
Gerrit not only does the same things as reviewboard (AFAIK), but also
- directly commits to the official GIT repo when review is successful
- manages commit and review permissions
- lets you directly git push to itself, so no more git diff > PATCH and upload
But I haven't used gerrit in practice so far. It's only my impression that it would much better serve the needs of the ASF. Gerrit is developed by google for the android project and also used by the eclipse foundation. Both projects have similar legal and workflow requirements as the ASF.
Best regards, Thomas Koch, http://www.koch.ro
Re: Issues with attachments to wiki.
Ok, great. How are we doing with moving over the wiki? Anyone been working on that? https://issues.apache.org/jira/browse/ZOOKEEPER-940 btw, Nigel pinged me last night, he wanted to remove the ZK tab and make some other changes on the Hadoop site to reflect the fact that ZK is now a TLP. I asked him to hold off for a bit while we finalize the site/wiki but they'd like to make the changes soon. What's the status on the web site? (ben?) I clicked on some links and notice there are still some 404 results (right side nav for example). Any chance we can finalize this soon? The board meeting is on the 19th, too late for that I'm guessing but I'd really like to have 940 finalized before the next one. Regards, Patrick On Wed, Jan 12, 2011 at 9:02 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Yup. We already did. It works on the new cwiki. Thanks mahadev On 1/12/11 9:00 AM, Patrick Hunt ph...@apache.org wrote: Could you give this a try on the new cwiki?
Re: bylaws proposal
Hi Ben, thanks for getting this rolling. Your committer suggestion sounds fine to me. WRT to pre announcing, we are already giving multiple days for a vote, also in the lead up to a release it should be pretty obvious that one is imminent (we usually send out status updates and such, plus the activity is pretty clear on the lists, jira, svn, etc...). Adding more pre announce time will stretch out the timeframes (and process) even longer, I'd rather we keep it as is unless ppl think this is a big issue. Would you mind creating a proposal page on the cwiki similar to what Pig did? http://markmail.org/message/lvchbhoojpbwuxyx Regards, Patrick On Tue, Jan 4, 2011 at 9:17 AM, Benjamin Reed br...@yahoo-inc.com wrote: I really like the Pig bylaws. I would suggest using it as a starting point for ZooKeeper. One thing I would like to modify is the Committer section. Pig's bylaws state that the committer becomes emeritus if they haven't contributed in any form for 6 months. I would tighten that up and say if they haven't been actively involved in reviewing or committing. (They are after all committers.) I have made this change to the text. The other change that I would like, and did not add, is for some of the votes to have a requirement to pre-announce. Specifically for PMC, committer, and release it would be nice to give a week or two notice that a vote will be coming up, just so that interested parties don't miss it. ben here is the text: Introduction This document defines the bylaws under which the Apache ZooKeeper project operates. It defines the roles and responsibilities of the project, who may vote, how voting works, how conflicts are resolved, etc. ZooKeeper is a project of the Apache Software Foundation The foundation holds the copyright on Apache code including the code in the ZooKeeper codebase. The foundation FAQ explains the operation and background of the foundation. 
ZooKeeper is typical of Apache projects in that it operates under a set of principles, known collectively as the Apache Way. If you are new to Apache development, please refer to the Incubator project for more information on how Apache projects operate. Roles and Responsibilities Apache projects define a set of roles with associated rights and responsibilities. These roles govern what tasks an individual may perform within the project. The roles are defined in the following sections. Users The most important participants in the project are people who use our software. The majority of our contributors start out as users and guide their development efforts from the user's perspective. Users contribute to the Apache projects by providing feedback to contributors in the form of bug reports and feature suggestions. As well, users participate in the Apache community by helping other users on mailing lists and user support forums. Contributors All of the volunteers who are contributing time, code, documentation, or resources to the ZooKeeper Project. A contributor that makes sustained, welcome contributions to the project may be invited to become a committer, though the exact timing of such invitations depends on many factors. Committers The project's committers are responsible for the project's technical management. Committers have access to a specified set of subproject's subversion repositories. Committers on subprojects may cast binding votes on any technical discussion regarding that subproject. Committer access is by invitation only and must be approved by lazy consensus of the active PMC members. A Committer is considered emeritus by his or her own declaration or by not reviewing patches or committing patches to the project for over six months. An emeritus committer may request reinstatement of commit access from the PMC which must be approved by lazy consensus of the active PMC members. 
Commit access can be revoked by a unanimous vote of all the active PMC members (except the committer in question if he or she is also a PMC member). All Apache committers are required to have a signed Contributor License Agreement (CLA) on file with the Apache Software Foundation. There is a Committer FAQ which provides more details on the requirements for committers. A committer who makes a sustained contribution to the project may be invited to become a member of the PMC. The form of contribution is not limited to code. It can also include code review, helping out users on the mailing lists, documentation, etc. Project Management Committee The PMC is responsible to the board and the ASF for the management and oversight of the Apache ZooKeeper codebase. The responsibilities of the PMC include Deciding what is distributed as products of the Apache ZooKeeper project. In particular all releases must be approved by the PMC. Maintaining the project's shared resources, including the codebase repository, mailing lists, websites. Speaking on behalf of the project.
Re: Discussion - Clusterlib as a subproject for ZooKeeper
On Wed, Jan 12, 2011 at 10:18 AM, Avery Ching ach...@yahoo-inc.com wrote: Thanks for the suggestions on http://incubator.apache.org/ The reason why we thought it would be best as ZooKeeper subproject was because it is heavily dependent on ZooKeeper. Subproj is fine if that's the way you want to go, just highlighting these other possibilities. As for libmicrohttpd's LGPL, sorry if it wasn't more clear in the README, but we only link to it, we do not include the source code for libmicrohttpd. libmicrohttpd is only required if you want to build the Clusterlib http server. Seems to me though that the UI is pretty useful, would be a good idea to move to a category A license soonish. I thought the fact that you detailed the license situation was great, very helpful. Might be good to break down into sections; core, UI, ... and be more explicit. You should also take a look at Apache RAT (release audit tool), it can scan your code for conformance to apache license guidelines, and look for prohibited licenses, etc... http://incubator.apache.org/rat/ Patrick Avery On Jan 12, 2011, at 8:53 AM, Patrick Hunt wrote: Hi Avery, clusterlib looks like some great functionality, I don't see why we couldn't include it as a subproject (see one caveat I noticed below). I'd also like to point out that incubator is also a great option for the project. http://incubator.apache.org/ , have you considered that? According to the readme on GH a dependency exists on libmicrohttpd which is LGPL licensed. Unfortunately we (apache projects) cannot include LGPL licensed code, see category X here http://www.apache.org/legal/3party.html This dependency would have to be removed prior to adding the subproject. Regards, Patrick On Tue, Jan 11, 2011 at 5:34 PM, Avery Ching ach...@yahoo-inc.com wrote: Sorry for the delay (meetings). I just threw it up on GitHub. https://github.com/aching/Clusterlib Enjoy! Avery On Jan 11, 2011, at 3:42 PM, Fournier, Camille F. 
[Tech] wrote: Is the code somewhere we can look at it right now? C -Original Message- From: Avery Ching [mailto:ach...@yahoo-inc.com] Sent: Tuesday, January 11, 2011 2:02 PM To: dev@zookeeper.apache.org Subject: Discussion - Clusterlib as a subproject for ZooKeeper Hello, We have been working on Clusterlib at Yahoo! and would like to contribute it as a subproject to ZooKeeper. Clusterlib was developed as a next-generation platform for creating/coordinating search applications/services (including crawling, processing, indexing, and front end) at Yahoo!. We suspect much of this work will be useful for others trying to build up large-scale/distributed applications that would like to coordinate and share the same semantics. Here is a (relatively) short summary of why Clusterlib was developed: Large-scale distributed applications are difficult and time-consuming to develop since a great deal of effort is spent solving the same challenges (consistency, fault-tolerance, naming problems, etc.). Additionally, coordinating these applications is typically ad-hoc and hard to maintain. Clusterlib fills the gap by providing distributed application developers with an object-oriented data model, asynchronous event handling system, well-defined consistency semantics, and methods for making coordination easy across cooperating applications. Some example applications might include a search engine, scalable file system, large-scale data cache, etc. Clusterlib is a middleware library for building distributed applications. It was designed to simplify the job of application developers and provides a set of distributed objects that all inherit from the same Notifyable interface. The set of distributed objects includes: Root, Application, Group, DataDistribution, Node, ProcessSlot, PropertyList, and Queue. In order to give context, each object is described briefly. * Root is a point-of-entry object at the top of the hierarchy in Clusterlib and manages its Applications. 
There is only one Root per Clusterlib instance. * Applications are used as a namespace for managing Groups, Nodes, DataDistributions, Queues, and PropertyLists in a user-defined application. Using the application concept (as opposed to only having groups) makes accessing another Application's child objects explicit to developers. * Groups are a logical association of Clusterlib objects that can be nested. Since large-scale applications often require hundreds or thousands of nodes to operate, there might be a node Group that has an alive child Group and a dead child Group that are each populated with their respective sets of nodes. * DataDistributions balance load and data across a set of objects. DataDistributions provide user-extensible key hashing to variable-sized hash ranges for user flexibility. * Nodes typically represent a physical or virtual node in an application. It has child ProcessSlots that can be used to reserve
Re: bylaws proposal
Why don't we see if it becomes a problem and then add it to our process rather than codifying in the bylaws. That work? Patrick On Wed, Jan 12, 2011 at 3:36 PM, Benjamin Reed br...@yahoo-inc.com wrote: i've added it to the cwiki: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperBylawsProposal i don't mean the pre-announce time to stretch things out. it's really to make sure the status updates go out. as we get more committers we need to make sure that there is a heads up that something is coming so that if someone will be on vacation or something like that we can take it into account. the idea is that we would say: hey next month we will be having a vote that way interested parties can watch out for it rather than going on vacation, coming back, and finding out that a blocker was overridden and a flawed release has gone out. ben On 01/12/2011 09:21 AM, Patrick Hunt wrote: Hi Ben, thanks for getting this rolling. Your committer suggestion sounds fine to me. WRT to pre announcing, we are already giving multiple days for a vote, also in the lead up to a release it should be pretty obvious that one is imminent (we usually send out status updates and such, plus the activity is pretty clear on the lists, jira, svn, etc...). Adding more pre announce time will stretch out the timeframes (and process) even longer, I'd rather we keep it as is unless ppl think this is a big issue. Would you mind creating a proposal page on the cwiki similar to what Pig did? http://markmail.org/message/lvchbhoojpbwuxyx Regards, Patrick On Tue, Jan 4, 2011 at 9:17 AM, Benjamin Reed br...@yahoo-inc.com wrote: I really like the Pig bylaws. I would suggest using it as a starting point for ZooKeeper. One thing I would like to modify is the Committer section. Pig's bylaws state that the committer becomes emeritus if they haven't contributed in any form for 6 months. I would tighten that up and say if they haven't been actively involved in reviewing or committing. 
(They are after all committers.) I have made this change to the text. The other change that I would like, and did not add, is for some of the votes to have a requirement to pre-announce. Specifically for PMC, committer, and release it would be nice to give a week or two notice that a vote will be coming up, just so that interested parties don't miss it. ben here is the text: Introduction This document defines the bylaws under which the Apache ZooKeeper project operates. It defines the roles and responsibilities of the project, who may vote, how voting works, how conflicts are resolved, etc. ZooKeeper is a project of the Apache Software Foundation The foundation holds the copyright on Apache code including the code in the ZooKeeper codebase. The foundation FAQ explains the operation and background of the foundation. ZooKeeper is typical of Apache projects in that it operates under a set of principles, known collectively as the Apache Way. If you are new to Apache development, please refer to the Incubator project for more information on how Apache projects operate. Roles and Responsibilities Apache projects define a set of roles with associated rights and responsibilities. These roles govern what tasks an individual may perform within the project. The roles are defined in the following sections. Users The most important participants in the project are people who use our software. The majority of our contributors start out as users and guide their development efforts from the user's perspective. Users contribute to the Apache projects by providing feedback to contributors in the form of bug reports and feature suggestions. As well, users participate in the Apache community by helping other users on mailing lists and user support forums. Contributors All of the volunteers who are contributing time, code, documentation, or resources to the ZooKeeper Project. 
A contributor that makes sustained, welcome contributions to the project may be invited to become a committer, though the exact timing of such invitations depends on many factors. Committers The project's committers are responsible for the project's technical management. Committers have access to a specified set of subproject's subversion repositories. Committers on subprojects may cast binding votes on any technical discussion regarding that subproject. Committer access is by invitation only and must be approved by lazy consensus of the active PMC members. A Committer is considered emeritus by his or her own declaration or by not reviewing patches or committing patches to the project for over six months. An emeritus committer may request reinstatement of commit access from the PMC which must be approved by lazy consensus of the active PMC members. Commit access can be revoked by a unanimous vote of all the active PMC members (except the committer in question if he or she is also a PMC member). All
Re: 3.3.3 release
Our typical process is to have a running fix release in parallel with the trunk. So 3.3.3 was created in jira after 3.3.2 went out, 3.4.0 is trunk. This catches any new issues that might need a fix release (3.3.3). We triage the issues marked for the fix release (3.3.3) and also apply those changes to the trunk. Only blocker bugs will hold up the fix release once we get to a point where people think we should release it (say a blocker gets fixed). It sounds like we're at that point here, where a 3.3.3 release makes sense, given 962 is fixed. Looking at 3.3.3 on jira there are currently 2 open blockers, once these are addressed (fixed or reprioritized) we could spin a release. Any volunteers to be the release manager for 3.3.3? Patrick On Tue, Jan 25, 2011 at 11:27 AM, Benjamin Reed br...@yahoo-inc.com wrote: the 962 bug we fixed was pretty severe. i would like to get a release out asap. i was looking over the bugs tagged for 3.3.3 and almost all of them look like they should really be for 3.4. in my opinion only severe bugs should be back ported to previous releases, and most of the bugs marked 3.3 do not meet that criteria. back porting patches creates a burden on developers and committers and the back ports are also not tested by qa. i think we should avoid them. ben
Re: pushing the 3.3.3 bugs
On Wed, Jan 26, 2011 at 10:27 AM, Benjamin Reed br...@yahoo-inc.com wrote: i would really like to get 3.3.3 out because of the fixes that just went in. there are quite a few bugs that are marked for 3.3.3, but i think they can all be pushed to 3.4.0. Any non-blocker jiras are always pushed to the next release if they are not resolved for the current release. What you are describing is how we always do things. Blockers always hold up the release unless specifically addressed otherwise (changing the severity or agreeing to push it to a subsequent release). i would really like to push everything to 3.4.0 and then work on getting the 3.4.0 release out. we haven't done a release from trunk in a while, but that is the only code that gets tested by hadoopqa. i think it is a bad idea to be releasing from branches that are not regularly tested. That sounds fine. However we always maintain a fix branch in parallel in case something critical (blocker) comes up before the trunk is ready. going forward doesn't it seem like a better idea to only do a release from just a branch if it is something that pops up quickly right after a release. otherwise, we should be releasing from trunk and possibly doing a simultaneous release from a branch. In theory that's how it should work. However the trunk release has gotten stalled. One way to un-stall things is for someone to step up and agree to be release manager for 3.3.3 and 3.4.0 releases (doesn't have to be the same person). That person is responsible for determining when the release is ready, what's in the release, etc... Patrick
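The triage rule described in this message (unresolved non-blocker issues roll to the next release, while open blockers hold the release until they are fixed, re-prioritized, or explicitly pushed) can be sketched roughly as below. This is only an illustration: the `Issue` record, its fields, and the issue keys are hypothetical and do not reflect the JIRA API or real ZooKeeper issues.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    key: str        # hypothetical issue key, not a real JIRA id
    priority: str   # e.g. "Blocker", "Major" (assumed labels)
    resolved: bool


def triage(issues, next_release="3.4.0"):
    """Split unresolved issues into release-holding blockers and pushed issues."""
    holding = [i.key for i in issues
               if not i.resolved and i.priority == "Blocker"]
    pushed = {i.key: next_release for i in issues
              if not i.resolved and i.priority != "Blocker"}
    return holding, pushed


def can_release(issues):
    """A release can be spun only when no open blocker remains."""
    holding, _ = triage(issues)
    return not holding
```

Changing an issue's priority away from "Blocker" in this sketch corresponds to the "changing the severity" escape hatch mentioned above; the decision itself remains a human one.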
Re: pushing the 3.3.3 bugs
On Wed, Jan 26, 2011 at 10:38 AM, Flavio Junqueira f...@yahoo-inc.comwrote: Ben, Your proposal in general sounds reasonable to me with the exception of do a release from just a branch if it is something that pops up quickly right after a release. I don't see a reason for binding it to time, and instead we could say that we will have a branch release if: 1- there is an important bug fix that needs to be released 2- we are not close to a trunk release One problem with our current model is that we create a release placeholder before we have a release manager for the release. What you are suggesting makes sense to me, but it introduces another problem. Today we create release placeholders as soon as we push out a release, we always have placeholders for the upcoming fix/trunk based releases. This gives us a place to hang JIRA issues off of, it allows us to triage new issues and slate them for a particular release. We could instead go to the model of having only trunk, no placeholder at all for the fix and next major/minor release (3.3.3/3.4.0 today). Then, at some point, a release manager could step up and volunteer to do a release, say 3.3.3, they would then be responsible for determining what's in the release. They would work with the community to do this, in the end they (the RM) are the arbiter for what's in/out of the release. We could try this and see how it works. It would allow for what Ben is suggesting. EOD though it requires someone to step up and take on the responsibility of being the RM. (hint hint :-) ) Patrick -Flavio On Jan 26, 2011, at 7:27 PM, Benjamin Reed wrote: i would really like to get 3.3.3 out because of the fixes that just went in. there are quite a few bugs that are marked for 3.3.3, but i think they can all be pushed to 3.4.0. i would really like to push everything to 3.4.0 and then work on getting the 3.4.0 release out. we haven't done a release from trunk in a while, but that is the only code that gets tested by hadoopqa. 
i think it is a bad idea to be releasing from branches that are not regularly tested. going forward doesn't it seem like a better idea to only do a release from just a branch if it is something that pops up quickly right after a release. otherwise, we should be releasing from trunk and possibly doing a simultaneous release from a branch. ben *flavio* *junqueira* research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
Re: pushing the 3.3.3 bugs
One other thing to keep in mind with this model. The RM is responsible for backporting (or working with the author to backport) any issues that go into a fix release. Today we require authors to provide patches for both the fix branch and the trunk (for fixes). If changes are committed to the trunk, and at some point a RM steps up to create a fix release, those changes need to be applied to the fix branch. Granted, this seems to fit well with Ben's original suggestion of limiting the number of fixes that go into fix releases. It's a step away from what our users have come to expect though - that we essentially maintain a fix release branch with most/all fixes, as well as a new feature development branch (trunk). Patrick On Wed, Jan 26, 2011 at 11:15 AM, Patrick Hunt ph...@apache.org wrote: On Wed, Jan 26, 2011 at 10:38 AM, Flavio Junqueira f...@yahoo-inc.comwrote: Ben, Your proposal in general sounds reasonable to me with the exception of do a release from just a branch if it is something that pops up quickly right after a release. I don't see a reason for binding it to time, and instead we could say that we will have a branch release if: 1- there is an important bug fix that needs to be released 2- we are not close to a trunk release One problem with our current model is that we create a release placeholder before we have a release manager for the release. What you are suggesting makes sense to me, but it introduces another problem. Today we create release placeholders as soon as we push out a release, we always have placeholders for the upcoming fix/trunk based releases. This gives us a place to hang JIRA issues off of, it allows us to triage new issues and slate them for a particular release. We could instead go to the model of having only trunk, no placeholder at all for the fix and next major/minor release (3.3.3/3.4.0 today). 
Then, at some point, a release manager could step up and volunteer to do a release, say 3.3.3, they would then be responsible for determining what's in the release. They would work with the community to do this, in the end they (the RM) are the arbiter for what's in/out of the release. We could try this and see how it works. It would allow for what Ben is suggesting. EOD though it requires someone to step up and take on the responsibility of being the RM. (hint hint :-) ) Patrick -Flavio On Jan 26, 2011, at 7:27 PM, Benjamin Reed wrote: i would really like to get 3.3.3 out because of the fixes that just went in. there are quite a few bugs that are marked for 3.3.3, but i think they can all be pushed to 3.4.0. i would really like to push everything to 3.4.0 and then work on getting the 3.4.0 release out. we haven't done a release from trunk in a while, but that is the only code that gets tested by hadoopqa. i think it is a bad idea to be releasing from branches that are not regularly tested. going forward doesn't it seem like a better idea to only do a release from just a branch if it is something that pops up quickly right after a release. otherwise, we should be releasing from trunk and possibly doing a simultaneous release from a branch. ben *flavio* *junqueira* research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300fax (408) 349 3301
Re: pushing the 3.3.3 bugs
On Wed, Jan 26, 2011 at 12:39 PM, Benjamin Reed br...@yahoo-inc.com wrote: this is an interesting read. i'm not a big fan of the even odd numbering scheme. i'm also unclear how it would work. for example, let's say i signed up to be the RM for 3.4.0. i branch, stabilize the code, and then do a release. would i also be in charge of 3.4.1? i would hope the answer would be yes. i think the RM should have some long term commitment until they decide to retire the 3.4 series. We wouldn't have to adopt everything, just the bits we like. re 3.4.0 vs 3.4.1, a RM is responsible for a single release. It would be up to that person to sign up for a subsequent release, or not. Someone else would have to pick up the ball in that case. Another good reason for having more committers. it would allow things to flow a bit better if the RM pulled patches from trunk rather than contributors having to work with two versions of code to do a patch. of course that puts more work on the RM. This is what you were referring to in your follow-up email on this thread, right? What you are saying here is what the RM would have to do. Pull patches from trunk to include in the fix release. In some cases though the RM would be on the hook to get the patch backported (or do the work themselves). Hopefully this person would work closely with the community to capture the patches they are interested in, but eod it's up to the RM what they will include. Of course the PMC could then reject the release etc... Patrick ben On 01/26/2011 11:30 AM, Patrick Hunt wrote: FYI, this is a _really_ good read, perhaps we should try something like this, at the very least we should document our approach: http://httpd.apache.org/dev/release.html Patrick On Wed, Jan 26, 2011 at 11:23 AM, Patrick Hunt ph...@apache.org wrote: One other thing to keep in mind with this model. The RM is responsible for backporting (or working with the author to backport) any issues that go into a fix release. 
Today we require authors to provide patches for both the fix branch and the trunk (for fixes). If changes are committed to the trunk, and at some point a RM steps up to create a fix release, those changes need to be applied to the fix branch. Granted, this seems to fit well with Ben's original suggestion of limiting the number of fixes that go into fix releases. It's a step away from what our users have come to expect though - that we essentially maintain a fix release branch with most/all fixes, as well as a new feature development branch (trunk). Patrick On Wed, Jan 26, 2011 at 11:15 AM, Patrick Huntph...@apache.org wrote: On Wed, Jan 26, 2011 at 10:38 AM, Flavio Junqueiraf...@yahoo-inc.comwrote: Ben, Your proposal in general sounds reasonable to me with the exception of do a release from just a branch if it is something that pops up quickly right after a release. I don't see a reason for binding it to time, and instead we could say that we will have a branch release if: 1- there is an important bug fix that needs to be released 2- we are not close to a trunk release One problem with our current model is that we create a release placeholder before we have a release manager for the release. What you are suggesting makes sense to me, but it introduces another problem. Today we create release placeholders as soon as we push out a release, we always have placeholders for the upcoming fix/trunk based releases. This gives us a place to hang JIRA issues off of, it allows us to triage new issues and slate them for a particular release. We could instead go to the model of having only trunk, no placeholder at all for the fix and next major/minor release (3.3.3/3.4.0 today). Then, at some point, a release manager could step up and volunteer to do a release, say 3.3.3, they would then be responsible for determining what's in the release. They would work with the community to do this, in the end they (the RM) are the arbiter for what's in/out of the release. 
We could try this and see how it works. It would allow for what Ben is suggesting. EOD though it requires someone to step up and take on the responsibility of being the RM. (hint hint :-) ) Patrick -Flavio On Jan 26, 2011, at 7:27 PM, Benjamin Reed wrote: i would really like to get 3.3.3 out because of the fixes that just went in. there are quite a few bugs that are marked for 3.3.3, but i think they can all be pushed to 3.4.0. i would really like to push everything to 3.4.0 and then work on getting the 3.4.0 release out. we haven't done a release from trunk in a while, but that is the only code that gets tested by hadoopqa. i think it is a bad idea to be releasing from branches that are not regularly tested. going forward doesn't it seem like a better idea to only do a release from just a branch if it is something that pops up quickly right after a release. otherwise, we should
[VOTE] Bylaws for the Apache ZooKeeper project
I propose that we adopt the bylaws proposed at https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperBylawsProposal as the bylaws for the Apache ZooKeeper project. In a self referential use of these bylaws I further propose that this vote will be open for 6 business days and require +1 votes from two thirds of PMC members (which works out to 4 votes). All ZooKeeper community members are encouraged to vote, though only PMC votes will be binding. Here's my +1 vote. Patrick
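The "two thirds ... works out to 4 votes" arithmetic can be checked with a short sketch. The message does not state how many PMC members there are, so the sizes below are assumptions; the sketch also assumes that "two thirds" is rounded up to a whole vote. Under that reading, a PMC of either 5 or 6 members yields the 4 votes mentioned.

```python
def votes_needed(pmc_size: int) -> int:
    """Smallest whole number of +1 votes that is at least two thirds of pmc_size."""
    # Integer ceiling of 2 * pmc_size / 3, avoiding floating point.
    return (2 * pmc_size + 2) // 3


# Assumed PMC sizes; the actual size is not given in the message.
for size in (5, 6):
    print(size, votes_needed(size))
```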
Re: New to ZooKeeper project
Welcome! Sounds like you might want to work with Camille: http://markmail.org/message/fo6fzrrh3t3v5ayr ZK java runs under cygwin, so that's an option as well. Regards, Patrick On Thu, Feb 3, 2011 at 9:36 PM, Thiwanka Somasiri asthiwa...@gmail.com wrote: Hi, I am new to the ZooKeeper project and I need to know whether I can contribute to the project with a single computer which has Windows (64-bit) installed. And from which point should I start to contribute to the project? -- Regards A.S.Thiwanka Somasiri Skype : executionerwild MSN : thi...@ymail.com
ZooKeeper site updated for Apache Project Branding Requirements
This morning I pushed a small set of updates to the site for Apache Project Branding Requirements: http://www.apache.org/foundation/marks/pmcs.html Let me know if you see any issues. Regards, Patrick
Re: release management and committer criteria wiki pages
Ben, looks like a good start to me. I made some additions to the release management page that I lifted from the httpd page (mainly on who/what it means to be a RM). Committer criteria looks good as well, I added a few more details. Also we should create a page at some point similar to this (what to do once you are a committer): http://camel.apache.org/how-do-i-become-a-committer.html Patrick On Tue, Feb 1, 2011 at 2:46 PM, Benjamin Reed br...@yahoo-inc.com wrote: i've added two new pages to the wiki about release management and committer criteria that i'd like to get feedback on: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement https://cwiki.apache.org/confluence/display/ZOOKEEPER/CommitterCriteria please feel free to comment. i also do not hold a lock on those pages, so PMC members, please feel free to update. there was something i noticed when doing the ReleaseManagement; the language in the ByLaws with respect to the Release Plan is a bit imprecise: "Defines the timetable and actions for a release. The plan also nominates a Release Manager." I think it should be "designates" rather than "nominates".
[DISCUSS] Move BookKeeper and Hedwig to the incubator
I wanted to initiate this discussion (not a vote) so that we can work out interest levels and all be on the same page. These are our current contribs: bookkeeper/ fatjar/ hedwig/ huebrowser/ loggraph/ monitoring/ rest/ zkfuse/ zkperl/ zkpython/ zktreeutil/ zooinspector/ While most of these are relatively closely tied to ZK, in particular BK and Hedwig are projects that are not directly related to ZK. Rather they are users of the service, similar to say HBase. From bk/hedwig perspective: I personally think that both of these projects would benefit by moving to the incubator. They could build their own distinct communities and could govern their own development. Their own releases should not be directly tied to ZK releases. As these projects gain momentum (bk is already being looked at by Hadoop and others) this will be even more of an issue. From zk perspective: While we could also make these subprojects of ZK I don't think that's the right way to go. We (zk pmc) shouldn't be governing these communities, that's what the incubator is for. The incubator may seem daunting given the list of issues that need to be resolved and the oversight provided, but I can tell you from personal experience (whirr) that this is not as big a deal as it seems. I would be willing to be a mentor for both of these projects if they were to move to incubator. From apache perspective: apache created the incubator specifically for the reasons I'm citing and want to see new, distinct projects move through that process, eventually to become a TLP. Also, how does our (ZK) community recognize the contributions of those primarily working on BK/Hedwig? Do we make them ZK committers? We could, but committership on ZK is supposed to be reserved for those making contributions to ZK itself (the code and community). Given that BK/Hedwig are distinct codebases/users/community this doesn't really fit and complicates some of the governance issues.
Please do give this some thought and respond with your insights. Regards, Patrick
Re: [VOTE] Release ZooKeeper 3.3.3 (candidate 0)
FYI: if you make bulk changes (greater than 5 or so, typically this happens during release time) to JIRA please do use the bulk change feature, and in particular turn OFF email notification. Subsequently send out a single email detailing the changes. Patrick On Mon, Feb 21, 2011 at 3:46 AM, Flavio Junqueira f...@yahoo-inc.com wrote: Hi again, I'm sorry for raising yet another point. I also wanted to point out that we have a few jiras marked as blockers that are not included in the current candidate. ZOOKEEPER-880 is one of them, and Vishal asked us to consider it for 3.3.3. I think that either we conclude that it is not a blocker and leave it marked for 3.4.0 as it is currently, or include it in 3.3.3. -Flavio On Feb 19, 2011, at 3:27 PM, Benjamin Reed wrote: (the previous email had the URL slightly incorrect) after much struggle! i've created a candidate build for ZooKeeper 3.3.3. this is a bug fix release addressing 13 issues (two of them extremely critical) -- see the release notes for details. *** Please download, test and VOTE before the *** vote closes 11pm pacific time, Tuesday, February 22.*** http://people.apache.org/~breed/zookeeper-3.3.3-candidate-0/ one thing that has not been fixed in this release is that the docs still reference hadoop. this will be fixed in a future release. should we release this? ben ps - given that this is the first release there is more than likely something i missed and given the severity of issues addressed it would be nice to get it out quickly. please review ASAP. *flavio* *junqueira* research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
Re: [VOTE] Release ZooKeeper 3.3.3 (candidate 0)
-1, in general looks good but I did notice a few things: * the zookeeper_version.h file needs to be updated to version 3.3.3 * same with configure.ac (http://wiki.apache.org/hadoop/ZooKeeper/HowToRelease) * the toplevel zk jar and all jars in dist-maven have not been signed, they need to be. RAT passed so that's good. Patrick On Sat, Feb 19, 2011 at 6:27 AM, Benjamin Reed br...@apache.org wrote: (the previous email had the URL slightly incorrect) after much struggle! i've created a candidate build for ZooKeeper 3.3.3. this is a bug fix release addressing 13 issues (two of them extremely critical) -- see the release notes for details. *** Please download, test and VOTE before the *** vote closes 11pm pacific time, Tuesday, February 22.*** http://people.apache.org/~breed/zookeeper-3.3.3-candidate-0/ one thing that has not been fixed in this release is that the docs still reference hadoop. this will be fixed in a future release. should we release this? ben ps - given that this is the first release there is more than likely something i missed and given the severity of issues addressed it would be nice to get it out quickly. please review ASAP.
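The unsigned-jar problem noted above is the kind of thing a release manager could catch mechanically before calling a vote. A minimal sketch, assuming each signed jar has a detached signature stored beside it as `<name>.jar.asc`; the function name and the directory layout in the usage note are illustrative and not part of the ZooKeeper build:

```python
import os


def unsigned_jars(root):
    """Return the jars under root that have no detached .asc signature file.

    Assumption: a signed jar foo.jar has a sibling file foo.jar.asc.
    This only checks that the signature file exists; it does not verify it.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in names:
            if name.endswith(".jar") and name + ".asc" not in names:
                missing.append(os.path.join(dirpath, name))
    return sorted(missing)
```

Pointing it at an unpacked candidate directory and getting back an empty list would mean every jar has a signature file next to it; verifying the signatures themselves is a separate step.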
Re: [VOTE] Release ZooKeeper 3.3.3 (candidate 0)
Done. (hopefully this gives you some insight into why I've been pushing for maven, not perfect but it does a lot of this for you). Patrick On Tue, Feb 22, 2011 at 7:58 AM, Benjamin Reed br...@apache.org wrote: pat, do you mind updating the building section of https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToRelease with these three items. signing the jar files is very unclear in the original doc. do i untar, sign, and retar? we don't do jarsigning? i'll do an rc1 this evening. ben On Mon, Feb 21, 2011 at 11:55 PM, Patrick Hunt ph...@apache.org wrote: -1, in general looks good but I did notice a few things: * the zookeeper_version.h file needs to be updated to version 3.3.3 * same with configure.ac (http://wiki.apache.org/hadoop/ZooKeeper/HowToRelease) * the toplevel zk jar and all jars in dist-maven have not been signed, they need to be. RAT passed so that's good. Patrick On Sat, Feb 19, 2011 at 6:27 AM, Benjamin Reed br...@apache.org wrote: (the previous email had the URL slightly incorrect) after much struggle! i've created a candidate build for ZooKeeper 3.3.3. this is a bug fix release addressing 13 issues (two of them extremely critical) -- see the release notes for details. *** Please download, test and VOTE before the *** vote closes 11pm pacific time, Tuesday, February 22.*** http://people.apache.org/~breed/zookeeper-3.3.3-candidate-0/ one thing that has not been fixed in this release is that the docs still reference hadoop. this will be fixed in a future release. should we release this? ben ps - given that this is the first release there is more than likely something i missed and given the severity of issues addressed it would be nice to get it out quickly. please review ASAP.
Re: [VOTE] Release ZooKeeper 3.3.3 (candidate 0)
done On Tue, Feb 22, 2011 at 7:59 AM, Benjamin Reed ben.r...@gmail.com wrote: can you add this to the how to release wiki as well? ben On Mon, Feb 21, 2011 at 9:38 AM, Patrick Hunt ph...@apache.org wrote: FYI: if you make bulk changes (greater than 5 or so, typically this happens during release time) to JIRA please do use the bulk change feature, and in particular turn OFF email notification. Subsequently send out a single email detailing the changes. Patrick On Mon, Feb 21, 2011 at 3:46 AM, Flavio Junqueira f...@yahoo-inc.com wrote: Hi again, I'm sorry for raising yet another point. I also wanted to point out that we have a few jiras marked as blockers that are not included in the current candidate. ZOOKEEPER-880 is one of them, and Vishal asked us to consider it for 3.3.3. I think that either we conclude that it is not a blocker and leave it marked for 3.4.0 as it is currently, or include it in 3.3.3. -Flavio On Feb 19, 2011, at 3:27 PM, Benjamin Reed wrote: (the previous email had the URL slightly incorrect) after much struggle! i've created a candidate build for ZooKeeper 3.3.3. this is a bug fix release addressing 13 issues (two of them extremely critical) -- see the release notes for details. *** Please download, test and VOTE before the *** vote closes 11pm pacific time, Tuesday, February 22.*** http://people.apache.org/~breed/zookeeper-3.3.3-candidate-0/ one thing that has not been fixed in this release is that the docs still reference hadoop. this will be fixed in a future release. should we release this? ben ps - given that this is the first release there is more than likely something i missed and given the severity of issues addressed it would be nice to get it out quickly. please review ASAP. *flavio* *junqueira* research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
Re: [VOTE] Release ZooKeeper 3.3.3 (candidate 0)
On Tue, Feb 22, 2011 at 9:38 AM, Flavio Junqueira f...@yahoo-inc.com wrote: I'm not sure why you say it is fine, Pat. If I try to compile with the candidate release, ant complains that build-contrib.xml is missing (and it is not where it is supposed to be). Compiling from trunk or one of the branches works fine, though. Hi Flavio, can you be more specific? Because it's working fine for me: cd zookeeper-3.3.3/src/contrib/bookkeeper ant jar ... jar: [echo] contrib: bookkeeper BUILD SUCCESSFUL Total time: 2 seconds -Flavio On Feb 22, 2011, at 6:04 PM, Patrick Hunt wrote: I think it's fine - it's in src/contrib not contrib. On Tue, Feb 22, 2011 at 8:00 AM, Benjamin Reed ben.r...@gmail.com wrote: i don't know why build-contrib.xml is missing. pat do you have any ideas? ant tar should grab it right? ben On Mon, Feb 21, 2011 at 2:02 AM, Flavio Junqueira f...@yahoo-inc.com wrote: Ben, I have a question. Even though the bookkeeper jar is there, build-contrib.xml is missing (ZOOKEEPER-956), and compiling fails without it. Is it supposed to be this way? -Flavio On Feb 19, 2011, at 3:27 PM, Benjamin Reed wrote: (the previous email had the URL slightly incorrect) after much struggle! i've created a candidate build for ZooKeeper 3.3.3. this is a bug fix release addressing 13 issues (two of them extremely critical) -- see the release notes for details. *** Please download, test and VOTE before the *** vote closes 11pm pacific time, Tuesday, February 22.*** http://people.apache.org/~breed/zookeeper-3.3.3-candidate-0/ one thing that has not been fixed in this release is that the docs still reference hadoop. this will be fixed in a future release. should we release this? ben ps - given that this is the first release there is more than likely something i missed and given the severity of issues addressed it would be nice to get it out quickly. please review ASAP.
*flavio* *junqueira* research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
Re: [VOTE] Release ZooKeeper 3.3.3 (candidate 0)
See my comments I just sent to Flavio. I agree, if we go with something like maven we should be able to have a source only release artifact, plus a number of binary artifacts (push to maven, a separate binary artifact, etc...). I've proposed maven in the past, but not gotten much interest. Now that additional people are participating in the RM task there may be renewed interest. ;-) Patrick On Wed, Feb 23, 2011 at 4:55 AM, Thomas Koch tho...@koch.ro wrote: Flavio Junqueira: The problem is that there is a path zookeeper-3.3.3/contrib/bookkeeper containing build.xml and source code. If you try to run from there, it will fail. I'm actually wondering why we have the source code both under contrib/ and under src/contrib/. The folder structure under contrib/ is different and it does look like it is not supposed to compile (hence my confusion), and I wonder why we need it in the release. -Flavio The code duplication of contrib and src/contrib is one of the annoyances when packaging ZooKeeper for Debian. You'd really do me a favor if you'd release a tarball without this duplication. (The biggest annoyance is the shipment of binary .jar files in every java project, not only ZooKeeper. - But things are getting better slowly with maven.) Best regards, Thomas Koch, http://www.koch.ro
Re: zombie on zk precommit build machine
https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/173/ usually this indicates that there is a zombie server hanging around: [exec] [exec] ./zktest-st [exec] [exec] ZooKeeper server process failed ZooKeeper server NOT startedRunning [exec] [exec] Zookeeper_operations::testPing : elapsed 0 : OK On Fri, Feb 25, 2011 at 10:41 AM, Nigel Daley nda...@mac.com wrote: which one? I don't see any zombie. Nige On Feb 25, 2011, at 10:03 AM, Patrick Hunt wrote: Guys, there seems to be a zombie on the precommit build machine, it's causing all the builds to fail. Can you take a look? Thanks! Patrick
Re: zombie on zk precommit build machine
zk.log (the log output of starting that server) was missing from the artifacts. I just added it, which should allow us to better diagnose this issue on subsequent failures. Thanks guys! Patrick On Fri, Feb 25, 2011 at 11:18 AM, Giridharan Kesavan gkesa...@yahoo-inc.com wrote: there is no stale process running on h7. I dont see any On Feb 25, 2011, at 11:14 AM, Patrick Hunt wrote: https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/173/ usually this indicates that there is a zombie server hanging around: [exec] [exec] ./zktest-st [exec] [exec] ZooKeeper server process failed ZooKeeper server NOT startedRunning [exec] [exec] Zookeeper_operations::testPing : elapsed 0 : OK On Fri, Feb 25, 2011 at 10:41 AM, Nigel Daley nda...@mac.com wrote: which one? I don't see any zombie. Nige On Feb 25, 2011, at 10:03 AM, Patrick Hunt wrote: Guys, there seems to be a zombie on the precommit build machine, it's causing all the builds to fail. Can you take a look? Thanks! Patrick
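One way to rule a zombie server in or out from the build job itself is to probe the port the test server is about to bind, before trying to start it. A minimal sketch, assuming the test server listens on a known TCP port on localhost; the function name is illustrative, and the port to pass in has to come from the test scripts since it is not given in this thread:

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

A busy port before the server is started points to a leftover process; a free port means the start failure has some other cause, and zk.log is the place to look.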
Re: Interesting project on github - Noah
Ya, I wondered about that too, esp given he mentions (yes I'm aware that you can run Zookeeper in single server mode from a single JAR file) and Noah's stack requires (granted we require java): Ruby EventMachine/Sinatra/Ohm Redis Patrick On Sat, Feb 26, 2011 at 10:11 PM, Benjamin Reed ben.r...@gmail.com wrote: what does he mean that zookeeper is big? ben On Sat, Feb 26, 2011 at 2:44 PM, Patrick Hunt ph...@apache.org wrote: Noah is a lightweight registry based on the concepts in the Apache Zookeeper project. https://github.com/lusis/Noah/wiki/Original-README Patrick
Re: [DISCUSS] 3.4.0 release plan of ZooKeeper.
On Mon, Feb 28, 2011 at 11:22 PM, Thomas Koch tho...@koch.ro wrote: there are many more issues in jira marked for 3.4.0. Do you plan to postpone those? What is the reason to do the 3.4.0 release now without those instead of waiting a bit more? Or what is already in 3.4.0 that needs to get out? Hi Thomas, historically we block a release if it has blocker jiras assigned. Anything short of that gets pushed out to a subsequent release. Of course contributors can submit changes that are not blockers prior to the release, these typically get included, but it's up to the RM to make the call: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement Patrick
ZooKeeper patch builds failing due to java not found
Nigel, Giri, any idea what's going on here? See https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/174/artifact/trunk/build/tmp/zk.log for this error: /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/zkServer.sh: line 115: java: command not found Patrick
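The error means the shell found no java executable on PATH when zkServer.sh ran. A minimal sketch of a pre-flight check a build job could run first, assuming the script resolves java purely through PATH as the error message suggests; the function name is illustrative:

```python
import shutil


def find_java(path=None):
    """Return the full path of the java executable, or None if it is not found.

    path is a PATH-style string; None means use the current environment's PATH.
    """
    return shutil.which("java", path=path)
```

Failing the job early with a clear message when this returns None would be easier to diagnose than a server that silently never starts.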
Re: IRC channel for zookeeper.
I personally like the IRC channel, esp when someone needs a quick answer to a question. However in general Apache frowns on making decisions there. Best to use the mailing list and jira - keep in mind that users/contributors may be spread across many timezones, not everyone is online at the same time, IRC is not archived, etc... Email/Jira addresses this. Regards, Patrick On Tue, Mar 1, 2011 at 10:02 AM, Mahadev Konar maha...@apache.org wrote: Hi all, Just making sure that folks know about the IRC channel for zookeeper. Pat had created one long time ago. http://zookeeper.apache.org/irc.html. Its sometimes easier to have discussions on IRC rather than doing back and forth on the jira. Though we should make sure to post the conclusions on the jira. thanks mahadev
Possible very serious regression in 3.3.3
I'm seeing this with 3.3.3, which seems like a very serious issue: https://issues.apache.org/jira/browse/ZOOKEEPER-1006 Any insights, could someone triage this? Is this related to the QP changes in 3.3.3? Patrick
Apache Sonar service now available from asfinfra
This is cool: http://wiki.apache.org/general/SonarInstance Would be great to see ZooKeeper up there. Anyone interested to take the lead on making this happen? Patrick
Re: PMC member criteria for ZooKeeper.
On Tue, Mar 8, 2011 at 7:00 AM, Flavio Junqueira f...@yahoo-inc.com wrote: Most discussions apart from issues like new committers are open, and anyone in the community has the right to express an opinion, and I believe we in general do take opinions and suggestions into account. Consequently, I don't see much benefit in having a PMC member that does not have a set of responsibilities that is a superset of the ones of a committer. Community members come and go, a sign of a healthy Apache project is adding new committers and pmc members to ensure that the project continues to be viable as this ebb/flow happens. At the same time, I don't see a reason for constraining PMC to be committers in the bylaws. I would much rather discuss each case individually, and evaluate the merit of the candidate accordingly. We have clearly stated in the bylaws how one becomes a PMC member (voting), so I agree with you we don't need to update the bylaws. But it is a good idea to outline how one becomes a PMC member and the criteria we (zk) use to judge. Even if this is just a pointer to the links I sent earlier. (similar to what we have for committers https://cwiki.apache.org/confluence/display/ZOOKEEPER/CommitterCriteria I think this is what Mahadev was shooting for, get everyone on the same page; current PMC members, new members as they are elected, and the community at large) Patrick On Mar 8, 2011, at 12:12 AM, Benjamin Reed wrote: i would like the pmc to have more of a project management view. i think it would be great to have pmc members come up through the committer ranks, but i also think there may be potential pmc members that are more project management oriented than code oriented.
for me an ideal pmc member would: - understand the project - have a good understanding for where the project should and shouldn't go, and be able to express that understanding - should vote on releases and be involved in release discussions - should participate in the mailing lists - have a good view of how zookeeper sits in the apache eco system - know what work is going on and identify areas of needed work a committer will do many of these things, but you could be the ideal pmc member and not be heavily involved in the coding, so making the pmc members a subset of the committers seems overly restrictive. actually it may be nice to have some members who don't have their heads down in the code so that they can take a broader view. so i guess the one attribute i would take issue with from your list is the patch reviews and contributions. a pmc member should be familiar with the work going on in the project, but patch reviews and contributions is squarely in the committers area of responsibility. ben On Mon, Mar 7, 2011 at 9:00 AM, Mahadev Konar maha...@apache.org wrote: Hi all, I have been thinking about what should be the criteria for PMC members for ZK. I do not have much experience with PMC member criteria for other projects except for Hadoop. In Hadoop we indirectly imply that a PMC member be a superset of a committer. Meaning more responsibilities than a committer, more responsibility towards project direction, more responsibilities towards projects day to day activities. 
and here is what I had in mind for ZK (mostly explicitly stating what we have in Hadoop): A PMC member should be able to get involved in the day to day activities of the project - by day to day activities I imply - release discussions - code reviews/ could be any kind - documentation/ others (does not imply a deep understanding of the project), should be willing to contribute on any part of the project - should be willing to work with new contributors and mentor them (mostly a superset of committer). - works well with other PMC members By the above I imply that a PMC member has a greater set of responsibilities than a committer and should be able to review (any contribution) and contribute towards ZK releases. What do others think? thanks mahadev *flavio* *junqueira* research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
Re: PMC member criteria for ZooKeeper.
On Mon, Mar 7, 2011 at 9:00 AM, Mahadev Konar maha...@apache.org wrote: I have been thinking about what should be the criteria for PMC members for ZK. I do not have much experience with PMC member criteria for other projects except for Hadoop. In Hadoop we indirectly imply that a PMC member be a superset of a committer. Meaning more responsibilities than a committer, more responsibility towards project direction, more responsibilities towards projects day to day activities. Hey Mahadev, from an Apache perspective coding doesn't really come into play, PMC is more about governance/legal/community than coding: http://www.apache.org/foundation/how-it-works.html#pmc The key components are this: The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms. Secondly the role of the PMC is to further the long term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration does happen. Within the ASF we worry about any community which centers around a few individuals who are working virtually uncontested. We believe that this is detrimental to quality, stability, and robustness of both code and long term social structures. Further there is no requirement that a PMC member even be a committer. http://www.apache.org/foundation/how-it-works.html#pmc-members A PMC member is a developer or a committer that was elected due to merit for the evolution of the project and demonstration of commitment. They have write access to the code repository, an apache.org mail address, the right to vote for the community-related decisions and the right to propose an active user for committership. The PMC as a whole is the entity that controls the project, nobody else. 
What you are describing about coding/review is more Committership and not PMC. By the above I imply that a PMC member has a greater set of responsibilities than a committer and should be able to review (any contribution) and contribute towards ZK releases. What do others think? Wrt greater responsibilities that's definitely true, however PMC responsibilities are around governance, while Committer responsibilities are coding/reviewing. Patrick
Re: PMC member criteria for ZooKeeper.
Ben, what you are detailing is similar to my response to Mahadev. One note though, from an Apache perspective PMC members need not even be familiar with the project, take Hadoop as an example where Ian was largely unfamiliar with Hadoop prior to joining their PMC. legal/procedure/community building, these are all things that can be done by someone familiar with the apache way, but not necessarily familiar with the individual project (not that I'm advocating we pull in non-zk community into the pmc, but just to highlight). Another example is the IPMC (incubator pmc), any Apache Member may be an IPMC member just by asking, and they are charged with the oversight of the individual podlings. Patrick On Mon, Mar 7, 2011 at 3:12 PM, Benjamin Reed br...@apache.org wrote: i would like the pmc to have more of a project management view. i think it would be great to have pmc members come up through the committer ranks, but i also think there may be potential pmc members that are more project management oriented than code oriented. for me an ideal pmc member would: - understand the project - have a good understanding for where the project should and shouldn't go, and be able to express that understanding - should vote on releases and be involved in release discussions - should participate in the mailing lists - have a good view of how zookeeper sits in the apache eco system - know what work is going on and identify areas of needed work a committer will do many of these things, but you could be the ideal pmc member and not be heavily involved in the coding, so making the pmc members a subset of the committers seems overly restrictive. actually it may be nice to have some members who don't have their heads down in the code so that they can take a broader view. so i guess the one attribute i would take issue with from your list is the patch reviews and contributions.
a pmc member should be familiar with the work going on in the project, but patch reviews and contributions is squarely in the committers area of responsibility. ben On Mon, Mar 7, 2011 at 9:00 AM, Mahadev Konar maha...@apache.org wrote: Hi all, I have been thinking about what should be the criteria for PMC members for ZK. I do not have much experience with PMC member criteria for other projects except for Hadoop. In Hadoop we indirectly imply that a PMC member be a superset of a committer. Meaning more responsibilities than a committer, more responsibility towards project direction, more responsibilities towards projects day to day activities. and here is what I had in mind for ZK (mostly explicitly stating what we have in Hadoop): A PMC member should be able to get involved in the day to day activities of the project - by day to day activities I imply - release discussions - code reviews/ could be any kind - documentation/ others (does not imply a deep understanding of the project), should be willing to contribute on any part of the project - should be willing to work with new contributors and mentor them (mostly a superset of committer). - works well with other PMC members By the above I imply that a PMC member has a greater set of responsibilities than a committer and should be able to review (any contribution) and contribute towards ZK releases. What do others think? thanks mahadev
Re: redirect old zookeeper site?
Hm, no idea. I'll look into it. On Fri, Mar 18, 2011 at 2:50 PM, Benjamin Reed br...@apache.org wrote: good point! do you know how to do this pat? ben On Fri, Mar 18, 2011 at 2:30 PM, Sean Bridges sean.brid...@gmail.com wrote: The first result I get in google for zookeeper points to the old zookeeper page, http://hadoop.apache.org/zookeeper/index.html Can you get that page to redirect to or link to, http://zookeeper.apache.org/ I was looking to download 3.3.3, but there is no download link to 3.3.3 from the old site. Thanks, Sean
Re: broken links on zookeeper.apache.org
Thanks for pointing this out, I'll look into it - I suspect we didn't put up the api docs when we moved the site (it's a separate step as part of a release). Patrick On Wed, Mar 23, 2011 at 8:01 AM, nicholas harteau n...@hep.cat wrote: is this the primary site now? it looks like the api doc is a 404 in recent releases: from: http://zookeeper.apache.org/doc/r3.3.3/ http://zookeeper.apache.org/doc/r3.3.3/api/index.html = 404 similarly, from: http://zookeeper.apache.org/doc/r3.3.3/ http://zookeeper.apache.org/doc/r3.3.2/api/index.html = 404 -- n...@hep.cat (^-^)
Re: negotiated timeout
Ted, you'll need to ask the hbase guys about this if you are not running a dedicated zk cluster. I'm not sure how they manage embedded zk. However a quick search of the HBASE code results in: ./src/main/java/org/apache/hadoop/hbase/zookeeper/HQuorumPeer.java: // Set the max session timeout from the provided client-side timeout properties.setProperty("maxSessionTimeout", conf.get("zookeeper.session.timeout", "18")); Patrick On Thu, Mar 24, 2011 at 4:00 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: Do you want me to look at maxSessionTimeout ? Since hbase manages zookeeper, I am not sure I can control this parameter directly. On Thu, Mar 24, 2011 at 3:50 PM, Patrick Hunt ph...@apache.org wrote: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_advancedConfiguration On Thu, Mar 24, 2011 at 3:43 PM, Mahadev Konar maha...@apache.org wrote: Hi Ted, The session timeout can be changed by the server depending on min/max bounds set on the servers. Are your servers configured to have a max timeout of 60 seconds? usually the default is 20 * tickTime. Looks like your ticktime is 3 seconds?
thanks mahadev On Thu, Mar 24, 2011 at 3:20 PM, Ted Yu yuzhih...@gmail.com wrote: Hi, hbase 0.90.1 uses zookeeper 3.3.2 I specified: <property> <name>zookeeper.session.timeout</name> <value>49</value> </property> In zookeeper log I see: 2011-03-24 19:58:09,499 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /10.202.50.111:50325 2011-03-24 19:58:09,499 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x12ebb99d686a012 with negotiated timeout 6 for client /10.202.50.112:62386 2011-03-24 19:58:09,499 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /10.202.50.112:62387 2011-03-24 19:58:09,499 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x12ebb99d686a012 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase 2011-03-24 19:58:09,499 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x12ebb99d686a013 with negotiated timeout 6 for client /10.202.50.111:50324 Can someone tell me how the negotiated timeout of 6 was computed ? Thanks
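The bounds Mahadev describes can be written down directly. A minimal sketch, assuming the documented server defaults of minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime when neither is set explicitly; the function and the 120000 ms request are illustrative and not copied from the ZooKeeper source:

```python
def negotiate_timeout(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Clamp a client's requested session timeout to the server's bounds."""
    lo = 2 * tick_time_ms if min_ms is None else min_ms
    hi = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lo, min(requested_ms, hi))


# With a 3 second tick, any request above 60 seconds is cut down to 60 seconds.
print(negotiate_timeout(120000, 3000))  # 60000
```

Getting a longer session than that therefore means raising maxSessionTimeout (or tickTime) on the server side, which is what the HQuorumPeer snippet Patrick found does for a quorum that HBase manages.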
Re: [ANNOUNCE] ZooKeeper Committer: Michi Mutsuzaki
Awesome! Welcome Michi! Patrick On Fri, Apr 8, 2011 at 5:28 PM, Mahadev Konar maha...@apache.org wrote: Hi all, The ZooKeeper PMC recently extended committer karma to Michi and he has accepted. Welcome aboard Michi! -- thanks mahadev @mahadevkonar
UltraESA - using ZK
A very interesting product using ZK for group coordination: http://twitter.com/#!/phunt/status/66173748109258755 Patrick
Fwd: Centralized KEYS files on https://people.apache.org/
FYI: If you are a committer you should get your key(s) up asap: -- Forwarded message -- From: Daniel Shahaf danie...@apache.org Date: Thu, May 5, 2011 at 8:51 AM Subject: Centralized KEYS files on https://people.apache.org/ To: committ...@apache.org The following is now implemented: If you login to https://id.apache.org and enter PGP key fingerprint(s), then the corresponding keys will within a few hours be made available under https://people.apache.org/keys/ , at URLs such as the following: https://people.apache.org/keys/committer/danielsh.asc (my key) https://people.apache.org/keys/group/httpd.asc (all httpd committers' keys). This arrangement is independent of the PGP fingerprints in ^/committers/info/, it does not pull fingerprints listed there. We recommend projects transition from KEYS files to /keys/group/$pmc.asc files managed via https://id.apache.org/. We have pre-filled this with the keys listed on the public http://people.apache.org pages (which are built from the information in ^/committers/info/ in the private repository). Some keys were not added, you can add those manually. You can edit or remove the listed keys via https://id.apache.org/. Documentation will follow on www.apache.org/dev/ later; it's basically enter the fingerprint or keyid in a format that 'gpg --recv-key' is happy with. Questions, patches, bugs, flames: to infrastructure@ (about id.a.o) or site-dev@ (about the scripts generating the keys/ directory). Thanks to pctony; bugs by me. Daniel
Re: zookeeper dev f2f meeting before or after the hadoop summit
Ben, last year Yahoo was nice enough to host us. Are they doing so again this year? I heard some of the other hadoop related projects were going to get together, should we try to sync up with them? Whichever way you go this sounds like a good idea to me. Patrick On Thu, May 12, 2011 at 11:03 AM, Eugene Koontz ekoo...@hiro-tan.org wrote: I'm looking forward to meeting other ZK devs. Thanks to Ben for suggesting it. Either before or after the Summit is fine for me. -Eugene
Re: retreat from zookeeper
Hi Thomas, I did notice that you had been making some personal changes (all hail LI!) and I'm sorry to see you moving on. Flavio makes some good points which I won't re-iterate, really I think it boils down to perfect is the enemy of the good. It's great to chase perfection, but on a day to day basis we're all striving to make things better. I must admit I'm personally disappointed that the netty changes have still not gone in. I had completed these changes some time back before you raised issues and took on the effort of driving that to completion. There are a number of community members, not to mention my employer, who are disappointed with _me_ as a result. (which takes me back to my previous perfect enemy of good comment). Best of luck! Patrick On Thu, May 19, 2011 at 6:12 AM, Thomas Koch tho...@koch.ro wrote: Flavio Junqueira: poorly designed system. Hi Flavio, thank you for your response. Please note that I do intend to appreciate the sound design of ZooKeeper in my talk. ZooKeeper is not poorly _designed_! However it is IMHO poorly _implemented_. And I also tried to excuse this by pointing out that ZK seems to be the first of its kind. ZK has explored new areas and provided insights that others can build upon now. The fact that many projects use ZK shows how important its service is and confirms my assumption that nobody else has provided something equivalent before.[1] [1] http://stackoverflow.com/questions/6047917/zookeeper-alternatives-cluster-coordination-service So today it's easy for me to point out shortcomings and take ZK for granted, but in fact it should not be forgotten that ZK is a piece of visionary work without precedent in its kind. Best regards, Thomas Koch, http://www.koch.ro
Re: dev meeting agenda
One agenda item: I think we should discuss the following list (recently sent by Thomas) and boil it down into some action items (jiras if they don't already exist) with assignees where possible. I've been discussing Maven support for a while, maybe this is a good time to do it. Patrick
* The code is tightly coupled
* most so-called unit tests are actually integration tests. They run the whole application and test one specific functionality.
* no uniform configuration: command line parameters, system properties, configuration file (java properties)
* configuration properties copied to static class members
* feature bloat on fragile foundation: e.g. chroot + automatic resubscription does not work
* implementation unlike specification: allowed characters in path
* still on ant instead of maven (depends how you see ant vs. maven)
* circular object dependencies (e.g. ZooKeeper - ClientCnxn)
* methods with +100 lines of code and nested conditions depth well over 5
* general attitude against refactoring, no knowledge or appreciation of effective java (Josh Bloch) or clean code (Robert C. Martin)
* magic numbers instead of enum
* still bound to inline copy of jute (HadoopIO, avro predecessor)
* even hand coded (de)serialization in leader election
* no client-only jar. Every client gets the full server code.
* unhandy API triggered (at least) two client API wrappers: zkClient, cages
* insane amounts of code duplication
* horrible, fragile thread programming: plenty of XYZ extends Thread instead of
  - implements Runnable
  - or better: executor framework
  - or much better: actors (see Akka)
  - leads to fear of refactoring, because nobody understands all synchronization needs.
On Thu, May 19, 2011 at 9:17 AM, Benjamin Reed br...@apache.org wrote: please respond to this email to add things to the agenda for the dev meeting. here are my items: item: we need to figure out a better way of managing recipes.
Re: post hadoop summit dev meeting
Awesome, I'm excited to get together with the community. (I'd suggest use whatever the BAHUG uses for meetups) Patrick On Thu, May 19, 2011 at 9:16 AM, Benjamin Reed br...@apache.org wrote: yahoo! will be sponsoring developer meetings for various projects the day after the hadoop summit. they need a head count to reserve the right room size and snacks. other dev meetings i've been to generally last around 2 hours. so, if that works for everyone i can setup a meeting on meetup. (is that what people use these days?) i'll start a new email thread to start collecting things to put on the agenda. ben
Re: commit process for zookeeper
Great idea Ben, thanks. On Thu, May 19, 2011 at 12:30 PM, Michi Mutsuzaki mic...@yahoo-inc.com wrote: Hi Ben, Thank you for putting this up! I find it very helpful. --Michi On 5/19/11 12:17 PM, Benjamin Reed br...@apache.org wrote: i've added a wiki page to document the commit process. they are the steps i follow, so hopefully i didn't miss anything :) i think it would be helpful for the new committers as they do their first commits. (welcome! thanks for sharing the work!) https://cwiki.apache.org/confluence/display/ZOOKEEPER/Committing+changes
Welcome Ted Dunning to the ZooKeeper PMC!
Hi Folks, I'm happy to report that the PMC has voted and Ted has happily accepted to become a ZooKeeper PMC member! Ted, welcome aboard! Please feel free to mention a little bit about yourself, and congrats! Patrick
ZooKeeper and Maven builds
I've uploaded a patch that adds maven build support: https://issues.apache.org/jira/browse/ZOOKEEPER-1078 If you can give it a try and comment on the jira I'd appreciate it. I'd like to get this committed asap so that ppl can try it out more easily and we can all hack on it more (ie improve the build/dev/release processes). You should be able to: 1) apply the patch to trunk (chmod +x the 2 scripts) 2) ./build.sh test 3) ./build.sh dist # see results in the dist directory Thanks, Patrick
Re: Error in sending Message
It looks like you are sending 1.7gig of data in your message, which far exceeds the default max that ZK enforces. See this: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options Typically ZK is not used to send large data like this, usually you want to store that data somewhere and use ZK to send a reference to that file (a URL say). Patrick On Sun, May 29, 2011 at 12:45 AM, Yosef Arraf yosef.ar...@mailvision.com wrote: Hi, my name is Yosef. I'm trying to work with Norbert, which wraps ZooKeeper. As a test I'm running NorbertJavaNetworkClientMain and NorbertJavaNetworkServerMain. I have a problem when I send a message from the client to the server - in ZooKeeper I'm getting the exception: * java.io.IOException: Unreasonable length = 1701999662 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) at org.apache.zookeeper.proto.ConnectRequest.deserialize(ConnectRequest.java:89) at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:733) at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:485) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:521) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) * And the server code never reaches the message handler function. Can you help? Thanks, Yosef
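The "store the data elsewhere and put a reference in ZK" advice can be sketched roughly as follows. This is only an illustration under stated assumptions: BlobStore is a hypothetical stand-in for whatever external storage is used (a file server, HDFS, and so on), the limit of 0xfffff bytes is the default for jute.maxbuffer described in the admin guide, and the actual write of the returned bytes to a znode is left out.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: keep large payloads out of ZooKeeper and store only a reference. */
final class ZnodeReferenceSketch {

    /** Assumed default znode data limit (jute.maxbuffer), just under 1 MB. */
    static final int MAX_ZNODE_BYTES = 0xfffff;

    /** Hypothetical external storage: saves the bytes and returns a URL pointing at them. */
    interface BlobStore {
        String put(byte[] data);
    }

    private ZnodeReferenceSketch() {}

    /**
     * Stores the payload externally and returns the small byte array
     * (the UTF-8 encoded URL) that would be written to the znode instead.
     */
    static byte[] referenceFor(byte[] payload, BlobStore store) {
        String url = store.put(payload);
        byte[] reference = url.getBytes(StandardCharsets.UTF_8);
        if (reference.length >= MAX_ZNODE_BYTES) {
            // Should not happen for a sane URL, but fail loudly rather than overflow the limit.
            throw new IllegalArgumentException("reference itself is too large for a znode");
        }
        return reference;
    }
}
```

Readers of the znode would then decode the URL and fetch the payload from the external store, so ZooKeeper only ever carries a few dozen bytes per message.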
Re: zookeeper build error
The main build also builds the c client/tests. You need autoconf for that. Either install autoconf or try running just the java tests ant test-core-java Patrick On Mon, Jun 6, 2011 at 3:46 PM, Ma, Ming min...@ebay.com wrote: Hi, I tried to use the instruction http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute to build zookeeper using ant -Djavac.args=-Xlint -Xmaxwarns 1000 clean test tar and got this build error. Has anyone seen this? /svnroot/hadoop/zookeeper-trunk/build.xml:901: Execute failed: java.io.IOException: Cannot run program autoreconf (in directory /svnroot/hadoop/zookeeper-trunk/src/c): java.io.IOException: error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) at java.lang.Runtime.exec(Runtime.java:593) at org.apache.tools.ant.taskdefs.Execute$Java13CommandLauncher.exec(Execute.java:827) at org.apache.tools.ant.taskdefs.Execute.launch(Execute.java:445) at org.apache.tools.ant.taskdefs.Execute.execute(Execute.java:459) at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:635) at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:676) at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:502) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at 
org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441) at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441) at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441) at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) at 
sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:390) at org.apache.tools.ant.Target.performTasks(Target.java:411) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360) at org.apache.tools.ant.Project.executeTarget(Project.java:1329) at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) at org.apache.tools.ant.Project.executeTargets(Project.java:1212) at org.apache.tools.ant.Main.runBuild(Main.java:801) at org.apache.tools.ant.Main.startAnt(Main.java:218) at
Re: Sample program for making any service highly available using zookeeper
IMO one area in which we are particularly weak is good client side user examples, so yes, please do submit! If you could add sufficient supporting comments/javadoc/docs along with the example I think it would be particularly helpful to end users. Regards, Patrick On Wed, Jun 8, 2011 at 1:56 AM, Divya :P divyaprabha...@gmail.com wrote: Hi, We used zookeeper to make one of our services highly available. I have written a sample program which shows the usage of zookeeper to make the required service highly available. Can it be added to the examples package in the zookeeper code? Thanks, Regards, Divya
Re: Released versions in jira
https://issues.apache.org/jira/browse/ZOOKEEPER Click on versions on the left, then click on manage versions on the right. Then click on release and give it the correct date of the release. Can you do all that, or do you need to be admin? Patrick On Wed, Jun 8, 2011 at 1:54 PM, Benjamin Reed br...@apache.org wrote: ah dang. i'm sure that is my fault. we need to add that to the release process wiki. do you know how to do this pat? ben On Wed, Jun 8, 2011 at 1:06 PM, Camille Fournier skami...@gmail.com wrote: Looks like our jira thinks we haven't released 3.3.3. Anyone know how to fix this? C
Re: [jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
fyi: Flavio and I responded on jira. On Tue, Jun 14, 2011 at 1:23 PM, Benjamin Reed br...@apache.org wrote: good point, you can check the cversion of the parent. that was my big objection. to be honest i can go either way. it is cumbersome to have to do the firstChild, but i'm wondering if it is easier to explain and manage that in the code than saying that the znode will go away if there aren't any children left unless no children have been created. i don't have a strong feeling one way or the other, but i do lean towards the firstChild option. are there any others that have an opinion? On Tue, Jun 14, 2011 at 12:02 PM, Daniel Gómez Ferro (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049340#comment-13049340 ] Daniel Gómez Ferro commented on ZOOKEEPER-723: -- Couldn't that be checked with the cversion of the parent? I just find it a bit ugly/cumbersome having that firstChild automatically created (that you are probably going to delete right after creating any other child) but that's mostly an aesthetic reason, so if you think it's more robust having the firstChild I'll provide a new version of the patch. ephemeral parent znodes --- Key: ZOOKEEPER-723 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-723 Project: ZooKeeper Issue Type: New Feature Components: server Reporter: Benjamin Reed Assignee: Daniel Gómez Ferro Attachments: ZOOKEEPER-723.patch ephemeral znodes have the nice property of automatically cleaning up after themselves when the creator goes away, but since they can't have children it is hard to build subtrees that will cleanup after the clients that are using them are gone. rather than changing the semantics of ephemeral nodes, i propose ephemeral parents: znodes that disappear when they have no more children. this cleanup would happen automatically when the last child is removed. 
an ephemeral parent is not tied to any particular session, so even if the creator goes away, the ephemeral parent will remain as long as there are children. when an ephemeral parent is created it will have an initial child, so that it doesn't get immediately removed. i think this child should be an ephemeral znode with a predefined name, firstChild. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: more about debian
Sounds reasonable to me. Can that be done through Apache (ie do the work on the Apache deb and use the result in Ubuntu/Ensemble) or does it have to be done directly through debian (or whatever one calls the way it's currently being done)? Patrick On Thu, Jun 16, 2011 at 11:33 AM, Gustavo Niemeyer gust...@niemeyer.net wrote: Given that ZK (and the rest of the hadoop ecosystem) will now have Apache created/maintained debian pkgs how/does that affect things? For us (Ubuntu/Ensemble), not much. First, we want to have important software readily available. Then, ZooKeeper is a dependency of Ensemble. For one to be in the distribution, the other must be, so we'll continue to have packages well maintained. -- Gustavo Niemeyer http://niemeyer.net http://niemeyer.net/blog http://niemeyer.net/twitter
Re: QuorumTest.testFollowersStartAfterLeader
Hi Eugene, that looks right to me. (did that fix it for you?) In addition to the anti-pattern I mentioned earlier, another one to look for is slow running tests -- often times a test will run slowly that could be coded in a different way to run much more quickly. Notice here that we don't even wait the second if the test is already passing. Although we might run for a much longer time if needed. (this all adds up when you have hundreds/thousands of tests). btw, you might put that comment directly into your assert. Also take a look at Assert.fail(foo) instead of assertTrue(false). (it's a nit though). Patrick On Tue, Jun 21, 2011 at 2:03 PM, Eugene Koontz ekoo...@hiro-tan.org wrote: On 6/21/11 12:45 PM, Patrick Hunt wrote: Such uses of sleep are just asking for trouble. Take a look at the use of sleep in testSessionMove in the same class for a better way to do this. I had gone through all the tests a while back, replacing all the sleep(x) with something like this testSessionMove pattern (retry with a max limit that's very long). During reviews we should look for anti-patterns like this and address them before commit. Patrick Thanks a lot for bringing this up, Camille. I had exactly this problem (QuorumTest.testFollowersStartAfterLeader failing) yesterday and today . Would the attached patch be the fix in the spirit of the pattern you're describing, Patrick? -Eugene
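The "retry with a very long max limit instead of a fixed sleep" pattern discussed in this thread could look roughly like the helper below. It is a sketch only: the class and method names are made up for illustration and it is not the code from testSessionMove. The point is that the helper returns as soon as the condition holds, and only a failing run pays the full wait.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition up to a generous deadline instead of sleeping blindly. */
final class WaitForSketch {

    private WaitForSketch() {}

    /**
     * Polls the condition until it is true or maxWaitMs has elapsed.
     *
     * @return true if the condition became true in time, false on timeout or interruption
     */
    static boolean waitFor(BooleanSupplier condition, long maxWaitMs, long pollMs) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // no waiting at all when the test is already passing
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up after the (long) maximum
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In a JUnit test this would typically be wrapped as Assert.assertTrue("followers did not start within 30s", WaitForSketch.waitFor(check, 30000, 100)), which also puts the explanatory comment directly into the assert as suggested above.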
Review Request: automating log and snapshot cleaning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107. https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs - ./conf/zoo_sample.cfg 1141901 ./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java PRE-CREATION ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1141901 ./src/java/test/org/apache/zookeeper/ZooKeeperPurgeTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
Re: Review Request: automating log and snapshot cleaning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/#review998 ---
The documentation (src/docs) needs to be updated - specifically the cleanup section in the admin guide. Have you considered hooking this into JMX or the 4letterwords? It would be nice for operators to get basic information. In JMX they could also control the settings... consider for a follow-on JIRA?
./conf/zoo_sample.cfg https://reviews.apache.org/r/1043/#comment2060
  My personal belief is that this should be turned off by default - i.e. comment out the parameters in the sample config.
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2081
  should this be in zookeeper or zookeeper.server ?
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2080
  consider calling this something more descriptive - perhaps DatadirCleanupManager or similar...
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2069
  enum?
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2070
  use TimeUnit
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2076
  perhaps this should be done in start?
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2074
  move this check to the start method. 1) INFO level log if turned off 2) exit the thread if turned off
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2073
  this formatting is not very nice, please adjust it a bit.
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2075
  I see tests, which is great, however where is this method being called in ZooKeeper server code proper? (what I mean is the server doesn't seem to be running this)
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2079
  given we are tracking the state shouldn't we be testing that here?
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2077
  I would rather we check if the state is started. (log warning if not)
./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java https://reviews.apache.org/r/1043/#comment2078
  Zookeeper should be referred to as ZooKeeper
./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java https://reviews.apache.org/r/1043/#comment2067
  specify explicit default, 3?
./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java https://reviews.apache.org/r/1043/#comment2068
  specify explicit default - e.g. 0.
./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java https://reviews.apache.org/r/1043/#comment2065
  the docs should reflect that this only controls the number of snaps to keep, the logs are purged based on the corresponding purged snaps.
./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java https://reviews.apache.org/r/1043/#comment2066
  what does time refer to. elapsed time? what are the units.
./src/java/test/org/apache/zookeeper/ZooKeeperPurgeTest.java https://reviews.apache.org/r/1043/#comment2082
  Nice!
- Patrick
On 2011-07-07 23:10:13, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- (Updated 2011-07-07 23:10:13) Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107.
https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs - ./conf/zoo_sample.cfg 1141901 ./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java PRE-CREATION ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1141901 ./src/java/test/org/apache/zookeeper/ZooKeeperPurgeTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
Re: Review Request: automating log and snapshot cleaning
On 2011-07-07 23:34:13, Camille Fournier wrote: Are people just supposed to create their own new ZooKeeperPurger and call start on it? I don't see any hooks for starting this anywhere, or even a main method to use to start it. Would be nice to give that to people so they have a utility they can run easily. Heh, you beat me to it. ;-) I think it should be started by the server, but the config defaults should have it turned off (time=0) by default. (more in my comments) - Patrick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/#review1000 --- On 2011-07-07 23:10:13, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- (Updated 2011-07-07 23:10:13) Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107. https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs - ./conf/zoo_sample.cfg 1141901 ./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java PRE-CREATION ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1141901 ./src/java/test/org/apache/zookeeper/ZooKeeperPurgeTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
Re: Review Request: automating log and snapshot cleaning
On 2011-07-07 23:40:32, Patrick Hunt wrote: ./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java, lines 74-77 https://reviews.apache.org/r/1043/diff/1/?file=22149#file22149line74 move this check to the start method. 1) INFO level log if turned off 2) exit the thread if turned off Sorry, meant exit the start method if turned off (don't start the timer/task). - Patrick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/#review998 --- On 2011-07-07 23:10:13, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- (Updated 2011-07-07 23:10:13) Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107. https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs - ./conf/zoo_sample.cfg 1141901 ./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java PRE-CREATION ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1141901 ./src/java/test/org/apache/zookeeper/ZooKeeperPurgeTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
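Pulling together the review suggestions so far (an enum for the state, TimeUnit for the interval, the "turned off" check in start with an INFO log and an early return, and a state check with a warning before shutdown), a rough sketch might look like the class below. All names here are hypothetical and this is not the code from the patch. It also assumes the interval is expressed in hours with 0 meaning "off", which is exactly the kind of detail the review asks the docs to spell out.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/** Hypothetical sketch of a purge scheduler following the review comments; not the actual patch. */
final class PurgeSchedulerSketch {

    enum State { NOT_STARTED, STARTED, COMPLETED }

    private static final Logger LOG = Logger.getLogger(PurgeSchedulerSketch.class.getName());

    private final int purgeIntervalHours; // assumption: hours, 0 or less disables the task
    private final Runnable purgeTask;     // the actual cleanup work, supplied by the caller
    private State state = State.NOT_STARTED;
    private Timer timer;

    PurgeSchedulerSketch(int purgeIntervalHours, Runnable purgeTask) {
        this.purgeIntervalHours = purgeIntervalHours;
        this.purgeTask = purgeTask;
    }

    synchronized void start() {
        if (state == State.STARTED) {
            LOG.warning("Purge task is already running.");
            return;
        }
        if (purgeIntervalHours <= 0) {
            // Turned off: log at INFO and leave start() without creating the timer.
            LOG.info("Purge task is not scheduled.");
            return;
        }
        timer = new Timer("PurgeTask", true); // daemon thread, so it never blocks JVM exit
        long periodMs = TimeUnit.HOURS.toMillis(purgeIntervalHours);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                purgeTask.run();
            }
        }, 0, periodMs);
        state = State.STARTED;
    }

    synchronized void shutdown() {
        if (state != State.STARTED) {
            LOG.warning("Purge task not started. Ignoring shutdown!");
            return;
        }
        timer.cancel();
        state = State.COMPLETED;
    }

    synchronized State getState() {
        return state;
    }
}
```

Taking the interval and the task through the constructor keeps the class free of any config dependency, so the server would create it from its parsed configuration and call start() once during startup.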
Review Request: Support Kerberos authentication of clients.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1049/ --- Review request for zookeeper, Benjamin Reed and Mahadev Konar. Summary --- Support Kerberos authentication of clients. This addresses bug ZOOKEEPER-938. https://issues.apache.org/jira/browse/ZOOKEEPER-938 Diffs - src/java/main/org/apache/zookeeper/ClientCnxn.java 87477df src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java d0e74fa src/java/main/org/apache/zookeeper/LoginThread.java PRE-CREATION src/java/main/org/apache/zookeeper/Watcher.java e72105c src/java/main/org/apache/zookeeper/ZooDefs.java f77ac20 src/java/main/org/apache/zookeeper/ZooKeeper.java f2ab4a6 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java PRE-CREATION src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java b690817 src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java bab8998 src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java e7e1846 src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java c6ab5dd src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java deb1e7a src/java/main/org/apache/zookeeper/server/Request.java 80d2b99 src/java/main/org/apache/zookeeper/server/ServerCnxn.java 6d69073 src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java 72af158 src/java/main/org/apache/zookeeper/server/ServerConfig.java ec710cd src/java/main/org/apache/zookeeper/server/ZooKeeperSaslServer.java PRE-CREATION src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java 52d3820 src/java/main/org/apache/zookeeper/server/auth/DigestLoginModule.java PRE-CREATION src/java/main/org/apache/zookeeper/server/auth/SASLAuthenticationProvider.java PRE-CREATION src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java c6d9c09 src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java 969a482 src/java/test/org/apache/zookeeper/test/SaslAuthFailTest.java PRE-CREATION 
src/java/test/org/apache/zookeeper/test/SaslAuthTest.java PRE-CREATION src/zookeeper.jute 34eac78 Diff: https://reviews.apache.org/r/1049/diff Testing --- Thanks, Patrick
Re: dev meeting minutes
Vishal (Kathuria) would you mind posting your ZK presentation? Perhaps here as wiki attachment: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperPresentations with a reference from minutes? thanks, Patrick On Wed, Jul 13, 2011 at 8:56 AM, Benjamin Reed br...@apache.org wrote: I don't know your approach to cluster reconfiguration, but I had a thought about it lately that may be rather trivial, if I didn't oversee anything. Cluster reconfiguration would need to be an operation that goes through the leader and gets acknowledged by a quorum. All operations following the cluster reconfiguration operation would need to be committed by the new quorum. I hope this idea might help and I didn't make a fool of myself. :-) Best regards, Thomas Koch, http://www.koch.ro it would be nice if things were that simple. your idea only works if no failures happen. (lots of things are simple if no failure happens. code paths become nice and clean.) since we would like to deal with failures, things are a bit more complicated. at the heart of it is a reconfiguration request that gets committed using Zab. ben
Re: Review Request: automating log and snapshot cleaning
On 2011-07-07 23:40:32, Patrick Hunt wrote: The documentation (src/docs) need to be updated - specifically the cleanup section in the admin guide. Have you considered hooking this into JMX or the 4letterwords? It would be nice for operators to get basic information. In JMX they could also control the settings... consider for a follow-on JIRA? Laxman Ch wrote: Fixed the documentation part. @Pat: I guess I'm not clear about how hooking into JMX or 4 letter-words will be helpful. Can you please explain this idea? I can takeup this task in separate JIRA. both JMX and 4letterwords provide insight to the operator into the runtime status of the system http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkCommands http://zookeeper.apache.org/doc/r3.3.3/zookeeperJMX.html what I was suggesting is that you could allow such insight into the cleanup task - for example is the process running, when the last time it ran, a history of the files cleaned up and when, stop the task, restart, etc... - Patrick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/#review998 --- On 2011-07-07 23:10:13, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- (Updated 2011-07-07 23:10:13) Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107. https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs - ./conf/zoo_sample.cfg 1141901 ./src/java/main/org/apache/zookeeper/ZooKeeperPurger.java PRE-CREATION ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1141901 ./src/java/test/org/apache/zookeeper/ZooKeeperPurgeTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
Re: Review Request: automating log and snapshot cleaning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/#review1116 ---

This is looking pretty close!

./conf/zoo_sample.cfg
https://reviews.apache.org/r/1043/#comment2277
I think something like datadircleanupmanager.snapRetainCount and datadircleanupmanager.purgeInterval would be better here -- less ambiguous (in general we need to clean up our configuration naming/handling at some point)

./src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
https://reviews.apache.org/r/1043/#comment2281
mention that this is new in 3.4.0 -- see some of the other parameters such as clientPortAddress for an example of how to do this.

./src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
https://reviews.apache.org/r/1043/#comment2278
something like the following? the ..snapretaincount.. most recent snapshots ...

./src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
https://reviews.apache.org/r/1043/#comment2279
replace purges with deletes?

./src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
https://reviews.apache.org/r/1043/#comment2280
do you mean something like: Default is 3. Minimum value is 3. I think this would be a bit more obvious. Might also be good if we print a warning if the user sets it to less than 3 (and specify that we are using 3)

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2282
can we add a class javadoc describing this class?

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2283
should be DataDirCleanupManager.class not nioservercnxn.class.

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2285
move this to QPC (QuorumPeerConfig), check it there, print a warning if the user sets below this, and set the config field to this MIN.

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2286
if we set this in the constructor we can define these as final.

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2284
rather than add a dependency on QPC, could we pass these 4 parameters into a constructor for this class? Notice how this also makes your tests easier (no need to meddle with config). shouldn't we check the state before starting?

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2287
move this to QPC

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2288
I'd move this to the constructor of this class.

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2289
nit - formatting is a bit off, typ if (foo bar) { ... }

./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java
https://reviews.apache.org/r/1043/#comment2292
final

- Patrick

On 2011-07-19 21:04:20, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- (Updated 2011-07-19 21:04:20) Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107.
https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs - ./conf/zoo_sample.cfg 1146568 ./src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml 1146568 ./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java PRE-CREATION ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1146568 ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java 1146568 ./src/java/test/org/apache/zookeeper/server/DatadirCleanupManagerTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
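Several of the review comments above (pass the values in through a constructor, make the fields final, enforce a minimum of 3 with a warning) can be illustrated with a small sketch. This is not the DatadirCleanupManager from the patch: the class name CleanupSettings, the choice of the four constructor parameters, and the use of java.util.logging are all assumptions made only to keep the example self-contained.

```java
import java.io.File;
import java.util.logging.Logger;

/**
 * Hypothetical settings holder, NOT the class under review. It only
 * illustrates constructor injection, final fields and a clamped minimum.
 */
final class CleanupSettings {
    private static final Logger LOG =
            Logger.getLogger(CleanupSettings.class.getName());

    /** Minimum named in the review ("Default is 3. Minimum value is 3."). */
    static final int MIN_SNAP_RETAIN_COUNT = 3;

    // Assigned once in the constructor, so they can all be final.
    private final File snapDir;
    private final File dataLogDir;
    private final int snapRetainCount;
    private final int purgeInterval;

    /** The four parameters are an assumed reading of "these 4 parameters". */
    CleanupSettings(File snapDir, File dataLogDir,
                    int snapRetainCount, int purgeInterval) {
        this.snapDir = snapDir;
        this.dataLogDir = dataLogDir;
        this.snapRetainCount = clampRetainCount(snapRetainCount);
        this.purgeInterval = purgeInterval;
    }

    /**
     * Returns the requested value, or the minimum (with a warning) when the
     * request is too low. A config parser could call this helper as well.
     */
    static int clampRetainCount(int requested) {
        if (requested < MIN_SNAP_RETAIN_COUNT) {
            LOG.warning("snapRetainCount " + requested + " is below the minimum; using "
                    + MIN_SNAP_RETAIN_COUNT);
            return MIN_SNAP_RETAIN_COUNT;
        }
        return requested;
    }

    public File getSnapDir() { return snapDir; }
    public File getDataLogDir() { return dataLogDir; }
    public int getSnapRetainCount() { return snapRetainCount; }
    public int getPurgeInterval() { return purgeInterval; }
}
```

Because nothing here reads a configuration object, a test can construct the settings directly with whatever values it needs, which is the testing benefit the review points out.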
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/#review1120 ---

1) src/recipes/election has been added recently to trunk, are changes needed there as well? (see my comments below - seems like if we separated out pkg building from regular build it would make this more explicit/obvious)

2) I've been wondering for a while, perhaps we should have different scripts for zkServer.sh depending upon whether running in development mode or running in package mode. There's a lot of cruft in there having to do with determining which mode we are in and then setting up appropriately (zkCli, zkServer, zkCleanup, etc...).

./bin/zkEnv.sh
https://reviews.apache.org/r/1143/#comment2312
this was recently changed by ZOOKEEPER-1084 to either use the variable if passed, or use ../conf (but not ../etc)

./build.xml
https://reviews.apache.org/r/1143/#comment2310
in my case aclocal does not reside in /usr/local... (ubuntu natty) but rather /usr/share. Can we determine this in some other way?

./build.xml
https://reviews.apache.org/r/1143/#comment2311
I think this is going to cause problems for our normal release - we don't compile the c bindings as part of this process. This should be separated out. I think what we'd really need is to have separate targets for building the source artifact, and any additional binary convenience artifacts. Avro does this very successfully. How about separating out the building of packages entirely from building/testing the java code? Say by having a separate build-packages.xml (ant build file) instead?

./ivy.xml
https://reviews.apache.org/r/1143/#comment2296
jdeb is listed as a default dependency, shouldn't this only be used for building, not default? (similar to say rat)

./src/contrib/zkpython/ivy.xml
https://reviews.apache.org/r/1143/#comment2314
sorry, why do we need this in zkpython?

./src/contrib/zkpython/ivy.xml
https://reviews.apache.org/r/1143/#comment2298
replace Hadoop with ZooKeeper

./src/contrib/zkpython/src/packages/deb/zkpython.control/control
https://reviews.apache.org/r/1143/#comment2299
maintainer should be dev@zookeeper

./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec
https://reviews.apache.org/r/1143/#comment2300
Hadoop?

./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec
https://reviews.apache.org/r/1143/#comment2301
zookeeper rather than hadoop?

./src/packages/deb/init.d/zookeeper
https://reviews.apache.org/r/1143/#comment2304
hadoop-zookeeper

./src/packages/deb/zookeeper.control/control
https://reviews.apache.org/r/1143/#comment2306
dev@zookeeper ?

./src/packages/rpm/spec/zookeeper.spec
https://reviews.apache.org/r/1143/#comment2307
zookeeper

./src/packages/rpm/spec/zookeeper.spec
https://reviews.apache.org/r/1143/#comment2308
zookeeper?

./src/packages/templates/conf/zoo.cfg
https://reviews.apache.org/r/1143/#comment2309
is there a way to not duplicate this? (a sample is also in conf)

./src/packages/update-zookeeper-env.sh
https://reviews.apache.org/r/1143/#comment2303
the group used is hadoop? (seems fine, just wondering...)

- Patrick

On 2011-07-19 22:15:38, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-19 22:15:38) Review request for zookeeper and Mahadev Konar. Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999.
https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs - ./bin/zkCleanup.sh 1141173 ./bin/zkCli.sh 1141173 ./bin/zkEnv.sh 1141173 ./bin/zkServer.sh 1141173 ./build.xml 1141176 ./ivy.xml 1141173 ./src/contrib/build-contrib.xml 1141173 ./src/contrib/build.xml 1141173 ./src/contrib/zkpython/build.xml 1141173 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1141173 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE
Re: Review Request: ZOOKEEPER-999 Create an package integration project
> On 2011-07-19 22:46:10, Patrick Hunt wrote:
> > 1) src/recipes/election has been added recently to trunk, are changes needed there as well? (see my comments below - seems like if we separated out pkg building from regular build it would make this more explicit/obvious)
> > 2) I've been wondering for a while, perhaps we should have different scripts for zkServer.sh depending upon whether running in development mode or running in package mode. There's a lot of cruft in there having to do with determining which mode we are in and then setting up appropriately (zkCli, zkServer, zkCleanup, etc...).

What do you think of this idea (2 in particular?) I guess hadoop is fine with this mixed use? Perhaps we can take some ideas from there and apply them?

> On 2011-07-19 22:46:10, Patrick Hunt wrote:
> > ./bin/zkEnv.sh, lines 31-35
> > https://reviews.apache.org/r/1143/diff/1/?file=26344#file26344line31
> >
> > this was recently changed by ZOOKEEPER-1084 to either use the variable if passed, or use ../conf (but not ../etc)
>
> Eric Yang wrote:
>     Hadoop stack software used the same pattern *_HOME/conf for the configuration directory. This naming convention doesn't work when *_HOME/conf is collapsed into a single directory (i.e. /usr). A proposal was made in HADOOP-6255 to address this issue. For packaged software, $PREFIX/etc/$project will be the naming convention for the configuration directory. For developers, it will use $PREFIX/conf for a source code build. This patch already merged the change for ZOOKEEPER-1084; it will honor $ZOOCFGDIR if it is defined.

We should look at moving to best practices, esp if hadoop already figured this out. Would you mind starting a ZK wiki page detailing both the bin/* scripts and how they fit into packaging? Or perhaps we can document it within the scripts themselves? I'd like to see us start deprecating (and eventually removing) the old ways of doing things and moving people to whatever our current best practices are.

> On 2011-07-19 22:46:10, Patrick Hunt wrote:
> > ./build.xml, line 155
> > https://reviews.apache.org/r/1143/diff/1/?file=26346#file26346line155
> >
> > in my case aclocal does not reside in /usr/local... (ubuntu natty) but rather /usr/share. Can we determine this in some other way?
>
> Eric Yang wrote:
>     I am fine to use /usr/share as default. It is also possible to overwrite this in build.properties. Conditional checking to set a property is not pretty in ant, it may be better to leave this as a property for now.

I'm fine either way - as long as it doesn't impact the ability for people to easily check out the code and build. I wasn't sure how this would affect that.

> On 2011-07-19 22:46:10, Patrick Hunt wrote:
> > ./src/packages/update-zookeeper-env.sh, line 131
> > https://reviews.apache.org/r/1143/diff/1/?file=26365#file26365line131
> >
> > the group used is hadoop? (seems fine, just wondering...)
>
> Eric Yang wrote:
>     I set the full stack of the software to be owned by group hadoop for easier file ownership management.

Seems fine. Are you planning to participate in Bigtop? Would love to have you involved. It would be great to nail all this type of thing down so that it will be consistent. (or make it easy to configure if there's no consensus, ie figure out what ppl agree on and what not).

> On 2011-07-19 22:46:10, Patrick Hunt wrote:
> > ./src/packages/templates/conf/zoo.cfg, lines 1-12
> > https://reviews.apache.org/r/1143/diff/1/?file=26364#file26364line1
> >
> > is there a way to not duplicate this? (a sample is also in conf)
>
> Eric Yang wrote:
>     How about generating conf/zoo_sample.cfg from src/packages/templates/conf/zoo.cfg as part of the build process?

My concerns are: 1) make it easy for ppl to get started with development, 2) try to limit duplication, esp stuff we're likely to forget to update, 3) generated code should go into build (eventually target once maven is here). Perhaps a conf/zoo_template.cfg that could be used by 1), and generate into build/.../src/packages/templates/conf/zoo.cfg for use in packaging?
- Patrick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/#review1120 --- On 2011-07-19 22:15:38, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-19 22:15:38) Review request for zookeeper and Mahadev Konar. Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999. https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs - ./bin/zkCleanup.sh 1141173
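The configuration-directory lookup order Eric describes in the zkEnv.sh exchange (honor $ZOOCFGDIR if defined, otherwise $PREFIX/etc/$project for a packaged install and $PREFIX/conf for a source build) can be written down as a small function. This is only a sketch of that precedence in Java; the real logic lives in a shell script, and the class name ConfigDirResolver and the packagedInstall flag (how the mode is detected) are assumptions, not part of the patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical illustration of the lookup order; not code from the patch. */
final class ConfigDirResolver {
    private ConfigDirResolver() { }

    /**
     * @param env             environment variables (e.g. System.getenv())
     * @param prefix          install prefix, the $PREFIX of the discussion
     * @param packagedInstall assumed flag: true for an RPM/deb style layout
     */
    static Path resolve(Map<String, String> env, Path prefix, boolean packagedInstall) {
        // 1) An explicit ZOOCFGDIR always wins.
        String override = env.get("ZOOCFGDIR");
        if (override != null && !override.isEmpty()) {
            return Paths.get(override);
        }
        // 2) Packaged layout: $PREFIX/etc/$project.
        if (packagedInstall) {
            return prefix.resolve("etc").resolve("zookeeper");
        }
        // 3) Developer layout: $PREFIX/conf.
        return prefix.resolve("conf");
    }
}
```

For example, calling resolve(System.getenv(), Paths.get("/usr"), true) with ZOOCFGDIR unset gives /usr/etc/zookeeper under this convention.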
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-20 17:45:08.665910) Review request for zookeeper and Mahadev Konar. Changes --- updated from Eric. Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999. https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs (updated) - ./bin/zkCleanup.sh 1148587 ./bin/zkCli.sh 1148587 ./bin/zkEnv.sh 1148587 ./bin/zkServer.sh 1148587 ./build.xml 1148587 ./ivy.xml 1148587 ./src/contrib/build-contrib.xml 1148587 ./src/contrib/build.xml 1148587 ./src/contrib/zkpython/build.xml 1148587 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1148587 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE-CREATION ./src/packages/rpm/spec/zookeeper.spec PRE-CREATION ./src/packages/templates/conf/zookeeper-env.sh PRE-CREATION ./src/packages/update-zookeeper-env.sh PRE-CREATION ./src/recipes/build-recipes.xml 1148587 ./src/recipes/build.xml 1148587 ./src/recipes/lock/build.xml 1148587 ./src/recipes/queue/build.xml 1148587 Diff: https://reviews.apache.org/r/1143/diff Testing --- Thanks, Patrick
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-21 05:31:05.468929) Review request for zookeeper and Mahadev Konar. Changes --- update from Eric Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999. https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs (updated) - ./bin/zkCleanup.sh 1148587 ./bin/zkCli.sh 1148587 ./bin/zkEnv.sh 1148587 ./bin/zkServer.sh 1148587 ./build.xml 1148587 ./ivy.xml 1148587 ./src/contrib/build-contrib.xml 1148587 ./src/contrib/build.xml 1148587 ./src/contrib/zkpython/build.xml 1148587 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1148587 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE-CREATION ./src/packages/rpm/spec/zookeeper.spec PRE-CREATION ./src/packages/templates/conf/zookeeper-env.sh PRE-CREATION ./src/packages/update-zookeeper-env.sh PRE-CREATION ./src/recipes/build-recipes.xml 1148587 ./src/recipes/build.xml 1148587 ./src/recipes/lock/build.xml 1148587 ./src/recipes/queue/build.xml 1148587 Diff: https://reviews.apache.org/r/1143/diff Testing --- Thanks, Patrick
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/#review1149 ---

I ran ant deb successfully, although I have not yet tried to install it. I noticed a couple issues with the generated package.

shouldn't /etc/zookeeper contain zoo.cfg?

    drwxr-xr-x 0/0    0 2011-07-20 22:41 ./etc/
    drwxr-xr-x 0/0    0 2011-07-20 22:41 ./etc/zookeeper/
    -rw-r--r-- 0/0  535 2011-07-20 22:41 ./etc/zookeeper/configuration.xsl
    -rw-r--r-- 0/0 2161 2011-07-20 22:41 ./etc/zookeeper/log4j.properties
    -rw-r--r-- 0/0  447 2011-07-20 22:41 ./etc/zookeeper/zoo_sample.cfg

shouldn't we be creating (if doesn't exist) the /var/lib/zookeeper directory? ZK does this currently, however I'm planning to file a bug for this - really we shouldn't come up if we can't find this directory (handles case of misconfiguration - we stop rather than start with an empty data hierarchy)

Can you add some documentation? Nothing fancy, perhaps a README_packaging.txt at the toplevel that describes the currently supported packages, some basic information about them, how to build ant deb etc..., additional requirements to build, etc... something basic to help out someone trying to build the packages.

./src/packages/deb/zookeeper.control/control
https://reviews.apache.org/r/1143/#comment2380
ZooKeeper only supports sun jdk/jre, not openjdk

- Patrick

On 2011-07-21 05:31:05, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-21 05:31:05) Review request for zookeeper and Mahadev Konar. Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999.
https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs - ./bin/zkCleanup.sh 1148587 ./bin/zkCli.sh 1148587 ./bin/zkEnv.sh 1148587 ./bin/zkServer.sh 1148587 ./build.xml 1148587 ./ivy.xml 1148587 ./src/contrib/build-contrib.xml 1148587 ./src/contrib/build.xml 1148587 ./src/contrib/zkpython/build.xml 1148587 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1148587 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE-CREATION ./src/packages/rpm/spec/zookeeper.spec PRE-CREATION ./src/packages/templates/conf/zookeeper-env.sh PRE-CREATION ./src/packages/update-zookeeper-env.sh PRE-CREATION ./src/recipes/build-recipes.xml 1148587 ./src/recipes/build.xml 1148587 ./src/recipes/lock/build.xml 1148587 ./src/recipes/queue/build.xml 1148587 Diff: https://reviews.apache.org/r/1143/diff Testing --- Thanks, Patrick
Out of memory running ZK unit tests against trunk
I've never seen this before, but in my CI environment (sun jdk 1.6.0_20) I'm seeing some intermittent failures such as the following. Has anyone added/modified tests for 3.4.0 that might be using more threads/memory than previously? Creating ZK clients but not closing them, etc...

java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:597)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.start(NIOServerCnxnFactory.java:114)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:406)
    at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:186)
    at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103)
    at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67)
    at org.apache.zookeeper.test.QuorumZxidSyncTest.setUp(QuorumZxidSyncTest.java:39)

Patrick
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-22 18:24:20.565434) Review request for zookeeper and Mahadev Konar. Changes --- update 8 from eric Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999. https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs (updated) - ./README_packaging.txt PRE-CREATION ./bin/zkCleanup.sh 1148587 ./bin/zkCli.sh 1148587 ./bin/zkEnv.sh 1148587 ./bin/zkServer.sh 1148587 ./build.xml 1148587 ./ivy.xml 1148587 ./src/contrib/build-contrib.xml 1148587 ./src/contrib/build.xml 1148587 ./src/contrib/zkpython/build.xml 1148587 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1148587 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE-CREATION ./src/packages/rpm/spec/zookeeper.spec PRE-CREATION ./src/packages/templates/conf/zookeeper-env.sh PRE-CREATION ./src/packages/update-zookeeper-env.sh PRE-CREATION ./src/recipes/build-recipes.xml 1148587 ./src/recipes/build.xml 1148587 ./src/recipes/lock/build.xml 1148587 ./src/recipes/queue/build.xml 1148587 Diff: https://reviews.apache.org/r/1143/diff Testing --- Thanks, Patrick
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/#review1169 ---

The packaging code itself looks pretty good now. I've moved on to verifying the installation, however I see issues installing the deb on a clean ubuntu 10.10. Can you try generating the packages, install them and work through these? (ie test on ubuntu and rhel/centos).

1) the following during install:

    Setting up zookeeper (3.4.0) ...
    ln: creating symbolic link `/usr/etc/zookeeper': No such file or directory

2) I'm unable to start the server, seems like a number of issues:
a) why /usr/sbin/../etc/zookeeper for config, should just be /etc/zookeeper no?
b) permission issues, notice I am running this as sudo.

    sudo /etc/init.d/zookeeper start
     * Starting Apache ZooKeeper server zookeeper
    JMX enabled by default
    Using config: /usr/sbin/../etc/zookeeper/zoo.cfg
    grep: /usr/sbin/../etc/zookeeper/zoo.cfg: No such file or directory
    mkdir: cannot create directory `': No such file or directory
    Starting zookeeper ... /usr/bin/../sbin/zkServer.sh: line 109: /zookeeper_server.pid: Permission denied
    FAILED TO WRITE PID
    /usr/bin/../sbin/zkServer.sh: line 105: ./zookeeper.out: Permission denied
    [fail]

./src/packages/deb/zookeeper.control/control
https://reviews.apache.org/r/1143/#comment2429
should we depend on either the jre or the jdk?

./README_packaging.txt
https://reviews.apache.org/r/1143/#comment2435
Nice! Would be nice to add - when I build where are the generated deb/rpm files placed? Are there any requirements for building? (I don't think so, but I'm not sure)

./README_packaging.txt
https://reviews.apache.org/r/1143/#comment2437
It's great to document the source layout, could we also provide some detail on where things are placed on the install machine? (high level is fine).

./src/contrib/zkpython/src/packages/deb/zkpython.control/control
https://reviews.apache.org/r/1143/#comment2431
missing license - is there a way to add comments here or no?

./src/packages/deb/zookeeper.control/conffile
https://reviews.apache.org/r/1143/#comment2432
license?

./src/packages/templates/conf/zookeeper-env.sh
https://reviews.apache.org/r/1143/#comment2433
need license header here.

- Patrick

On 2011-07-22 18:24:20, Patrick Hunt wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-22 18:24:20) Review request for zookeeper and Mahadev Konar. Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999. https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs - ./README_packaging.txt PRE-CREATION ./bin/zkCleanup.sh 1148587 ./bin/zkCli.sh 1148587 ./bin/zkEnv.sh 1148587 ./bin/zkServer.sh 1148587 ./build.xml 1148587 ./ivy.xml 1148587 ./src/contrib/build-contrib.xml 1148587 ./src/contrib/build.xml 1148587 ./src/contrib/zkpython/build.xml 1148587 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1148587 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE-CREATION ./src/packages/rpm/spec/zookeeper.spec PRE-CREATION ./src/packages/templates/conf/zookeeper-env.sh PRE-CREATION ./src/packages/update-zookeeper-env.sh PRE-CREATION ./src/recipes/build-recipes.xml 1148587 ./src/recipes/build.xml 1148587 ./src/recipes/lock/build.xml 1148587 ./src/recipes/queue/build.xml 1148587 Diff:
https://reviews.apache.org/r/1143/diff Testing --- Thanks, Patrick
Re: FW: Does abrupt kill corrupts the datadir?
ZK has been built around the fail fast approach. In order to maintain high availability we want to ensure that restarting a server will result in it attempting to rejoin the quorum. IMO we would not want to change this (kill -9).

Patrick

On Tue, Jul 26, 2011 at 2:02 AM, Laxman lakshman...@huawei.com wrote:
> Hi Everyone,
> Any thoughts? Do we need to consider changing the abrupt shutdown?
>
> Implementations in some other hadoop eco system projects, for your reference:
>   Hadoop - kill [SIGTERM]
>   HBase - kill [SIGTERM] and then kill -9 [SIGKILL] if the process hung
>   ZooKeeper - kill -9 [SIGKILL]
>
> -----Original Message-----
> From: Laxman [mailto:lakshman...@huawei.com]
> Sent: Wednesday, July 13, 2011 12:36 PM
> To: 'dev@zookeeper.apache.org'
> Subject: RE: Does abrupt kill corrupts the datadir?
>
> Hi Mahadev,
> A shutdown hook is just a quick thought. Another approach is to simply issue a kill [SIGTERM], which the process can interpret. A first look at kill -9 raised the following scenario: in the worst case, if the latest snaps on all zookeeper nodes get corrupted, there is a chance of data loss. How can zookeeper deal with this scenario gracefully? Also, I feel we should give the application a chance to shut down gracefully before an abrupt shutdown.
>
> http://en.wikipedia.org/wiki/SIGKILL
> Because SIGKILL gives the process no opportunity to do cleanup operations on terminating, in most system shutdown procedures an attempt is first made to terminate processes using SIGTERM, before resorting to SIGKILL.
>
> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
> The application can determine what it wants to do once a SIGTERM is received. While most applications will clean up their resources and stop, some may not. An application may be configured to do something completely different when a SIGTERM is received. Also, if the application is in a bad state, such as waiting for disk I/O, it may not be able to act on the signal that was sent. Most system administrators will usually resort to the more abrupt signal when an application doesn't respond to a SIGTERM.
>
> -----Original Message-----
> From: Mahadev Konar [mailto:maha...@hortonworks.com]
> Sent: Wednesday, July 13, 2011 12:02 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Does abrupt kill corrupts the datadir?
>
> Hi Laxman,
> The servers take care of all the issues with data integrity, so a kill -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure everything works reliably is to use kill -9 :).
> Thanks
> mahadev
>
> On 7/12/11 11:16 PM, Laxman lakshman...@huawei.com wrote:
> > When we stop zookeeper through zkServer.sh stop, we are aborting the zookeeper process using kill -9.
> >
> >     stop)
> >         echo -n "Stopping zookeeper ... "
> >         if [ ! -f "$ZOOPIDFILE" ]
> >         then
> >             echo "error: could not find file $ZOOPIDFILE"
> >             exit 1
> >         else
> >             $KILL -9 $(cat "$ZOOPIDFILE")
> >             rm "$ZOOPIDFILE"
> >             echo STOPPED
> >             exit 0
> >         fi
> >         ;;
> >
> > This may corrupt the snapshot and transaction logs. Also, it's not recommended to use kill -9. In the worst case, if the latest snaps on all zookeeper nodes get corrupted, there is a chance of data loss. How about introducing a shutdown hook which will ensure zookeeper is shut down gracefully when we call stop?
> > Note: This is just an observation; it was not found in a test.
> > --
> > Thanks,
> > Laxman
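For readers unfamiliar with the "SIGTERM first, SIGKILL if the process hung" pattern that Laxman attributes to HBase, the escalation can be sketched as below. This is not what zkServer.sh does, and the replies in this thread argue that kill -9 is intentional for ZooKeeper. The StopPolicy class, the ManagedProcess interface and the timeout parameters are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a term-then-kill escalation; not ZooKeeper code. */
final class StopPolicy {
    private StopPolicy() { }

    /** Minimal abstraction over "a process we can signal" (an assumption). */
    interface ManagedProcess {
        void requestTermination(); // polite request, the SIGTERM analogue
        void forceKill();          // abrupt stop, the SIGKILL analogue
        boolean isAlive();
    }

    enum Outcome { EXITED_AFTER_TERM, KILLED }

    /**
     * Asks the process to terminate, polls until it exits or the grace
     * period runs out, then escalates to a forced kill.
     */
    static Outcome stop(ManagedProcess process, long graceMillis, long pollMillis) {
        process.requestTermination();
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(graceMillis);
        while (process.isAlive()) {
            if (System.nanoTime() - deadline >= 0) {
                process.forceKill();
                return Outcome.KILLED;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Caller wants us to hurry: restore the flag and escalate now.
                Thread.currentThread().interrupt();
                process.forceKill();
                return Outcome.KILLED;
            }
        }
        return Outcome.EXITED_AFTER_TERM;
    }
}
```

The trade-off discussed in the thread still applies: a server that must survive kill -9 anyway gains little from the polite path, which is the fail-fast argument made in the replies.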
reminder to committers: new source files must have license headers
In running the release audit for 3.4.0 branch I see a number of new files w/o licenses: https://issues.apache.org/jira/browse/ZOOKEEPER-1138 in general new files must have license headers - this is esp the case for source code, scripts, etc... When adding new files to SVN be sure to review the license status of each. Patrick
Re: Review Request: ZOOKEEPER-999 Create an package integration project
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1143/ --- (Updated 2011-07-28 07:13:52.220703) Review request for zookeeper and Mahadev Konar. Changes --- update 10 from eric Summary --- This goal of this ticket is to generate a set of RPM/debian package which integrate well with RPM sets created by HADOOP-6255. This addresses bug ZOOKEEPER-999. https://issues.apache.org/jira/browse/ZOOKEEPER-999 Diffs (updated) - ./README_packaging.txt PRE-CREATION ./bin/zkCleanup.sh 1151144 ./bin/zkCli.sh 1151144 ./bin/zkEnv.sh 1151144 ./bin/zkServer.sh 1151144 ./build.xml 1151144 ./ivy.xml 1151144 ./src/contrib/build-contrib.xml 1151144 ./src/contrib/build.xml 1151144 ./src/contrib/zkpython/build.xml 1151144 ./src/contrib/zkpython/ivy.xml PRE-CREATION ./src/contrib/zkpython/src/packages/deb/zkpython.control/control PRE-CREATION ./src/contrib/zkpython/src/packages/rpm/spec/zkpython.spec PRE-CREATION ./src/contrib/zkpython/src/python/setup.py 1151144 ./src/packages/deb/init.d/zookeeper PRE-CREATION ./src/packages/deb/zookeeper.control/conffile PRE-CREATION ./src/packages/deb/zookeeper.control/control PRE-CREATION ./src/packages/deb/zookeeper.control/postinst PRE-CREATION ./src/packages/deb/zookeeper.control/postrm PRE-CREATION ./src/packages/deb/zookeeper.control/preinst PRE-CREATION ./src/packages/deb/zookeeper.control/prerm PRE-CREATION ./src/packages/rpm/init.d/zookeeper PRE-CREATION ./src/packages/rpm/spec/zookeeper.spec PRE-CREATION ./src/packages/templates/conf/zookeeper-env.sh PRE-CREATION ./src/packages/update-zookeeper-env.sh PRE-CREATION ./src/recipes/build-recipes.xml 1151144 ./src/recipes/build.xml 1151144 ./src/recipes/lock/build.xml 1151144 ./src/recipes/queue/build.xml 1151144 Diff: https://reviews.apache.org/r/1143/diff Testing --- Thanks, Patrick
Re: Out of memory running ZK unit tests against trunk
I tracked this down to a low ulimit setting on the particular jenkins host where this was failing (max processes). Specifically the following test was failing on trunk, but not on branch 3_3, which concerns me ./src/java/test/org/apache/zookeeper/test/QuorumZxidSyncTest.java there haven't been any real changes to this test between versions, any insight into why the server is using more threads in trunk vs branch33? Patrick On Fri, Jul 22, 2011 at 10:58 AM, Patrick Hunt ph...@apache.org wrote: I've never seen this before, but in my CI environment (sun jdk 1.6.0_20) I'm seeing some intermittent failures such as the following. Has anyone added/modified tests for 3.4.0 that might be using more threads/memory than previously? Creating ZK clients but not closing them, etc... java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.zookeeper.server.NIOServerCnxnFactory.start(NIOServerCnxnFactory.java:114) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:406) at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:186) at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103) at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67) at org.apache.zookeeper.test.QuorumZxidSyncTest.setUp(QuorumZxidSyncTest.java:39) Patrick
Re: Out of memory running ZK unit tests against trunk
Hi Laxman, you want to take a stab at it? https://issues.apache.org/jira/browse/ZOOKEEPER-1140 Can you followup with Flavio/Henry about the readonly issue? Shouldn't such a feature only be enabled when R/O support is enabled? (my assumption is that it should be off by default, on via configuration option) Patrick On Fri, Jul 29, 2011 at 7:00 AM, Laxman lakshman...@huawei.com wrote: In QuorumPeer, when the peer is in LOOKING state we are starting ReadOnlyZooKeeperServer in a separate thread. And we are shutting down this server even before startup which has no effect. Also, as this is not a blocking call QP keeps on spawning new servers. 1) ReadOnlyZooKeeperServer.startup() need not be called in separate a thread. 2) ReadOnlyZooKeeperServer.startup() is not a blocking call. Need to introduce a method like Leader.lead(), Follower.followLeader() 3) Shutdown should be called only after the a/m blocking call is returned. -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Friday, July 29, 2011 6:24 AM To: dev@zookeeper.apache.org Subject: Re: Out of memory running ZK unit tests against trunk Near the end of this test (QuorumZxidSyncTest) there are tons of threads running - 115 ProcessThread threads, similar numbers of SessionTracker. Also I see ~100 ReadOnlyRequestProcessor - why is this running as a separate thread? (henry/flavio?) Regardless, I'll enter a 3.4.0 blocker to clean this up - I suspect that the server shutdown is not shutting down fully for some reason. Patrick On Thu, Jul 28, 2011 at 5:28 PM, Mahadev Konar maha...@hortonworks.com wrote: Nice find Pat. I cant see a reason on why that should happen. Can we just do a stack dump and compare? thanks mahadev On Thu, Jul 28, 2011 at 1:54 PM, Patrick Hunt ph...@apache.org wrote: I tracked this down to a low ulimit setting on the particular jenkins host where this was failing (max processes). 
Specifically the following test was failing on trunk, but not on branch 3_3, which concerns me ./src/java/test/org/apache/zookeeper/test/QuorumZxidSyncTest.java there haven't been any real changes to this test between versions, any insight into why the server is using more threads in trunk vs branch33? Patrick On Fri, Jul 22, 2011 at 10:58 AM, Patrick Hunt ph...@apache.org wrote: I've never seen this before, but in my CI environment (sun jdk 1.6.0_20) I'm seeing some intermittent failures such as the following. Has anyone added/modified tests for 3.4.0 that might be using more threads/memory than previously? Creating ZK clients but not closing them, etc... java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.zookeeper.server.NIOServerCnxnFactory.start(NIOServerCnxnFactory.java:114) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:406) at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:186) at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103) at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67) at org.apache.zookeeper.test.QuorumZxidSyncTest.setUp(QuorumZxidSyncTest.java:39) Patrick
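For anyone trying to reproduce the thread counts mentioned in this thread (115 ProcessThread, ~100 ReadOnlyRequestProcessor), a rough sketch of a per-name thread census follows. It is not part of the ZooKeeper test suite; the class name ThreadCensus and the grouping rule (leading run of letters in the thread name) are assumptions made for illustration, and the only JDK call relied on is Thread.getAllStackTraces().

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
    /**
     * Groups thread names by their leading run of letters, so that
     * "ProcessThread:-1" and "ProcessThread:3" land in the same bucket.
     * (Hypothetical grouping rule, chosen only for this sketch.)
     */
    public static String family(String threadName) {
        int i = 0;
        while (i < threadName.length() && Character.isLetter(threadName.charAt(i))) {
            i++;
        }
        return i == 0 ? threadName : threadName.substring(0, i);
    }

    /** Counts live threads in this JVM per name family. */
    public static Map<String, Integer> census() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(family(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per family: count, then name prefix.
        census().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Calling census() at the end of a test's tearDown and comparing trunk against branch 3.3 would show which thread families are not being shut down.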
Re: FW: Does abrupt kill corrupts the datadir?
Andrei, you might find this useful for such testing: https://github.com/toddlipcon/gremlins Patrick On Thu, Jul 28, 2011 at 4:14 PM, Andrei Savu savu.and...@gmail.com wrote: I've been doing some testing in the past for this scenario and I've seen no data loss over an extended period of time (a day). Testing steps: 0. start an ensemble running 5 servers 1. start a workload generator (e.g. push a strictly increasing sequence of numbers to a queue stored in zookeeper) every few seconds: kill the cluster leader (-9) and restart You should be careful how you handle ConnectionLossException and OperationTimeoutException You can find the code for this test here (executed against the trunk version): https://github.com/andreisavu/zookeeper-mq -- Andrei Savu / andreisavu.ro On Thu, Jul 28, 2011 at 9:05 AM, Benjamin Reed br...@apache.org wrote: almost everything we do in zookeeper is to make sure that we don't lose data in much worse scenarios. the probability of a loss in this scenario is really just the probability of a bug in the code. i don't think that kill -TERM vs kill -KILL changes that probability at all either way. ben On Thu, Jul 28, 2011 at 12:50 AM, Laxman lakshman...@huawei.com wrote: Thanks for the responses Mahadev, Pat and Ben. I understand your explanation. My only question is: will there be any probability of data loss in the scenario mentioned? In worst case, if latest snaps in all zookeeper nodes gets corrupted there is a chance of data loss. if we use sigterm in the script, we would want to put a timeout in to escalate to a -9 As Ben mentioned, even if we escalate to kill -9 to ensure shutdown, still we may have data loss. But the probability is very less by giving a chance to shutdown gracefully. Please do correct me if my understanding is wrong. -- Laxman -Original Message- From: Benjamin Reed [mailto:br...@apache.org] Sent: Thursday, July 28, 2011 11:40 AM To: dev@zookeeper.apache.org Subject: Re: FW: Does abrupt kill corrupts the datadir? i agree with pat. 
if we use sigterm in the script, we would want to put a timeout in to escalate to a -9 which makes the script a bit more complicated without reason since we don't have any exit hooks that we want to run. zookeeper is designed to recover well from hard failures, much worse than a kill -9. i don't think we want to change that. ben On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt ph...@apache.org wrote: ZK has been built around the fail fast approach. In order to maintain high availability we want to ensure that restarting a server will result in it attempting to rejoin the quorum. IMO we would not want to change this (kill -9). Patrick On Tue, Jul 26, 2011 at 2:02 AM, Laxman lakshman...@huawei.com wrote: Hi Everyone, Any thoughts? Do we need consider changing abrupt shutdown to Implementations in some other hadoop eco system projects for your reference. Hadoop - kill [SIGTERM] HBase - kill [SIGTERM] and then kill -9 [SIGKILL] if process hung ZooKeeper - kill -9 [SIGKILL] -Original Message- From: Laxman [mailto:lakshman...@huawei.com] Sent: Wednesday, July 13, 2011 12:36 PM To: 'dev@zookeeper.apache.org' Subject: RE: Does abrupt kill corrupts the datadir? Hi Mahadev, Shutdown hook is just a quick thought. Another approach can be just give a kill [SIGTERM] call which can be interpreted by process. First look at the kill -9 triggered the following scenario. In worst case, if latest snaps in all zookeeper nodes gets corrupted there is a chance of dataloss. How does zookeeper can deal with this scenario gracefully? Also, I feel we should give a chance to application to shutdown gracefully before abrupt shutdown. http://en.wikipedia.org/wiki/SIGKILL Because SIGKILL gives the process no opportunity to do cleanup operations on terminating, in most system shutdown procedures an attempt is first made to terminate processes using SIGTERM, before resorting to SIGKILL. 
http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/ The application can determine what it wants to do once a SIGTERM is received. While most applications will clean up their resources and stop, some may not. An application may be configured to do something completely different when a SIGTERM is received. Also, if the application is in a bad state, such as waiting for disk I/O, it may not be able to act on the signal that was sent. Most system administrators will usually resort to the more abrupt signal when an application doesn't respond to a SIGTERM. -Original Message- From: Mahadev Konar [mailto:maha...@hortonworks.com] Sent: Wednesday, July 13, 2011 12:02 PM To: dev@zookeeper.apache.org Subject: Re: Does abrupt kill corrupts the datadir? Hi Laxman, The servers takes care of all the issues with data integrity, so a kill -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure everything works reliably is use kill -9
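The "SIGTERM first, then escalate to -9 after a timeout" pattern discussed in this thread (what the HBase scripts are described as doing) can be sketched as below. This is not what zkServer.sh does; it is only an illustration of the escalation logic. The class and method names are hypothetical, the `sleep` child is a stand-in for a server process, and the mapping of destroy() to SIGTERM and destroyForcibly() to SIGKILL is typical of Unix JVMs but is implementation dependent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

public class GracefulStop {
    /**
     * Asks the process to exit, waits up to graceMillis, then forces termination.
     * Returns true if the process exited on its own within the grace period.
     */
    public static boolean stop(Process p, long graceMillis) {
        p.destroy(); // on typical Unix JVMs this delivers SIGTERM
        try {
            if (p.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // fall through and force the kill
        }
        p.destroyForcibly(); // typically SIGKILL; the target cannot catch it
        return false;
    }

    /** Hypothetical stand-in for a server process; assumes a Unix `sleep` on the PATH. */
    public static Process spawnSleeper() {
        try {
            return new ProcessBuilder("sleep", "60").start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Process p = spawnSleeper();
        System.out.println("exited gracefully: " + stop(p, 5000));
    }
}
```

As Ben points out, the extra timeout logic only pays off if the process has shutdown work worth running; without exit hooks, the escalation path adds complexity and no safety.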
Re: 3.4 Release.
Eugene you saw my email about that test and ulimit right? Make sure you have ulimit for max files and max processes 1k, that was the issue I had (max proc was set to 1024). Patrick On Tue, Aug 2, 2011 at 4:35 PM, Eugene Koontz ekoo...@hiro-tan.org wrote: On 8/1/11 7:49 PM, Mahadev Konar wrote: Looks like jenkins is still having issues. Until then we can fix the Open issues we have: ZOOKEEPER-1125: Intermittent java core test failures Vishal do you want to take this up? ZOOKEEPER-839: Ill upload a patch by tonight. ZOOKEEPER-1136. Ben would you be taking this up? ZOOKEEPER-1055. Eugene, do you have sometime to take a look at this? Hi Mahadev, I refreshed the patch for ZOOKEEPER-1055. I am having trouble getting QuorumZxidSyncTest to pass, on EC2. The test works locally with ant junit.run -Dtest.output=yes -Dtestcase=QuorumZxidSyncTest, though. This odd behavior is the topic of ZOOKEEPER-1125. -Eugene
Re: 3.4 Release.
What type of ec2 instance are you running on? I've seen some failures due to underpowered/underresourced systems. Is ObserverTest consistently failing? Patrick On Tue, Aug 2, 2011 at 6:28 PM, Eugene Koontz ekoo...@hiro-tan.org wrote: On 8/2/11 5:26 PM, Patrick Hunt wrote: Eugene you saw my email about that test and ulimit right? Make sure you have ulimit for max files and max processes 1k, that was the issue I had (max proc was set to 1024). Patrick Thank you Patrick, no; I missed your earlier email about max processes. I increased my max processes to 10 by adding to /etc/security/limits.conf: ec2-user soft nproc 10 ec2-user hard nproc 10 It was great to see that QuorumZxidSyncTest passed after doing this (as well as ReadOnlyModeTest, which I neglected to mention, also failed before). Unfortunately, now ObserverTest fails. So let me try increasing nofiles similarly and see if that helps (although I doubt this would be a factor, because it passed earlier with its current limit of only 1024 max open files). -Eugene
Re: 3.4 Release.
Seems the observer (or the quorum itself) is failing to allow a client to connect: [junit] 2011-08-03 14:12:29,273 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:11229:QuorumPeer@701] - OBSERVING [junit] 2011-08-03 14:12:29,359 [myid:] - INFO [main:ZooKeeper@427] - Initiating client connection, connectString=127.0.0.1:11229 sessionTimeout=3 watcher=org.apache.zookeeper.test.ObserverTest@6490832e [junit] 2011-08-03 14:12:29,378 [myid:] - INFO [main-SendThread():ClientCnxn$SendThread@888] - Opening socket connection to server /127.0.0.1:11229 [junit] 2011-08-03 14:12:29,379 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:56250 [junit] 2011-08-03 14:12:29,379 [myid:] - INFO [main-SendThread(localhost:11229):ClientCnxn$SendThread@814] - Socket connection established to localhost/127.0.0.1:11229, initiating session [junit] 2011-08-03 14:12:29,380 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:ZooKeeperServer@833] - Client attempting to establish new session at /127.0.0.1:56250 [junit] 2011-08-03 14:12:53,356 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@419] - Shutting down Notice that last line, ~24 seconds go by. Please file a bug on this. Blocker for 3.4.0. Can you try re-running your test, but modify it to attempt to have a client connect to a non-observer in the case that connecting to the observer fails? It would be interesting to see if this was an observer specific issue or not. (another thing perhaps to try is just have the existing client connect to a non-observer rather than the observer, run it a bunch of times and see if it happens) Patrick On Wed, Aug 3, 2011 at 7:16 AM, Eugene Koontz ekoo...@hiro-tan.org wrote: On 8/2/11 10:32 PM, Patrick Hunt wrote: What type of ec2 instance are you running on? I've seen some failures due to underpowered/underresourced systems. Is ObserverTest consistently failing? Patrick Hi Patrick, It's an m1.large. 
I have ulimit -a set so that open files and open processes are at 100,000. If I run ObserverTest on its own using the attached shell script repeat.sh (src/repeat.sh ObserverTest), it usually fails within 20 iterations; (although in the following pastebin it took 38 iterations to fail). It always fails in the same place, at ObserverTest.java:101: http://pastebin.com/BGNUb05t -Eugene
Re: devops/admin/client question: What do you do when you rollback?
On Fri, Aug 5, 2011 at 9:01 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: Actually can I update the ConnectRequest protocol version number? If I can do that, I can have the server only send back the indicating ConnectResponse on clients with a higher protocol version. It doesn't look like it's read anywhere right now. (Moving this to dev since we've moved to a dev discussion) That's what I was going to suggest - upping the protocol version number for new clients. New servers can respond with ConnectionResponse2 if they see the new version, this response should have improved semantics. Otw they can just respond with the old version/resp. New clients will have to handle both types of responses. Patrick -Original Message- From: Fournier, Camille F. [Tech] Sent: Friday, August 05, 2011 11:57 AM To: 'u...@zookeeper.apache.org' Subject: RE: devops/admin/client question: What do you do when you rollback? Hmmm. I thought I had another way around this but I don't. We really didn't write the client to be easy to encode other errors in the connection result... I think any good solution will have to be in our 4.0 clojure rewrite ;) C -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Friday, August 05, 2011 11:51 AM To: u...@zookeeper.apache.org Subject: Re: devops/admin/client question: What do you do when you rollback? If you get the lower zxid from the leader then you know that things have gone south. Likewise, if you get a lower epoch number from a node that thinks that it is in quorum then things are not good. The definition of thinks it is in quorum is problematic of course. On Fri, Aug 5, 2011 at 10:57 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: Oh blah, of course it won't be b/w compatible, because all the older clients would expire their sessions in the instance of a single zxid higher than the cluster zxid which I doubt most people want. 
Is there a way to check if the zxid of the client is higher than the current possible zxid after connection, and send the session_expired then? That would at least help us out most of the way. -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Thursday, August 04, 2011 7:23 PM To: u...@zookeeper.apache.org Subject: Re: devops/admin/client question: What do you do when you rollback? Sounds reasonable to me as long as it's b/w compatible (which it seems like it would be), anything we can do to improve this situation would be huge - I frequently see our support team trying to address this (e.g. the max count exceeded issue) with clients like hbase. Def plus for supportability. Patrick On Thu, Aug 4, 2011 at 4:11 PM, Camille Fournier cami...@apache.org wrote: I'm thinking of hacking it through the connectresponse session timeout (similar to the way we detect session rejected). I wrote up a prototype that worked ok this way. Might could extend this hack to other things, using that field as an encoded error msg, thoughts? C On Aug 4, 2011 6:10 PM, Patrick Hunt ph...@apache.org wrote: Our error reporting server-client has always been weak. It's a PITA to debug in production because a lot of times when the client gets bounced it's not clear from the client side why (you end up having to search the server log - for example when maxClientCount is exceeded). It would be great to fix this, esp if the server could provide insight to the client about why (an error code/message perhaps). Doing it in a b/w compatible way might be tough though... Patrick On Thu, Aug 4, 2011 at 2:45 PM, Ted Dunning ted.dunn...@gmail.com wrote: This is used normally to guarantee in-order data views. If you get disconnected from one host in an advanced state and then connect to an out of date slave, ZK automatically disconnects you to avoid letting you see time go backwards. Your situation is different of course. On Thu, Aug 4, 2011 at 7:05 PM, Fournier, Camille F. 
camille.fourn...@gs.com wrote: Right now the server just detects that the zxid is wrong, and calls close on the client. The client logs: 15:01:47,593 - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1159] - Unable to read additional data from server sessionid 0x131962b0054, likely server has closed socket, closing socket connection and attempting reconnect (branch 3.3.3) I will poke around and see if I can figure out a nicer way to indicate this condition. The expired state is perfectly fine for me in my use case. C -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Thursday, August 04, 2011 1:51 PM To: u...@zookeeper.apache.org Subject: Re: devops/admin/client question: What do you do when you rollback? On Thu, Aug 4, 2011 at 10:29 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: We had an issue here the other day
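The version-gated response idea from the top of this thread (new servers answer with a richer response only when the client announces a newer protocol version, and keep the old response otherwise) might look roughly like the sketch below. None of these names come from the ZooKeeper code base: ConnectNegotiation, Response, the version constants and the reason string are all hypothetical, and the real ConnectRequest/ConnectResponse records are not modelled here.

```java
public class ConnectNegotiation {
    // Hypothetical protocol version numbers, for illustration only.
    public static final int LEGACY_PROTOCOL = 0;
    public static final int EXTENDED_PROTOCOL = 1;

    /** What the server sends back; reason is only populated for extended responses. */
    public static final class Response {
        public final int protocolVersion;
        public final boolean accepted;
        public final String reason; // always null for legacy clients

        Response(int protocolVersion, boolean accepted, String reason) {
            this.protocolVersion = protocolVersion;
            this.accepted = accepted;
            this.reason = reason;
        }
    }

    /**
     * Old clients get the old-style answer (no explanation), so their behavior
     * is unchanged; new clients additionally learn why they were rejected.
     */
    public static Response respond(int clientProtocolVersion, long clientZxid, long serverZxid) {
        boolean clientAhead = clientZxid > serverZxid;
        if (clientProtocolVersion >= EXTENDED_PROTOCOL) {
            String reason = clientAhead ? "client zxid is ahead of the server" : null;
            return new Response(EXTENDED_PROTOCOL, !clientAhead, reason);
        }
        return new Response(LEGACY_PROTOCOL, !clientAhead, null);
    }

    public static void main(String[] args) {
        Response r = respond(EXTENDED_PROTOCOL, 0x20L, 0x10L);
        System.out.println("accepted=" + r.accepted + " reason=" + r.reason);
    }
}
```

The point of the gate is the one Patrick makes: new clients must handle both response types, while old clients never see a field they do not understand.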
Re: Hadoop build machine update
IIRC there was a way to trigger patch builds for all patch available jiras (Nigel did this a while back) - Mahadev can you check with Giri? If not we'll either have to trigger the build manually (committers only) or the contributor would need to cancel patch then resubmit (no need to reattach the patch file itself). On Tue, Aug 9, 2011 at 11:01 PM, Mahadev Konar maha...@hortonworks.com wrote: Lakshman/others, Ill definitely send out a note on what to do. You are right that folks might have to re upload the patches to the jira. The slaves are still not fully functional yet. I expect them to be fully functional tomorrow late evening (US time). Hope that helps mahadev On Tue, Aug 9, 2011 at 10:44 PM, Laxman lakshman...@huawei.com wrote: Hi Mahadev, Do we need to resubmit the patches which were uploaded last week? Or Hudson will pick them up automatically? -Original Message- From: Mahadev Konar [mailto:maha...@hortonworks.com] Sent: Wednesday, August 10, 2011 8:14 AM To: dev@zookeeper.apache.org; Giridharan Kesavan Subject: Re: Hadoop build machine update Thanks a lot Giri. I see something for ZK builds. Hopefully they'll be functional tomm. thanks mahadev On Tue, Aug 9, 2011 at 7:35 PM, Giridharan Kesavan gkesa...@hortonworks.com wrote: All, All the hudson slaves are back online. I was working on getting the build tools installed on all the new slaves and it looks like Its going to take some more time to change the build job configs before I could enable the pre-commit jobs. Will get the precommit and nightly build jobs back online tomorrow. -Giri On Tue, Aug 2, 2011 at 11:45 PM, Nigel Daley nda...@mac.com wrote: Great! Thanks for getting the info. Cheers, n. On Aug 2, 2011, at 8:05 PM, Eric Baldeschwieler wrote: Hi Folks, I've talked to the folks at Yahoo about the build machines and am happy to report that an end to the blackout is in sight. 1) They need to reimage the machines, which is in progress and machines should be restored within a week (pessimistically). 
2) They plan to reach out to the apache infrastructure team and change the admin of these machines to remove yahoo from the admin loop. This should avoid future outages and give apache more flexibility in managing the machines. I'm told they will be posting more details today or tomorrow. Everyone there is committed to maintaining quality support for Apache and is concerned about the outage and making sure it does not repeat. Thanks, E14
Re: Hadoop build machine update
You should be able to login to jenkins, try again: $ list_appgroups.pl hudson-jobadmin|egrep maha mahadev You can trigger builds one-by-one, but it's somewhat tedious if there's a large number. Goto the patch build job https://builds.apache.org//view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/ and select build now the form will ask you for the jira number. Patrick On Wed, Aug 10, 2011 at 1:19 PM, Mahadev Konar maha...@hortonworks.com wrote: Ok, the pre commit builds are up! Thanks a lot to Giri. Pat, I do not have login on the hudson build machines - builds.apache.org. Giri tells me that we can trigger builds for all the patches by just specifying a patch name one at a time? Do you know how to do that? thanks mahadev On Aug 10, 2011, at 9:19 AM, Patrick Hunt wrote: IIRC there was a way to trigger patch builds for all patch available jiras (Nigel did this a while back) - Mahadev can you check with Giri? If not we'll either have to trigger the build manually (committers only) or the contributor would need to cancel patch then resubmit (no need to reattach the patch file itself). On Tue, Aug 9, 2011 at 11:01 PM, Mahadev Konar maha...@hortonworks.com wrote: Lakshman/others, Ill definitely send out a note on what to do. You are right that folks might have to re upload the patches to the jira. The slaves are still not fully functional yet. I expect them to be fully functional tomorrow late evening (US time). Hope that helps mahadev On Tue, Aug 9, 2011 at 10:44 PM, Laxman lakshman...@huawei.com wrote: Hi Mahadev, Do we need to resubmit the patches which were uploaded last week? Or Hudson will pick them up automatically? -Original Message- From: Mahadev Konar [mailto:maha...@hortonworks.com] Sent: Wednesday, August 10, 2011 8:14 AM To: dev@zookeeper.apache.org; Giridharan Kesavan Subject: Re: Hadoop build machine update Thanks a lot Giri. I see something for ZK builds. Hopefully they'll be functional tomm. 
thanks mahadev On Tue, Aug 9, 2011 at 7:35 PM, Giridharan Kesavan gkesa...@hortonworks.com wrote: All, All the hudson slaves are back online. I was working on getting the build tools installed on all the new slaves and it looks like Its going to take some more time to change the build job configs before I could enable the pre-commit jobs. Will get the precommit and nightly build jobs back online tomorrow. -Giri On Tue, Aug 2, 2011 at 11:45 PM, Nigel Daley nda...@mac.com wrote: Great! Thanks for getting the info. Cheers, n. On Aug 2, 2011, at 8:05 PM, Eric Baldeschwieler wrote: Hi Folks, I've talked to the folks at Yahoo about the build machines and am happy to report that an end to the blackout is in sight. 1) They need to reimage the machines, which is in progress and machines should be restored within a week (pessimistically). 2) They plan to reach out to the apache infrastructure team and change the admin of these machines to remove yahoo from the admin loop. This should avoid future outages and give apache more flexibility in managing the machines. I'm told they will be posting more details today or tomorrow. Everyone there is committed to maintaining quality support for Apache and is concerned about the outage and making sure it does not repeat. Thanks, E14
Re: Hadoop build machine update
All the PA jira that have been updated since Jenkins went down, you can find the list here: https://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/field=updatedsorter/order=DESC Thanks! Patrick On Wed, Aug 10, 2011 at 1:25 PM, Giridharan Kesavan gkesa...@yahoo-inc.com wrote: Do you have the patch number handy ? I can trigger the patch builds from the web UI this time. Pass on the jira numbers.. Whatever patch that you guys submit from now on will be picked up by jenkins automatically. Thanks, Giri On 8/10/11 1:19 PM, Mahadev Konar maha...@hortonworks.com wrote: Ok, the pre commit builds are up! Thanks a lot to Giri. Pat, I do not have login on the hudson build machines - builds.apache.org. Giri tells me that we can trigger builds for all the patches by just specifying a patch name one at a time? Do you know how to do that? thanks mahadev On Aug 10, 2011, at 9:19 AM, Patrick Hunt wrote: IIRC there was a way to trigger patch builds for all patch available jiras (Nigel did this a while back) - Mahadev can you check with Giri? If not we'll either have to trigger the build manually (committers only) or the contributor would need to cancel patch then resubmit (no need to reattach the patch file itself). On Tue, Aug 9, 2011 at 11:01 PM, Mahadev Konar maha...@hortonworks.com wrote: Lakshman/others, Ill definitely send out a note on what to do. You are right that folks might have to re upload the patches to the jira. The slaves are still not fully functional yet. I expect them to be fully functional tomorrow late evening (US time). Hope that helps mahadev On Tue, Aug 9, 2011 at 10:44 PM, Laxman lakshman...@huawei.com wrote: Hi Mahadev, Do we need to resubmit the patches which were uploaded last week? Or Hudson will pick them up automatically? -Original Message- From: Mahadev Konar [mailto:maha...@hortonworks.com] Sent: Wednesday, August 10, 2011 8:14 AM To: dev@zookeeper.apache.org; Giridharan Kesavan Subject: Re: Hadoop build machine update Thanks a lot Giri. 
I see something for ZK builds. Hopefully they'll be functional tomm. thanks mahadev On Tue, Aug 9, 2011 at 7:35 PM, Giridharan Kesavan gkesa...@hortonworks.com wrote: All, All the hudson slaves are back online. I was working on getting the build tools installed on all the new slaves and it looks like Its going to take some more time to change the build job configs before I could enable the pre-commit jobs. Will get the precommit and nightly build jobs back online tomorrow. -Giri On Tue, Aug 2, 2011 at 11:45 PM, Nigel Daley nda...@mac.com wrote: Great! Thanks for getting the info. Cheers, n. On Aug 2, 2011, at 8:05 PM, Eric Baldeschwieler wrote: Hi Folks, I've talked to the folks at Yahoo about the build machines and am happy to report that an end to the blackout is in sight. 1) They need to reimage the machines, which is in progress and machines should be restored within a week (pessimistically). 2) They plan to reach out to the apache infrastructure team and change the admin of these machines to remove yahoo from the admin loop. This should avoid future outages and give apache more flexibility in managing the machines. I'm told they will be posting more details today or tomorrow. Everyone there is committed to maintaining quality support for Apache and is concerned about the outage and making sure it does not repeat. Thanks, E14
Re: Question on test timeouts
Can you both check your ulimits? I was seeing random failures when max user processes was too low (1024, although this seems to be an issue with server shutdown). The same can happen if the open files limit is too low. What does ulimit -a look like? Patrick On Fri, Aug 19, 2011 at 11:02 AM, Vishal Kathuria vishal.kathu...@fb.com wrote: Thanks Camille, Comforting to know I am not the only one seeing this, so it is not a regression out of my changes. Vishal -Original Message- From: Fournier, Camille F. [mailto:camille.fourn...@gs.com] Sent: Thursday, August 18, 2011 1:21 PM To: 'dev@zookeeper.apache.org' Subject: RE: Question on test timeouts The hammer tests always seem to fail for me too. I've started ignoring them, which is probably not a good thing. -Original Message- From: Vishal Kathuria [mailto:vishal.kathu...@fb.com] Sent: Thursday, August 18, 2011 4:11 PM To: dev@zookeeper.apache.org Subject: Question on test timeouts Hi, I apologize if this is mentioned somewhere in the wiki and I missed it. For some tests, I see them pass and then fail because of a timeout. (This is on a clean checkout from the trunk) junit.run: [junit] Running org.apache.zookeeper.test.ObserverQuorumHammerTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 35.899 sec [junit] Running org.apache.zookeeper.test.ObserverQuorumHammerTest [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.zookeeper.test.ObserverQuorumHammerTest FAILED (timeout) Does anyone know what could be going on? The log doesn't say much about what happened. Thanks! Vishal
more problems on apache jenkins CI.
recent trunk builds have been failing due to some issue with the findbugs jenkins plugin on build@. I've notified builds@ and turned off the jenkins plugin for the time being, this should clear up the issue. Patrick
Re: [ANN] gozk updated
Nice! thanks for passing it along. I've updated the client bindings wiki page to include: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings Patrick On Thu, Aug 18, 2011 at 9:12 PM, Gustavo Niemeyer gust...@niemeyer.net wrote: Greetings, A significant update to gozk, the ZooKeeper bindings for Go, was just released. Among other things, this update includes several small tweaks to Event handling to make it easier to produce correct code. gozk is an experiment we push as part of Ensemble at Canonical: https://wiki.ubuntu.com/gozk API documentation is available at: http://goneat.org/lp/gozk The following changes and improvements were made. Please read it through as there are incompatible changes.
- Event channels will now receive Event rather than *Event.
- STATE_CLOSED has been changed to 0 so that events obtained from closed channels can be more easily handled.
- Session event channel is now buffered. You should still consume events from it if your application is long-living, though, for processing session expirations with re-connections or panics. If the session channel buffer fills up, the application will panic.
- Watch event channels will not receive transient events such as STATE_CONNECTING or STATE_ASSOCIATING anymore, since they disrupt the workflow unnecessarily. These events are still sent to the session event channel, though, and critical events such as STATE_EXPIRED_SESSION are still sent to watch channels. For a full description of how events are now handled, see the documentation: http://goneat.org/lp/gozk#Event
- Event now has Ok and String methods. The Ok method returns true if the event reports a usable connection to ZooKeeper, and String enables using critical events as errors. Combined, these methods enable simplified watch handling:
    event := <-watch
    if !event.Ok() {
        err = event
        return
    }
- Init and ReInit now take parameters in nanoseconds rather than milliseconds, since that's the most used convention in Go. Make sure you update the calls.
- Several functions were renamed for improved styling:
    zk.GetACL => zk.ACL
    zk.GetChildren => zk.Children
    zk.GetChildrenW => zk.ChildrenW
    zk.GetClientId => zk.ClientId
-- Gustavo Niemeyer http://niemeyer.net http://niemeyer.net/plus http://niemeyer.net/twitter http://niemeyer.net/blog -- I never filed a patent.
Re: more problems on apache jenkins CI.
builds@ jenkins seems to be borked - I had to turn off a number of plugins before it started passing again. I've notified builds@ but they are currently unable to clear the issue. For the time being jenkins will not report findbugs/warnings/clover results - however these tests are still running. We should be ok for now. Patrick On Wed, Aug 24, 2011 at 9:52 AM, Patrick Hunt ph...@apache.org wrote: recent trunk builds have been failing due to some issue with the findbugs jenkins plugin on build@. I've notified builds@ and turned off the jenkins plugin for the time being, this should clear up the issue. Patrick
Re: what happens when AuthenticationProvider throws an exception
Probably should have caught up with all my email first... did you find a resolution for this?
On Fri, Aug 12, 2011 at 11:00 AM, Fournier, Camille F. camille.fourn...@gs.com wrote:
Hi guys,
So debugging some fun issues in my dev cluster, I discovered that due to some bad user data, my AuthenticationProvider was throwing a null pointer exception inside the handleAuthentication call. This call is made inside of NIOServerCnxn.readRequest, and there is no try/catch block. So it bubbles all the way up to the NIOServerCnxn run method, which only logs it. Eventually I end up with the corrupted request buffer I sent earlier:
2011-08-12 08:01:16,602 - ERROR [CommitProcessor:4:FinalRequestProcessor@347] - Failed to process sessionid:0x5319dd2bf3403f4 type:exists cxid:0x0 zxid:0xfffe txntype:unknown reqpath:n/a
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.jute.BinaryInputArchive.readString(BinaryInputArchive.java:82)
    at org.apache.zookeeper.proto.ExistsRequest.deserialize(ExistsRequest.java:55)
    at org.apache.zookeeper.server.ZooKeeperServer.byteBuffer2Record(ZooKeeperServer.java:599)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:227)
    at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:540)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2011-08-12 08:01:16,602 - ERROR [CommitProcessor:4:FinalRequestProcessor@354] - Dumping request buffer: 0x504150
I suspect this is due to the fact that, in the readPayload method, we don't call clear on lenBuffer when an exception is thrown by readRequest.
Question: Obviously, I need to fix the exception that is being thrown by my AuthenticationProvider, but do we want to put some try/catch logic around that call? It seems like the error there is probably contributing to my corrupted buffer problem.
C
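To make the suspected failure mode concrete, here is a minimal, self-contained sketch of a length-prefixed read loop that resets its framing state in a finally block, so that an exception thrown by a pluggable handler cannot leave the length buffer half-consumed. This is not the NIOServerCnxn code: the class ReadLoopSketch, the RequestHandler interface, and the feed method are hypothetical names invented for illustration, and only lenBuffer echoes a name from the message above.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these names are the real NIOServerCnxn API.
class ReadLoopSketch {
    /** Stand-in for a pluggable hook such as an authentication provider. */
    interface RequestHandler {
        void handle(byte[] payload);
    }

    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incoming = lenBuffer; // buffer currently being filled
    private final RequestHandler handler;
    int processed = 0;
    int failures = 0;

    ReadLoopSketch(RequestHandler handler) {
        this.handler = handler;
    }

    /** Consumes bytes framed as a 4-byte length followed by the payload. */
    void feed(ByteBuffer in) {
        while (in.hasRemaining()) {
            while (in.hasRemaining() && incoming.hasRemaining()) {
                incoming.put(in.get());
            }
            if (incoming.hasRemaining()) {
                return; // frame incomplete, wait for more bytes
            }
            if (incoming == lenBuffer) {
                lenBuffer.flip();
                incoming = ByteBuffer.allocate(lenBuffer.getInt());
                if (incoming.hasRemaining()) {
                    continue; // now read the payload itself
                }
            }
            try {
                handler.handle(incoming.array());
                processed++;
            } catch (RuntimeException e) {
                // A real server would log this and likely close the connection.
                failures++;
            } finally {
                // Always reset the framing state, even when the handler throws,
                // so the next bytes are parsed as a fresh length prefix.
                lenBuffer.clear();
                incoming = lenBuffer;
            }
        }
    }

    public static void main(String[] args) {
        ReadLoopSketch cnxn = new ReadLoopSketch(payload -> {
            if (payload.length == 3) {
                throw new NullPointerException("bad user data");
            }
        });
        ByteBuffer in = ByteBuffer.allocate(13);
        in.putInt(3).put(new byte[] {1, 2, 3}); // frame whose handler throws
        in.putInt(2).put(new byte[] {4, 5});    // frame that should still parse
        in.flip();
        cnxn.feed(in);
        System.out.println("processed=" + cnxn.processed + " failures=" + cnxn.failures);
    }
}
```

In this sketch the second frame is still parsed correctly after the first handler call throws. Whether the real server should keep the connection open or close it after such a failure is a separate design question that the thread leaves open.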
Re: more problems on apache jenkins CI.
FYI builds@ resolved the issue, I've reverted back to our previous configuration and everything seems ok at this point. On Thu, Aug 25, 2011 at 11:05 PM, Mahadev Konar maha...@hortonworks.com wrote: Thanks Pat! mahadev On Wed, Aug 24, 2011 at 3:19 PM, Patrick Hunt ph...@apache.org wrote: builds@ jenkins seems to be borked - I had to turn off a number of plugins before it started passing again. I've notified builds@ but they are currently unable to clear the issue. For the time being jenkins will not report findbugs/warnings/clover results - however these tests are still running. We should be ok for now. Patrick On Wed, Aug 24, 2011 at 9:52 AM, Patrick Hunt ph...@apache.org wrote: recent trunk builds have been failing due to some issue with the findbugs jenkins plugin on build@. I've notified builds@ and turned off the jenkins plugin for the time being, this should clear up the issue. Patrick
Re: Small patch for test classes to better support eclipse
Warren, sounds good, would you mind submitting as a patch on a jira? thanks! http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
Basically, create a jira here and attach your patch: https://issues.apache.org/jira/browse/ZOOKEEPER
Patrick
On Mon, Aug 29, 2011 at 2:27 PM, Warren Turkal w...@penguintechs.org wrote:
Here is a small patch to better support eclipse. It makes one of the base classes that doesn't include any tests get ignored by the JUnit4 test framework. Please include, if possible.
Thanks, wt
Index: src/java/systest/org/apache/zookeeper/test/system/BaseSysTest.java
===
--- src/java/systest/org/apache/zookeeper/test/system/BaseSysTest.java (revision 1162989)
+++ src/java/systest/org/apache/zookeeper/test/system/BaseSysTest.java (working copy)
@@ -32,8 +32,10 @@
 import org.apache.zookeeper.ZooKeeper;
 import org.apache.zookeeper.server.quorum.QuorumPeer;
 import org.apache.zookeeper.server.quorum.QuorumPeer.QuorumServer;
+import org.junit.Ignore;
 import org.junit.runner.JUnitCore;
 
+@Ignore
 public class BaseSysTest extends TestCase {
     private static int fakeBasePort = 33222;
     private static String zkHostPort;
Re: zk keeps disconnecting and reconnecting
Based on past experience I believe it's going to take a fix release or two before 3.4 is rock solid, I personally think we should do a 3.3.4. Notice there are 6 blockers currently listed in 3.3.4 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+12316276+ORDER+BY+priority+DESC%2C+key+DESC I'd be happy to RM 3.3.4 if no one else is available to do it. My goal would be to push out a fix release containing the current listed blockers plus anything else that's currently available. Patrick On Tue, Aug 30, 2011 at 4:45 PM, Benjamin Reed br...@apache.org wrote: i have been wondering about 3.3.4. there are so many great bugs that were fixed in 3.4.0 that it isn't clear what we should put into 3.3.4 or if we should even do it. the chroot bug does seem like a good one to do a 3.3.4 release for. ben On Mon, Aug 29, 2011 at 12:45 PM, Mahadev Konar maha...@hortonworks.com wrote: Camille, I will be cutting a branch this week some time. Just waiting for ZOOKEEPER-999 to get in. Other than that, we are probably 2 weeks away from the release. 3.3.4 would be good even if we have 3.4 coming in a week or 2. Thats because 3.4.0 might take sometime to stabilize and 3.3.4 would be a good stable release (recommended for production use), until 3.4 stabilizes. Does that sound reasonable? Others? thanks mahadev On Aug 29, 2011, at 12:38 PM, Fournier, Camille F. wrote: Yeah let's put it in 3.3.4. What's the plan for 3.4? I thought we were almost ready for that. C -Original Message- From: Mahadev Konar [mailto:maha...@hortonworks.com] Sent: Monday, August 29, 2011 2:10 PM To: u...@zookeeper.apache.org Subject: Re: zk keeps disconnecting and reconnecting Camille, Do you think we should put the fix in 3.3.4? I think 3.4 might take a while to stabilize, so 3.3.4 would be a good release to get this in. Thoughts? mahadev On Aug 29, 2011, at 10:50 AM, Fournier, Camille F. wrote: Well, it causes the problem you are seeing. 
If you set any watchers with a chroot and then your client gets disconnected with these watches outstanding, when you reconnect you will try to reset them and they are probably on paths that don't exist (if you are creating everything under path /kafka-tracking). So you get a notification about the watches immediately after resetting them, which causes the string out of bounds exception. The only fix is to disable auto watch reset, and then have your own client reset watches when it gets a reconnected event. I suspect it would be easier for you to take a shot at fixing the bug than to rewrite your client to handle this. Thomas provided a patch with tests that presumably show the error, so all you need is a fix to make them pass. C -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Monday, August 29, 2011 12:39 PM To: u...@zookeeper.apache.org; tho...@koch.ro Subject: Re: zk keeps disconnecting and reconnecting What's the impact of ZOOKEEPER-961? If it shows up, does that mean the client won't get any watcher events afterwards? If so, this sounds like a blocker for 3.4 release to me. What's the temporary solution for 3.3.3? Also, for the very first time that the ZK client gets disconnected, I saw the following entry in the log. It seems that the client can't ping the server for 4 seconds. The ZK server was up at that time and the load was minimal. What could cause the time out? Client GC pauses? 2011/08/26 10:58:22.306 INFO [ClientCnxn] [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out, have not heard from server in 4001ms for sessionid 0x131f ddd84bc0006, closing socket connection and attempting reconnect Thanks, Jun On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch tho...@koch.ro wrote: Fournier, Camille F.: Did anyone ever check resetting watches at client reconnect on a client with a chroot? 
Looking at the code, we store the watches associated with the non-chroot path, but they are set by the original request prepending chroot to the request. However, it looks like the SetWatches request on reconnect just calls get on the various watch lists from ZooKeeper, which don't have the prepended chroot. I haven't written a test but I would bet dollars to donuts this is the problem. C seems to be this: ZOOKEEPER-961, ZOOKEEPER-1091 Regards, Thomas Koch, http://www.koch.ro
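The chroot mismatch described in this thread can be illustrated with a small path-translation sketch. It assumes, as the messages above describe, that the client stores watches under paths without the chroot while the server sees paths with the chroot prepended. The class ChrootPaths and its methods are hypothetical helpers written for this illustration; they are not the ZooKeeper client API and not the fix attached to ZOOKEEPER-961.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for illustration; not part of the ZooKeeper client.
class ChrootPaths {
    /** Client-visible path to server path: prepend the chroot. */
    static String toServerPath(String chroot, String clientPath) {
        if (chroot == null || chroot.isEmpty()) {
            return clientPath;
        }
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    /** Server path to client-visible path: strip the chroot, with a guard. */
    static String toClientPath(String chroot, String serverPath) {
        if (chroot == null || chroot.isEmpty()) {
            return serverPath;
        }
        if (serverPath.equals(chroot)) {
            return "/";
        }
        if (serverPath.startsWith(chroot + "/")) {
            return serverPath.substring(chroot.length());
        }
        // An unguarded substring(chroot.length()) on a shorter path would
        // throw StringIndexOutOfBoundsException, as reported in this thread.
        throw new IllegalArgumentException(
                "path " + serverPath + " is not under chroot " + chroot);
    }

    /** Paths to send when re-registering watches after a reconnect. */
    static List<String> forReconnect(String chroot, List<String> clientWatchPaths) {
        List<String> serverPaths = new ArrayList<>();
        for (String path : clientWatchPaths) {
            serverPaths.add(toServerPath(chroot, path));
        }
        return serverPaths;
    }

    public static void main(String[] args) {
        List<String> watches = new ArrayList<>();
        watches.add("/brokers");
        System.out.println(forReconnect("/kafka-tracking", watches));
    }
}
```

The point of the sketch is only that the reconnect path has to apply the same translation as the original request. If it sends the stored client-side paths unchanged, the watches land on nodes outside the chroot, and the resulting notifications cannot be mapped back.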
Re: Review Request: automating log and snapshot cleaning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1043/ --- (Updated 2011-09-01 16:40:46.185217) Review request for zookeeper, Patrick Hunt, Benjamin Reed, and Mahadev Konar. Changes --- updated to 1107.5 patch Summary --- I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. This addresses bug ZOOKEEPER-1107. https://issues.apache.org/jira/browse/ZOOKEEPER-1107 Diffs (updated) - ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java 1149082 ./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1149082 ./src/java/main/org/apache/zookeeper/server/DatadirCleanupManager.java PRE-CREATION ./conf/zoo_sample.cfg 1149082 ./src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml 1149082 ./src/java/test/org/apache/zookeeper/server/DatadirCleanupManagerTest.java PRE-CREATION Diff: https://reviews.apache.org/r/1043/diff Testing --- test added, passing hudson qa bot. Thanks, Patrick
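As a rough illustration of what "managing the amount of snapshots kept" could mean, the sketch below picks which snapshot files to delete so that only the most recent ones remain. It is not the DatadirCleanupManager from the patch. The class name, the retainCount parameter, and the assumption that snapshot files are named snapshot.<hex zxid> are all assumptions made for this example, and the sketch ignores transaction logs entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the DatadirCleanupManager from ZOOKEEPER-1107.
class SnapshotRetentionSketch {
    /** Parses the hex suffix of a name such as "snapshot.1a" (assumed format). */
    static long zxidOf(String fileName) {
        return Long.parseLong(fileName.substring(fileName.indexOf('.') + 1), 16);
    }

    /** Returns the names to delete so that only the newest retainCount remain. */
    static List<String> filesToPurge(List<String> snapshotNames, int retainCount) {
        List<String> sorted = new ArrayList<>(snapshotNames);
        Comparator<String> byZxid = Comparator.comparingLong(SnapshotRetentionSketch::zxidOf);
        sorted.sort(byZxid.reversed()); // newest (largest zxid) first
        if (retainCount >= sorted.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subList(retainCount, sorted.size()));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("snapshot.a");
        names.add("snapshot.1");
        names.add("snapshot.ff");
        names.add("snapshot.10");
        System.out.println(filesToPurge(names, 2));
    }
}
```

A real cleanup task would also have to keep the transaction logs that the oldest retained snapshot depends on, and would run on a schedule inside the server rather than as a one-off call.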
Re: ZooKeeper cleanup / refactoring / scala migration
Hey Thomas! I've raised a scala port a number of times previously, most recently at the post-summit meetup: http://markmail.org/message/t32x22hmifo3urxk We discussed this shortly both at the meetup and subsequently on list. Unfortunately there was no consensus around building ZK on scala (or any other language other than java -- Ted asked to port to clojure instead), in particular the point was raised that we, the currently zk community, are a community of ZK Java developers with no experience with scala (or clojure/c#(?) for that matter). What I think would be successful is for you to bring this to the incubator - ZooKeeper-Scala: re-implement ZooKeeper in Scala. A community of ZK scala developers could be grown there, see the proposal guide, in particular we (ZK tlp) could sponsor this effort, with the intent that upon successful graduation from the incubator we pull in zk-scala as either a subproject or just part of the TLP but as a separate release artifact from zk java server. see this link for details: http://incubator.apache.org/guides/proposal.html#template-sponsoring-entity I'd be willing to be the champion and a mentor for this project in the incubator. Regards, Patrick On Fri, Sep 2, 2011 at 7:12 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: Hi Thomas, Here's my feedback: 1. For any useful fixes you find here, please follow the normal procedures of raising a ticket and attaching a patch. In my experience, static analysis tools often carry with them a lot of irrelevant noise, but as long as the changes you propose are clean and don't break backwards compatibility, I would be happy to review and accept such changes. What I can't promise is that you will get much traction on trying to push a patch with a huge number of fixes at once. Every line of code we have to review increases the complexity of our job as reviewers. 
It would be great if you could break these up into small patches and would definitely increase the odds of the changes being accepted. We already have a reviewboard set up for zookeeper which you should plan to use, and of course the hadoop build farm. If you believe additional analysis tools would be useful in our build, please work with our build.xml and the infra team to get the necessary tools installed. It doesn't do us much good to fix things caught by your build server and then have them break again because we don't have the tooling available. 2. I doubt you will get much traction here on pushing changes back. I'm a big fan of refactoring but at some point refactoring for refactoring's sake does nothing but muddle the change history of a code base. Any refactoring that needs to be done would be best done in conjunction with a fix or feature that it helps to enable. 3. Sounds like a fun academic exercise for you. Maybe you could start with a Scala client that we could support. Not sure there's any benefit since Scala can run Java code, but it could be interesting and maybe something we could take back as a contribution if you wanted. C -Original Message- From: Thomas Koch [mailto:tho...@koch.ro] Sent: Friday, September 02, 2011 9:16 AM To: dev@zookeeper.apache.org Subject: ZooKeeper cleanup / refactoring / scala migration Hi, my university labs work has started yesterday. In the next two months I'll work on ZooKeeper. This work has three major goals - improve the maintainability of the code base - migrate ZooKeeper to scala and use actors for reliable concurrency - find other developers to collaborate on the scala version I plan to work in these steps: 1. use static analysis tools to cleanup the java codebase ( checkstyle, pmd, findbugs, eclipse compiler warnings ) 2. decrease dependencies in the ZK java packages, break cyclic dependencies 3. 
migrate components one-by-one to scala, starting with the leafs of the dependency graph I'd be happy if as much as possible of my work in steps 1. and 2. could also be useful for you. What would be a good strategy to go? I assume you're concentrating now on getting 3.4 out and don't have time for other things. Can I help on 3.4 blockers? I've set up a gerrit[1] (git based code review) and a jenkins[2] server for my project. Jenkins is armed with checkstyle, findbugs, pmd, copy-paste-detection and jdepends. I've carefully selected the checks run by checkstyle and pmd and would suggest that the remaining warnings should really be eliminated. Gerrit already hosts two changesets which eliminate nearly all eclipse compiler warnings. I'd be happy if you'd like to create an account at the gerrit instance (openid needed) and play around with it. I can also give you reviewer status which lets you push changes. Every change push will trigger a jenkins build. I've already proposed to infra@a.o that I'd help in setting up a gerrit server for the ASF if any project would be interested. [1] http://koch.ro:8081 [2]
Re: [jira] [Commented] (ZOOKEEPER-1158) C# client
Hi Andrew. If we can get the code to run successfully with Mono then we can open an INFRA jira to make that happen (we don't admin the Jenkins servers - builds@ handles that). See this as an example of such a request: https://issues.apache.org/jira/browse/INFRA-3842 In this case the request would have to be tailored to install mono on Jenkins, etc...
Patrick
On Fri, Sep 2, 2011 at 8:18 AM, Andrew Finnell andrew.finn...@gmail.com wrote:
There is an alternative to a Windows .NET build system. You can use Mono on the existing Hudson Linux machine. I can set up a build file that will build the client, if someone is able to install at least 2.10.4 of Mono on the server and set MONO_HOME.
Andrew
On Sep 2, 2011, at 10:38 AM, Camille Fournier (JIRA) wrote:
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096015#comment-13096015 ]
Camille Fournier commented on ZOOKEEPER-1158:
-
Eric, Very exciting, thanks so much for pushing this back. I think we need to get a .NET build for our build farm. We're already working on a windows C++ build so hopefully this won't be too difficult. Which version of .NET is this written against, 4.0? I'm going to have my team here try to apply and compile the patch. We use 3.5 though, so not sure we will be successful. Either way I think we can do the code review since we're already using the old SharpKeeper. BTW, the second patch should probably be submitted on top of the first, instead of as a separate thing.
C# client
-
Key: ZOOKEEPER-1158
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1158
Project: ZooKeeper
Issue Type: Improvement
Reporter: Eric Hauser
Assignee: Eric Hauser
Attachments: ZOOKEEPER-1158.patch, ZOOKEEPER-1158_2.patch
Native C# client for ZooKeeper.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Fwd: [PROPOSAL] Accumulo for the Apache Incubator
FYI, another project using ZK -- woot!!! (note that they have their own WAL - perhaps a good application for BookKeeper?) -- Forwarded message -- From: Billie J Rinaldi billie.j.rina...@ugov.gov Date: Fri, Sep 2, 2011 at 8:45 AM Subject: [PROPOSAL] Accumulo for the Apache Incubator To: gene...@incubator.apache.org Greetings, I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper. Here is a link to the proposal in the Incubator wiki: http://wiki.apache.org/incubator/AccumuloProposal I've also pasted the initial contents below. Thanks, Billie Rinaldi = Accumulo Proposal = == Abstract == Accumulo is a distributed key/value store that provides expressive, cell-level access labels. == Proposal == Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. == Background == Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008. == Rationale == There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. 
We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. == Current Status == === Meritocracy === We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar with the codebase. === Community === A strong government community has developed around Accumulo and training classes have been ongoing for about a year. Hundreds of developers use Accumulo. === Core Developers === The developers are mainly employed by the National Security Agency, but we anticipate interest developing among other companies. === Alignment === Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds with Maven. Due to the strong relationship with these Apache projects, the incubator is a good match for Accumulo. == Known Risks == === Orphaned Products === There is only a small risk of being orphaned. The community is committed to improving the codebase of the project due to its fulfilling needs not addressed by any other software. === Inexperience with Open Source === The codebase has been treated internally as an open source project since its beginning, and the initial Apache committers have been involved with the code for multiple years. While our experience with public open source is limited, we do not anticipate difficulty in operating under Apache's development process. === Homogeneous Developers === The committers have multiple employers and it is expected that committers from different companies will be recruited. === Reliance on Salaried Developers === The initial committers are all paid by their employers to work on Accumulo and we expect such employment to continue. Some of the initial committers would continue as volunteers even if no longer employed to do so. 
=== Relationships with Other Apache Products === Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, -io, -jci, -collections, -configuration, -logging, and -codec. === Relationship to HBase === Accumulo and HBase are both based on the design of Google's BigTable, so there is a danger that potential users will have difficulty distinguishing the two or that they will not see an incentive in adopting Accumulo. There are a few key areas in which Accumulo differs from HBase. Some of the desired features of Accumulo could be incorporated into HBase, however the most important of these may be unlikely to be adopted (see cell-level access labels and iterators below). It is a possibility that the codebases will ultimately converge, but the number of differences at the current time warrants a separate project for Accumulo. Access Labels Accumulo has an additional portion of its key that sorts after the column qualifier and before the timestamp. It is called column visibility and enables expressive cell-level access control. Authorizations are passed with each query
Re: [PROPOSAL] Accumulo for the Apache Incubator
Seems similar, see the proposal, there are a few sections that call out the differences. (search for hbase) On Fri, Sep 2, 2011 at 9:45 AM, Mahadev Konar maha...@hortonworks.com wrote: Nice! Is this related to HBase? Or similar to it? mahadev On Fri, Sep 2, 2011 at 9:27 AM, Patrick Hunt ph...@apache.org wrote: FYI, another project using ZK -- woot!!! (note that they have their own WAL - perhaps a good application for BookKeeper?) -- Forwarded message -- From: Billie J Rinaldi billie.j.rina...@ugov.gov Date: Fri, Sep 2, 2011 at 8:45 AM Subject: [PROPOSAL] Accumulo for the Apache Incubator To: gene...@incubator.apache.org Greetings, I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper. Here is a link to the proposal in the Incubator wiki: http://wiki.apache.org/incubator/AccumuloProposal I've also pasted the initial contents below. Thanks, Billie Rinaldi = Accumulo Proposal = == Abstract == Accumulo is a distributed key/value store that provides expressive, cell-level access labels. == Proposal == Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. == Background == Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008. 
== Rationale == There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. == Current Status == === Meritocracy === We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar with the codebase. === Community === A strong government community has developed around Accumulo and training classes have been ongoing for about a year. Hundreds of developers use Accumulo. === Core Developers === The developers are mainly employed by the National Security Agency, but we anticipate interest developing among other companies. === Alignment === Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds with Maven. Due to the strong relationship with these Apache projects, the incubator is a good match for Accumulo. == Known Risks == === Orphaned Products === There is only a small risk of being orphaned. The community is committed to improving the codebase of the project due to its fulfilling needs not addressed by any other software. === Inexperience with Open Source === The codebase has been treated internally as an open source project since its beginning, and the initial Apache committers have been involved with the code for multiple years. While our experience with public open source is limited, we do not anticipate difficulty in operating under Apache's development process. === Homogeneous Developers === The committers have multiple employers and it is expected that committers from different companies will be recruited. 
=== Reliance on Salaried Developers === The initial committers are all paid by their employers to work on Accumulo and we expect such employment to continue. Some of the initial committers would continue as volunteers even if no longer employed to do so. === Relationships with Other Apache Products === Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, -io, -jci, -collections, -configuration, -logging, and -codec. === Relationship to HBase === Accumulo and HBase are both based on the design of Google's BigTable, so there is a danger that potential users will have difficulty distinguishing the two or that they will not see an incentive in adopting Accumulo. There are a few key areas in which Accumulo differs from HBase. Some of the desired features of Accumulo could be incorporated into HBase, however the most important of these may be unlikely to be adopted (see cell-level access labels and iterators below
Re: 3.4 update.
On Fri, Sep 2, 2011 at 10:46 AM, Mahadev Konar maha...@hortonworks.com wrote:
> ZOOKEEPER-1149 - Pat any suggestion on if we just want to put this in release notes?
I think release notes are fine --- I'll take this.
> ZOOKEEPER-1159 - Anyone want to take this up?
> ZOOKEEPER-1159 - Vishal/Alex any update on this?
You listed 1159 twice, did you mean that?
> ZOOKEEPER-1107 - Pat/Laxman this seems ready to go right?
1107 is done, Laxman clarified my concern, I'll commit this later today as long as there are no other objections.
Patrick
Clover now running again on ZooKeeper trunk
Clover has been re-installed on the hadoop# jenkins machines. I've updated our job to again run clover against nightly ZK trunk. https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk/ Notice that our coverage has been declining over the past few weeks. Not good. Committers please be sure that changes are accompanied by adequate testing. Patrick
Re: Reviewboard, Gerrit was: ZooKeeper cleanup / refactoring / scala migration
Thomas, check out post-review from https://github.com/reviewboard/rbtools
I haven't used it against Apache rb, however I use it to great effect inside Cloudera. It will allow you to cut the number of steps significantly (I have small bash scripts wrapping post-review for the various projects I contribute to).
Enjoy, Patrick
On Fri, Sep 2, 2011 at 12:48 PM, Thomas Koch tho...@koch.ro wrote:
Fournier, Camille F.:
We already have a reviewboard set up for zookeeper which you should plan to
Hi Fournier,
I'll use reviewboard for patches targeted for inclusion. However just for reference and to make you curious, I compare the steps necessary to create a review in reviewboard and in gerrit:
reviewboard:
- create a patch file
- open web interface, log in
- select project
- select patch file
- select reviewer group
- select jira bug (lookup bug number in jira...)
- copy summary from jira issue (or write from scratch)
- enter . for description
- publish
gerrit:
- include issue number in commit message
- git push gerrit ...
That's it? That's it!
Additionally gerrit also triggers jenkins, merges the change in the master branch once the review is finished and hosts your GIT repos with access control. There's also inclusion in mylyn to let you do reviews from eclipse... :-)
So if you like GIT you're invited to just play around with the gerrit instance.
Best regards, Thomas Koch, http://www.koch.ro
Re: Review Request: Fix compiler (eclipse) warnings: unused imports, unused variables, missing generics
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1709/#review1738 --- Ship it! lgtm - Patrick On 2011-09-02 19:43:36, Thomas Koch wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1709/ --- (Updated 2011-09-02 19:43:36) Review request for zookeeper. Summary --- . This addresses bug ZOOKEEPER-1170. https://issues.apache.org/jira/browse/ZOOKEEPER-1170 Diffs - src/java/main/org/apache/jute/BinaryOutputArchive.java 213e203 src/java/main/org/apache/jute/CsvInputArchive.java 3eb40ec src/java/main/org/apache/jute/CsvOutputArchive.java f6d60d8 src/java/main/org/apache/jute/OutputArchive.java 4e084e8 src/java/main/org/apache/jute/RecordReader.java 2977d3f src/java/main/org/apache/jute/RecordWriter.java 0adbd56 src/java/main/org/apache/jute/XmlOutputArchive.java b65e9a0 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java 626da04 src/java/main/org/apache/zookeeper/Environment.java 9a66743 src/java/main/org/apache/zookeeper/MultiResponse.java 70f7623 src/java/main/org/apache/zookeeper/MultiTransactionRecord.java 801969a src/java/main/org/apache/zookeeper/ServerAdminClient.java da17fcf src/java/main/org/apache/zookeeper/Shell.java 789c481 src/java/main/org/apache/zookeeper/server/DataTree.java 0690ce9 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java b690817 src/java/main/org/apache/zookeeper/server/LogFormatter.java cd1347d src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 30ebf68 src/java/main/org/apache/zookeeper/server/PurgeTxnLog.java 185c1e1 src/java/main/org/apache/zookeeper/server/Request.java 80d2b99 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java c085bfb src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 05d8431 src/java/main/org/apache/zookeeper/server/quorum/AuthFastLeaderElection.java cac7140 src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java a584adc 
src/java/main/org/apache/zookeeper/server/quorum/Follower.java ab3f288 src/java/main/org/apache/zookeeper/server/quorum/Leader.java fb9dbde src/java/main/org/apache/zookeeper/server/quorum/LeaderElection.java 77c27e7 src/java/main/org/apache/zookeeper/server/quorum/Observer.java ee61a90 src/java/main/org/apache/zookeeper/server/quorum/ProposalRequestProcessor.java 17ca8fb src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1b9c409 src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 61f1f70 src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java d37881f src/java/main/org/apache/zookeeper/server/upgrade/DataTreeV1.java e3d0633 src/java/main/org/apache/zookeeper/server/upgrade/UpgradeSnapShotV1.java aecc4d2 Diff: https://reviews.apache.org/r/1709/diff Testing --- Thanks, Thomas
Re: Clover now running again on ZooKeeper trunk
Thanks Camille for looking into this! I also noticed that the Kerberos patch introduced a number of new compiler warnings. It would be great if we could knock compiler warnings down to 0: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk/1292/warningsResult/

Patrick

On Fri, Sep 2, 2011 at 3:57 PM, Camille Fournier cami...@apache.org wrote:

Being the coverage obsessive that I am, I drilled into this and it looks like it's basically entirely the Kerberos patch that caused the coverage change. If we care, I suspect there are a few edges there that could take unit testing. Everything else seemed to stay the same or go up.

C

On Fri, Sep 2, 2011 at 2:47 PM, Patrick Hunt ph...@apache.org wrote:

Clover has been re-installed on the hadoop# jenkins machines. I've updated our job to again run clover against nightly ZK trunk. https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk/

Notice that our coverage has been declining over the past few weeks. Not good. Committers, please be sure that changes are accompanied by adequate testing.

Patrick
Re: ZooKeeper cleanup / refactoring / scala migration
+1, having Sonar would be nice, although I've found that tools no one pays attention to are not useful. The best way to combat that is to integrate it into the build (e.g. hadoop QA bot).

On Mon, Sep 5, 2011 at 1:27 AM, Thomas Koch tho...@koch.ro wrote:

Hi Camille,

We already have a reviewboard set up for zookeeper which you should plan to use, and of course the hadoop build farm. If you believe additional analysis tools would be useful in our build, please work with our build.xml and the infra team to get the necessary tools installed. It doesn't do us much good to fix things caught by your build server and then have them break again because we don't have the tooling available.

I understand that ZooKeeper is moving to Maven as soon as possible. Once the move has been done, I could help with getting ZK into the ASF Sonar instance.[1] After that we can fine-tune the analysis run on ZK.

[1] https://analysis.apache.org

Regards, Thomas Koch, http://www.koch.ro