Re: Apache Subversion and the Outreach Program for Women
On Sat, Dec 01, 2012 at 05:36:34PM +1000, Miriam Hochwald wrote: > To whom it may concern, > > I would like to express interest in contributing to Apache Subversion. Hi Miriam! I've volunteered as mentor for Subversion in OPW 2013. > Projects of interest: > > - Improve bindings to other programming languages. > - Show progress output. > - Improve 'svn help'. > - More customizable behavior for 'svn diff'. > > My core languages are C, C++ and Java. I have a familiarity with other > languages as well (e.g. Python, JavaScript, HTML, CSS etc.). Do you have a preference regarding which of these projects you'd like to work on? Have you investigated any of these projects more closely, and do you have any questions regarding any of these projects? Picking a project that you'd love doing and which will keep you motivated is very important. You should have a gut feeling about what the project entails, what you might want to learn about or discover along the way, and communicate your thoughts on this. Given that you know C and Python, the two main languages used in Subversion, you could take up pretty much any task of reasonable scope for the 3-month internship. So if you find anything else that interests you but is not on the ideas list, please don't hesitate to talk to me about alternative project ideas. > At this point in time I don't have Apache Subversion exposure. However, I > note that there is an online book on the topic. My Noob status would > provide a good insight into common learner perspectives, with relevance to > the projects. I do have familiarity with UNIX and CVS. Familiarity with CVS should be good enough, given that Subversion's development is rooted in fixing CVS's shortcomings :) > Please let me know if you would like to proceed with these discussions. I > note that the deadline for the Outreach Program for Women is the 3rd of > December. Subversion joined the OPW program just last week, so the 3rd of December deadline is rather tight.
You should certainly try to formally submit your application by then, but making a small contribution by 3rd of December is not a strict requirement. Please note that as a project we prefer to keep any communication which doesn't involve sensitive or private matters on the dev@ list (i.e. this list), which is publicly archived, so that everyone involved can stay on top of what's going on. I'm 'stsp' in #svn-dev on freenode, BTW. I'll try to be reachable for you there as well, as far as my current schedule permits.
Re: RFC: simple proposal for Internet-scoped IDs
On Sat, Dec 01, 2012 at 09:41:17AM -0500, Justin Erenkrantz wrote: > Whether I call him "zhakov" or "ivan" - it's the same person. No, he's very different on IRC ("zhakov") to when I sat next to him in-person ("ivan") in a bar in Berlin at 2am in the morning. Really!
Re: 1.8 Progress
On Thu, Nov 29, 2012 at 04:52:15PM -0500, C. Michael Pilato wrote: > I also seem to recall Stefan also saying something about not having time to > work on this stuff for the remainder of 2012, but I can't find a reference > for that at the moment, so perhaps I just misremembered. I said so on IRC. I'm fully booked by elego customers for most of each week until Christmas, which is why I don't have much time I can invest in Subversion this month. > OWNERSHIP: Given Philip's comment, I believe it is reasonable to deem him > the owner of this body of work, or at least co-owner with a > possibly-time-constrained Stefan. But perhaps all there is to "own" is the > post-branch removal of the feature? I'd prefer to get moves on trunk into a releasable state, or cut the feature out on trunk and put it back in after branching 1.8. I believe removing the feature would be somewhat invasive so it makes sense to do so on trunk in several steps, rather than on the release branch which is supposed to be stable at the time it is created. Obviously, fixing updates of moves would be even better :) I hope Philip or someone else can find time for this. I'll definitely get back to this in January, see where things are at that point, and move forward in whatever direction seems reasonable.
Re: Fwd: Regarding the Outreach Program for Women
On Sat, Dec 01, 2012 at 03:29:44PM +0530, Shivani Poddar wrote: > Hi > I am currently a second year student in IIIT-Hyderabad and am hugely > interested in a lot of Projects that are being mentored here. It would be > great if you could guide me how to go about applying for them.As it is i am > not able to locate the respective mentors for your projects. Hi Shivani, I missed putting up my details as mentor on our OPW page, sorry. I've added this information now. > Having seen > this program only yesterday, i have not been able to do much work for > contributions, but am confident on my abilities to make a difference in the > same if given the right opportunity. Please don't worry too much about the small contribution deadline, which officially is next Monday. We joined the OPW program rather late in the game (just last week!), so it would be somewhat unreasonable to require a contribution by Monday. I'm not sure whether OPW will consider applications handed in past the deadline; however, I'll try to sort things out if you run into any problems due to that. Have you already taken some time to find a project that you're interested in doing? Have you had a look at the project ideas list on our OPW page? http://subversion.apache.org/opw.html Do any of these project ideas appeal to you, or if not, do you have any other project idea? I'm very open to suggestions, and would be happy to mentor any project you'd be interested in doing. Please note that as a project we prefer to keep any communication which doesn't involve sensitive or private matters on the dev@ list (i.e. this list), which is publicly archived, so that everyone involved can stay on top of what's going on. I'm 'stsp' in #svn-dev on freenode, BTW. I'll try to be reachable for you there as well, as far as my current schedule permits.
Apache Subversion and the Outreach Program for Women
To whom it may concern, I would like to express interest in contributing to Apache Subversion. Projects of interest: - Improve bindings to other programming languages. - Show progress output. - Improve 'svn help'. - More customizable behavior for 'svn diff'. My core languages are C, C++ and Java. I have a familiarity with other languages as well (e.g. Python, JavaScript, HTML, CSS etc.). At this point in time I don't have Apache Subversion exposure. However, I note that there is an online book on the topic. My Noob status would provide a good insight into common learner perspectives, with relevance to the projects. I do have familiarity with UNIX and CVS. Please let me know if you would like to proceed with these discussions. I note that the deadline for the Outreach Program for Women is the 3rd of December. Thank you. Kind regards, Miriam Hochwald -- Founder & Director Girl Geek Coffees (GGC) sites.google.com/site/girlgeekcoffees
Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)
On Sat, Dec 1, 2012 at 10:40 PM, Branko Čibej wrote: > On 01.12.2012 22:18, Ivan Zhakov wrote: >> Completely agree. >> >> My point was that in theory skelta-mode is cool, but it still needs a >> lot of work to get it really done. > > Sorry? I've had "http-library = serf" in ~/.subversion/config for years, > and have seen no problems. What does it take, in your opinion, to get it > "really done"? It has become abundantly clear during the last year that, just because some people have been using it for years, doesn't mean it's stable / done / ready for prime time. What do you think I spent my entire Berlin hackathon week on last June, together with Justin and others? And weeks before that and after ... and the time spent by others during the last months fixing difficult to reproduce, yet important bugs. Those issues were not due to wrong configurations, or total edge cases, but things that people with big installations would have run into the first day they would have started using it. I don't want to rehash all the discussions, I think there's mostly consensus on the general direction we can / should go (which is forward, not backward). But I find "I have been using it for years without problems" not a very strong argument in this case. Getting back to the actual topic ... as an svn admin, I'd definitely like to have the choice between send-all and skelta mode, if that's possible. Which one should be the default ... I don't know. But having the choice is the most important thing. I'd sleep better at night, knowing that I can flip a switch to reduce the number of requests etc., in the unlikely case that would turn out to be a problem :-). -- Johan
Re: Outreach Program for Women
On Fri, Nov 30, 2012 at 08:01:51PM -0800, Lisa L wrote: > Thanks, Greg. That'll give me something to get familiarized with until a > mentoring-type person is available. :-) Hi, the mentoring-type person would be me :) I'm happy to hear you're interested in improving 'svn help'. I put together the project ideas list in a hurry to get something up on the website ASAP, since the application deadline is rather close. So I may have missed some projects potentially suitable for OPW. If you find that improving 'svn help' is not a suitable project for you, we can try to find something else if you prefer. Most coding tasks in Subversion require knowledge of the C programming language. I've tried to come up with project ideas that don't require intimate knowledge of C in case an OPW applicant doesn't already know C. Neither the bindings nor the 'svn help' project ideas strictly require C skills, and I'm happy to help out wherever C skills are necessary. I hope you'll find an interesting project to work on. If you have any questions just ask me. Note that as a project we prefer to keep any communication which doesn't involve sensitive or private matters on this list, which is publicly archived, so that everyone involved can stay on top of what's going on. I'm 'stsp' in #svn-dev on freenode, BTW. I'll try to be reachable for you there as well, as far as my current schedule permits.
Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)
On 01.12.2012 22:18, Ivan Zhakov wrote: > Completely agree. > > My point was that in theory skelta-mode is cool, but it still needs a > lot of work to get it really done. Sorry? I've had "http-library = serf" in ~/.subversion/config for years, and have seen no problems. What does it take, in your opinion, to get it "really done"? > So let's release ra_serf by > piecemeal, because we also have significant amount of ra_serf issues > unrelated to update editor. We've been through that. There's one or maybe two bugs that are actual bugs that pop up in edge cases, everything else is related to server configuration etc. Is your goal shipping without bugs? -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)
[ changing subject to make topic more visible ] On Sat, Dec 1, 2012 at 9:00 PM, Mark Phippard wrote: > On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz > wrote: >> On Fri, Nov 30, 2012 at 4:54 PM, wrote: >>> >>> Author: cmpilato >>> Date: Fri Nov 30 21:54:35 2012 >>> New Revision: 1415864 >>> >>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev >>> Log: >>> Implement in ra_serf "send-all" mode support for update-style REPORTs >>> and their responses. (Currently disabled by compile-time conditionals.) >>> >>> (This one goes out to Ivan Zhakov.) >> >> >> I've stated for a long time that I think the send-all mode is a huge mistake >> architecturally because it is too prone to double-compression and TCP >> pipeline stalls and is a tremendous burden on a properly-configured httpd >> (by not taking advantage of server-side parallelism), it's nice to see it's >> not *too* hard to shoehorn this bad idea back into ra_serf. We'd never be >> able to shove the non-send-all approach into ra_neon. =) > > Just to be clear, I do not believe anyone is suggesting we completely > abandon the non-send-all approach. I like that this approach can > offer good performance on a well-configured server as well as enable > new features/ideas such as not even fetching the full-texts that we > already have locally. I think the question is simply what is the best > way to deliver this. Completely agree. My point was that in theory skelta-mode is cool, but it still needs a lot of work to get it really done. So let's release ra_serf piecemeal, because we also have a significant number of ra_serf issues unrelated to the update editor. > >> Here's my suggestion for consideration - let's experiment with this setting >> in the beta release process with the setting as-is - that is we always do >> the parallel updates unconditionally (except perhaps when svnrdump is being >> stupid).
If we get real users complaining about the update during that >> cycle, we can then figure out either switching the default and/or adding a >> config-option or even allowing some control via capabilities exchange. > > I feel pretty strongly that we should at minimum use the send-all > approach when talking to pre-1.8 servers. Even though in some > situations it could still offer good performance. I just think it > would be more respectful to our users (server admins in this case) to > not change this behavior in a way that could surprise them. Maybe we > could come up with exceptions, such as older servers that are using > the SVNAllowBulkUpdates off directive. In that situation we should > use the new behavior since that is basically what that directive is > asking for. > > As I said in another thread, I think we should treat a 1.8 server the > same way and require someone that was upgrading to add some new > directive to enable the new feature. This would allow a server admin > to setup his server correctly, including using things like > mod_deflate, and turn on the new behavior rather than get it > automatically simply because they upgraded their binaries. > > This seems like it satisfies everyone. Existing users, especially > those running older server versions, would not be surprised by new and > unwanted client behavior, and it would still be easy to configure a > new server properly to support the non-send-all mode when it was > desired. I just do not see what the downside would be to approaching > it this way. > +1. -- Ivan Zhakov
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 12:00 PM, Mark Phippard wrote: > I feel pretty strongly that we should at minimum use the send-all > approach when talking to pre-1.8 servers. Even though in some > situations it could still offer good performance. I just think it > would be more respectful to our users (server admins in this case) to > not change this behavior in a way that could surprise them. Maybe we > could come up with exceptions, such as older servers that are using > the SVNAllowBulkUpdates off directive. In that situation we should > use the new behavior since that is basically what that directive is > asking for. > Without a lot of concrete feedback that parallel updates should be removed by default, I strongly believe that we should not be conservative on this issue. The issue here is not one of compatibility - ra_serf has been around for years and can talk just fine to older servers (all the way back to pre-1.0 servers, actually). The only argument against altering the default behavior is that there might be an admin of a high-traffic site somewhere that might suddenly be shocked by more HTTP requests coming in. I honestly have little to no sympathy for such an admin who doesn't properly understand how to manage a large installation - they likely have other issues that they are not paying attention to. Until we have hordes of users coming in and complaining about this, I think it's silly to be conservative here. I'm definitely not against giving knobs to the client or to the admin in weird corner cases (provided someone cares enough to write that up), but I strongly believe that for now we should do the right thing out of the box in 1.8 - which is to utilize parallel updates. -- justin
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Fri, Nov 30, 2012 at 5:38 PM, C. Michael Pilato wrote: > On 11/30/2012 05:25 PM, Mark Phippard wrote: >> On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato >> wrote: >>> On 11/30/2012 05:00 PM, Mark Phippard wrote: On Fri, Nov 30, 2012 at 4:54 PM, wrote: > Author: cmpilato > Date: Fri Nov 30 21:54:35 2012 > New Revision: 1415864 > > URL: http://svn.apache.org/viewvc?rev=1415864&view=rev > Log: > Implement in ra_serf "send-all" mode support for update-style REPORTs > and their responses. (Currently disabled by compile-time conditionals.) Sweet! Would this also resolve the issue with svnrdump, or could it? When Serf is using this mode, I assume it is also now conforming to Ev1? >>> >>> I guess it *could* based on what I'm reading is considered the source of >>> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump >>> used the ra-replay API instead of the ra-update one? >> >> Guess I am more wondering if it was another area where the same >> solution could be applied? > > No, that's just it. ra_serf's implementation of the ra-replay API is > single-connection, just like ra-neon's was. What surprises me is that > svnrdump *does* use the ra-update API. > > Ah! I see why, now. When not doing an incremental dump, 'svnrdump dump' > uses the ra-update API to handle that initial checkout-like revision. After > that (and otherwise when in incremental mode), it uses the ra-replay API. > So yes, I believe svnrdump would be in fine shape over ra-serf if it was > asking the server to use this "send-all" mode, where document Ev1 drive > ordering *should* be honored. So this sounds like pretty great news. Regardless of what we decide to do for Serf with normal updates, it seems like we could unconditionally make svnrdump tap into the send-all mode and that would remove a release blocker. -- Thanks Mark Phippard http://markphip.blogspot.com/
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz wrote: > On Fri, Nov 30, 2012 at 4:54 PM, wrote: >> >> Author: cmpilato >> Date: Fri Nov 30 21:54:35 2012 >> New Revision: 1415864 >> >> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev >> Log: >> Implement in ra_serf "send-all" mode support for update-style REPORTs >> and their responses. (Currently disabled by compile-time conditionals.) >> >> (This one goes out to Ivan Zhakov.) > > > I've stated for a long time that I think the send-all mode is a huge mistake > architecturally because it is too prone to double-compression and TCP > pipeline stalls and is a tremendous burden on a properly-configured httpd > (by not taking advantage of server-side parallelism), it's nice to see it's > not *too* hard to shoehorn this bad idea back into ra_serf. We'd never be > able to shove the non-send-all approach into ra_neon. =) Just to be clear, I do not believe anyone is suggesting we completely abandon the non-send-all approach. I like that this approach can offer good performance on a well-configured server as well as enable new features/ideas such as not even fetching the full-texts that we already have locally. I think the question is simply what is the best way to deliver this. > Here's my suggestion for consideration - let's experiment with this setting > in the beta release process with the setting as-is - that is we always do > the parallel updates unconditionally (except perhaps when svnrdump is being > stupid). If we get real users complaining about the update during that > cycle, we can then figure out either switching the default and/or adding a > config-option or even allowing some control via capabilities exchange. I feel pretty strongly that we should at minimum use the send-all approach when talking to pre-1.8 servers. Even though in some situations it could still offer good performance. 
I just think it would be more respectful to our users (server admins in this case) to not change this behavior in a way that could surprise them. Maybe we could come up with exceptions, such as older servers that are using the SVNAllowBulkUpdates off directive. In that situation we should use the new behavior since that is basically what that directive is asking for. As I said in another thread, I think we should treat a 1.8 server the same way and require someone that was upgrading to add some new directive to enable the new feature. This would allow a server admin to setup his server correctly, including using things like mod_deflate, and turn on the new behavior rather than get it automatically simply because they upgraded their binaries. This seems like it satisfies everyone. Existing users, especially those running older server versions, would not be surprised by new and unwanted client behavior, and it would still be easy to configure a new server properly to support the non-send-all mode when it was desired. I just do not see what the downside would be to approaching it this way. -- Thanks Mark Phippard http://markphip.blogspot.com/
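The policy proposed above can be read as a small decision table. The following Python sketch is only an illustration of that reading, not ra_serf code: the `ServerInfo` fields, the `choose_update_mode` function, and the `skelta_opt_in` flag (standing in for the "new directive" that does not exist yet) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    # Hypothetical stand-ins for what a client might learn about a server;
    # none of these fields are real ra_serf structures.
    version: tuple            # e.g. (1, 7) or (1, 8)
    allow_bulk_updates: bool  # False models "SVNAllowBulkUpdates off"
    skelta_opt_in: bool       # models the proposed, not yet existing, directive


def choose_update_mode(server: ServerInfo) -> str:
    """Return "send-all" or "skelta" following the policy proposed in the thread."""
    # A server that forbids bulk updates is effectively asking for skelta.
    if not server.allow_bulk_updates:
        return "skelta"
    # A 1.8+ server gets skelta only if its admin explicitly enabled it.
    if server.version >= (1, 8) and server.skelta_opt_in:
        return "skelta"
    # Everything else keeps the behavior admins are already used to.
    return "send-all"
```

Under this reading, simply upgrading server binaries changes nothing for clients; the admin has to flip the (hypothetical) switch first.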
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
Justin Erenkrantz wrote on Sat, Dec 01, 2012 at 09:50:29 -0500: > On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej wrote: > > > On 01.12.2012 14:31, Justin Erenkrantz wrote: > > > And, yes, that clearly could all be done in time for 1.8 without > > > jeopardizing the timelines one tiny bit. =P > > > > Eep ... :) > > > > > > Another thing I've been thinking about is this: Why are we using SHA1 > > checksums on the server and on the wire for consistency checks when a > > 64-bit CRC would do the job just as well, and 15 times cheaper? And > > banging my head against the wall for not thinking of this 10 years ago. > > > > I can sort of understand the use of SHA1 as a content index for > > client-side pristine files. On the server, however ... dunno. Maybe we > > could design something akin to what the rsync protocol does, but for > > repository-wide data storage. Could be quite tricky to achieve locality, > > however. > > > > The one thing that's nice with using SHA checksums is we're using it > everywhere. It makes protocol debugging a *lot* easier - since we also > used SHA checksums as the content index, that makes it easier to compare > what we recorded in libsvn_wc to what was sent by the server. If we > diverged the checksums algorithms, it'd be hard to do a quick comparison > visually (do the checksums match?) without actually running the checksum > yourself! If that's the problem, have the server send the recorded-in-fs sha1 checksum as an attribute that the client ignores. (in SVN_DEBUG builds only) > > So, I think we optimized for humans here...and I'm okay with that. We can > always build faster processors...and take advantage of parallelism. =) > > There I go off on a tangent again. > > > > *grin* -- justin
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej wrote: > On 01.12.2012 14:31, Justin Erenkrantz wrote: > > And, yes, that clearly could all be done in time for 1.8 without > > jeopardizing the timelines one tiny bit. =P > > Eep ... :) > > > Another thing I've been thinking about is this: Why are we using SHA1 > checksums on the server and on the wire for consistency checks when a > 64-bit CRC would do the job just as well, and 15 times cheaper? And > banging my head against the wall for not thinking of this 10 years ago. > > I can sort of understand the use of SHA1 as a content index for > client-side pristine files. On the server, however ... dunno. Maybe we > could design something akin to what the rsync protocol does, but for > repository-wide data storage. Could be quite tricky to achieve locality, > however. > The one thing that's nice with using SHA checksums is we're using them everywhere. It makes protocol debugging a *lot* easier - since we also used SHA checksums as the content index, that makes it easier to compare what we recorded in libsvn_wc to what was sent by the server. If we diverged the checksum algorithms, it'd be hard to do a quick comparison visually (do the checksums match?) without actually running the checksum yourself! So, I think we optimized for humans here...and I'm okay with that. We can always build faster processors...and take advantage of parallelism. =) There I go off on a tangent again. > *grin* -- justin
Re: RFC: simple proposal for Internet-scoped IDs
On Sat, Dec 1, 2012 at 8:53 AM, Eric S. Raymond wrote: > There. You're done. It's backward-compatible (older installations > can ignore the feature and nothing breaks). It's independent of your > authentication method, but you can add auth checks if you care enough. > It scales well because the burden of setting up FULLNAME strings is > small and distributed - also because project administrators only have > to make one decision, once. Provided that it's not the default then I don't really see a problem with this. > I'm not certain, but if your server-side hooks work the way I think > they do, all of this except (1) can be done in Python. Not having to > add complexity to your C code is a significant virtue. It's not really about complexity in my opinion. It's about the fact that putting it in the C code would be us trying to implement local policies which can be implemented in a hook script without us trying to consider every possible policy. > If you have an LDAP setup or something like that, you don't need this > - you flip some other switch once and stuff Just Works. Which is fine: > the point of this proposal isn't to force a DVCS-like choice, it's to > put in place a low-effort path to Internet-scoped attribution IDs that > will *always work*. LDAP or something like that is going to pay off much higher dividends for the forge project than this initiative in my opinion. > Somebody alleged "this is a social problem". That's only half-true, but > now I'm going to focus on the true part. "Social problem" doesn't > mean you can or should ignore it. It means you have to lead by > educating and jawboning your users. A large step is just being willing > to say, where your users can see it, "Internet-scoped attributions > are important. Here's how to make them work..." I really don't think in the context of Subversion that this is as important as you make it out to be. 
The ASF Infra folks have probably done more repository moves than just about anyone I can think of. They've managed to handle this without Internet-scoped attributions. The only thing this really buys you is that you theoretically don't have to worry about userid conflicts. Which technically you don't since the svn:author field isn't used in any meaningful way anyway. I'd say that "First Last " does not guarantee there are no userid conflicts since if the userid ends up being: John Smith You could still end up with a situation where John Smith 1 stops using gmail and loses that address and John Smith 2 comes along and gets it. Unlikely, but you still have the problem, it's just less likely.
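For readers who want to picture the hook-script route mentioned above, here is a minimal Python sketch under stated assumptions. The `FULLNAMES` table contents, the example address, and both function names are hypothetical; only the `svnlook author -r REV REPOS` invocation is an existing Subversion command, and the sketch builds that command line without running it.

```python
# Hypothetical per-project table mapping local user IDs to Internet-scoped IDs.
FULLNAMES = {
    "jrandom": "J. Random User <jrandom@example.org>",
}


def svnlook_author_cmd(repos_path: str, revision: int) -> list:
    """Build the svnlook command a hook could run to read svn:author."""
    return ["svnlook", "author", "-r", str(revision), repos_path]


def fullname_for(author: str) -> str:
    """Map a local ID to its full attribution, falling back to the bare ID."""
    return FULLNAMES.get(author, author)
```

A real hook would still have to decide where the table lives and which revision property receives the result; both are local policy, which is the argument for keeping this out of the C code.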
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On 01.12.2012 14:31, Justin Erenkrantz wrote: > And, yes, that clearly could all be done in time for 1.8 without > jeopardizing the timelines one tiny bit. =P Eep ... :) Another thing I've been thinking about is this: Why are we using SHA1 checksums on the server and on the wire for consistency checks when a 64-bit CRC would do the job just as well, and 15 times cheaper? And banging my head against the wall for not thinking of this 10 years ago. I can sort of understand the use of SHA1 as a content index for client-side pristine files. On the server, however ... dunno. Maybe we could design something akin to what the rsync protocol does, but for repository-wide data storage. Could be quite tricky to achieve locality, however. There I go off on a tangent again. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
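To make the trade-off above concrete, here is a small Python sketch of a consistency check with a pluggable checksum. It is an illustration only: Python's standard library has no 64-bit CRC, so the 32-bit `zlib.crc32` is used as a stand-in (an assumption of this example), and the function names are made up. It makes no claim about the relative cost quoted in the message.

```python
import hashlib
import zlib


def sha1_hex(data: bytes) -> str:
    """Cryptographic digest, 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


def crc32_hex(data: bytes) -> str:
    """Non-cryptographic CRC, 8 hex characters.

    Stand-in only: the message above talks about a 64-bit CRC.
    """
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def is_intact(data: bytes, expected: str, checksum=sha1_hex) -> bool:
    """Consistency check: recompute the checksum and compare."""
    return checksum(data) == expected
```

Either function catches accidental corruption in this usage; the difference is that a CRC gives no protection against deliberate tampering and is a poor content index, which matches the distinction drawn above between wire checks and pristine-file lookup.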
Re: RFC: simple proposal for Internet-scoped IDs
On Sat, Dec 1, 2012 at 9:28 AM, Daniel Shahaf wrote: > BTW, the ability to change svn:author at will is one of the reasons they > aren't global-scoped: if Subversion ever migrated away from ASF, we can > _then_ change all svn:author revprops --- just like we once changed > "zhakov" (implied @tigris) to "ivan" (implied @apache). > Exactly. And, to be honest, I probably never realized that Ivan's author tag changed...I knew it was him in both cases as I'm involved in the project. So, for most human-scale projects, I think that you don't need globally unique IDs as there's a defined community and set of participants. Whether I call him "zhakov" or "ivan" - it's the same person. I know that, you know that...and, really, all of the people who care know that. =) For projects where people don't know everyone (Linux), I can see why globally unique IDs are helpful to contributors. But, I would shudder if suddenly svn's own blame output emitted "Daniel Shahaf < d...@daniel.shahaf.name>" instead of "danielsh". I have that map already in my head thank-you-very-much. Hence, this is why I'd be a strong proponent of them being in separate revprops - a "local" project name (svn:author) and something that more uniquely identifies the contributor (FULLNAME blah blah blah). And, perhaps have an option on the client as to which one to use - I could see some folks wanting the GUID, but that's just way too verbose for me... -- justin
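The "two revprops plus a client option" idea above could look roughly like this on the display side. This is a hedged Python sketch, not Subversion client code: the `example:fullname` property name, the `prefer_full` option, and the sample values are all hypothetical.

```python
def display_author(revprops: dict, prefer_full: bool = False,
                   full_prop: str = "example:fullname") -> str:
    """Pick the name to show in blame/log output.

    svn:author stays the short, project-local name; the hypothetical
    full_prop carries the longer, globally scoped attribution.
    """
    short = revprops.get("svn:author", "")
    if prefer_full:
        # Fall back to the short name for revisions never annotated.
        return revprops.get(full_prop, short)
    return short
```

Because the short name remains the default, existing output stays terse, and revisions that were never annotated degrade gracefully to svn:author.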
Re: RFC: simple proposal for Internet-scoped IDs
Justin Erenkrantz wrote on Sat, Dec 01, 2012 at 09:08:05 -0500: > And, once again, I'll reiterate my earlier point that FULLNAME can be added > retroactively pretty easily to existing SVN repositories. So, for > svn.apache.org, after we might deploy a FULLNAME infrastructure, we could > easily craft a tool to go back to all old revisions and annotate them > correctly. Easy peasy. -- justin BTW, the ability to change svn:author at will is one of the reasons they aren't global-scoped: if Subversion ever migrated away from ASF, we can _then_ change all svn:author revprops --- just like we once changed "zhakov" (implied @tigris) to "ivan" (implied @apache).
Re: reposurgeon now writes Subversion repositories
On Sat, Dec 1, 2012 at 8:14 AM, Eric S. Raymond wrote: > This one confines your Unix-ID adhesion to the FULLNAMES array, which > is a long step in the right direction because it means your repo history > will be local-ID-clean. It confines it to whatever value that Python script could be taught how to get it. I'm sure you can modify the Python script to get it from a different source. For that matter you could have the script in the repo and use a post-commit script that updates it every time someone commits it. Then the script moves with the repo. > But it doesn't actually solve the mobility problem. If the project > ever moves, you still have to patch the FULLNAMES dictionary by hand. > This approach won't scale very well. Of course it doesn't scale. It's a trivial example to demonstrate the technique. What I don't understand is your hypothetical situation is demanding an awful lot of Subversion. You've scoped things like an issue tracker and other things as being part of this. But for some reason you've not bothered to scope an authentication system and exporting and moving the users. All of these forge sites allow you to access the repo with the same username/password as the issue tracker etc... So you need some sort of federated (even if it's just specific to each project) authentication system. Subversion doesn't provide that for you, nor should it. You're probably not going to find one that's ready made to your situation either. You're going to need to do some thinking about how to configure things. > I also note that you do really want "J. Random User " > with a preferred "home" address as part of the mix, because the > entropy of human names alone is not quite high enough. Yes, if I see > "Daniel Shahaf" I'm pretty sure there is only one of those. But > "Willam Smith" or "Robert Jones"? " :-) And it's trivial to adjust it to be that way. > But the first? I've heard of LDAP and know roughly what it does, but > I've never seen a live instance.
Forges don't have them. Maybe I'm > being parochial, but this seems like a solution for a case too unusual > to be very interesting. Why not? What's so hard about setting up an LDAP instance for the project? >> Alternative server-side implementation (via breser): >> [[[ >> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ... >> ]]] > > Um, does this mean everyone's commits are coing to look like > Daniel Shahaf made them? If not, where is --tunnel-user going to > come from? No this setup is something that gets added to the start of everyone line (different for each user) of the authorized_keys file for the user you're having people use with svn+ssh. Generally I'd expect whatever system you're using to manage these keys is going to handle this for you(e.g. user goes to some web form and pastes their public key in and then this system edits the authorized_keys file). You'll have to write something. > The lesson from this criticism is intended to be that it's not > enough to make Internet-scoped IDs possible, you have to make > them *easy* - that is, not disruptive of normal workflow. I'd say that the choices you've been presented with are relatively easy to implement. Tons of corporate users have managed to implement things like this. What isn't easy is what you're really asking to do. Which is systems design. You want to pull together a bunch of disparate programs and make them work together in a coordinated and seamless way. That's not terribly easy to do without putting some degree of time building the infrastructure around them. Which is really what a forge site is about. If you want to build a forge site that has portable setups then you're going to have to take and write a way to export all the data (not just the repositories, issue trackers db, wiki db, etc...) but also all the glue between those pieces. 
Unless you've got multiple existing forges already interested in implementing something like this that come together to implement an agreed upon data format. Your best bet is going to be implementing a packaged up system that uses various systems and then exports and imports your data format. We've gone well beyond the area that Subversion is involved and quite frankly we're heading entirely into off topic design work for your forge.
Re: reposurgeon now writes Subversion repositories
On 01.12.2012 14:14, Eric S. Raymond wrote: > (Apologies if this is a duplicate send. I just had a disturbing > glitch in my MUA and want to make sure it got out.) > > Daniel Shahaf : >> Server-side implementation, independent of RA method: (via brane) > Ah, now that looks somewhat like progress. But some (possibly all) of > these solutions have serious weaknesses which you need to think about. > >> [[[ >> #!/usr/bin/env python >> >> import sys >> from svn.repos import * >> from svn.fs import * >> from svn.core import SVN_PROP_REVISION_AUTHOR >> >> FULLNAMES = { >> 'danielsh': 'Daniel Shahaf', >> } >> >> reposdir, txnname = sys.argv[1:3] >> >> repos = svn_repos_open(reposdir, None) >> fs = svn_repos_fs(repos) >> txn = svn_fs_open_txn(fs, txnname, None) >> propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None) >> svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, >>FULLNAMES.get(propval, propval), None) >> ]]] > This one confines your Unix-ID adhesion to the FULLNAMES array, which > is a long step in the right direction because it means your repo history > will be local-ID-clean. > > But it doesn't actually solve the mobility problem. If the project > ever moves, you still have to patch the FULLNAMES dictionary by hand. > This approach won't scale very well. Oh come on. Daniel was giving an example cobbled up in all of 5 minutes. Surely you can imagine replacing FULLNAMES with some user database? > I also note that you do really want "J. Random User " > with a preferred "home" address as part of the mix, because the > entropy of human names alone is not quite high enough. Yes, if I see > "Daniel Shahaf" I'm pretty sure there is only one of those. But > "Willam Smith" or "Robert Jones"? " :-) See above. You can put anything into FULLNAMES and/or a database and/or LDAP (which is just a database). >> Alternative server-side implementation (via markphip): >> [[[ >> AuthLDAPRemoteUserAttribute cn >> ]]] > A variant of this that does "J. 
Random User " > looks like it might work provided there's an LDAP directory and we trust > the LDAP directory to be up to date. The second assumption seems > reasonable if we grant the first. > > But the first? I've heard of LDAP and know roughly what it does, but > I've never seen a live instance. Forges don't have them. Maybe I'm > being parochial, but this seems like a solution for a case too unusual > to be very interesting. Oh right. Does it make the solution any less unusual if I tell you that all of the ASF services, including Subversion, have single-signon via LDAP? Or that you can just as easily replace mod_ldap with mod_authn_ which essentially brings you back to the post-commit hook example. > But this has been fruitful. I think I can write a simple proposal > about how to solve this problem now. I'll do it in my next email. No offence, but it sure looks as if you're deliberately nitpicking in order to give yourself an excuse for writing a proposal for a feature that Subversion, essentially, already has. Certainly I'll read your proposal and don't intend to dismiss it out of hand. But trusting the server to properly authenticate committers is a basic axiom of Subversion's centralized model. And for the record, it's also a basic axiom of GitHub's centralized model. -- Brane P.S.: I find it fascinating that DVCS aficionados haven't noticed that GitHub takes the D out of DVCS very effectively, thereby making git actually useful for most normal people. -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
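A minimal sketch of the "replace FULLNAMES with some user database" idea above, assuming the hook keeps its login-to-name mapping in SQLite. The `committers` table, its columns, the helper names, and the sample login and address are all hypothetical; only the fallback behaviour mirrors `FULLNAMES.get(propval, propval)` from the quoted script.

```python
import sqlite3


def open_user_db(path=":memory:"):
    """Open (and if needed create) a hypothetical committers table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS committers ("
        "login TEXT PRIMARY KEY, fullname TEXT NOT NULL, email TEXT)"
    )
    return conn


def lookup_author(conn, login):
    """Return 'Full Name <email>' for a login, or the login itself if unknown."""
    row = conn.execute(
        "SELECT fullname, email FROM committers WHERE login = ?", (login,)
    ).fetchone()
    if row is None:
        # Same fallback as FULLNAMES.get(propval, propval) in the hook.
        return login
    fullname, email = row
    return "%s <%s>" % (fullname, email) if email else fullname


# Illustrative data only; the login and address are placeholders.
conn = open_user_db()
conn.execute(
    "INSERT INTO committers VALUES (?, ?, ?)",
    ("jrandom", "J. Random User", "jrandom@example.org"),
)
print(lookup_author(conn, "jrandom"))
print(lookup_author(conn, "nobody"))
```

In the hook, the value returned by `lookup_author` would take the place of `FULLNAMES.get(propval, propval)`; moving the project then means moving one database file rather than patching a script.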
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 9:01 AM, Lieven Govaerts wrote: > There are some scenario's where either the server admin or the user > can decide if parallel requests make sense or not. > > I'm specifically thinking of the use Kerberos per request > authentication. These responses can't be cached on the client side, > and require the authorization header to be sent for each request. > Assuming 2 step handshake of which serf can bypass the first, this > means an overhead per request of 1-10KB, with a 3 step handshake each > request has to be sent twice further increasing the overhead. > IMHO in this scenario the server admin should be able to veto the use > of parallel requests. > > And the same is true for https connections, where it's also the server > admin who can decide if the necessary caches have been put in place to > enable the benefits of parallel requests. > Totally agreed. I'd favor a three-value httpd directive option on the server-side that is advertised in the capabilities exchange: - default (client defaults to parallel if ra_serf, serial if older ra_neon client; or if client overrides ra_serf via their local servers options) - serial (server suggests to client that it should be serial; but permit parallel when client wants it) - force-serial (same capability advertisement, but always trigger send-all responses regardless of what client asks for) I'm 95% sure we have code in ra_serf that handles the case where the server sends us inline responses anyway as older (prior to 1.2, IIRC) always sent inline responses no matter what we send...so, it should be fairly straightforward decision tree with minimal code changes. My $.02...which is still not enough for me to write the patch. =) -- justin
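The decision tree for the three-value directive above can be sketched roughly as follows. This is an illustration in Python, not ra_serf or mod_dav_svn code; the function name and parameters are made up, and the only things taken from the message are the three directive values and who is allowed to override whom.

```python
def choose_update_mode(server_directive, client_override=None, client_is_serf=True):
    """Return 'parallel' or 'serial' for an update.

    server_directive: 'default', 'serial' or 'force-serial' (the proposed values).
    client_override:  None, 'parallel' or 'serial' (the client's local config).
    client_is_serf:   False models an older ra_neon client, which is always serial.
    """
    if server_directive not in ("default", "serial", "force-serial"):
        raise ValueError("unknown directive: %r" % (server_directive,))
    if client_override not in (None, "parallel", "serial"):
        raise ValueError("unknown client override: %r" % (client_override,))

    # force-serial: the server sends inline (send-all) responses no matter what.
    if server_directive == "force-serial":
        return "serial"
    # Older clients only ever do serial updates.
    if not client_is_serf:
        return "serial"
    # The client's local config wins over a mere suggestion from the server.
    if client_override is not None:
        return client_override
    # serial: the server suggests serial and the client has no opinion.
    if server_directive == "serial":
        return "serial"
    # default: an ra_serf client goes parallel.
    return "parallel"


print(choose_update_mode("serial", client_override="parallel"))
```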
Re: RFC: simple proposal for Internet-scoped IDs
On Sat, Dec 1, 2012 at 8:53 AM, Eric S. Raymond wrote:
> I'm not certain, but if your server-side hooks work the way I think
> they do, all of this except (1) can be done in Python. Not having to
> add complexity to your C code is a significant virtue.

Here's another approach to setting the FULLNAME field that doesn't require any change to the client and can be deployed server-side via hooks without any code changes at all. So, (1) can be done in Python pretty easily for the lazy users and coders who don't want to do anything at all. =)

If you have a centralized registry (like either LDAP or http://people.apache.org/committer-index.html), the server-side hooks can set FULLNAME for each svn:author, if it isn't set by the client, by looking it up in the server's internal directory. Within the ASF infrastructure, we have tools to allow committers to manage fields like this in a self-service way. So, the server admin can default that field as they like and let the users set that field as they like.

I believe that this is one of the benefits of a centralized infrastructure - we can make it so that every client doesn't *have* to set something themselves on their client to utilize FULLNAME.

And, once again, I'll reiterate my earlier point that FULLNAME can be added retroactively pretty easily to existing SVN repositories. So, for svn.apache.org, after we might deploy a FULLNAME infrastructure, we could easily craft a tool to go back to all old revisions and annotate them correctly. Easy peasy. -- justin
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 2:31 PM, Justin Erenkrantz wrote: > On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn wrote: >> >> I'm wondering whether your concerns apply to both internet-wide >> deployments and local (all on the same LAN) ones. > > > That line is certainly a fair one to draw in the sand. That said, I think > the internal use case cries out even *more* for the parallel updates as the > internal server in that environment is often wildly over-provisioned on the > CPU side - with a fairly low-traffic environment, you want to take advantage > of the parallel cores of a CPU to drive the updates. > > Generally speaking, what I discovered years ago back in 2006 (yikes) and I > believe is still true as we near 2013 (shudder), if everything else is > perfectly optimized (disk, latency, bandwidth, etc.), you're going to > eventually bottleneck on the checksumming on both client and server - which > is entirely CPU-bound and is expensive. You can solve that by splitting out > the work across multiple cores - for a server, you need to utilize multiple > parallel requests in-flight; and for a client, you then need to parallelize > the editor drive. > > The reason that disk isn't such a bottleneck as you might first expect is > due to the OS's buffer cache - for reads on the server-side, common data is > already going to be in RAM so hot spots in the fsfs repos will already be in > memory, for writes on the client-side, modern client OSes won't necessarily > block you until everything is sync'd to disk. But, once you exhaust the > capabilities of RAM, your underlying disk architecture matters a lot and one > that might not be intuitive to those that haven't spent a lot of time > closely with them. (Hi Brane!) If you are using direct-attached storage > locally on either server or client, then you will probably be bottlenecked > right there. 
However, if your corporate environment has an NFS filer or SAN > (a la NetApp/EMC) backing the FSFS repository or as NFS working copies (oh > so common), those large disk subsystems are geared towards parallel I/Os - > not single-threaded I/O performance - Isilon/BlueArc-class storage is > however; but I've yet to see anyone obsessed enough about SVN I/O perf to > place either their repository or working copies on a BlueArc-class storage > system! So, if you are not using direct-attached storage and are using NFS > today in a corporate environment on either client or server, then you want > to parallelize everything so that you can take advantage of the disk/network > I/O architecture preferred by NetApp/EMC. Throwing more cores against a > NetApp/EMC storage system in a high-available bandwidth environment allows > for linear performance returns (i.e., reading/writing one I/O is 1X, two > threads is 2X, three threads is 3X, etc, etc.). > > To that end, I'd eventually love to see ra_serf drive the update editor > across multiple threads so that the checksum and disk I/O bottleneck can be > distributed across cores on the client-side as well. Compared to where we > were in 2006, that's the biggest inefficiency we have yet to solve and take > advantage of. And, I'm sure this'll break all sorts of promises in the Ev1 > and perhaps Ev2 world and drive C-Mike even crazier. =) But, if you want > to put a rocket pack on our HTTP performance, that's exactly what we should > do. I'm reasonably certain that serf itself could be finely tuned to handle > network I/O in a single thread at or close to wire-speed even on a 10G > connection with a modern processor/OS - it's what we do with the file > contents/textdeltas that needs to be shoved to a different set of worker > threads and remove all of that libsvn_wc processing from blocking network > traffic processing and get it all distributed and thread-safe. 
> If we do that, woah, I'd bet that we are we going to make things way faster across
> the board and completely blow everything else out of the water when our
> available bandwidth is high - which is the case in an internal network.
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit. =P
>
> So, that's my long-winded answer of saying that, yah, even in an internal
> LAN environment, you still want to parallelize.
>
> However, I'm definitely not going to veto a patch that would add an httpd
> directive that allows the server to steer the client - unless overridden by
> the client's local config - to using parallel updates or not. -- justin

There are some scenarios where either the server admin or the user can decide if parallel requests make sense or not.

I'm specifically thinking of the use of Kerberos per-request authentication. These responses can't be cached on the client side, and require the authorization header to be sent for each request. Assuming a 2-step handshake of which serf can bypass the first, this means an overhead per request of 1-10KB; with a 3-step handshake each request has to be sent twice, further increasing the overhead. IMHO in this scenario the server admin should be able to veto the use of parallel requests.

And the same is true for https connections, where it's also the server admin who can decide if the necessary caches have been put in place to enable the benefits of parallel requests.
RFC: simple proposal for Internet-scoped IDs
This discussion has been fruitful. The responses from Greg, Branko and others suggest that you guys are actually engaged with the project-mobility problem now - apologies if my approach seemed a bit too boot-to-the-head, but I really am trying to be helpful. I think I can now write a simple proposal that will work.

Despite what some people in this conversation have thought, I'm not ideologically fixated on "user sets his own attribution ID". But I keep coming back to that because it seems to be the only solution that scales up and covers all the deployment cases.

Here's the proposal:

1. Add support to the client tools for shipping a FULLNAME field mined from somewhere under ~/.subversion. Maybe the existing username entry will do, maybe it won't - I see arguments both ways. I don't care, we can fill in that detail later.

2. Add server-side logic that says: if you see a FULLNAME field in a request, use that to fill svn:author. (Yes, in practice you used a different, dedicated revprop to carry FULLNAME. That's OK, it's an implementation detail.)

3. Add a config switch to the server side that tells it to reject commit attempts with a "Set your FULLNAME, please" message if it doesn't see a FULLNAME field in the request. Initially default this switch off.

4. (Important) Tell repository administrators about this in the docs, and say that turning on FULLNAME-required is best practice, and explain why. Since I know how much most people hate writing docs, I volunteer to do this part.

5. If you're really worried about spoofing, you add some server-side logic that stores (auth-cookie, FULLNAME) pairs whenever a new FULLNAME arrives and barfs if a known auth-cookie arrives with a known FULLNAME and they don't match. But this is an optional extra; field experience says you don't need it.

There. You're done. It's backward-compatible (older installations can ignore the feature and nothing breaks). It's independent of your authentication method, but you can add auth checks if you care enough. It scales well because the burden of setting up FULLNAME strings is small and distributed - also because project administrators only have to make one decision, once.

I'm not certain, but if your server-side hooks work the way I think they do, all of this except (1) can be done in Python. Not having to add complexity to your C code is a significant virtue.

If you have an LDAP setup or something like that, you don't need this - you flip some other switch once and stuff Just Works. Which is fine: the point of this proposal isn't to force a DVCS-like choice, it's to put in place a low-effort path to Internet-scoped attribution IDs that will *always work*.

Somebody alleged "this is a social problem". That's only half-true, but now I'm going to focus on the true part. "Social problem" doesn't mean you can or should ignore it. It means you have to lead by educating and jawboning your users. A large step is just being willing to say, where your users can see it, "Internet-scoped attributions are important. Here's how to make them work..."

--
Eric S. Raymond <http://www.catb.org/~esr/>

Whether the authorities be invaders or merely local tyrants, the effect of such [gun control] laws is to place the individual at the mercy of the state, unable to resist.
-- Robert Anson Heinlein, 1949
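The server-side parts of the proposal above (steps 2, 3 and 5) could look roughly like this. It is a sketch in plain Python, not Subversion code: the function, the exception classes, the `known` dictionary standing in for the stored (auth-cookie, FULLNAME) pairs, and the fallback to the authenticated ID when no FULLNAME arrives are all assumptions made for illustration.

```python
class FullnameRequired(Exception):
    """Raised when the FULLNAME-required switch is on and none was sent."""


class FullnameMismatch(Exception):
    """Raised when a known auth ID shows up with a different FULLNAME."""


def resolve_author(auth_id, fullname, known,
                   require_fullname=False, check_spoofing=False):
    """Pick the value to store in svn:author for one commit.

    auth_id:  the identity the server authenticated (the "auth-cookie").
    fullname: the FULLNAME sent by the client, or None.
    known:    dict of auth_id -> FULLNAME seen so far (only used for step 5).
    """
    if not fullname:
        if require_fullname:
            # Step 3: reject the commit when the switch is on.
            raise FullnameRequired("Set your FULLNAME, please")
        # Assumed backward-compatible behaviour: keep the authenticated ID.
        return auth_id
    if check_spoofing:
        # Step 5: remember the first FULLNAME seen for this auth ID.
        previous = known.setdefault(auth_id, fullname)
        if previous != fullname:
            raise FullnameMismatch(
                "%s previously committed as %r" % (auth_id, previous))
    # Step 2: the client-supplied FULLNAME fills svn:author.
    return fullname


known = {}
print(resolve_author("jrandom", "J. Random User <jrandom@example.org>",
                     known, check_spoofing=True))
```

A pre-commit hook would call something like this once per transaction and turn either exception into a rejected commit with the message shown to the user.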
Re: reposurgeon now writes Subversion repositories
On Sat, 01 Dec 2012, Eric S. Raymond wrote:
>> Alternative server-side implementation (via breser):
>> [[[
>> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
>> ]]]
>
> Um, does this mean everyone's commits are coing to look like
> Daniel Shahaf made them? If not, where is --tunnel-user going to
> come from?

It comes from the .ssh/authorized_keys file, in a context that is associated with exactly one ssh key (the "ssh-rsa ..." part); this would be the same place that previously had "--tunnel-user=danielsh".

--apb (Alan Barrett)
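To make the one-key-one-name association concrete, here is a small sketch of how a key-management tool might generate those authorized_keys lines. The helper name and the sample key are hypothetical; the line format itself is the one quoted above, with `svnserve -t --tunnel-user=...` as the forced command in front of each public key.

```python
def authorized_keys_line(fullname, public_key):
    """Build one authorized_keys entry tying a public key to a --tunnel-user name."""
    # Quotes or newlines in the name would break the command="..." option.
    for bad in ("'", '"', "\n"):
        if bad in fullname:
            raise ValueError("unsupported character in name: %r" % bad)
    key = public_key.strip()
    if "\n" in key:
        raise ValueError("public key must be a single line")
    return "command=\"svnserve -t --tunnel-user='%s'\" %s" % (fullname, key)


# The key below is a placeholder, not a real public key.
users = [
    ("Daniel Shahaf", "ssh-rsa AAAAexamplekey1 danielsh@laptop"),
    ("J. Random User", "ssh-rsa AAAAexamplekey2 jrandom@desktop"),
]
for name, key in users:
    print(authorized_keys_line(name, key))
```

Each committer gets a different line, so the name recorded for a commit depends only on which key was used to log in.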
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn wrote: > I'm wondering whether your concerns apply to both internet-wide > deployments and local (all on the same LAN) ones. > That line is certainly a fair one to draw in the sand. That said, I think the internal use case cries out even *more* for the parallel updates as the internal server in that environment is often wildly over-provisioned on the CPU side - with a fairly low-traffic environment, you want to take advantage of the parallel cores of a CPU to drive the updates. Generally speaking, what I discovered years ago back in 2006 (yikes) and I believe is still true as we near 2013 (shudder), if everything else is perfectly optimized (disk, latency, bandwidth, etc.), you're going to eventually bottleneck on the checksumming on both client and server - which is entirely CPU-bound and is expensive. You can solve that by splitting out the work across multiple cores - for a server, you need to utilize multiple parallel requests in-flight; and for a client, you then need to parallelize the editor drive. The reason that disk isn't such a bottleneck as you might first expect is due to the OS's buffer cache - for reads on the server-side, common data is already going to be in RAM so hot spots in the fsfs repos will already be in memory, for writes on the client-side, modern client OSes won't necessarily block you until everything is sync'd to disk. But, once you exhaust the capabilities of RAM, your underlying disk architecture matters a lot and one that might not be intuitive to those that haven't spent a lot of time closely with them. (Hi Brane!) If you are using direct-attached storage locally on either server or client, then you will probably be bottlenecked right there. 
However, if your corporate environment has an NFS filer or SAN (a la NetApp/EMC) backing the FSFS repository or as NFS working copies (oh so common), those large disk subsystems are geared towards parallel I/Os - not single-threaded I/O performance - Isilon/BlueArc-class storage is however; but I've yet to see anyone obsessed enough about SVN I/O perf to place either their repository or working copies on a BlueArc-class storage system! So, if you are not using direct-attached storage and are using NFS today in a corporate environment on either client or server, then you want to parallelize everything so that you can take advantage of the disk/network I/O architecture preferred by NetApp/EMC. Throwing more cores against a NetApp/EMC storage system in a high-available bandwidth environment allows for linear performance returns (i.e., reading/writing one I/O is 1X, two threads is 2X, three threads is 3X, etc, etc.). To that end, I'd eventually love to see ra_serf drive the update editor across multiple threads so that the checksum and disk I/O bottleneck can be distributed across cores on the client-side as well. Compared to where we were in 2006, that's the biggest inefficiency we have yet to solve and take advantage of. And, I'm sure this'll break all sorts of promises in the Ev1 and perhaps Ev2 world and drive C-Mike even crazier. =) But, if you want to put a rocket pack on our HTTP performance, that's exactly what we should do. I'm reasonably certain that serf itself could be finely tuned to handle network I/O in a single thread at or close to wire-speed even on a 10G connection with a modern processor/OS - it's what we do with the file contents/textdeltas that needs to be shoved to a different set of worker threads and remove all of that libsvn_wc processing from blocking network traffic processing and get it all distributed and thread-safe. 
If we do that, whoa, I'd bet that we are going to make things way faster across the board and completely blow everything else out of the water when our available bandwidth is high - which is the case in an internal network. And, yes, that clearly could all be done in time for 1.8 without jeopardizing the timelines one tiny bit. =P

So, that's my long-winded way of saying that, yah, even in an internal LAN environment, you still want to parallelize.

However, I'm definitely not going to veto a patch that would add an httpd directive that allows the server to steer the client - unless overridden by the client's local config - to using parallel updates or not.

-- justin
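As a toy illustration of the "split the checksumming across cores" point, and nothing more, the sketch below hashes a batch of file contents on a pool of worker threads. It is Python, not libsvn_wc; the function names, the choice of SHA-1 and the worker count are assumptions. Threads are used because CPython's hashlib releases the interpreter lock while hashing larger buffers, so the work can actually spread over cores.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def checksum(data):
    """Checksum one file's contents (SHA-1 chosen only as an example)."""
    return hashlib.sha1(data).hexdigest()


def checksum_all(blobs, workers=4):
    """Checksum many blobs on a pool of worker threads, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, blobs))


# Stand-ins for file contents arriving from the network thread.
blobs = [bytes([i]) * (1 << 20) for i in range(8)]
digests = checksum_all(blobs)
print(len(digests))
```

The network-reading side stays single-threaded in this picture; only the per-file work is handed off, which is the split the message argues for.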
Re: reposurgeon now writes Subversion repositories
(Apologies if this is a duplicate send. I just had a disturbing glitch in my MUA and want to make sure it got out.) Daniel Shahaf : > Server-side implementation, independent of RA method: (via brane) Ah, now that looks somewhat like progress. But some (possibly all) of these solutions have serious weaknesses which you need to think about. > [[[ > #!/usr/bin/env python > > import sys > from svn.repos import * > from svn.fs import * > from svn.core import SVN_PROP_REVISION_AUTHOR > > FULLNAMES = { > 'danielsh': 'Daniel Shahaf', > } > > reposdir, txnname = sys.argv[1:3] > > repos = svn_repos_open(reposdir, None) > fs = svn_repos_fs(repos) > txn = svn_fs_open_txn(fs, txnname, None) > propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None) > svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, >FULLNAMES.get(propval, propval), None) > ]]] This one confines your Unix-ID adhesion to the FULLNAMES array, which is a long step in the right direction because it means your repo history will be local-ID-clean. But it doesn't actually solve the mobility problem. If the project ever moves, you still have to patch the FULLNAMES dictionary by hand. This approach won't scale very well. I also note that you do really want "J. Random User " with a preferred "home" address as part of the mix, because the entropy of human names alone is not quite high enough. Yes, if I see "Daniel Shahaf" I'm pretty sure there is only one of those. But "Willam Smith" or "Robert Jones"? " :-) > Alternative server-side implementation (via markphip): > [[[ > AuthLDAPRemoteUserAttribute cn > ]]] A variant of this that does "J. Random User " looks like it might work provided there's an LDAP directory and we trust the LDAP directory to be up to date. The second assumption seems reasonable if we grant the first. But the first? I've heard of LDAP and know roughly what it does, but I've never seen a live instance. Forges don't have them. 
Maybe I'm being parochial, but this seems like a solution for a case too unusual to be very interesting.

> Alternative server-side implementation (via breser):
> [[[
> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
> ]]]

Um, does this mean everyone's commits are going to look like Daniel Shahaf made them? If not, where is --tunnel-user going to come from?

> Client-side implementation (via danielsh):
> [[[
> [ -n "${EMAIL}" ] && svn() {
>   if [ x"$1" = x"ci" ] || [ x"$1" = x"commit" ]; then
>     command svn --with-revprop=svn:x-committer-email=${EMAIL} "$@"
>   else
>     command svn "$@"
>   fi
> }
> ]]]

Bletch. This one is begging for failure unless you can train your users to use a wrapper script every time - good luck with that. One important case where this approach will break, and cause acrimony, is Emacs VC mode. That's somewhere up to 50% of your users under open-source platforms, if the stats on editor usage are to be believed.

The lesson from this criticism is intended to be that it's not enough to make Internet-scoped IDs possible, you have to make them *easy* - that is, not disruptive of normal workflow.

But this has been fruitful. I think I can write a simple proposal about how to solve this problem now. I'll do it in my next email.

--
Eric S. Raymond <http://www.catb.org/~esr/>

A man with a gun is a citizen. A man without a gun is a subject.
Re: 1.8 Progress
On Thu, Nov 29, 2012 at 4:52 PM, C. Michael Pilato wrote:
> > 2) Ev2. The notes say this is believed to be in a releasable state? Is
> > there any work needed to verify this? Do we need to remove the use of Ev2
> > in any place to avoid releasing with compatibility shims in use? Are we
> > comfortable that the API is complete?
>
> Julian expressed doubt about whether the API was ready for prime-time.
>
> C-Mike expressed concern about the extremely low bus factor.
>
> Hyrum acknowledged both, and continued with: "We can always shuffle headers
> around or document the things as experimental, so committing ourselves to
> the API as this point isn't my concern. The only real limiting around Ev2
> and 1.8 is issue #4116 which is svnrdump failures over ra_serf. In the
> issue, I propose using Ev2 to get around the problem, since the dumpfile
> format is so incongruent with the editor. Of course, we don't *have* to do
> that, but as I've thought about it, any solution will require a bit o'
> caching---which we've already implemented as part of the Ev2 shims. We
> *might* be able to implement the svnrdump editor as Ev2, shim the thing on
> the client side (which gives us the required caching) and release that way.
> Or there might be a better solution I'm overlooking because I've got Ev2 on
> the brain."

This basically boils down to "rdump isn't completely Delta Editor friendly, which interacts badly with Serf." This problem is only tangentially related to Ev2, but it was proposed as one of the possible solutions. It's probably better to try and pursue other solutions to this independent of Ev2.

As for Ev2 itself, I don't see anything that should be blocking 1.8. If people are uncomfortable shipping the API, some documentation and/or header hackery should be sufficient to make it mutable in future releases. As far as I know, all the Ev2 work is entirely self-contained within Subversion.

> OWNERSHIP: Hyrum's got the most experience here, but due to his time
> contention, we may very well have no owner for this at all. That's bad.

Sadly, true.

-Hyrum
Re: reposurgeon now writes Subversion repositories
Alan Barrett :
> Perhaps it would be a good first step to add examples to the
> documentation, showing how the admin can use "Full Name "
> in the svn:author field, with all the common access methods.

Yes. I think it is (a) possible that better documentation can solve this problem, and (b) certain that better documentation is *necessary* to solve this problem.

I'm willing to help. You can look at the description of the dump-load format at notes/dump-load-format.txt, most of which I wrote earlier this year, to see that this is not an idle promise.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Fwd: Regarding the Outreach Program for Women
Hi,

I am currently a second-year student at IIIT-Hyderabad and am very interested in many of the projects being mentored here. It would be great if you could guide me on how to apply for them. As it is, I am not able to locate the respective mentors for your projects.

Having seen this program only yesterday, I have not been able to do much work towards contributions, but I am confident in my ability to make a difference if given the right opportunity. I request you to please forward this to the respective people.

I am attaching my CV and posting a link to my GitHub repo. I would be highly obliged if you could have a look at them.

https://github.com/shivanipoddariiith

Thank you,
Shivani Poddar

cv-internopports.rtf
Description: RTF file
Re: reposurgeon now writes Subversion repositories
Alan Barrett wrote on Sat, Dec 01, 2012 at 12:05:48 +0300:
> Perhaps it would be a good first step to add examples to the
> documentation, showing how the admin can use "Full Name "
> in the svn:author field, with all the common access methods.

Server-side implementation, independent of RA method: (via brane)
[[[
#!/usr/bin/env python
import sys
from svn.repos import *
from svn.fs import *
from svn.core import SVN_PROP_REVISION_AUTHOR

FULLNAMES = {
  'danielsh': 'Daniel Shahaf',
}

reposdir, txnname = sys.argv[1:3]
repos = svn_repos_open(reposdir, None)
fs = svn_repos_fs(repos)
txn = svn_fs_open_txn(fs, txnname, None)
propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
                       FULLNAMES.get(propval, propval), None)
]]]

Alternative server-side implementation (via markphip):
[[[
AuthLDAPRemoteUserAttribute cn
]]]

Alternative server-side implementation (via breser):
[[[
command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
]]]

Client-side implementation (via danielsh):
[[[
[ -n "${EMAIL}" ] && svn() {
  if [ x"$1" = x"ci" ] || [ x"$1" = x"commit" ]; then
    command svn --with-revprop=svn:x-committer-email=${EMAIL} "$@"
  else
    command svn "$@"
  fi
}
]]]
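
For anyone who wants to try the lookup from the first hook above without the Subversion Python bindings installed, here is a minimal sketch of just the mapping step. The function name map_author and the explicit None check are my own additions for illustration, not part of the hook. I am assuming the transaction property can come back as None when no author is set, and that such a value should pass through untouched.

```python
# Minimal sketch of the author-mapping step only; no Subversion bindings needed.
# 'danielsh' comes from the hook above; map_author is a hypothetical helper name.
FULLNAMES = {
    'danielsh': 'Daniel Shahaf',
}


def map_author(propval, fullnames=FULLNAMES):
    """Return the full name for a short user ID, or the input unchanged."""
    # Assumption: an unset svn:author is represented as None and is left alone.
    if propval is None:
        return None
    # Unknown IDs fall through unchanged, as with FULLNAMES.get(propval, propval).
    return fullnames.get(propval, propval)
```

In the hook, the value returned here is what would be handed to svn_fs_change_txn_prop in place of the inline FULLNAMES.get(...) call.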
Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c
On Sat, Dec 1, 2012 at 6:36 AM, Justin Erenkrantz wrote:
> On Fri, Nov 30, 2012 at 4:54 PM, wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses. (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf. We'd never be
> able to shove the non-send-all approach into ra_neon. =)

I'm wondering whether your concerns apply to both internet-wide deployments and local (all on the same LAN) ones. It seems to me that SVN has two sets of audiences when it comes to networking: some have to support users over the internet with sometimes slow and high-latency, perhaps flaky connections; and others have all their users on a local (or almost-local) network, and want to make optimal use of their infrastructure, which offers an absolutely rock-solid low-latency connection ... they'd like to shove the content through that (wide, short) pipe as quickly as possible.

I'm no expert, but I suppose it's possible that those two audiences need two different networking configurations to make optimal use of their environment. If that's the case, it would be great if we could offer some (clear, simple to use) configuration directives for those admins to tune things ...

Just my 2 cents ...
--
Johan
Re: reposurgeon now writes Subversion repositories
On Sat, 01 Dec 2012, Eric S. Raymond wrote:
> I've lost count of the number of Subversion repo lifts I've done (has to be
> more than a dozen at this point), and in no case have I ever seen
> *anything* but a local Unix ID in the svn:author property.

Yes, it's probably true that most svn repositories use short strings that resemble unix user ids, and a lot of the svn documentation uses such strings in examples. But it's also true that the admin can use almost any string they like. In repositories that I have set up, I have always used short strings that resemble local unix IDs, but in most cases those strings would not have been valid unix user names on the server host.

Perhaps it would be a good first step to add examples to the documentation, showing how the admin can use "Full Name " in the svn:author field, with all the common access methods.

--apb (Alan Barrett)