Re: Apache Subversion and the Outreach Program for Women

2012-12-01 Thread Stefan Sperling
On Sat, Dec 01, 2012 at 05:36:34PM +1000, Miriam Hochwald wrote:
> To whom it may concern,
> 
> I would like to express interest in contributing to Apache Subversion.

Hi Miriam!

I've volunteered as mentor for Subversion in OPW 2013.

> Projects of interest:
> 
> - Improve bindings to other programming languages.
> - Show progress output.
> - Improve 'svn help'.
> - More customizable behavior for 'svn diff'.
> 
> My core languages are C, C++ and Java. I have a familiarity with other
> languages as well (e.g. Python, JavaScript, HTML, CSS etc.).

Do you have a preference regarding which of these projects you'd
like to work on? Have you investigated any of these projects
more closely, and do you have any questions regarding any of
these projects?

Picking a project that you'd love doing and which will keep you
motivated is very important. You should have a gut feeling about
what the project entails, what you might want to learn about or
discover along the way, and communicate your thoughts on this.

Given that you know C and Python, the two main languages used in
Subversion, you could take up pretty much any task of reasonable
scope for the 3 month internship. So if you find anything else
that interests you but is not on the ideas list, please don't hesitate
to talk to me about alternative project ideas.

> At this point in time I don't have Apache Subversion exposure. However, I
> note that there is an online book on the topic. My Noob status would
> provide a good insight into common learner perspectives, with relevance to
> the projects. I do have familiarity with UNIX and CVS.

Familiarity with CVS should be good enough, given that Subversion's
development is rooted in fixing CVS's shortcomings :)

> Please let me know if you would like to proceed with these discussions. I
> note that the deadline for the Outreach Program for Women is the 3rd of
> December.

Subversion joined the OPW program just last week, so the 3rd of December
deadline is rather tight. You should certainly try to formally
submit your application by then, but making a small contribution by
3rd of December is not a strict requirement.

Please note that as a project we prefer to keep any communication which
doesn't involve sensitive or private matters on the dev@ list (i.e. this
list), which is publicly archived, so that everyone involved can stay
on top of what's going on.

I'm 'stsp' in #svn-dev on freenode, BTW. I'll try to be reachable
for you there as well, as far as my current schedule permits.


Re: RFC: simple proposal for Internet-scoped IDs

2012-12-01 Thread Stefan Sperling
On Sat, Dec 01, 2012 at 09:41:17AM -0500, Justin Erenkrantz wrote:
>  Whether I call him "zhakov" or "ivan" - it's the same person. 

No, he's very different on IRC ("zhakov") to when I sat next to
him in-person ("ivan") in a bar in Berlin at 2am in the morning.
Really!


Re: 1.8 Progress

2012-12-01 Thread Stefan Sperling
On Thu, Nov 29, 2012 at 04:52:15PM -0500, C. Michael Pilato wrote:
> I also seem to recall Stefan also saying something about not having time to
> work on this stuff for the remainder of 2012, but I can't find a reference
> for that at the moment, so perhaps I just misremembered.

I said so on IRC. I'm fully booked by elego customers for most of each
week until Christmas, which is why I don't have much time I can invest
into Subversion this month.

> OWNERSHIP:  Given Philip's comment, I believe it is reasonable to deem him
> the owner of this body of work, or at least co-owner with a
> possibly-time-constrained Stefan.  But perhaps all there is to "own" is the
> post-branch removal of the feature?

I'd prefer to get moves on trunk into a releasable state, or cut
the feature out on trunk and put it back in after branching 1.8.
I believe removing the feature would be somewhat invasive so it makes
sense to do so on trunk in several steps, rather than on the release
branch which is supposed to be stable at the time it is created.

Obviously, fixing updates of moves would be even better :)
I hope Philip or someone else can find time for this.

I'll definitely get back to this in January, see where things are
at that point, and move forward into whatever direction seems reasonable.


Re: Fwd: Regarding the Outreach Program for Women

2012-12-01 Thread Stefan Sperling
On Sat, Dec 01, 2012 at 03:29:44PM +0530, Shivani Poddar wrote:
> Hi
> I am currently a second year student in IIIT-Hyderabad and am hugely
> interested in a lot of Projects that are being mentored here. It would be
> great if you could guide me how to go about applying for them. As it is i am
> not able to locate the respective mentors for your projects.

Hi Shivani,

I missed putting up my details as mentor on our OPW page, sorry.
I've added this information now.

> Having seen
> this program only yesterday, i have not been able to do much work for
> contributions, but am confident on my abilities to make a difference in the
> same if given the right opportunity.

Please don't worry too much about the small contribution deadline,
which officially is next Monday. We joined the OPW program rather
late in the game (just last week!), so it would be somewhat unreasonable
to require a contribution by Monday. I'm not sure whether OPW will
consider applications handed in past the deadline, however I'll try
to sort things out if you run into any problems due to that.

Have you already taken some time to find a project that you're interested
in doing? Have you had a look at the project ideas list on our OPW page?
http://subversion.apache.org/opw.html
Do any of these project ideas appeal to you, or if not, do you have
any other project idea? I'm very open to suggestions, and would be happy
to mentor any project you'd be interested in doing.

Please note that as a project we prefer to keep any communication which
doesn't involve sensitive or private matters on the dev@ list (i.e. this
list), which is publicly archived, so that everyone involved can stay
on top of what's going on.

I'm 'stsp' in #svn-dev on freenode, BTW. I'll try to be reachable
for you there as well, as far as my current schedule permits.


Apache Subversion and the Outreach Program for Women

2012-12-01 Thread Miriam Hochwald
To whom it may concern,

I would like to express interest in contributing to Apache Subversion.

Projects of interest:

   - Improve bindings to other programming languages.
   - Show progress output.
   - Improve 'svn help'.
   - More customizable behavior for 'svn diff'.

My core languages are C, C++ and Java. I have a familiarity with other
languages as well (e.g. Python, JavaScript, HTML, CSS etc.).

At this point in time I don't have Apache Subversion exposure. However, I
note that there is an online book on the topic. My Noob status would
provide a good insight into common learner perspectives, with relevance to
the projects. I do have familiarity with UNIX and CVS.

Please let me know if you would like to proceed with these discussions. I
note that the deadline for the Outreach Program for Women is the 3rd of
December.

Thank you.

Kind regards,
Miriam Hochwald

-- 
Founder & Director
Girl Geek Coffees (GGC)
sites.google.com/site/girlgeekcoffees


Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-01 Thread Johan Corveleyn
On Sat, Dec 1, 2012 at 10:40 PM, Branko Čibej  wrote:
> On 01.12.2012 22:18, Ivan Zhakov wrote:
>> Completely agree.
>>
>> My point was that in theory skelta-mode is cool, but it still needs a
>> lot of work to get it really done.
>
> Sorry? I've had "http-library = serf" in ~/.subversion/config for years,
> and have seen no problems. What does it take, in your opinion, to get it
> "really done"?

It has become abundantly clear during the last year that, just because
some people have been using it for years, doesn't mean it's stable /
done / ready for prime time. What do you think I spent my entire
Berlin hackathon week on last June, together with Justin and others?
And weeks before that and after ... and the time spent by others
during the last months fixing difficult to reproduce, yet important
bugs.

Those issues were not due to wrong configurations, or total edge
cases, but things that people with big installations would have run
into the first day they would have started using it.

I don't want to rehash all the discussions, I think there's mostly
consensus on the general direction we can / should go (which is
forward, not backward). But I find "I have been using it for years
without problems" not a very strong argument in this case.

Getting back to the actual topic ... as an svn admin, I'd definitely
like to have the choice between send-all and skelta mode, if that's
possible. Which one should be the default ... I don't know. But having
the choice is the most important thing. I'd sleep better at night,
knowing that I can flip a switch to reduce the number of requests
etc., in the unlikely case that would turn out to be a problem :-).

-- 
Johan


Re: Outreach Program for Women

2012-12-01 Thread Stefan Sperling
On Fri, Nov 30, 2012 at 08:01:51PM -0800, Lisa L wrote:
> Thanks, Greg. That'll give me something to get familiarized with until a
> mentoring-type person is available. :-)

Hi, the mentoring-type person would be me :)

I'm happy to hear you're interested in improving 'svn help'.

I put together the project ideas list in a hurry to get something
up on the website ASAP, since the application deadline is rather close.
So I may have missed some projects potentially suitable for OPW.
If you find that improving 'svn help' is not a suitable project
for you we can try to find something else if you prefer.

Most coding tasks in Subversion require knowledge of the C programming
language. I've tried to come up with project ideas that don't require
intimate knowledge of C in case an OPW applicant doesn't already know C.
Neither the bindings nor the 'svn help' project ideas strictly require C
skills, and I'm happy to help out wherever C skills are necessary.

I hope you'll find an interesting project to work on.
If you have any questions just ask me. Note that as a project we
prefer to keep any communication which doesn't involve sensitive
or private matters on this list, which is publicly archived, so
that everyone involved can stay on top of what's going on.

I'm 'stsp' in #svn-dev on freenode, BTW. I'll try to be reachable
for you there as well, as far as my current schedule permits.


Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-01 Thread Branko Čibej
On 01.12.2012 22:18, Ivan Zhakov wrote:
> Completely agree.
>
> My point was that in theory skelta-mode is cool, but it still needs a
> lot of work to get it really done.

Sorry? I've had "http-library = serf" in ~/.subversion/config for years,
and have seen no problems. What does it take, in your opinion, to get it
"really done"?

>  So let's release ra_serf by
> piecemeal, because we also have significant amount of ra_serf issues
> unrelated to update editor.

We've been through that. There's one or maybe two bugs that are actual
bugs that pop up in edge cases, everything else is related to server
configuration etc.

Is your goal shipping without bugs?

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com



non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-01 Thread Ivan Zhakov
[ changing subject to make topic more visible]

On Sat, Dec 1, 2012 at 9:00 PM, Mark Phippard  wrote:
> On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz
>  wrote:
>> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:
>>>
>>> Author: cmpilato
>>> Date: Fri Nov 30 21:54:35 2012
>>> New Revision: 1415864
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>> Log:
>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>>
>>> (This one goes out to Ivan Zhakov.)
>>
>>
>> I've stated for a long time that I think the send-all mode is a huge mistake
>> architecturally because it is too prone to double-compression and TCP
>> pipeline stalls and is a tremendous burden on a properly-configured httpd
>> (by not taking advantage of server-side parallelism), it's nice to see it's
>> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
>> able to shove the non-send-all approach into ra_neon.  =)
>
> Just to be clear, I do not believe anyone is suggesting we completely
> abandon the non-send-all approach.  I like that this approach can
> offer good performance on a well-configured server as well as enable
> new features/ideas such as not even fetching the full-texts that we
> already have locally.  I think the question is simply what is the best
> way to deliver this.
Completely agree.

My point was that in theory skelta-mode is cool, but it still needs a
lot of work to get it really done. So let's release ra_serf
piecemeal, because we also have a significant number of ra_serf issues
unrelated to the update editor.

>
>> Here's my suggestion for consideration - let's experiment with this setting
>> in the beta release process with the setting as-is - that is we always do
>> the parallel updates unconditionally (except perhaps when svnrdump is being
>> stupid).  If we get real users complaining about the update during that
>> cycle, we can then figure out either switching the default and/or adding a
>> config-option or even allowing some control via capabilities exchange.
>
> I feel pretty strongly that we should at minimum use the send-all
> approach when talking to pre-1.8 servers.  Even though in some
> situations it could still offer good performance.  I just think it
> would be more respectful to our users (server admins in this case) to
> not change this behavior in a way that could surprise them.  Maybe we
> could come up with exceptions, such as older servers that are using
> the SVNAllowBulkUpdates off directive.  In that situation we should
> use the new behavior since that is basically what that directive is
> asking for.
>
> As I said in another thread, I think we should treat a 1.8 server the
> same way and require someone that was upgrading to add some new
> directive to enable the new feature.  This would allow a server admin
> to setup his server correctly, including using things like
> mod_deflate, and turn on the new behavior rather than get it
> automatically simply because they upgraded their binaries.
>
> This seems like it satisfies everyone.  Existing users, especially
> those running older server versions, would not be surprised by new and
> unwanted client behavior, and it would still be easy to configure a
> new server properly to support the non-send-all mode when it was
> desired.  I just do not see what the downside would be to approaching
> it this way.
>
+1.

-- 
Ivan Zhakov


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 12:00 PM, Mark Phippard  wrote:

> I feel pretty strongly that we should at minimum use the send-all
> approach when talking to pre-1.8 servers.  Even though in some
> situations it could still offer good performance.  I just think it
> would be more respectful to our users (server admins in this case) to
> not change this behavior in a way that could surprise them.  Maybe we
> could come up with exceptions, such as older servers that are using
> the SVNAllowBulkUpdates off directive.  In that situation we should
> use the new behavior since that is basically what that directive is
> asking for.
>

Without a lot of concrete feedback that parallel updates should be removed
by default, I strongly believe that we should not be conservative on this
issue.  The issue here is not one of compatibility - ra_serf has been
around for years and can talk just fine to older servers (way back to prior
to 1.0 servers actually).  The only argument against altering the default
behavior is that there might be an admin of a high-traffic site somewhere
that might suddenly be shocked by more HTTP requests coming in.  I honestly
have little to no sympathy for such an admin who doesn't properly
understand how to manage a large installation - they likely have other
issues that they are not paying attention to.  Until we have hordes of
users coming in and complaining about this, I think it's silly to be
conservative here.

I'm definitely not against giving knobs to the client or to the admin in
weird corner cases (provided someone cares enough to write that up), but I
strongly believe that for now we should do the right thing out of the box
in 1.8 - which is to utilize parallel updates.  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Mark Phippard
On Fri, Nov 30, 2012 at 5:38 PM, C. Michael Pilato  wrote:
> On 11/30/2012 05:25 PM, Mark Phippard wrote:
>> On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato  
>> wrote:
>>> On 11/30/2012 05:00 PM, Mark Phippard wrote:
>>>> On Fri, Nov 30, 2012 at 4:54 PM,   wrote:
>>>>> Author: cmpilato
>>>>> Date: Fri Nov 30 21:54:35 2012
>>>>> New Revision: 1415864
>>>>>
>>>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>>>> Log:
>>>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>>>
>>>> Sweet!
>>>>
>>>> Would this also resolve the issue with svnrdump, or could it?  When
>>>> Serf is using this mode, I assume it is also now conforming to Ev1?
>>>
>>> I guess it *could* based on what I'm reading is considered the source of
>>> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
>>> used the ra-replay API instead of the ra-update one?
>>
>> Guess I am more wondering if it was another area where the same
>> solution could be applied?
>
> No, that's just it.  ra_serf's implementation of the ra-replay API is
> single-connection, just like ra-neon's was.  What surprises me is that
> svnrdump *does* use the ra-update API.
>
> Ah!  I see why, now.  When not doing an incremental dump, 'svnrdump dump'
> uses the ra-update API to handle that initial checkout-like revision.  After
> that (and otherwise when in incremental mode), it uses the ra-replay API.
> So yes, I believe svnrdump would be in fine shape over ra-serf if it was
> asking the server to use this "send-all" mode, where document Ev1 drive
> ordering *should* be honored.

So this sounds like pretty great news.  Regardless what we decide to
do for Serf with normal updates, it seems like we could
unconditionally make svnrdump tap into the send-all mode and that
would remove a release blocker.
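
The split between the two RA APIs described in the quoted text can be
sketched as follows. This is only an illustration of the described
behaviour, not svnrdump's actual code; the function name `dump_plan` and
the string labels are made up for the example.

```python
def dump_plan(start_rev, end_rev, incremental):
    """Return (revision, api) pairs for a hypothetical 'svnrdump dump' run.

    Per the description above: a non-incremental dump handles its first,
    checkout-like revision through the ra-update API; every other revision
    (and every revision of an incremental dump) goes through ra-replay.
    """
    plan = []
    for rev in range(start_rev, end_rev + 1):
        if rev == start_rev and not incremental:
            plan.append((rev, "ra-update"))
        else:
            plan.append((rev, "ra-replay"))
    return plan
```

Only the first entry of a non-incremental plan touches ra-update, which is
the one place where the "send-all" mode would matter for svnrdump.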


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Mark Phippard
On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz
 wrote:
> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
> able to shove the non-send-all approach into ra_neon.  =)

Just to be clear, I do not believe anyone is suggesting we completely
abandon the non-send-all approach.  I like that this approach can
offer good performance on a well-configured server as well as enable
new features/ideas such as not even fetching the full-texts that we
already have locally.  I think the question is simply what is the best
way to deliver this.

> Here's my suggestion for consideration - let's experiment with this setting
> in the beta release process with the setting as-is - that is we always do
> the parallel updates unconditionally (except perhaps when svnrdump is being
> stupid).  If we get real users complaining about the update during that
> cycle, we can then figure out either switching the default and/or adding a
> config-option or even allowing some control via capabilities exchange.

I feel pretty strongly that we should at minimum use the send-all
approach when talking to pre-1.8 servers.  Even though in some
situations it could still offer good performance.  I just think it
would be more respectful to our users (server admins in this case) to
not change this behavior in a way that could surprise them.  Maybe we
could come up with exceptions, such as older servers that are using
the SVNAllowBulkUpdates off directive.  In that situation we should
use the new behavior since that is basically what that directive is
asking for.

As I said in another thread, I think we should treat a 1.8 server the
same way and require someone that was upgrading to add some new
directive to enable the new feature.  This would allow a server admin
to setup his server correctly, including using things like
mod_deflate, and turn on the new behavior rather than get it
automatically simply because they upgraded their binaries.

This seems like it satisfies everyone.  Existing users, especially
those running older server versions, would not be surprised by new and
unwanted client behavior, and it would still be easy to configure a
new server properly to support the non-send-all mode when it was
desired.  I just do not see what the downside would be to approaching
it this way.
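
As a rough sketch of the policy proposed above (an illustration only, not
ra_serf code): the function and parameter names are hypothetical, and it
assumes the client can somehow learn the server's version, whether
SVNAllowBulkUpdates is off, and whether the admin has set the yet-to-be-named
new directive.

```python
SEND_ALL = "send-all"
SKELTA = "skelta"  # the non-send-all, parallel-request mode


def choose_update_mode(server_version, allow_bulk_updates=True,
                       server_opted_in=False):
    """Pick an update mode following the proposal in this mail.

    server_version     -- (major, minor) tuple, e.g. (1, 7)
    allow_bulk_updates -- False models 'SVNAllowBulkUpdates off'
    server_opted_in    -- True models the hypothetical new 1.8 directive
    """
    # 'SVNAllowBulkUpdates off' already asks for the non-send-all behaviour.
    if not allow_bulk_updates:
        return SKELTA
    # A 1.8+ server gets the new behaviour only if its admin enabled it.
    if server_version >= (1, 8) and server_opted_in:
        return SKELTA
    # Everyone else keeps the behaviour they have today.
    return SEND_ALL
```

With these rules an admin who merely upgrades binaries sees no change in
client behaviour until the new directive is added.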

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Daniel Shahaf
Justin Erenkrantz wrote on Sat, Dec 01, 2012 at 09:50:29 -0500:
> On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej  wrote:
> 
> > On 01.12.2012 14:31, Justin Erenkrantz wrote:
> > > And, yes, that clearly could all be done in time for 1.8 without
> > > jeopardizing the timelines one tiny bit. =P
> >
> > Eep ... :)
> >
> >
> > Another thing I've been thinking about is this: Why are we using SHA1
> > checksums on the server and on the wire for consistency checks when a
> > 64-bit CRC would do the job just as well, and 15 times cheaper? And
> > banging my head against the wall for not thinking of this 10 years ago.
> >
> > I can sort of understand the use of SHA1 as a content index for
> > client-side pristine files. On the server, however ... dunno. Maybe we
> > could design something akin to what the rsync protocol does, but for
> > repository-wide data storage. Could be quite tricky to achieve locality,
> > however.
> >
> 
> The one thing that's nice with using SHA checksums is we're using it
> everywhere.  It makes protocol debugging a *lot* easier - since we also
> used SHA checksums as the content index, that makes it easier to compare
> what we recorded in libsvn_wc to what was sent by the server.  If we
> diverged the checksums algorithms, it'd be hard to do a quick comparison
> visually (do the checksums match?) without actually running the checksum
> yourself!

If that's the problem, have the server send the recorded-in-fs sha1
checksum as an attribute that the client ignores. (in SVN_DEBUG builds
only)

> 
> So, I think we optimized for humans here...and I'm okay with that.  We can
> always build faster processors...and take advantage of parallelism.  =)
> 
> > There I go off on a tangent again.
> >
> 
> *grin*  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej  wrote:

> On 01.12.2012 14:31, Justin Erenkrantz wrote:
> > And, yes, that clearly could all be done in time for 1.8 without
> > jeopardizing the timelines one tiny bit. =P
>
> Eep ... :)
>
>
> Another thing I've been thinking about is this: Why are we using SHA1
> checksums on the server and on the wire for consistency checks when a
> 64-bit CRC would do the job just as well, and 15 times cheaper? And
> banging my head against the wall for not thinking of this 10 years ago.
>
> I can sort of understand the use of SHA1 as a content index for
> client-side pristine files. On the server, however ... dunno. Maybe we
> could design something akin to what the rsync protocol does, but for
> repository-wide data storage. Could be quite tricky to achieve locality,
> however.
>

The one thing that's nice with using SHA checksums is we're using it
everywhere.  It makes protocol debugging a *lot* easier - since we also
used SHA checksums as the content index, that makes it easier to compare
what we recorded in libsvn_wc to what was sent by the server.  If we
diverged the checksums algorithms, it'd be hard to do a quick comparison
visually (do the checksums match?) without actually running the checksum
yourself!

So, I think we optimized for humans here...and I'm okay with that.  We can
always build faster processors...and take advantage of parallelism.  =)

> There I go off on a tangent again.
>

*grin*  -- justin


Re: RFC: simple proposal for Internet-scoped IDs

2012-12-01 Thread Ben Reser
On Sat, Dec 1, 2012 at 8:53 AM, Eric S. Raymond  wrote:
> There.  You're done.  It's backward-compatible (older installations
> can ignore the feature and nothing breaks).  It's independent of your
> authentication method, but you can add auth checks if you care enough.
> It scales well because the burden of setting up FULLNAME strings is
> small and distributed - also because project administrators only have
> to make one decision, once.

Provided that it's not the default then I don't really see a problem with this.

> I'm not certain, but if your server-side hooks work the way I think
> they do, all of this except (1) can be done in Python.  Not having to
> add complexity to your C code is a significant virtue.

It's not really about complexity in my opinion.  It's about the fact
that putting it in the C code would be us trying to implement local
policies which can be implemented in a hook script without us trying
to consider every possible policy.

> If you have an LDAP setup or something like that, you don't need this
> - you flip some other switch once and stuff Just Works.  Which is fine:
> the point of this proposal isn't to force a DVCS-like choice, it's to
> put in place a low-effort path to Internet-scoped attribution IDs that
> will *always work*.

LDAP or something like that is going to pay off much higher dividends
for the forge project than this initiative in my opinion.

> Somebody alleged "this is a social problem". That's only half-true, but
> now I'm going to focus on the true part.  "Social problem" doesn't
> mean you can or should ignore it.  It means you have to lead by
> educating and jawboning your users.  A large step is just being willing
> to say, where your users can see it, "Internet-scoped attributions
> are important.  Here's how to make them work..."

I really don't think in the context of Subversion that this is as
important as you make it out to be.  The ASF Infra folks have probably
done more repository moves than just about anyone I can think of.
They've managed to handle this without Internet-scoped attributions.

The only thing this really buys you is that you theoretically don't
have to worry about userid conflicts.  Which technically you don't
since the svn:author field isn't used in any meaningful way anyway.

I'd say that "First Last " does not guarantee there
are no userid conflicts since if the userid ends up being:
John Smith 

You could still end up with a situation where John Smith 1 stops using
gmail and loses that address and John Smith 2 comes along and gets it.

Unlikely, but you still have the problem, it's just less likely.


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Branko Čibej
On 01.12.2012 14:31, Justin Erenkrantz wrote:
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit. =P

Eep ... :)


Another thing I've been thinking about is this: Why are we using SHA1
checksums on the server and on the wire for consistency checks when a
64-bit CRC would do the job just as well, and 15 times cheaper? And
banging my head against the wall for not thinking of this 10 years ago.

I can sort of understand the use of SHA1 as a content index for
client-side pristine files. On the server, however ... dunno. Maybe we
could design something akin to what the rsync protocol does, but for
repository-wide data storage. Could be quite tricky to achieve locality,
however.
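
To make the comparison concrete, here is a minimal Python sketch of a
consistency check done with either checksum. It is an illustration under
assumptions: the standard library has no 64-bit CRC, so `zlib.crc32`
(32 bits) stands in for it, and the helper names are invented. It makes no
claim about the relative cost of the two.

```python
import hashlib
import zlib


def sha1_hex(data: bytes) -> str:
    """SHA-1 digest as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


def crc32_hex(data: bytes) -> str:
    """CRC-32 as 8 hex characters (stand-in for a 64-bit CRC)."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def verify(data: bytes, expected: str, checksum=sha1_hex) -> bool:
    """Consistency check: does the received data match the sent checksum?"""
    return checksum(data) == expected
```

Either function catches accidental corruption of a payload in transit; only
SHA-1 is also usable as a content index, which is the client-side use
mentioned above.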

There I go off on a tangent again.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com



Re: RFC: simple proposal for Internet-scoped IDs

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 9:28 AM, Daniel Shahaf wrote:

> BTW, the ability to change svn:author at will is one of the reasons they
> aren't global-scoped: if Subversion ever migrated away from ASF, we can
> _then_ change all svn:author revprops --- just like we once changed
> "zhakov" (implied @tigris) to "ivan" (implied @apache).
>

Exactly.  And, to be honest, I probably never realized that Ivan's author
tag changed...I knew it was him in both cases as I'm involved in the
project.  So, for most human-scale projects, I think that you don't need
globally unique IDs as there's a defined community and set of participants.
 Whether I call him "zhakov" or "ivan" - it's the same person.  I know
that, you know that...and, really, all of the people who care know that.  =)

For projects where people don't know everyone (Linux), I can see why
globally unique IDs are helpful to contributors.  But, I would shudder if
suddenly svn's own blame output emitted "Daniel Shahaf <
d...@daniel.shahaf.name>" instead of "danielsh".  I have that map already in
my head thank-you-very-much.  Hence, this is why I'd be a strong proponent
of them being in separate revprops - a "local" project name (svn:author)
and something that more uniquely identifies the contributor (FULLNAME blah
blah blah).  And, perhaps have an option on the client as to which one to
use - I could see some folks wanting the GUID, but that's just way too
verbose for me...  -- justin


Re: RFC: simple proposal for Internet-scoped IDs

2012-12-01 Thread Daniel Shahaf
Justin Erenkrantz wrote on Sat, Dec 01, 2012 at 09:08:05 -0500:
> And, once again, I'll reiterate my earlier point that FULLNAME can be added
> retroactively pretty easily to existing SVN repositories.  So, for
> svn.apache.org, after we might deploy a FULLNAME infrastructure, we could
> easily craft a tool to go back to all old revisions and annotate them
> correctly.  Easy peasy.  -- justin

BTW, the ability to change svn:author at will is one of the reasons they
aren't global-scoped: if Subversion ever migrated away from ASF, we can
_then_ change all svn:author revprops --- just like we once changed
"zhakov" (implied @tigris) to "ivan" (implied @apache).

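A bulk svn:author change like the one mentioned above could be scripted
roughly as follows. This is a sketch under assumptions: the helper name, the
repository URL in the usage note and the shape of the input mapping are
hypothetical; `svn propset --revprop` is the stock way to edit a revision
property and only works if the repository's pre-revprop-change hook allows it.

```python
def author_rename_commands(repo_url, authors_by_rev, renames):
    """Build 'svn propset --revprop' command lines for renamed authors.

    authors_by_rev -- {revision number: current svn:author value}
    renames        -- {old author name: new author name}
    Returns a list of argv lists; nothing is executed here.
    """
    commands = []
    for rev in sorted(authors_by_rev):
        old = authors_by_rev[rev]
        new = renames.get(old)
        if new is not None and new != old:
            commands.append(["svn", "propset", "--revprop", "-r", str(rev),
                             "svn:author", new, repo_url])
    return commands
```

For example, `author_rename_commands("https://svn.example.org/repos/demo",
{1: "zhakov", 2: "danielsh"}, {"zhakov": "ivan"})` yields a single command
for r1; each argv list could then be passed to `subprocess.check_call`.
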

Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Ben Reser
On Sat, Dec 1, 2012 at 8:14 AM, Eric S. Raymond  wrote:
> This one confines your Unix-ID adhesion to the FULLNAMES array,  which
> is a long step in the right direction because it means your repo history
> will be local-ID-clean.

It confines it to whatever value that python script could be taught
how to get it.  I'm sure you can modify the python script to get it
from a different source.

For that matter you could have the script in the repo and use a
post-commit script that updates it every time someone commits it.  Then
the script moves with the repo.
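
As a rough illustration of "get it from a different source", the
mapping could live in a plain text file kept in the repository.  The
file format below (one "login = Full Name" entry per line) and the
function names are assumptions made for this sketch, not anything
Subversion defines:

```python
def load_fullnames(text):
    """Parse 'login = Full Name <email>' lines into a dict.

    Blank lines and lines starting with '#' are ignored.  The format is
    hypothetical; it is not something Subversion itself reads.
    """
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        login, sep, fullname = line.partition('=')
        if not sep:
            continue  # not a mapping line; skip it
        names[login.strip()] = fullname.strip()
    return names


def resolve(names, login):
    """Return the mapped name, falling back to the login itself."""
    return names.get(login, login)
```

A hook could then call resolve() where the example script uses
FULLNAMES.get(propval, propval).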

> But it doesn't actually solve the mobility problem.  If the project
> ever moves, you still have to patch the FULLNAMES dictionary by hand.
> This approach won't scale very well.

Of course it doesn't scale.  It's a trivial example to demonstrate the
technique.

What I don't understand is your hypothetical situation is demanding an
awful lot of Subversion.  You've scoped things like an issue tracker
and other things as being part of this.  But for some reason you've
not bothered to scope an authentication system and exporting and
moving the users.  All of these forge sites allow you to access the
repo with the same username/password as the issue tracker etc...

So you need some sort of federated (even if it's just specific to each
project) authentication system.  Subversion doesn't provide that for
you, nor should it.

You're probably not going to find one that's ready made to your
situation either.  You're going to need to do some thinking about how
to configure things.

> I also note that you do really want "J. Random User "
> with a preferred "home" address as part of the mix, because the
> entropy of human names alone is not quite high enough.  Yes, if I see
> "Daniel Shahaf" I'm pretty sure there is only one of those.  But
> "William Smith" or "Robert Jones"?  :-)

And it's trivial to adjust it to be that way.

> But the first?  I've heard of LDAP and know roughly what it does, but
> I've never seen a live instance.  Forges don't have them.  Maybe I'm
> being parochial, but this seems like a solution for a case too unusual
> to be very interesting.

Why not?  What's so hard about setting up an LDAP instance for the project?

>> Alternative server-side implementation (via breser):
>> [[[
>> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
>> ]]]
>
> Um, does this mean everyone's commits are going to look like
> Daniel Shahaf made them?  If not, where is --tunnel-user going to
> come from?

No, this setup is something that gets added to the start of each user's
line (different for each user) in the authorized_keys file of the
account you're having people use with svn+ssh.  Generally I'd expect
whatever system you're using to manage these keys to handle
this for you (e.g. the user goes to some web form and pastes their public
key in, and then this system edits the authorized_keys file).  You'll
have to write something.
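
A minimal sketch of that "something", assuming the key-management
system already has (full name, public key) pairs on hand.  The function
names are made up for illustration; only the command="svnserve -t
--tunnel-user=..." form comes from the example quoted above:

```python
def authorized_keys_line(fullname, public_key):
    """Build one authorized_keys entry that pins --tunnel-user to one key."""
    # This sketch does no shell escaping, so refuse anything that would
    # need it rather than emit a broken or unsafe line.
    for bad in ("'", '"', "\\", "\n"):
        if bad in fullname:
            raise ValueError("unsupported character in name: %r" % bad)
    key = public_key.strip()
    if "\n" in key:
        raise ValueError("public key must be a single line")
    return 'command="svnserve -t --tunnel-user=\'%s\'" %s' % (fullname, key)


def render_authorized_keys(users):
    """users is an iterable of (fullname, public_key) pairs."""
    return "".join(authorized_keys_line(n, k) + "\n" for n, k in users)
```

Each committer's key gets its own line, so the name that ends up in
svn:author is chosen by whoever maintains the file, not by the client.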

> The lesson from this criticism is intended to be that it's not
> enough to make Internet-scoped IDs possible, you have to make
> them *easy* - that is, not disruptive of normal workflow.

I'd say that the choices you've been presented with are relatively
easy to implement.  Tons of corporate users have managed to implement
things like this.

What isn't easy is what you're really asking to do.  Which is systems
design.  You want to pull together a bunch of disparate programs and
make them work together in a coordinated and seamless way.  That's not
terribly easy to do without putting some degree of time building the
infrastructure around them.

Which is really what a forge site is about.

If you want to build a forge site that has portable setups then you're
going to have to take and write a way to export all the data (not just
the repositories, issue trackers db, wiki db, etc...) but also all the
glue between those pieces.

Unless you've got multiple existing forges already interested in
implementing something like this that come together to implement an
agreed upon data format, your best bet is going to be implementing a
packaged up system that uses various systems and then exports and
imports your data format.

We've gone well beyond the area that Subversion is involved in and quite
frankly we're heading entirely into off-topic design work for your
forge.


Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Branko Čibej
On 01.12.2012 14:14, Eric S. Raymond wrote:
> (Apologies if this is a duplicate send.  I just had a disturbing
> glitch in my MUA and want to make sure it got out.)
>
> Daniel Shahaf :
>> Server-side implementation, independent of RA method: (via brane)
> Ah, now that looks somewhat like progress.  But some (possibly all) of
> these solutions have serious weaknesses which you need to think about.
>
>> [[[
>> #!/usr/bin/env python
>>
>> import sys
>> from svn.repos import *
>> from svn.fs import *
>> from svn.core import SVN_PROP_REVISION_AUTHOR
>>
>> FULLNAMES = {
>>   'danielsh': 'Daniel Shahaf',
>> }
>>
>> reposdir, txnname = sys.argv[1:3]
>>
>> repos = svn_repos_open(reposdir, None)
>> fs = svn_repos_fs(repos)
>> txn = svn_fs_open_txn(fs, txnname, None)
>> propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
>> svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
>>                        FULLNAMES.get(propval, propval), None)
>> ]]]
> This one confines your Unix-ID adhesion to the FULLNAMES array,  which
> is a long step in the right direction because it means your repo history
> will be local-ID-clean.  
>
> But it doesn't actually solve the mobility problem.  If the project
> ever moves, you still have to patch the FULLNAMES dictionary by hand.
> This approach won't scale very well.

Oh come on. Daniel was giving an example cobbled up in all of 5 minutes.
Surely you can imagine replacing FULLNAMES with some user database?

> I also note that you do really want "J. Random User " 
> with a preferred "home" address as part of the mix, because the
> entropy of human names alone is not quite high enough.  Yes, if I see
> "Daniel Shahaf" I'm pretty sure there is only one of those.  But
> "William Smith" or "Robert Jones"?  :-)

See above. You can put anything into FULLNAMES and/or a database and/or
LDAP (which is just a database).
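
To make "replace FULLNAMES with some user database" concrete, here is
a small sketch using SQLite from the Python standard library.  The
table layout and function names are assumptions for illustration only:

```python
import sqlite3


def open_user_db(path=":memory:"):
    """Open (and if needed create) a tiny login -> full name table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "login TEXT PRIMARY KEY, fullname TEXT NOT NULL)"
    )
    return db


def add_user(db, login, fullname):
    """Insert or update one user record."""
    db.execute(
        "INSERT OR REPLACE INTO users (login, fullname) VALUES (?, ?)",
        (login, fullname),
    )
    db.commit()


def lookup_fullname(db, login):
    """Same contract as FULLNAMES.get(login, login) in the hook example."""
    row = db.execute(
        "SELECT fullname FROM users WHERE login = ?", (login,)
    ).fetchone()
    return row[0] if row else login
```

In the hook script, the FULLNAMES.get(propval, propval) call would
become lookup_fullname(db, propval); nothing else needs to change.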

>> Alternative server-side implementation (via markphip):
>> [[[
>> AuthLDAPRemoteUserAttribute cn
>> ]]]
> A variant of this that does "J. Random User "
> looks like it might work provided there's an LDAP directory and we trust 
> the LDAP directory to be up to date.  The second assumption seems
> reasonable if we grant the first.
>
> But the first?  I've heard of LDAP and know roughly what it does, but
> I've never seen a live instance.  Forges don't have them.  Maybe I'm
> being parochial, but this seems like a solution for a case too unusual
> to be very interesting.

Oh right. Does it make the solution any less unusual if I tell you that
all of the ASF services, including Subversion, have single-signon via
LDAP? Or that you can just as easily replace mod_ldap with
mod_authn_ which essentially brings you back to the
post-commit hook example.

> But this has been fruitful.  I think I can write a simple proposal
> about how to solve this problem now.  I'll do it in my next email.

No offence, but it sure looks as if you're deliberately nitpicking in
order to give yourself an excuse for writing a proposal for a feature
that Subversion, essentially, already has.

Certainly I'll read your proposal and don't intend to dismiss it out of
hand. But trusting the server to properly authenticate committers is a
basic axiom of Subversion's centralized model. And for the record, it's
also a basic axiom of GitHub's centralized model.


-- Brane


P.S.: I find it fascinating that DVCS aficionados haven't noticed that
GitHub takes the D out of DVCS very effectively, thereby making git
actually useful for most normal people.

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com



Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 9:01 AM, Lieven Govaerts  wrote:

> There are some scenarios where either the server admin or the user
> can decide if parallel requests make sense or not.
>
> I'm specifically thinking of the use of Kerberos per-request
> authentication. These responses can't be cached on the client side,
> and require the authorization header to be sent for each request.
> Assuming 2 step handshake of which serf can bypass the first, this
> means an overhead per request of 1-10KB, with a 3 step handshake each
> request has to be sent twice further increasing the overhead.
> IMHO in this scenario the server admin should be able to veto the use
> of parallel requests.
>
> And the same is true for https connections, where it's also the server
> admin who can decide if the necessary caches have been put in place to
> enable the benefits of parallel requests.
>

Totally agreed.  I'd favor a three-value httpd directive option on the
server-side that is advertised in the capabilities exchange:

- default (client defaults to parallel if ra_serf, serial if older ra_neon
client; or if client overrides ra_serf via their local servers options)
- serial (server suggests to client that it should be serial; but permit
parallel when client wants it)
- force-serial (same capability advertisement, but always trigger send-all
responses regardless of what client asks for)
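
The three values above amount to a small decision tree.  The sketch
below is only one reading of it: the function name, argument names and
string values are made up, and it assumes an ra_neon client is always
serial:

```python
def update_mode(server_setting, client_ra="serf", client_override=None):
    """Return "parallel" or "serial" for an update.

    server_setting:  "default", "serial" or "force-serial" (hypothetical
                     values of the proposed httpd directive)
    client_ra:       "serf" or "neon"
    client_override: None, "parallel" or "serial" (local servers option)
    """
    if server_setting not in ("default", "serial", "force-serial"):
        raise ValueError("unknown server setting: %r" % (server_setting,))
    if server_setting == "force-serial":
        return "serial"          # server sends send-all responses regardless
    if client_ra == "neon":
        return "serial"          # assumption: ra_neon never goes parallel
    if client_override in ("parallel", "serial"):
        return client_override   # the client's local config wins
    # No override: follow the server's hint, else ra_serf's own default.
    return "serial" if server_setting == "serial" else "parallel"
```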

I'm 95% sure we have code in ra_serf that handles the case where the server
sends us inline responses anyway, as older servers (prior to 1.2, IIRC)
always sent inline responses no matter what we sent...so, it should be a
fairly straightforward decision tree with minimal code changes.

My $.02...which is still not enough for me to write the patch.  =)  --
justin


Re: RFC: simple proposal for Internet-scoped IDs

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 8:53 AM, Eric S. Raymond  wrote:

> I'm not certain, but if your server-side hooks work the way I think
> they do, all of this except (1) can be done in Python.  Not having to
> add complexity to your C code is a significant virtue.
>

Here's another approach to take with regards to setting the FULLNAME field
that doesn't require any change to the client and can be deployed
server-side via hooks without any code changes at all.  So, (1) can be done
in Python pretty easily for the lazy users and coders who don't want to do
anything at all.  =)

If you have a centralized registry (like either LDAP or
http://people.apache.org/committer-index.html), the server-side hooks
can set FULLNAME for each svn:author, if it isn't set by the client, by
looking it up in the server's internal directory.  Within the ASF
infrastructure, we have tools to allow committers to manage fields like
this in a self-service way.  So, the server admin can default that field as
they like and let the users set that field as they like.

I believe that this is one of the benefits of a centralized infrastructure
- we can make it so that every client doesn't *have* to set something
themselves on their client to utilize FULLNAME.

And, once again, I'll reiterate my earlier point that FULLNAME can be added
retroactively pretty easily to existing SVN repositories.  So, for
svn.apache.org, after we might deploy a FULLNAME infrastructure, we could
easily craft a tool to go back to all old revisions and annotate them
correctly.  Easy peasy.  -- justin
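
A sketch of what such a backfill tool might generate, assuming the
per-revision svn:author values have already been collected and that
the full name goes into a separate revprop.  The property name
"example:fullname" and the function name are hypothetical; the tool
only builds "svn propset --revprop" command lines and does not run them:

```python
def backfill_commands(repo_url, authors_by_rev, fullnames,
                      propname="example:fullname"):
    """Return argv lists that annotate old revisions with a full name.

    authors_by_rev maps revision number -> svn:author value.
    fullnames maps svn:author value -> full name.
    Revisions whose author has no known full name are left alone.
    """
    commands = []
    for rev in sorted(authors_by_rev):
        fullname = fullnames.get(authors_by_rev[rev])
        if fullname is None:
            continue
        commands.append(["svn", "propset", "--revprop", "-r", str(rev),
                         propname, fullname, repo_url])
    return commands
```

The repository's pre-revprop-change hook has to permit the change
before any of these commands will succeed.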


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Lieven Govaerts
On Sat, Dec 1, 2012 at 2:31 PM, Justin Erenkrantz  wrote:
> On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn  wrote:
>>
>> I'm wondering whether your concerns apply to both internet-wide
>> deployments and local (all on the same LAN) ones.
>
>
> That line is certainly a fair one to draw in the sand.  That said, I think
> the internal use case cries out even *more* for the parallel updates as the
> internal server in that environment is often wildly over-provisioned on the
> CPU side - with a fairly low-traffic environment, you want to take advantage
> of the parallel cores of a CPU to drive the updates.
>
> Generally speaking, what I discovered years ago back in 2006 (yikes) and I
> believe is still true as we near 2013 (shudder), if everything else is
> perfectly optimized (disk, latency, bandwidth, etc.), you're going to
> eventually bottleneck on the checksumming on both client and server - which
> is entirely CPU-bound and is expensive.  You can solve that by splitting out
> the work across multiple cores - for a server, you need to utilize multiple
> parallel requests in-flight; and for a client, you then need to parallelize
> the editor drive.
>
> The reason that disk isn't such a bottleneck as you might first expect is
> due to the OS's buffer cache - for reads on the server-side, common data is
> already going to be in RAM so hot spots in the fsfs repos will already be in
> memory, for writes on the client-side, modern client OSes won't necessarily
> block you until everything is sync'd to disk.  But, once you exhaust the
> capabilities of RAM, your underlying disk architecture matters a lot and one
> that might not be intuitive to those that haven't spent a lot of time
> closely with them.  (Hi Brane!)  If you are using direct-attached storage
> locally on either server or client, then you will probably be bottlenecked
> right there.  However, if your corporate environment has an NFS filer or SAN
> (a la NetApp/EMC) backing the FSFS repository or as NFS working copies (oh
> so common), those large disk subsystems are geared towards parallel I/Os -
> not single-threaded I/O performance - Isilon/BlueArc-class storage is
> however; but I've yet to see anyone obsessed enough about SVN I/O perf to
> place either their repository or working copies on a BlueArc-class storage
> system!  So, if you are not using direct-attached storage and are using NFS
> today in a corporate environment on either client or server, then you want
> to parallelize everything so that you can take advantage of the disk/network
> I/O architecture preferred by NetApp/EMC.  Throwing more cores against a
> NetApp/EMC storage system in a high-available bandwidth environment allows
> for linear performance returns (i.e., reading/writing one I/O is 1X, two
> threads is 2X, three threads is 3X, etc, etc.).
>
> To that end, I'd eventually love to see ra_serf drive the update editor
> across multiple threads so that the checksum and disk I/O bottleneck can be
> distributed across cores on the client-side as well.  Compared to where we
> were in 2006, that's the biggest inefficiency we have yet to solve and take
> advantage of.  And, I'm sure this'll break all sorts of promises in the Ev1
> and perhaps Ev2 world and drive C-Mike even crazier.  =)  But, if you want
> to put a rocket pack on our HTTP performance, that's exactly what we should
> do.  I'm reasonably certain that serf itself could be finely tuned to handle
> network I/O in a single thread at or close to wire-speed even on a 10G
> connection with a modern processor/OS - it's what we do with the file
> contents/textdeltas that needs to be shoved to a different set of worker
> threads and remove all of that libsvn_wc processing from blocking network
> traffic processing and get it all distributed and thread-safe.  If we do
> that, woah, I'd bet that we are going to make things way faster across
> the board and completely blow everything else out of the water when our
> available bandwidth is high - which is the case in an internal network.
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit.  =P
>
> So, that's my long-winded answer of saying that, yah, even in an internal
> LAN environment, you still want to parallelize.
>
> However, I'm definitely not going to veto a patch that would add an httpd
> directive that allows the server to steer the client - unless overridden by
> the client's local config - to using parallel updates or not.  -- justin

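There are some scenarios where either the server admin or the user
can decide if parallel requests make sense or not.

I'm specifically thinking of the use of Kerberos per-request
authentication. These responses can't be cached on the client side,
and require the authorization header to be sent for each request.
Assuming 2 step handshake of which serf can bypass the first, this
means an overhead per request of 1-10KB, with a 3 step handshake each
request has to be sent twice further increasing the overhead.
IMHO in this scenario the server admin should be able to veto the use
of parallel requests.

And the same is true for https connections, where it's also the server
admin who can decide if the necessary caches have been put in place to
enable the benefits of parallel requests.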

RFC: simple proposal for Internet-scoped IDs

2012-12-01 Thread Eric S. Raymond
This discussion has been fruitful.  The responses from Greg, Branko
and others suggest that you guys are actually engaged with the
project-mobility problem now - apologies if my approach seemed a bit too
boot-to-the-head, but I really am trying to be helpful.

I think I can now write a simple proposal that will work.

Despite what some people in this conversation have thought, I'm not
ideologically fixated on "user sets his own attribution ID".  But
I keep coming back to that because it seems to be the only solution
that scales up and covers all the deployment cases.

Here's the proposal:

1. Add support to the client tools for shipping a FULLNAME field
mined from somewhere under ~/.subversion.  Maybe the existing 
username entry will do, maybe it won't - I see arguments both ways.
I don't care, we can fill in that detail later.

2. Add server-side logic that says: if you see a FULLNAME field
in a request, use that to fill svn:author.  (Yes, in practice 
you'd use a different, dedicated revprop to carry FULLNAME.
That's OK, it's an implementation detail.)

3. Add a config switch to the server side that tells it to reject 
commit attempts with a "Set your FULLNAME, please" message if it
doesn't see a FULLNAME field in the request. Initially default this
switch off.

4. (Important) Tell repository administrators about this in the docs,
and say that turning on FULLNAME-required is best practice, and
explain why. Since I know how much most people hate writing docs,
I volunteer to do this part.

5. If you're really worried about spoofing, you add some server-side
logic that stores (auth-cookie, FULLNAME) pairs whenever a new
FULLNAME arrives and barfs if a known auth-cookie arrives with a
known FULLNAME and they don't match. But this is an optional extra,
field experience says you don't need it.

There.  You're done.  It's backward-compatible (older installations
can ignore the feature and nothing breaks).  It's independent of your
authentication method, but you can add auth checks if you care enough.
It scales well because the burden of setting up FULLNAME strings is
small and distributed - also because project administrators only have
to make one decision, once.

I'm not certain, but if your server-side hooks work the way I think
they do, all of this except (1) can be done in Python.  Not having to
add complexity to your C code is a significant virtue.
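
For concreteness, steps 2, 3 and 5 could be sketched in Python roughly
as follows.  This is not hook code: the class and method names are
invented, and how the FULLNAME field and the auth-cookie reach the
server is left open, as in the proposal:

```python
class CommitRejected(Exception):
    """Raised when the server should refuse the commit."""


class FullnamePolicy:
    def __init__(self, require_fullname=False, check_spoofing=False):
        self.require_fullname = require_fullname  # step 3, default off
        self.check_spoofing = check_spoofing      # step 5, optional
        self.pairs = set()                        # (auth_cookie, FULLNAME)

    def author_for(self, auth_cookie, fullname=None):
        """Return the attribution to record, or raise CommitRejected."""
        if not fullname:
            if self.require_fullname:
                raise CommitRejected("Set your FULLNAME, please")
            return auth_cookie  # old behaviour: the local ID is used
        if self.check_spoofing and (auth_cookie, fullname) not in self.pairs:
            cookie_known = any(c == auth_cookie for c, _ in self.pairs)
            name_known = any(n == fullname for _, n in self.pairs)
            if cookie_known and name_known:
                # Both already seen, but never together.
                raise CommitRejected("FULLNAME does not match credentials")
            self.pairs.add((auth_cookie, fullname))
        return fullname  # step 2: FULLNAME fills the attribution
```

With both switches left off, an installation behaves exactly as it
does today, which is the backward-compatibility claim above.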

If you have an LDAP setup or something like that, you don't need this
- you flip some other switch once and stuff Just Works.  Which is fine:
the point of this proposal isn't to force a DVCS-like choice, it's to
put in place a low-effort path to Internet-scoped attribution IDs that
will *always work*.

Somebody alleged "this is a social problem". That's only half-true, but
now I'm going to focus on the true part.  "Social problem" doesn't
mean you can or should ignore it.  It means you have to lead by 
educating and jawboning your users.  A large step is just being willing
to say, where your users can see it, "Internet-scoped attributions
are important.  Here's how to make them work..."
-- 
Eric S. Raymond (http://www.catb.org/~esr/)

Whether the authorities be invaders or merely local tyrants, the
effect of such [gun control] laws is to place the individual at the 
mercy of the state, unable to resist.
-- Robert Anson Heinlein, 1949


Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Alan Barrett

On Sat, 01 Dec 2012, Eric S. Raymond wrote:
>> Alternative server-side implementation (via breser):
>> [[[
>> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
>> ]]]
>
> Um, does this mean everyone's commits are going to look like
> Daniel Shahaf made them?  If not, where is --tunnel-user going to
> come from?


It comes from the .ssh/authorized_keys file, in a context 
that is associated with exactly one ssh key (the "ssh-rsa 
..." part); this would be the same place that previously had 
"--tunnel-user=danielsh".


--apb (Alan Barrett)


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn  wrote:

> I'm wondering whether your concerns apply to both internet-wide
> deployments and local (all on the same LAN) ones.
>

That line is certainly a fair one to draw in the sand.  That said, I think
the internal use case cries out even *more* for the parallel updates as the
internal server in that environment is often wildly over-provisioned on the
CPU side - with a fairly low-traffic environment, you want to take
advantage of the parallel cores of a CPU to drive the updates.

Generally speaking, what I discovered years ago back in 2006 (yikes) and I
believe is still true as we near 2013 (shudder), if everything else is
perfectly optimized (disk, latency, bandwidth, etc.), you're going to
eventually bottleneck on the checksumming on both client and server - which
is entirely CPU-bound and is expensive.  You can solve that by splitting
out the work across multiple cores - for a server, you need to utilize
multiple parallel requests in-flight; and for a client, you then need to
parallelize the editor drive.
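
As a toy illustration of spreading checksum work across cores (not
how ra_serf or libsvn_wc are structured), the sketch below hashes
several buffers on a thread pool.  SHA-1 is just a stand-in, the
function names are made up, and the approach relies on CPython's
hashlib releasing the GIL while hashing larger buffers:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha1_hex(data):
    """Checksum one buffer; this is the CPU-bound part."""
    return hashlib.sha1(data).hexdigest()


def checksum_all(blobs, workers=4):
    """Checksum many buffers concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha1_hex, blobs))
```

Whether this buys anything in practice depends on buffer sizes and on
how many cores are idle, which is the point being made above.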

The reason that disk isn't such a bottleneck as you might first expect is
due to the OS's buffer cache - for reads on the server-side, common data is
already going to be in RAM so hot spots in the fsfs repos will already be
in memory, for writes on the client-side, modern client OSes won't
necessarily block you until everything is sync'd to disk.  But, once you
exhaust the capabilities of RAM, your underlying disk architecture matters
a lot and one that might not be intuitive to those that haven't spent a lot
of time closely with them.  (Hi Brane!)  If you are using direct-attached
storage locally on either server or client, then you will probably be
bottlenecked right there.  However, if your corporate environment has an
NFS filer or SAN (a la NetApp/EMC) backing the FSFS repository or as NFS
working copies (oh so common), those large disk subsystems are geared
towards parallel I/Os - not single-threaded I/O performance -
Isilon/BlueArc-class storage is however; but I've yet to see anyone
obsessed enough about SVN I/O perf to place either their repository or
working copies on a BlueArc-class storage system!  So, if you are not using
direct-attached storage and are using NFS today in a corporate environment
on either client or server, then you want to parallelize everything so that
you can take advantage of the disk/network I/O architecture preferred by
NetApp/EMC.  Throwing more cores against a NetApp/EMC storage system in a
high-available bandwidth environment allows for linear performance returns
(i.e., reading/writing one I/O is 1X, two threads is 2X, three threads is
3X, etc, etc.).

To that end, I'd eventually love to see ra_serf drive the update editor
across multiple threads so that the checksum and disk I/O bottleneck can be
distributed across cores on the client-side as well.  Compared to where we
were in 2006, that's the biggest inefficiency we have yet to solve and take
advantage of.  And, I'm sure this'll break all sorts of promises in the Ev1
and perhaps Ev2 world and drive C-Mike even crazier.  =)  But, if you want
to put a rocket pack on our HTTP performance, that's exactly what we should
do.  I'm reasonably certain that serf itself could be finely tuned to
handle network I/O in a single thread at or close to wire-speed even on a
10G connection with a modern processor/OS - it's what we do with the file
contents/textdeltas that needs to be shoved to a different set of worker
threads and remove all of that libsvn_wc processing from blocking network
traffic processing and get it all distributed and thread-safe.  If we do
that, woah, I'd bet that we are going to make things way faster across
the board and completely blow everything else out of the water when our
available bandwidth is high - which is the case in an internal network.
And, yes, that clearly could all be done in time for 1.8 without
jeopardizing the timelines one tiny bit.  =P

So, that's my long-winded answer of saying that, yah, even in an internal
LAN environment, you still want to parallelize.

However, I'm definitely not going to veto a patch that would add an httpd
directive that allows the server to steer the client - unless overridden by
the client's local config - to using parallel updates or not.  -- justin


Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Eric S. Raymond
(Apologies if this is a duplicate send.  I just had a disturbing
glitch in my MUA and want to make sure it got out.)

Daniel Shahaf :
> Server-side implementation, independent of RA method: (via brane)

Ah, now that looks somewhat like progress.  But some (possibly all) of
these solutions have serious weaknesses which you need to think about.

> [[[
> #!/usr/bin/env python
> 
> import sys
> from svn.repos import *
> from svn.fs import *
> from svn.core import SVN_PROP_REVISION_AUTHOR
> 
> FULLNAMES = {
>   'danielsh': 'Daniel Shahaf',
> }
> 
> reposdir, txnname = sys.argv[1:3]
> 
> repos = svn_repos_open(reposdir, None)
> fs = svn_repos_fs(repos)
> txn = svn_fs_open_txn(fs, txnname, None)
> propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
> svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
>                        FULLNAMES.get(propval, propval), None)
> ]]]

This one confines your Unix-ID adhesion to the FULLNAMES array,  which
is a long step in the right direction because it means your repo history
will be local-ID-clean.  

But it doesn't actually solve the mobility problem.  If the project
ever moves, you still have to patch the FULLNAMES dictionary by hand.
This approach won't scale very well.

I also note that you do really want "J. Random User " 
with a preferred "home" address as part of the mix, because the
entropy of human names alone is not quite high enough.  Yes, if I see
"Daniel Shahaf" I'm pretty sure there is only one of those.  But
"William Smith" or "Robert Jones"?  :-)
 
> Alternative server-side implementation (via markphip):
> [[[
> AuthLDAPRemoteUserAttribute cn
> ]]]

A variant of this that does "J. Random User "
looks like it might work provided there's an LDAP directory and we trust 
the LDAP directory to be up to date.  The second assumption seems
reasonable if we grant the first.  

But the first?  I've heard of LDAP and know roughly what it does, but
I've never seen a live instance.  Forges don't have them.  Maybe I'm
being parochial, but this seems like a solution for a case too unusual
to be very interesting.

> Alternative server-side implementation (via breser):
> [[[
> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
> ]]]

Um, does this mean everyone's commits are going to look like
Daniel Shahaf made them?  If not, where is --tunnel-user going to
come from?

> Client-side implementation (via danielsh):
> [[[
> [ -n "${EMAIL}" ] && svn() {
>  if [ x"$1" = x"ci" ] || [ x"$1" = x"commit" ]; then
>   command svn --with-revprop=svn:x-committer-email=${EMAIL} "$@"
>  else
>   command svn "$@"
>  fi
> }
> ]]]

Bletch.  This one is begging for failure unless you can train your
users to use a wrapper script every time - good luck with that.  One
important case where this approach will break, and cause acrimony, is
Emacs VC mode.  That's somewhere up to 50% of your users under
open-source platforms, if the stats on editor usage are to be believed.

The lesson from this criticism is intended to be that it's not
enough to make Internet-scoped IDs possible, you have to make 
them *easy* - that is, not disruptive of normal workflow.

But this has been fruitful.  I think I can write a simple proposal
about how to solve this problem now.  I'll do it in my next email.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)

A man with a gun is a citizen.  A man without a gun is a subject.


Re: 1.8 Progress

2012-12-01 Thread Hyrum K Wright
On Thu, Nov 29, 2012 at 4:52 PM, C. Michael Pilato wrote:
>
> > 2) Ev2.  The notes say this is believed to be in a releasable state?  Is
> > there any work needed to verify this?  Do we need to remove the use of
> > Ev2 in any place to avoid releasing with compatibility shims in use? Are
> > we comfortable that the API is complete?
>
> Julian expressed doubt about whether the API was ready for prime-time.
>
> C-Mike expressed concern about the extremely low bus factor.
>
> Hyrum acknowledged both, and continued with:  "We can always shuffle
> headers around or document the things as experimental, so committing
> ourselves to the API at this point isn't my concern.  The only real
> limiting factor around Ev2 and 1.8 is issue #4116, which is svnrdump
> failures over ra_serf.  In the issue, I propose using Ev2 to get around
> the problem, since the dumpfile format is so incongruent with the editor.
> Of course, we don't *have* to do that, but as I've thought about it, any
> solution will require a bit o' caching---which we've already implemented
> as part of the Ev2 shims.  We *might* be able to implement the svnrdump
> editor as Ev2, shim the thing on the client side (which gives us the
> required caching) and release that way.  Or there might be a better
> solution I'm overlooking because I've got Ev2 on the brain."
>

This basically boils down to "rdump isn't completely Delta Editor
friendly, which interacts badly with Serf."  This problem is only
tangentially related to Ev2, but it was proposed as one of the possible
solutions.  It's probably better to try and pursue other solutions to this
independent of Ev2.

As for Ev2 itself, I don't see anything that should be blocking 1.8.  If
people are uncomfortable shipping the API, some documentation and/or header
hackery should be sufficient to make it mutable in future releases.  As far
as I know, all the Ev2 work is entirely self-contained within Subversion.

> OWNERSHIP:  Hyrum's got the most experience here, but due to his time
> contention, we may very well have no owner for this at all.  That's bad.


Sadly, true.

-Hyrum


Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Eric S. Raymond
Alan Barrett:
> Perhaps it would be a good first step to add examples to the
> documentation, showing how the admin can use "Full Name
> <user@example.com>" in the svn:author field, with all the common access
> methods.

Yes. I think it is (a) possible that better documentation can solve this
problem, and (b) certain that better documentation is *necessary* to solve
this problem.

I'm willing to help.  You can look at the description of the dump-load
format at notes/dump-load-format.txt, most of which I wrote earlier
this year, to see that this is not an idle promise.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond


Fwd: Regarding the Outreach Program for Women

2012-12-01 Thread Shivani Poddar
Hi,
I am currently a second-year student at IIIT-Hyderabad and am hugely
interested in a lot of the projects that are being mentored here. It would
be great if you could guide me on how to go about applying for them. As it
is, I am not able to locate the respective mentors for your projects. Having
seen this program only yesterday, I have not been able to do much work on
contributions, but I am confident in my ability to make a difference if
given the right opportunity. I request you to please forward this to the
respective people.
I am attaching my CV and posting a link to my GitHub repo.
I would be highly obliged if you could have a look at them.
https://github.com/shivanipoddariiith

Thank You,
Shivani Poddar


cv-internopports.rtf
Description: RTF file


Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Daniel Shahaf
Alan Barrett wrote on Sat, Dec 01, 2012 at 12:05:48 +0300:
> Perhaps it would be a good first step to add examples to the 
> documentation, showing how the admin can use "Full Name <user@example.com>"
> in the svn:author field, with all the common access methods.

Server-side implementation, independent of RA method: (via brane)
[[[
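#!/usr/bin/env python
# Presumably meant to run as a pre-commit hook, which the server invokes
# as: pre-commit REPOS-PATH TXN-NAME

import sys
from svn.repos import *
from svn.fs import *
from svn.core import SVN_PROP_REVISION_AUTHOR

# Map authenticated usernames to the string to record in svn:author.
FULLNAMES = {
  'danielsh': 'Daniel Shahaf',
}

reposdir, txnname = sys.argv[1:3]

repos = svn_repos_open(reposdir, None)
fs = svn_repos_fs(repos)
txn = svn_fs_open_txn(fs, txnname, None)
propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
# Usernames missing from FULLNAMES are written back unchanged.
svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
   FULLNAMES.get(propval, propval), None)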
]]]
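
The same lookup could also emit the "Full Name <address>" form discussed in
this thread.  Below is a minimal sketch of just the mapping step, kept free of
the svn bindings so it can be tried standalone.  AUTHORS, internet_id() and
the example.org address are made up for illustration.

```python
# Hypothetical table: short ID -> (full name, email address).
# The entry below is a placeholder, not a real account.
AUTHORS = {
    'jrandom': ('J. Random Hacker', 'jrandom@example.org'),
}

def internet_id(author):
    """Return 'Full Name <email>' for a known short ID.

    Unknown IDs (and None, for a transaction with no author set) are
    returned unchanged, mirroring FULLNAMES.get(propval, propval).
    """
    if author in AUTHORS:
        name, email = AUTHORS[author]
        return '%s <%s>' % (name, email)
    return author
```

In the hook, the result of internet_id(propval) would take the place of
FULLNAMES.get(propval, propval).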

Alternative server-side implementation (via markphip):
[[[
AuthLDAPRemoteUserAttribute cn
]]]

Alternative server-side implementation (via breser):
[[[
command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
]]]
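
Since the command= option in authorized_keys applies to one key, each
committer's key would carry its own --tunnel-user value.  A rough sketch of
generating such lines follows.  authorized_keys_line() is a hypothetical
helper, and the sketch refuses names containing quote characters rather than
trying to escape them.

```python
def authorized_keys_line(full_name, key_type, key_blob):
    """Build one authorized_keys entry tying a key to an svn:author name."""
    # Quoting inside command="..." is fragile, so reject awkward names
    # outright in this sketch instead of attempting to escape them.
    if "'" in full_name or '"' in full_name:
        raise ValueError('quotes in names are not handled by this sketch')
    return 'command="svnserve -t --tunnel-user=\'%s\'" %s %s' % (
        full_name, key_type, key_blob)
```

Calling it once per committer, with that committer's public key, yields one
line per person in the shared account's authorized_keys file.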

Client-side implementation (via danielsh):
[[[
[ -n "${EMAIL}" ] && svn() {
 if [ x"$1" = x"ci" ] || [ x"$1" = x"commit" ]; then
  command svn "--with-revprop=svn:x-committer-email=${EMAIL}" "$@"
 else
  command svn "$@"
 fi
}
]]]
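
For comparison, here is the same decision written as a Python function that
only builds the argument list.  This is a sketch: svn_argv() and
COMMIT_SUBCOMMANDS are names invented here, and actually running svn (for
example via os.execvp) is left out.

```python
COMMIT_SUBCOMMANDS = ('ci', 'commit')

def svn_argv(args, email):
    """Return the svn command line, adding the revprop only on commits."""
    args = list(args)
    if email and args and args[0] in COMMIT_SUBCOMMANDS:
        # Same revprop name as in the shell function above.
        return ['svn', '--with-revprop=svn:x-committer-email=' + email] + args
    # Non-commit subcommands, or no EMAIL set: pass through untouched.
    return ['svn'] + args
```

A caller would pass sys.argv[1:] and os.environ.get('EMAIL'), so an unset
EMAIL behaves like the shell version and changes nothing.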


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Johan Corveleyn
On Sat, Dec 1, 2012 at 6:36 AM, Justin Erenkrantz  wrote:
> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
> able to shove the non-send-all approach into ra_neon.  =)

I'm wondering whether your concerns apply to both internet-wide
deployments and local (all on the same LAN) ones.

It seems to me that SVN has two sets of audiences when it comes to
networking: some have to support users over the internet with
sometimes slow and high-latency, perhaps flaky connections; and others
have all their users on a local (or almost-local) network, and want to
make optimal use of their infrastructure, which offers an absolutely
rock-solid low-latency connection ... they'd like to shove the content
through that (wide, short) pipe as quickly as possible.

I'm no expert, but I suppose it's possible that those two audiences
need two different networking configurations to make optimal use of
their environment. If that's the case, it would be great if we could
offer some (clear, simple to use) configuration directives for those
admins to tune things ...

Just my 2 cents ...
-- 
Johan


Re: reposurgeon now writes Subversion repositories

2012-12-01 Thread Alan Barrett

On Sat, 01 Dec 2012, Eric S. Raymond wrote:

> I've lost count of the number of Subversion repo
> lifts I've done (has to be more than a dozen at this point), and in no
> case have I ever seen *anything* but a local Unix ID in the svn:author
> property.


Yes, it's probably true that most svn repositories use short 
strings that resemble unix user ids, and a lot of the svn 
documentation uses such strings in examples.  But it's also 
true that the admin can use almost any string they like.  In 
repositories that I have set up, I have always used short strings 
that resemble local unix IDs, but in most cases those strings 
would not have been valid unix user names on the server host.


Perhaps it would be a good first step to add examples to 
the documentation, showing how the admin can use "Full Name 
<user@example.com>" in the svn:author field, with all the common 
access methods.


--apb (Alan Barrett)