Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-10 Thread Lieven Govaerts
Hi Markus,

On Mon, Dec 3, 2012 at 10:43 AM, Markus Schaber  wrote:
> Hi,
>
> Just another crazy idea:
>
> The main problem with the parallelization in ra_serf seems to be the number 
> of http requests (which potentially causes a high number of authentications 
> and tcp connections).
>
> Maybe we could add some partitioned send-all request:
>
> When the client decides to use 4 connections, it could send 4 requests, with 
> some parameter like send-all(1/4), send-all(2/4), ..., send-all(4/4).
>
> Then the server can send one quarter of the complete response on each 
> connection.
>
> An advanced server could even share the common state of those 4 requests 
> through some shared memory / caching scheme, to avoid doing the same work 
> multiple times.
>
> Years ago, I implemented a similar scheme between caching GIS web frontend 
> servers, and the rendering backend server, in the protocol for fetching and 
> rendering the map tiles. It gave a nearly linear speedup with the number of 
> connections, up to the point where the CPUs were saturated.
>

the concept implemented in ra_serf is to parallelize individual GET
requests, so that these can be cached by proxies either on the client
or on the server side. So we want to avoid using send-all as much as
possible, as this creates always one large uncacheable response.

I've made a mental note of your idea though, if we need to stick with
send-all and further improve it, your suggestion is one way to do
this.

>
> Best regards
>
> Markus Schaber
>

thanks,

Lieven
[..]


Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-05 Thread Lieven Govaerts
Hi,

On Sat, Dec 1, 2012 at 10:18 PM, Ivan Zhakov  wrote:
> [ changing subject to make topic more visible]
>
> On Sat, Dec 1, 2012 at 9:00 PM, Mark Phippard  wrote:
>> On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz
>>  wrote:
>>> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:

 Author: cmpilato
 Date: Fri Nov 30 21:54:35 2012
 New Revision: 1415864

 URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
 Log:
 Implement in ra_serf "send-all" mode support for update-style REPORTs
 and their responses.  (Currently disabled by compile-time conditionals.)

 (This one goes out to Ivan Zhakov.)
>>>
>>>
>>> I've stated for a long time that I think the send-all mode is a huge mistake
>>> architecturally because it is too prone to double-compression and TCP
>>> pipeline stalls and is a tremendous burden on a properly-configured httpd
>>> (by not taking advantage of server-side parallelism), it's nice to see it's
>>> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
>>> able to shove the non-send-all approach into ra_neon.  =)
>>
>> Just to be clear, I do not believe anyone is suggesting we completely
>> abandon the non-send-all approach.  I like that this approach can
>> offer good performance on a well-configured server as well as enable
>> new features/ideas such as not even fetching the full-texts that we
>> already have locally.  I think the question is simply what is the best
>> way to deliver this.
> Completely agree.
>
> My point was that in theory skelta-mode is cool, but it still needs a
> lot of work to get it really done. So let's release ra_serf by
> piecemeal, because we also have significant amount of ra_serf issues
> unrelated to update editor.
>
>>
>>> Here's my suggestion for consideration - let's experiment with this setting
>>> in the beta release process with the setting as-is - that is we always do
>>> the parallel updates unconditionally (except perhaps when svnrdump is being
>>> stupid).  If we get real users complaining about the update during that
>>> cycle, we can then figure out either switching the default and/or adding a
>>> config-option or even allowing some control via capabilities exchange.
>>
>> I feel pretty strongly that we should at minimum use the send-all
>> approach when talking to pre-1.8 servers.  Even though in some
>> situations it could still offer good performance.  I just think it
>> would be more respectful to our users (server admins in this case) to
>> not change this behavior in a way that could surprise them.  Maybe we
>> could come up with exceptions, such as older servers that are using
>> the SVNAllowBulkUpdates off directive.  In that situation we should
>> use the new behavior since that is basically what that directive is
>> asking for.
>>
>> As I said in another thread, I think we should treat a 1.8 server the
>> same way and require someone that was upgrading to add some new
>> directive to enable the new feature.  This would allow a server admin
>> to setup his server correctly, including using things like
>> mod_deflate, and turn on the new behavior rather than get it
>> automatically simply because they upgraded their binaries.
>>
>> This seems like it satisfies everyone.  Existing users, especially
>> those running older server versions, would not be surprised by new and
>> unwanted client behavior, and it would still be easy to configure a
>> new server properly to support the non-send-all mode when it was
>> desired.  I just do not see what the downside would be to approaching
>> it this way.
>>
> +1.
>

r1417639 and r1417642 respectively give the server admin and the user
the option to force the use of bulk updates.
Changes are ready for review and use.

I haven't changed any of the defaults in ra_serf and mod_dav_svn.

Lieven


AW: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-03 Thread Markus Schaber
Hi,

Just another crazy idea:

The main problem with the parallelization in ra_serf seems to be the number of 
http requests (which potentially causes a high number of authentications and 
tcp connections).

Maybe we could add some partitioned send-all request:

When the client decides to use 4 connections, it could send 4 requests, with 
some parameter like send-all(1/4), send-all(2/4), ..., send-all(4/4).

Then the server can send one quarter of the complete response on each 
connection.

An advanced server could even share the common state of those 4 requests 
through some shared memory / caching scheme, to avoid doing the same work 
multiple times.

Years ago, I implemented a similar scheme between caching GIS web frontend 
servers, and the rendering backend server, in the protocol for fetching and 
rendering the map tiles. It gave a nearly linear speedup with the number of 
connections, up to the point where the CPUs were saturated.


Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH 

Inspiring Automation Solutions 

3S-Smart Software Solutions GmbH 
Dipl.-Inf. Markus Schaber | Product Development Core Technology 
Memminger Str. 151 | 87439 Kempten | Germany 
Tel. +49-831-54031-979 | Fax +49-831-54031-50 

E-Mail: m.scha...@codesys.com | Web: codesys.com 
CODESYS internet forum: forum.codesys.com 

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915 
Von: justin.erenkra...@gmail.com [mailto:justin.erenkra...@gmail.com] Im 
Auftrag von Justin Erenkrantz
Gesendet: Samstag, 1. Dezember 2012 15:15
An: Lieven Govaerts
Cc: Johan Corveleyn; dev@subversion.apache.org
Betreff: Re: svn commit: r1415864 - 
/subversion/trunk/subversion/libsvn_ra_serf/update.c

On Sat, Dec 1, 2012 at 9:01 AM, Lieven Govaerts  wrote:
There are some scenario's where either the server admin or the user
can decide if parallel requests make sense or not.

I'm specifically thinking of the use Kerberos per request
authentication. These responses can't be cached on the client side,
and require the authorization header to be sent for each request.
Assuming 2 step handshake of which serf can bypass the first, this
means an overhead per request of 1-10KB, with a 3 step handshake each
request has to be sent twice further increasing the overhead.
IMHO in this scenario the server admin should be able to veto the use
of parallel requests.

And the same is true for https connections, where it's also the server
admin who can decide if the necessary caches have been put in place to
enable the benefits of parallel requests.

Totally agreed.  I'd favor a three-value httpd directive option on the 
server-side that is advertised in the capabilities exchange:

- default (client defaults to parallel if ra_serf, serial if older ra_neon 
client; or if client overrides ra_serf via their local servers options)
- serial (server suggests to client that it should be serial; but permit 
parallel when client wants it)
- force-serial (same capability advertisement, but always trigger send-all 
responses regardless of what client asks for)

I'm 95% sure we have code in ra_serf that handles the case where the server 
sends us inline responses anyway as older (prior to 1.2, IIRC) always sent 
inline responses no matter what we send...so, it should be fairly 
straightforward decision tree with minimal code changes.

My $.02...which is still not enough for me to write the patch.  =)  -- justin


Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-01 Thread Johan Corveleyn
On Sat, Dec 1, 2012 at 10:40 PM, Branko Čibej  wrote:
> On 01.12.2012 22:18, Ivan Zhakov wrote:
>> Completely agree.
>>
>> My point was that in theory skelta-mode is cool, but it still needs a
>> lot of work to get it really done.
>
> Sorry? I've had "http-library = serf" in ~/.subversion/config for years,
> and have seen no problems. What does it take, in your opinion, to get it
> "really done"?

It has become abundantly clear during the last year that, just because
some people have been using it for years, doesn't mean it's stable /
done / ready for prime time. What do you think I spent my entire
Berlin hackathon week on last June, together with Justin and others?
And weeks before that and after ... and the time spent by others
during the last months fixing difficult to reproduce, yet important
bugs.

Those issues were not due to wrong configurations, or total edge
cases, but things that people with big installations would have run
into the first day they would have started using it.

I don't want to rehash all the discussions, I think there's mostly
consensus on the general direction we can / should go (which is
forward, not backward). But I find "I have been using it for years
without problems" not a very strong argument in this case.

Getting back to the actual topic ... as an svn admin, I'd definitely
like to have the choice between send-all and skelta mode, if that's
possible. Which one should be the default ... I don't know. But having
the choice is the most important thing. I'd sleep better at night,
knowing that I can flip a switch to reduce the number of requests
etc., in the unlikely case that would turn out to be a problem :-).

-- 
Johan


Re: non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-01 Thread Branko Čibej
On 01.12.2012 22:18, Ivan Zhakov wrote:
> Completely agree.
>
> My point was that in theory skelta-mode is cool, but it still needs a
> lot of work to get it really done.

Sorry? I've had "http-library = serf" in ~/.subversion/config for years,
and have seen no problems. What does it take, in your opinion, to get it
"really done"?

>  So let's release ra_serf by
> piecemeal, because we also have significant amount of ra_serf issues
> unrelated to update editor.

We've been through that. There's one or maybe two bugs that are actual
bugs that pop up in edge cases, everything else is related to server
configuration etc.

Is your goal shipping without bugs?

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com



non-skelta update editor mode in ra_serf (was Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c)

2012-12-01 Thread Ivan Zhakov
[ changing subject to make topic more visible]

On Sat, Dec 1, 2012 at 9:00 PM, Mark Phippard  wrote:
> On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz
>  wrote:
>> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:
>>>
>>> Author: cmpilato
>>> Date: Fri Nov 30 21:54:35 2012
>>> New Revision: 1415864
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>> Log:
>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>>
>>> (This one goes out to Ivan Zhakov.)
>>
>>
>> I've stated for a long time that I think the send-all mode is a huge mistake
>> architecturally because it is too prone to double-compression and TCP
>> pipeline stalls and is a tremendous burden on a properly-configured httpd
>> (by not taking advantage of server-side parallelism), it's nice to see it's
>> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
>> able to shove the non-send-all approach into ra_neon.  =)
>
> Just to be clear, I do not believe anyone is suggesting we completely
> abandon the non-send-all approach.  I like that this approach can
> offer good performance on a well-configured server as well as enable
> new features/ideas such as not even fetching the full-texts that we
> already have locally.  I think the question is simply what is the best
> way to deliver this.
Completely agree.

My point was that in theory skelta-mode is cool, but it still needs a
lot of work to get it really done. So let's release ra_serf by
piecemeal, because we also have significant amount of ra_serf issues
unrelated to update editor.

>
>> Here's my suggestion for consideration - let's experiment with this setting
>> in the beta release process with the setting as-is - that is we always do
>> the parallel updates unconditionally (except perhaps when svnrdump is being
>> stupid).  If we get real users complaining about the update during that
>> cycle, we can then figure out either switching the default and/or adding a
>> config-option or even allowing some control via capabilities exchange.
>
> I feel pretty strongly that we should at minimum use the send-all
> approach when talking to pre-1.8 servers.  Even though in some
> situations it could still offer good performance.  I just think it
> would be more respectful to our users (server admins in this case) to
> not change this behavior in a way that could surprise them.  Maybe we
> could come up with exceptions, such as older servers that are using
> the SVNAllowBulkUpdates off directive.  In that situation we should
> use the new behavior since that is basically what that directive is
> asking for.
>
> As I said in another thread, I think we should treat a 1.8 server the
> same way and require someone that was upgrading to add some new
> directive to enable the new feature.  This would allow a server admin
> to setup his server correctly, including using things like
> mod_deflate, and turn on the new behavior rather than get it
> automatically simply because they upgraded their binaries.
>
> This seems like it satisfies everyone.  Existing users, especially
> those running older server versions, would not be surprised by new and
> unwanted client behavior, and it would still be easy to configure a
> new server properly to support the non-send-all mode when it was
> desired.  I just do not see what the downside would be to approaching
> it this way.
>
+1.

-- 
Ivan Zhakov


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 12:00 PM, Mark Phippard  wrote:

> I feel pretty strongly that we should at minimum use the send-all
> approach when talking to pre-1.8 servers.  Even though in some
> situations it could still offer good performance.  I just think it
> would be more respectful to our users (server admins in this case) to
> not change this behavior in a way that could surprise them.  Maybe we
> could come up with exceptions, such as older servers that are using
> the SVNAllowBulkUpdates off directive.  In that situation we should
> use the new behavior since that is basically what that directive is
> asking for.
>

Without a lot of concrete feedback that parallel updates should be removed
by default, I strongly believe that we should not be conservative on this
issue.  The issue here is not one of compatibility - ra_serf has been
around for years and can talk just fine to older servers (way back to prior
to 1.0 servers actually).  The only argument against altering the default
behavior is that there might be an admin of a high-traffic site somewhere
that might suddenly be shocked by more HTTP requests coming in.  I honestly
have little to no sympathy for such an admin who doesn't properly
understand how to manage a large installation - they likely have other
issues that they are not paying attention to.  Until we have hoards of
users coming in and complaining about this, I think it's silly to be
conservative here.

I'm definitely not against giving knobs to the client or to the admin in
weird corner cases (provided someone cares enough to write that up), but I
strongly believe that for now we should do the right thing out of the box
in 1.8 - which is to utilize parallel updates.  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Mark Phippard
On Fri, Nov 30, 2012 at 5:38 PM, C. Michael Pilato  wrote:
> On 11/30/2012 05:25 PM, Mark Phippard wrote:
>> On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato  
>> wrote:
>>> On 11/30/2012 05:00 PM, Mark Phippard wrote:
 On Fri, Nov 30, 2012 at 4:54 PM,   wrote:
> Author: cmpilato
> Date: Fri Nov 30 21:54:35 2012
> New Revision: 1415864
>
> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
> Log:
> Implement in ra_serf "send-all" mode support for update-style REPORTs
> and their responses.  (Currently disabled by compile-time conditionals.)

 Sweet!

 Would this also resolve the issue with svnrdump, or could it?  When
 Serf is using this mode, I assume it is also now conforming to Ev1?
>>>
>>> I guess it *could* based on what I'm reading is considered the source of
>>> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
>>> used the ra-replay API instead of the ra-update one?
>>
>> Guess I am more wondering if it was another area where the same
>> solution could be applied?
>
> No, that's just it.  ra_serf's implementation of the ra-replay API is
> single-connection, just like ra-neon's was.  What suprises me is that
> svnrdump *does* use the ra-update API.
>
> Ah!  I see why, now.  When not doing an incremental dump, 'svnrdump dump'
> uses the ra-update API to handle that initial checkout-like revision.  After
> that (and otherwise when in incremental mode), it uses the ra-replay API.
> So yes, I believe svnrdump would be in fine shape over ra-serf if it was
> asking the server to use this "send-all" mode, where document Ev1 drive
> ordering *should* be honored.

So this sounds like pretty great news.  Regardless what we decide to
do for Serf with normal updates, it seems like we could
unconditionally make svnrdump tap into the send-all mode and that
would remove a release blocker.


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Mark Phippard
On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz
 wrote:
> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
> able to shove the non-send-all approach into ra_neon.  =)

Just to be clear, I do not believe anyone is suggesting we completely
abandon the non-send-all approach.  I like that this approach can
offer good performance on a well-configured server as well as enable
new features/ideas such as not even fetching the full-texts that we
already have locally.  I think the question is simply what is the best
way to deliver this.

> Here's my suggestion for consideration - let's experiment with this setting
> in the beta release process with the setting as-is - that is we always do
> the parallel updates unconditionally (except perhaps when svnrdump is being
> stupid).  If we get real users complaining about the update during that
> cycle, we can then figure out either switching the default and/or adding a
> config-option or even allowing some control via capabilities exchange.

I feel pretty strongly that we should at minimum use the send-all
approach when talking to pre-1.8 servers.  Even though in some
situations it could still offer good performance.  I just think it
would be more respectful to our users (server admins in this case) to
not change this behavior in a way that could surprise them.  Maybe we
could come up with exceptions, such as older servers that are using
the SVNAllowBulkUpdates off directive.  In that situation we should
use the new behavior since that is basically what that directive is
asking for.

As I said in another thread, I think we should treat a 1.8 server the
same way and require someone that was upgrading to add some new
directive to enable the new feature.  This would allow a server admin
to setup his server correctly, including using things like
mod_deflate, and turn on the new behavior rather than get it
automatically simply because they upgraded their binaries.

This seems like it satisfies everyone.  Existing users, especially
those running older server versions, would not be surprised by new and
unwanted client behavior, and it would still be easy to configure a
new server properly to support the non-send-all mode when it was
desired.  I just do not see what the downside would be to approaching
it this way.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Daniel Shahaf
Justin Erenkrantz wrote on Sat, Dec 01, 2012 at 09:50:29 -0500:
> On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej  wrote:
> 
> > On 01.12.2012 14:31, Justin Erenkrantz wrote:
> > > And, yes, that clearly could all be done in time for 1.8 without
> > > jeopardizing the timelines one tiny bit. =P
> >
> > Eep ... :)
> >
> >
> > Another thing I've been thinking about is this: Why are we using SHA1
> > checksums on the server and on the wire for consistency checks when a
> > 64-bit CRC would do the job just as well, and 15 times cheaper? And
> > banging my head against the wall for not thinking of this 10 years ago.
> >
> > I can sort of understand the use of SHA1 as a content index for
> > client-side pristine files. On the server, however ... dunno. Maybe we
> > could design something akin to what the rsync protocol does, but for
> > repository-wide data storage. Could be quite tricky to achieve locality,
> > however.
> >
> 
> The one thing that's nice with using SHA checksums is we're using it
> everywhere.  It makes protocol debugging a *lot* easier - since we also
> used SHA checksums as the content index, that makes it easier to compare
> what we recorded in libsvn_wc to what was sent by the server.  If we
> diverged the checksums algorithms, it'd be hard to do a quick comparison
> visually (do the checksums match?) without actually running the checksum
> yourself!

If that's the problem, have the server send the recorded-in-fs sha1
checksum as an attribute that the client ignores. (in SVN_DEBUG builds
only)

> 
> So, I think we optimized for humans here...and I'm okay with that.  We can
> always build faster processors...and take advantage of parallelism.  =)
> 
> There I go off on a tangent again.
> >
> 
> *grin*  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej  wrote:

> On 01.12.2012 14:31, Justin Erenkrantz wrote:
> > And, yes, that clearly could all be done in time for 1.8 without
> > jeopardizing the timelines one tiny bit. =P
>
> Eep ... :)
>
>
> Another thing I've been thinking about is this: Why are we using SHA1
> checksums on the server and on the wire for consistency checks when a
> 64-bit CRC would do the job just as well, and 15 times cheaper? And
> banging my head against the wall for not thinking of this 10 years ago.
>
> I can sort of understand the use of SHA1 as a content index for
> client-side pristine files. On the server, however ... dunno. Maybe we
> could design something akin to what the rsync protocol does, but for
> repository-wide data storage. Could be quite tricky to achieve locality,
> however.
>

The one thing that's nice with using SHA checksums is we're using it
everywhere.  It makes protocol debugging a *lot* easier - since we also
used SHA checksums as the content index, that makes it easier to compare
what we recorded in libsvn_wc to what was sent by the server.  If we
diverged the checksums algorithms, it'd be hard to do a quick comparison
visually (do the checksums match?) without actually running the checksum
yourself!

So, I think we optimized for humans here...and I'm okay with that.  We can
always build faster processors...and take advantage of parallelism.  =)

There I go off on a tangent again.
>

*grin*  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Branko Čibej
On 01.12.2012 14:31, Justin Erenkrantz wrote:
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit. =P

Eep ... :)


Another thing I've been thinking about is this: Why are we using SHA1
checksums on the server and on the wire for consistency checks when a
64-bit CRC would do the job just as well, and 15 times cheaper? And
banging my head against the wall for not thinking of this 10 years ago.

I can sort of understand the use of SHA1 as a content index for
client-side pristine files. On the server, however ... dunno. Maybe we
could design something akin to what the rsync protocol does, but for
repository-wide data storage. Could be quite tricky to achieve locality,
however.

There I go off on a tangent again.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com



Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 9:01 AM, Lieven Govaerts  wrote:

> There are some scenario's where either the server admin or the user
> can decide if parallel requests make sense or not.
>
> I'm specifically thinking of the use Kerberos per request
> authentication. These responses can't be cached on the client side,
> and require the authorization header to be sent for each request.
> Assuming 2 step handshake of which serf can bypass the first, this
> means an overhead per request of 1-10KB, with a 3 step handshake each
> request has to be sent twice further increasing the overhead.
> IMHO in this scenario the server admin should be able to veto the use
> of parallel requests.
>
> And the same is true for https connections, where it's also the server
> admin who can decide if the necessary caches have been put in place to
> enable the benefits of parallel requests.
>

Totally agreed.  I'd favor a three-value httpd directive option on the
server-side that is advertised in the capabilities exchange:

- default (client defaults to parallel if ra_serf, serial if older ra_neon
client; or if client overrides ra_serf via their local servers options)
- serial (server suggests to client that it should be serial; but permit
parallel when client wants it)
- force-serial (same capability advertisement, but always trigger send-all
responses regardless of what client asks for)

I'm 95% sure we have code in ra_serf that handles the case where the server
sends us inline responses anyway as older (prior to 1.2, IIRC) always sent
inline responses no matter what we send...so, it should be fairly
straightforward decision tree with minimal code changes.

My $.02...which is still not enough for me to write the patch.  =)  --
justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Lieven Govaerts
On Sat, Dec 1, 2012 at 2:31 PM, Justin Erenkrantz  wrote:
> On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn  wrote:
>>
>> I'm wondering whether your concerns apply to both internet-wide
>> deployments and local (all on the same LAN) ones.
>
>
> That line is certainly a fair one to draw in the sand.  That said, I think
> the internal use case cries out even *more* for the parallel updates as the
> internal server in that environment is often wildly over-provisioned on the
> CPU side - with a fairly low-traffic environment, you want to take advantage
> of the parallel cores of a CPU to drive the updates.
>
> Generally speaking, what I discovered years ago back in 2006 (yikes) and I
> believe is still true as we near 2013 (shudder), if everything else is
> perfectly optimized (disk, latency, bandwidth, etc.), you're going to
> eventually bottleneck on the checksumming on both client and server - which
> is entirely CPU-bound and is expensive.  You can solve that by splitting out
> the work across multiple cores - for a server, you need to utilize multiple
> parallel requests in-flight; and for a client, you then need to parallelize
> the editor drive.
>
> The reason that disk isn't such a bottleneck as you might first expect is
> due to the OS's buffer cache - for reads on the server-side, common data is
> already going to be in RAM so hot spots in the fsfs repos will already be in
> memory, for writes on the client-side, modern client OSes won't necessarily
> block you until everything is sync'd to disk.  But, once you exhaust the
> capabilities of RAM, your underlying disk architecture matters a lot and one
> that might not be intuitive to those that haven't spent a lot of time
> closely with them.  (Hi Brane!)  If you are using direct-attached storage
> locally on either server or client, then you will probably be bottlenecked
> right there.  However, if your corporate environment has an NFS filer or SAN
> (a la NetApp/EMC) backing the FSFS repository or as NFS working copies (oh
> so common), those large disk subsystems are geared towards parallel I/Os -
> not single-threaded I/O performance - Isilon/BlueArc-class storage is
> however; but I've yet to see anyone obsessed enough about SVN I/O perf to
> place either their repository or working copies on a BlueArc-class storage
> system!  So, if you are not using direct-attached storage and are using NFS
> today in a corporate environment on either client or server, then you want
> to parallelize everything so that you can take advantage of the disk/network
> I/O architecture preferred by NetApp/EMC.  Throwing more cores against a
> NetApp/EMC storage system in a high-available bandwidth environment allows
> for linear performance returns (i.e., reading/writing one I/O is 1X, two
> threads is 2X, three threads is 3X, etc, etc.).
>
> To that end, I'd eventually love to see ra_serf drive the update editor
> across multiple threads so that the checksum and disk I/O bottleneck can be
> distributed across cores on the client-side as well.  Compared to where we
> were in 2006, that's the biggest inefficiency we have yet to solve and take
> advantage of.  And, I'm sure this'll break all sorts of promises in the Ev1
> and perhaps Ev2 world and drive C-Mike even crazier.  =)  But, if you want
> to put a rocket pack on our HTTP performance, that's exactly what we should
> do.  I'm reasonably certain that serf itself could be finely tuned to handle
> network I/O in a single thread at or close to wire-speed even on a 10G
> connection with a modern processor/OS - it's what we do with the file
> contents/textdeltas that needs to be shoved to a different set of worker
> threads and remove all of that libsvn_wc processing from blocking network
> traffic processing and get it all distributed and thread-safe.  If we do
> that, woah, I'd bet that we are we going to make things way faster across
> the board and completely blow everything else out of the water when our
> available bandwidth is high - which is the case in an internal network.
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit.  =P
>
> So, that's my long-winded answer of saying that, yah, even in an internal
> LAN environment, you still want to parallelize.
>
> However, I'm definitely not going to veto a patch that would add an httpd
> directive that allows the server to steer the client - unless overridden by
> the client's local config - to using parallel updates or not.  -- justin

There are some scenario's where either the server admin or the user
can decide if parallel requests make sense or not.

I'm specifically thinking of the use Kerberos per request
authentication. These responses can't be cached on the client side,
and require the authorization header to be sent for each request.
Assuming 2 step handshake of which serf can bypass the first, this
means an overhead per request of 1-10KB, with a 3 step handshake each
request has to be sent twice fur

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Justin Erenkrantz
On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn  wrote:

> I'm wondering whether your concerns apply to both internet-wide
> deployments and local (all on the same LAN) ones.
>

That line is certainly a fair one to draw in the sand.  That said, I think
the internal use case cries out even *more* for the parallel updates as the
internal server in that environment is often wildly over-provisioned on the
CPU side - with a fairly low-traffic environment, you want to take
advantage of the parallel cores of a CPU to drive the updates.

Generally speaking, what I discovered years ago back in 2006 (yikes) and I
believe is still true as we near 2013 (shudder), if everything else is
perfectly optimized (disk, latency, bandwidth, etc.), you're going to
eventually bottleneck on the checksumming on both client and server - which
is entirely CPU-bound and is expensive.  You can solve that by splitting
out the work across multiple cores - for a server, you need to utilize
multiple parallel requests in-flight; and for a client, you then need to
parallelize the editor drive.

The reason that disk isn't such a bottleneck as you might first expect is
due to the OS's buffer cache - for reads on the server-side, common data is
already going to be in RAM so hot spots in the fsfs repos will already be
in memory, for writes on the client-side, modern client OSes won't
necessarily block you until everything is sync'd to disk.  But, once you
exhaust the capabilities of RAM, your underlying disk architecture matters
a lot and one that might not be intuitive to those that haven't spent a lot
of time closely with them.  (Hi Brane!)  If you are using direct-attached
storage locally on either server or client, then you will probably be
bottlenecked right there.  However, if your corporate environment has an
NFS filer or SAN (a la NetApp/EMC) backing the FSFS repository or as NFS
working copies (oh so common), those large disk subsystems are geared
towards parallel I/Os - not single-threaded I/O performance -
Isilon/BlueArc-class storage is however; but I've yet to see anyone
obsessed enough about SVN I/O perf to place either their repository or
working copies on a BlueArc-class storage system!  So, if you are not using
direct-attached storage and are using NFS today in a corporate environment
on either client or server, then you want to parallelize everything so that
you can take advantage of the disk/network I/O architecture preferred by
NetApp/EMC.  Throwing more cores against a NetApp/EMC storage system in a
high-available bandwidth environment allows for linear performance returns
(i.e., reading/writing one I/O is 1X, two threads is 2X, three threads is
3X, etc, etc.).

To that end, I'd eventually love to see ra_serf drive the update editor
across multiple threads so that the checksum and disk I/O bottleneck can be
distributed across cores on the client-side as well.  Compared to where we
were in 2006, that's the biggest inefficiency we have yet to solve and take
advantage of.  And, I'm sure this'll break all sorts of promises in the Ev1
and perhaps Ev2 world and drive C-Mike even crazier.  =)  But, if you want
to put a rocket pack on our HTTP performance, that's exactly what we should
do.  I'm reasonably certain that serf itself could be finely tuned to
handle network I/O in a single thread at or close to wire-speed even on a
10G connection with a modern processor/OS - it's what we do with the file
contents/textdeltas that needs to be shoved to a different set of worker
threads and remove all of that libsvn_wc processing from blocking network
traffic processing and get it all distributed and thread-safe.  If we do
that, woah, I'd bet that we are we going to make things way faster across
the board and completely blow everything else out of the water when our
available bandwidth is high - which is the case in an internal network.
 And, yes, that clearly could all be done in time for 1.8 without
jeopardizing the timelines one tiny bit.  =P

So, that's my long-winded answer of saying that, yah, even in an internal
LAN environment, you still want to parallelize.

However, I'm definitely not going to veto a patch that would add an httpd
directive that allows the server to steer the client - unless overridden by
the client's local config - to using parallel updates or not.  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-12-01 Thread Johan Corveleyn
On Sat, Dec 1, 2012 at 6:36 AM, Justin Erenkrantz  wrote:
> On Fri, Nov 30, 2012 at 4:54 PM,  wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
> able to shove the non-send-all approach into ra_neon.  =)

I'm wondering whether your concerns apply to both internet-wide
deployments and local (all on the same LAN) ones.

It seems to me that SVN has two sets of audiences when it comes to
networking: some have to support users over the internet with
sometimes slow and high-latency, perhaps flaky connections; and others
have all their users on a local (or almost-local) network, and want to
make optimal use of their infrastructure, which offers an absolutely
rock-solid low-latency connection ... they'd like to shove the content
through that (wide, short) pipe as quickly as possible.

I'm no expert, but I suppose it's possible that those two audiences
need two different networking configurations to make optimal use of
their environment. If that's the case, it would be great if we could
offer some (clear, simple to use) configuration directives for those
admins to tune things ...

Just my 2 cents ...
-- 
Johan


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-11-30 Thread Daniel Shahaf
On Sat, Dec 01, 2012 at 12:36:30AM -0500, Justin Erenkrantz wrote:
> Here's my suggestion for consideration - let's experiment with this setting
> in the beta release process with the setting as-is - that is we always do
> the parallel updates unconditionally (except perhaps when svnrdump is being
> stupid).  If we get real users complaining about the update during that
> cycle, we can then figure out either switching the default and/or adding a
> config-option or even allowing some control via capabilities exchange.
> 
> Does that sound reasonable?  -- justin

Why wait until beta?  Can't we get such feedback by asking people to test the
nightly releases, or by tagging r1415926 and releasing it as pre-alpha?


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-11-30 Thread Justin Erenkrantz
On Fri, Nov 30, 2012 at 4:54 PM,  wrote:

> Author: cmpilato
> Date: Fri Nov 30 21:54:35 2012
> New Revision: 1415864
>
> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
> Log:
> Implement in ra_serf "send-all" mode support for update-style REPORTs
> and their responses.  (Currently disabled by compile-time conditionals.)
>
> (This one goes out to Ivan Zhakov.)
>

I've stated for a long time that I think the send-all mode is a huge
mistake architecturally because it is too prone to double-compression and
TCP pipeline stalls and is a tremendous burden on a properly-configured
httpd (by not taking advantage of server-side parallelism), it's nice to
see it's not *too* hard to shoehorn this bad idea back into ra_serf.  We'd
never be able to shove the non-send-all approach into ra_neon.  =)

Here's my suggestion for consideration - let's experiment with this setting
in the beta release process with the setting as-is - that is we always do
the parallel updates unconditionally (except perhaps when svnrdump is being
stupid).  If we get real users complaining about the update during that
cycle, we can then figure out either switching the default and/or adding a
config-option or even allowing some control via capabilities exchange.

Does that sound reasonable?  -- justin


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-11-30 Thread C. Michael Pilato
On 11/30/2012 05:25 PM, Mark Phippard wrote:
> On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato  
> wrote:
>> On 11/30/2012 05:00 PM, Mark Phippard wrote:
>>> On Fri, Nov 30, 2012 at 4:54 PM,   wrote:
 Author: cmpilato
 Date: Fri Nov 30 21:54:35 2012
 New Revision: 1415864

 URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
 Log:
 Implement in ra_serf "send-all" mode support for update-style REPORTs
 and their responses.  (Currently disabled by compile-time conditionals.)
>>>
>>> Sweet!
>>>
>>> Would this also resolve the issue with svnrdump, or could it?  When
>>> Serf is using this mode, I assume it is also now conforming to Ev1?
>>
>> I guess it *could* based on what I'm reading is considered the source of
>> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
>> used the ra-replay API instead of the ra-update one?
> 
> Guess I am more wondering if it was another area where the same
> solution could be applied?

No, that's just it.  ra_serf's implementation of the ra-replay API is
single-connection, just like ra-neon's was.  What suprises me is that
svnrdump *does* use the ra-update API.

Ah!  I see why, now.  When not doing an incremental dump, 'svnrdump dump'
uses the ra-update API to handle that initial checkout-like revision.  After
that (and otherwise when in incremental mode), it uses the ra-replay API.
So yes, I believe svnrdump would be in fine shape over ra-serf if it was
asking the server to use this "send-all" mode, where document Ev1 drive
ordering *should* be honored.

-- 
C. Michael Pilato 
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development



signature.asc
Description: OpenPGP digital signature


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-11-30 Thread Mark Phippard
On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato  wrote:
> On 11/30/2012 05:00 PM, Mark Phippard wrote:
>> On Fri, Nov 30, 2012 at 4:54 PM,   wrote:
>>> Author: cmpilato
>>> Date: Fri Nov 30 21:54:35 2012
>>> New Revision: 1415864
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>> Log:
>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> Sweet!
>>
>> Would this also resolve the issue with svnrdump, or could it?  When
>> Serf is using this mode, I assume it is also now conforming to Ev1?
>
> I guess it *could* based on what I'm reading is considered the source of
> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
> used the ra-replay API instead of the ra-update one?

Guess I am more wondering if it was another area where the same
solution could be applied?


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-11-30 Thread C. Michael Pilato
On 11/30/2012 05:00 PM, Mark Phippard wrote:
> On Fri, Nov 30, 2012 at 4:54 PM,   wrote:
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
> 
> Sweet!
> 
> Would this also resolve the issue with svnrdump, or could it?  When
> Serf is using this mode, I assume it is also now conforming to Ev1?

I guess it *could* based on what I'm reading is considered the source of
svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
used the ra-replay API instead of the ra-update one?

-- 
C. Michael Pilato 
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development



signature.asc
Description: OpenPGP digital signature


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

2012-11-30 Thread Mark Phippard
On Fri, Nov 30, 2012 at 4:54 PM,   wrote:
> Author: cmpilato
> Date: Fri Nov 30 21:54:35 2012
> New Revision: 1415864
>
> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
> Log:
> Implement in ra_serf "send-all" mode support for update-style REPORTs
> and their responses.  (Currently disabled by compile-time conditionals.)

Sweet!

Would this also resolve the issue with svnrdump, or could it?  When
Serf is using this mode, I assume it is also now conforming to Ev1?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/