Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Lieven Govaerts
On Fri, Jan 21, 2011 at 7:59 AM, Justin Erenkrantz jus...@erenkrantz.comwrote:

 On Thu, Jan 20, 2011 at 3:18 PM, Lieven Govaerts svn...@mobsol.be wrote:
  Greg or Lieven, any thoughts here?  -- justin
 
  At least the one rev that fixes this issue, don't know if the other are
  already working in all scenario's.
  I'll look at it this weekend and make a release.

 Woohoo.  Thanks.  I'll be sure to test it.  Let me know if you need
 any help.  -- justin


I've prepared branch 0.7.x with the changes that are safe to go in serf
0.7.1.

I'm running some tests on Linux, Windows 7 and Mac Os X. On the latter svn
trunk currently doesn't build for me (is_atomicity_error symbol not found),
but doesn't seem to be related to serf.

I'd appreciate some more testing, before I make the release later this week.
The code to get is:
http://serf.googlecode.com/svn/branches/0.7.x at r1427.

Lieven


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Stefan Sperling
On Sun, Jan 23, 2011 at 02:41:49PM +0100, Lieven Govaerts wrote:
 On Fri, Jan 21, 2011 at 7:59 AM, Justin Erenkrantz 
 jus...@erenkrantz.comwrote:
 
  On Thu, Jan 20, 2011 at 3:18 PM, Lieven Govaerts svn...@mobsol.be wrote:
   Greg or Lieven, any thoughts here?  -- justin
  
   At least the one rev that fixes this issue, don't know if the other are
   already working in all scenario's.
   I'll look at it this weekend and make a release.
 
  Woohoo.  Thanks.  I'll be sure to test it.  Let me know if you need
  any help.  -- justin
 
 
 I've prepared branch 0.7.x with the changes that are safe to go in serf
 0.7.1.
 
 I'm running some tests on Linux, Windows 7 and Mac Os X. On the latter svn
 trunk currently doesn't build for me (is_atomicity_error symbol not found),
 but doesn't seem to be related to serf.
 
 I'd appreciate some more testing, before I make the release later this week.
 The code to get is:
 http://serf.googlecode.com/svn/branches/0.7.x at r1427.

The svn-openbsd-i386 buildbot is using serf 0.7.x. The working copy
isn't updated automatically, but I've just removed it so it should get
a fresh checkout during the next build and run all tests with the
newest serf.

Stean


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Daniel Shahaf
Lieven Govaerts wrote on Sun, Jan 23, 2011 at 14:41:49 +0100:
 I'm running some tests on Linux, Windows 7 and Mac Os X. On the latter svn
 trunk currently doesn't build for me (is_atomicity_error symbol not found),
 but doesn't seem to be related to serf.

That's odd: in HEAD, that symbol only appears in ra/util.c, and it's
declared static there.


Re: [PATCH] extend svn_subst_translate_string2() with a REPAIR parameter

2011-01-23 Thread Danny Trebbien
Ping. I haven't yet received feedback on this patch.

On Fri, Jan 14, 2011 at 5:57 PM, Danny Trebbien dtrebb...@gmail.com wrote:

 On Fri, Jan 14, 2011 at 12:32 PM, Danny Trebbien dtrebb...@gmail.com wrote:
  Attached are a patch and log message to extend
  svn_subst_translate_string2() with a new parameter, REPAIR. This
  allows the caller to customize whether the translation repairs line
  endings.

 I noticed a slight problem with the extra asserts in
 test_svn_subst_translate_string2() of
 subversion/tests/libsvn_subr/subst_translate-test.c that this patch
 adds. When REPAIR is FALSE and VALUE has inconsistent line endings,
 then an SVN_ERR_IO_INCONSISTENT_EOL error is returned. If, however, a
 future change causes svn_subst_translate_string2() to return
 SVN_NO_ERROR, then the assert that the error was
 SVN_ERR_IO_INCONSISTENT_EOL would dereference a NULL pointer.

 Attached are the patch and log message for version 1.1 of this patch.
 The new patch corrects this flaw. The new log message corrects a
 grammar mistake.


Re: [PATCH] allow svnsync to translate non-UTF-8 log messages to UTF-8 (v. 4)

2011-01-23 Thread Danny Trebbien
Ping. I haven't yet received feedback on this patch.

On Fri, Jan 14, 2011 at 1:01 PM, Danny Trebbien dtrebb...@gmail.com wrote:
 Attached are version 4 of the patch and log message. Also attached are
 two TGZ archives containing the six DUMP files that the new svnsync
 and svnrdump tests depend upon. These TGZ archives can be extracted
 directly into your working copy of trunk.

 Note that this patch depends on
 http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125056,
 which must be committed first.


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Stefan Küng

On 23.01.2011 14:41, Lieven Govaerts wrote:



On Fri, Jan 21, 2011 at 7:59 AM, Justin Erenkrantz
jus...@erenkrantz.com mailto:jus...@erenkrantz.com wrote:

On Thu, Jan 20, 2011 at 3:18 PM, Lieven Govaerts svn...@mobsol.be
mailto:svn...@mobsol.be wrote:
  Greg or Lieven, any thoughts here?  -- justin
 
  At least the one rev that fixes this issue, don't know if the
other are
  already working in all scenario's.
  I'll look at it this weekend and make a release.

Woohoo.  Thanks.  I'll be sure to test it.  Let me know if you need
any help.  -- justin


I've prepared branch 0.7.x with the changes that are safe to go in serf
0.7.1.

I'm running some tests on Linux, Windows 7 and Mac Os X. On the latter
svn trunk currently doesn't build for me (is_atomicity_error symbol not
found), but doesn't seem to be related to serf.

I'd appreciate some more testing, before I make the release later this
week. The code to get is:
http://serf.googlecode.com/svn/branches/0.7.x at r1427.


I've used a build from serf trunk for a while now without problems.
The serf log indicates that r1416 has been merged, but r1417 has not. Is 
there a reason why this memory leak fix hasn't been merged to 0.7.x?


Stefan

--
   ___
  oo  // \\  De Chelonian Mobile
 (_,\/ \_/ \ TortoiseSVN
   \ \_/_\_/The coolest Interface to (Sub)Version Control
   /_/   \_\ http://tortoisesvn.net


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Lieven Govaerts
On Sun, Jan 23, 2011 at 3:59 PM, Stefan Küng tortoise...@gmail.com wrote:

 On 23.01.2011 14:41, Lieven Govaerts wrote:



 On Fri, Jan 21, 2011 at 7:59 AM, Justin Erenkrantz
 jus...@erenkrantz.com mailto:jus...@erenkrantz.com wrote:

On Thu, Jan 20, 2011 at 3:18 PM, Lieven Govaerts svn...@mobsol.be
mailto:svn...@mobsol.be wrote:
  Greg or Lieven, any thoughts here?  -- justin
 
  At least the one rev that fixes this issue, don't know if the
other are
  already working in all scenario's.
  I'll look at it this weekend and make a release.

Woohoo.  Thanks.  I'll be sure to test it.  Let me know if you need
any help.  -- justin


 I've prepared branch 0.7.x with the changes that are safe to go in serf
 0.7.1.

 I'm running some tests on Linux, Windows 7 and Mac Os X. On the latter
 svn trunk currently doesn't build for me (is_atomicity_error symbol not
 found), but doesn't seem to be related to serf.

 I'd appreciate some more testing, before I make the release later this
 week. The code to get is:
 http://serf.googlecode.com/svn/branches/0.7.x at r1427.


 I've used a build from serf trunk for a while now without problems.
 The serf log indicates that r1416 has been merged, but r1417 has not. Is
 there a reason why this memory leak fix hasn't been merged to 0.7.x?


r1417 is part of a refactoring on trunk to move the connection related code
in a separate bucket (httpconn). I doubt that that code is really stable
yet, so I'd rather not merge it to 0.7.x.

Lieven


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Justin Erenkrantz
On Sun, Jan 23, 2011 at 5:41 AM, Lieven Govaerts svn...@mobsol.be wrote:
 I'd appreciate some more testing, before I make the release later this week.
 The code to get is:
 http://serf.googlecode.com/svn/branches/0.7.x at r1427.

% uname -v
Darwin Kernel Version 10.5.0: Fri Nov  5 23:20:39 PDT 2010;
root:xnu-1504.9.17~1/RELEASE_I386
(OS X 10.6.5)
% ln -s ~/work/serf-0.7.x/test/serftestca.pem test/serftestca.pem
(I use VPATH builds, so had to bring over serftestca.pem into the
build dir to get the 2 SSL tests to pass)
% ./test/test_all
.

OK (17 tests)

And, 0.7.x builds fine against latest SVN 1.6.x branches and basic
co/ls functionality via ra_serf seems good.

Looks good!  Thanks!  -- justin


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Stefan Küng

On 23.01.2011 16:09, Lieven Govaerts wrote:


I've used a build from serf trunk for a while now without problems.
The serf log indicates that r1416 has been merged, but r1417 has
not. Is there a reason why this memory leak fix hasn't been merged
to 0.7.x?


r1417 is part of a refactoring on trunk to move the connection related
code in a separate bucket (httpconn). I doubt that that code is really
stable yet, so I'd rather not merge it to 0.7.x.


Ok, tested with serf from the 0.7.x branch: memory rise is still higher 
than with neon, indicating that there's still some (small) memory leak 
somewhere. But checkouts and updates of even larger projects succeed 
without using up all available memory as before.
To compare: checkout of the TSVN source including all externals with 
serf uses up about 40MB more RAM than when using neon. I'd say that's ok.


Stefan

--
   ___
  oo  // \\  De Chelonian Mobile
 (_,\/ \_/ \ TortoiseSVN
   \ \_/_\_/The coolest Interface to (Sub)Version Control
   /_/   \_\ http://tortoisesvn.net


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Justin Erenkrantz
On Sun, Jan 23, 2011 at 8:42 AM, Stefan Küng tortoise...@gmail.com wrote:
 Ok, tested with serf from the 0.7.x branch: memory rise is still higher than
 with neon, indicating that there's still some (small) memory leak somewhere.
 But checkouts and updates of even larger projects succeed without using up
 all available memory as before.
 To compare: checkout of the TSVN source including all externals with serf
 uses up about 40MB more RAM than when using neon. I'd say that's ok.

Does the memory keep going up?  Or, does it reach a steady point?  I'd
expect that ra_serf would use a bit more memory than ra_neon as it has
to manage a lot more than what neon has to do.

As a point of reference, on Mac OS X 10.6, svn 1.6.x with ra_serf
checking out svn trunk peaks at 81MB, while ra_neon peaks at 12MB.

HTH.  -- justin


Tuning recommendations for Apache HTTP Server w/ra_serf clients (MaxKeepAliveRequests)

2011-01-23 Thread Justin Erenkrantz
I think this probably belongs in the book, but I don't know exactly
where...so I'll ensure that it's noted again and let someone else
figure out where it should live.

As we start to have more users with ra_serf, we should also recommend
that httpd server admins bump up their MaxKeepAliveRequests from 100
(default) to at least 1000.  This'll help ra_serf utilize TCP
connections more efficiently - as ra_serf issues 2 HTTP requests per
file that it needs to check out or update.

ra_serf has heuristics so that it'll work gracefully under the default
settings (and determine what the value actually is), but you'll get
better network performance if you can utilize fewer TCP connections.
This way, ra_serf will only need to open a new connection every 500
files rather than every 50.  (Honestly, there is really no reason that
it couldn't be 1.)

If we also have a tuning section, we should remind folks to also
enable mod_deflate (SetOutputFilter DEFLATE in their Location block)
as that'll help a bit when transferring XML content back and forth.
mod_deflate will come at a slight latency penalty, but that'll be
offset for folks with slower connections taking less time to transfer
the responses overall.

FWIW, I've already asked ASF infra to bump up the MaxKeepAliveRequests
settings for svn.apache.org.

Thanks!  -- justin


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Stefan Küng

On 23.01.2011 18:06, Justin Erenkrantz wrote:

On Sun, Jan 23, 2011 at 8:42 AM, Stefan Küngtortoise...@gmail.com  wrote:

Ok, tested with serf from the 0.7.x branch: memory rise is still higher than
with neon, indicating that there's still some (small) memory leak somewhere.
But checkouts and updates of even larger projects succeed without using up
all available memory as before.
To compare: checkout of the TSVN source including all externals with serf
uses up about 40MB more RAM than when using neon. I'd say that's ok.


Does the memory keep going up?  Or, does it reach a steady point?  I'd
expect that ra_serf would use a bit more memory than ra_neon as it has
to manage a lot more than what neon has to do.

As a point of reference, on Mac OS X 10.6, svn 1.6.x with ra_serf
checking out svn trunk peaks at 81MB, while ra_neon peaks at 12MB.


I can't really say for sure if serf memory keeps going up or not. I was 
testing with TSVN and there the memory has to keep going up since all 
the output is stored in memory for the UI controls to show to the user. 
I can only say for sure that the memory for both neon and serf goes up 
at approximately the same rate. Only with serf, memory goes up at the 
beginning very fast until about 40-50MB more than with neon, but then 
keeps that 'distance' (well, almost: the distance increases a little 
bit, but not much - around 8kbytes per two seconds) to neon until the end.


Stefan

--
   ___
  oo  // \\  De Chelonian Mobile
 (_,\/ \_/ \ TortoiseSVN
   \ \_/_\_/The coolest Interface to (Sub)Version Control
   /_/   \_\ http://tortoisesvn.net


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Lieven Govaerts
On Sun, Jan 23, 2011 at 6:06 PM, Justin Erenkrantz jus...@erenkrantz.comwrote:

 On Sun, Jan 23, 2011 at 8:42 AM, Stefan Küng tortoise...@gmail.com
 wrote:
  Ok, tested with serf from the 0.7.x branch: memory rise is still higher
 than
  with neon, indicating that there's still some (small) memory leak
 somewhere.
  But checkouts and updates of even larger projects succeed without using
 up
  all available memory as before.
  To compare: checkout of the TSVN source including all externals with serf
  uses up about 40MB more RAM than when using neon. I'd say that's ok.

 Does the memory keep going up?  Or, does it reach a steady point?  I'd
 expect that ra_serf would use a bit more memory than ra_neon as it has
 to manage a lot more than what neon has to do.

 As a point of reference, on Mac OS X 10.6, svn 1.6.x with ra_serf
 checking out svn trunk peaks at 81MB, while ra_neon peaks at 12MB.


I have been looking at ra_serf's memory usage with svn trunk at the last svn
days in Berlin and found that most memory was allocated and held in the
sqlite code.

Things have changed since then though. Can anyone test with svn 1.6.x to see
how it uses memory?

Lieven


Re: Tuning recommendations for Apache HTTP Server w/ra_serf clients (MaxKeepAliveRequests)

2011-01-23 Thread Stefan Sperling
On Sun, Jan 23, 2011 at 09:27:18AM -0800, Justin Erenkrantz wrote:
 If we also have a tuning section, we should remind folks to also
 enable mod_deflate (SetOutputFilter DEFLATE in their Location block)
 as that'll help a bit when transferring XML content back and forth.
 mod_deflate will come at a slight latency penalty, but that'll be
 offset for folks with slower connections taking less time to transfer
 the responses overall.

Has this problem been fixed, then?
http://svn.haxx.se/dev/archive-2009-08/0274.shtml
I don't think we should recommend mod_deflate if that problem still exists.

I don't think that thread came to a conclusion.
Greg was implying the bug was in mod_deflate rather than svn:
http://svn.haxx.se/dev/archive-2009-08/0290.shtml

Stefan


Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Justin Erenkrantz
On Sun, Jan 23, 2011 at 10:22 AM, Lieven Govaerts svn...@mobsol.be wrote:
 Things have changed since then though. Can anyone test with svn 1.6.x to see
 how it uses memory?

For ra_serf, I'm wondering if we're creating an additional pool that
isn't necessary - namely the editor_pool.

I've done some light testing with the ra_serf patch below, which does
the following things:
 - Create a separate pool hierarchy for files
 - Combines the per-file editor pool with the regular pool for file

For serf itself, I've also switched zlib to use the bucket allocator -
which also helps which churn - as every call to inflateInit/inflateEnd
invokes malloc/free - so we can save on that quite bit.

Overall, this seems to reduce the amount of 20KB allocations
substantially...but not sure it does much to the overall memory usage.

Not sure if/when I'll pick this up again...so patches below if someone
else wants to run with it.  -- justin

Index: update.c
===
--- update.c(revision 1062462)
+++ update.c(working copy)
@@ -107,6 +107,7 @@
   /* controlling dir baton - this is only created in open_dir() */
   void *dir_baton;
   apr_pool_t *dir_baton_pool;
+  apr_pool_t *file_pool;

   /* Our master update editor and baton. */
   const svn_delta_editor_t *update_editor;
@@ -202,9 +203,6 @@
   /* The properties for this file */
   apr_hash_t *props;

-  /* pool passed to update-add_file, etc. */
-  apr_pool_t *editor_pool;
-
   /* controlling file_baton and textdelta handler */
   void *file_baton;
   const char *base_checksum;
@@ -403,7 +401,7 @@
   report_info_t *new_info;

   new_info = apr_pcalloc(info_parent_pool, sizeof(*new_info));
-  apr_pool_create(new_info-pool, info_parent_pool);
+  apr_pool_create(new_info-pool, info-dir-file_pool);
   new_info-file_baton = NULL;
   new_info-lock_token = NULL;
   new_info-fetch_file = FALSE;
@@ -500,6 +498,7 @@
   if (dir-base_name[0] == '\0')
 {
   apr_pool_create(dir-dir_baton_pool, dir-pool);
+  apr_pool_create(dir-file_pool, dir-pool);

   if (dir-report_context-destination 
   dir-report_context-sess-wc_callbacks-invalidate_wc_props)
@@ -519,6 +518,7 @@
   SVN_ERR(open_dir(dir-parent_dir));

   apr_pool_create(dir-dir_baton_pool, dir-parent_dir-dir_baton_pool);
+  dir-file_pool = dir-parent_dir-file_pool;

   if (SVN_IS_VALID_REVNUM(dir-base_rev))
 {
@@ -632,7 +632,7 @@
   if (lock_val)
 {
   char *new_lock;
-  new_lock = apr_pstrdup(info-editor_pool, lock_val);
+  new_lock = apr_pstrdup(info-pool, lock_val);
   apr_collapse_spaces(new_lock, new_lock);
   lock_val = new_lock;
 }
@@ -641,7 +641,7 @@
 {
   svn_string_t *str;

-  str = svn_string_ncreate(, 1, info-editor_pool);
+  str = svn_string_ncreate(, 1, info-pool);

   svn_ra_serf__set_ver_prop(info-dir-removed_props, info-base_name,
 info-base_rev, DAV:, lock-token,
@@ -756,13 +756,11 @@
   return error_fetch(request, fetch_ctx, err);
 }

-  apr_pool_create(info-editor_pool, info-dir-dir_baton_pool);
-
   /* Expand our full name now if we haven't done so yet. */
   if (!info-name)
 {
   info-name_buf = svn_stringbuf_dup(info-dir-name_buf,
- info-editor_pool);
+ info-pool);
   svn_path_add_component(info-name_buf, info-base_name);
   info-name = info-name_buf-data;
 }
@@ -772,7 +770,7 @@
   err = info-dir-update_editor-open_file(info-name,
 info-dir-dir_baton,
 info-base_rev,
-info-editor_pool,
+info-pool,
 info-file_baton);
 }
   else
@@ -781,7 +779,7 @@
info-dir-dir_baton,
info-copyfrom_path,
info-copyfrom_rev,
-   info-editor_pool,
+   info-pool,
info-file_baton);
 }

@@ -792,7 +790,7 @@

   err = info-dir-update_editor-apply_textdelta(info-file_baton,
   info-base_checksum,
-  info-editor_pool,
+  info-pool,
   info-textdelta,
   info-textdelta_baton);

@@ -806,7 +804,7 @@
   

Re: Tuning recommendations for Apache HTTP Server w/ra_serf clients (MaxKeepAliveRequests)

2011-01-23 Thread Justin Erenkrantz
On Sun, Jan 23, 2011 at 11:26 AM, Stefan Sperling s...@elego.de wrote:
 On Sun, Jan 23, 2011 at 09:27:18AM -0800, Justin Erenkrantz wrote:
 If we also have a tuning section, we should remind folks to also
 enable mod_deflate (SetOutputFilter DEFLATE in their Location block)
 as that'll help a bit when transferring XML content back and forth.
 mod_deflate will come at a slight latency penalty, but that'll be
 offset for folks with slower connections taking less time to transfer
 the responses overall.

 Has this problem been fixed, then?
 http://svn.haxx.se/dev/archive-2009-08/0274.shtml
 I don't think we should recommend mod_deflate if that problem still exists.

No, it doesn't look like it has been fixed yet.

However, how many real-world HTTP clients are out there that don't
speak zlib?  =)

 I don't think that thread came to a conclusion.
 Greg was implying the bug was in mod_deflate rather than svn:
 http://svn.haxx.se/dev/archive-2009-08/0290.shtml

Greg is correct in that the patch doesn't go far enough - mod_dav_svn
should change all uses of output to be a ** rather than *.

As far as mod_deflate being broken, yah, it probably shouldn't be
trying to do any memory usage until it knows it is
activated...but...that's a topic for another list (dev@httpd).
mod_deflate should probably memorize the fact that the checks have
already run and get out of the way...probably with setting no-gzip in
subprocess_env?  -- justin


Re: My take on the diff-optimizations-bytes branch

2011-01-23 Thread Johan Corveleyn
On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
stefanfuhrm...@alice-dsl.de wrote:
 On 03.01.2011 02:14, Johan Corveleyn wrote:
 It would be interesting to see where the biggest gains
 are coming from (I'm guessing from the per-machine-word
 reading/comparing; I'd like to try that first, maybe together with the
 allocating of the file[] on the stack; I'd like to avoid
 special-casing file_len==2, splitting up functions in *_slow and
 *_fast, because it makes it a lot more difficult imho (so I'd only
 like to do that if it's a significant contributor)). But if you want
 to leave that as an exercise for the reader, that's fine as well :-).

 Exercise is certainly not a bad thing ;)

 But I think, the stack variable is certainly helpful
 and easy to do. While chunky operation gives
 a *big* boost, it is much more difficult to code if
 you need to compare multiple sources. It should
 be possible, though.

Ok, I finished the exercise :-).

- File info structures on stack instead of on heap: committed in
r1060625. Gives 10% boost.

- chunky operation for multiple sources: committed in r1062532. Gives
~250% boost on my machine.

For suffix scanning, I made a couple of changes to the logic you sent
in your patch (as you remember, there was a problem with the
implementation of suffix scanning [1]). It was (in
scan_backward_fast):

[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+  if (   (file[0].curp - sizeof(apr_size_t)  file[0].curp)
+   (file[1].curp - sizeof(apr_size_t)  file[1].curp))
+{
+  do
+{
+  file[0].curp -= sizeof(apr_size_t);
+  file[1].curp -= sizeof(apr_size_t);
+}
+  while (   (file[0].curp = min_curp0)
+  (file[1].curp = min_curp1)
+  (   *(const apr_size_t*)(file[0].curp + 1)
+ == *(const apr_size_t*)(file[1].curp + 1)));
+
+  file[0].curp += sizeof(apr_size_t);
+  file[1].curp += sizeof(apr_size_t);
+}
]]]

I changed this to (the N-file equivalent of):
[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+  if (   (file[0].curp - sizeof(apr_size_t) = min_curp0)
+   (file[1].curp - sizeof(apr_size_t) = min_curp1))
+{
+  do
+{
+  file[0].curp -= sizeof(apr_size_t);
+  file[1].curp -= sizeof(apr_size_t);
+}
+  while (   (file[0].curp - sizeof(apr_size_t) = min_curp0)
+  (file[1].curp - sizeof(apr_size_t) = min_curp1)
+  (   *(const apr_size_t*)(file[0].curp + 1)
+ == *(const apr_size_t*)(file[1].curp + 1)));
+
+  file[0].curp += sizeof(apr_size_t);
+  file[1].curp += sizeof(apr_size_t);
+}
]]]

(so, changed the if-comparisons before the do-while-loop, to make sure
there is room to subtract a sizeof(apr_size_t) (the original didn't
really make sense to me), and changed the comparisons with min_curpX
in the while condition (check if there is still room to subtract a
sizeof(apr_size_t)).

After that, all tests went fine.


Now, you suggested in a previous post in this thread that hardcoding
file_len to 2 should speed it up with up to 50%. And you are correct
(on my machine, it's sped up by ~40% if I replace all file_len's in
find_identical_* with 2).

So, that leads me to conclude that I should probably throw away the
generic code, and replace it with 3 variants: find_identical_*2,
find_identical_*3 and find_identical_*4 (once I write *2, the rest
should be mostly copy-n-paste). As you said, this should also vastly
simplify the code (drop the for loops, simpler conditionals, ...).

Cheers,
-- 
Johan

[1] http://svn.haxx.se/dev/archive-2011-01/0022.shtml


Re: My take on the diff-optimizations-bytes branch

2011-01-23 Thread Johan Corveleyn
On Sun, Jan 23, 2011 at 10:46 PM, Johan Corveleyn jcor...@gmail.com wrote:
 On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
 stefanfuhrm...@alice-dsl.de wrote:
 On 03.01.2011 02:14, Johan Corveleyn wrote:
 It would be interesting to see where the biggest gains
 are coming from (I'm guessing from the per-machine-word
 reading/comparing; I'd like to try that first, maybe together with the
 allocating of the file[] on the stack; I'd like to avoid
 special-casing file_len==2, splitting up functions in *_slow and
 *_fast, because it makes it a lot more difficult imho (so I'd only
 like to do that if it's a significant contributor)). But if you want
 to leave that as an exercise for the reader, that's fine as well :-).

 Exercise is certainly not a bad thing ;)

 But I think, the stack variable is certainly helpful
 and easy to do. While chunky operation gives
 a *big* boost, it is much more difficult to code if
 you need to compare multiple sources. It should
 be possible, though.

 Ok, I finished the exercise :-).

 - File info structures on stack instead of on heap: committed in
 r1060625. Gives 10% boost.

 - chunky operation for multiple sources: committed in r1062532. Gives
 ~250% boost on my machine.

Oops, as some people pointed out on IRC, that should be ~60% (I meant
2.5 times faster, going from ~50s to ~20s, so that's about ~60% less
time spent in datasources_open).

-- 
Johan


Assertion failure during update_tests.py 58 (XFAIL: update a nonexistent child of a copied dir)

2011-01-23 Thread Johan Corveleyn
Hi,

Already for some time now, update_tests.py 58 (XFAIL: update a
nonexistent child of a copied dir) crashes on my machine:

svn: In file '..\..\..\subversion\libsvn_wc\update_editor.c' line
4877: assertion failed (repos_root != NULL  repos_uuid != NULL)

I understand that this test is XFAIL, that this isn't addressed yet,
but is it supposed to fail an assert?

On my system (Win XP) this causes an ugly popup to appear (which I
need to click away to continue), each time I run the full test suite
(This application has requested the Runtime to terminate it in an
unusual way...)

Relevant excerpt from tests.log in attachment (this was with trunk@1062600).

Cheers,
-- 
Johan


Re: FSFS format 6

2011-01-23 Thread Johan Corveleyn
On Wed, Dec 29, 2010 at 8:37 PM, Stefan Fuhrmann eq...@web.de wrote:
 On 29.12.2010 01:58, Johan Corveleyn wrote:

 On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann
 stefanfuhrm...@alice-dsl.de  wrote:

 On 19.10.2010 15:10, Daniel Shahaf wrote:

 Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:

 Personally, I see [FSv2] as a broad swath of API changes to align our
 needs with the underlying storage. Trowbridge noted that our current
 API makes it *really* difficult to implement an effective backend. I'd
 also like to see a backend that allows for parallel PUTs during the
 commit process. Hyrum sees FSv2 as some kind of super-key-value
 storage with layers on top, allowing for various types of high-scaling
 mechanisms.

 At the retreat, stefan2 also had some thoughts about this...

 [This is just a brain-dump for 1.8+]

 While working on the performance branch I made some
 observations concerning the way FSFS organizes data
 and how that could be changed for reduced I/O overhead.

 notes/fsfs-improvements.txt contains a summary of that
 could be done to improve FSFS before FS-NG. A later
 FS-NG implementation should then still benefit from the
 improvements.

 +(number of fopen calls during a log operation)

 I like this proposal a lot. As I already told before, we are running
 our FSFS back-end on a SAN over NFS (and I suspect we're not the only
 company doing this). In this environment, the server-side I/O of SVN
 (especially the amount of random reads and fopen calls during e.g.
 log) is often the major bottleneck.

 There is one question going around in my head though: won't you have
 to change/rearrange a lot of the FS layer code (and maybe repos
 layer?) to benefit from this new format?

 Maybe. But as far as I understand the current
 FSFS structure, data access is mainly chasing
 pointers, i.e. reading relative or absolute byte
 offsets and moving there for the next piece of
 information. If everything goes well, none of that
 code needs to change; the revision packing
 algorithm will simply produce different offset
 values.

 The current code is written in a certain way, not particularly
 optimized for this new format (I seem to remember log does around 10
 fopen calls for every interesting rev file, each time reading a
 different part of it). Also, if an operation currently needs to access
 many revisions (like log or blame), it doesn't take advantage at all
 of the fact that they might be in a single packed rev file. The pack
 file is opened and seeked in just as much as the sum of the individual
 rev files.

 The fopen() calls should be eliminated by the
 file handle cache. IOW, they should already be
 addressed on the performance branch. Please
 let me know if that is not the case.

Ok, finally got around to verifying this.

You are completely correct: the performance branch avoids the vast
amount of repeated fopen() calls. With a simple test (testfile with 3
revisions, executing svn log of it) (note: this is an unpacked 1.6
repository):

- trunk: opens each rev file between 19 and 21 times.

- performance branch: opens each rev file 2 times.

(I don't know why it's not simply 1 time, but ok, 2 times is already a
factor 10 better than trunk :-)).

I tested this simply by adding one line of printf instrumentation
inside libsvn_subr/io.c#svn_io_file_open (see patch in attachment, as
well as the output for trunk and for perf-branch).

Now, if only that file-handle cache could be merged to trunk :-) ...

Cheers,
-- 
Johan
Index: subversion/libsvn_subr/io.c
===
--- subversion/libsvn_subr/io.c (revision 1062600)
+++ subversion/libsvn_subr/io.c (working copy)
@@ -2786,6 +2786,7 @@ svn_io_file_open(apr_file_t **new_file, const char
   const char *fname_apr;
   apr_status_t status;
 
+  fprintf(stderr,  OPEN(%d) %s\n, flag, fname);
   SVN_ERR(cstring_from_utf8(fname_apr, fname, pool));
   status = file_open(new_file, fname_apr, flag | APR_BINARY, perm, TRUE,
  pool);
$ ../../client_build/svn_branch_performance/dist/bin/svnserve -d -r 
c:/research/svn/experiment/repos
 OPEN(1) C:/research/svn/experiment/repos/format
 OPEN(129) C:/research/svn/experiment/repos/db/fs-type
 OPEN(129) C:/research/svn/experiment/repos/db/fs-type
 OPEN(129) C:/research/svn/experiment/repos/db/format
 OPEN(129) C:/research/svn/experiment/repos/db/uuid
 OPEN(129) C:/research/svn/experiment/repos/db/min-unpacked-rev
 OPEN(161) C:/research/svn/experiment/repos/db/fsfs.conf
 OPEN(129) C:/research/svn/experiment/repos/db/min-unpacked-revprop
 OPEN(129) C:/research/svn/experiment/repos/db/current
 OPEN(161) C:/research/svn/experiment/repos/conf/svnserve.conf
 OPEN(129) C:/research/svn/experiment/repos/db/revs/0/0
 OPEN(129) C:/research/svn/experiment/repos/db/current
 

Re: [serf-dev] Re: What stands between us and branching 1.7?

2011-01-23 Thread Lieven Govaerts
On Sun, Jan 23, 2011 at 8:39 PM, Justin Erenkrantz jus...@erenkrantz.comwrote:

 On Sun, Jan 23, 2011 at 10:22 AM, Lieven Govaerts svn...@mobsol.be
 wrote:
  Things have changed since then though. Can anyone test with svn 1.6.x to
 see
  how it uses memory?

 For ra_serf, I'm wondering if we're creating an additional pool that
 isn't necessary - namely the editor_pool.

 I've done some light testing with the ra_serf patch below, which does
 the following things:
  - Create a separate pool hierarchy for files
  - Combines the per-file editor pool with the regular pool for file

 For serf itself, I've also switched zlib to use the bucket allocator -
 which also helps which churn - as every call to inflateInit/inflateEnd
 invokes malloc/free - so we can save on that quite bit.

 Overall, this seems to reduce the amount of 20KB allocations
 substantially...but not sure it does much to the overall memory usage.

 Not sure if/when I'll pick this up again...so patches below if someone
 else wants to run with it.  -- justin

 Index: update.c
 ===
 --- update.c(revision 1062462)
 +++ update.c(working copy)
 @@ -107,6 +107,7 @@
   /* controlling dir baton - this is only created in open_dir() */
   void *dir_baton;
   apr_pool_t *dir_baton_pool;
 +  apr_pool_t *file_pool;

   /* Our master update editor and baton. */
   const svn_delta_editor_t *update_editor;
 @@ -202,9 +203,6 @@
   /* The properties for this file */
   apr_hash_t *props;

 -  /* pool passed to update-add_file, etc. */
 -  apr_pool_t *editor_pool;
 -
   /* controlling file_baton and textdelta handler */
   void *file_baton;
   const char *base_checksum;
 @@ -403,7 +401,7 @@
   report_info_t *new_info;

   new_info = apr_pcalloc(info_parent_pool, sizeof(*new_info));
 -  apr_pool_create(new_info-pool, info_parent_pool);
 +  apr_pool_create(new_info-pool, info-dir-file_pool);
   new_info-file_baton = NULL;
   new_info-lock_token = NULL;
   new_info-fetch_file = FALSE;
 @@ -500,6 +498,7 @@
   if (dir-base_name[0] == '\0')
 {
   apr_pool_create(dir-dir_baton_pool, dir-pool);
 +  apr_pool_create(dir-file_pool, dir-pool);

   if (dir-report_context-destination 
   dir-report_context-sess-wc_callbacks-invalidate_wc_props)
 @@ -519,6 +518,7 @@
   SVN_ERR(open_dir(dir-parent_dir));

   apr_pool_create(dir-dir_baton_pool,
 dir-parent_dir-dir_baton_pool);
 +  dir-file_pool = dir-parent_dir-file_pool;


Linking the file pool to the pool of its parent dir has resulted in crashes
earlier. I seem to remember it's because the files are received after the
dir is already closed in the editor drive, but I don't remember all the
details.

Lieven


[..]