Re: Some performance data

2011-01-18 Thread Philip Martin
Hyrum K Wright hy...@hyrumwright.org writes:

 I've been looking at which bugs can be closed by the pending
 completion of wc-ng.  In particular, I've been looking at issue #1284:
 http://subversion.tigris.org/issues/show_bug.cgi?id=1284

 The issue is essentially a complaint about the performance of 'svn mv
 somedir targetdir' when the number of files in somedir is very large,
 in the thousands.  It's a performance bug with no real good feeling
 for "done", but given the performance improvements of wc-ng, I'm
 marking it as such.

I believe there is a relatively small change that could make it better
still.  Look at libsvn_client/copy.c:do_wc_to_wc_moves_with_locks2

  SVN_ERR(svn_wc_copy3(b->ctx->wc_ctx, b->pair->src_abspath_or_url,
                       dst_abspath, FALSE /* metadata_only */,
                       b->ctx->cancel_func, b->ctx->cancel_baton,
                       b->ctx->notify_func2, b->ctx->notify_baton2,
                       scratch_pool));

  SVN_ERR(svn_wc_delete4(b->ctx->wc_ctx, b->pair->src_abspath_or_url,
                         FALSE, FALSE,
                         b->ctx->cancel_func, b->ctx->cancel_baton,
                         b->ctx->notify_func2, b->ctx->notify_baton2,
                         scratch_pool));

Have svn_wc_copy3 set metadata_only to TRUE; this will be faster.  Then
do svn_io_rename to move the working tree; it's a single rename, so it
should be fast.  The svn_wc_delete4 call should still operate if the
working tree is missing, and could be marginally faster.

Overall this should be significantly faster as it doesn't have to copy
all the working files.  It has the additional benefit that it preserves
timestamps in the working tree.  The only reason I haven't implemented
it yet is that I haven't decided whether we should use a workqueue.
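For illustration, the gain from moving the working tree with one rename
instead of a per-file copy-and-delete can be sketched outside svn
entirely (plain Python, no svn API involved; move_tree is a made-up
name, not anything in libsvn_client):

```python
import os
import tempfile

def move_tree(src, dst):
    """Move a directory with a single rename() call, instead of
    copying every file and then deleting the source.  One metadata
    operation, regardless of how many files the tree contains."""
    os.rename(src, dst)

# Demo: build a small tree, move it, verify the contents survived.
root = tempfile.mkdtemp()
src = os.path.join(root, "somedir")
os.makedirs(os.path.join(src, "sub"))
with open(os.path.join(src, "sub", "f.txt"), "w") as f:
    f.write("data")

dst = os.path.join(root, "targetdir")
move_tree(src, dst)

assert not os.path.exists(src)
with open(os.path.join(dst, "sub", "f.txt")) as f:
    assert f.read() == "data"
```

Because the rename never touches the individual files, it also preserves
their timestamps, which is the side benefit mentioned above.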

-- 
Philip


Re: Coding goals / requirements.

2011-01-18 Thread Johan Corveleyn
Hi,

As the guy responsible for the quote that started this thread ([1]):

 Actually, what's a little bit troubling is that there are currently
 only 3 possible file_len's, of which only 2 are used in practice:
 diff2 and diff3 (diff4 is not used in svn core, only in
 tools/diff/diff4). So, if we make a slow and a fast version, we could
 just as well make an XXX2 and an XXX3 version explicitly, and dispense
 with the need to pass arrays and array lengths. Hmmm... generic (but
 more complex) code vs. a couple of specialized versions.

I'd like to point out that I didn't mean to criticize the API choices,
I was just weighing the options. In fact, I came up with the initial
API choice myself (passing an array of files, instead of passing 2
resp. 3 resp. 4 files), because I wanted it to be generically useful.
In any case it's a new API in the libsvn_diff module, for making
prefix/suffix optimizations possible.

I don't want to hijack this high-level discussion to be a technical
one again, but it may provide some additional context to give some
more details about the thoughts behind it ...

In fact, I started out with the simple form: passing 2 file
arguments for regular diff. But then I discovered there was also
something like diff3 (with 3 files) and diff4 (with 4 files; not used
in core SVN). So I simply thought to myself: theoretically this
optimization should work with N files, so why not just make it so. And
I changed the API to accept an array of files.

This made the code more complex, but in reality it's always the same
approach (so once you 'get' it, it's easy to follow):

Things like:

while(*file0.curp == *file1.curp)
{
  ...
}

are replaced with (actually AND-ing N (or N-1) conditions):

for (i = 1, is_match = TRUE; i < file_len; i++)
  is_match = is_match && *file[0].curp == *file[i].curp;
while (is_match)
{
  ...
  for (i = 1, is_match = TRUE; i < file_len; i++)
    is_match = is_match && *file[0].curp == *file[i].curp;
}
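The generalized N-way loop above amounts to the following toy model
(Python for brevity; this is not the actual libsvn_diff code, just the
same idea of AND-ing N-1 comparisons per position):

```python
def find_identical_prefix(datasources):
    """Return the length of the byte prefix shared by all datasources.
    Mirrors the generalized loop: instead of comparing two pointers,
    AND together N-1 per-position comparisons."""
    if not datasources:
        return 0
    pos = 0
    min_len = min(len(d) for d in datasources)
    while pos < min_len:
        byte = datasources[0][pos]
        # the "for (i = 1; i < file_len; i++)" AND-ing step
        if any(d[pos] != byte for d in datasources[1:]):
            break
        pos += 1
    return pos

assert find_identical_prefix([b"abcX", b"abcY"]) == 3          # diff2 case
assert find_identical_prefix([b"abcX", b"abcY", b"abZ"]) == 2  # diff3 case
```

The inner any() is exactly where the N==2 specialization would collapse
to a single comparison, which is the trade-off discussed below.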


So, to be very concrete, it's really the choice between:

(1) A generic set of functions that work for N files, which are
significantly more complex than the N==2 case:

datasources_open(file[])
find_identical_prefix(file[])
find_identical_suffix(file[])

(2) A specialized set of functions for 2, 3 and 4 files (4 being
optional, since it's not used in svn core), which are a lot simpler, but
actually completely duplicate the higher-level logic:

datasources_open2(file0, file1)
find_identical_prefix2(file0, file1)
find_identical_suffix2(file0, file1)

datasources_open3(file0, file1, file2)
find_identical_prefix3(file0, file1, file2)
find_identical_suffix3(file0, file1, file2)

datasources_open4(file0, file1, file2, file3)
find_identical_prefix4(file0, file1, file2, file3)
find_identical_suffix4(file0, file1, file2, file3)


At the time, I felt that (1) was the right choice, WRT power/weight
balance, keeping the logic in one single place. But with stefan2's
proposed low-level optimizations of the algorithm, it's more or less
open for debate :). Those low-level optimizations (which are
definitely worth it) are much more difficult if you want to write them
generically. And apart from that, splitting out the functions in _2,
_3 and _4 variants is also a low-level optimization by itself.

Right now, I'm still trying to keep it generic (trying to integrate
stefan2's suggestions in a generic way; I'm assuming for now that the
low-level optimization coming from the splitting-out itself is not
significant (I'll have to test that)), but I'm not so sure anymore.
We'll see...

Anyway, all in all, I certainly don't feel like I've wasted my time,
because it was also a great learning experience for me (new to C). And
being a perfectionist, I want it to be as good as possible WRT the
power/weight ratio (the most bang (speed/features/...) for the buck
(complexity/maintainability/readability)) :-).

Cheers,
-- 
Johan

[1] http://svn.haxx.se/dev/archive-2011-01/0241.shtml

On Tue, Jan 18, 2011 at 7:37 AM, Arwin Arni ar...@collab.net wrote:
 Hi All,

 It is a very interesting notion that Gavin throws at us. I think it is
 very important for an open-source project to maintain its code in a way
 that makes it easy for a new guy (like me) to make quick and meaningful
 changes. Most open-source projects with a large development community
 end up as a mess of perfectly working, yet highly unreadable code.

 However, as a pair of fresh eyes looking at the code, I can safely say
 that, though it is an open-source project, Subversion has managed to stay
 readable. This can only be attributed to the hours of work spent on
 over-engineering the code (BTW, I personally don't think anything can be
 over-engineered. In my book, it is merely a synonym for perfection).

 There are parts of the code (in svn) that have been written with the
 notion of "getting things done". And these are the parts that I really
 find difficult to 

Re: svn command line client's inconsistent behavior in handling multiple targets.

2011-01-18 Thread Noorul Islam K M
Julian Foad julian.f...@wandisco.com writes:

 On Fri, 2011-01-14 at 17:50 +0530, Noorul Islam K M wrote:

 Stefan Sperling s...@elego.de writes:
 
  On Thu, Jan 13, 2011 at 03:31:11PM +0530, Noorul Islam K M wrote:
 
  | Command    | Return Code | Continue |
  |------------+-------------+----------|
  | add        |           0 | Y        |
  | blame      |           1 | N        |
  | cat        |           1 | Y        |
  | changelist |           0 | N        |
  | commit     |           1 | N        |
  | copy       |           1 | N        |
  | delete     |           1 | N        |
  | diff       |           1 | N        |
  | info       |           1 | Y        |
  | list       |           1 | N        |
  | log        |           1 | N        |
  | revert     |           0 | Y        |
 
  Nice overview, thanks for putting this together!

 Yes, good.  This shows operations on local (working copy) targets only,
 doesn't it, apart from commit.  Note Stefan's observation that the
 behaviour should (and probably does) vary depending on whether it's a
 local operation or a repository operation, so it might be useful to
 extend the table to show both.

  The commands add, cat, info and revert continue execution even if they
  find one of the targets non-existing. All of them return 0 to the
  shell except 'info', which returns 1. Also, info prints 
  
  svn: A problem occurred; see other errors for details
 
  at the end of the execution, which tells the user that something went
  wrong during execution. This is not the case with 'add', 'cat' and
  'revert'. I think these three commands should also return 1 and print
  the message which 'info' does, so that users do not have to scroll
  up to the warnings to decide whether something went wrong.
 
  +1 on making more commands print a problem occured message at the end.

 Yes, +1.  (Spelling: occurred.)

  I also think that all commands should exit 1 if one of the specified
  targets could not be found. This makes it much easier to detect such
  errors in shell scripts.
 
  Among these revert prints 'Skipped' message and others print 'svn:
  warning: ..' message.
 
  Hmm... somewhat inconsistent, but I don't think we need consistency
  here. Either way of putting it (skipped or warning) is fine.

 I agree this is not so important.  My preference is for a warning that
 says what the problem was; a simple Skipped message is too vague for
 my liking.

  I think commit is an atomic operation, therefore it does make
  sense that it errors out when one of the targets is invalid.
  
  I am not sure why all the other commands error out when one of the
  targets is non-existing.
 
  I think any command that tries to make a commit should error out.
  This includes copy and delete, because they can make commits without
  a working copy.
 
  Commands that operate on the working copy (including copy and delete!)
  should continue, print a warning or skipped message, and print
  the problem occured message at the end.

 Sounds good.  So is this what we want?:

   (1) If a command is making a commit (or, let's say, changing the
 repository in any way, so revprop changes are included), then as soon as
 an invalid target is detected, error out (print an error message and
 exit with return code = 1).

   (2) If a command is changing (only) the WC, and a target is invalid,
 then print a warning (or skipped message) and move on to the next
 target, and then at the end print a summary message and exit with return
 code = 1.

   (3) If a command is not changing the repos or WC, but only fetching
 information (cat, blame, export, info, list, pg, pl, ...) then ... same
 as (2)?
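Rules (2) and (3) above amount to something like the following sketch
(Python for illustration only; process_targets and its 'exists' check
are hypothetical stand-ins for the real client-side target handling,
not svn code):

```python
def process_targets(targets, exists):
    """Warn and continue on each invalid target, then report a summary
    and return a non-zero exit code if anything was skipped."""
    failed = False
    for t in targets:
        if not exists(t):
            print("svn: warning: '%s' not found" % t)
            failed = True
            continue
        print("processed %s" % t)
    if failed:
        print("svn: A problem occurred; see other errors for details")
    return 1 if failed else 0

# One bad target among good ones: keep going, but exit 1 at the end.
rc = process_targets(["a", "missing", "b"], lambda t: t != "missing")
assert rc == 1
```

That return code of 1 is what makes the failure detectable from shell
scripts, which is the motivation given earlier in the thread.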


 If everyone agrees to this then I will start sending patches for this.

 Noorul, what would the table(s) of results look like after all these
 changes?


I included return codes for URL-based commands in the table below.

Commands add, changelist, commit and revert are WC-only commands.
'delete' is a commit operation when performed on a URL, so I have not
included them in the URL list.

| Command    | Return Code | Continue |
|------------+-------------+----------|
| blame      |           1 | N        |
| cat        |           1 | N        |
| copy       |           1 | N        |
| diff       |           1 | N        |
| info       |           1 | Y        |
| list       |           1 | N        |
| log        |           1 | N        |

After all these changes the table will look like this.

For WC related commands:

I removed commit because it is an atomic operation and will error out,
so no change there.

| Command    | Return Code | Continue |
|------------+-------------+----------|
| add        |           1 | Y        |
| blame      |           1 | Y        |
| cat        |           1 | Y        |
| changelist |           1 | Y        |
| delete     |           1 | Y        |
| diff       |           1 | Y        |
| info       |           1 | Y        |
| list       |           1 | Y        |
| log 

Re: My take on the diff-optimizations-bytes branch

2011-01-18 Thread Johan Corveleyn
On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn jcor...@gmail.com wrote:
 On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
 stefanfuhrm...@alice-dsl.de wrote:

[ ... snip ... ]

 But I think, the stack variable is certainly helpful
 and easy to do.

Ok, I've done this (locally, still have to clean up a little and then
commit). It gives a nice 10% speedup of prefix/suffix scanning, which
is great.

I have a couple of small questions though, about other small
optimizations you did:

 Index: subversion/libsvn_diff/diff_file.c
 ===
 --- subversion/libsvn_diff/diff_file.c(revision 1054044)
 +++ subversion/libsvn_diff/diff_file.c(working copy)
...
 @@ -258,10 +259,10 @@
                                                                        \
   for (i = 0; i < files_len; i++)                                      \
   {                                                                    \
 -  if (all_files[i]->curp < all_files[i]->endp - 1)                    \
 -    all_files[i]->curp++;                                             \
 +  if (all_files[i].curp + 1 < all_files[i].endp)                      \
 +    all_files[i].curp++;                                              \

Just curious: why did you change that condition (from 'X < Y - 1' to
'X + 1 < Y')?

You mention in your log message: "minor re-arrangements in arithmetic
expressions to maximize reuse of results (e.g. in
INCREMENT_POINTERS)".

Why does this help, and what's the potential impact?


Second question:

 +find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
 +   apr_off_t *prefix_lines,
 +   struct file_info file[], apr_size_t file_len,
 +   apr_pool_t *pool)
 +{
 +  svn_boolean_t had_cr = FALSE;
svn_boolean_t is_match, reached_all_eof;
apr_size_t i;
 +  apr_off_t lines = 0;

 *prefix_lines = 0;
 for (i = 1, is_match = TRUE; i < file_len; i++)
  -is_match = is_match && *file[0]->curp == *file[i]->curp;
  +is_match = is_match && *file[0].curp == *file[i].curp;
 +
while (is_match)
  {
/* ### TODO: see if we can take advantage of
   diff options like ignore_eol_style or ignore_space. */
/* check for eol, and count */
  -  if (*file[0]->curp == '\r')
 +  if (*file[0].curp == '\r')
  {
 -  (*prefix_lines)++;
 +  lines++;

Here you added a local variable 'lines', which is incremented in the
function and copied to *prefix_lines at the end (you do the same in
find_identical_prefix_fast). The original code just sets and
increments *prefix_lines directly. So, out of curiosity: why does
this help, and how much?

Thanks,
-- 
Johan


Re: NFS and fsfs: slow commit performance

2011-01-18 Thread Florian Weimer
* Blair Zajac:

 Attempting to perform high commit rates into an fsfs repository on NFS
 with two or more Linux boxes, one of the processes can get stuck in
 fcntl() for over 30 seconds:

 open("repo/db/write-lock", O_RDWR) = 4
 fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}

What's the return value of fcntl?

(I wonder if this is a firewall issue, and the lock is never actually
acquired.)
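For reference, the same style of POSIX advisory lock can be exercised
from Python's fcntl module (local filesystem here; over NFS the same
call goes through the network lock manager, which is where the delays
under discussion come from -- the path below is just an illustrative
temp file, not a real fsfs db/write-lock):

```python
import fcntl
import os
import tempfile

# Open a lock file the same way fsfs opens db/write-lock (O_RDWR),
# then take and release an exclusive record lock.
lock_path = os.path.join(tempfile.mkdtemp(), "write-lock")
fd = os.open(lock_path, os.O_RDWR | os.O_CREAT)

# LOCK_EX maps to a blocking F_SETLKW request; lockf() returns None on
# success and raises OSError on failure, so reaching the next line
# means the lock was acquired.
fcntl.lockf(fd, fcntl.LOCK_EX)
fcntl.lockf(fd, fcntl.LOCK_UN)
os.close(fd)
```

Locally the blocking request returns as soon as the holder releases the
lock; the 30-second stalls reported above are an NFS-side effect, not a
property of the API itself.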

-- 
Florian Weimer                fwei...@bfk.de
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99


Re: Ref-counting for pristine texts

2011-01-18 Thread Julian Foad
Can anyone spare a bit of time to take this forward?  I'll tell you what
I'm planning to do next.  Maybe someone can get to this before I can.

The first thing I want to do is add a DB index on the 'refcount' column
of the 'pristine' table.  I'm convinced that will be necessary for
adequate performance.  To do so involves:

  * Adding the 'CREATE INDEX' SQL statement.  That's simple - just
copy/modify/paste one line in wc-metadata.sql.

  * Adding a WC format bump step that adds the index to existing dev
WCs.  Also quite simple.

  * I'd be happier if I ran a timing comparison, before and after, of
searching for unreferenced pristines (those whose refcount is 0).  That
would give me confidence that adding the index has actually happened and
that it has the desired effect.  It would need a test with a large
number of pristines, a scattered selection of them having refcount=0,
and then time the running of svn_wc__db_pristine_cleanup().  With an
index, this should find them all very quickly and the deletion time
should dominate the whole cleanup operation, whereas without an index
the time taken to scan the whole table should dominate when the table is
very large.
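As a quick sanity check short of a full timing run, the before/after
effect of such an index shows up directly in SQLite's query plan.  The
schema and index name below are made up for the demo, not the real
wc-metadata.sql definitions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pristine (checksum TEXT PRIMARY KEY, refcount INTEGER)")
conn.executemany("INSERT INTO pristine VALUES (?, ?)",
                 (("c%d" % i, i % 100) for i in range(10000)))

# Without an index on refcount, finding refcount = 0 rows is a full scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT checksum FROM pristine WHERE refcount = 0"
).fetchall()
assert any("SCAN" in row[-1] for row in plan)

# After the proposed CREATE INDEX, the same query becomes an index search.
conn.execute("CREATE INDEX i_pristine_refcount ON pristine (refcount)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT checksum FROM pristine WHERE refcount = 0"
).fetchall()
assert any("INDEX" in row[-1].upper() for row in plan)
```

With the index in place, locating the unreferenced rows is cheap, so
the deletion time should indeed dominate the cleanup, as argued above.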


On Thu, 2011-01-13, Branko Čibej wrote:
 On 13.01.2011 20:01, Julian Foad wrote:
  I have committed the ref counting for pristine texts (r1058523) and have
  had a bit more insight into when to perform the deletion.
 
  Deleting unreferenced texts on closing a wcroot is too late - too
  large a granularity - for the way the WC code is currently structured
  because this doesn't happen until a (long-lived) pool is cleaned up, as
  Bert pointed out.
 
  Deleting unreferenced texts after running the Work Queue is too soon -
  too fine a granularity - for the way commit is currently structured.
  A simple test of this approach showed that by the time the post-commit
  processing tries to install a reference to a pristine text whose
  checksum was noted earlier, in some cases that pristine row has already
  been purged.
 
 This would indicate that the reference counting happens too soon ... in
 other words, that a pristine can be dereferenced whilst some part of the
 code (or database) still refers to it. That breaks database consistency
 -- what happens if the user aborts a commit and then calls 'svn
 cleanup', for example?

If what I said about 'commit' is correct, then yes, that's bad and we
should look for a better way.  But I haven't tested it properly; I
noticed that the commit failed saying that the DB failed a 'REFERENCES'
clause, and what I said here is a hypothesis about how that happened.

Investigating this is the second thing I would want to do next.


  At the moment I think the practical solution is to insert calls to the
  pristine cleanup at the end of each main libsvn_wc public API that may
  have freed up one or more pristines.
 
 Mmm. Too many points of failure, don't you think? Of course, it's no
 worse than the cancellation handlers, however, and I suppose missing a
 chance to delete is better than having one too many.

I'm thinking this is nevertheless the best option.  The consequence of
missing some places is not too bad at all.  If we miss places where we
should clean up, then all it means is that some operations will leak
unused pristines, but they won't be leaked permanently, only until the
user runs one of the operations that do have the cleanup call in them.
That's a consequence of calling a global "clean up all unreferenced
pristines" function rather than trying to keep track of which pristines
have been freed up by *this* operation.


  Wherever we do it, if it's at all frequent, the search for unreferenced
  pristines needs to be efficient.  At present I use "SELECT ... FROM
  PRISTINE WHERE refcount = 0".  I believe that can most easily be made
  efficient by adding an index on the 'refcount' column, so that's what I
  intend to do.
 
 That's the only thing you can do, unless you want to enhance the
 triggers to populate a queue of to-be-deleted pristines.

I agree that adding an index on the table is a better solution than
trying to maintain a separate queue of the pristines that are waiting to
be cleaned up.  The index makes it efficient for a simple query
('refcount = 0') to provide the same information as the separate queue
would provide.

- Julian





Re: NFS and fsfs: slow commit performance

2011-01-18 Thread Blair Zajac

On Jan 18, 2011, at 6:49 AM, Florian Weimer wrote:

 * Blair Zajac:
 
 Attempting to perform high commit rates into an fsfs repository on NFS
 with two or more Linux boxes, one of the processes can get stuck in
 fcntl() for over 30 seconds:
 
  open("repo/db/write-lock", O_RDWR) = 4
  fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}
 
 What's the return value of fcntl?
 
 (I wonder if this is a firewall issue, and the lock is never actually
 acquired.)

It did succeed.  This was one of many fcntl() calls; I just did a
control-C in strace.

Blair



Re: [PATCH] Add regression test for issue #3686:executable flag not correctly set on merge

2011-01-18 Thread Daniel Shahaf
Daniel Becroft wrote on Tue, Jan 18, 2011 at 07:13:12 +1000:
 Hi guys,
 
 I was looking through the Getting Involved section of the Subversion
 website, and decided to start by writing a regression test for an
 issue, starting with 3686 (executable flag not correctly set on
 merge). This issue was already assigned to someone, so apologies if
 it's already being worked on.
 

Welcome aboard :-).

As a note for next time, though, when an issue is assigned to someone,
it would be better to ask them (or dev@) whether they're working on it
before starting to work on it yourself.  I'm CCing Blair now for this
reason.

 I've attached the patch, and included it below as well. A few questions I 
 have:
 
 1) I couldn't find a precedent for examining the OS's executable flag,
 so my approach might be less than ideal. Should I add a test to
 prop_tests.py to check that setting svn:executable also sets the
 executable bit?

No, the test you're just adding already does just that.  Just copy the
two asserts above the "# commit r2" line to after the comment and you're
done :-)

 2) The problem also exists under the '--reintegrate' scenario, even
 after a subsequent commit. Would it be better to include that case
 here, or move the entire test to the merge_reintegrate.py suite?
 

I'm not sure we need a separate test for the --reintegrate case;
it seems reasonable to assume it does (and will) share the 'set +x
permission' logic with normal merges.

As to location, if I wrote this I'd probably put it in prop_tests or
special_tests along with the other svn:executable tests.  (That's
somewhat of a bikeshed, though, and I don't feel strongly about this.)

 Regards,
 Daniel Becroft
 
 
 [[[
 Add regression test for issue #3686.
 
 This issue involves the executable flag being lost during a merge of a
 binary file with the 'svn:executable' property set.
 
 * subversion/tests/cmdline/merge_tests.py
   (merge_change_to_file_with_executable(sbox)) New test case.

Drop the signature, and add a colon:

(merge_change_to_file_with_executable): New test case.

 ]]]

(you've attached the patch both inline and as an attachment, I'll review
only the latter)

 Index: subversion/tests/cmdline/merge_tests.py
 ===
 --- subversion/tests/cmdline/merge_tests.py   (revision 1059656)
 +++ subversion/tests/cmdline/merge_tests.py   (working copy)
 @@ -16316,6 +16316,78 @@
  Subtree merge under working merge produced the wrong mergeinfo,
  '/A/C/nu:9', [], 'pg', SVN_PROP_MERGEINFO, nu_COPY_path)
  
 +
 +#--
 +# Test for issue #3686 'executable flag not correctly set on merge'
 +# See http://subversion.tigris.org/issues/show_bug.cgi?id=3686
 +def merge_change_to_file_with_executable(sbox):
 +  "executable flag is maintained during binary merge"
 +
 +  # Scenario: When merging a change to a binary file with the 
 'svn:executable'
 +  # property set, the file is not marked as 'executable'. After commit, the 
 +  # executable bit is set correctly.
 +  sbox.build()
 +  wc_dir = sbox.wc_dir
 +  trunk_url = sbox.repo_url + '/A/B/E'
 +
 +  alpha_path = os.path.join(wc_dir, "A", "B", "E", "alpha")
 +  beta_path = os.path.join(wc_dir, "A", "B", "E", "beta")
 +
 +  # Force one of the files to be a binary type
 +  svntest.actions.run_and_verify_svn(None, None, [],
 + 'propset', 'svn:mime-type', 
 'application/octet-stream',
 + alpha_path)
 +
 +  # Set the 'svn:executable' property on both files
 +  svntest.actions.run_and_verify_svn(None, None, [],
 + 'propset', 'svn:executable', 'ON',
 + beta_path)
 +
 +  svntest.actions.run_and_verify_svn(None, None, [],
 + 'propset', 'svn:executable', 'ON',
 + alpha_path)
 +
 +  # Verify the executable bit has been set before committing
 +  assert os.access(alpha_path, os.X_OK)
 +  assert os.access(beta_path, os.X_OK)
 +

No.  Please use 'assert' for bugs in the test itself, and something else
('raise svntest.Failure' is the most generic) for bugs in the code being
tested.  Example:

 struct test_case {
   const char *input;
   int *exit_code;
 } test_cases[] = { ... };

 for (i = 0; i < NUMBER_OF_TESTS; i++)
   {
 struct test_case t = test_cases[i];
 SVN_ERR_ASSERT(t.exit_code);
 SVN_TEST_ASSERT(system(t.input) == *t.exit_code);
   }
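The same assert-vs-Failure distinction, in the Python terms the test
suite actually uses, might look like this (a self-contained sketch;
Failure here is a stand-in for the real svntest.Failure, and
check_executable is a made-up helper):

```python
class Failure(Exception):
    """Stand-in for svntest.Failure; the real class lives in the
    test harness."""

def check_executable(path_is_executable, internal_invariant=True):
    # 'assert' guards bugs in the test itself: if this fires, the
    # harness reports a test ERROR (broken test) ...
    assert internal_invariant, "test is broken"
    # ... while a bug in the code under test raises Failure, which the
    # harness reports as a test FAILURE.
    if not path_is_executable:
        raise Failure("executable bit not set")

check_executable(True)  # passes silently
try:
    check_executable(False)
except Failure:
    print("reported as test failure, not test error")
```

Keeping the two failure modes separate is what makes the distinction
between a broken test and a broken product visible in the test output.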

 +  # Commit change (r2)
 +  sbox.simple_commit()
 +
 +  # Create the branch
 +  svntest.actions.run_and_verify_svn(None, None, [], 'cp',
 + trunk_url,
 + sbox.repo_url + '/branch',
 +                                 '-m', "Creating the Branch")
 +
 +  # Modify the files + commit (r3)
 +  svntest.main.file_append(alpha_path, 'appended 

Re: NFS and fsfs: slow commit performance

2011-01-18 Thread Blair Zajac
On Jan 18, 2011, at 1:49 AM, Bolstridge, Andrew wrote:

 Hi.
 
 Just a quick question/suggestion: 
 
 If the exponential sleep can be a maximum of 25ms, and the random one
 between 0 and 100ms, what is the performance like if you just sleep for
 25ms - without any randomness, or doubling of sleep times?
 
 I used to find (but this was ages ago) that just sleeping for 1ms would
 force the CPU to carry on with other things, perhaps that would have an
 effect that's just as good as the longer sleeps?
 
 Top stuff BTW, I doubt it'll have much effect on my slow virtual machine
 disks, but every performance boost helps!
 Thanks.

[cc'ing dev@s.a.o]

If you're not storing your repositories on NFS, then this patch doesn't
help performance; the kernel's fcntl() will block the caller and unblock
it when the lock is released.

The sleeps are there to not pound the NFS server with traffic. 

I put in a fixed 25ms sleep and here are the results, which don't look
as good as the random sleep:

N      is 1567
sum    is 806.845106840133
mean   is 0.514897962246415
SD N   is 0.591845693041212
SD N-1 is 0.592034630218584
min    is 0.1712911129
25%    is 0.232829093933
50%    is 0.273239135742
90%    is 1.02276110649
95%    is 1.61777615547
99%    is 3.28569889069
max    is 6.39945292473

Percentage of commits by slow systems: 27.1
Percentage of commits by fast systems: 72.9
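The two strategies being compared can be sketched like this (all
numbers illustrative, not the constants in the actual patch):

```python
import random

def backoff_delays(max_retries=8, cap_ms=100):
    """Return the sleep schedules for a fixed 25ms sleep versus a
    randomized, capped exponential backoff."""
    fixed = [25] * max_retries
    randomized = []
    delay_ms = 1
    for _ in range(max_retries):
        # Sleep a random amount up to the current window, then double
        # the window, capped so the worst-case sleep stays bounded.
        randomized.append(random.uniform(0, delay_ms))
        delay_ms = min(delay_ms * 2, cap_ms)
    return fixed, randomized

fixed, rnd = backoff_delays()
assert all(d == 25 for d in fixed)
assert all(0 <= d <= 100 for d in rnd)
```

The randomization matters because it de-synchronizes competing clients:
with a fixed sleep, contenders tend to wake and retry at the same
moments, which is consistent with the worse tail latencies above.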

Blair



Re: [PATCH] Add regression test for issue #3686:executable flag not correctly set on merge

2011-01-18 Thread Daniel Becroft
On Wed, Jan 19, 2011 at 3:08 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:

 Daniel Becroft wrote on Tue, Jan 18, 2011 at 07:13:12 +1000:
  Hi guys,
 
  I was looking through the Getting Involved section of the Subversion
  website, and decided to start by writing a regression test for an
  issue, starting with 3686 (executable flag not correctly set on
  merge). This issue was already assigned to someone, so apologies if
  it's already being worked on.
 

 Welcome aboard :-).

 As a note for next time, though, when an issue is assigned to someone,
 it would be better to ask them (or dev@) whether they're working on it
 before starting to work on it yourself.  I'm CCing Blair now for this
 reason.


Ah, will do that in the future. Oops.


   I've attached the patch, and included it below as well. A few questions
 I have:
 
  1) I couldn't find a precedent for examining the OS's executable flag,
  so my approach might be less than ideal. Should I add a test to
  prop_tests.py to check that setting svn:executable also sets the
  executable bit?

 No, the test you're just adding already does just that.  Just copy the
 two asserts above the # commit r2 line to after the comment and you're
 done :-)

  2) The problem also exists under the '--reintegrate' scenario, even
  after a subsequent commit. Would it be better to include that case
  here, or move the entire test to the merge_reintegrate.py suite?
 

 I'm not sure we need a separate test for the --reintegrate case;
 it seems reasonable to assume it does (and will) share the 'set +x
 permission' logic with normal merges.


I thought about that, but this situation fixes itself after the 'svn
merge/svn commit' combination. However, after using --reintegrate and
committing, the executable bit is still not set, so I think there might be
something more going on there.


 As to location, if I wrote this I'd probably put it in prop_tests or
 special_tests along with the other svn:executable tests.  (That's
 somewhat of a bikeshed, though, and I don't feel strongly about this.)


My logic for the placement was: it's exercising a problem with the svn merge
process, so it belongs in merge_tests.py. Maybe that was wrong.


  Regards,
  Daniel Becroft
 
 
  [[[
  Add regression test for issue #3686.
 
  This issue involves the executable flag being lost during a merge of a
  binary file with the 'svn:executable' property set.
 
  * subversion/tests/cmdline/merge_tests.py
(merge_change_to_file_with_executable(sbox)) New test case.

 Drop the signature, and add a colon:

(merge_change_to_file_with_executable): New test case.

  ]]]

 (you've attached the patch both inline and as an attachment, I'll review
 only the latter)

  Index: subversion/tests/cmdline/merge_tests.py
  ===
  --- subversion/tests/cmdline/merge_tests.py   (revision 1059656)
  +++ subversion/tests/cmdline/merge_tests.py   (working copy)
  @@ -16316,6 +16316,78 @@
   Subtree merge under working merge produced the wrong mergeinfo,
   '/A/C/nu:9', [], 'pg', SVN_PROP_MERGEINFO, nu_COPY_path)
 
  +
  +#--
  +# Test for issue #3686 'executable flag not correctly set on merge'
  +# See http://subversion.tigris.org/issues/show_bug.cgi?id=3686
  +def merge_change_to_file_with_executable(sbox):
  +  "executable flag is maintained during binary merge"
  +
  +  # Scenario: When merging a change to a binary file with the
 'svn:executable'
  +  # property set, the file is not marked as 'executable'. After commit,
 the
  +  # executable bit is set correctly.
  +  sbox.build()
  +  wc_dir = sbox.wc_dir
  +  trunk_url = sbox.repo_url + '/A/B/E'
  +
  +  alpha_path = os.path.join(wc_dir, "A", "B", "E", "alpha")
  +  beta_path = os.path.join(wc_dir, "A", "B", "E", "beta")
  +
  +  # Force one of the files to be a binary type
  +  svntest.actions.run_and_verify_svn(None, None, [],
  + 'propset', 'svn:mime-type',
 'application/octet-stream',
  + alpha_path)
  +
  +  # Set the 'svn:executable' property on both files
  +  svntest.actions.run_and_verify_svn(None, None, [],
  + 'propset', 'svn:executable', 'ON',
  + beta_path)
  +
  +  svntest.actions.run_and_verify_svn(None, None, [],
  + 'propset', 'svn:executable', 'ON',
  + alpha_path)
  +
  +  # Verify the executable bit has been set before committing
  +  assert os.access(alpha_path, os.X_OK)
  +  assert os.access(beta_path, os.X_OK)
  +

 No.  Please use 'assert' for bugs in the test itself, and something else
 ('raise svntest.Failure' is the most generic) for bugs in the code being
 tested.  Example:

 struct test_case {
   const char *input;
   int *exit_code;
 } test_cases[] = { ... };

 for (i = 0; i  

Re: [PATCH] Add regression test for issue #3686:executable flag not correctly set on merge

2011-01-18 Thread Daniel Shahaf
Daniel Becroft wrote on Wed, Jan 19, 2011 at 07:27:15 +1000:
 On Wed, Jan 19, 2011 at 3:08 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
  Daniel Becroft wrote on Tue, Jan 18, 2011 at 07:13:12 +1000:
   2) The problem also exists under the '--reintegrate' scenario, even
   after a subsequent commit. Would it be better to include that case
   here, or move the entire test to the merge_reintegrate.py suite?
  
 
  I'm not sure we need a separate test for the --reintegrate case;
  it seems reasonable to assume it does (and will) share the 'set +x
  permission' logic with normal merges.
 
 
 I thought about that, but this situation fixes itself after the 'svn
 merge/svn commit' combination. However, after using --reintegrate and
 committing, the executable bit is still not set, so I think there might be
 something more going on there.
 

Ah, if there's a difference in the results between merge and reintegrate
merge, then two tests are justifiable.  (I missed that part of your
original comment.)

 
  As to location, if I wrote this I'd probably put it in prop_tests or
  special_tests along with the other svn:executable tests.  (That's
  somewhat of a bikeshed, though, and I don't feel strongly about this.)
 
 
 My logic for the placement was: it's exercising a problem with the svn merge
 process, so it belongs in merge_tests.py. Maybe that was wrong.
 

As I said, I don't feel strongly about this.

   +  # Verify the executable bit has been set
   +  assert os.access(alpha_path, os.X_OK)
   +  assert os.access(beta_path, os.X_OK)
   +
 
  As expected, the test passes if you remove these two lines.
 
 
 Actually, the test passes if you only take out the assertion for the
 'alpha_path' - the problem only arises when merging binary files; it works
 correctly for non-binary files. That's why I've got both in the test case.
 

OK.

  In short: +1 to commit, assuming you replace the asserts with something
  else per my earlier comment.
 
 Thanks for the review, Daniel. I'll make these changes, and will send later
 tonight. Is it preferable to be in a new email, or in a reply to this one?
 

Reply to this one.

 Cheers,
 Daniel B.


[PATCH] Server fix for issue #3657 - Any mod_dav_svn experts out there? (was Re: svn commit: r966822)

2011-01-18 Thread Paul Burba
 On Fri, Jan 7, 2011 at 7:08 PM, Paul Burba ptburba_at_gmail.com wrote:

 Mike suggested to me that the fix for this problem can be made in
 mod_dav_svn, since the update reporter already knows which properties
 have changed, *regardless* of the send-all mode, but it discards this
 information if send-all=FALSE and instead sends the client a
 'fetch-props' response. As described previously in this thread, this
 causes the client to fetch all the properties of the path, not just
 the changes, and *all* the props are pushed through the editor
 callbacks as if they were all actual changes (which again is the core
 problem of issue #3657).

 I'm working on that approach,

 I had pushed off the fix-it-in-mod_dav_svn approach till 1.8 since I
 was going in circles with it, but recently found another real-world
 example where this issue causes spurious conflicts on directory
 properties, see
 http://subversion.tigris.org/issues/show_bug.cgi?id=3657#desc13.

 This issue has existed since before 1.5, so it's nothing new, but given that
 I found another non-theoretical example and given that the presence of
 mergeinfo is only going to make this issue more common[1] I thought
 I'd take another look for 1.7.

 I still haven't gotten anywhere with the mod_dav_svn fix, so not much
 to talk about there, but I did commit the fix for the directory
 property conflicts in
 http://svn.apache.org/viewvc?view=revision&revision=1056565. The fix
 here is pretty much the same thing I did in
 http://svn.apache.org/viewvc?view=revision&revision=966822: Filter out
 the phony prop changes in the libsvn_client/repos_diff.c
 svn_delta_editor_t change_[dir|file]_prop() callback implementations.

 Just to be clear about one thing, when I committed r966822 Bert asked:

 (change_file_prop): Only stash true property differences in the file
 baton. The DAV RA providers may be sending us all the properties on
 a file, not just the differences.

 Shouldn't this be fixed in the ra layer implementations then?

 This would just be 'guessing' whether it is a change or a complete set.

 Why not the RA layer? Two reasons. The first I explained earlier in
 this thread, it reopens issue #2048 'Primary connection (of parallel
 network connections used) in ra-dav diff/merge timeout unnecessarily.'
  I won't go over that again, but even if there is a way to avoid that,
 the RA layer is simply doing what we ask it when using the DAV
 providers. Specifically, the mod_dav_svn update report handler, when
 in 'skelta' mode[2] and describing changes to a path on which a
 property has changed, asks the client to later fetch all properties
 and figure out what has changed itself, i.e. the client asks the RA
 layer to give us *all* the properties, it obliges, so where is the
 problem as far as the RA layer is concerned?

 So this ultimately needs to be fixed on the server. Given that, I am
 leaving these libsvn_client filters in place so a 1.7 client will DTRT
 with older servers (or maybe 1.7 servers too, since as I said, I'm not
 having great success there).

 As to the 'guessing', that was true. We'd go through the motions of
 filtering over ra_local and ra_svn too. A pretty minor performance
 hit, but in r1056565 I tweaked the filtering so it only applies when
 using ra_serf and ra_neon.

 Paul

 [1] simply by increasing the potential pool of property changes
 incoming during a merge.

 [2] Which merges always are, regardless of whether we use ra_serf or ra_neon.

 Paul

Ok, after many false starts and detours I finally have a server-side
fix for this issue.  Pursuing a suggestion by cmpilato, I made the
mod_dav_svn update report always send property changes, regardless of
whether the report is in skelta (send-all=false) mode or non-skelta
(send-all=true) mode.

In addition to the normal serf/neon trunk tests, I tested 1.5.10 (dev
build) and 1.6.16 (dev build) against this patched 1.7 server and
there were no unexpected [1] failures.

If any mod_dav_svn experts have time to take a look at this patch I'd
appreciate it.

Thanks,

Paul

[[[
Server-side fix for issue #3657 'dav update report handler in skelta mode can
cause spurious conflicts'.

This change has two parts that are loosely related:

1) It stops mod_dav_svn from ever sending the 'fetch-props' response to a
   client, even when in skelta (i.e. send-all=false) mode.  The server
   already knows what properties have changed so now it simply sends them,
   rather than telling the client to ask for them later.  This prevents
   the svn_delta_editor_t change_[file|dir]_prop callback from receiving
   faux property changes when the client asks for *all* the properties.
   See http://subversion.tigris.org/issues/show_bug.cgi?id=3657 for all the
   gory details.

2) The comments in our code frequently make reference to 'modern' servers
   and clients in the context of mod_dav_svn.  Unfortunately there is no such
   thing as a pre-modern server; they are all modern!  The use of the term
   'modern' in our code 

Re: Ref-counting for pristine texts

2011-01-18 Thread Branko Čibej
On 18.01.2011 16:58, Julian Foad wrote:
 On Thu, 2011-01-13, Branko Čibej wrote:
 This would indicate that the reference counting happens too soon ... in
 other words, that a pristine can be dereferenced whilst some part of the
 code (or database) still refers to it. That breaks database consistency
 -- what happens if the user aborts a commit and then calls 'svn
 cleanup', for example?
 If what I said about 'commit' is correct, then yes, that's bad and we
 should look for a better way.  But I haven't tested it properly; I
 noticed that the commit failed saying that the DB failed a 'REFERENCES'
 clause, and what I said here is a hypothesis about how that happened.

 Investigating this is the second thing I would want to do next.

I beg to differ, it should be the first thing, because it's a
correctness issue. If the references in the database are maintained
correctly, then it should be safe to delete after each successful
transaction commit.

That's assuming that any high-level operation involves a single
transaction, e.g., that a commit can't fail in some way that would
require the pristine to still be in place in order to recover, even if
the reference count in the database is 0.

-- Brane



Re: My take on the diff-optimizations-bytes branch

2011-01-18 Thread Stefan Fuhrmann

On 18.01.2011 12:56, Johan Corveleyn wrote:

On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn jcor...@gmail.com wrote:

On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
stefanfuhrm...@alice-dsl.de wrote:

[ ... snip ... ]


But I think, the stack variable is certainly helpful
and easy to do.

Ok, I've done this (locally, still have to clean up a little and then
commit). It gives a nice 10% speedup of prefix/suffix scanning, which
is great.

I have a couple of small questions though, about other small
optimizations you did:


Index: subversion/libsvn_diff/diff_file.c
===================================================================
--- subversion/libsvn_diff/diff_file.c  (revision 1054044)
+++ subversion/libsvn_diff/diff_file.c  (working copy)

...

@@ -258,10 +259,10 @@
                                                                  \
   for (i = 0; i < files_len; i++)                                \
   {                                                              \
-      if (all_files[i]->curp < all_files[i]->endp - 1)           \
-        all_files[i]->curp++;                                    \
+      if (all_files[i].curp + 1 < all_files[i].endp)             \
+        all_files[i].curp++;                                     \

Just curious: why did you change that condition (from 'X < Y - 1' to
'X + 1 < Y')?

You mention in your log message: minor re-arrangements in arithmetic
expressions to maximize reuse of results (e.g. in
INCREMENT_POINTERS).

Why does this help, and what's the potential impact?

There is a hidden common sub-expression.
Variant 1 in pseudo-code:

lhs = all_files[i].curp;
temp1 = all_files[i].endp;
rhs = temp1 - 1;
if (lhs < rhs)
{
  temp2 = lhs + 1;
  all_files[i].curp = temp2;
}

Variant 2:

lhs = all_files[i].curp;
temp = lhs + 1;
rhs = all_files[i].endp;
if (lhs < rhs)
  all_files[i].curp = temp;

The problem is that the compiler cannot figure that out
on its own because (x+1) and (y-1) don't overflow / wrap
around for the same values. In particular, 0 < (0-1) is true
and (0+1) < 0 is false in unsigned arithmetic.



Second question:


+find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
+   apr_off_t *prefix_lines,
+   struct file_info file[], apr_size_t file_len,
+   apr_pool_t *pool)
+{
+  svn_boolean_t had_cr = FALSE;
svn_boolean_t is_match, reached_all_eof;
apr_size_t i;
+  apr_off_t lines = 0;

*prefix_lines = 0;
 for (i = 1, is_match = TRUE; i < file_len; i++)
-    is_match = is_match && *file[0]->curp == *file[i]->curp;
+    is_match = is_match && *file[0].curp == *file[i].curp;
+
while (is_match)
  {
/* ### TODO: see if we can take advantage of
   diff options like ignore_eol_style or ignore_space. */
/* check for eol, and count */
-  if (*file[0]->curp == '\r')
+  if (*file[0].curp == '\r')
  {
-  (*prefix_lines)++;
+  lines++;

Here you added a local variable 'lines', which is incremented in the
function, and copied to *prefix_lines at the end (you do the same in
find_identical_prefix_fast). The original code just sets and
increments *prefix_lines directly. So, out of curiosity: why does
this help, and how much?

Incrementing the temporary variable requires one
indirection less than the original code and uses only
one instead of two registers (in typical cases).

The 'how much?' part depends on compiler / optimizer
and even more on the target platform (x86 has very
few registers to keep data in flight; x64 has about twice
as many).

So, both changes are more or less general coding
patterns that will improve performance somewhat
without compromising readability.

-- Stefan^2.




Re: Coding goals / requirements.

2011-01-18 Thread Gavin Beau Baumanis
Hi Johan,

I hope it wasn't implied in any part of my email that I was trying to tell 
you what to do in any specific manner.
And although I was prompted by your thread, it was completely unrelated.

In my day job - I always strive for the simplest solution possible.
If the requirements of the task require that it be shared around the 
application, then it becomes a requirement of that task to ensure that an 
appropriate API is provided.
If however there is no current need - even if I can see it might be used at a 
later date, I won't bother with any API related work.
What I think might happen later on may not come to pass - and to my mind my 
employer deserves to have my time spent on adding new features and fixing 
bugs.
Future-proofing for the sake of future-proofing is of no advantage today 
to the product, or to myself as the developer of the code.

It can be argued that if you're in a specific problem space, you'll save time 
by completing the API now for that code - as opposed to coming back to it later 
on and having to reacquaint yourself with code that you previously wrote.
Especially in the OSS space - it could well be a case of you having to acquaint 
yourself with someone else's code.

For me, I find that the time saved still isn't that great, as long as:

* You have a clearly defined design standard that is ALWAYS followed.
* You write the simplest code possible to successfully complete the engineering 
task at hand.
* You provide, as required, appropriate inline documentation where the simplest 
possible solution is in fact complex.

I do realise that everyone has their own work practices / workflows and mileage 
varies depending on a number of factors.

And it would seem, at least initially, that I am being outvoted!


On 18/01/2011, at 8:35 PM, Johan Corveleyn wrote:

 Hi,
 
 As the guy responsible for the quote that started this thread ([1]):
 
 Actually, what's a little bit troubling is that there are currently
 only 3 possible file_len's, of which only 2 are used in practice:
 diff2 and diff3 (diff4 is not used in svn core, only in
 tools/diff/diff4). So, if we make a slow and a fast version, we could
 just as well make a XXX2 and an XXX3 version explicitly, and dispense
 with the need to pass arrays and array lengths. Hmmm... generic (but
 more complex) code vs. a couple of specialized versions.
 
 I'd like to point out that I didn't mean to criticize the API choices,
 I was just weighing the options. In fact, I came up with the initial
 API choice myself (passing an array of files, instead of passing 2
 resp. 3 resp. 4 files), because I wanted it to be generically useful.
 In any case it's a new API in the libsvn_diff module, for making
 prefix/suffix optimizations possible.
 
 I don't want to hijack this high-level discussion to be a technical
 one again, but it may provide some additional context to give some
 more details about the thoughts behind it ...
 
 In fact, I started out with the simple form: passing 2 file
 arguments for regular diff. But then I discovered there was also
 something like diff3 (with 3 files) and diff4 (with 4 files; not used
 in core SVN). So I simply thought to myself: theoretically this
 optimization should work with N files, so why not just make it so. And
 I changed the API to accept an array of files.
 
 This made the code more complex, but in reality it's always the same
 approach (so once you 'get' it, it's easy to follow):
 
 Things like:
 
while(*file0.curp == *file1.curp)
{
  ...
}
 
 are replaced with (actually AND-ing N (or N-1) conditions):
 
for (i = 1, is_match = TRUE; i < file_len; i++)
  is_match = is_match && *file[0].curp == *file[i].curp;
while (is_match)
{
  ...
  for (i = 1, is_match = TRUE; i < file_len; i++)
    is_match = is_match && *file[0].curp == *file[i].curp;
}
 
 
 So, to be very concrete, it's really the choice between:
 
 (1) A generic set of functions, that work for N files, which are
 significantly more complex than the N==2 case:
 
datasources_open(file[])
find_identical_prefix(file[])
find_identical_suffix(file[])
 
 (2) A specialized set of functions for 2, 3 and 4 files (4 is maybe
 optional, because not used in svn core), which are a lot simpler, but
 actually completely duplicate the higher level logic:
 
datasources_open2(file0, file1)
find_identical_prefix2(file0, file1)
find_identical_suffix2(file0, file1)
 
datasources_open3(file0, file1, file2)
find_identical_prefix3(file0, file1, file2)
find_identical_suffix3(file0, file1, file2)
 
datasources_open4(file0, file1, file2, file3)
find_identical_prefix4(file0, file1, file2, file3)
find_identical_suffix4(file0, file1, file2, file3)
 
 
 At the time, I felt that (1) was the right choice, WRT power/weight
 balance, keeping the logic in one single place. But with stefan2's
 proposed low-level optimizations of the algorithm, it's more or