Re: Some performance data
Hyrum K Wright hy...@hyrumwright.org writes: I've been looking at which bugs can be closed by the pending completion of wc-ng. In particular, I've been looking at issue #1284: http://subversion.tigris.org/issues/show_bug.cgi?id=1284 The issue is essentially a complaint about the performance of 'svn mv somedir targetdir' when the number of files in somedir is very large, in the thousands. It's a performance bug with no real good feeling for "done", but given the performance improvements of wc-ng, I'm marking it as such.

I believe there is a relatively small change that could make it better still. Look at libsvn_client/copy.c:do_wc_to_wc_moves_with_locks2:

  SVN_ERR(svn_wc_copy3(b->ctx->wc_ctx, b->pair->src_abspath_or_url,
                       dst_abspath, FALSE /* metadata_only */,
                       b->ctx->cancel_func, b->ctx->cancel_baton,
                       b->ctx->notify_func2, b->ctx->notify_baton2,
                       scratch_pool));

  SVN_ERR(svn_wc_delete4(b->ctx->wc_ctx, b->pair->src_abspath_or_url,
                         FALSE, FALSE,
                         b->ctx->cancel_func, b->ctx->cancel_baton,
                         b->ctx->notify_func2, b->ctx->notify_baton2,
                         scratch_pool));

Have svn_wc_copy3 set metadata_only to TRUE; this will be faster. Then do svn_io_rename to move the working tree; it's a single rename, so it should be fast. The svn_wc_delete4 call should still operate if the working tree is missing, and could be marginally faster. Overall this should be significantly faster as it doesn't have to copy all the working files. It has the additional benefit that it preserves timestamps in the working tree. The only reason I haven't implemented it yet is that I haven't decided whether we should use a workqueue. -- Philip
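Philip's proposal relies on the fact that moving a directory with rename(2) is a single O(1) operation, independent of how many files the tree holds. Below is a minimal sketch of just that property, using plain POSIX calls and made-up paths, not the svn_wc/svn_io APIs; the helper name is invented for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical demo of the rename step in the proposed move: the
   on-disk tree moves in one syscall, however many files it holds,
   and child files (and their timestamps) travel with it.
   Returns 0 on success, -1 on any failure. */
static int
demo_single_rename_move(void)
{
  char base[] = "/tmp/svn-mv-demo-XXXXXX";
  char src[512], dst[512], path[512];
  FILE *fp;
  struct stat st;

  if (mkdtemp(base) == NULL)
    return -1;
  snprintf(src, sizeof(src), "%s/somedir", base);
  snprintf(dst, sizeof(dst), "%s/targetdir", base);
  if (mkdir(src, 0700) != 0)
    return -1;

  /* put a working file inside the source tree */
  snprintf(path, sizeof(path), "%s/file.txt", src);
  if ((fp = fopen(path, "w")) == NULL)
    return -1;
  fputs("working file\n", fp);
  fclose(fp);

  /* the whole tree moves in one rename(2) call */
  if (rename(src, dst) != 0)
    return -1;

  /* the child file arrived at the destination untouched */
  snprintf(path, sizeof(path), "%s/file.txt", dst);
  return stat(path, &st) == 0 ? 0 : -1;
}
```

In the real change, this rename would sit between a metadata-only svn_wc_copy3 and the svn_wc_delete4 call.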
Re: Coding goals / requirements.
Hi, As the guy responsible for the quote that started this thread ([1]):

  Actually, what's a little bit troubling is that there are currently only
  3 possible file_len's, of which only 2 are used in practice: diff2 and
  diff3 (diff4 is not used in svn core, only in tools/diff/diff4). So, if
  we make a slow and a fast version, we could just as well make an XXX2
  and an XXX3 version explicitly, and dispense with the need to pass
  arrays and array lengths. Hmmm... generic (but more complex) code vs. a
  couple of specialized versions.

I'd like to point out that I didn't mean to criticize the API choices, I was just weighing the options. In fact, I came up with the initial API choice myself (passing an array of files, instead of passing 2 resp. 3 resp. 4 files), because I wanted it to be generically useful. In any case it's a new API in the libsvn_diff module, for making prefix/suffix optimizations possible.

I don't want to hijack this high-level discussion to be a technical one again, but it may provide some additional context to give some more details about the thoughts behind it ...

In fact, I started out with the simple form: passing 2 file arguments for regular diff. But then I discovered there was also something like diff3 (with 3 files) and diff4 (with 4 files; not used in core SVN). So I simply thought to myself: theoretically this optimization should work with N files, so why not just make it so. And I changed the API to accept an array of files. This made the code more complex, but in reality it's always the same approach (so once you 'get' it, it's easy to follow). Things like:

  while (*file0.curp == *file1.curp)
  {
    ...
  }

are replaced with (actually AND-ing N (or N-1) conditions):

  for (i = 1, is_match = TRUE; i < file_len; i++)
    is_match = is_match && *file[0].curp == *file[i].curp;
  while (is_match)
  {
    ...
    for (i = 1, is_match = TRUE; i < file_len; i++)
      is_match = is_match && *file[0].curp == *file[i].curp;
  }

So, to be very concrete, it's really the choice between:

(1) A generic set of functions that work for N files, which are significantly more complex than the N==2 case:

    datasources_open(file[])
    find_identical_prefix(file[])
    find_identical_suffix(file[])

(2) A specialized set of functions for 2, 3 and 4 files (4 is maybe optional, because not used in svn core), which are a lot simpler, but actually completely duplicate the higher level logic:

    datasources_open2(file0, file1)
    find_identical_prefix2(file0, file1)
    find_identical_suffix2(file0, file1)

    datasources_open3(file0, file1, file2)
    find_identical_prefix3(file0, file1, file2)
    find_identical_suffix3(file0, file1, file2)

    datasources_open4(file0, file1, file2, file3)
    find_identical_prefix4(file0, file1, file2, file3)
    find_identical_suffix4(file0, file1, file2, file3)

At the time, I felt that (1) was the right choice, WRT power/weight balance, keeping the logic in one single place. But with stefan2's proposed low-level optimizations of the algorithm, it's more or less open for debate :). Those low-level optimizations (which are definitely worth it) are much more difficult if you want to write them generically. And apart from that, splitting out the functions in _2, _3 and _4 variants is also a low-level optimization by itself.

Right now, I'm still trying to keep it generic (trying to integrate stefan2's suggestions in a generic way; I'm assuming for now that the low-level optimization coming from the splitting-out itself is not significant (I'll have to test that)), but I'm not so sure anymore. We'll see...

Anyway, all in all, I certainly don't feel like I've wasted my time, because it was also a great learning experience for me (new to C). And being a perfectionist, I want it to be as good as possible WRT the power/weight ratio (the most bang (speed/features/...)
for the buck (complexity/maintainability/readability)) :-).

Cheers,
-- Johan

[1] http://svn.haxx.se/dev/archive-2011-01/0241.shtml

On Tue, Jan 18, 2011 at 7:37 AM, Arwin Arni ar...@collab.net wrote: Hi All, It is a very interesting notion that Gavin throws at us. I think it is very important for an open-source project to maintain its code in a way that it is easy for a new guy (like me) to make quick and meaningful changes. Most open-source projects with a large development community end up in a mess of perfectly working, yet highly unreadable code. However, as a pair of fresh eyes looking at the code, I can safely say that though being an open-source project, Subversion has managed to stay readable. This can only be attributed to the hours of work spent on over-engineering the code (BTW, I personally don't think anything can be over-engineered. In my book, it is merely a synonym for perfection). There are parts of the code (in svn) that have been written with the notion of getting things done. And these are the parts that I really find difficult to
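For the curious, the generic-vs-specialized trade-off discussed above can be sketched with a toy model: plain byte buffers instead of svn_diff's real file structures, and made-up names throughout. The generic form AND-s one condition per file, exactly as in the quoted loop; the specialized 2-file form is simpler but would have to be duplicated for 3 and 4 files:

```c
#include <stddef.h>

/* Toy stand-in for svn_diff's per-file cursor. */
struct toy_file { const char *curp; const char *endp; };

/* (1) Generic form: works for any file_len, AND-ing one match
   condition per file, as in the loop quoted in the mail. */
static size_t
find_identical_prefix_n(struct toy_file file[], size_t file_len)
{
  size_t n = 0, i;
  int is_match;

  for (;;)
    {
      for (i = 0; i < file_len; i++)
        if (file[i].curp == file[i].endp)
          return n;                      /* some file hit EOF */
      for (i = 1, is_match = 1; i < file_len; i++)
        is_match = is_match && *file[0].curp == *file[i].curp;
      if (!is_match)
        return n;
      for (i = 0; i < file_len; i++)     /* advance all cursors */
        file[i].curp++;
      n++;
    }
}

/* (2) Specialized 2-file form: a much simpler inner condition, but
   the logic would be duplicated again for the 3- and 4-file cases. */
static size_t
find_identical_prefix_2(struct toy_file *f0, struct toy_file *f1)
{
  size_t n = 0;
  while (f0->curp < f0->endp && f1->curp < f1->endp
         && *f0->curp == *f1->curp)
    {
      f0->curp++;
      f1->curp++;
      n++;
    }
  return n;
}
```

Both return the same prefix length for two files; the specialized one simply gives the compiler a tighter loop to optimize.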
Re: svn command line client's inconsistent behavior in handling multiple targets.
Julian Foad julian.f...@wandisco.com writes: On Fri, 2011-01-14 at 17:50 +0530, Noorul Islam K M wrote: Stefan Sperling s...@elego.de writes: On Thu, Jan 13, 2011 at 03:31:11PM +0530, Noorul Islam K M wrote:

| Command    | Return Code | Continue |
|------------+-------------+----------|
| add        | 0           | Y        |
| blame      | 1           | N        |
| cat        | 1           | Y        |
| changelist | 0           | N        |
| commit     | 1           | N        |
| copy       | 1           | N        |
| delete     | 1           | N        |
| diff       | 1           | N        |
| info       | 1           | Y        |
| list       | 1           | N        |
| log        | 1           | N        |
| revert     | 0           | Y        |

Nice overview, thanks for putting this together!

Yes, good. This shows operations on local (working copy) targets only, doesn't it, apart from commit. Note Stefan's observation that the behaviour should (and probably does) vary depending on whether it's a local operation or a repository operation, so it might be useful to extend the table to show both.

The commands add, cat, info and revert continue execution even if they find one of the targets non-existing. All of them return 0 to the shell except 'info', which returns 1. Also info prints "svn: A problem occurred; see other errors for details" at the end of the execution, which tells the user that something went wrong during execution. This is not the case with 'add', 'cat' and 'revert'. I think these three commands also should return 1 and print the message which 'info' does, so that users do not have to scroll up to see the warnings to decide something went wrong.

+1 on making more commands print a "problem occured" message at the end.

Yes, +1. (Spelling: occurred.) I also think that all commands should exit 1 if one of the specified targets could not be found. This makes it much easier to detect such errors in shell scripts.

Among these revert prints 'Skipped' message and others print 'svn: warning: ..' message.

Hmm... somewhat inconsistent, but I don't think we need consistency here. Either way of putting it (skipped or warning) is fine.

I agree this is not so important.
My preference is for a warning that says what the problem was; a simple "Skipped" message is too vague for my liking.

I think commit operation is an atomic one, therefore it does make sense when it errors out when one of the targets is invalid. I am not sure why all the other commands error out when one of the targets is non-existing.

I think any command that tries to make a commit should error out. This includes copy and delete, because they can make commits without a working copy. Commands that operate on the working copy (including copy and delete!) should continue, print a warning or skipped message, and print the "problem occurred" message at the end.

Sounds good. So is this what we want?:

(1) If a command is making a commit (or, let's say, changing the repository in any way, so revprop changes are included), then as soon as an invalid target is detected, error out (print an error message and exit with return code = 1).

(2) If a command is changing (only) the WC, and a target is invalid, then print a warning (or skipped message) and move on to the next target, and then at the end print a summary message and exit with return code = 1.

(3) If a command is not changing the repos or WC, but only fetching information (cat, blame, export, info, list, pg, pl, ...) then ... same as (2)?

If everyone agrees to this then I will start sending patches for this.

Noorul, what would the table(s) of results look like after all these changes?

I included return codes for URL related commands in the below table. Commands add, changelist, commit and revert are WC only commands. 'delete' is a commit operation when performed on a URL. So I have not included them here in the URL list.

| Command | Return Code | Continue |
|---------+-------------+----------|
| blame   | 1           | N        |
| cat     | 1           | N        |
| copy    | 1           | N        |
| diff    | 1           | N        |
| info    | 1           | Y        |
| list    | 1           | N        |
| log     | 1           | N        |

After all these changes the table will look like this. For WC related commands: I removed commit because it is an atomic one and it will error out. So no change.
| Command    | Return Code | Continue |
|------------+-------------+----------|
| add        | 1           | Y        |
| blame      | 1           | Y        |
| cat        | 1           | Y        |
| changelist | 1           | Y        |
| delete     | 1           | Y        |
| diff       | 1           | Y        |
| info       | 1           | Y        |
| list       | 1           | Y        |
| log
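The behaviour proposed for cases (2) and (3), warn on a bad target, keep going, then exit non-zero with a summary, is a small control-flow pattern. A minimal sketch in C, where process_target() is a made-up stand-in for the per-target subcommand work (not svn code):

```c
#include <stdio.h>
#include <string.h>

/* Made-up per-target worker: pretend any target named "missing"
   does not exist, everything else succeeds. */
static int
process_target(const char *target)
{
  return strcmp(target, "missing") == 0 ? -1 : 0;
}

/* Warn-and-continue driver: returns the exit code, 0 if every
   target succeeded, 1 if any failed. */
static int
run_on_targets(const char *targets[], int n)
{
  int i, had_error = 0;

  for (i = 0; i < n; i++)
    if (process_target(targets[i]) != 0)
      {
        fprintf(stderr, "svn: warning: '%s' not found\n", targets[i]);
        had_error = 1;        /* remember the failure, keep going */
      }

  if (had_error)
    fprintf(stderr,
            "svn: A problem occurred; see other errors for details\n");
  return had_error ? 1 : 0;
}
```

This is exactly what makes the errors detectable from shell scripts: the summary message is human-facing, while the return code is machine-facing.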
Re: My take on the diff-optimizations-bytes branch
On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn jcor...@gmail.com wrote: On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: [ ... snip ... ] But I think, the stack variable is certainly helpful and easy to do.

Ok, I've done this (locally, still have to clean up a little and then commit). It gives a nice 10% speedup of prefix/suffix scanning, which is great. I have a couple of small questions though, about other small optimizations you did:

Index: subversion/libsvn_diff/diff_file.c
===================================================================
--- subversion/libsvn_diff/diff_file.c  (revision 1054044)
+++ subversion/libsvn_diff/diff_file.c  (working copy)
...
@@ -258,10 +259,10 @@
   \
   for (i = 0; i < files_len; i++)  \
   {  \
-    if (all_files[i]->curp < all_files[i]->endp - 1)  \
-      all_files[i]->curp++;  \
+    if (all_files[i].curp + 1 < all_files[i].endp)  \
+      all_files[i].curp++;  \

Just curious: why did you change that condition (from 'X < Y - 1' to 'X + 1 < Y')? You mention in your log message: "minor re-arrangements in arithmetic expressions to maximize reuse of results (e.g. in INCREMENT_POINTERS)". Why does this help, and what's the potential impact?

Second question:

+find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
+                           apr_off_t *prefix_lines,
+                           struct file_info file[], apr_size_t file_len,
+                           apr_pool_t *pool)
+{
+  svn_boolean_t had_cr = FALSE;
   svn_boolean_t is_match, reached_all_eof;
   apr_size_t i;
+  apr_off_t lines = 0;

   *prefix_lines = 0;
   for (i = 1, is_match = TRUE; i < file_len; i++)
-    is_match = is_match && *file[0]->curp == *file[i]->curp;
+    is_match = is_match && *file[0].curp == *file[i].curp;
   while (is_match)
   {
     /* ### TODO: see if we can take advantage of diff options like
        ignore_eol_style or ignore_space. */
     /* check for eol, and count */
-    if (*file[0]->curp == '\r')
+    if (*file[0].curp == '\r')
     {
-      (*prefix_lines)++;
+      lines++;

Here you added a local variable 'lines', which is incremented in the function, and copied to *prefix_lines at the end (you do the same in find_identical_prefix_fast).
The original code just sets and increments *prefix_lines directly. So, out of curiosity: why does this help, and how much?

Thanks,
-- Johan
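One plausible answer to the 'lines' question (hedged, since only measurement can confirm it for this compiler and code): incrementing through an output pointer can force a memory store on every hit, because the compiler must assume *prefix_lines may alias other data read in the loop, whereas a local accumulator can live in a register and be stored once at the end. A toy model of the two shapes, with made-up names:

```c
#include <stddef.h>

/* Variant A: increment through the output pointer, as the
   original code did with (*prefix_lines)++. */
static void
count_cr_via_pointer(const char *buf, size_t len, long *out)
{
  size_t i;
  *out = 0;
  for (i = 0; i < len; i++)
    if (buf[i] == '\r')
      (*out)++;               /* potential store per increment */
}

/* Variant B: accumulate in a local, store once, as in the patch. */
static void
count_cr_via_local(const char *buf, size_t len, long *out)
{
  size_t i;
  long lines = 0;             /* register-friendly accumulator */

  for (i = 0; i < len; i++)
    if (buf[i] == '\r')
      lines++;
  *out = lines;               /* single store at the end */
}
```

Both produce identical results; the difference is purely in what the optimizer is allowed to do with the counter.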
Re: NFS and fsfs: slow commit performance
* Blair Zajac:

Attempting to perform high commit rates into an fsfs repository on NFS with two or more Linux boxes, one of the processes can get stuck in fcntl() for over 30 seconds:

  open("repo/db/write-lock", O_RDWR) = 4
  fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})

What's the return value of fcntl? (I wonder if this is a firewall issue, and the lock is never actually acquired.)

-- 
Florian Weimer  fwei...@bfk.de
BFK edv-consulting GmbH  http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe  fax: +49-721-96201-99
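For anyone wanting to reproduce the check Florian asks about, here is a minimal sketch of taking the same kind of whole-file write lock and reporting what fcntl() returns (plain POSIX; the path and function name are made up for illustration). A blocked-then-granted F_SETLKW returns 0; a lost lock request, e.g. firewalled lockd traffic on NFS, would surface as -1 with errno set:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take a whole-file write lock like the one in the strace output
   and return fcntl's result: 0 on success, -1 on failure. */
static int
take_write_lock(const char *path)
{
  struct flock fl;
  int fd, rv;

  fd = open(path, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return -1;

  memset(&fl, 0, sizeof(fl));
  fl.l_type = F_WRLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;               /* 0 = lock the whole file */

  rv = fcntl(fd, F_SETLKW, &fl);   /* blocks until granted */
  close(fd);                       /* closing releases the lock */
  return rv;
}
```

Run against a real repository's db/write-lock path from two hosts, this would show whether the second locker ever actually gets the lock.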
Re: Ref-counting for pristine texts
Can anyone spare a bit of time to take this forward? I'll tell you what I'm planning to do next. Maybe someone can get to this before I can.

The first thing I want to do is add a DB index on the 'refcount' column of the 'pristine' table. I'm convinced that will be necessary for adequate performance. To do so involves:

* Adding the 'CREATE INDEX' SQL statement. That's simple - just copy/modify/paste one line in wc-metadata.sql.

* Adding a WC format bump step that adds the index to existing dev WCs. Also quite simple.

* I'd be happier if I ran a timing comparison, before and after, with searching for unreferenced pristines (those whose refcount is 0). That would give me confidence that adding the index has actually happened and that it has the desired effect. It would need a test with a large number of pristines, a scattered selection of them having refcount=0, and then time the running of svn_wc__db_pristine_cleanup(). With an index, this should find them all very quickly and the deletion time should dominate the whole cleanup operation, whereas without an index the time taken to scan the whole table should dominate when the table is very large.

On Thu, 2011-01-13, Branko Čibej wrote: On 13.01.2011 20:01, Julian Foad wrote: I have committed the ref counting for pristine texts (r1058523) and have had a bit more insight into when to perform the deletion.

Deleting unreferenced texts on closing a wcroot is too late - too large a granularity - for the way the WC code is currently structured, because this doesn't happen until a (long-lived) pool is cleaned up, as Bert pointed out.

Deleting unreferenced texts after running the Work Queue is too soon - too fine a granularity - for the way commit is currently structured. A simple test of this approach showed that by the time the post-commit processing tries to install a reference to a pristine text whose checksum was noted earlier, in some cases that pristine row has already been purged.
This would indicate that the reference counting happens too soon ... in other words, that a pristine can be dereferenced whilst some part of the code (or database) still refers to it. That breaks database consistency -- what happens if the user aborts a commit and then calls 'svn cleanup', for example?

If what I said about 'commit' is correct, then yes, that's bad and we should look for a better way. But I haven't tested it properly; I noticed that the commit failed saying that the DB failed a 'REFERENCES' clause, and what I said here is a hypothesis about how that happened. Investigating this is the second thing I would want to do next.

At the moment I think the practical solution is to insert calls to the pristine cleanup at the end of each main libsvn_wc public API that may have freed up one or more pristines.

Mmm. Too many points of failure, don't you think? Of course, it's no worse than the cancellation handlers, and I suppose missing a chance to delete is better than having one too many.

I'm thinking this is nevertheless the best option. The consequence of missing some places is not too bad at all. If we miss places where we should clean up, then all it means is that some operations will leak unused pristines, but they won't be leaked permanently, only until the user runs one of the operations that do have the cleanup call in them. That's a consequence of calling a global "clean up all unreferenced pristines" function rather than trying to keep track of which pristines have been freed up by *this* operation.

Wherever we do it, if it's at all frequent, the search for unreferenced pristines needs to be efficient. At present I use "SELECT ... FROM PRISTINE WHERE refcount = 0". I believe that can most easily be made efficient by adding an index on the 'refcount' column, so that's what I intend to do.

That's the only thing you can do, unless you want to enhance the triggers to populate a queue of to-be-deleted pristines.
I agree that adding an index on the table is a better solution than trying to maintain a separate queue of the pristines that are waiting to be cleaned up. The index makes it efficient for a simple query ('refcount = 0') to provide the same information as the separate queue would provide. - Julian
Re: NFS and fsfs: slow commit performance
On Jan 18, 2011, at 6:49 AM, Florian Weimer wrote: * Blair Zajac: Attempting to perform high commit rates into an fsfs repository on NFS with two or more Linux boxes, one of the processes can get stuck in fcntl() for over 30 seconds:

  open("repo/db/write-lock", O_RDWR) = 4
  fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})

What's the return value of fcntl? (I wonder if this is a firewall issue, and the lock is never actually acquired.)

It did succeed. This was one of many fcntl() calls, I just did a control-C in strace.

Blair
Re: [PATCH] Add regression test for issue #3686: executable flag not correctly set on merge
Daniel Becroft wrote on Tue, Jan 18, 2011 at 07:13:12 +1000: Hi guys, I was looking through the "Getting Involved" section of the Subversion website, and decided to start by writing a regression test for an issue, starting with 3686 (executable flag not correctly set on merge). This issue was already assigned to someone, so apologies if it's already being worked on.

Welcome aboard :-). As a note for next time, though, when an issue is assigned to someone, it would be better to ask them (or dev@) whether they're working on it before starting to work on it yourself. I'm CCing Blair now for this reason.

I've attached the patch, and included it below as well. A few questions I have:

1) I couldn't find a precedent for examining the OS's executable flag, so my approach might be less than ideal. Should I add a test to prop_tests.py to check that setting svn:executable also sets the executable bit?

No, the test you're just adding already does just that. Just copy the two asserts above the "# commit r2" line to after the comment and you're done :-)

2) The problem also exists under the '--reintegrate' scenario, even after a subsequent commit. Would it be better to include that case here, or move the entire test to the merge_reintegrate.py suite?

I'm not sure we need a separate test for the --reintegrate case; it seems reasonable to assume it does (and will) share the 'set +x permission' logic with normal merges. As to location, if I wrote this I'd probably put it in prop_tests or special_tests along with the other svn:executable tests. (That's somewhat of a bikeshed, though, and I don't feel strongly about this.)

Regards, Daniel Becroft

[[[
Add regression test for issue #3686.

This issue involves the executable flag being lost during a merge of a
binary file with the 'svn:executable' property set.

* subversion/tests/cmdline/merge_tests.py
  (merge_change_to_file_with_executable(sbox))
  New test case.
Drop the signature, and add a colon:

  (merge_change_to_file_with_executable): New test case.

]]]

(you've attached the patch both inline and as an attachment, I'll review only the latter)

Index: subversion/tests/cmdline/merge_tests.py
===================================================================
--- subversion/tests/cmdline/merge_tests.py     (revision 1059656)
+++ subversion/tests/cmdline/merge_tests.py     (working copy)
@@ -16316,6 +16316,78 @@
   "Subtree merge under working merge produced the wrong mergeinfo",
   '/A/C/nu:9', [], 'pg', SVN_PROP_MERGEINFO, nu_COPY_path)
+
+#----------------------------------------------------------------------
+# Test for issue #3686 'executable flag not correctly set on merge'
+# See http://subversion.tigris.org/issues/show_bug.cgi?id=3686
+def merge_change_to_file_with_executable(sbox):
+  "executable flag is maintained during binary merge"
+
+  # Scenario: When merging a change to a binary file with the 'svn:executable'
+  # property set, the file is not marked as 'executable'. After commit, the
+  # executable bit is set correctly.
+  sbox.build()
+  wc_dir = sbox.wc_dir
+  trunk_url = sbox.repo_url + '/A/B/E'
+
+  alpha_path = os.path.join(wc_dir, "A", "B", "E", "alpha")
+  beta_path = os.path.join(wc_dir, "A", "B", "E", "beta")
+
+  # Force one of the files to be a binary type
+  svntest.actions.run_and_verify_svn(None, None, [],
+                                     'propset', 'svn:mime-type',
+                                     'application/octet-stream',
+                                     alpha_path)
+
+  # Set the 'svn:executable' property on both files
+  svntest.actions.run_and_verify_svn(None, None, [],
+                                     'propset', 'svn:executable', 'ON',
+                                     beta_path)
+
+  svntest.actions.run_and_verify_svn(None, None, [],
+                                     'propset', 'svn:executable', 'ON',
+                                     alpha_path)
+
+  # Verify the executable bit has been set before committing
+  assert os.access(alpha_path, os.X_OK)
+  assert os.access(beta_path, os.X_OK)
+

No. Please use 'assert' for bugs in the test itself, and something else ('raise svntest.Failure' is the most generic) for bugs in the code being tested. Example:

  struct test_case {
    const char *input;
    int *exit_code;
  } test_cases[] = {
    ...
  };

  for (i = 0; i < NUMBER_OF_TESTS; i++)
    {
      struct test_case t = test_cases[i];
      SVN_ERR_ASSERT(t.exit_code);
      SVN_TEST_ASSERT(system(t.input) == *t.exit_code);
    }

+  # Commit change (r2)
+  sbox.simple_commit()
+
+  # Create the branch
+  svntest.actions.run_and_verify_svn(None, None, [], 'cp',
+                                     trunk_url,
+                                     sbox.repo_url + '/branch',
+                                     '-m', "Creating the Branch")
+
+  # Modify the files + commit (r3)
+  svntest.main.file_append(alpha_path, 'appended
Re: NFS and fsfs: slow commit performance
On Jan 18, 2011, at 1:49 AM, Bolstridge, Andrew wrote: Hi. Just a quick question/suggestion: If the exponential sleep can be a maximum of 25ms, and the random one between 0 and 100ms, what is the performance like if you just sleep for 25ms - without any randomness, or doubling of sleep times? I used to find (but this was ages ago) that just sleeping for 1ms would force the CPU to carry on with other things, perhaps that would have an effect that's just as good as the longer sleeps? Top stuff BTW, I doubt it'll have much effect on my slow virtual machine disks, but every performance boost helps! Thanks.

[cc'ing dev@s.a.o]

If you're not storing your repositories on NFS, then this patch doesn't help performance; the kernel's fcntl() will block the caller and unblock it when the lock is released. The sleeps are there to not pound the NFS server with traffic.

I put in a fixed 25ms sleep and here are the results, which don't look as good as the random sleep:

  N is      1567
  sum is    806.845106840133
  mean is   0.514897962246415
  SD N is   0.591845693041212
  SD N-1 is 0.592034630218584
  min is    0.1712911129
  25% is    0.232829093933
  50% is    0.273239135742
  90% is    1.02276110649
  95% is    1.61777615547
  99% is    3.28569889069
  max is    6.39945292473

Percentage of commits by slow systems: 27.1
Percentage of commits by fast systems: 72.9

Blair
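For reference, the capped-exponential-plus-jitter policy that the fixed 25ms sleep is being compared against can be modelled roughly as below. The constants and the tiny helper PRNG are illustrative, not the values from the patch; the point is that the random component desynchronizes competing clients so they don't all retry in lockstep:

```c
/* Small deterministic PRNG (an LCG) so the sketch has no hidden
   global state; illustrative, not what the patch uses. */
static unsigned int
next_rand(unsigned int *seed)
{
  *seed = *seed * 1103515245u + 12345u;
  return (*seed >> 16) & 0x7fff;
}

/* Delay in ms before retry number 'attempt' (0-based): an
   exponential part capped at 25ms, plus 0..100ms of jitter. */
static int
backoff_delay_ms(int attempt, unsigned int *seed)
{
  int exp_ms = 1 << (attempt < 5 ? attempt : 5);   /* 1,2,4,8,16,32 */

  if (exp_ms > 25)
    exp_ms = 25;                     /* cap the exponential part */
  return exp_ms + (int)(next_rand(seed) % 101);    /* add jitter */
}
```

With a fixed sleep, colliding clients that wake together collide again on the next attempt; the jitter term is what spreads them out, which matches the better numbers measured for the randomized version.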
Re: [PATCH] Add regression test for issue #3686: executable flag not correctly set on merge
On Wed, Jan 19, 2011 at 3:08 AM, Daniel Shahaf d...@daniel.shahaf.namewrote: Daniel Becroft wrote on Tue, Jan 18, 2011 at 07:13:12 +1000: Hi guys, I was looking through the Getting Involved section of the Subversion website, and decided to start by writing a regression test for an issue, starting with 3686 (executable flag not correctly set on merge). This issue was already assigned to someone, so apologies if it's already being worked on. Welcome aboard :-). As a note for next time, though, when an issue is assigned to someone, it would be better to ask them (or dev@) whether they're working on it before starting to work on it yourself. I'm CCing Blair now for this reason. Ah, will do that in the future. Oops. I've attached the patch, and included it below as well. A few questions I have: 1) I couldn't find a precedent for examining the OS's executable flag, so my approach might be less than ideal. Should I add a test to prop_tests.py to check that setting svn:executable also sets the executable bit? No, the test you're just adding already does just that. Just copy the two asserts above the # commit r2 line to after the comment and you're done :-) 2) The problem also exists under the '--reintegrate' scenario, even after a subsequent commit. Would it be better to include that case here, or move the entire test to the merge_reintegrate.py suite? I'm not sure we need a separate test for the --reintegrate case; it seems reasonable to assume it does (and will) share the 'set +x permission' logic with normal merges. I thought about that, but this situation fixes itself after the 'svn merge/svn commit' combination. However, after using --reintegrate. and committing, the executable bit is still not set, so I think there might be something more going on there. As to location, if I wrote this I'd probably put it in prop_tests or special_tests along with the other svn:executable tests. (That's somewhat of a bikeshed, though, and I don't feel strongly about this.) 
My logic for the placement was: it's exercising a problem with the svn merge process, so it belongs in merge_tests.py. Maybe that was wrong. Regards, Daniel Becroft [[[ Add regression test for issue #3686. This issue involves the executable flag being lost during a merge of a binary file with the 'svn:executable' property set. * subversion/tests/cmdline/merge_tests.py (merge_change_to_file_with_executable(sbox)) New test case. Drop the signature, and add a colon: (merge_change_to_file_with_executable): New test case. ]]] (you've attached the patch both inline and as an attachment, I'll review only the latter) Index: subversion/tests/cmdline/merge_tests.py === --- subversion/tests/cmdline/merge_tests.py (revision 1059656) +++ subversion/tests/cmdline/merge_tests.py (working copy) @@ -16316,6 +16316,78 @@ Subtree merge under working merge produced the wrong mergeinfo, '/A/C/nu:9', [], 'pg', SVN_PROP_MERGEINFO, nu_COPY_path) + +#-- +# Test for issue #3686 'executable flag not correctly set on merge' +# See http://subversion.tigris.org/issues/show_bug.cgi?id=3686 +def merge_change_to_file_with_executable(sbox): + executable flag is maintained during binary merge + + # Scenario: When merging a change to a binary file with the 'svn:executable' + # property set, the file is not marked as 'executable'. After commit, the + # executable bit is set correctly. 
+ sbox.build() + wc_dir = sbox.wc_dir + trunk_url = sbox.repo_url + '/A/B/E' + + alpha_path = os.path.join(wc_dir, A, B, E, alpha) + beta_path = os.path.join(wc_dir, A, B, E, beta) + + # Force one of the files to be a binary type + svntest.actions.run_and_verify_svn(None, None, [], + 'propset', 'svn:mime-type', 'application/octet-stream', + alpha_path) + + # Set the 'svn:executable' property on both files + svntest.actions.run_and_verify_svn(None, None, [], + 'propset', 'svn:executable', 'ON', + beta_path) + + svntest.actions.run_and_verify_svn(None, None, [], + 'propset', 'svn:executable', 'ON', + alpha_path) + + # Verify the executable bit has been set before committing + assert os.access(alpha_path, os.X_OK) + assert os.access(beta_path, os.X_OK) + No. Please use 'assert' for bugs in the test itself, and something else ('raise svntest.Failure' is the most generic) for bugs in the code being tested. Example: struct test_case { const char *input; int *exit_code; } test_cases[] = { ... }; for (i = 0; i
Re: [PATCH] Add regression test for issue #3686: executable flag not correctly set on merge
Daniel Becroft wrote on Wed, Jan 19, 2011 at 07:27:15 +1000: On Wed, Jan 19, 2011 at 3:08 AM, Daniel Shahaf d...@daniel.shahaf.namewrote: Daniel Becroft wrote on Tue, Jan 18, 2011 at 07:13:12 +1000: 2) The problem also exists under the '--reintegrate' scenario, even after a subsequent commit. Would it be better to include that case here, or move the entire test to the merge_reintegrate.py suite? I'm not sure we need a separate test for the --reintegrate case; it seems reasonable to assume it does (and will) share the 'set +x permission' logic with normal merges.

I thought about that, but this situation fixes itself after the 'svn merge/svn commit' combination. However, after using --reintegrate and committing, the executable bit is still not set, so I think there might be something more going on there.

Ah, if there's a difference in the results between merge and reintegrate merge, then two tests are justifiable. (I missed that part of your original comment.)

As to location, if I wrote this I'd probably put it in prop_tests or special_tests along with the other svn:executable tests. (That's somewhat of a bikeshed, though, and I don't feel strongly about this.)

My logic for the placement was: it's exercising a problem with the svn merge process, so it belongs in merge_tests.py. Maybe that was wrong.

As I said, I don't feel strongly about this.

+ # Verify the executable bit has been set
+ assert os.access(alpha_path, os.X_OK)
+ assert os.access(beta_path, os.X_OK)

As expected, the test passes if you remove these two lines.

Actually, the test passes if you only take out the assertion for the 'alpha_path' - the problem only arises when merging binary files, it works correctly for non-binary files. That's why I've got both in the test case.

OK. In short: +1 to commit, assuming you replace the asserts with something else per my earlier comment.

Thanks for the review, Daniel. I'll make these changes, and will send later tonight.
Is it preferable to be in a new email, or in a reply to this one? Reply to this one. Cheers, Daniel B.
[PATCH] Server fix for issue #3657 - Any mod_dav_svn experts out there? (was Re: svn commit: r966822)
On Fri, Jan 7, 2011 at 7:08 PM, Paul Burba ptburba_at_gmail.com wrote: Mike suggested to me that the fix for this problem can be made in mod_dav_svn, since the update reporter already knows which properties have changed, *regardless* of the send-all mode, but it discards this information if send-all=FALSE and instead sends the client a 'fetch-props' response. As described previously in this thread, this causes the client to fetch all the properties of the path, not just the changes, and *all* the props are pushed through the editor callbacks as if they were all actual changes (which again is the core problem of issue #3657). I'm working on that approach.

I had pushed off the fix-it-in-mod_dav_svn approach till 1.8 since I was going in circles with it, but recently found another real-world example where this issue causes spurious conflicts on directory properties, see http://subversion.tigris.org/issues/show_bug.cgi?id=3657#desc13. This issue has existed since before 1.5 so it's nothing new, but given that I found another non-theoretical example, and given that the presence of mergeinfo is only going to make this issue more common[1], I thought I'd take another look for 1.7.

I still haven't gotten anywhere with the mod_dav_svn fix, so not much to talk about there, but I did commit the fix for the directory property conflicts in http://svn.apache.org/viewvc?view=revision&revision=1056565. The fix here is pretty much the same thing I did in http://svn.apache.org/viewvc?view=revision&revision=966822: Filter out the phony prop changes in the libsvn_client/repos_diff.c svn_delta_editor_t change_[dir|file]_prop() callback implementations.

Just to be clear about one thing, when I committed r966822 Bert asked:

  (change_file_prop): Only stash true property differences in the file
  baton. The DAV RA providers may be sending us all the properties on a
  file, not just the differences. Shouldn't this be fixed in the ra
  layer implementations then?
This would be just 'guessing' if it is a change or a complete set. Why not the RA layer? Two reasons. The first I explained earlier in this thread, it reopens issue #2048 'Primary connection (of parallel network connections used) in ra-dav diff/merge timeout unnecessarily.' I won't go over that again, but even if there is a way to avoid that, the RA layer is simply doing what we ask it when using the DAV providers. Specifically, the mod_dav_svn update report handler, when in 'skelta' mode[2] and describing changes to a path on which a property has changed, asks the client to later fetch all properties and figure out what has changed itself, i.e. the client asks the RA layer to give us *all* the properties, it obliges, so where is the problem as far as the RA layer is concerned? So this ultimately needs to be fixed on the server. Given that, I am leaving these libsvn_client filters in place so a 1.7 client will DTRT with older servers (or maybe 1.7 servers too, since as I said, I'm not having great success there). As to the 'guessing', that was true. We'd go through the motions of filtering over ra_local and ra_svn too. A pretty minor performance hit, but in r1056565 I tweaked the filtering so it only applies when using ra_serf and ra_neon. Paul [1] simply by increasing the potential pool of property changes incoming during a merge. [2] Which merges always are, regardless of whether we use ra_serf or ra_neon. Paul Ok, after many false starts and detours I finally have a server-side fix for this issue. Pursuing a suggestion by cmpilato, I made the mod_dav_svn update report always send property changes, regardless of whether the report is in skelta (send-all=false) mode or non-skelta (send-all=true) mode. In addition to the normal serf/neon trunk tests, I tested 1.5.10 (dev build) and 1.6.16 (dev build) against this patched 1.7 server and there were no unexpected [1] failures. If any mod_dav_svn experts have time to take a look at this patch I'd appreciate it. 
Thanks, Paul [[[ Server-side fix for issue #3657 'dav update report handler in skelta mode can cause spurious conflicts'. This change has two parts that are loosely related: 1) It stops mod_dav_svn from ever sending the 'fetch-props' response to a client, even when in skelta (i.e. send-all=false) mode. The server already knows what properties have changed so now it simply sends them, rather than telling the client to ask for them later. This prevents the svn_delta_editor_t change_[file|dir]_prop callback from receiving faux property changes when the client asks for *all* the properties. See http://subversion.tigris.org/issues/show_bug.cgi?id=3657 for all the gory details. 2) The comments in our code frequently make reference to 'modern' servers and clients in the context of mod_dav_svn. Unfortunately there is no such thing as a pre-modern server, they are all modern! The use of the term 'modern' in our code
Re: Ref-counting for pristine texts
On 18.01.2011 16:58, Julian Foad wrote: On Thu, 2011-01-13, Branko Čibej wrote: This would indicate that the reference counting happens too soon ... in other words, that a pristine can be dereferenced whilst some part of the code (or database) still refers to it. That breaks database consistency -- what happens if the user aborts a commit and then calls 'svn cleanup', for example? If what I said about 'commit' is correct, then yes, that's bad and we should look for a better way. But I haven't tested it properly; I noticed that the commit failed saying that the DB failed a 'REFERENCES' clause, and what I said here is a hypothesis about how that happened. Investigating this is the second thing I would want to do next. I beg to differ: it should be the first thing, because it's a correctness issue. If the references in the database are maintained correctly, then it should be safe to delete after each successful transaction commit. That's assuming that any high-level operation involves a single transaction, e.g., that a commit can't fail in some way that would require the pristine to still be in place in order to recover, even if the reference count in the database is 0. -- Brane
Re: My take on the diff-optimizations-bytes branch
On 18.01.2011 12:56, Johan Corveleyn wrote: On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn jcor...@gmail.com wrote: On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: [ ... snip ... ] But I think, the stack variable is certainly helpful and easy to do.

Ok, I've done this (locally, still have to clean up a little and then commit). It gives a nice 10% speedup of prefix/suffix scanning, which is great. I have a couple of small questions though, about other small optimizations you did:

  Index: subversion/libsvn_diff/diff_file.c
  ===================================================================
  --- subversion/libsvn_diff/diff_file.c (revision 1054044)
  +++ subversion/libsvn_diff/diff_file.c (working copy)
  ...
  @@ -258,10 +259,10 @@
         for (i = 0; i < files_len; i++)                      \
           {                                                  \
  -          if (all_files[i]->curp < all_files[i]->endp - 1) \
  -            all_files[i]->curp++;                          \
  +          if (all_files[i].curp + 1 < all_files[i].endp)   \
  +            all_files[i].curp++;                           \

Just curious: why did you change that condition (from 'X < Y - 1' to 'X + 1 < Y')? You mention in your log message: "minor re-arrangements in arithmetic expressions to maximize reuse of results (e.g. in INCREMENT_POINTERS)". Why does this help, and what's the potential impact?

There is a hidden common sub-expression. Variant 1 in pseudo-code:

  lhs = all_files[i].curp;
  temp1 = all_files[i].endp;
  rhs = temp1 - 1;
  if (lhs < rhs)
  {
    temp2 = lhs + 1;
    all_files[i].curp = temp2;
  }

Variant 2:

  lhs = all_files[i].curp;
  temp = lhs + 1;
  rhs = all_files[i].endp;
  if (temp < rhs)
    all_files[i].curp = temp;

The problem is that the compiler cannot figure that out on its own because (x+1) and (y-1) don't overflow / wrap around for the same values. In particular, 0 < (0-1) is true and (0+1) < 0 is false in unsigned arithmetic.
Second question:

  +find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
  +                           apr_off_t *prefix_lines,
  +                           struct file_info file[], apr_size_t file_len,
  +                           apr_pool_t *pool)
  +{
  +  svn_boolean_t had_cr = FALSE;
     svn_boolean_t is_match, reached_all_eof;
     apr_size_t i;
  +  apr_off_t lines = 0;

     *prefix_lines = 0;
     for (i = 1, is_match = TRUE; i < file_len; i++)
  -    is_match = is_match && *file[0]->curp == *file[i]->curp;
  +    is_match = is_match && *file[0].curp == *file[i].curp;
  +
     while (is_match)
       {
         /* ### TODO: see if we can take advantage of
            diff options like ignore_eol_style or ignore_space. */
         /* check for eol, and count */
  -      if (*file[0]->curp == '\r')
  +      if (*file[0].curp == '\r')
           {
  -          (*prefix_lines)++;
  +          lines++;

Here you added a local variable 'lines', which is incremented in the function and copied to *prefix_lines at the end (you do the same in find_identical_prefix_fast). The original code just sets and increments *prefix_lines directly. So, out of curiosity: why does this help, and how much?

Incrementing the temporary variable requires one indirection less than the original code and uses only one instead of two registers (in typical cases). The 'how much?' part depends on the compiler / optimizer and even more on the target platform (x86 has very few registers to keep data in flight; x64 has about twice as many).

So, both changes are more or less general coding patterns that will improve performance somewhat without compromising readability. -- Stefan^2.
Re: Coding goals / requirements.
Hi Johan, I hope it wasn't implied in any part of my email that I was trying to tell you what to do in any specific manner. And although I was prompted by your thread, it was completely unrelated. In my day job I always strive for the simplest solution possible. If the requirements of the task require that it be shared around the application, then it becomes a requirement of that task to ensure that an appropriate API is provided. If however there is no current need - even if I can see it might be used at a later date - I won't bother with any API-related work. What I think might happen later on may not come to pass, and to my mind my employer deserves as much of my time as possible being spent on adding new features and fixing bugs. Future-proofing for the sake of future-proofing is of no advantage today to the product, or to myself as the developer of the code. It can be argued that if you're in a specific problem space, you'll save time by completing the API now for that code, as opposed to coming back to it later on and having to reacquaint yourself with code that you previously wrote. Especially in the OSS space, it could well be a case of having to acquaint yourself with someone else's code. For me, the time saved still isn't that great, as long as:

  * You have a clearly defined design standard that is ALWAYS followed.
  * You write the simplest code possible to successfully complete the engineering task at hand.
  * You provide appropriate inline documentation where the simplest solution possible is in fact complex.

I do realise that everyone has their own work practices / workflows and mileage varies depending on a number of factors. And it would seem that, at least initially, I am being outvoted!
On 18/01/2011, at 8:35 PM, Johan Corveleyn wrote: Hi, As the guy responsible for the quote that started this thread ([1]): Actually, what's a little bit troubling is that there are currently only 3 possible file_len's, of which only 2 are used in practice: diff2 and diff3 (diff4 is not used in svn core, only in tools/diff/diff4). So, if we make a slow and a fast version, we could just as well make an XXX2 and an XXX3 version explicitly, and dispense with the need to pass arrays and array lengths. Hmmm... generic (but more complex) code vs. a couple of specialized versions. I'd like to point out that I didn't mean to criticize the API choices, I was just weighing the options. In fact, I came up with the initial API choice myself (passing an array of files, instead of passing 2 resp. 3 resp. 4 files), because I wanted it to be generically useful. In any case it's a new API in the libsvn_diff module, for making prefix/suffix optimizations possible. I don't want to hijack this high-level discussion to be a technical one again, but it may provide some additional context to give some more details about the thoughts behind it ... In fact, I started out with the simple form: passing 2 file arguments for regular diff. But then I discovered there was also something like diff3 (with 3 files) and diff4 (with 4 files; not used in core SVN). So I simply thought to myself: theoretically this optimization should work with N files, so why not just make it so. And I changed the API to accept an array of files. This made the code more complex, but in reality it's always the same approach (so once you 'get' it, it's easy to follow). Things like:

  while (*file0.curp == *file1.curp) { ... }

are replaced with (actually AND-ing N (or N-1) conditions):

  for (i = 1, is_match = TRUE; i < file_len; i++)
    is_match = is_match && *file[0].curp == *file[i].curp;
  while (is_match) { ...
    for (i = 1, is_match = TRUE; i < file_len; i++)
      is_match = is_match && *file[0].curp == *file[i].curp;
  }

So, to be very concrete, it's really the choice between: (1) A generic set of functions, that work for N files, which are significantly more complex than the N==2 case:

  datasources_open(file[])
  find_identical_prefix(file[])
  find_identical_suffix(file[])

(2) A specialized set of functions for 2, 3 and 4 files (4 is maybe optional, because not used in svn core), which are a lot simpler, but actually completely duplicate the higher level logic:

  datasources_open2(file0, file1)
  find_identical_prefix2(file0, file1)
  find_identical_suffix2(file0, file1)
  datasources_open3(file0, file1, file2)
  find_identical_prefix3(file0, file1, file2)
  find_identical_suffix3(file0, file1, file2)
  datasources_open4(file0, file1, file2, file3)
  find_identical_prefix4(file0, file1, file2, file3)
  find_identical_suffix4(file0, file1, file2, file3)

At the time, I felt that (1) was the right choice, WRT power/weight balance, keeping the logic in one single place. But with stefan2's proposed low-level optimizations of the algorithm, it's more or