Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message
On Wed, Mar 28, 2012 at 12:00 PM, Daniel Shahaf danie...@elego.de wrote:
> Jason Wong wrote on Wed, Mar 28, 2012 at 11:49:20 -0700:
> > [...]
> > Is this in line with what you expected?
>
> It's in line with my expectations, insofar as on the mirror the 'count' is correct.  It also indicates that you weren't bitten by the minfo-cnt part of this bug.  As you know from the dev@ thread, Philip identified that part and fixed it too -- after my above email.
>
> Thanks again for your help in chasing down this bug.  It was backported today towards 1.7.5 too.
>
> Cheers,
>
> Daniel

Hi Daniel.

No problem.  I am glad the issues are fixed.  Thank you for all your help and patience with my slow replies.  It has been a busy couple of months for me in trying to find the time to do these tests.

So for correcting the count information in our live repository, I should run svnsync on it at some point?  Is there anything I need to do after running that command in order to have it not link to the original?

Thanks.

Jason Wong
Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Fri, Mar 30, 2012 at 11:39:02 -0700:
> On Wed, Mar 28, 2012 at 12:00 PM, Daniel Shahaf danie...@elego.de wrote:
> > [...]
> > Thanks again for your help in chasing down this bug.  It was backported today towards 1.7.5 too.
>
> Hi Daniel.
>
> No problem.  I am glad the issues are fixed.  Thank you for all your help and patience with my slow replies.  It has been a busy couple of months for me in trying to find the time to do these tests.

Welcome.

> So for correcting the count information in our live repository, I should run svnsync on it at some point?

You already did, as you have a mirror.  (Maybe you created it via dump/load.)  Now you just need to swap the mirror for the original repository:

- stop commits
- svnsync sync URL/to/mirror2 URL/to/repo
  (or svnadmin dump -r (YOUNGEST ON MIRROR2 + 1):HEAD --deltas --incremental repo \
       | svnadmin load mirror2)
- rename repo repo.i4129-victim
  rename mirror2 repo
- enable commits

> Is there anything I need to do after running that command in order to have it not link to the original?

You can optionally remove the svn:sync-* revprops from r0.

> Thanks.
>
> Jason Wong
Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message
On Thu, Mar 22, 2012 at 11:32 AM, Jason Wong jwong1m...@gmail.com wrote:
> Hello Daniel.
>
> I will give it a go and let you know what I find.
>
> Jason
>
> On Wed, Mar 21, 2012 at 1:39 AM, Daniel Shahaf danie...@elego.de wrote:
>> [...]

Hello Daniel.

The svnsync took a while to run once I got it going.  I ran the command on the hotcopy I had made originally to keep the results consistent.  I have run the following two commands:

dump-noderev.pl /repo /
dump-noderev.pl /mirror2 /

Here are the outputs from the commands:

dump-noderev.pl /repo /
-
id: 0.0.r62104/28771
type: dir
pred: 0.0.r62103/28680
count: 62071
text: 62104 27520 1238 1238 ea635421e867454f9f7bc503c8160a2c
cpath: /
copyroot: 0 /
minfo-cnt: 25707
-
dump-noderev.pl /mirror2 /
---
id: 0.0.r62104/6122
type: dir
pred: 0.0.r62103/6039
count: 62104
text: 62104 4874 1235 1235 1f315ed2437ba5d70dba2587d9ef2d5a
cpath: /
copyroot: 0 /
minfo-cnt: 25707
---

Is this in line with what you expected?

Jason Wong.
Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Wed, Mar 28, 2012 at 11:49:20 -0700:
> dump-noderev.pl /repo /
> -
> id: 0.0.r62104/28771
> type: dir
> pred: 0.0.r62103/28680
> count: 62071
> text: 62104 27520 1238 1238 ea635421e867454f9f7bc503c8160a2c
> cpath: /
> copyroot: 0 /
> minfo-cnt: 25707
> -
> dump-noderev.pl /mirror2 /
> ---
> id: 0.0.r62104/6122
> type: dir
> pred: 0.0.r62103/6039
> count: 62104
> text: 62104 4874 1235 1235 1f315ed2437ba5d70dba2587d9ef2d5a
> cpath: /
> copyroot: 0 /
> minfo-cnt: 25707
> ---
>
> Is this in line with what you expected?

It's in line with my expectations, insofar as on the mirror the 'count' is correct.  It also indicates that you weren't bitten by the minfo-cnt part of this bug.  As you know from the dev@ thread, Philip identified that part and fixed it too -- after my above email.

Thanks again for your help in chasing down this bug.  It was backported today towards 1.7.5 too.

Cheers,

Daniel
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Philip Martin philip.mar...@wandisco.com writes:

> Moving update_ancestry from tree.c to dag.c is one way to fix the problem.

This was applied in r1302613.  I believe this also fixes the minfo-cnt corruption that has been observed.

To reproduce, apply the following patch to the old client to allow the commit to be paused:

Index: ../src/subversion/libsvn_fs_fs/tree.c
===================================================================
--- ../src/subversion/libsvn_fs_fs/tree.c	(revision 1302612)
+++ ../src/subversion/libsvn_fs_fs/tree.c	(working copy)
@@ -1694,6 +1694,7 @@
          any future merges will only be between that node and whatever
          the root node of the youngest rev is by then. */
       err = merge_changes(NULL, youngish_root_node, txn, conflict, iterpool);
+      { char buf[256]; fputs("waiting...", stdout); fgets(buf, 255, stdin); }
       if (err)
         {
           if ((err->apr_err == SVN_ERR_FS_CONFLICT) && conflict_p)

Now the scenario:

svnadmin create repo
svn mkdir -mm file://`pwd`/repo/{A,B,C}
svn co file://`pwd`/repo wc1
svn co file://`pwd`/repo wc2
svn co file://`pwd`/repo wc3
svn ps svn:mergeinfo /P:2 wc1/A
svn ps svn:mergeinfo /Q:2 wc2/B
svn ps svn:mergeinfo /R:2 wc3/C

Now commit wc1 using the patched client running under valgrind.  At the first "waiting..." prompt, commit wc2 using a standard client.  Continue the first commit to get a second "waiting..." prompt, and commit wc3 using a standard client.
Continue the first commit and valgrind reports errors such as:

==23311== Conditional jump or move depends on uninitialised value(s)
==23311==    at 0x98E0E5A: svn_fs_fs__dag_increment_mergeinfo_count (dag.c:544)
==23311==    by 0x9906847: merge (tree.c:1551)
==23311==    by 0x9906A18: merge_changes (tree.c:1599)
==23311==    by 0x9906C06: svn_fs_fs__commit_txn (tree.c:1696)
==23311==    by 0x6C8C6D9: svn_fs_commit_txn (fs-loader.c:646)
==23311==    by 0x6A5B12C: svn_repos_fs_commit_txn (fs-wrap.c:59)
==23311==    by 0x6A51D0E: close_edit (commit.c:693)
==23311==    by 0x4E4A34F: svn_client__do_commit (commit_util.c:1898)
==23311==    by 0x4E45723: svn_client_commit6 (commit.c:1689)
==23311==    by 0x4097D6: svn_cl__commit (commit-cmd.c:169)
==23311==    by 0x416FA1: main (main.c:2699)

==23311== Syscall param write(buf) points to uninitialised byte(s)
==23311==    at 0x62CC0D0: __write_nocancel (syscall-template.S:82)
==23311==    by 0x5E7D4FC: apr_file_flush_locked (readwrite.c:317)
==23311==    by 0x5E7D7AF: apr_file_flush (readwrite.c:340)
==23311==    by 0x5E7CFA4: apr_unix_file_cleanup (open.c:77)
==23311==    by 0x59FADAF: svn_io_file_close (io.c:3092)
==23311==    by 0x98E9083: svn_fs_fs__put_node_revision (fs_fs.c:2318)
==23311==    by 0x98E1039: svn_fs_fs__dag_increment_mergeinfo_count (dag.c:575)
==23311==    by 0x9906847: merge (tree.c:1551)
==23311==    by 0x9906A18: merge_changes (tree.c:1599)
==23311==    by 0x9906C06: svn_fs_fs__commit_txn (tree.c:1696)
==23311==    by 0x6C8C6D9: svn_fs_commit_txn (fs-loader.c:646)
==23311==    by 0x6A5B12C: svn_repos_fs_commit_txn (fs-wrap.c:59)
==23311==  Address 0xb2ee1dc is 92 bytes inside a block of size 4,096 alloc'd
==23311==    at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==23311==    by 0x5E806D0: pool_alloc (apr_pools.c:1463)
==23311==    by 0x5E7CDFB: apr_file_open (open.c:211)
==23311==    by 0x59F5194: file_open (io.c:280)
==23311==    by 0x59FABF6: svn_io_file_open (io.c:3049)
==23311==    by 0x98E8FC4: svn_fs_fs__put_node_revision (fs_fs.c:2308)
==23311==    by 0x98E1039: svn_fs_fs__dag_increment_mergeinfo_count (dag.c:575)
==23311==    by 0x9906847: merge (tree.c:1551)
==23311==    by 0x9906A18: merge_changes (tree.c:1599)
==23311==    by 0x9906C06: svn_fs_fs__commit_txn (tree.c:1696)
==23311==    by 0x6C8C6D9: svn_fs_commit_txn (fs-loader.c:646)
==23311==    by 0x6A5B12C: svn_repos_fs_commit_txn (fs-wrap.c:59)

The commit fails because the count: value is wrong, but if I disable that check to allow the commit to complete then I see a bogus minfo-cnt such as:

id: 0.0.r7/322
type: dir
pred: 0.0.r4/180
count: 5
text: 7 198 111 0 242bff40060f22bcf85959dcf552851a
cpath: /
copyroot: 0 /
minfo-cnt: 43

Using the current trunk with the issue 4129 fix applied this no longer happens: I don't get the valgrind warnings and the minfo-cnt is correct.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com
Re: predecessor count for the root node-revision is wrong message
On Tue, Mar 20, 2012 at 2:32 PM, Daniel Shahaf danie...@elego.de wrote:
> Daniel Shahaf wrote on Tue, Mar 20, 2012 at 00:49:06 +0200:
>> The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be released next week.
>
> The fix was merged to 1.6.x@HEAD today and barring surprises will be included in 1.6.18.

Can someone tell me if the fix is client-side or server-side?  I see r1302613 but am not familiar with the code.

Also, will I need to do something with my repositories to fix any corruption?

Thanks.
Justin
Re: predecessor count for the root node-revision is wrong message
Justin Johnson wrote on Thu, Mar 22, 2012 at 13:03:04 -0500:
> On Tue, Mar 20, 2012 at 2:32 PM, Daniel Shahaf danie...@elego.de wrote:
> > [...]
>
> Can someone tell me if the fix is client-side or server-side?  I see

Server-side.

> r1302613 but am not familiar with the code.
>
> Also, will I need to do something with my repositories to fix any corruption?

As noted on the issue, either create an svnsync mirror of them, or dump|load them, to fix the corruption.  (svnsync is a client-side tool.)

> Thanks.
> Justin
Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message
Hello Daniel.

I will give it a go and let you know what I find.

Jason

On Wed, Mar 21, 2012 at 1:39 AM, Daniel Shahaf danie...@elego.de wrote:
> [...]
Re: predecessor count for the root node-revision is wrong message
On Thu, Mar 22, 2012 at 1:13 PM, Daniel Shahaf danie...@elego.de wrote:
> Justin Johnson wrote on Thu, Mar 22, 2012 at 13:03:04 -0500:
> > [...]
> > Also, will I need to do something with my repositories to fix any corruption?
>
> As noted on the issue, either create an svnsync mirror of them, or dump|load them, to fix the corruption.  (svnsync is a client-side tool.)

Excellent.  Thank you.
relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message
Jason,

I've learnt yesterday something new about the minfo-cnt corruption bug: it can manifest not only as absurdly high values (on the order of 2**70), but as far smaller wrong increments too (such as an increment of 172 instead of 0 on one occasion).

Could you determine whether said bug has occurred in your history?  You can do that by duplicating your repository using svnsync or dump|load, running dump-noderev.pl on / of both copies at the same revisions, and comparing the minfo-cnt values.  I would be interested in knowing whether they are equal between the two copies.

Thanks,

Daniel

Jason Wong wrote on Thu, Feb 16, 2012 at 11:42:42 -0800:
> ./dump-noderev.pl /repository / 61851
> --
> id: 0.0.r61851/33470
> type: dir
> pred: 0.0.r61850/3844
> count: 61818
> text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
> cpath: /
> copyroot: 0 /
> minfo-cnt: 25685
> --
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Tue, Mar 20, 2012 at 00:49:06 +0200:
> The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be released next week.

The fix was merged to 1.6.x@HEAD today and barring surprises will be included in 1.6.18.
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf danie...@elego.de writes:

> From what is there so far, yes.  We do have different operations occurring at the same time, but for these ones, I see MERGE and DELETE verbs overlapping in the same or near time intervals according to the Apache logs.
>
> I just did a quick look in the Apache logs during a time window where the bug wasn't triggered, and was able to see cases where I have the following:
>
> rev(x-1) merge
> rev(x) merge
> rev(x) delete
> rev(x-1) delete
>
> This seems fine.  The case that I had seen in my reported cases are as follows:
>
> rev(x-1) merge
> rev(x) merge
> rev(x-1) delete
> rev(x) delete
>
> I have tried forcing this sequence in the debugger, but that did not reproduce the bug.
> [...]
> I conclude that the specific MERGE/MERGE/DELETE/DELETE order, by itself, is not sufficient to reproduce the bug.

DELETE is extremely unlikely to be involved in this bug.  mod_dav_svn maintains a mapping of DAV activity ID to Subversion transaction name in the dav/ subdirectory of a repository.  The dav/ subdirectory is not part of the Subversion filesystem.  DELETE is used to remove the mapping for a particular activity ID and to remove the transaction if it still exists.  Following a successful MERGE the transaction will have been converted into a revision, so DELETE has no effect on the Subversion filesystem.  DELETE will only affect the Subversion filesystem if called before a successful MERGE, and that will cause any subsequent MERGE for the activity to fail.
#4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Sun, Mar 18, 2012 at 16:28:21 +0200:
[ cc += dev@.  summary for dev@: investigating issue #4129: predecessor count of rN is not incremented by one wrt that of r(N-1); see http://subversion.tigris.org/issues/show_bug.cgi?id=4129 ]

Okay, count me happy :-)  I can reproduce this:

[[[
$SVN ps svn:mergeinfo '/branch/A:2' wc1/trunk/A
$SVN ps svn:mergeinfo '/branch/iota:2' wc2/trunk/iota
$SVN mkdir wc1/foo
$SVN mkdir wc2/bar
$svn ci -mm wc1 &
$svn ci -mm wc2 &
wait
]]]

Output:

[[[
Adding         wc1/foo
Sending        wc1/trunk/A
Adding         wc2/bar
Sending        wc2/trunk/iota
Committed revision 2.
subversion/svn/commit-cmd.c:183: (apr_err=160004)
subversion/libsvn_client/commit.c:876: (apr_err=160004)
subversion/libsvn_client/commit.c:876: (apr_err=160004)
svn: E160004: Commit failed (details follow):
subversion/libsvn_ra_serf/commit.c:2216: (apr_err=160004)
subversion/libsvn_ra_serf/commit.c:2216: (apr_err=160004)
subversion/libsvn_ra_serf/commit.c:2216: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:774: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:2087: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:2087: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:920: (apr_err=160004)
svn: E160004: predecessor count for the root node-revision is wrong: found (2+1 != 2), committing r3
]]]

Error log:

[[[
[Mon Mar 19 14:19:41.388413 2012] [dav:error] [pid 17156:tid 3064073072] [client ::1:40969] Could not MERGE resource /t/r1/!svn/txn/1-2 into /t/r1.  [409, #0]
[Mon Mar 19 14:19:41.388497 2012] [dav:error] [pid 17156:tid 3064073072] [client ::1:40969] An error occurred while committing the transaction.  [409, #160004]
[Mon Mar 19 14:19:41.388518 2012] [dav:error] [pid 17156:tid 3064073072] [client ::1:40969] predecessor count for the root node-revision is wrong: found (2+1 != 2), committing r3  [409, #160004]
[Mon Mar 19 14:19:42.024052 2012] [authz_core:debug] [pid 17156:tid 3055680368] mod_authz_core.c(783): [client ::1:40971] AH01626: authorization result of Require all granted: granted
[Mon Mar 19 14:19:42.024114 2012] [authz_core:debug] [pid 17156:tid 3055680368] mod_authz_core.c(783): [client ::1:40971] AH01626: authorization result of <RequireAny>: granted
[Mon Mar 19 14:19:42.024781 2012] [dav:error] [pid 17156:tid 3055680368] [client ::1:40971] Could not fetch resource information.  [404, #0]
[Mon Mar 19 14:19:42.024820 2012] [dav:error] [pid 17156:tid 3055680368] [client ::1:40971] Named transaction doesn't exist.  [404, #0]
]]]

Using another property, or omitting either propset, is enough to cause the bug not to trigger.

The bug reproduced both with and without the following patch:

[[[
--- subversion/libsvn_fs_fs/tree.c	(revision 1301511)
+++ subversion/libsvn_fs_fs/tree.c	(working copy)
@@ -1 +1,2 @@
+#include <unistd.h>
 /* tree.c : tree-like filesystem, built on DAG filesystem
@@ -1701 +1702,2 @@ svn_fs_fs__commit_txn(const char **conflict_p,
+      sleep(3);
       err = svn_fs_fs__commit(new_rev, fs, txn, iterpool);
@@ -1729,3 +1731,4 @@ svn_fs_fs__commit_txn(const char **conflict_p,
 cleanup:
+  sleep(3);
   svn_fs_fs__reset_txn_caches(fs);
]]]

The bug reproduced with either ServerLimit 1 or ThreadLimit 1 in httpd.conf.  (That forced both commits to be served by the same process (resp., by different processes).)  I use httpd 2.4.1 with event MPM.
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf danie...@elego.de writes:

> The bug reproduced with either ServerLimit 1 or ThreadLimit 1 in httpd.conf.  (That forced both commits to be served by the same process (resp., by different processes).)  I use httpd 2.4.1 with event MPM.

I can reproduce over ra_local:

svnadmin create repo
svn mkdir -mm file://`pwd`/repo/A
svn mkdir -mm file://`pwd`/repo/B
svn co file://`pwd`/repo wc1
svn co file://`pwd`/repo wc2
svn ps svn:mergeinfo /P:2 wc1/A
svn ps svn:mergeinfo /Q:2 wc2/B
svn mkdir wc1/X
svn mkdir wc2/Y
svn ci -mm wc1 &
svn ci -mm wc2 &
wait

Gives:

Sending        wc1/A
Adding         wc1/X
Sending        wc2/B
Adding         wc2/Y
Committed revision 3.
svn: E160004: Commit failed (details follow):
svn: E160004: predecessor count for the root node-revision is wrong: found 3, committing r4
Re: predecessor count for the root node-revision is wrong message
shashank subramaniam wrote on Mon, Mar 19, 2012 at 20:29:47 +0530:
> Hey,
>
> We tried looping as a solution to the 'Predecessor count for the root node revision is wrong' error (we looped the commit alone until it was committed).  This works, but if 100 people try to commit to the same repository at the same time, the 100th guy's instance of the program is looped 100 times.  There is also the possibility of it falling into an infinite loop due to some internal error.  So, can you suggest some other solution around this bug, whereby multiple users are able to commit to the same repository at the same time?  (Space complexity is not a constraint.)

Normally concurrent commits are supported.  The bug under discussion has to do with a flaw in the mechanism that causes concurrent commits to succeed.

> I noticed a similar problem also, which I think might be related to the 'Predecessor' bug.  Simultaneous SVN updates from the same repository by different users fail silently (no error is thrown, but the update doesn't happen).  I think an implicit lock is probably being put on the repository, preventing simultaneous operations.  Please do let us know if some solution exists.

In FSFS there are no read locks, and different readers don't interfere with each other or with others.  You'll have to describe what you see in more detail.  (Exactly what you did, what you saw, and how it differed from what you expected to see.)

Thanks

Daniel

> Thanks
> Subu
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Philip Martin philip.mar...@wandisco.com writes:

> I can reproduce over ra_local:
> [...]
> svn: E160004: predecessor count for the root node-revision is wrong: found 3, committing r4

This is the problem code in libsvn_fs_fs/tree.c:merge

  SVN_ERR(svn_fs_fs__dag_get_predecessor_count(&pred_count, source, pool));
  SVN_ERR(update_ancestry(fs, source_id, target_id, target_path,
                          pred_count, pool));
  if (svn_fs_fs__fs_supports_mergeinfo(fs))
    SVN_ERR(svn_fs_fs__dag_increment_mergeinfo_count(target,
                                                     mergeinfo_increment,
                                                     pool));

target is a dag_node_t*, which is opaque outside dag.c, and target->node_revision->predecessor_count is 3 when we reach the above code.  update_ancestry rewrites the file repo/db/transactions/2-2.txn/node.0.0 with the correctly updated value "count: 4", but nothing updates target->node_revision->predecessor_count in memory.  svn_fs_fs__dag_increment_mergeinfo_count then rewrites repo/db/transactions/2-2.txn/node.0.0, pulling the old value of target->node_revision->predecessor_count from memory and putting "count: 3" back in the file.

If I use the debugger to manually set target->node_revision to NULL inside svn_fs_fs__dag_increment_mergeinfo_count then the commit works.

I'm not exactly sure how all the FSFS caching layers are supposed to interact.  Is tree.c:update_ancestry supposed to update the in-memory predecessor_count?  Should there be a svn_fs_fs__dag_xxx function to change the predecessor count?  Should target->node_revision be set to NULL somewhere?  Something else?
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Philip Martin wrote on Mon, Mar 19, 2012 at 17:25:22 +:
> Philip Martin philip.mar...@wandisco.com writes:
> > I can reproduce over ra_local:
> > [...]
>
> This is the problem code in libsvn_fs_fs/tree.c:merge
> [...]
> If I use the debugger to manually set target->node_revision to NULL inside svn_fs_fs__dag_increment_mergeinfo_count then the commit works.
>
> I'm not exactly sure how all the FSFS caching layers are supposed to interact.  Is tree.c:update_ancestry supposed to update the in-memory predecessor_count?  Should there be a svn_fs_fs__dag_xxx function to change the predecessor count?  Should target->node_revision be set to NULL somewhere?  Something else?
Just a note that another option here is for merge() to re-fetch 'dag_node_t *target' via dag.h APIs, to get another struct with a fresh (actually: not yet populated) cache ->noderev member.
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
On 03/19/2012 01:25 PM, Philip Martin wrote:
> Philip Martin philip.mar...@wandisco.com writes:
>> I can reproduce over ra_local:
>> [...]
>> svn: E160004: predecessor count for the root node-revision is wrong: found 3, committing r4

Is this problem specific to the FSFS backend?

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet   www.collab.net   Distributed Development On Demand
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
C. Michael Pilato wrote on Mon, Mar 19, 2012 at 13:57:51 -0400:
> Is this problem specific to the FSFS backend?

No.

% ../runpytest svnadmin mergeinfo_race --fs-type bdb
2012-03-19 20:21:44 [WARNING] CWD: /home/daniel/src/svn/t1/subversion/tests/cmdline
2012-03-19 20:21:44 [WARNING] EXCEPTION: Failure: one or both commits failed
XFAIL: svnadmin_tests.py 29: concurrent mergeinfo commits invalidate pred-count
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
C. Michael Pilato cmpil...@collab.net writes:

> On 03/19/2012 01:25 PM, Philip Martin wrote:
>> [...]
>
> Is this problem specific to the FSFS backend?

Yes, I think it is.  For BDB the dag_node_t type in dag.c doesn't have a node_revision member.  When update_ancestry does svn_fs_bdb__put_node_revision it writes to the database, and a subsequent svn_fs_bdb__get_node_revision will see the updated value.  For FSFS the svn_fs_fs__put_node_revision call writes to the transactions subdir but the in-memory node_revision doesn't get changed.
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf danie...@elego.de writes:

> C. Michael Pilato wrote on Mon, Mar 19, 2012 at 13:57:51 -0400:
>> Is this problem specific to the FSFS backend?
>
> No.
>
> % ../runpytest svnadmin mergeinfo_race --fs-type bdb
> 2012-03-19 20:21:44 [WARNING] CWD: /home/daniel/src/svn/t1/subversion/tests/cmdline
> 2012-03-19 20:21:44 [WARNING] EXCEPTION: Failure: one or both commits failed
> XFAIL: svnadmin_tests.py 29: concurrent mergeinfo commits invalidate pred-count

I think that's a failure of the regression test; after the XFAIL the repository looks like:

$ svn log -vqr3:2 file://`pwd`/svn-test-work/repositories/svnadmin_tests-29/
r3 | jrandom | 2012-03-19 18:28:53 + (Mon, 19 Mar 2012)
Changed paths:
   M /A
   A /d1
r2 | jrandom | 2012-03-19 18:28:53 + (Mon, 19 Mar 2012)
Changed paths:
   A /d2
   M /iota

Also, the test still XFAILs for FSFS with my FSFS patch to fix the problem.
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
On 03/19/2012 02:24 PM, Philip Martin wrote:
> C. Michael Pilato cmpil...@collab.net writes:
>> Is this problem specific to the FSFS backend?
>
> Yes, I think it is.  For BDB the dag_node_t type in dag.c doesn't have a node_revision member.  When update_ancestry does svn_fs_bdb__put_node_revision it writes to the database and subsequent svn_fs_bdb__get_node_revision will see the updated value.  For FSFS the svn_fs_fs__put_node_revision call writes to the transactions subdir but the in-memory node_revision doesn't get changed.

Gotcha.  Thanks for the response.  I remember when the BDB code had an in-memory node-revision structure and tried to maintain that using the trail construct.  I remember also the day I purged the code of that mess. :-)

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet   www.collab.net   Distributed Development On Demand
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Philip Martin wrote on Mon, Mar 19, 2012 at 18:31:41 +:
> Daniel Shahaf danie...@elego.de writes:
>> C. Michael Pilato wrote on Mon, Mar 19, 2012 at 13:57:51 -0400:
>>> Is this problem specific to the FSFS backend?
>>
>> No.
>> [...]
>> XFAIL: svnadmin_tests.py 29: concurrent mergeinfo commits invalidate pred-count
>
> I think that's a failure of the regression test; after the XFAIL the repository looks like:

Right as usual; implemented your fix in r1302591.
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Philip Martin philip.mar...@wandisco.com writes:
If I use the debugger to manually set target->node_revision to NULL inside svn_fs_fs__dag_increment_mergeinfo_count, then the commit works.

I'm not exactly sure how all the FSFS caching layers are supposed to interact. Is tree.c:update_ancestry supposed to update the in-memory predecessor_count? Should there be a svn_fs_fs__dag_xxx function to change the predecessor count? Should target->node_revision be set to NULL somewhere? Something else?

Moving update_ancestry from tree.c to dag.c is one way to fix the problem. Daniel also suggested removing the node_revision member of dag_node_t altogether and relying on the new 1.7 caching to give us the performance. I suppose we would still need a patch like this for 1.6.

Index: ../src/subversion/libsvn_fs_fs/dag.c
===================================================================
--- ../src/subversion/libsvn_fs_fs/dag.c	(revision 1302591)
+++ ../src/subversion/libsvn_fs_fs/dag.c	(working copy)
@@ -1296,3 +1296,27 @@
 
   return SVN_NO_ERROR;
 }
+
+svn_error_t *
+svn_fs_fs__dag_update_ancestry(dag_node_t *target,
+                               dag_node_t *source,
+                               apr_pool_t *pool)
+{
+  node_revision_t *source_noderev, *target_noderev;
+
+  if (! svn_fs_fs__dag_check_mutable(target))
+    return svn_error_createf
+      (SVN_ERR_FS_NOT_MUTABLE, NULL,
+       _("Attempted to update ancestry of non-mutable node"));
+
+  SVN_ERR(get_node_revision(&source_noderev, source, pool));
+  SVN_ERR(get_node_revision(&target_noderev, target, pool));
+
+  target_noderev->predecessor_id = source->id;
+  target_noderev->predecessor_count = source_noderev->predecessor_count;
+  if (target_noderev->predecessor_count != -1)
+    target_noderev->predecessor_count++;
+
+  return svn_fs_fs__put_node_revision(target->fs, target->id, target_noderev,
+                                      FALSE, pool);
+}
Index: ../src/subversion/libsvn_fs_fs/tree.c
===================================================================
--- ../src/subversion/libsvn_fs_fs/tree.c	(revision 1302591)
+++ ../src/subversion/libsvn_fs_fs/tree.c	(working copy)
@@ -1142,32 +1142,6 @@
 }
 
 
-/* Teach node-revision TARGET_ID that node-revision SOURCE_ID is its
-   predecessor.  TARGET_PATH is used for error messages only. */
-static svn_error_t *
-update_ancestry(svn_fs_t *fs,
-                const svn_fs_id_t *source_id,
-                const svn_fs_id_t *target_id,
-                const char *target_path,
-                int source_pred_count,
-                apr_pool_t *pool)
-{
-  node_revision_t *noderev;
-
-  if (svn_fs_fs__id_txn_id(target_id) == NULL)
-    return svn_error_createf
-      (SVN_ERR_FS_NOT_MUTABLE, NULL,
-       _("Unexpected immutable node at '%s'"), target_path);
-
-  SVN_ERR(svn_fs_fs__get_node_revision(&noderev, fs, target_id, pool));
-  noderev->predecessor_id = source_id;
-  noderev->predecessor_count = source_pred_count;
-  if (noderev->predecessor_count != -1)
-    noderev->predecessor_count++;
-  return svn_fs_fs__put_node_revision(fs, target_id, noderev, FALSE, pool);
-}
-
-
 /* Set the contents of CONFLICT_PATH to PATH, and return an
    SVN_ERR_FS_CONFLICT error that indicates that there was a conflict
    at PATH.  Perform all allocations in POOL (except the allocation of
@@ -1219,7 +1193,6 @@
   apr_hash_index_t *hi;
   svn_fs_t *fs;
   apr_pool_t *iterpool;
-  int pred_count;
   apr_int64_t mergeinfo_increment = 0;
 
   /* Make sure everyone comes from the same filesystem. */
@@ -1543,9 +1516,7 @@
     }
   svn_pool_destroy(iterpool);
 
-  SVN_ERR(svn_fs_fs__dag_get_predecessor_count(&pred_count, source, pool));
-  SVN_ERR(update_ancestry(fs, source_id, target_id, target_path,
-                          pred_count, pool));
+  SVN_ERR(svn_fs_fs__dag_update_ancestry(target, source, pool));
 
   if (svn_fs_fs__fs_supports_mergeinfo(fs))
     SVN_ERR(svn_fs_fs__dag_increment_mergeinfo_count(target,
Index: ../src/subversion/libsvn_fs_fs/dag.h
===================================================================
--- ../src/subversion/libsvn_fs_fs/dag.h	(revision 1302591)
+++ ../src/subversion/libsvn_fs_fs/dag.h	(working copy)
@@ -603,6 +603,10 @@
                                 dag_node_t *node,
                                 apr_pool_t *pool);
 
+svn_error_t *
+svn_fs_fs__dag_update_ancestry(dag_node_t *target,
+                               dag_node_t *source,
+                               apr_pool_t *pool);
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message
Philip Martin wrote on Mon, Mar 19, 2012 at 18:45:37 +0000:
Philip Martin philip.mar...@wandisco.com writes:
If I use the debugger to manually set target->node_revision to NULL inside svn_fs_fs__dag_increment_mergeinfo_count, then the commit works.

I'm not exactly sure how all the FSFS caching layers are supposed to interact. Is tree.c:update_ancestry supposed to update the in-memory predecessor_count? Should there be a svn_fs_fs__dag_xxx function to change the predecessor count? Should target->node_revision be set to NULL somewhere? Something else?

Moving update_ancestry from tree.c to dag.c is one way to fix the problem. Daniel also suggested removing the node_revision member of dag_node_t altogether and relying on the new 1.7 caching to give us the performance. I suppose we would still need a patch like this for 1.6.

++1
Re: predecessor count for the root node-revision is wrong message
Hello Daniel, Philip. I have been following the thread: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message. It looks like you all have it figured out now. Good job. Do you need any more information from me at this point? Thanks. Jason Wong.
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Mon, Mar 19, 2012 at 13:41:19 -0700: Hello Daniel, Philip. I have been following the thread: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message. It looks like you all have it figured out now. Good job. Do you need any more information from me at this point? Thanks. Thanks Jason. It would be useful if you could confirm that you do not run into the error after rebuilding the server with r1302399 and r1302613 applied. (If you run the test suite, apply r1302539 and r1302591 too.) These revisions constitute the fix which is nominated for inclusion in 1.6.18 and 1.7.5; see ^/subversion/branches/1.7.x/STATUS. Cheers, Daniel Jason Wong.
Re: predecessor count for the root node-revision is wrong message
On Mon, Mar 19, 2012 at 1:56 PM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Mon, Mar 19, 2012 at 13:41:19 -0700:
Hello Daniel, Philip.

I have been following the thread "#4129 is reproducible Re: predecessor count for the root node-revision is wrong message". It looks like you all have it figured out now. Good job. Do you need any more information from me at this point? Thanks.

Thanks Jason. It would be useful if you could confirm that you do not run into the error after rebuilding the server with r1302399 and r1302613 applied. (If you run the test suite, apply r1302539 and r1302591 too.) These revisions constitute the fix that is nominated for inclusion in 1.6.18 and 1.7.5; see ^/subversion/branches/1.7.x/STATUS.

Hi Daniel. The developer who built the svn client is away and will probably not be back until later this week. What is your ETA for 1.7.5? Just wondering whether it would be released before my developer is back. Thanks.

Jason

Cheers,
Daniel

Jason Wong.
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Mon, Mar 19, 2012 at 15:34:53 -0700:
On Mon, Mar 19, 2012 at 1:56 PM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Mon, Mar 19, 2012 at 13:41:19 -0700:
Hello Daniel, Philip.

I have been following the thread "#4129 is reproducible Re: predecessor count for the root node-revision is wrong message". It looks like you all have it figured out now. Good job. Do you need any more information from me at this point? Thanks.

Thanks Jason. It would be useful if you could confirm that you do not run into the error after rebuilding the server with r1302399 and r1302613 applied. (If you run the test suite, apply r1302539 and r1302591 too.) These revisions constitute the fix that is nominated for inclusion in 1.6.18 and 1.7.5; see ^/subversion/branches/1.7.x/STATUS.

Hi Daniel. The developer who built the svn client is away and will probably not be back until later this week. What is your ETA for 1.7.5? Just wondering whether it would be released before my developer is back.

The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be released next week.

Thanks.

Jason

Cheers,
Daniel

Jason Wong.
Re: predecessor count for the root node-revision is wrong message
[ cc += dev@. Summary for dev@: investigating issue #4129: the predecessor count of rN is not incremented by one with respect to that of r(N-1); see http://subversion.tigris.org/issues/show_bug.cgi?id=4129 ]

Jason Wong wrote on Thu, Mar 15, 2012 at 07:57:47 -0700:
On Wed, Mar 14, 2012 at 5:15 PM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Tue, Mar 13, 2012 at 06:57:59 -0700:
On Fri, Mar 2, 2012 at 8:12 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
I have had a developer here create a build of the latest SVN code with your changes you mentioned in r1294470 for the svnadmin verify

Okay, that's great news, for two reasons:

1. It means building svn on Windows isn't as painful as it used to be :)

Actually, it did take some work to get it going, as we did not have another system available to us and also did not have VC++ 6. We had to use VS 2010 in order to do this. Also, for the other components required (Python, Perl, etc.), the files were copied to the workstation after the install to see if it would work, as we did not want to change the current workstation configuration by running the installers. All in all, it did seem to work.

Okay. The normal build requires just the *.exe and *.dll files to be placed appropriately (such that the *.exe's and httpd's find their libsvn_* DLL's at runtime) --- it doesn't require Administrator access, for example. To clarify, Perl is only required to build OpenSSL; it is not required to build APR, Neon, or Subversion.

2. It means I can ask you to build a custom server with the 'inprocess' cache disabled, or (if all else fails) to bisect, per my previous email.

One of the things you could try is to disable caching: simply modify the function create_cache() in libsvn_fs_fs/caching.c to always return NULL in *CACHE_P. See below for another suggestion.

command.
We have run 'svnadmin verify' against every revision of the hotcopy of our repository that we took when we first brought this issue to the forums, and are now tracking down each of the revisions to see what actions were being done at those times.

Thanks! I do hope this work enables us to pinpoint and fix the bug.

I will be going through the list to see what else was happening at the same time on the Apache server, since it was suggested that there may be concurrency issues. I know that the last two times this error popped up, we had two svn operations starting at around the same time according to the Apache logs. I will go through the previous Apache history to see whether this was always the case.

Thanks, looking forward to hearing what you come up with. FWIW, Justin's reply suggests that the error was seen on three different platforms --- Windows, Solaris, and FreeBSD --- so that should narrow down the range of possible explanations. (I'll also note that at ASF's installation we are not running into new instances of the bug.)

Hi Daniel.

Hi. Sorry for the delay --- was away from svn the last few days.

No problem. I have been really busy as well over the past couple of weeks on other tasks.

I haven't gone through all the cases yet, but I have made progress through quite a number of them and a pattern seems to be emerging.

Is it safe to summarize your findings as: in every instance of the bug (as determined by the new 'svnadmin verify' output), the victim revision was started whilst (victim-1) was in progress?

From what is there so far, yes. We do have different operations occurring at the same time, but for these ones, I see MERGE and DELETE verbs overlapping in the same or nearby time intervals according to the Apache logs. I just took a quick look at the Apache logs during a time window where the bug wasn't triggered, and found cases with the following order:

rev(x-1) merge
rev(x) merge
rev(x) delete
rev(x-1) delete

This seems fine.
The order that I saw in my reported cases is as follows:

rev(x-1) merge
rev(x) merge
rev(x-1) delete
rev(x) delete

I have tried forcing this sequence in the debugger, but that did not reproduce the bug. I was using trunk@1301511 mod_dav_svn, a trunk client, and httpd-2.4. I set a breakpoint in ra_neon's commit.c:commit_close_edit() and ra_serf's commit.c:close_edit(), before the DELETE of the activity, and sent the requests in the following order: WC1-MERGE, WC2-MERGE, WC1-DELETE, WC2-DELETE. I tried both serf and neon. I tried mod_dav_svn of trunk@1301511 and 1.6.x@1301511. (Only HTTPv1 makes sense, since DELETEs do not occur in HTTPv2.) I conclude that the specific MERGE/MERGE/DELETE/DELETE order, by itself, is not sufficient to
Re: predecessor count for the root node-revision is wrong message
Hey Jason,

I'm also facing a similar problem. I'm working on a project where the svn implementation is on the server side. Since svn merge (only reintegration is required for my project) requires a local working copy, I maintain one working copy folder for each user, which I switch to the target repository folder (the folder becomes the working copy of the repository URL of the merge target). When two users try to merge at the very same instant, the merge happens, but the following commit, which is called inside the same function, returns the 'predecessor count for the root node-revision is wrong' error for the second user (the first user is able to commit properly). I'm invoking a cmd prompt on the server via C# code and running the svn commands there. I have gone through your discussions here, and I found them very technical. Could you kindly try to be a bit less technical when replying to my queries?

On Wednesday, 8 February 2012 02:53:10 UTC+5:30, Jason Wong wrote:
Hello.

I have recently run into an issue with my Subversion system (1.7.1) where a specific component I am trying to build has failed. I have had successful builds of this project since we upgraded to 1.7.1, before this issue happened. I am currently working around this by using TortoiseSVN 1.7.1 to check in the files left in the workspace. This issue seems to be intermittent, as it does not happen every time.

I am using the Subversion 1.7.1 Windows binaries, and Subversion is hosted on Apache 2.2.21. I have run svnadmin verify against the directory and it has come back clean.

I was wondering what types of events can cause this to happen. Are there any resolutions?

Here are some entries from the Apache logs for the last time this issue happened. The Apache error log shows the following lines:

[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] Could not MERGE resource /repository/!svn/txn/61847-1bz5 into /repository/project/binaries/release/phase1/iteration/81/trunk. [409, #0]
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] An error occurred while committing the transaction. [409, #160004]
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count for the root node-revision is wrong: found 61815, committing r61852 [409, #160004]
[Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Could not fetch resource information. [404, #0]
[Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Named transaction doesn't exist. [404, #0]

The following are from the Apache access log at the same time:

9.31.13.109 - username [31/Jan/2012:11:37:22 -0800] MERGE /repository/project/binaries/release/phase1/iteration/81/trunk HTTP/1.1 409 281
9.31.13.109 - username [31/Jan/2012:11:37:24 -0800] DELETE /repository/!svn/txn/61847-1bz5 HTTP/1.1 404 232

Any help/comments would be appreciated. Thank you.

Jason Wong
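The key error line compares the root node-revision's predecessor count with the revision being committed. Assuming that the root node-revision of rN is expected to have predecessor count N (each revision gets a new root node-revision, and r0's root starts at 0), the transaction above claimed 61815 while r61852 was being committed. The following is a minimal, hypothetical sketch of such a check; the function name and signature are illustrative only and this is not Subversion's actual commit code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch (not Subversion's code) of a commit-time sanity
   check: assuming the root node-revision of rN must have predecessor
   count N, refuse the commit when they differ and format an error in
   the style of the log above. Returns 0 if consistent, -1 otherwise. */
static int check_root_pred_count(long found_count, long committing_rev,
                                 char *msg, size_t msg_len)
{
  assert(msg != NULL && msg_len > 0);
  if (found_count == committing_rev)
    {
      msg[0] = '\0';
      return 0;
    }
  snprintf(msg, msg_len,
           "predecessor count for the root node-revision is wrong: "
           "found %ld, committing r%ld",
           found_count, committing_rev);
  return -1;
}
```

Called with the values from the log (61815 and 61852), the sketch rejects the commit; with a count of 61852 it accepts it.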
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Tue, Mar 13, 2012 at 06:57:59 -0700:
On Fri, Mar 2, 2012 at 8:12 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
I have had a developer here create a build of the latest SVN code with your changes you mentioned in r1294470 for the svnadmin verify

Okay, that's great news, for two reasons:

1. It means building svn on Windows isn't as painful as it used to be :)

Actually, it did take some work to get it going, as we did not have another system available to us and also did not have VC++ 6. We had to use VS 2010 in order to do this. Also, for the other components required (Python, Perl, etc.), the files were copied to the workstation after the install to see if it would work, as we did not want to change the current workstation configuration by running the installers. All in all, it did seem to work.

Okay. The normal build requires just the *.exe and *.dll files to be placed appropriately (such that the *.exe's and httpd's find their libsvn_* DLL's at runtime) --- it doesn't require Administrator access, for example. To clarify, Perl is only required to build OpenSSL; it is not required to build APR, Neon, or Subversion.

2. It means I can ask you to build a custom server with the 'inprocess' cache disabled, or (if all else fails) to bisect, per my previous email.

One of the things you could try is to disable caching: simply modify the function create_cache() in libsvn_fs_fs/caching.c to always return NULL in *CACHE_P. See below for another suggestion.

command.

We have run 'svnadmin verify' against every revision of the hotcopy of our repository that we took when we first brought this issue to the forums, and are now tracking down each of the revisions to see what actions were being done at those times.

Thanks! I do hope this work enables us to pinpoint and fix the bug.
I will be going through the list to see what else was happening at the same time on the Apache server, since it was suggested that there may be concurrency issues. I know that the last two times this error popped up, we had two svn operations starting at around the same time according to the Apache logs. I will go through the previous Apache history to see whether this was always the case.

Thanks, looking forward to hearing what you come up with. FWIW, Justin's reply suggests that the error was seen on three different platforms --- Windows, Solaris, and FreeBSD --- so that should narrow down the range of possible explanations. (I'll also note that at ASF's installation we are not running into new instances of the bug.)

Hi Daniel.

Hi. Sorry for the delay --- was away from svn the last few days.

I haven't gone through all the cases yet, but I have made progress through quite a number of them and a pattern seems to be emerging.

Is it safe to summarize your findings as: in every instance of the bug (as determined by the new 'svnadmin verify' output), the victim revision was started whilst (victim-1) was in progress?

That by itself is an everyday occurrence, but I think it's nonetheless a useful piece of information. I'll try to digest it further later when I'm less sleepy (it's way past midnight here).

(As I understand ra_dav, the MERGE verb corresponds to the FS-level svn_fs_commit_txn(). Someone please correct me if I'm wrong.)

Thanks,

Daniel

I have attached 2 txt files. One shows the modified svnadmin verify output from the binaries we built. The other shows the revisions and what appears to have been occurring at the time of the bug. I figured it was better to provide this now rather than delay any longer waiting for the rest of the results. I will continue to go through the rest of the events and see if there are other differences when the issue occurs. I hope this information helps. Thanks.

Jason
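The patched 'svnadmin verify' reports a revision whenever its root predecessor count is not exactly one more than that of the previous revision. The following is a hypothetical sketch of that rule over an array of root predecessor counts (how the counts are read from the repository is left out); it is not the actual verify code from r1294470. Because a missed increment is never repaired, each incident adds one to the gap between a revision number and its count, which is consistent with the growing gaps in the attached output (r45558 has 45557, r46947 has 46945, r46997 has 46994, and so on).

```c
#include <assert.h>
#include <stddef.h>

/* counts[i] is the root node-revision's predecessor count in revision
   first_rev + i. Every revision whose count is not exactly one more than
   its predecessor's is reported: the number of such revisions is
   returned and the first one is stored in *first_bad (-1 if none). */
static size_t find_pred_count_gaps(const long *counts, size_t n,
                                   long first_rev, long *first_bad)
{
  size_t bad = 0;
  assert(first_bad != NULL && (counts != NULL || n == 0));
  *first_bad = -1;
  for (size_t i = 1; i < n; i++)
    {
      if (counts[i] != counts[i - 1] + 1)
        {
          if (bad == 0)
            *first_bad = first_rev + (long)i;
          bad++;
        }
    }
  return bad;
}
```

For example, synthetic counts { 45556, 45557, 45557, 45558 } starting at r45556 (modeled on the first reported line, not real data) yield one offending revision, r45558.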
Re: predecessor count for the root node-revision is wrong message
subu7890 wrote on Tue, Mar 13, 2012 at 21:51:17 -0700:
Hey Jason,

I'm also facing a similar problem. I'm working on a project where the svn implementation is on the server side. Since svn merge (only reintegration is required for my project) requires a local working copy, I maintain one working copy folder for each user, which I switch to the target repository folder (the folder becomes the working copy of the repository URL of the merge target). When two users try to merge at the very same instant, the merge happens, but the following commit, which is called inside the same function, returns the 'predecessor count for the root node-revision is wrong' error for the second user (the first user is able to commit properly). I'm invoking a cmd prompt on the server via C# code and running the svn commands there. I have gone through your discussions here, and I found them very technical. Could you kindly try to be a bit less technical when replying to my queries?

Sure, we will try. What are your queries?

Right now we don't know what exactly causes the bug; we are still investigating. When we make progress or find a fix, we will announce it. The best way to watch for such announcements is to register with the issue tracker and add yourself to the CC list on issue #4129 (link below):

http://subversion.tigris.org/issues/show_bug.cgi?id=4129

Please keep discussion of the bug on the mailing list, not on the issue tracker.

Cheers,

Daniel

On Wednesday, 8 February 2012 02:53:10 UTC+5:30, Jason Wong wrote:
Hello.

I have recently run into an issue with my Subversion system (1.7.1) where a specific component I am trying to build has failed. I have had successful builds of this project since we upgraded to 1.7.1, before this issue happened. I am currently working around this by using TortoiseSVN 1.7.1 to check in the files left in the workspace. This issue seems to be intermittent, as it does not happen every time.
I am using the Subversion 1.7.1 Windows binaries, and Subversion is hosted on Apache 2.2.21. I have run svnadmin verify against the directory and it has come back clean.

I was wondering what types of events can cause this to happen. Are there any resolutions?

Here are some entries from the Apache logs for the last time this issue happened. The Apache error log shows the following lines:

[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] Could not MERGE resource /repository/!svn/txn/61847-1bz5 into /repository/project/binaries/release/phase1/iteration/81/trunk. [409, #0]
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] An error occurred while committing the transaction. [409, #160004]
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count for the root node-revision is wrong: found 61815, committing r61852 [409, #160004]
[Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Could not fetch resource information. [404, #0]
[Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Named transaction doesn't exist. [404, #0]

The following are from the Apache access log at the same time:

9.31.13.109 - username [31/Jan/2012:11:37:22 -0800] MERGE /repository/project/binaries/release/phase1/iteration/81/trunk HTTP/1.1 409 281
9.31.13.109 - username [31/Jan/2012:11:37:24 -0800] DELETE /repository/!svn/txn/61847-1bz5 HTTP/1.1 404 232

Any help/comments would be appreciated. Thank you.

Jason Wong
Re: predecessor count for the root node-revision is wrong message
On Fri, Mar 2, 2012 at 8:12 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
I have had a developer here create a build of the latest SVN code with your changes you mentioned in r1294470 for the svnadmin verify

Okay, that's great news, for two reasons:

1. It means building svn on Windows isn't as painful as it used to be :)

Actually, it did take some work to get it going, as we did not have another system available to us and also did not have VC++ 6. We had to use VS 2010 in order to do this. Also, for the other components required (Python, Perl, etc.), the files were copied to the workstation after the install to see if it would work, as we did not want to change the current workstation configuration by running the installers. All in all, it did seem to work.

Okay. The normal build requires just the *.exe and *.dll files to be placed appropriately (such that the *.exe's and httpd's find their libsvn_* DLL's at runtime) --- it doesn't require Administrator access, for example. To clarify, Perl is only required to build OpenSSL; it is not required to build APR, Neon, or Subversion.

2. It means I can ask you to build a custom server with the 'inprocess' cache disabled, or (if all else fails) to bisect, per my previous email.

One of the things you could try is to disable caching: simply modify the function create_cache() in libsvn_fs_fs/caching.c to always return NULL in *CACHE_P. See below for another suggestion.

command.

We have run 'svnadmin verify' against every revision of the hotcopy of our repository that we took when we first brought this issue to the forums, and are now tracking down each of the revisions to see what actions were being done at those times.

Thanks! I do hope this work enables us to pinpoint and fix the bug.
I will be going through the list to see what else was happening at the same time on the Apache server, since it was suggested that there may be concurrency issues. I know that the last two times this error popped up, we had two svn operations starting at around the same time according to the Apache logs. I will go through the previous Apache history to see whether this was always the case.

Thanks, looking forward to hearing what you come up with. FWIW, Justin's reply suggests that the error was seen on three different platforms --- Windows, Solaris, and FreeBSD --- so that should narrow down the range of possible explanations. (I'll also note that at ASF's installation we are not running into new instances of the bug.)

Hi Daniel.

I haven't gone through all the cases yet, but I have made progress through quite a number of them and a pattern seems to be emerging.

I have attached 2 txt files. One shows the modified svnadmin verify output from the binaries we built. The other shows the revisions and what appears to have been occurring at the time of the bug. I figured it was better to provide this now rather than delay any longer waiting for the rest of the results. I will continue to go through the rest of the events and see if there are other differences when the issue occurs. I hope this information helps. Thanks.
Jason

SVN log history for predecessor node error: from svnadmin verify
svnadmin: E160004: predecessor count for the root node-revision is wrong: r45558 has 45557, but r45557 has 45557
svnadmin: E160004: predecessor count for the root node-revision is wrong: r46947 has 46945, but r46946 has 46945
svnadmin: E160004: predecessor count for the root node-revision is wrong: r46997 has 46994, but r46996 has 46994
svnadmin: E160004: predecessor count for the root node-revision is wrong: r47004 has 47000, but r47003 has 47000
svnadmin: E160004: predecessor count for the root node-revision is wrong: r47006 has 47001, but r47005 has 47001
svnadmin: E160004: predecessor count for the root node-revision is wrong: r47193 has 47187, but r47192 has 47187
svnadmin: E160004: predecessor count for the root node-revision is wrong: r47715 has 47708, but r47714 has 47708
svnadmin: E160004: predecessor count for the root node-revision is wrong: r47718 has 47710, but r47717 has 47710
svnadmin: E160004: predecessor count for the root node-revision is wrong: r50049 has 50040, but r50048 has 50040
svnadmin: E160004: predecessor count for the root node-revision is wrong: r50963 has 50953, but r50962 has 50953
svnadmin: E160004: predecessor count for the root node-revision is wrong: r51481 has 51470, but r51480 has 51470
svnadmin: E160004: predecessor count for the root node-revision is wrong: r51684 has 51672, but r51683 has 51672
svnadmin: E160004: predecessor count for the root node-revision is wrong: r52082 has 52069, but r52081 has 52069
svnadmin: E160004: predecessor count for the root node-revision is wrong: r53220 has 53205, but
Re: predecessor count for the root node-revision is wrong message
Justin Johnson wrote on Thu, Mar 01, 2012 at 07:45:08 -0600:
On Wed, Feb 29, 2012 at 4:14 PM, Justin Johnson justinandto...@gmail.com wrote:
On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote:
... so please try SVNInMemoryCacheSize 0, and see if that makes the issue less frequent.

I'm a dork. I will do so once I take care of the appropriate change control I have to deal with. Thanks.

We made the change and the problem is still occurring.

OK. So that means that either membuffer isn't the cause of the error, or both membuffer and inprocess have (independent) bugs.

That said, I'm happy to see that it took you less than a day to report that the bug no longer reproduces. Can you reproduce it at will?

---

I'm not completely sure what to try next. Were there any other significant changes in 1.6.x-1.7.x besides caching? If so, the next steps would be to disable _both_ inprocess and membuffer, or to bisect. Unfortunately, the latter requires building svn from source, and the former requires either that, or having you upgrade to specially-built binaries, or squeezing the 'disable inprocess' feature into 1.7.4 (in a way that doesn't break our compat policy too badly) and having you upgrade to that.

devs/all --- thoughts? Is there a simpler way forward than the two potential 'next steps' I've outlined here?
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Wed, Feb 29, 2012 at 18:15:41 +0200: Justin, Jason, Some things you could do are: - What RA method do you use? svn:// or http://? Justin, what operating system does your server run?
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
I have had a developer here create a build of the latest SVN code with your changes you mentioned in r1294470 for the svnadmin verify

Okay, that's great news, for two reasons:

1. It means building svn on Windows isn't as painful as it used to be :)

2. It means I can ask you to build a custom server with the 'inprocess' cache disabled, or (if all else fails) to bisect, per my previous email.

One of the things you could try is to disable caching: simply modify the function create_cache() in libsvn_fs_fs/caching.c to always return NULL in *CACHE_P. See below for another suggestion.

command.

We have run 'svnadmin verify' against every revision of the hotcopy of our repository that we took when we first brought this issue to the forums, and are now tracking down each of the revisions to see what actions were being done at those times.

Thanks! I do hope this work enables us to pinpoint and fix the bug.

From the results, we see 25 'predecessor count is wrong' error messages, and the first one appeared on January 26, 2011. Near that time the following events occurred:

Jan. 14, 2011 - svn upgraded from 1.6.6 to 1.6.15
Jan. 14, 2011 - Apache HTTP server upgraded from 2.2.15 to 2.2.17
Jan. 21, 2011 - repository was pruned to delete some binary files

Between January and our upgrade to 1.7.1 in December, we have had about 14,000 revisions and seen only 25 instances of this node-revision issue. During the times we had these errors, we were using svn versions 1.6.15 and 1.6.16.

Thanks, very valuable information. I've reviewed the 1.6.6-1.6.15 diff, and I have the following suggestions:

- Change subversion/libsvn_fs_fs/fs.h such that SVN_FS_FS__USE_LOCK_MUTEX is set to 1. It was set to 1 in 1.6.6 but to 0 in 1.6.15. (This wouldn't explain why ASF saw it, but it might explain why you're seeing it.)

Fail2ban, from what I could find, does not look like it has a Windows port, and Windows is what my production environment is currently hosted on.
Yeah, sorry. But you can write a cron job -- I mean, a Scheduled Task -- that greps your error logs for 160004 every night and mails you if it finds anything, right? That's the error code to watch for; it covers many FS error conditions:
% ./tools/dev/which-error.py E160004
00160004 SVN_ERR_FS_CORRUPT
Thanks. Jason For convenience I'm attaching a patch that implements both of my suggestions. Please let us know if it has any effect. Cheers, Daniel
Index: subversion/libsvn_fs_fs/fs.h
===================================================================
--- subversion/libsvn_fs_fs/fs.h (revision 1295418)
+++ subversion/libsvn_fs_fs/fs.h (working copy)
@@ -161,7 +161,7 @@ typedef struct fs_fs_shared_txn_data_t
    per file. On Windows apr implements the locking as per file handle
    locks, so we don't have to add our own mutex for just in-process
    synchronization. */
-#if APR_HAS_THREADS && !defined(WIN32)
+#if APR_HAS_THREADS /* disabled: and !defined(WIN32) */
 #define SVN_FS_FS__USE_LOCK_MUTEX 1
 #else
 #define SVN_FS_FS__USE_LOCK_MUTEX 0
Index: subversion/libsvn_fs_fs/caching.c
===================================================================
--- subversion/libsvn_fs_fs/caching.c (revision 1295418)
+++ subversion/libsvn_fs_fs/caching.c (working copy)
@@ -209,6 +209,9 @@ create_cache(svn_cache__t **cache_p,
              const char *prefix,
              apr_pool_t *pool)
 {
+  *cache_p = NULL;
+  return SVN_NO_ERROR;
+
   if (memcache)
     {
       SVN_ERR(svn_cache__create_memcache(cache_p, memcache,
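A minimal sketch of the nightly scan suggested above, written in Python so the same script can be run by Windows Task Scheduler or by cron. It assumes the httpd error log records the code as `#160004`, as in the log lines quoted elsewhere in this thread; the log path comes from the command line, and actually mailing the result is left to whatever wrapper runs the task.

```python
import re
import sys

# The httpd/mod_dav_svn error-log lines quoted in this thread end with
# "[409, #160004]"; 160004 is SVN_ERR_FS_CORRUPT.
FS_CORRUPT = re.compile(r"#160004\b")


def find_fs_corrupt(lines):
    """Return every log line that mentions error code 160004."""
    return [line.rstrip("\n") for line in lines if FS_CORRUPT.search(line)]


def main(log_path):
    # errors="replace" keeps one undecodable byte from aborting the scan.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        hits = find_fs_corrupt(log)
    for hit in hits:
        print(hit)
    # A non-zero exit status lets the scheduled task (or a mail wrapper)
    # notice that something was found.
    return 1 if hits else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it as, for example, `python scan_fs_corrupt.py <path-to-error.log>` (the script name is hypothetical).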
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Fri, Mar 02, 2012 at 12:21:46 +0200: Daniel Shahaf wrote on Wed, Feb 29, 2012 at 18:15:41 +0200: Justin, Jason, Some things you could do are: - What RA method do you use? svn:// or http://? Justin, what operating system does your server run? ... and what versions of httpd (Apache) and APR.
Re: predecessor count for the root node-revision is wrong message
On Fri, Mar 2, 2012 at 3:41 AM, Daniel Shahaf danie...@elego.de wrote: Justin Johnson wrote on Thu, Mar 01, 2012 at 07:45:08 -0600: On Wed, Feb 29, 2012 at 4:14 PM, Justin Johnson justinandto...@gmail.com wrote: On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote: ... so please try SVNInMemoryCacheSize 0, and see if that makes the issue less frequent. I'm a dork. I will do so once I take care of the appropriate change control I have to deal with. Thanks. We made the change and the problem is still occurring. OK. So that means that either membuffer isn't the cause of the error, or both membuffer and inprocess have (independent) bugs. That said, I'm happy to see that it took you less than a day to report that the bug no longer reproduces. Can you reproduce it at will? No, but our server has about 2500 active users on it, so it doesn't take long to see errors. If we make a configuration change and a day goes by without the error showing up in the logs, I would suspect the problem was fixed. If a week were to pass with no error I would feel much more confident. Yesterday had 6 of the errors and 1 today so far.
Re: predecessor count for the root node-revision is wrong message
On Fri, Mar 2, 2012 at 6:21 AM, Daniel Shahaf danie...@elego.de wrote: Daniel Shahaf wrote on Fri, Mar 02, 2012 at 12:21:46 +0200: Daniel Shahaf wrote on Wed, Feb 29, 2012 at 18:15:41 +0200: Justin, Jason, Some things you could do are: - What RA method do you use? svn:// or http://? Justin, what operating system does your server run? ... and what versions of httpd (Apache) and APR. Solaris 10, APR 1.4.5, httpd 2.2.19
Re: predecessor count for the root node-revision is wrong message
On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800: I have had a developer here create a build of the latest SVN code with your changes you mentioned in r1294470 for the svnadmin verify Okay, that's great news, for two reasons: 1. It means building svn on windows isn't as painful as it used to be :) Actually, it did take some work to get it going as we did not have another system available to us and also did not have VC++ 6. We had to use VS 2010 in order to do this. Also, for the other components required (Python, Perl, etc.), the files after the install were copied to the workstation to see if it would work as we did not want to change the current workstation configuration by running the installers. All in all, it did seem to work. 2. It means I can ask you to build a custom server with the 'inprocess' cache disabled, or (if all else fails) to bisect, per my previous email. One of the things you could try is to disable caching: simply modify the function create_cache() in libsvn_fs_fs/caching.c to always return NULL in *CACHE_P. See below for another suggestion. command. We have run 'svnadmin verify' against every revision of our hotcopy of our repository taken when we first brought this issue to the forums and are now tracking down each of the revisions to see what actions were being done at those times. Thanks! I do hope this work enables us to pinpoint and fix the bug. I will be going through the list to see what else was happening at the same time on the Apache server since it was alluded to that there may be concurrency issues. I know the last two times that this error has popped up, we had two svn operations starting at around the same time according to the Apache logs. I will go through the previous Apache history to see if this was always the case or not. From the results, we see 25 error messages for predecessor count is wrong and the first one appeared on January 26, 2011.
Near that time the following events occurred: Jan. 14, 2011 - svn upgraded from 1.6.6 to 1.6.15 Jan. 14, 2011 - Apache HTTP server upgraded from 2.2.15 to 2.2.17 Jan. 21, 2011 - repository was pruned to delete some binary files. Between January and our upgrade in Dec. to 1.7.1, we have had about 14,000 revisions and seen only 25 instances of this node revision issue. During the times we had these errors, we were using svn versions 1.6.15 and 1.6.16. Thanks, very valuable information. I've reviewed the 1.6.6-1.6.15 diff, and I have the following suggestions: - Change subversion/libsvn_fs_fs/fs.h such that SVN_FS_FS__USE_LOCK_MUTEX is set to 1. It was set to 1 in 1.6.6 but to 0 in 1.6.15. (This wouldn't explain why ASF saw it, but it might explain why you're seeing it.) Fail2ban from what I could find does not look like it has a Windows port which I currently have my production environment hosted on. Yeah, sorry. But you can write a cron job -- I mean, a Scheduled Task -- that greps your error logs for 160004 every night and mails you if it finds anything, right? That's the error code to watch for; it covers many FS error conditions: % ./tools/dev/which-error.py E160004 00160004 SVN_ERR_FS_CORRUPT I will look into it. We did ask developers to note any error messages that they see from TortoiseSVN now as the last time we saw the error message pop up, we asked the developer what happened and he said that an error message popped up and he just tried to check in again and it worked. We will note the exact message next time. Thanks. Jason For convenience I'm attaching a patch that implements both of my suggestions. Let us know please if it has any effect. I will forward this to the developer to look at. Cheers, Daniel Hi. See replies above. I will post what we find. Thanks. Jason
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800: On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800: I have had a developer here create a build of the latest SVN code with your changes you mentioned in r1294470 for the svnadmin verify Okay, that's great news, for two reasons: 1. It means building svn on windows isn't as painful as it used to be :) Actually, it did take some work to get it going as we did not have another system available to us and also did not have VC++ 6. We had to use VS 2010 in order to do this. Also, for the other components required (Python, Perl, etc.), the files after the install were copied to the workstation to see if it would work as we did not want to change the current workstation configuration by running the installers. All in all, it did seem to work. Okay. The normal build requires just the *.exe and *.dll files to be placed appropriately (such that the *.exe's and httpd's find their libsvn_* DLL's at runtime) --- it doesn't require Administrator access, for example. To clarify, Perl is only required to build OpenSSL; it is not required to build APR, Neon, or Subversion. 2. It means I can ask you to build a custom server with the 'inprocess' cache disabled, or (if all else fails) to bisect, per my previous email. One of the things you could try is to disable caching: simply modify the function create_cache() in libsvn_fs_fs/caching.c to always return NULL in *CACHE_P. See below for another suggestion. command. We have run 'svnadmin verify' against every revision of our hotcopy of our repository taken when we first brought this issue to the forums and are now tracking down each of the revisions to see what actions were being done at those times. Thanks! I do hope this work enables us to pinpoint and fix the bug.
I will be going through the list to see what else was happening at the same time on the Apache server since it was alluded to that there may be concurrency issues. I know the last two times that this error has popped up, we had two svn operations starting at around the same time according to the Apache logs. I will go through the previous Apache history to see if this was always the case or not. Thanks, looking forward to hearing what you come up with. FWIW, Justin's reply suggests that the error was seen on three different platforms --- Windows, Solaris, and FreeBSD --- so that should narrow down the range of possible explanations. (I'll also note that at ASF's installation we are not running into new instances of the bug.)
Re: predecessor count for the root node-revision is wrong message
On Wed, Feb 29, 2012 at 4:14 PM, Justin Johnson justinandto...@gmail.com wrote: On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote: Justin Johnson wrote on Wed, Feb 29, 2012 at 11:11:18 -0600: On Wed, Feb 29, 2012 at 10:35 AM, Daniel Shahaf danie...@elego.de wrote: Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600: On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote: - Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? As mentioned before, so far it appears to be 1) create a tag by copying an entire working copy of a branch to a URL, and 2) commit merge results for an entire branch. That's not clear enough. Could you show 'log -qv' of those revisions? A wc-to-URL copy could touch just one or two files (compare `svn log -qv --stop-on-copy http://svn.apache.org/repos/asf/subversion/tags/1.7.3`) or a full tree (http://subversion.apache.org/faq.html#in-place-import). Which is it? The commits fail, so there is no revision to run this against. Other tags that have succeeded seem to just have one added path that is a copy of the branch at revision x. Does that answer your question? Yes, thanks. - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) Apache is our server, so this is not applicable. SVNInMemoryCacheSize is applicable. Sorry, I missed that one. We have not specified SVNInMemoryCacheSize, so we're using the default. ... so please try SVNInMemoryCacheSize 0, and see if that makes the issue less frequent. I'm a dork. I will do so once I take care of the appropriate change control I have to deal with. Thanks. We made the change and the problem is still occurring.
Re: predecessor count for the root node-revision is wrong message
- Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? As mentioned before, so far it appears to be 1) create a tag by copying an entire working copy of a branch to a URL, and 2) commit merge results for an entire branch. That's not clear enough. Could you show 'log -qv' of those revisions? A wc-to-URL copy could touch just one or two files (compare `svn log -qv --stop-on-copy http://svn.apache.org/repos/asf/subversion/tags/1.7.3`) or a full tree (http://subversion.apache.org/faq.html#in-place-import). Which is it? The commits fail, so there is no revision to run this against. Other tags that have succeeded seem to just have one added path that is a copy of the branch at revision x. Does that answer your question? Yes, thanks. - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) Apache is our server, so this is not applicable. SVNInMemoryCacheSize is applicable. Sorry, I missed that one. We have not specified SVNInMemoryCacheSize, so we're using the default. ... so please try SVNInMemoryCacheSize 0, and see if that makes the issue less frequent. I'm a dork. I will do so once I take care of the appropriate change control I have to deal with. Thanks. We made the change and the problem is still occurring. The error occurs when a tag is created by copying from a URL to a URL as well. Note that the clients are all 1.6 at this time. If it is helpful I can try to set up some tests with a 1.7 client. To make sure I understand the issue, should I be concerned about the repositories and our ability to reproduce the history or recover from any corruption that this bug may have caused? Let me know if I can help in any other way. Thanks. Justin
Re: predecessor count for the root node-revision is wrong message
Justin Johnson wrote on Thu, Mar 01, 2012 at 08:28:20 -0600: To make sure I understand the issue, should I be concerned about the repositories and our ability to reproduce the history or recover from any corruption that this bug may have caused? The only known (and predicted) effect of the error is that some revisions are wrongly skipped during a backward history walk --- such as 'log -r HEAD:0' does. Creating an svnsync mirror is enough to fix the issue. (There is no need to check the mirror, even: if the mirror is written to by a 1.7 server (or a 1.7 svnsync over file://), the normal validation that Jason's error logs show will be done automatically.)
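A sketch of creating such a mirror with svnsync, driven from Python. The paths and URLs are placeholders, `svnadmin` and `svnsync` are assumed to be on PATH, and the permissive pre-revprop-change hook is written for unix-like systems only (on Windows the equivalent would be a `hooks\pre-revprop-change.bat` that simply exits 0). svnsync needs that hook because it records its bookkeeping, and the copied revision properties, as revprops on the mirror.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def mirror_commands(source_url: str, mirror_path: Path) -> list[list[str]]:
    """Commands that create an empty repository and svnsync it from source_url."""
    mirror_url = mirror_path.resolve().as_uri()  # file:// URL of the mirror
    return [
        ["svnadmin", "create", str(mirror_path)],
        ["svnsync", "initialize", mirror_url, source_url],
        ["svnsync", "synchronize", mirror_url],
    ]


def allow_revprop_changes(mirror_path: Path) -> None:
    """Install a pre-revprop-change hook that accepts every change
    (unix-like systems only)."""
    hook = mirror_path / "hooks" / "pre-revprop-change"
    hook.write_text("#!/bin/sh\nexit 0\n")
    hook.chmod(0o755)


def create_mirror(source_url: str, mirror_path: Path) -> None:
    create, *sync = mirror_commands(source_url, mirror_path)
    subprocess.run(create, check=True)
    allow_revprop_changes(mirror_path)  # must exist before svnsync initialize
    for cmd in sync:
        subprocess.run(cmd, check=True)
```

For example, `create_mirror("file:///srv/svn/repo", Path("/srv/svn/mirror2"))`, where both locations are placeholders.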
Re: predecessor count for the root node-revision is wrong message
On Tue, Feb 28, 2012 at 3:07 AM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800: On Thu, Feb 16, 2012 at 12:14 PM, Daniel Shahaf danie...@elego.de wrote: The output from these two tells me two things: 1. The minfo-cnt value is reasonable (within a typical ballpark). That's relevant since minfo-cnt abnormalities were seen in another instance of the bug. 2. Everything else looks correct: the 'id:'/'pred:' headers are accurate, and the 'count:' header was incremented correctly. The 'count:' header does, however, indicate that your repository has _in the past_ triggered an instance of the bug. This is true. We have seen the bug happen before. The first occurrence of this that we had seen was Dec. 7th, 2011, a few days after we went from 1.6.16 to 1.7.1. That was the first time we had seen that happen. At the time, we did not know about the cause and the developer who had encountered the error didn't report it and was able to work Well, install fail2ban and have it mail you when that string appears in the logs? I'll do so too... around it. From the Apache logs we have: [Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor count for the root node-revision is wrong: found 59444, committing r59478 [409, #160004] [Wed Dec 07 15:33:47 2011] [error] [client 10.2.3.2] predecessor count for the root node-revision is wrong: found 59482, committing r59516 [409, #160004] [Wed Dec 07 15:35:19 2011] [error] [client 10.2.3.3] predecessor count for the root node-revision is wrong: found 59488, committing r59522 [409, #160004] As Stefan mentioned, these represent commit attempts that were rejected in order to prevent a new instance of the bug from entering the history. [Wed Dec 07 15:44:10 2011] [error] [client 10.2.3.4] predecessor count for the root node-revision is wrong: found 59505, committing r59539 [409, #160004] Of the IPs above, the last line is from the build machine. The others were from developer workstations.
I mentioned the most recent two times first as we were actually aware of the issue at that time and it was recent so we knew to start looking into it. Between Dec. 7 and Jan. 31, the bug has occurred 12 times, 3 of those times from the build server. The rest are from workstations. This month, it has only occurred once and it was from the build server. What percentage of your commits are from the build server? Is there anything noteworthy about commits that were in progress around the time the bug occurred? (their svn:date's would be near the time stamp in the httpd log) Each of these times, the error has occurred in different parts of the repository. Replies above. Sorry about the delay in replying. I have been really busy of late. I will try and get the results this week, if not, it will most likely be next week. No problem. I just discovered we are having this problem as well. Since upgrading from 1.6.17 to 1.7.2 on the evening of 2/21 we have seen this error on 10 different machines. 9 of those are build machines, so they would have been creating tags. I have a suspicion that all of the builds were trying to create tags by copying from a working copy as opposed to a URL to URL copy, but I need to confirm. I know at least some of them were. The other 1 machine received the error when trying to commit merge results. The details of the merge are not available to me since the user reverted the local merge results. Let me know if I can provide any more info to help solve this problem. Justin
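To look for commits in progress around a failure, one rough approach is to convert the httpd error-log timestamp and each candidate revision's svn:date to UTC, then list the revisions that fall within a few minutes of the failure. This sketch assumes you know the server's UTC offset (the error-log stamps shown in this thread carry no time zone, while svn:date is recorded in UTC), an English/C locale for the day and month names, and an arbitrarily chosen five-minute window. The svn:date values could be collected with `svn propget svn:date --revprop -r REV URL` or from `svn log --xml`.

```python
from datetime import datetime, timedelta, timezone


def parse_httpd_time(stamp, utc_offset):
    """'[Wed Dec 07 15:16:36 2011]' (server-local time) -> aware UTC datetime.

    utc_offset is the server's offset from UTC, e.g. timedelta(hours=-8).
    """
    local = datetime.strptime(stamp.strip("[]"), "%a %b %d %H:%M:%S %Y")
    return (local - utc_offset).replace(tzinfo=timezone.utc)


def parse_svn_date(value):
    """svn:date values look like '2011-12-07T23:16:30.123456Z' (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
        tzinfo=timezone.utc)


def commits_near(failure, revisions, window=timedelta(minutes=5)):
    """Return the revisions whose svn:date is within +/- window of failure.

    revisions is an iterable of (revision_number, svn_date_string) pairs.
    """
    return [rev for rev, date in revisions
            if abs(parse_svn_date(date) - failure) <= window]
```

The window is a judgment call: too narrow and slow commits are missed, too wide and unrelated commits show up.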
Re: predecessor count for the root node-revision is wrong message
Justin, Jason, Some things you could do are: - What RA method do you use? svn:// or http://? - Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? - Do you have any caching enabled at the OS filesystem layer or below it? - Did you configure Subversion to use memcached? - Did you configure a maximum cache size for Subversion? - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) This doesn't entirely disable caching, actually -- it just disables the new 'membuffer' cache backend, reverting to another older backend called 'inprocess'. @Justin -- thanks for the information. I've asked Jason upthread some questions (eg, whether there are patterns of commits concurrent to the failing ones), which likely apply to you too. Daniel Justin Johnson wrote on Wed, Feb 29, 2012 at 09:22:40 -0600: On Tue, Feb 28, 2012 at 3:07 AM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800: Replies above. Sorry about the delay in replying. I have been really busy of late. I will try and get the results this week, if not, it will most likely be next week. No problem. I just discovered we are having this problem as well. Since upgrading from 1.6.17 to 1.7.2 on the evening of 2/21 we have seen this error on 10 different machines. 9 of those are build machines, so they would have been creating tags. I have a suspicion that all of the builds were trying to create tags by copying from a working copy as opposed to a URL to URL copy, but I need to confirm. I know at least some of them were. The other 1 machine received the error when trying to commit merge results. The details of the merge are not available to me since the user reverted the local merge results. Let me know if I can provide any more info to help solve this problem. Justin
Re: predecessor count for the root node-revision is wrong message
On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote: Justin, Jason, Some things you could do are: - What RA method do you use? svn:// or http://? http:// - Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? As mentioned before, so far it appears to be 1) create a tag by copying an entire working copy of a branch to a URL, and 2) commit merge results for an entire branch. - Do you have any caching enabled at the OS filesystem layer or below it? No. - Did you configure Subversion to use memcached? No. - Did you configure a maximum cache size for Subversion? No. - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) Apache is our server, so this is not applicable. This doesn't entirely disable caching, actually -- it just disables the new 'membuffer' cache backend, reverting to another older backend called 'inprocess'. @Justin -- thanks for the information. I've asked Jason upthread some questions (eg, whether there are patterns of commits concurrent to the failing ones), which likely apply to you too. Daniel Justin Johnson wrote on Wed, Feb 29, 2012 at 09:22:40 -0600: On Tue, Feb 28, 2012 at 3:07 AM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800: Replies above. Sorry about the delay in replying. I have been really busy of late. I will try and get the results this week, if not, it will most likely be next week. No problem. I just discovered we are having this problem as well. Since upgrading from 1.6.17 to 1.7.2 on the evening of 2/21 we have seen this error on 10 different machines. 9 of those are build machines, so they would have been creating tags. I have a suspicion that all of the builds were trying to create tags by copying from a working copy as opposed to a URL to URL copy, but I need to confirm. I know at least some of them were.
The other 1 machine received the error when trying to commit merge results. The details of the merge are not available to me since the user reverted the local merge results. Let me know if I can provide any more info to help solve this problem. Justin
Re: predecessor count for the root node-revision is wrong message
Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600: On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote: - Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? As mentioned before, so far it appears to be 1) create a tag by copying an entire working copy of a branch to a URL, and 2) commit merge results for an entire branch. That's not clear enough. Could you show 'log -qv' of those revisions? A wc-to-URL copy could touch just one or two files (compare `svn log -qv --stop-on-copy http://svn.apache.org/repos/asf/subversion/tags/1.7.3`) or a full tree (http://subversion.apache.org/faq.html#in-place-import). Which is it? - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) Apache is our server, so this is not applicable. SVNInMemoryCacheSize is applicable.
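Once a tag commit succeeds, the 'log -qv' question can be answered mechanically by counting the changed paths and how many of them are copies. This sketch assumes the usual `svn log -qv -c REV URL` layout, where each changed path is an indented line such as `A /tags/foo (from /branches/bar:1234)`. As a rough heuristic, a single copied path suggests a plain branch-to-tag copy, while many paths suggest a full-tree commit. Feed it `-q` output so that log-message lines are not mistaken for paths.

```python
import re

# Changed-path lines in 'svn log -v' output look like
#    A /tags/foo (from /branches/bar:1234)
# i.e. indentation, an action letter, the path, and an optional copy source.
CHANGED_PATH = re.compile(r"^\s+([AMDR]) (\S.*?)(?: \(from (.+):(\d+)\))?$")


def summarize_changed_paths(log_text):
    """Return (number_of_changed_paths, number_of_copied_paths)."""
    paths, copies = 0, 0
    for line in log_text.splitlines():
        m = CHANGED_PATH.match(line)
        if m:
            paths += 1
            if m.group(3):  # the "(from SRC:REV)" part was present
                copies += 1
    return paths, copies
```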
Re: predecessor count for the root node-revision is wrong message
Justin Johnson wrote on Wed, Feb 29, 2012 at 11:11:18 -0600: On Wed, Feb 29, 2012 at 10:35 AM, Daniel Shahaf danie...@elego.de wrote: Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600: On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote: - Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? As mentioned before, so far it appears to be 1) create a tag by copying an entire working copy of a branch to a URL, and 2) commit merge results for an entire branch. That's not clear enough. Could you show 'log -qv' of those revisions? A wc-to-URL copy could touch just one or two files (compare `svn log -qv --stop-on-copy http://svn.apache.org/repos/asf/subversion/tags/1.7.3`) or a full tree (http://subversion.apache.org/faq.html#in-place-import). Which is it? The commits fail, so there is no revision to run this against. Other tags that have succeeded seem to just have one added path that is a copy of the branch at revision x. Does that answer your question? Yes, thanks. - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) Apache is our server, so this is not applicable. SVNInMemoryCacheSize is applicable. Sorry, I missed that one. We have not specified SVNInMemoryCacheSize, so we're using the default. ... so please try SVNInMemoryCacheSize 0, and see if that makes the issue less frequent.
Re: predecessor count for the root node-revision is wrong message
On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote: Justin Johnson wrote on Wed, Feb 29, 2012 at 11:11:18 -0600: On Wed, Feb 29, 2012 at 10:35 AM, Daniel Shahaf danie...@elego.de wrote: Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600: On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote: - Are the failing revisions always small (eg: just a URL-URL copy), or always large (eg: results of a merge)? As mentioned before, so far it appears to be 1) create a tag by copying an entire working copy of a branch to a URL, and 2) commit merge results for an entire branch. That's not clear enough. Could you show 'log -qv' of those revisions? A wc-to-URL copy could touch just one or two files (compare `svn log -qv --stop-on-copy http://svn.apache.org/repos/asf/subversion/tags/1.7.3`) or a full tree (http://subversion.apache.org/faq.html#in-place-import ). Which is it? The commits fail, so there is no revision to run this against. Other tags that have succeeded seem to just have one added path that is a copy of the branch at revision x. Does that answer your question? Yes, thanks. - Could you try setting the maximum cache size to zero? (svnserve: --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0) Apache is our server, so this is not applicable. SVNInMemoryCacheSize is applicable. Sorry, I missed that one. We have not specified SVNInMemoryCacheSize, so we're using the default. ... so please try SVNInMemoryCacheSize 0, and see if that makes the issue less frequent. I'm a dork. I will do so once I take care of the appropriate change control I have to deal with. Thanks.
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Tue, Feb 28, 2012 at 07:17:04 +0200: Stefan Sperling wrote on Tue, Feb 28, 2012 at 03:18:35 +0100: On Mon, Feb 27, 2012 at 03:25:00PM -0800, Jason Wong wrote: I guess I am wondering that if this is the case, then why is it that if the check-in fails, and then we manually check it in again using TortoiseSVN, that it works the second time? As far as I know, we don't know what the underlying problem is yet. So I'm afraid I cannot answer all of your questions in a satisfactory manner because I don't know the answers. I'm not really surprised by this situation. I guess Jason means the same set of tree/text/prop/revprop changes when he says it, but by the nature of the bug it may well have to do with some concurrency issues on the server --- i.e., it depends on more than just the contents being committed. Daniel might know more -- he has been following the problem more closely than me. And I hope he will correct me if I'm making any wrong or misleading statements in this post :) Yeah, I'm following this thread :) We have been using svn for a while now and I am wondering what this means that for 1.6, that this issue has been occurring from communications between 1.6 client and 1.6 server. It has probably been happening since before the 1.7 release. The problem was first discovered in the ASF repository. The first occurrence there has been traced to a time before 1.7.0 was released. At the time the ASF server was still running 1.6, and the first commit that introduced the problem in the ASF repository was very likely a 1.6 client. AI We should check whether it _was_ a 1.6 client. Also, is this bug something that svnadmin verify will not detect? The last time we ran svnadmin verify, it said all was good. Apparently, svnadmin verify won't find it. This should probably be fixed. But the priority right now is to understand why it is happening in the first place. AI The 'verify' part is a relatively low-hanging fruit, though. Done: r1294479 and followups.
See also r1294470 which improves the error message. The latter will hopefully be in 1.7.4. svn_fs_verify() is not present in 1.7.x so the r1294479 set of changes cannot easily be backported. AI File an issue for the pred-count bug Issue #4129: http://subversion.tigris.org/issues/show_bug.cgi?id=4129
Re: predecessor count for the root node-revision is wrong message
On Thu, Feb 16, 2012 at 12:14 PM, Daniel Shahaf danie...@elego.de wrote: The output from these two tells me two things: 1. The minfo-cnt value is reasonable (within a typical ballpark). That's relevant since minfo-cnt abnormalities were seen in another instance of the bug. 2. Everything else looks correct: the 'id:'/'pred:' headers are accurate, and the 'count:' header was incremented correctly. The 'count:' header does, however, indicate that your repository has _in the past_ triggered an instance of the bug. This is true. We have seen the bug happen before. The first occurrence of this that we had seen was Dec. 7th, 2011, a few days after we went from 1.6.16 to 1.7.1. That was the first time we had seen that happen. At the time, we did not know about the cause and the developer who had encountered the error didn't report it and was able to work around it. From the Apache logs we have: [Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor count for the root node-revision is wrong: found 59444, committing r59478 [409, #160004] [Wed Dec 07 15:33:47 2011] [error] [client 10.2.3.2] predecessor count for the root node-revision is wrong: found 59482, committing r59516 [409, #160004] [Wed Dec 07 15:35:19 2011] [error] [client 10.2.3.3] predecessor count for the root node-revision is wrong: found 59488, committing r59522 [409, #160004] [Wed Dec 07 15:44:10 2011] [error] [client 10.2.3.4] predecessor count for the root node-revision is wrong: found 59505, committing r59539 [409, #160004] Of the IPs above, the last line is from the build machine. The others were from developer workstations. I mentioned the most recent two times first as we were actually aware of the issue at that time and it was recent so we knew to start looking into it. Between Dec. 7 and Jan. 31, the bug has occurred 12 times, 3 of those times from the build server. The rest are from workstations. This month, it has only occurred once and it was from the build server.
Each of these times, the error has occurred in different parts of the repository. In a bit more detail: the value of the 'count:' header should be equal to the revision number given as the third argument to dump-noderev.pl. (That revision number is also embedded in the 'id:' header, and is practically guaranteed to be embedded in the 'text:' header as well.) So, there are two things you can do to help us identify the bug: 1. Hunt for past instances of the bug, identify what revisions triggered it, and try to identify a common pattern to those revisions. (This basically calls for running 'dump-noderev.pl $REPOS /' in a loop and looking for non-sequential 'count:' or 'pred:' headers in the output for a pair of successive revisions.) I will try and see if I can get this done this week. 2. Look for new instances of the bug. You could periodically scan for new instances of the bug, or implement a post-commit hook such as the following (written for unix-like systems, sorry):
[[[
#!/bin/sh
# Subversion passes the repository path and the new revision number.
REPOS="$1"
REV="$2"

# look for a corruption or two
minfo_cnt() {
  dump-noderev.pl "$REPOS" / "$1" | sed -ne 's/minfo-cnt: //p'
}
PREV_REV=`expr $REV - 1`
if expr `minfo_cnt $PREV_REV` - `minfo_cnt $REV` | grep ... > /dev/null; then
  # echo an error to stderr and mail the admin
  exit 1
fi
skipped_root_noderevs() {
  expr $1 - `dump-noderev.pl "$REPOS" / $1 | sed -ne 's/^count: //p'`
}
if [ `skipped_root_noderevs $PREV_REV` -ne `skipped_root_noderevs $REV` ]; then
  # echo an error to stderr and mail the admin
  exit 2
fi
]]]
I will talk to the build team here about the post-commit hook. We have had the bug occur again since my last reply. Replied above. The summary is that you have indeed run into the bug, but for some reason not in r61852 but sometime before that (and why did r61852 trigger the syslog error anyway? Good question), and now we're at the point of trying to identify the cause of the bug --- at least circumstantially. Thanks for your help so far, Daniel Hi Daniel. Replies above. Sorry about the delay in replying.
I have been really busy of late. I will try to get the results this week; if not, it will most likely be next week. Thanks Jason.
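Daniel's first suggestion above (run 'dump-noderev.pl $REPOS /' over successive revisions and look for a non-sequential 'count:' header) could be scripted roughly as follows. This is a minimal sketch, assuming dump-noderev.pl is on the PATH, takes the repository path, the fspath and a revision as in the commands quoted in this thread, and prints each header on its own line; the function names `count_offset` and `hunt_skips` are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes dump-noderev.pl (the script from this thread) is on
# PATH and prints one "header: value" line per node-revision header.

# Read dump-noderev.pl output on stdin and print REV minus the 'count:'
# value. For the root node, 0 means "no skew"; a non-zero value that stays
# constant means the skew was inherited from an earlier revision.
count_offset() {
  count=$(sed -n 's/^count: //p')
  if [ -z "$count" ]; then
    echo "no 'count:' header in input" >&2
    return 1
  fi
  echo $(( $1 - count ))
}

# hunt_skips REPOS FIRST LAST
# Print every revision in FIRST+1..LAST whose root offset differs from the
# offset of the revision before it, i.e. a candidate trigger of the bug.
hunt_skips() {
  repos=$1 rev=$2 last=$3
  prev_offset=$(dump-noderev.pl "$repos" / "$rev" | count_offset "$rev") || return 1
  while [ "$rev" -lt "$last" ]; do
    rev=$((rev + 1))
    offset=$(dump-noderev.pl "$repos" / "$rev" | count_offset "$rev") || return 1
    if [ "$offset" -ne "$prev_offset" ]; then
      echo "r$rev: offset changed from $prev_offset to $offset"
    fi
    prev_offset=$offset
  done
}
```

For example, `hunt_skips /repository 59400 59600` would walk the range around the December errors quoted above; each reported revision is a candidate to compare against the others when looking for a common pattern.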
Re: predecessor count for the root node-revision is wrong message
On Mon, Feb 27, 2012 at 07:36:39AM -0800, Jason Wong wrote: This is true. We have seen the bug happen before. The first occurrence of this that we had seen was Dec. 7th, 2011, a few days after we went from 1.6.16 to 1.7.1. That was the first time we had seen that happen. At the time, we did not know about the cause and the developer who had encountered the error didn't report it and was able to work around it. From the Apache logs we have:
[Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor count for the root node-revision is wrong: found 59444, committing r59478 [409, #160004]
Just to be clear: these errors emitted by the 1.7.1 server prevent the bug from corrupting new revisions. With a 1.6 server the problem would go unnoticed and create bad revision data. When this corruption occurs, the repository still works, but the history links for affected revisions are incorrect.
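Every error quoted in this thread has the same shape, so the occurrences can be pulled out of Apache's error log mechanically, for instance to count them or to see which clients hit the check. A minimal sketch; the function name `pred_count_errors` is hypothetical, and the location of the error log depends on your httpd configuration.

```shell
#!/bin/sh
# Sketch: print "CLIENT FOUND REV" for each predecessor-count error line
# in the given Apache error log file(s), or in stdin if no file is given.
pred_count_errors() {
  sed -n 's/.*\[client \([^]]*\)] predecessor count for the root node-revision is wrong: found \([0-9]*\), committing r\([0-9]*\).*/\1 \2 \3/p' "$@"
}
```

For example, `pred_count_errors error.log | wc -l` counts the occurrences, and sorting the first column shows which machines (build server or workstations) triggered them.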
Re: predecessor count for the root node-revision is wrong message
On Mon, Feb 27, 2012 at 8:09 AM, Stefan Sperling s...@elego.de wrote: On Mon, Feb 27, 2012 at 07:36:39AM -0800, Jason Wong wrote: This is true. We have seen the bug happen before. The first occurrence of this that we had seen was Dec. 7th, 2011, a few days after we went from 1.6.16 to 1.7.1. That was the first time we had seen that happen. At the time, we did not know about the cause and the developer who had encountered the error didn't report it and was able to work around it. From the Apache logs we have:
[Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor count for the root node-revision is wrong: found 59444, committing r59478 [409, #160004]
Just to be clear: these errors emitted by the 1.7.1 server prevent the bug from corrupting new revisions. With a 1.6 server the problem would go unnoticed and create bad revision data. When this corruption occurs, the repository still works, but the history links for affected revisions are incorrect. Hi Stefan. So I think I misunderstood why the error messages were occurring. I had thought that there was a condition in this check (in 1.7) that was erroneously causing svn to reject the attempt to check in. I guess I am wondering: if this is the case, then why is it that when the check-in fails and we manually check it in again using TortoiseSVN, it works the second time? So the errors prevent the bug from corrupting new revisions? Is this something specific to the 1.7 versions, or would this have been in 1.6 versions as well? We have been using svn for a while now, and I am wondering whether this means that, with 1.6, this issue has been occurring in communications between a 1.6 client and a 1.6 server. Also, is this bug something that svnadmin verify will not detect? The last time we ran svnadmin verify, it said all was good. If it is the case that this bug has been occurring for a long time, what are the implications of the history links for affected revisions?
When you say the history links are incorrect, does it just put in a random value, or are the values actually unreadable? Does this mean subsequent revisions that occur after these bad revisions will propagate this bad information? A developer asked me to pose the following question: if he were to open a bad revision, would the client fail and give an error prompt, or would it display history information which could belong to other files? Thanks. Jason.
Re: predecessor count for the root node-revision is wrong message
On Mon, Feb 27, 2012 at 03:25:00PM -0800, Jason Wong wrote: So I think I misunderstood why the error messages were occurring. I had thought that there was a condition in this check (in 1.7) that was erroneously causing svn to reject the attempt to check in. The purpose of this error is to abort a commit in a situation where Subversion would otherwise create a commit that has the wrong idea about the number of predecessor revisions of the repository's root node. The root node is changed in every commit. So for revision N, the number of predecessor revisions should add up to N (including revision zero). Somehow, this count sometimes ends up being wrong during a commit, and such commits are now rejected. However, when checking the count for sanity, Subversion accounts for the fact that older revisions might have gotten the count wrong. I.e. the check only prevents the count from becoming even more wrong in future revisions than it already is. I guess I am wondering: if this is the case, then why is it that when the check-in fails and we manually check it in again using TortoiseSVN, it works the second time? As far as I know, we don't know what the underlying problem is yet. So I'm afraid I cannot answer all of your questions in a satisfactory manner because I don't know the answers. Daniel might know more -- he has been following the problem more closely than me. And I hope he will correct me if I'm making any wrong or misleading statements in this post :) So the errors prevent the bug from corrupting new revisions? Yes. This error is a sanity check. Right now, it's a sort of good thing to be getting the error. Just not as good as a proper fix for the as-yet-unknown underlying problem that's triggering the sanity check. Is this something specific to the 1.7 versions, or would this have been in 1.6 versions as well? We currently assume that it also happens with 1.6.
We have been using svn for a while now, and I am wondering whether this means that, with 1.6, this issue has been occurring in communications between a 1.6 client and a 1.6 server. It has probably been happening since before the 1.7 release. The problem was first discovered in the ASF repository. The first occurrence there has been traced to a time before 1.7.0 was released. At the time the ASF server was still running 1.6, and the first commit that introduced the problem in the ASF repository was very likely a 1.6 client. Also, is this bug something that svnadmin verify will not detect? The last time we ran svnadmin verify, it said all was good. Apparently, svnadmin verify won't find it. This should probably be fixed. But the priority right now is to understand why it is happening in the first place. If it is the case that this bug has been occurring for a long time, what are the implications of the history links for affected revisions? 'svn log' skips some revisions where it shouldn't. This is the only harmful effect of this bug, as far as I know. When you say the history links are incorrect, does it just put in a random value, or are the values actually unreadable? I don't know if it is random or somehow predetermined. That depends on how the wrong predecessor value actually comes about. Does this mean subsequent revisions that occur after these bad revisions will propagate this bad information? Yes, the predecessor count of subsequent revisions is off by some constant value. A developer asked me to pose the following question: if he were to open a bad revision, would the client fail and give an error prompt, or would it display history information which could belong to other files? So far this hasn't triggered any noticeable errors apart from missing items when viewing the log history of a path. Once the problem was discovered by inspection of 'svn log' output, a sanity check was put in place to prevent the problem from happening in new revisions.
This check is what you are hitting. What we need to do now is to figure out why it's happening, and then fix that problem.
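The relative check Stefan describes (reject a commit only if it makes the root's count more wrong than it already was) can be modelled in a few lines of shell. This illustrates the idea as described in the message above and is not Subversion's actual C implementation; `root_count_ok` and `root_count_exact` are hypothetical names, and the inputs are the revision and 'count:' values of the current HEAD root and of the root being committed.

```shell
#!/bin/sh
# Sketch of the idea only, not Subversion's code.

# root_count_ok HEAD_REV HEAD_COUNT NEW_REV NEW_COUNT
# Succeeds if the new root's count advanced by exactly as much as the
# revision number did. An existing skew (HEAD_REV != HEAD_COUNT) is
# tolerated; only a commit that adds new skew fails.
root_count_ok() {
  [ $(( $4 - $2 )) -eq $(( $3 - $1 )) ]
}

# root_count_exact REV COUNT
# A stricter check, shown for contrast: the count must equal the revision.
root_count_exact() {
  [ "$2" -eq "$1" ]
}
```

With the values reported for Jason's repository elsewhere in this thread (r61851 at count 61818, r61852 at count 61819), `root_count_ok 61851 61818 61852 61819` succeeds while `root_count_exact 61852 61819` fails, which is why an exact check would reject every new commit to an already affected repository.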
Re: predecessor count for the root node-revision is wrong message
Stefan Sperling wrote on Tue, Feb 28, 2012 at 03:18:35 +0100: On Mon, Feb 27, 2012 at 03:25:00PM -0800, Jason Wong wrote: I guess I am wondering: if this is the case, then why is it that when the check-in fails and we manually check it in again using TortoiseSVN, it works the second time? As far as I know, we don't know what the underlying problem is yet. So I'm afraid I cannot answer all of your questions in a satisfactory manner because I don't know the answers. I'm not really surprised by this situation. I guess Jason means the same set of tree/text/prop/revprop changes when he says "it", but by the nature of the bug it may well have to do with some concurrency issues on the server --- i.e., it depends on more than just the contents being committed. Daniel might know more -- he has been following the problem more closely than me. And I hope he will correct me if I'm making any wrong or misleading statements in this post :) Yeah, I'm following this thread :) We have been using svn for a while now, and I am wondering whether this means that, with 1.6, this issue has been occurring in communications between a 1.6 client and a 1.6 server. It has probably been happening since before the 1.7 release. The problem was first discovered in the ASF repository. The first occurrence there has been traced to a time before 1.7.0 was released. At the time the ASF server was still running 1.6, and the first commit that introduced the problem in the ASF repository was very likely a 1.6 client. AI: We should check whether it _was_ a 1.6 client. Also, is this bug something that svnadmin verify will not detect? The last time we ran svnadmin verify, it said all was good. Apparently, svnadmin verify won't find it. This should probably be fixed. But the priority right now is to understand why it is happening in the first place. AI: The 'verify' part is relatively low-hanging fruit, though.
AI: File an issue for the pred-count bug. If it is the case that this bug has been occurring for a long time, what are the implications of the history links for affected revisions? 'svn log' skips some revisions where it shouldn't. This is the only harmful effect of this bug, as far as I know. As I'd mentioned upthread, the ASF repository that triggered this bug also has bogus minfo-cnt headers. It's not known yet whether that is the same bug, a separate bug, or a bug in something other than svn. (That's also why I'm not filing an issue for the minfo-cnt bug yet -- it's not clear that the bug is in svn.) When you say the history links are incorrect, does it just put in a random value, or are the values actually unreadable? I don't know if it is random or somehow predetermined. That depends on how the wrong predecessor value actually comes about. The links are not garbage values -- they point to an existing location (node-rev) in history. They don't point to the right location. Does this mean subsequent revisions that occur after these bad revisions will propagate this bad information? Yes, the predecessor count of subsequent revisions is off by some constant value. Indeed; but that's just error propagation, rather than a bug. Practically, it implies that fixing the effects of the bug will involve changing all the historical metadata. (This probably means dump/load. There is a way to implement this by doing in-place surgery (see l1488), but it is of the "Don't try this at home" variety.) The propagation of wrong counts, and the skipping of revisions during a backwards history walk, are both expected behaviours from the DAG layer given the DAG in question (which includes a replacement of the root node). In English: the lower-level APIs are performing correctly given the violating-the-invariants-of-higher-layers state of the filesystem. A developer asked me to pose the following question.
If he were to open a bad revision, would the client fail and give an error prompt, or would it display history information which could belong to other files? So far this hasn't triggered any noticeable errors apart from missing items when viewing the log history of a path. Once the problem was Indeed. And that's about all the direct effects it can have. (Perhaps some code is doing that history walk internally as part of another higher-level operation, though.) I guess the minfo-cnt bug mentioned earlier should normally be very obvious, as the values are off by an order of magnitude. If they are too high, then the tree walk to find svn:mergeinfo-ful nodes will not stop at the root of trees that lack those nodes (so, inefficiency but not incorrectness); if they are too low, then it will ignore the svn:mergeinfo on some nodes, so I expect merges will visibly misperform (but Stefan will know better). discovered by inspection of 'svn log' output, a sanity check was put in place to prevent the problem from happening in new revisions. This check is what you are hitting. What
Re: predecessor count for the root node-revision is wrong message
On Wed, Feb 15, 2012 at 6:15 PM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Wed, Feb 15, 2012 at 10:20:23 -0800: On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote: On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote: Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. There is a second use of 'head', which you shouldn't delete. So instead, just get head.exe from the same place as the other two, or use the following kind of statement: Or install Cygwin and run the scripts from inside Cygwin. This does present end-of-line issues, so be very careful about using svn:eol-style native properties.
my $line = do { open FOO, "perl -V 2>&1 |"; <FOO>; };
Lastly, there's a 'sed' invocation that uses single-quoted arguments. All it does is print the input up to the first empty line --- feel free to implement it differently. (One way:
my @lines = split /\n/, `command | goes | here`;
$_ and print or last for @lines;
Both of these examples could do with some error checking.) Daniel (yes, there's also a neater way to do this without split(), but it's not a Perl class here) Hello. Sorry for the delay. Here is an update of what I have done since the last time I posted. I have run svn log -q ^/ on the repository and it came back with no missing revisions. I stand corrected, then. I've confirmed on another instance of the bug that 'svn log -q ^/' does not behave abnormally when the bug is present. Sorry for the misinformation. Question to devs: what operation will walk the predecessor links for the root fspath?
(and can therefore be used to identify instances of the bug) Since I first posted, each of the projects we have tried to build that had failed has since been built successfully without any changes on our side. What is the significance of this? I don't know how your build process interacts with Subversion. I was having an issue converting the script to run on Windows, as I was only getting the first line returned, so I set up Cygwin. I ran the script against both of the revisions (61815 and 61852) mentioned in the Apache error log, and the output was the same for each. Commands:
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61815
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61852
Output:
id: 9-45362.0-61242.r61424/0
type: dir
pred: 9-45362.0-60310/0
Are you sure that's the value of the pred: field? It contains only one ".", instead of two. I missed a part of it, you are right. Here is the full pred line:
pred: 9-45362.0-60310.r60310/0
count: 43
text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
cpath: /project/binaries/release/phase1/iteration/81/trunk
copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk
Is this information helpful? Let me know if this tells you anything. Thanks The fact that the output is identical suggests that the /project/binaries/release/phase1/iteration/81/trunk tree hasn't changed between those two revisions (or that there was a directory replace above it).
However, this is the error you report:
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count for the root node-revision is wrong: found 61815, committing r61852 [409, #160004]
The metadata this error complains about will be output by these two commands:
./dump-noderev.pl /repository / 61851
--
id: 0.0.r61851/33470
type: dir
pred: 0.0.r61850/3844
count: 61818
text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
cpath: /
copyroot: 0 /
minfo-cnt: 25685
--
./dump-noderev.pl /repository / 61852
--
id: 0.0.r61852/27663
type: dir
pred: 0.0.r61851/33470
count: 61819
text: 61852 26417 1233 1233 712fec619d55677e67aca8f7aa4ceb97
cpath: /
copyroot: 0 /
minfo-cnt: 25685
Jason. Cheers, Daniel Hi Daniel. Thanks for the quick reply. I have posted above the results from the two commands you asked me to run, as well as the full pred value that was incomplete. Let me know if you need any other information. Thanks. Jason
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Thu, Feb 16, 2012 at 11:42:42 -0800: On Wed, Feb 15, 2012 at 6:15 PM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Wed, Feb 15, 2012 at 10:20:23 -0800: On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote: On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote: Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. There is a second use of 'head', which you shouldn't delete. So instead, just get head.exe from the same place as the other two, or use the following kind of statement: Or install Cygwin and run the scripts from inside Cygwin. This does present end-of-line issues, so be very careful about using svn:eol-style native properties.
my $line = do { open FOO, "perl -V 2>&1 |"; <FOO>; };
Lastly, there's a 'sed' invocation that uses single-quoted arguments. All it does is print the input up to the first empty line --- feel free to implement it differently. (One way:
my @lines = split /\n/, `command | goes | here`;
$_ and print or last for @lines;
Both of these examples could do with some error checking.) Daniel (yes, there's also a neater way to do this without split(), but it's not a Perl class here) Hello. Sorry for the delay. Here is an update of what I have done since the last time I posted. I have run svn log -q ^/ on the repository and it came back with no missing revisions. I stand corrected, then. I've confirmed on another instance of the bug that 'svn log -q ^/' does not behave abnormally when the bug is present. Sorry for the misinformation. Question to devs: what operation will walk the predecessor links for the root fspath?
(and can therefore be used to identify instances of the bug) Since I first posted, each of the projects we have tried to build that had failed has since been built successfully without any changes on our side. What is the significance of this? I don't know how your build process interacts with Subversion. I was having an issue converting the script to run on Windows, as I was only getting the first line returned, so I set up Cygwin. I ran the script against both of the revisions (61815 and 61852) mentioned in the Apache error log, and the output was the same for each. Commands:
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61815
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61852
Output:
id: 9-45362.0-61242.r61424/0
type: dir
pred: 9-45362.0-60310/0
Are you sure that's the value of the pred: field? It contains only one ".", instead of two. I missed a part of it, you are right. Here is the full pred line:
pred: 9-45362.0-60310.r60310/0
Okay. Did /project/binaries/release/phase1/iteration/81/trunk@61852, or anything under it, change between r60311 and r61423? (I'm intentionally using the peg rev notation here --- these IDs do not change when a parent of the path in question gets copied.)
count: 43
text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
cpath: /project/binaries/release/phase1/iteration/81/trunk
copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk
Is this information helpful? Let me know if this tells you anything. Thanks The fact that the output is identical suggests that the /project/binaries/release/phase1/iteration/81/trunk tree hasn't changed between those two revisions (or that there was a directory replace above it).
However, this is the error you report:
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count for the root node-revision is wrong: found 61815, committing r61852 [409, #160004]
The metadata this error complains about will be output by these two commands:
./dump-noderev.pl /repository / 61851
--
id: 0.0.r61851/33470
type: dir
pred: 0.0.r61850/3844
count: 61818
text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
cpath: /
copyroot: 0 /
minfo-cnt: 25685
--
./dump-noderev.pl /repository / 61852
--
id: 0.0.r61852/27663
type: dir
pred: 0.0.r61851/33470
count: 61819
text: 61852 26417 1233 1233 712fec619d55677e67aca8f7aa4ceb97
cpath: /
copyroot: 0 /
minfo-cnt: 25685
The output from these two tells me two things: 1. The minfo-cnt value is reasonable (within a typical ballpark). That's relevant since minfo-cnt abnormalities were seen in another instance of the bug. 2.
Re: predecessor count for the root node-revision is wrong message
On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote: On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote: Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. There is a second use of 'head', which you shouldn't delete. So instead, just get head.exe from the same place as the other two, or use the following kind of statement: Or install Cygwin and run the scripts from inside Cygwin. This does present end-of-line issues, so be very careful about using svn:eol-style native properties.
my $line = do { open FOO, "perl -V 2>&1 |"; <FOO>; };
Lastly, there's a 'sed' invocation that uses single-quoted arguments. All it does is print the input up to the first empty line --- feel free to implement it differently. (One way:
my @lines = split /\n/, `command | goes | here`;
$_ and print or last for @lines;
Both of these examples could do with some error checking.) Daniel (yes, there's also a neater way to do this without split(), but it's not a Perl class here) Hello. Sorry for the delay. Here is an update of what I have done since the last time I posted. I have run svn log -q ^/ on the repository and it came back with no missing revisions. Since I first posted, each of the projects we have tried to build that had failed has since been built successfully without any changes on our side. I was having an issue converting the script to run on Windows, as I was only getting the first line returned, so I set up Cygwin. I ran the script against both of the revisions (61815 and 61852) mentioned in the Apache error log, and the output was the same for each.
Commands:
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61815
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61852
Output:
id: 9-45362.0-61242.r61424/0
type: dir
pred: 9-45362.0-60310/0
count: 43
text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
cpath: /project/binaries/release/phase1/iteration/81/trunk
copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk
Is this information helpful? Let me know if this tells you anything. Thanks Jason.
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Wed, Feb 15, 2012 at 10:20:23 -0800: On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote: On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote: Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. There is a second use of 'head', which you shouldn't delete. So instead, just get head.exe from the same place as the other two, or use the following kind of statement: Or install Cygwin and run the scripts from inside Cygwin. This does present end-of-line issues, so be very careful about using svn:eol-style native properties.
my $line = do { open FOO, "perl -V 2>&1 |"; <FOO>; };
Lastly, there's a 'sed' invocation that uses single-quoted arguments. All it does is print the input up to the first empty line --- feel free to implement it differently. (One way:
my @lines = split /\n/, `command | goes | here`;
$_ and print or last for @lines;
Both of these examples could do with some error checking.) Daniel (yes, there's also a neater way to do this without split(), but it's not a Perl class here) Hello. Sorry for the delay. Here is an update of what I have done since the last time I posted. I have run svn log -q ^/ on the repository and it came back with no missing revisions. I stand corrected, then. I've confirmed on another instance of the bug that 'svn log -q ^/' does not behave abnormally when the bug is present. Sorry for the misinformation. Question to devs: what operation will walk the predecessor links for the root fspath? (and can therefore be used to identify instances of the bug) Since I first posted, each of the projects we have tried to build that had failed has since been built successfully without any changes on our side. What is the significance of this?
I don't know how your build process interacts with Subversion. I was having an issue converting the script to run on Windows, as I was only getting the first line returned, so I set up Cygwin. I ran the script against both of the revisions (61815 and 61852) mentioned in the Apache error log, and the output was the same for each. Commands:
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61815
dump-noderev.pl /repository /project/binaries/release/phase1/iteration/81/trunk 61852
Output:
id: 9-45362.0-61242.r61424/0
type: dir
pred: 9-45362.0-60310/0
Are you sure that's the value of the pred: field? It contains only one ".", instead of two.
count: 43
text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
cpath: /project/binaries/release/phase1/iteration/81/trunk
copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk
Is this information helpful? Let me know if this tells you anything. Thanks The fact that the output is identical suggests that the /project/binaries/release/phase1/iteration/81/trunk tree hasn't changed between those two revisions (or that there was a directory replace above it). However, this is the error you report:
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count for the root node-revision is wrong: found 61815, committing r61852 [409, #160004]
The metadata this error complains about will be output by these two commands:
./dump-noderev.pl /repository / 61851
./dump-noderev.pl /repository / 61852
Jason. Cheers, Daniel
Re: predecessor count for the root node-revision is wrong message
Hello and thank you for replying. On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800: Any help/comments would be appreciated. Thank you. As I said, I'd be interested in isolating the cause of these errors. Is there anything common to revisions that triggered the bug (as explained above)? Are they concomitant with concurrent writes (commits, propedits, 'svn lock' operations, 'svnadmin pack' operations)? What version of svn does your server run (1.7.1?)? What operating system does your server run? Is there anything noteworthy about its filesystems or disks? I am working with our lead developer to come up with more details on our build process. I will post them when I get them. Our svn server is 1.7.1 and is hosted on Apache 2.2.21 on a Windows 2003 server. The server is running RAID5 with SCSI disks. Because my systems are on Windows, I don't think the Perl script you sent me will run, as it calls a couple of commands which I don't have. Do you have any suggestions for how I can run the script? In the meantime, I am running svn log -q and will go through the output to scan for missing revisions. I will let you know those results when I have them. Thank you. Jason Wong
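The manual scan of `svn log -q` output for missing revisions can be automated. A minimal sketch, assuming svn's default newest-first ordering and the usual `rREV | author | date` lines that `svn log -q` prints; `find_log_gaps` is a hypothetical name. (Elsewhere in this thread Daniel notes that `svn log -q ^/` turned out not to reveal this particular bug, so this is only a cheap first check.)

```shell
#!/bin/sh
# Sketch: report gaps in 'svn log -q' output read from stdin.
# Assumes newest-first order (svn's default); lines that are not
# "rREV | ..." revision lines, such as the dashed separators, are skipped.
find_log_gaps() {
  prev=
  while IFS= read -r line; do
    case $line in
      r[0-9]*' | '*) ;;        # revision line: keep going
      *) continue ;;           # separator or anything else
    esac
    rev=${line%% *}            # "r61852 | jason | ..." -> "r61852"
    rev=${rev#r}               # "r61852" -> "61852"
    if [ -n "$prev" ] && [ $((prev - rev)) -gt 1 ]; then
      echo "missing r$((rev + 1))-r$((prev - 1))"
    fi
    prev=$rev
  done
}
```

Run inside a working copy, `svn log -q ^/ | find_log_gaps` prints nothing when every revision is listed.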
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Hello and thank you for replying. On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800: Any help/comments would be appreciated. Thank you. As I said, I'd be interested in isolating the cause of these errors. Is there anything common to revisions that triggered the bug (as explained above)? Are they concomitant with concurrent writes (commits, propedits, 'svn lock' operations, 'svnadmin pack' operations)? What version of svn does your server run (1.7.1?)? What operating system does your server run? Is there anything noteworthy about its filesystems or disks? I am working with our lead developer to come up with more details on our build process. I will post them when I get them. Our svn server is 1.7.1 and is hosted on Apache 2.2.21 on a Windows 2003 server. The server is running RAID5 with SCSI disks. Okay. We've seen this happen on svn.apache.org, which today runs httpd-2.3/FreeBSD/zfs, and was probably running this or a similar stack when the problems triggered there. Because my systems are on Windows, I don't think the Perl script you sent me will run, as it calls a couple of commands which I don't have. Do you have any suggestions for how I can run the script? Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. Or just reimplement it in your language of choice. Basically it runs 'svnlook', parses out the noderev-id as '%s.%s.%s', parses \3 as 'r%d.%d', and opens revision file \1 at offset \2 --- accounting for the fact that that revision may have been 'svnadmin pack'ed. If I spoke too fast, feel free to ask. In the meantime, I am running svn log -q and will go through the output to scan for missing revisions. I will let you know those results when I have them. Thanks, Daniel Thank you.
Jason Wong
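The id-parsing step Daniel describes can be sketched with plain shell parameter expansion. This is a minimal sketch, not dump-noderev.pl itself: it assumes a committed node-revision id of the form NODE.COPY.rREV/OFFSET (as in the `id: 0.0.r61852/27663` header quoted elsewhere in this thread), `parse_noderev_id` is a hypothetical name, and it stops short of opening the revision file or handling packed shards.

```shell
#!/bin/sh
# Sketch: split a committed FSFS node-revision id such as 0.0.r61852/27663
# into node id, copy id, revision and byte offset within the revision file.
parse_noderev_id() {
  id=$1
  node=${id%%.*}; rest=${id#*.}     # "0" and "0.r61852/27663"
  copy=${rest%%.*}; rest=${rest#*.} # "0" and "r61852/27663"
  case $rest in
    r*/*) ;;
    *) echo "unexpected noderev id: $id" >&2; return 1 ;;
  esac
  rest=${rest#r}                    # "61852/27663"
  rev=${rest%/*}                    # "61852"
  offset=${rest#*/}                 # "27663"
  echo "$node $copy $rev $offset"
}
```

A truncated id such as `9-45362.0-60310/0` (the incomplete `pred:` value quoted in this thread, with only one ".") is rejected rather than misparsed.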
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Hello and thank you for replying. On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800: Any help/comments would be appreciated. Thank you. As I said, I'd be interested in isolating the cause of these errors. Is there anything common to revisions that triggered the bug (as explained above)? Are they concomitant with concurrent writes (commits, propedits, 'svn lock' operations, 'svnadmin pack' operations)? What version of svn does your server run (1.7.1?)? What operating system does your server run? Is there anything noteworthy about its filesystems or disks? I am working with our lead developer to come up with more details on our build process. I will post them when I get them. Our svn server is 1.7.1 and is hosted on Apache 2.2.21 on a Windows 2003 server. The server is running RAID5 with SCSI disks. Okay. We've seen this happen on svn.apache.org, which today runs httpd-2.3/FreeBSD/zfs, and was probably running this or a similar stack when the problems triggered there. Because my systems are on Windows, I don't think the Perl script you sent me will run, as it calls a couple of commands which I don't have. Do you have any suggestions for how I can run the script? Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. Or just reimplement it in your language of choice. Basically it runs 'svnlook', parses out the noderev-id as '%s.%s.%s', parses \3 as 'r%d.%d', and opens revision file \1 at offset \2 --- accounting for the Typo fix: 'r%d.%d' should be 'r%d/%d'. fact that that revision may have been 'svnadmin pack'ed. If I spoke too fast, feel free to ask.
In the meantime, I am running svn log -q and will go through the output to scan for missing revisions. I will let you know those results when I have them. Thanks, Daniel Thank you. Jason Wong
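Scanning the 'svn log -q' output for missing revisions, as Jason plans, can be automated. Below is a minimal Perl sketch. It assumes each logged revision appears on a line starting "rN | author | date", as in standard 'svn log -q' output. missing_revisions is a hypothetical name, and the sketch only reports numbers lying between two revisions the log does list.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text printed by 'svn log -q' on the
# repository root, return every revision number lying strictly between
# two revisions that the log lists. Separator lines are ignored because
# they do not match the "rN |" pattern.
sub missing_revisions {
    my ($log_text) = @_;
    my @revs = sort { $a <=> $b } ($log_text =~ /^r(\d+) \|/mg);
    my @missing;
    for my $i (1 .. $#revs) {
        push @missing, $revs[$i - 1] + 1 .. $revs[$i] - 1;
    }
    return @missing;
}

# Usage sketch ($repos_url is a placeholder for the repository root URL):
#   my $log = `svn log -q $repos_url`;
#   die "svn log failed (status $?)\n" if $?;
#   print "missing r$_\n" for missing_revisions($log);
```

Sorting numerically first means the sketch works whether the log was produced newest-first (the default for HEAD:0 walks) or oldest-first.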
Re: predecessor count for the root node-revision is wrong message
Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Hello and thank you for replying. On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote: Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800: Any help/comments would be appreciated. Thank you. As I said, I'd be interested in isolating the cause of these errors. Is there anything common to revisions that triggered the bug (as explained above)? Are they concomitant with concurrent writes (commits, propedits, 'svn lock' operations, 'svnadmin pack' operations)? What version of svn does your server run (1.7.1?)? What operating system does your server run? Is there anything noteworthy about its filesystems or disks? I am working with our lead developer to come up with more details on our build process. I will post this when I get it. Our svn repository is 1.7.1 and is hosted on Apache 2.2.21 on a Windows 2003 server. The server is running RAID5 with SCSI disks. Okay. We've seen this happen on svn.apache.org, which today runs httpd-2.3/FreeBSD/zfs, and was probably running this or a similar stack when the problems triggered there. Because my systems are on Windows, I don't think the Perl script you sent me will run, as it calls a couple of commands which I don't have. Do you have any suggestions for how I can run the script? Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. There is a second use of 'head', which you shouldn't delete. So instead, just get head.exe from the same place as the other two, or use the following kind of statement: my $line = do { open FOO, "perl -V 2>&1 |"; <FOO>; }; Lastly, there's a 'sed' invocation that uses single-quoted arguments. All it does is print the input up to the first empty line --- feel free to implement it differently. 
(One way: my @lines = split /\n/, `command | goes | here`; $_ and print or last for @lines; Both of these examples could do with some error checking.) Daniel (yes, there's also a neater way to do this without split(), but it's not a Perl class here)
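Daniel notes that both snippets could do with some error checking. Here is a hedged Perl sketch of the two replacements with checks added. first_line and lines_before_blank are hypothetical names. Stripping a trailing carriage return is an added assumption aimed at Windows/Cygwin line endings. Unlike '$_ and print', the loop stops only at a truly empty line, so a line consisting of just '0' (false in Perl) does not end the output early. It also returns the kept lines instead of printing them without their newlines.

```perl
use strict;
use warnings;

# Replacement for 'head -n1': the first line from an already-open handle,
# without its line ending. Dies if the handle yields nothing.
sub first_line {
    my ($fh) = @_;
    my $line = <$fh>;
    die "no output to read\n" unless defined $line;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Replacement for the 'sed' call: the lines of $text up to, but not
# including, the first empty line (a trailing CR on each line is ignored).
sub lines_before_blank {
    my ($text) = @_;
    my @kept;
    for my $line (split /\n/, $text) {
        $line =~ s/\r\z//;
        last if $line eq '';
        push @kept, $line;
    }
    return @kept;
}

# Usage sketch, checking both the open and the command's exit status:
#   open my $pipe, '-|', 'perl -V 2>&1' or die "cannot run perl -V: $!\n";
#   my $first = first_line($pipe);
#   1 while <$pipe>;    # drain the rest so the child can exit normally
#   close $pipe or die "perl -V failed (status $?)\n";
#   print "$_\n" for lines_before_blank(scalar `command | goes | here`);
```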
Re: predecessor count for the root node-revision is wrong message
On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote: Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200: Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800: Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from http://gnuwin32.sf.net (or from Cygwin). Delete from the script the line that uses the 'head' command. There is a second use of 'head', which you shouldn't delete. So instead, just get head.exe from the same place as the other two, or use the following kind of statement: Or install Cygwin and run the scripts from inside Cygwin. This does present end-of-line issues, so be very careful about using svn:eol-style native properties. my $line = do { open FOO, "perl -V 2>&1 |"; <FOO>; }; Lastly, there's a 'sed' invocation that uses single-quoted arguments. All it does is print the input up to the first empty line --- feel free to implement it differently. (One way: my @lines = split /\n/, `command | goes | here`; $_ and print or last for @lines; Both of these examples could do with some error checking.) Daniel (yes, there's also a neater way to do this without split(), but it's not a Perl class here)
Re: predecessor count for the root node-revision is wrong message
Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800: Hello. I have recently run into an issue with my Subversion system (1.7.1) where a specific component I am trying to build has failed. Since we upgraded to 1.7.1, I have had successful builds of this project before this issue started. I am currently working around it by using TortoiseSVN 1.7.1 to check in the files left in the workspace. The issue seems to be intermittent, as it does not happen every time. I am using the Subversion 1.7.1 Windows binaries, and Subversion is hosted on Apache 2.2.21. I have run svnadmin verify against the directory and it has come back clean. I was wondering what type of events can cause this to happen? Are there any resolutions? Here are some entries from the Apache log for the last time this issue happened. The Apache error log shows the following lines:
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] Could not MERGE resource /repository/!svn/txn/61847-1bz5 into /repository/project/binaries/release/phase1/iteration/81/trunk. [409, #0]
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] An error occurred while committing the transaction. [409, #160004]
[Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count for the root node-revision is wrong: found 61815, committing r61852 [409, #160004]
This error message indicates a bug has been detected. The implications of the bug are that walking backwards through history (e.g., 'svn log -r HEAD:0') may skip some revisions. I'm interested in getting to the bottom of this. There are two ways to identify instances of the bug: either by directly querying the filesystem (explained below), or by running 'svn log -q ^/' and looking for gaps in the sequence of revision numbers. The filesystem approach is as follows: look for revisions N such that f(N) != f(N-1), where f(N) is 'N minus (predecessor count of the node-revision of / in revision N)'. 
To compute f(), you need the predecessor count; it is given by the 'count:' header in the output of the attached script:
% ./dump-noderev.pl /tmp/svn/r1 / 1
id: 0.0.r1/4760
type: dir
pred: 0.0.r0/17
count: 1
text: 1 4640 107 107 c34b7d0de08f48db97941642f719e2f4
cpath: /
copyroot: 0 /
So, in summary, there are two ways to identify revisions that trigger the bug: by 'svn log', or by manually looking for gaps in the sequence of node-revisions of the root of the filesystem. I would ask you to run the attached script anyway, on the HEAD revision, and review its output for sensibility --- in particular, check the value of the minfo-cnt header (if present).
[Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Could not fetch resource information. [404, #0]
[Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Named transaction doesn't exist. [404, #0]
The following are from the Apache access log at the same time:
9.31.13.109 - username [31/Jan/2012:11:37:22 -0800] "MERGE /repository/project/binaries/release/phase1/iteration/81/trunk HTTP/1.1" 409 281
9.31.13.109 - username [31/Jan/2012:11:37:24 -0800] "DELETE /repository/!svn/txn/61847-1bz5 HTTP/1.1" 404 232
Any help/comments would be appreciated. Thank you.
As I said, I'd be interested in isolating the cause of these errors. Is there anything common to revisions that triggered the bug (as explained above)? Are they concomitant with concurrent writes (commits, propedits, 'svn lock' operations, 'svnadmin pack' operations)? What version of svn does your server run (1.7.1?)? What operating system does your server run? Is there anything noteworthy about its filesystems or disks? 
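The f() check described above can be sketched in Perl as follows. It assumes the dump-noderev.pl output for '/' has been collected for each revision of interest, for example by running the script once per revision. predecessor_count and suspect_revisions are hypothetical names. Tolerating a trailing carriage return is an added assumption for output captured on Windows. For the r1 example output, 'count: 1' gives f(1) = 1 - 1 = 0.

```perl
use strict;
use warnings;

# Hypothetical helper: the predecessor count from one dump-noderev.pl
# output, i.e. the value of its 'count:' header line.
sub predecessor_count {
    my ($dump) = @_;
    my ($count) = $dump =~ /^count: (\d+)\s*$/m
        or die "no 'count:' header in dump-noderev.pl output\n";
    return $count;
}

# Hypothetical helper: given { N => predecessor count of '/' in rN },
# compute f(N) = N - count and return each N with f(N) != f(N-1).
# Revisions whose predecessor N-1 was not collected are skipped.
sub suspect_revisions {
    my ($count_for) = @_;
    my @suspect;
    for my $n (sort { $a <=> $b } keys %$count_for) {
        next unless exists $count_for->{$n - 1};
        my $f_n    = $n - $count_for->{$n};
        my $f_prev = ($n - 1) - $count_for->{$n - 1};
        push @suspect, $n if $f_n != $f_prev;
    }
    return @suspect;
}

# Usage sketch (repository path and revision range are placeholders):
#   my %count_for;
#   for my $n (61840 .. 61860) {
#       my $dump = `perl dump-noderev.pl C:/path/to/repos / $n`;
#       $count_for{$n} = predecessor_count($dump);
#   }
#   print "r$_\n" for suspect_revisions(\%count_for);
```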
Jason Wong
Thanks,
Daniel

#!/usr/local/bin/perl
use warnings;
use strict;

$ENV{uc $_} ||= lc $_ for qw/svn svnlook svnadmin/;

sub open_pack_or_rev_file {
  my ($FS, $REVISION, $OFFSET) = @_;
  my $shard = int ($REVISION / 1000);
  my $remainder = $REVISION % 1000;
  if (-e "$FS/revs/$shard/$REVISION") {
    return "$FS/revs/$shard/$REVISION", $OFFSET;
  } elsif (-e "$FS/revs/$REVISION") {
    return "$FS/revs/$REVISION", $OFFSET;
  } elsif (-e "$FS/revs/$shard.pack") {
    my $lineno = $remainder+1;
    my $rev_offset = `cat $FS/revs/$shard.pack/manifest | sed -ne ${lineno}p`;
    return "$FS/revs/$shard.pack/pack", $rev_offset + $OFFSET;
  }
}

sub main {
  my ($REPOS, $FSPATH, $REV) = @_;
  my $FS = "$REPOS/db";
  $REV =~ s/^r*// if defined $REV;
  die "USAGE: $0 /path/to/repos /path/in/repos [revnum]" if @_ != 2 and @_ != 3;
  die "Non-numeric revision number" if defined $REV and $REV !~ /^\d+$/;
  die "Unknown repos format: $REPOS/format" if `cat $REPOS/format` !~ /^[35]$/;
  die "Not FSFS: $FS/fs-type" if `cat $FS/fs-type` ne "fsfs\n";
  die "Unknown FSFS format: $FS/format" if `head -n1 $FS/format` !~ /^[12346]$/;
  my ($revision, $offset) = do {
    my $REV_ARG = defined($REV) ? "-r$REV" : "";
    #