Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message

2012-03-30 Thread Jason Wong
On Wed, Mar 28, 2012 at 12:00 PM, Daniel Shahaf danie...@elego.de wrote:
 Jason Wong wrote on Wed, Mar 28, 2012 at 11:49:20 -0700:
 dump-noderev.pl /repo /
 -
 id: 0.0.r62104/28771
 type: dir
 pred: 0.0.r62103/28680
 count: 62071
 text: 62104 27520 1238 1238 ea635421e867454f9f7bc503c8160a2c
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25707
 -

 dump-noderev.pl /mirror2 /
 ---
 id: 0.0.r62104/6122
 type: dir
 pred: 0.0.r62103/6039
 count: 62104
 text: 62104 4874 1235 1235 1f315ed2437ba5d70dba2587d9ef2d5a
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25707
 ---

 Is this in line with what you expected?

 It's in line with my expectations, insofar as on the mirror the 'count'
 is correct.

 It also indicates that you weren't bitten by the minfo-cnt part of this
 bug.  As you know from the dev@ thread, Philip identified that part and
 fixed it too -- after my above email.

 Thanks again for your help in chasing down this bug.  It was backported
 today towards 1.7.5 too.

 Cheers,

 Daniel

Hi Daniel.

No problem. I am glad the issues are fixed. Thank you for all your help
and patience with my slow replies. It has been a busy couple of months
for me in trying find the time to do these tests.

So for correcting the count information in our live repository, I
should run svnsync on it at some point? Is there anything I need to do
after running that command in order to have it not link to the original?

Thanks.


Jason Wong


Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message

2012-03-30 Thread Daniel Shahaf
Jason Wong wrote on Fri, Mar 30, 2012 at 11:39:02 -0700:
 On Wed, Mar 28, 2012 at 12:00 PM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Wed, Mar 28, 2012 at 11:49:20 -0700:
  dump-noderev.pl /repo /
  -
  id: 0.0.r62104/28771
  type: dir
  pred: 0.0.r62103/28680
  count: 62071
  text: 62104 27520 1238 1238 ea635421e867454f9f7bc503c8160a2c
  cpath: /
  copyroot: 0 /
  minfo-cnt: 25707
  -
 
  dump-noderev.pl /mirror2 /
  ---
  id: 0.0.r62104/6122
  type: dir
  pred: 0.0.r62103/6039
  count: 62104
  text: 62104 4874 1235 1235 1f315ed2437ba5d70dba2587d9ef2d5a
  cpath: /
  copyroot: 0 /
  minfo-cnt: 25707
  ---
 
  Is this in line with what you expected?
 
  It's in line with my expectations, insofar as on the mirror the 'count'
  is correct.
 
  It also indicates that you weren't bitten by the minfo-cnt part of this
  bug.  As you know from the dev@ thread, Philip identified that part and
  fixed it too -- after my above email.
 
  Thanks again for your help in chasing down this bug.  It was backported
  today towards 1.7.5 too.
 
  Cheers,
 
  Daniel
 
 Hi Daniel.
 
 No problem. I am glad the issues are fixed. Thank you for all your help
 and patience with my slow replies. It has been a busy couple of months
 for me in trying find the time to do these tests.
 

Welcome.

 So for correcting the count information in our live repository, I
 should run svnsync on it at some point?

You already did, as you have a mirror.  (Maybe you created it via
dump/load.)  Now you just need to swap the mirror for the original
repository:

- stop commits
- svnsync sync URL/to/mirror2 URL/to/repo
  (or svnadmin dump -r (YOUNGEST ON MIRROR2):HEAD --deltas --incremental repo \
  | svnadmin load mirror2)
- rename repo repo.i4129-victim  rename mirror2 repo
- enable commits

 Is there anything I need to do
 after running that command in order to have it not link to the original?

You can optionally remove the svn:sync-* revprops from r0.

 
 Thanks.
 
 
 Jason Wong


Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message

2012-03-28 Thread Jason Wong
On Thu, Mar 22, 2012 at 11:32 AM, Jason Wong jwong1m...@gmail.com wrote:
 Hello Daniel.

 I will give it a go and let you know what I find.

 Jason

 On Wed, Mar 21, 2012 at 1:39 AM, Daniel Shahaf danie...@elego.de wrote:
 Jason,

 I've learnt yesterday something new about the minfo-cnt corruption bug:
 it can manifest not only as absurdly high values (on the order of 2**70),
 but as far smaller wrong increments too (such as increment of 172
 instead of of 0 on one occasion).

 Could you determine whether said bug has occurred in your history?  You
 can do that by duplicating your repository using svnsync or dump|load,
 running dump-noderev.pl on / of both copies at the same revisions, and
 comparing the minfo-cnt values.

 I would be interested in knowing whether they are equal between the two
 copies.

 Thanks,

 Daniel

 Jason Wong wrote on Thu, Feb 16, 2012 at 11:42:42 -0800:
   ./dump-noderev.pl /repository / 61851
 --

 id: 0.0.r61851/33470
 type: dir
 pred: 0.0.r61850/3844
 count: 61818
 text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25685
 --

Hello Daniel.

The svnsync took a while to run once I got it going. I ran the command
on the hotcopy I had made originally to keep the results consistant.

I have run the following two commands:
dump-noderev.pl /repo /
dump-noderev.pl /mirror2 /

Here are the outputs from the commands:

dump-noderev.pl /repo /
-
id: 0.0.r62104/28771
type: dir
pred: 0.0.r62103/28680
count: 62071
text: 62104 27520 1238 1238 ea635421e867454f9f7bc503c8160a2c
cpath: /
copyroot: 0 /
minfo-cnt: 25707
-

dump-noderev.pl /mirror2 /
---
id: 0.0.r62104/6122
type: dir
pred: 0.0.r62103/6039
count: 62104
text: 62104 4874 1235 1235 1f315ed2437ba5d70dba2587d9ef2d5a
cpath: /
copyroot: 0 /
minfo-cnt: 25707
---

Is this in line with what you expected?

Jason Wong.


Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message

2012-03-28 Thread Daniel Shahaf
Jason Wong wrote on Wed, Mar 28, 2012 at 11:49:20 -0700:
 dump-noderev.pl /repo /
 -
 id: 0.0.r62104/28771
 type: dir
 pred: 0.0.r62103/28680
 count: 62071
 text: 62104 27520 1238 1238 ea635421e867454f9f7bc503c8160a2c
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25707
 -
 
 dump-noderev.pl /mirror2 /
 ---
 id: 0.0.r62104/6122
 type: dir
 pred: 0.0.r62103/6039
 count: 62104
 text: 62104 4874 1235 1235 1f315ed2437ba5d70dba2587d9ef2d5a
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25707
 ---
 
 Is this in line with what you expected?

It's in line with my expectations, insofar as on the mirror the 'count'
is correct.

It also indicates that you weren't bitten by the minfo-cnt part of this
bug.  As you know from the dev@ thread, Philip identified that part and
fixed it too -- after my above email.

Thanks again for your help in chasing down this bug.  It was backported
today towards 1.7.5 too.

Cheers,

Daniel


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-23 Thread Philip Martin
Philip Martin philip.mar...@wandisco.com writes:

 Moving update_ancestry from tree.c to dag.c is one way to fix the
 problem.

This was applied in r1302613.  I believe this also fixes the minfo-cnt
corruption that has been observed.  To reproduce apply the following
patch to the old client to allow commit to be paused:

Index: ../src/subversion/libsvn_fs_fs/tree.c
===
--- ../src/subversion/libsvn_fs_fs/tree.c   (revision 1302612)
+++ ../src/subversion/libsvn_fs_fs/tree.c   (working copy)
@@ -1694,6 +1694,7 @@
  any future merges will only be between that node and whatever
  the root node of the youngest rev is by then. */
   err = merge_changes(NULL, youngish_root_node, txn, conflict, iterpool);
+  { char buf[256]; fputs(waiting..., stdout);fgets(buf, 255, stdin); }
   if (err)
 {
   if ((err-apr_err == SVN_ERR_FS_CONFLICT)  conflict_p)

Now the scenario:

svnadmin create repo
svn mkdir -mm file://`pwd`/repo/{A,B,C}
svn co file://`pwd`/repo wc1
svn co file://`pwd`/repo wc2
svn co file://`pwd`/repo wc3
svn ps svn:mergeinfo /P:2 wc1/A
svn ps svn:mergeinfo /Q:2 wc2/B
svn ps svn:mergeinfo /R:2 wc3/C

Now commit wc1 using the patched client running under valgrind. At the
first waiting...  prompt commit wc2 using a standard client.  Continue
the first commit to get a second waiting... prompt and commit wc3
using a standard client.  Continue the first commit and valgrind reports
errors such as:

==23311== Conditional jump or move depends on uninitialised value(s)
==23311==at 0x98E0E5A: svn_fs_fs__dag_increment_mergeinfo_count (dag.c:544)
==23311==by 0x9906847: merge (tree.c:1551)
==23311==by 0x9906A18: merge_changes (tree.c:1599)
==23311==by 0x9906C06: svn_fs_fs__commit_txn (tree.c:1696)
==23311==by 0x6C8C6D9: svn_fs_commit_txn (fs-loader.c:646)
==23311==by 0x6A5B12C: svn_repos_fs_commit_txn (fs-wrap.c:59)
==23311==by 0x6A51D0E: close_edit (commit.c:693)
==23311==by 0x4E4A34F: svn_client__do_commit (commit_util.c:1898)
==23311==by 0x4E45723: svn_client_commit6 (commit.c:1689)
==23311==by 0x4097D6: svn_cl__commit (commit-cmd.c:169)
==23311==by 0x416FA1: main (main.c:2699)

==23311== Syscall param write(buf) points to uninitialised byte(s)
==23311==at 0x62CC0D0: __write_nocancel (syscall-template.S:82)
==23311==by 0x5E7D4FC: apr_file_flush_locked (readwrite.c:317)
==23311==by 0x5E7D7AF: apr_file_flush (readwrite.c:340)
==23311==by 0x5E7CFA4: apr_unix_file_cleanup (open.c:77)
==23311==by 0x59FADAF: svn_io_file_close (io.c:3092)
==23311==by 0x98E9083: svn_fs_fs__put_node_revision (fs_fs.c:2318)
==23311==by 0x98E1039: svn_fs_fs__dag_increment_mergeinfo_count (dag.c:575)
==23311==by 0x9906847: merge (tree.c:1551)
==23311==by 0x9906A18: merge_changes (tree.c:1599)
==23311==by 0x9906C06: svn_fs_fs__commit_txn (tree.c:1696)
==23311==by 0x6C8C6D9: svn_fs_commit_txn (fs-loader.c:646)
==23311==by 0x6A5B12C: svn_repos_fs_commit_txn (fs-wrap.c:59)
==23311==  Address 0xb2ee1dc is 92 bytes inside a block of size 4,096 alloc'd
==23311==at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==23311==by 0x5E806D0: pool_alloc (apr_pools.c:1463)
==23311==by 0x5E7CDFB: apr_file_open (open.c:211)
==23311==by 0x59F5194: file_open (io.c:280)
==23311==by 0x59FABF6: svn_io_file_open (io.c:3049)
==23311==by 0x98E8FC4: svn_fs_fs__put_node_revision (fs_fs.c:2308)
==23311==by 0x98E1039: svn_fs_fs__dag_increment_mergeinfo_count (dag.c:575)
==23311==by 0x9906847: merge (tree.c:1551)
==23311==by 0x9906A18: merge_changes (tree.c:1599)
==23311==by 0x9906C06: svn_fs_fs__commit_txn (tree.c:1696)
==23311==by 0x6C8C6D9: svn_fs_commit_txn (fs-loader.c:646)
==23311==by 0x6A5B12C: svn_repos_fs_commit_txn (fs-wrap.c:59)

The commit fails because the count: is wrong, but if I disable that
check to allow the commit to complete then I see a bogus minfo-cnt such
as:

id: 0.0.r7/322
type: dir
pred: 0.0.r4/180
count: 5
text: 7 198 111 0 242bff40060f22bcf85959dcf552851a
cpath: /
copyroot: 0 /
minfo-cnt: 43

Using the current trunk with the issue 4129 fix applied this no longer
happens: I don't get the valgrind warnings and the minfo-cnt is correct.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: predecessor count for the root node-revision is wrong message

2012-03-22 Thread Justin Johnson
On Tue, Mar 20, 2012 at 2:32 PM, Daniel Shahaf danie...@elego.de wrote:

 Daniel Shahaf wrote on Tue, Mar 20, 2012 at 00:49:06 +0200:
  The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be
  released next week.
 

 The fix was merged to 1.6.x@HEAD today and barring surprises will be
 included in 1.6.18.


Can someone tell me if the fix is client-side or server-side?  I see
r1302613 but am not familiar with the code.  Also, will I need to do
something with my repositories to fix any corruption?

Thanks.
Justin


Re: predecessor count for the root node-revision is wrong message

2012-03-22 Thread Daniel Shahaf
Justin Johnson wrote on Thu, Mar 22, 2012 at 13:03:04 -0500:
 On Tue, Mar 20, 2012 at 2:32 PM, Daniel Shahaf danie...@elego.de wrote:
 
  Daniel Shahaf wrote on Tue, Mar 20, 2012 at 00:49:06 +0200:
   The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be
   released next week.
  
 
  The fix was merged to 1.6.x@HEAD today and barring surprises will be
  included in 1.6.18.
 
 
 Can someone tell me if the fix is client-side or server-side?  I see

Server-side.

 r1302613 but am not familiar with the code.  Also, will I need to do
 something with my repositories to fix any corruption?
 

As noted on the issue, either create an svnsync mirror of them, or
dump|load them, to fix the corruption.  (svnsync is a client-side tool.)

 Thanks.
 Justin


Re: relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message

2012-03-22 Thread Jason Wong
Hello Daniel.

I will give it a go and let you know what I find.

Jason

On Wed, Mar 21, 2012 at 1:39 AM, Daniel Shahaf danie...@elego.de wrote:
 Jason,

 I've learnt yesterday something new about the minfo-cnt corruption bug:
 it can manifest not only as absurdly high values (on the order of 2**70),
 but as far smaller wrong increments too (such as increment of 172
 instead of of 0 on one occasion).

 Could you determine whether said bug has occurred in your history?  You
 can do that by duplicating your repository using svnsync or dump|load,
 running dump-noderev.pl on / of both copies at the same revisions, and
 comparing the minfo-cnt values.

 I would be interested in knowing whether they are equal between the two
 copies.

 Thanks,

 Daniel

 Jason Wong wrote on Thu, Feb 16, 2012 at 11:42:42 -0800:
   ./dump-noderev.pl /repository / 61851
 --

 id: 0.0.r61851/33470
 type: dir
 pred: 0.0.r61850/3844
 count: 61818
 text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25685
 --


Re: predecessor count for the root node-revision is wrong message

2012-03-22 Thread Justin Johnson
On Thu, Mar 22, 2012 at 1:13 PM, Daniel Shahaf danie...@elego.de wrote:

 Justin Johnson wrote on Thu, Mar 22, 2012 at 13:03:04 -0500:
  On Tue, Mar 20, 2012 at 2:32 PM, Daniel Shahaf danie...@elego.de
 wrote:
 
   Daniel Shahaf wrote on Tue, Mar 20, 2012 at 00:49:06 +0200:
The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to
 be
released next week.
   
  
   The fix was merged to 1.6.x@HEAD today and barring surprises will be
   included in 1.6.18.
  
 
  Can someone tell me if the fix is client-side or server-side?  I see

 Server-side.

  r1302613 but am not familiar with the code.  Also, will I need to do
  something with my repositories to fix any corruption?
 

 As noted on the issue, either create an svnsync mirror of them, or
 dump|load them, to fix the corruption.  (svnsync is a client-side tool.)

  Thanks.
  Justin


Excellent.  Thank you.


relation to minfo-cnt bug Re: predecessor count for the root node-revision is wrong message

2012-03-21 Thread Daniel Shahaf
Jason,

I've learnt yesterday something new about the minfo-cnt corruption bug:
it can manifest not only as absurdly high values (on the order of 2**70),
but as far smaller wrong increments too (such as increment of 172
instead of of 0 on one occasion).

Could you determine whether said bug has occurred in your history?  You
can do that by duplicating your repository using svnsync or dump|load,
running dump-noderev.pl on / of both copies at the same revisions, and
comparing the minfo-cnt values.

I would be interested in knowing whether they are equal between the two
copies.

Thanks,

Daniel

Jason Wong wrote on Thu, Feb 16, 2012 at 11:42:42 -0800:
   ./dump-noderev.pl /repository / 61851
 --
 
 id: 0.0.r61851/33470
 type: dir
 pred: 0.0.r61850/3844
 count: 61818
 text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25685
 --


Re: predecessor count for the root node-revision is wrong message

2012-03-20 Thread Daniel Shahaf
Daniel Shahaf wrote on Tue, Mar 20, 2012 at 00:49:06 +0200:
 The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be
 released next week.
 

The fix was merged to 1.6.x@HEAD today and barring surprises will be
included in 1.6.18.


Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Philip Martin
Daniel Shahaf danie...@elego.de writes:

 From what is there so far, yes. We do have different operations
 occurring at the same time, but for these ones, I see MERGE and DELETE
 verbs overlapping in the same or near time intervals according to the
 Apache logs. I just did a quick look in the Apache logs during a time
 window where the bug wasn't triggered, and was able to see cases where
 I have the following:
  rev(x-1) merge
  rev(x) merge
  rev(x) delete
  rev(x-1) delete
 
 This seems fine.
 The case that I had seen in my reported cases are as follows:
  rev(x-1) merge
  rev(x) merge
  rev(x-1) delete
  rev(x) delete
 

 I have tried forcing this sequence in the debugger, but that did not
 reproduce the bug.

[...]

 I conclude that the specific MERGE/MERGE/DELETE/DELETE order, by itself,
 is not sufficient to reproduce the bug.

DELETE is extremely unlikely to be involved in this bug.  mod_dav_svn
maintains a mapping of DAV activity ID to Subversion transaction name in
the dav/ subdirectory of a repository.  The dav/ subdirectory is not
part of the Subversion filesystem.  DELETE is used to remove the mapping
for a particular activity ID and to remove the transaction if it still
exists.  Following a successful MERGE the transaction will have been
converted into a revision so DELETE has no effect on the the Subversion
filesystem.  DELETE will only affect the Subversion filesystem if called
before a successful MERGE, and that will cause any subsequent MERGE for
the activity to fail.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


#4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
Daniel Shahaf wrote on Sun, Mar 18, 2012 at 16:28:21 +0200:
 [ cc += dev@. summary for dev@: investigating issue #4129: predecessor
 count of rN is not incremented by one wrt that of r(N-1); see 
 http://subversion.tigris.org/issues/show_bug.cgi?id=4129 ]

Okay, count me happy :-)  I can reproduce this:

[[[
$SVN ps svn:mergeinfo '/branch/A:2' wc1/trunk/A
$SVN ps svn:mergeinfo '/branch/iota:2' wc2/trunk/iota
$SVN mkdir wc1/foo
$SVN mkdir wc2/bar
$svn ci -mm wc1  $svn ci -mm wc2  wait
]]]

Output:
[[[
Adding wc1/foo
Sendingwc1/trunk/A
Adding wc2/bar
Sendingwc2/trunk/iota

Committed revision 2.
subversion/svn/commit-cmd.c:183: (apr_err=160004)
subversion/libsvn_client/commit.c:876: (apr_err=160004)
subversion/libsvn_client/commit.c:876: (apr_err=160004)
svn: E160004: Commit failed (details follow):
subversion/libsvn_ra_serf/commit.c:2216: (apr_err=160004)
subversion/libsvn_ra_serf/commit.c:2216: (apr_err=160004)
subversion/libsvn_ra_serf/commit.c:2216: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:774: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:2087: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:2087: (apr_err=160004)
subversion/libsvn_ra_serf/util.c:920: (apr_err=160004)
svn: E160004: predecessor count for the root node-revision is wrong: found (2+1 
!= 2), committing r3
]]]

Error log:
[[[
[Mon Mar 19 14:19:41.388413 2012] [dav:error] [pid 17156:tid 3064073072] 
[client ::1:40969] Could not MERGE resource /t/r1/!svn/txn/1-2 into /t/r1.  
[409, #0]
[Mon Mar 19 14:19:41.388497 2012] [dav:error] [pid 17156:tid 3064073072] 
[client ::1:40969] An error occurred while committing the transaction.  [409, 
#160004]
[Mon Mar 19 14:19:41.388518 2012] [dav:error] [pid 17156:tid 3064073072] 
[client ::1:40969] predecessor count for the root node-revision is wrong: found 
(2+1 != 2), committing r3  [409, #160004]
[Mon Mar 19 14:19:42.024052 2012] [authz_core:debug] [pid 17156:tid 3055680368] 
mod_authz_core.c(783): [client ::1:40971] AH01626: authorization result of 
Require all granted: granted
[Mon Mar 19 14:19:42.024114 2012] [authz_core:debug] [pid 17156:tid 3055680368] 
mod_authz_core.c(783): [client ::1:40971] AH01626: authorization result of 
RequireAny: granted
[Mon Mar 19 14:19:42.024781 2012] [dav:error] [pid 17156:tid 3055680368] 
[client ::1:40971] Could not fetch resource information.  [404, #0]
[Mon Mar 19 14:19:42.024820 2012] [dav:error] [pid 17156:tid 3055680368] 
[client ::1:40971] Named transaction doesn't exist.  [404, #0]
]]]

Using another property, or omitting either propset, is enough to cause
the bug not to trigger.


The bug reproduced both with and without the following patch:
[[[
--- subversion/libsvn_fs_fs/tree.c  (revision 1301511)
+++ subversion/libsvn_fs_fs/tree.c  (working copy)
@@ -1 +1,2 @@
+#include unistd.h
 /* tree.c : tree-like filesystem, built on DAG filesystem
@@ -1701 +1702,2 @@ svn_fs_fs__commit_txn(const char **conflict_p,
+  sleep(3);
   err = svn_fs_fs__commit(new_rev, fs, txn, iterpool);
@@ -1729,3 +1731,4 @@ svn_fs_fs__commit_txn(const char **conflict_p,
  cleanup:
 
+  sleep(3);
   svn_fs_fs__reset_txn_caches(fs);
]]]


The bug reproduced with either ServerLimit 1 or ThreadLimit 1 in
httpd.conf.  (That forced both commits to be served by the same process
(resp., by different processes).)  I use httpd 2.4.1 with event MPM.



Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Philip Martin
Daniel Shahaf danie...@elego.de writes:

 The bug reproduced with either ServerLimit 1 or ThreadLimit 1 in
 httpd.conf.  (That forced both commits to be served by the same process
 (resp., by different processes).)  I use httpd 2.4.1 with event MPM.

I can reproduce ove ra_local:

svnadmin create repo
svn mkdir -mm file://`pwd`/repo/A
svn mkdir -mm file://`pwd`/repo/B
svn co file://`pwd`/repo wc1
svn co file://`pwd`/repo wc2
svn ps svn:mergeinfo /P:2 wc1/A
svn ps svn:mergeinfo /Q:2 wc2/B
svn mkdir wc1/X
svn mkdir wc2/Y
svn ci -mm wc1  svn ci -mm wc2  wait

Gives:

Sendingwc1/A
Adding wc1/X
Sendingwc2/B
Adding wc2/Y

Committed revision 3.
svn: E160004: Commit failed (details follow):
svn: E160004: predecessor count for the root node-revision is wrong: found 3, 
committing r4

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
shashank subramaniam wrote on Mon, Mar 19, 2012 at 20:29:47 +0530:
 Hey,
  We tried looping as a solution to the ''Predescessor
 Count for the root node revision is wrong' error (We looped the commit
 alone till it is commited).This works, but if 100 people try to commit to
 the same repository at the same time, the 100th guy's instance of the
 program is looped 100 times.There is also the possibility of it falling
 into an infinite loop due to some internal error.So, can you suggest some
 other solution around this bug, whereby multiple users are able to commit
 to the same repository at the same time(Space complexity is not
 a constraint)
 

Normally concurrent commits are supported.  The bug under discussion has
to do with a flaw in the mechanism that causes concurrent commits to
succeed.

 I noticed a similar problem also, which I think might be related to the
 'Predescessor' bug. Simultaneous SVN Updates from the same repository by
 different users fails silently (throws no error, but update doesn't
 happen). I think, probably an implicit lock is being put on the repository,
 preventing simultaneous operations.Please do let us know if some solution
 exists.
 

In FSFS there are no read locks, and different readers don't interfere
with each other or with others.  You'll have to describe what you see in
more detail.  (Exactly what you did, what you saw, and how it differed
from what you expected to see)

Thanks

Daniel


 Thanks
 Subu


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Philip Martin
Philip Martin philip.mar...@wandisco.com writes:

 I can reproduce ove ra_local:

 svnadmin create repo
 svn mkdir -mm file://`pwd`/repo/A
 svn mkdir -mm file://`pwd`/repo/B
 svn co file://`pwd`/repo wc1
 svn co file://`pwd`/repo wc2
 svn ps svn:mergeinfo /P:2 wc1/A
 svn ps svn:mergeinfo /Q:2 wc2/B
 svn mkdir wc1/X
 svn mkdir wc2/Y
 svn ci -mm wc1  svn ci -mm wc2  wait

 Gives:

 Sendingwc1/A
 Adding wc1/X
 Sendingwc2/B
 Adding wc2/Y

 Committed revision 3.
 svn: E160004: Commit failed (details follow):
 svn: E160004: predecessor count for the root node-revision is wrong: found 3, 
 committing r4

This is the problem code in libsvn_fs_fs/tree.c:merge

  SVN_ERR(svn_fs_fs__dag_get_predecessor_count(pred_count, source, pool));
  SVN_ERR(update_ancestry(fs, source_id, target_id, target_path,
  pred_count, pool));

  if (svn_fs_fs__fs_supports_mergeinfo(fs))
SVN_ERR(svn_fs_fs__dag_increment_mergeinfo_count(target,
 mergeinfo_increment,
 pool));


target is dag_node_t* which is opaque outside dag.c and
target-node_revision-predecessor_count is 3 when we reach the above
code.

update_ancestry rewrites the file repo/db/transactions/2-2.txn/node.0.0
with the correctly updated value count: 4 but nothing updates
target-node_revision-predecessor_count in memory.

svn_fs_fs__dag_increment_mergeinfo_count then rewrites 
repo/db/transactions/2-2.txn/node.0.0
pulling the old value of target-node_revision-predecessor_count from
memory and putting count: 3 back in the file.

If I use the debugger to manually set target-node_revision to NULL
inside svn_fs_fs__dag_increment_mergeinfo_count then the commit works.
I'm not exactly sure how all the FSFS caching layers are supposed to
interact.  Is tree.c:update_ancestry supposed to update the in-memory
predecessor_count?  Should there be a svn_fs_fs__dag_xxx function to
change the predecessor count?  Should target-node_revision be set to
NULL soemwehere?  Something else?

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
Philip Martin wrote on Mon, Mar 19, 2012 at 17:25:22 +:
 Philip Martin philip.mar...@wandisco.com writes:
 
  I can reproduce ove ra_local:
 
  svnadmin create repo
  svn mkdir -mm file://`pwd`/repo/A
  svn mkdir -mm file://`pwd`/repo/B
  svn co file://`pwd`/repo wc1
  svn co file://`pwd`/repo wc2
  svn ps svn:mergeinfo /P:2 wc1/A
  svn ps svn:mergeinfo /Q:2 wc2/B
  svn mkdir wc1/X
  svn mkdir wc2/Y
  svn ci -mm wc1  svn ci -mm wc2  wait
 
  Gives:
 
  Sendingwc1/A
  Adding wc1/X
  Sendingwc2/B
  Adding wc2/Y
 
  Committed revision 3.
  svn: E160004: Commit failed (details follow):
  svn: E160004: predecessor count for the root node-revision is wrong: found 
  3, committing r4
 
 This is the problem code in libsvn_fs_fs/tree.c:merge
 
   SVN_ERR(svn_fs_fs__dag_get_predecessor_count(pred_count, source, pool));
   SVN_ERR(update_ancestry(fs, source_id, target_id, target_path,
   pred_count, pool));
 
   if (svn_fs_fs__fs_supports_mergeinfo(fs))
 SVN_ERR(svn_fs_fs__dag_increment_mergeinfo_count(target,
  mergeinfo_increment,
  pool));
 
 
 target is dag_node_t* which is opaque outside dag.c and
 target-node_revision-predecessor_count is 3 when we reach the above
 code.
 
 update_ancestry rewrites the file repo/db/transactions/2-2.txn/node.0.0
 with the correctly updated value count: 4 but nothing updates
 target-node_revision-predecessor_count in memory.
 
 svn_fs_fs__dag_increment_mergeinfo_count then rewrites 
 repo/db/transactions/2-2.txn/node.0.0
 pulling the old value of target-node_revision-predecessor_count from
 memory and putting count: 3 back in the file.
 
 If I use the debugger to manually set target-node_revision to NULL
 inside svn_fs_fs__dag_increment_mergeinfo_count then the commit works.
 I'm not exactly sure how all the FSFS caching layers are supposed to
 interact.  Is tree.c:update_ancestry supposed to update the in-memory
 predecessor_count?  Should there be a svn_fs_fs__dag_xxx function to
 change the predecessor count?  Should target-node_revision be set to
 NULL soemwehere?  Something else?

Just a note that another option here is for merge() to re-fetch
'dag_node_t *target' via dag.h APIs, to get another struct with a fresh
(actually: not yet populated) cache -noderev member.


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread C. Michael Pilato
On 03/19/2012 01:25 PM, Philip Martin wrote:
 Philip Martin philip.mar...@wandisco.com writes:
 
 I can reproduce ove ra_local:

 svnadmin create repo
 svn mkdir -mm file://`pwd`/repo/A
 svn mkdir -mm file://`pwd`/repo/B
 svn co file://`pwd`/repo wc1
 svn co file://`pwd`/repo wc2
 svn ps svn:mergeinfo /P:2 wc1/A
 svn ps svn:mergeinfo /Q:2 wc2/B
 svn mkdir wc1/X
 svn mkdir wc2/Y
 svn ci -mm wc1  svn ci -mm wc2  wait

 Gives:

 Sendingwc1/A
 Adding wc1/X
 Sendingwc2/B
 Adding wc2/Y

 Committed revision 3.
 svn: E160004: Commit failed (details follow):
 svn: E160004: predecessor count for the root node-revision is wrong: found 
 3, committing r4

Is this problem specific to the FSFS backend?

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
C. Michael Pilato wrote on Mon, Mar 19, 2012 at 13:57:51 -0400:
 Is this problem specific to the FSFS backend?

No.

% ../runpytest svnadmin mergeinfo_race --fs-type bdb
2012-03-19 20:21:44 [WARNING] CWD: 
/home/daniel/src/svn/t1/subversion/tests/cmdline
2012-03-19 20:21:44 [WARNING] EXCEPTION: Failure: one or both commits failed
XFAIL: svnadmin_tests.py 29: concurrent mergeinfo commits invalidate pred-count


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Philip Martin
C. Michael Pilato cmpil...@collab.net writes:

 On 03/19/2012 01:25 PM, Philip Martin wrote:
 Philip Martin philip.mar...@wandisco.com writes:
 
 I can reproduce ove ra_local:

 svnadmin create repo
 svn mkdir -mm file://`pwd`/repo/A
 svn mkdir -mm file://`pwd`/repo/B
 svn co file://`pwd`/repo wc1
 svn co file://`pwd`/repo wc2
 svn ps svn:mergeinfo /P:2 wc1/A
 svn ps svn:mergeinfo /Q:2 wc2/B
 svn mkdir wc1/X
 svn mkdir wc2/Y
 svn ci -mm wc1  svn ci -mm wc2  wait

 Gives:

 Sendingwc1/A
 Adding wc1/X
 Sendingwc2/B
 Adding wc2/Y

 Committed revision 3.
 svn: E160004: Commit failed (details follow):
 svn: E160004: predecessor count for the root node-revision is wrong: found 
 3, committing r4

 Is this problem specific to the FSFS backend?

Yes, I think it is.

For BDB the dag_node_t type in dag.c doesn't have a node_revision
member.  When update_ancestry does svn_fs_bdb__put_node_revision it
writes to the database and subsequent svn_fs_bdb__get_node_revision will
see the updated value.

For FSFS the svn_fs_fs__put_node_revision call writes to the
transactions subdir but the in-memory node_revision doesn't get changed.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Philip Martin
Daniel Shahaf danie...@elego.de writes:

 C. Michael Pilato wrote on Mon, Mar 19, 2012 at 13:57:51 -0400:
 Is this problem specific to the FSFS backend?

 No.

 % ../runpytest svnadmin mergeinfo_race --fs-type bdb
 2012-03-19 20:21:44 [WARNING] CWD: 
 /home/daniel/src/svn/t1/subversion/tests/cmdline
 2012-03-19 20:21:44 [WARNING] EXCEPTION: Failure: one or both commits failed
 XFAIL: svnadmin_tests.py 29: concurrent mergeinfo commits invalidate 
 pred-count

I think that's a failure of the regression test, after the XFAIL the
repository looks like:

$ svn log -vqr3:2 file://`pwd`/svn-test-work/repositories/svnadmin_tests-29/

r3 | jrandom | 2012-03-19 18:28:53 + (Mon, 19 Mar 2012)
Changed paths:
   M /A
   A /d1

r2 | jrandom | 2012-03-19 18:28:53 + (Mon, 19 Mar 2012)
Changed paths:
   A /d2
   M /iota


Also the test still XFAILs for FSFS with my FSFS patch to fix the
problem.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread C. Michael Pilato
On 03/19/2012 02:24 PM, Philip Martin wrote:
 C. Michael Pilato cmpil...@collab.net writes:
 Is this problem specific to the FSFS backend?
 
 Yes, I think it is.
 
 For BDB the dag_node_t type in dag.c doesn't have a node_revision
 member.  When update_ancestry does svn_fs_bdb__put_node_revision it
 writes to the database and subsequent svn_fs_bdb__get_node_revision will
 see the updated value.
 
 For FSFS the svn_fs_fs__put_node_revision call writes to the
 transactions subdir but the in-memory node_revision doesn't get changed.

Gotcha.  Thanks for the response.

I remember when the BDB code had an in-memory node-revision structure and
tried to maintain that using the trail construct.  I remember also the day I
purged the code of that mess. :-)

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
Philip Martin wrote on Mon, Mar 19, 2012 at 18:31:41 +:
 Daniel Shahaf danie...@elego.de writes:
 
  C. Michael Pilato wrote on Mon, Mar 19, 2012 at 13:57:51 -0400:
  Is this problem specific to the FSFS backend?
 
  No.
 
  % ../runpytest svnadmin mergeinfo_race --fs-type bdb
  2012-03-19 20:21:44 [WARNING] CWD: 
  /home/daniel/src/svn/t1/subversion/tests/cmdline
  2012-03-19 20:21:44 [WARNING] EXCEPTION: Failure: one or both commits failed
  XFAIL: svnadmin_tests.py 29: concurrent mergeinfo commits invalidate 
  pred-count
 
 I think that's a failure of the regression test, after the XFAIL the
 repository looks like:
 

Right as usual, implemented your fix in r1302591.


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Philip Martin
Philip Martin philip.mar...@wandisco.com writes:

 If I use the debugger to manually set target-node_revision to NULL
 inside svn_fs_fs__dag_increment_mergeinfo_count then the commit works.
 I'm not exactly sure how all the FSFS caching layers are supposed to
 interact.  Is tree.c:update_ancestry supposed to update the in-memory
 predecessor_count?  Should there be a svn_fs_fs__dag_xxx function to
 change the predecessor count?  Should target-node_revision be set to
 NULL soemwehere?  Something else?

Moving update_ancestry from tree.c to dag.c is one way to fix the
problem.  Daniel also suggested removing the node_revision member of
dag_node_t altogether and relying on new 1.7 caching to give us the
performance.  I suppose we would still need a patch like this for 1.6.


Index: ../src/subversion/libsvn_fs_fs/dag.c
===
--- ../src/subversion/libsvn_fs_fs/dag.c(revision 1302591)
+++ ../src/subversion/libsvn_fs_fs/dag.c(working copy)
@@ -1296,3 +1296,27 @@
 
   return SVN_NO_ERROR;
 }
+
+svn_error_t *
+svn_fs_fs__dag_update_ancestry(dag_node_t *target,
+   dag_node_t *source,
+   apr_pool_t *pool)
+{
+  node_revision_t *source_noderev, *target_noderev;
+
+  if (! svn_fs_fs__dag_check_mutable(target))
+return svn_error_createf
+  (SVN_ERR_FS_NOT_MUTABLE, NULL,
+   _(Attempted to update ancestry of non-mutable node));
+
+  SVN_ERR(get_node_revision(source_noderev, source, pool));
+  SVN_ERR(get_node_revision(target_noderev, target, pool));
+
+  target_noderev-predecessor_id = source-id;
+  target_noderev-predecessor_count = source_noderev-predecessor_count;
+  if (target_noderev-predecessor_count != -1)
+target_noderev-predecessor_count++;
+
+  return svn_fs_fs__put_node_revision(target-fs, target-id, target_noderev,
+  FALSE, pool);
+}
Index: ../src/subversion/libsvn_fs_fs/tree.c
===
--- ../src/subversion/libsvn_fs_fs/tree.c   (revision 1302591)
+++ ../src/subversion/libsvn_fs_fs/tree.c   (working copy)
@@ -1142,32 +1142,6 @@
 }
 
 
-/* Teach node-revision TARGET_ID that node-revision SOURCE_ID is its
-   predecessor.  TARGET_PATH is used for error messages only. */
-static svn_error_t *
-update_ancestry(svn_fs_t *fs,
-const svn_fs_id_t *source_id,
-const svn_fs_id_t *target_id,
-const char *target_path,
-int source_pred_count,
-apr_pool_t *pool)
-{
-  node_revision_t *noderev;
-
-  if (svn_fs_fs__id_txn_id(target_id) == NULL)
-return svn_error_createf
-  (SVN_ERR_FS_NOT_MUTABLE, NULL,
-   _(Unexpected immutable node at '%s'), target_path);
-
-  SVN_ERR(svn_fs_fs__get_node_revision(noderev, fs, target_id, pool));
-  noderev-predecessor_id = source_id;
-  noderev-predecessor_count = source_pred_count;
-  if (noderev-predecessor_count != -1)
-noderev-predecessor_count++;
-  return svn_fs_fs__put_node_revision(fs, target_id, noderev, FALSE, pool);
-}
-
-
 /* Set the contents of CONFLICT_PATH to PATH, and return an
SVN_ERR_FS_CONFLICT error that indicates that there was a conflict
at PATH.  Perform all allocations in POOL (except the allocation of
@@ -1219,7 +1193,6 @@
   apr_hash_index_t *hi;
   svn_fs_t *fs;
   apr_pool_t *iterpool;
-  int pred_count;
   apr_int64_t mergeinfo_increment = 0;
 
   /* Make sure everyone comes from the same filesystem. */
@@ -1543,9 +1516,7 @@
 }
   svn_pool_destroy(iterpool);
 
-  SVN_ERR(svn_fs_fs__dag_get_predecessor_count(pred_count, source, pool));
-  SVN_ERR(update_ancestry(fs, source_id, target_id, target_path,
-  pred_count, pool));
+  SVN_ERR(svn_fs_fs__dag_update_ancestry(target, source, pool));
 
   if (svn_fs_fs__fs_supports_mergeinfo(fs))
 SVN_ERR(svn_fs_fs__dag_increment_mergeinfo_count(target,
Index: ../src/subversion/libsvn_fs_fs/dag.h
===
--- ../src/subversion/libsvn_fs_fs/dag.h(revision 1302591)
+++ ../src/subversion/libsvn_fs_fs/dag.h(working copy)
@@ -603,6 +603,10 @@
   dag_node_t *node,
   apr_pool_t *pool);
 
+svn_error_t *
+svn_fs_fs__dag_update_ancestry(dag_node_t *target,
+   dag_node_t *source,
+   apr_pool_t *pool);
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: #4129 is reproducible Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
Philip Martin wrote on Mon, Mar 19, 2012 at 18:45:37 +:
 Philip Martin philip.mar...@wandisco.com writes:
 
  If I use the debugger to manually set target-node_revision to NULL
  inside svn_fs_fs__dag_increment_mergeinfo_count then the commit works.
  I'm not exactly sure how all the FSFS caching layers are supposed to
  interact.  Is tree.c:update_ancestry supposed to update the in-memory
  predecessor_count?  Should there be a svn_fs_fs__dag_xxx function to
  change the predecessor count?  Should target-node_revision be set to
  NULL soemwehere?  Something else?
 
 Moving update_ancestry from tree.c to dag.c is one way to fix the
 problem.  Daniel also suggested removing the node_revision member of
 dag_node_t altogether and relying on new 1.7 caching to give us the
 performance.  I suppose we would still need a patch like this for 1.6.
 

++1


Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Jason Wong
Hello Daniel, Philip.

I have been following the thread: #4129 is reproducible Re:
predecessor count for the root node-revision is wrong message.
It looks like you all have it figured out now. Good job.

Do you need any more information from me at this point? Thanks.

Jason Wong.


Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
Jason Wong wrote on Mon, Mar 19, 2012 at 13:41:19 -0700:
 Hello Daniel, Philip.
 
 I have been following the thread: #4129 is reproducible Re:
 predecessor count for the root node-revision is wrong message.
 It looks like you all have it figured out now. Good job.
 
 Do you need any more information from me at this point? Thanks.
 

Thanks Jason.  It would be useful if you could confirm that you do not
run into the error after rebuilding the server with r1302399 and
r1302613 applied.  (If you run the test suite, apply r1302539 and
r1302591 too.)  These revisions constitute the fix which is nominated
for inclusion in 1.6.18 and 1.7.5; see ^/subversion/branches/1.7.x/STATUS.

Cheers,

Daniel

 Jason Wong.


Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Jason Wong
On Mon, Mar 19, 2012 at 1:56 PM, Daniel Shahaf danie...@elego.de wrote:
 Jason Wong wrote on Mon, Mar 19, 2012 at 13:41:19 -0700:
 Hello Daniel, Philip.

 I have been following the thread: #4129 is reproducible Re:
 predecessor count for the root node-revision is wrong message.
 It looks like you all have it figured out now. Good job.

 Do you need any more information from me at this point? Thanks.


 Thanks Jason.  It would be useful if you could confirm that you do not
 run into the error after rebuilding the server with r1302399 and
 r1302613 applied.  (If you run the test suite, apply r1302539 and
 r1302591 too.)  These revisions constitute the fix which is nominated
 for inclusion in 1.6.18 and 1.7.5; see ^/subversion/branches/1.7.x/STATUS.


Hi Daniel.

The developer who built the svn client is away and will probably not be back
until later this week. What is your ETA for 1.7.5? Just wondering if that would
released before the developer I have is back.

Thanks.

Jason

 Cheers,

 Daniel

 Jason Wong.


Re: predecessor count for the root node-revision is wrong message

2012-03-19 Thread Daniel Shahaf
Jason Wong wrote on Mon, Mar 19, 2012 at 15:34:53 -0700:
 On Mon, Mar 19, 2012 at 1:56 PM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Mon, Mar 19, 2012 at 13:41:19 -0700:
  Hello Daniel, Philip.
 
  I have been following the thread: #4129 is reproducible Re:
  predecessor count for the root node-revision is wrong message.
  It looks like you all have it figured out now. Good job.
 
  Do you need any more information from me at this point? Thanks.
 
 
  Thanks Jason.  It would be useful if you could confirm that you do not
  run into the error after rebuilding the server with r1302399 and
  r1302613 applied.  (If you run the test suite, apply r1302539 and
  r1302591 too.)  These revisions constitute the fix which is nominated
  for inclusion in 1.6.18 and 1.7.5; see ^/subversion/branches/1.7.x/STATUS.
 
 
 Hi Daniel.
 
 The developer who built the svn client is away and will probably not be back
 until later this week. What is your ETA for 1.7.5? Just wondering if that 
 would
 released before the developer I have is back.
 

The time until 1.7.5 is counted in weeks, and 1.6.18 is scheduled to be
released next week.

 Thanks.
 
 Jason
 
  Cheers,
 
  Daniel
 
  Jason Wong.


Re: predecessor count for the root node-revision is wrong message

2012-03-18 Thread Daniel Shahaf
[ cc += dev@. summary for dev@: investigating issue #4129: predecessor
count of rN is not incremented by one wrt that of r(N-1); see 
http://subversion.tigris.org/issues/show_bug.cgi?id=4129 ]

Jason Wong wrote on Thu, Mar 15, 2012 at 07:57:47 -0700:
 On Wed, Mar 14, 2012 at 5:15 PM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Tue, Mar 13, 2012 at 06:57:59 -0700:
  On Fri, Mar 2, 2012 at 8:12 AM, Daniel Shahaf danie...@elego.de wrote:
   Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
   On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
I have had a developer here create a build of the latest SVN code
with your changes you mentioned in r1294470 for the svnadmin verify
   
Okay, that's great news, for two reasons:
   
1. It means building svn on windows isn't as painful as it used to be 
:)
  
   Actually, it did take some work to get it going as we did not have
   another system available to us and also did not have VC++ 6. We had
   to use VS 2010 in order to do this. Also, for the other components
   required (python,perl etc), the files after the install were copied
   to the workstation to see if it would work as we did not want to
   change the current workstation configuration by running the
   installers. All in all, it did seem to work.
  
  
   Okay.  The normal build requires just the *.exe and *.dll files to be
   placed appropriately (such that the *.exe's and httpd's find their
   libsvn_* DLL's at runtime) --- it doesn't require Administrator access,
   for example.
  
   To clarify, Perl is only required to build OpenSSL; it is not required
   to build APR, Neon, or Subversion.
  
   
2. It means I can ask you to build a custom server with the 
'inprocess'
cache disabled, or (if all else fails) to bisect, per my previous 
email.
   
One of the things you could try is to disable caching: simply modify
the function create_cache() in libsvn_fs_fs/caching.c to always return
NULL in *CACHE_P.  See below for another suggestion.
   
command. We have run 'svnadmin verify' against every revision of our
hotcopy of our repository taken when we first brought this issue to
the forums and are now tracking down each of the revisions to see
what actions were being done at those times.
   
   
Thanks!  I do hope this work enables us to pinpoint and fix the bug.
  
   I will be going through the list to see what else was happening at the
   same time on the apache server since it was alluded to that there may
   be concurrency issues. I know the last two times that this error has
   popped up, we had two svn operations starting at around the same time
   according to the Apache logs. I will go through the previous apache
   history to see if this was always the case or not.
  
  
   Thanks, looking forward to hear what you come up with.
  
   FWIW, Justin's reply suggests that the error was seen on three different
   platforms --- Windows, Solaris, and FreeBSD --- so that should narrow
   down the range of possible explanations.
  
   (I'll also note that at ASF's installation we are not running into new
   instances of the bug.)
 
  Hi Daniel.
 
 
  Hi.  Sorry for the delay --- was away from svn the last few days.
 
 
 No problem. I have been really busy as well past couple weeks on other
 tasks.
 
  I haven't gone through all the cases yet, but I have made progress
  through quite a number of them and a pattern seems to be coming up.
 
 
  Is it safe to summarize your findings as: in every instance of the bug
  (as determined by the new 'svnadmin verify' output), the victim revision
  was started whilst (victim-1) was in progress?
 
 
 From what is there so far, yes. We do have different operations
 occurring at the same time, but for these ones, I see MERGE and DELETE
 verbs overlapping in the same or near time intervals according to the
 Apache logs. I just did a quick look in the Apache logs during a time
 window where the bug wasn't triggered, and was able to see cases where
 I have the following:
  rev(x-1) merge
  rev(x) merge
  rev(x) delete
  rev(x-1) delete
 
 This seems fine.
 The case that I had seen in my reported cases are as follows:
  rev(x-1) merge
  rev(x) merge
  rev(x-1) delete
  rev(x) delete
 

I have tried forcing this sequence in the debugger, but that did not
reproduce the bug.

I was using trunk@1301511 mod_dav_svn, trunk client, and httpd-2.4.
I set a breakpoint in ra_neon's commit.c:commit_close_edit() and
ra_serf's commit.c:close_edit() before the DELETE of the activity.  and
sent the requests in the following order: WC1-MERGE, WC2-MERGE,
WC1-DELETE, WC2-DELETE.  I tried both serf and neon.  I tried
mod_dav_svn of trunk@1301511 and 1.6.x@1301511.  (Only httpv1 makes
sense, since in httpv2 DELETEs do not occur.)

I conclude that the specific MERGE/MERGE/DELETE/DELETE order, by itself,
is not sufficient to 

Re: predecessor count for the root node-revision is wrong message

2012-03-14 Thread subu7890
Hey Jason,
I'm also facing a similar problem.I'm working on a project 
where the svn implementation is on the server side.Since svn 
merge(reintegration only required for my project) reqiures a local working 
copy, I maintain one working copy folder for each user, which I switch to 
the target repository folder(the folder becomes the working copy of the 
repository url of the merge target).When two users, try to merge at the 
very same instant, the merge happens, but the following commit that is 
called inside the same function returns the 'Predescessor Count for the 
root node revision is wrong' for the second user (1st user is able to 
commit properly).I'm invoking a cmd propmt on the server via c# code and 
running the svn commands there.I have gone through your discussions here.I 
found it very technical.Could you Kindly please try to be a bit 
non-technical while replying to my queries?
On Wednesday, 8 February 2012 02:53:10 UTC+5:30, Jason Wong wrote:

 Hello. 

 I have recently run into an issue with my subversion system (1.7.1)
 where a specific component I am trying to build has failed. I have
 had sucessful builds of this project before this issue happened since
 we upgraded to 1.7.1. I am currently getting by this by using
 TortoiseSVN 1.7.1 to do the check-in of the files left in the
 workspace. 

 This issue seems to be intermittent as it does not happen every time.
 I am using subversion 1.7.1 windows binaries and subversion is
 hosting on Apache 2.2.21. 

 I have run svnadmin verify against the directory and it has come
 back clean. I was wondering what type of events can cause this to
 happen? Are there any resolutions? 

 Here are some entries from the apache log for the last time this
 issue happened. 

 The Apache error log shows the following lines: 

 [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] Could not
 MERGE resource /repository/!svn/txn/61847-1bz5 into
 /repository/project/binaries/release/phase1/iteration/81/trunk
 . [409, #0]
 [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] An error
 occurred while committing the transaction.  [409, #160004]
 [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor
 count for the root node-revision is wrong: found 61815,
 committing r61852  [409, #160004]
 [Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Could not
 fetch resource information.  [404, #0]
 [Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Named
 transaction doesn't exist.  [404, #0] 

 The following are from the Apache access log at the same time: 

 9.31.13.109 - username [31/Jan/2012:11:37:22 -0800] MERGE
 /repository/project/binaries/release/phase1/iteration/81/trunk
 HTTP/1.1 409 281
 9.31.13.109 - username [31/Jan/2012:11:37:24 -0800] DELETE
 /repository/!svn/txn/61847-1bz5 HTTP/1.1 404 232 

 Any help/comments would be appreciated. Thank you. 


 Jason Wong

 

Re: predecessor count for the root node-revision is wrong message

2012-03-14 Thread Daniel Shahaf
Jason Wong wrote on Tue, Mar 13, 2012 at 06:57:59 -0700:
 On Fri, Mar 2, 2012 at 8:12 AM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
  On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
   Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
   I have had a developer here create a build of the latest SVN code
   with your changes you mentioned in r1294470 for the svnadmin verify
  
   Okay, that's great news, for two reasons:
  
   1. It means building svn on windows isn't as painful as it used to be :)
 
  Actually, it did take some work to get it going as we did not have
  another system available to us and also did not have VC++ 6. We had
  to use VS 2010 in order to do this. Also, for the other components
  required (python,perl etc), the files after the install were copied
  to the workstation to see if it would work as we did not want to
  change the current workstation configuration by running the
  installers. All in all, it did seem to work.
 
 
  Okay.  The normal build requires just the *.exe and *.dll files to be
  placed appropriately (such that the *.exe's and httpd's find their
  libsvn_* DLL's at runtime) --- it doesn't require Administrator access,
  for example.
 
  To clarify, Perl is only required to build OpenSSL; it is not required
  to build APR, Neon, or Subversion.
 
  
   2. It means I can ask you to build a custom server with the 'inprocess'
   cache disabled, or (if all else fails) to bisect, per my previous email.
  
   One of the things you could try is to disable caching: simply modify
   the function create_cache() in libsvn_fs_fs/caching.c to always return
   NULL in *CACHE_P.  See below for another suggestion.
  
   command. We have run 'svnadmin verify' against every revision of our
   hotcopy of our repository taken when we first brought this issue to
   the forums and are now tracking down each of the revisions to see
   what actions were being done at those times.
  
  
   Thanks!  I do hope this work enables us to pinpoint and fix the bug.
 
  I will be going through the list to see what else was happening at the
  same time on the apache server since it was alluded to that there may
  be concurrency issues. I know the last two times that this error has
  popped up, we had two svn operations starting at around the same time
  according to the Apache logs. I will go through the previous apache
  history to see if this was always the case or not.
 
 
  Thanks, looking forward to hear what you come up with.
 
  FWIW, Justin's reply suggests that the error was seen on three different
  platforms --- Windows, Solaris, and FreeBSD --- so that should narrow
  down the range of possible explanations.
 
  (I'll also note that at ASF's installation we are not running into new
  instances of the bug.)
 
 Hi Daniel.
 

Hi.  Sorry for the delay --- was away from svn the last few days.

 I haven't gone through all the cases yet, but I have made progress
 through quite a number of them and a pattern seems to be coming up.
 

Is it safe to summarize your findings as: in every instance of the bug
(as determined by the new 'svnadmin verify' output), the victim revision
was started whilst (victim-1) was in progress?

That by itself is an everyday occurence, but I think it's nonetheless
a useful piece of information.  I'll try and digest it further later
when I'm less sleepy (it's way past midnight here).

(As I understand ra_dav, the MERGE verb corresponds to the FS level
svn_fs_commit_txn().  Someone please correct me if I'm wrong.)

Thanks,

Daniel

 I have attached 2 txt files. One shows the modified svnadmin verify
 output from the binaries we built. The other shows the revisions and
 what appears to have been occuring at the time of the bug. I figure
 better to provide this now rather than delay any longer for the rest
 of the results.
 
 I will continue to go through the rest of the events and see if
 there are other differences seen when the issue occurs. I hope
 this information helps.
 
 Thanks.
 
 Jason



Re: predecessor count for the root node-revision is wrong message

2012-03-14 Thread Daniel Shahaf
subu7890 wrote on Tue, Mar 13, 2012 at 21:51:17 -0700:
 Hey Jason,
 I'm also facing a similar problem.I'm working on a project 
 where the svn implementation is on the server side.Since svn 
 merge(reintegration only required for my project) reqiures a local working 
 copy, I maintain one working copy folder for each user, which I switch to 
 the target repository folder(the folder becomes the working copy of the 
 repository url of the merge target).When two users, try to merge at the 
 very same instant, the merge happens, but the following commit that is 
 called inside the same function returns the 'Predescessor Count for the 
 root node revision is wrong' for the second user (1st user is able to 
 commit properly).I'm invoking a cmd propmt on the server via c# code and 
 running the svn commands there.I have gone through your discussions here.I 
 found it very technical.Could you Kindly please try to be a bit 
 non-technical while replying to my queries?

Sure, we will try.  What are your queries?

Right now we don't know what exactly causes the bug; we are still
investigating.  When we make progress or find a fix, we will announce
that.  The best place to watch for such announcements is by registering
to the issue tracker and adding yourself to the CC list on issue #4129
(link below):

http://subversion.tigris.org/issues/show_bug.cgi?id=4129

Please keep discussion of the bug on the mailing list, not on the
issue tracker.

Cheers,

Daniel

 On Wednesday, 8 February 2012 02:53:10 UTC+5:30, Jason Wong wrote:
 
  Hello. 
 
  I have recently run into an issue with my subversion system (1.7.1)
  where a specific component I am trying to build has failed. I have
  had sucessful builds of this project before this issue happened since
  we upgraded to 1.7.1. I am currently getting by this by using
  TortoiseSVN 1.7.1 to do the check-in of the files left in the
  workspace. 
 
  This issue seems to be intermittent as it does not happen every time.
  I am using subversion 1.7.1 windows binaries and subversion is
  hosting on Apache 2.2.21. 
 
  I have run svnadmin verify against the directory and it has come
  back clean. I was wondering what type of events can cause this to
  happen? Are there any resolutions? 
 
  Here are some entries from the apache log for the last time this
  issue happened. 
 
  The Apache error log shows the following lines: 
 
  [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] Could not
  MERGE resource /repository/!svn/txn/61847-1bz5 into
  /repository/project/binaries/release/phase1/iteration/81/trunk
  . [409, #0]
  [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] An error
  occurred while committing the transaction.  [409, #160004]
  [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor
  count for the root node-revision is wrong: found 61815,
  committing r61852  [409, #160004]
  [Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Could not
  fetch resource information.  [404, #0]
  [Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Named
  transaction doesn't exist.  [404, #0] 
 
  The following are from the Apache access log at the same time: 
 
  9.31.13.109 - username [31/Jan/2012:11:37:22 -0800] MERGE
  /repository/project/binaries/release/phase1/iteration/81/trunk
  HTTP/1.1 409 281
  9.31.13.109 - username [31/Jan/2012:11:37:24 -0800] DELETE
  /repository/!svn/txn/61847-1bz5 HTTP/1.1 404 232 
 
  Any help/comments would be appreciated. Thank you. 
 
 
  Jason Wong
 
  



Re: predecessor count for the root node-revision is wrong message

2012-03-13 Thread Jason Wong
On Fri, Mar 2, 2012 at 8:12 AM, Daniel Shahaf danie...@elego.de wrote:
 Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
 On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
  I have had a developer here create a build of the latest SVN code
  with your changes you mentioned in r1294470 for the svnadmin verify
 
  Okay, that's great news, for two reasons:
 
  1. It means building svn on windows isn't as painful as it used to be :)

 Actually, it did take some work to get it going as we did not have
 another system available to us and also did not have VC++ 6. We had
 to use VS 2010 in order to do this. Also, for the other components
 required (python,perl etc), the files after the install were copied
 to the workstation to see if it would work as we did not want to
 change the current workstation configuration by running the
 installers. All in all, it did seem to work.


 Okay.  The normal build requires just the *.exe and *.dll files to be
 placed appropriately (such that the *.exe's and httpd's find their
 libsvn_* DLL's at runtime) --- it doesn't require Administrator access,
 for example.

 To clarify, Perl is only required to build OpenSSL; it is not required
 to build APR, Neon, or Subversion.

 
  2. It means I can ask you to build a custom server with the 'inprocess'
  cache disabled, or (if all else fails) to bisect, per my previous email.
 
  One of the things you could try is to disable caching: simply modify
  the function create_cache() in libsvn_fs_fs/caching.c to always return
  NULL in *CACHE_P.  See below for another suggestion.
 
  command. We have run 'svnadmin verify' against every revision of our
  hotcopy of our repository taken when we first brought this issue to
  the forums and are now tracking down each of the revisions to see
  what actions were being done at those times.
 
 
  Thanks!  I do hope this work enables us to pinpoint and fix the bug.

 I will be going through the list to see what else was happening at the
 same time on the apache server since it was alluded to that there may
 be concurrency issues. I know the last two times that this error has
 popped up, we had two svn operations starting at around the same time
 according to the Apache logs. I will go through the previous apache
 history to see if this was always the case or not.


 Thanks, looking forward to hear what you come up with.

 FWIW, Justin's reply suggests that the error was seen on three different
 platforms --- Windows, Solaris, and FreeBSD --- so that should narrow
 down the range of possible explanations.

 (I'll also note that at ASF's installation we are not running into new
 instances of the bug.)

Hi Daniel.

I haven't gone through all the cases yet, but I have made progress
through quite a number of them and a pattern seems to be coming up.

I have attached 2 txt files. One shows the modified svnadmin verify
output from the binaries we built. The other shows the revisions and
what appears to have been occuring at the time of the bug. I figure
better to provide this now rather than delay any longer for the rest
of the results.

I will continue to go through the rest of the events and see if
there are other differences seen when the issue occurs. I hope
this information helps.

Thanks.

Jason
SVN log history for predecessor node error: from svnadmin verify

svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r45558 has 45557, but r45557 has 45557
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r46947 has 46945, but r46946 has 46945
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r46997 has 46994, but r46996 has 46994
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r47004 has 47000, but r47003 has 47000
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r47006 has 47001, but r47005 has 47001
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r47193 has 47187, but r47192 has 47187
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r47715 has 47708, but r47714 has 47708
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r47718 has 47710, but r47717 has 47710
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r50049 has 50040, but r50048 has 50040
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r50963 has 50953, but r50962 has 50953
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r51481 has 51470, but r51480 has 51470
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r51684 has 51672, but r51683 has 51672
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r52082 has 52069, but r52081 has 52069
svnadmin: E160004: predecessor count for the root node-revision is wrong: 
r53220 has 53205, but 

Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Daniel Shahaf
Justin Johnson wrote on Thu, Mar 01, 2012 at 07:45:08 -0600:
 On Wed, Feb 29, 2012 at 4:14 PM, Justin Johnson 
 justinandto...@gmail.comwrote:
  On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote:
  ... so please try SVNInMemoryCacheSize 0, and see if that makes the
  issue less frequent.
 
 
  I'm a dork.  I will do so once I take care of the appropriate change
  control I have to deal with.  Thanks.
 
 
 We made the change and problem is still occurring.

OK.  So that means that either membuffer isn't the cause of the error,
or both membuffer and inprocess have (independent) bugs.

That said, I'm happy to see that it took you less than a day to report
that the bug no longer reproduces.  Can you reproduce it at will?

---

I'm not completely sure what to try next.  Were there any other
significant changes in 1.6.x-1.7.x besides caching?  If so, the
next steps would be to disable _both_ inprocess and membuffer, or
to bisect.

Unfortunately, the latter requires building svn from source, and the
former requires either that, or having you upgrade to specially-built
binaries, or squeezing the disable inprocess feature into 1.7.4 (in
a way that doesn't break our compat policy too badly) and having you
upgrade to that.

devs/all --- thoughts?  Is there a simpler way forward than the two
potential 'next step's I've outlined here?


Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Daniel Shahaf
Daniel Shahaf wrote on Wed, Feb 29, 2012 at 18:15:41 +0200:
 Justin, Jason,
 
 Some things you could do are:
 
 - What RA method do you use?  svn:// or http://?
 

Justin, what operating system does your server run?



Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Daniel Shahaf
Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
 I have had a developer here create a build of the latest SVN code
 with your changes you mentioned in r1294470 for the svnadmin verify

Okay, that's great news, for two reasons:

1. It means building svn on windows isn't as painful as it used to be :)

2. It means I can ask you to build a custom server with the 'inprocess'
cache disabled, or (if all else fails) to bisect, per my previous email.

One of the things you could try is to disable caching: simply modify
the function create_cache() in libsvn_fs_fs/caching.c to always return
NULL in *CACHE_P.  See below for another suggestion.

 command. We have run 'svnadmin verify' against every revision of our
 hotcopy of our repository taken when we first brought this issue to
 the forums and are now tracking down each of the revisions to see
 what actions were being done at those times.
 

Thanks!  I do hope this work enables us to pinpoint and fix the bug.

 From the results, we see 25 error messages for predecessor count is
 wrong and the first one appeared on January 26, 2011. Near that time
 the following events occurred:
Jan. 14, 2011 - svn upgraded from 1.6.6 to 1.6.15
Jan. 14, 2011 - Apache HTTP server upgraded from 2.2.15 to 2.2.17
Jan. 21, 2011 - repository was pruned to delete some binary files.
   
 Between January and our upgrade in Dec. to 1.7.1, we have had about
 14,000 revisions and seen only 25 instances of this node revision
 issue. During the times we had these errors, we were using svn
 versions 1.6.15 and 1.6.16.
 

Thanks, very valuable information.

I've reviewed the 1.6.6-1.6.15 diff, and I have the following
suggestions:

- Change subversion/libsvn_fs_fs/fs.h such that
  SVN_FS_FS__USE_LOCK_MUTEX is set to 1.  It was set to 1 in 1.6.6
  but to 0 in 1.6.15.

  (This wouldn't explain why ASF saw it, but it might explain why you're
  seeing it.)

 Fail2ban from what I could find does not look like it has a Windows
 port which I currently have my production environment hosted on.
 

Yeah, sorry.  But you can write a cron job -- I mean, a Scheduled Task
-- that greps your error logs for 160004 every night and mails you it
it found anything, right?

That's the error code to watch for for many FS error conditions:
% ./tools/dev/which-error.py E160004
00160004  SVN_ERR_FS_CORRUPT

 Thanks.
 
 Jason

For convenience I'm attaching a patch that implements both of my
suggestions.  Let us know please if it has any effect.

Cheers,

Daniel

Index: subversion/libsvn_fs_fs/fs.h
===
--- subversion/libsvn_fs_fs/fs.h	(revision 1295418)
+++ subversion/libsvn_fs_fs/fs.h	(working copy)
@@ -161,7 +161,7 @@ typedef struct fs_fs_shared_txn_data_t
per file.  On Windows apr implements the locking as per file handle
locks, so we don't have to add our own mutex for just in-process
synchronization. */
-#if APR_HAS_THREADS  !defined(WIN32)
+#if APR_HAS_THREADS /* disabled: and !defined(WIN32) */
 #define SVN_FS_FS__USE_LOCK_MUTEX 1
 #else
 #define SVN_FS_FS__USE_LOCK_MUTEX 0
Index: subversion/libsvn_fs_fs/caching.c
===
--- subversion/libsvn_fs_fs/caching.c	(revision 1295418)
+++ subversion/libsvn_fs_fs/caching.c	(working copy)
@@ -209,6 +209,9 @@ create_cache(svn_cache__t **cache_p,
  const char *prefix,
  apr_pool_t *pool)
 {
+  *cache_p = NULL;
+  return SVN_NO_ERROR;
+
 if (memcache)
   {
 SVN_ERR(svn_cache__create_memcache(cache_p, memcache,


Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Daniel Shahaf
Daniel Shahaf wrote on Fri, Mar 02, 2012 at 12:21:46 +0200:
 Daniel Shahaf wrote on Wed, Feb 29, 2012 at 18:15:41 +0200:
  Justin, Jason,
  
  Some things you could do are:
  
  - What RA method do you use?  svn:// or http://?
  
 
 Justin, what operating system does your server run?
 

... and what versions of httpd (Apache) and APR.


Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Justin Johnson
On Fri, Mar 2, 2012 at 3:41 AM, Daniel Shahaf danie...@elego.de wrote:

 Justin Johnson wrote on Thu, Mar 01, 2012 at 07:45:08 -0600:
  On Wed, Feb 29, 2012 at 4:14 PM, Justin Johnson 
 justinandto...@gmail.comwrote:
   On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de
 wrote:
   ... so please try SVNInMemoryCacheSize 0, and see if that makes the
   issue less frequent.
  
  
   I'm a dork.  I will do so once I take care of the appropriate change
   control I have to deal with.  Thanks.
  
 
  We made the change and problem is still occurring.

 OK.  So that means that either membuffer isn't the cause of the error,
 or both membuffer and inprocess have (independent) bugs.

 That said, I'm happy to see that it took you less than a day to report
 that the bug no longer reproduces.  Can you reproduce it at will?


No, but our server has about 2500 active users on it, so it doesn't take
long to see errors.  If we make a configuration change and a day goes by
without the error showing up in the logs I would suspect the problem was
fixed.  If a week were to pass with no error I would feel much more
confident.

Yesterday had 6 of the errors and 1 today so far.


Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Justin Johnson
On Fri, Mar 2, 2012 at 6:21 AM, Daniel Shahaf danie...@elego.de wrote:

 Daniel Shahaf wrote on Fri, Mar 02, 2012 at 12:21:46 +0200:
  Daniel Shahaf wrote on Wed, Feb 29, 2012 at 18:15:41 +0200:
   Justin, Jason,
  
   Some things you could do are:
  
   - What RA method do you use?  svn:// or http://?
  
 
  Justin, what operating system does your server run?
 

 ... and what versions of httpd (Apache) and APR.


Solaris 10
apr 1.4.5
httpd 2.2.19


Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Jason Wong
On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
 Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
 I have had a developer here create a build of the latest SVN code
 with your changes you mentioned in r1294470 for the svnadmin verify

 Okay, that's great news, for two reasons:

 1. It means building svn on windows isn't as painful as it used to be :)

Actually, it did take some work to get it going as we did not have
another system available to us and also did not have VC++ 6. We had
to use VS 2010 in order to do this. Also, for the other components
required (python,perl etc), the files after the install were copied
to the workstation to see if it would work as we did not want to
change the current workstation configuration by running the
installers. All in all, it did seem to work.


 2. It means I can ask you to build a custom server with the 'inprocess'
 cache disabled, or (if all else fails) to bisect, per my previous email.

 One of the things you could try is to disable caching: simply modify
 the function create_cache() in libsvn_fs_fs/caching.c to always return
 NULL in *CACHE_P.  See below for another suggestion.

 command. We have run 'svnadmin verify' against every revision of our
 hotcopy of our repository taken when we first brought this issue to
 the forums and are now tracking down each of the revisions to see
 what actions were being done at those times.


 Thanks!  I do hope this work enables us to pinpoint and fix the bug.

I will be going through the list to see what else was happening at the
same time on the apache server since it was alluded to that there may
be concurrency issues. I know the last two times that this error has
popped up, we had two svn operations starting at around the same time
according to the Apache logs. I will go through the previous apache
history to see if this was always the case or not.


 From the results, we see 25 error messages for predecessor count is
 wrong and the first one appeared on January 26, 2011. Near that time
 the following events occurred:
Jan. 14, 2011 - svn upgraded from 1.6.6 to 1.6.15
Jan. 14, 2011 - Apache HTTP server upgraded from 2.2.15 to 2.2.17
Jan. 21, 2011 - repository was pruned to delete some binary files.

 Between January and our upgrade in Dec. to 1.7.1, we have had about
 14,000 revisions and seen only 25 instances of this node revision
 issue. During the times we had these errors, we were using svn
 versions 1.6.15 and 1.6.16.


 Thanks, very valuable information.

 I've reviewed the 1.6.6-1.6.15 diff, and I have the following
 suggestions:

 - Change subversion/libsvn_fs_fs/fs.h such that
  SVN_FS_FS__USE_LOCK_MUTEX is set to 1.  It was set to 1 in 1.6.6
  but to 0 in 1.6.15.

  (This wouldn't explain why ASF saw it, but it might explain why you're
  seeing it.)

 Fail2ban from what I could find does not look like it has a Windows
 port which I currently have my production environment hosted on.


 Yeah, sorry.  But you can write a cron job -- I mean, a Scheduled Task
 -- that greps your error logs for 160004 every night and mails you it
 it found anything, right?

 That's the error code to watch for for many FS error conditions:
 % ./tools/dev/which-error.py E160004
 00160004  SVN_ERR_FS_CORRUPT


I will look into it. We did ask developers to note any error messages
that they see from tortoisesvn now as the last time we saw the error
message pop up, we asked the developer what happened and he said that
an error message popped up and he just tried to check in again and it
worked. We will note the exact message next time.

 Thanks.

 Jason

 For convenience I'm attaching a patch that implements both of my
 suggestions.  Let us know please if it has any effect.


I will forward this to the developer to look at.

 Cheers,

 Daniel


Hi.

See replies above. I will post what we find.

Thanks.

Jason


Re: predecessor count for the root node-revision is wrong message

2012-03-02 Thread Daniel Shahaf
Jason Wong wrote on Fri, Mar 02, 2012 at 07:32:38 -0800:
 On Fri, Mar 2, 2012 at 2:58 AM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Thu, Mar 01, 2012 at 10:01:26 -0800:
  I have had a developer here create a build of the latest SVN code
  with your changes you mentioned in r1294470 for the svnadmin verify
 
  Okay, that's great news, for two reasons:
 
  1. It means building svn on windows isn't as painful as it used to be :)
 
 Actually, it did take some work to get it going as we did not have
 another system available to us and also did not have VC++ 6. We had
 to use VS 2010 in order to do this. Also, for the other components
 required (python,perl etc), the files after the install were copied
 to the workstation to see if it would work as we did not want to
 change the current workstation configuration by running the
 installers. All in all, it did seem to work.
 

Okay.  The normal build requires just the *.exe and *.dll files to be
placed appropriately (such that the *.exe's and httpd's find their
libsvn_* DLL's at runtime) --- it doesn't require Administrator access,
for example.

To clarify, Perl is only required to build OpenSSL; it is not required
to build APR, Neon, or Subversion.

 
  2. It means I can ask you to build a custom server with the 'inprocess'
  cache disabled, or (if all else fails) to bisect, per my previous email.
 
  One of the things you could try is to disable caching: simply modify
  the function create_cache() in libsvn_fs_fs/caching.c to always return
  NULL in *CACHE_P.  See below for another suggestion.
 
  command. We have run 'svnadmin verify' against every revision of our
  hotcopy of our repository taken when we first brought this issue to
  the forums and are now tracking down each of the revisions to see
  what actions were being done at those times.
 
 
  Thanks!  I do hope this work enables us to pinpoint and fix the bug.
 
 I will be going through the list to see what else was happening at the
 same time on the apache server since it was alluded to that there may
 be concurrency issues. I know the last two times that this error has
 popped up, we had two svn operations starting at around the same time
 according to the Apache logs. I will go through the previous apache
 history to see if this was always the case or not.
 

Thanks, looking forward to hear what you come up with.

FWIW, Justin's reply suggests that the error was seen on three different
platforms --- Windows, Solaris, and FreeBSD --- so that should narrow
down the range of possible explanations.

(I'll also note that at ASF's installation we are not running into new
instances of the bug.)


Re: predecessor count for the root node-revision is wrong message

2012-03-01 Thread Justin Johnson
On Wed, Feb 29, 2012 at 4:14 PM, Justin Johnson justinandto...@gmail.comwrote:

 On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote:

 Justin Johnson wrote on Wed, Feb 29, 2012 at 11:11:18 -0600:
  On Wed, Feb 29, 2012 at 10:35 AM, Daniel Shahaf danie...@elego.de
 wrote:
 
   Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600:
On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de
   wrote:
 - Are the failing revisions always small (eg: just a URL-URL
 copy),
  or always large (eg: results of a merge)?


As mentioned before, so far it appears to be 1) create a tag by
 copying
   an
entire working copy of a branch to a URL, and 2) commit merge
 results for
an entire branch.
   
  
   That's not clear enough.  Could you show 'log -qv' of those revisions?
  
   A wc-to-URL copy could touch just one or two files (compare
   `svn log -qv --stop-on-copy
   http://svn.apache.org/repos/asf/subversion/tags/1.7.3`http://svn.apache.org/repos/asf/subversion/tags/1.7.3
 )
   or a full tree (http://subversion.apache.org/faq.html#in-place-import
 ).
   Which is it?
  
  
  The commits fail, so there is no revision to run this against.  Other
 tags
  that have succeeded seem to just have one added path that is a copy of
 the
  branch at revision x.  Does that answer your question?
 

 Yes, thanks.

 
   
 - Could you try setting the maximum cache size to zero?
  (svnserve:
  --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)


Apache is our server, so this is not applicable.
  
   SVNInMemoryCacheSize is applicable.
  
 
  Sorry, I missed that one.  We have not specified SVNInMemoryCacheSize,
 so
  we're using the default.

 ... so please try SVNInMemoryCacheSize 0, and see if that makes the
 issue less frequent.


 I'm a dork.  I will do so once I take care of the appropriate change
 control I have to deal with.  Thanks.


We made the change and problem is still occurring.


Re: predecessor count for the root node-revision is wrong message

2012-03-01 Thread Justin Johnson

 - Are the failing revisions always small (eg: just a URL-URL copy),
  or always large (eg: results of a merge)?


As mentioned before, so far it appears to be 1) create a tag by
 copying
   an
entire working copy of a branch to a URL, and 2) commit merge
 results for
an entire branch.
   
  
   That's not clear enough.  Could you show 'log -qv' of those
 revisions?
  
   A wc-to-URL copy could touch just one or two files (compare
   `svn log -qv --stop-on-copy
   http://svn.apache.org/repos/asf/subversion/tags/1.7.3`http://svn.apache.org/repos/asf/subversion/tags/1.7.3
 )
   or a full tree (
 http://subversion.apache.org/faq.html#in-place-import).
   Which is it?
  
  
  The commits fail, so there is no revision to run this against.  Other
 tags
  that have succeeded seem to just have one added path that is a copy of
 the
  branch at revision x.  Does that answer your question?
 

 Yes, thanks.

 
   
 - Could you try setting the maximum cache size to zero?
  (svnserve:
  --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)


Apache is our server, so this is not applicable.
  
   SVNInMemoryCacheSize is applicable.
  
 
  Sorry, I missed that one.  We have not specified SVNInMemoryCacheSize,
 so
  we're using the default.

 ... so please try SVNInMemoryCacheSize 0, and see if that makes the
 issue less frequent.


 I'm a dork.  I will do so once I take care of the appropriate change
 control I have to deal with.  Thanks.


 We made the change and problem is still occurring.


The error occurs when a tag is created by copying from a URL to a URL as
well.  Note that the clients are all 1.6 at this time.  If it is helpful I
can try to setup some tests with a 1.7 client.

To make sure I understand the issue, should I be concerned about the
repositories and our ability to reproduce the history or recover from any
corruption that this bug may have caused?

Let me know if I can help in any other way.

Thanks.
Justin


Re: predecessor count for the root node-revision is wrong message

2012-03-01 Thread Daniel Shahaf
Justin Johnson wrote on Thu, Mar 01, 2012 at 08:28:20 -0600:
 To make sure I understand the issue, should I be concerned about the
 repositories and our ability to reproduce the history or recover from any
 corruption that this bug may have caused?

The only known (and predicted) effect of the error is that some
revisions are wrongly skipped during a backward history walk --- such as
'log -r HEAD:0' does.

Creating an svnsync mirror is enough to fix the issue.  (There is no
need to check the mirror, even: if the mirror is written to by a 1.7
server (or a 1.7 svnsync over file://), the normal validation that
Jason's error logs show will be done automatically.)


Re: predecessor count for the root node-revision is wrong message

2012-02-29 Thread Justin Johnson
On Tue, Feb 28, 2012 at 3:07 AM, Daniel Shahaf danie...@elego.de wrote:

 Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800:
  On Thu, Feb 16, 2012 at 12:14 PM, Daniel Shahaf danie...@elego.de
 wrote:
 
  
   The output from these two tells me two things:
  
   1. The minfo-cnt value is reasonable (within a typical ballpark).
   That's relevant since minfo-cnt abnormalities were seen in another
   instance of the bug.
  
   2. Everything else looks correct: the 'id:'/'pred:' headers are
 accurate,
   and the 'count:' header was incremented correctly.  The 'count:' header
   does, however, indicate that your repository has _in the past_
 triggered
   an instance of the bug.
 
  This is true. We have seen the bug happen before. The first occurence
  of this that we had seen was Dec. 7th, 2011, a few days after we went
  from 1.6.16 to 1.7.1. That was the first time we had seen that happen.
  At the time, we did not know about the cause and the developer who
  had encountered the error didn't report it and was able to work

 Well, install fail2ban and have it mail you when that string appears
 in the logs?  I'll do so too...

  around it. From the Apache logs we have:
 
[Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor
count for the root node-revision is wrong: found 59444,
committing r59478  [409, #160004]
[Wed Dec 07 15:33:47 2011] [error] [client 10.2.3.2] predecessor
count for the root node-revision is wrong: found 59482,
committing r59516  [409, #160004]
[Wed Dec 07 15:35:19 2011] [error] [client 10.2.3.3] predecessor
count for the root node-revision is wrong: found 59488,
committing r59522  [409, #160004]

 As Stefan mentioned, these represent commit attempts that were rejected
 in order to prevent a new instance of the bug from entering the history.

[Wed Dec 07 15:44:10 2011] [error] [client 10.2.3.4] predecessor
count for the root node-revision is wrong: found 59505,
committing r59539  [409, #160004]
 
  Of the ips above, the last line is from the build machine. The others
  were from developer workstations. I mentioned the most recent two
  times first as we were actually aware of the issue at that time and
  it was recent so we knew to start looking into it. Between Dec. 7 and
  Jan. 31, the bug has occurred 12 times, 3 of those times from the
  build server. The rest are from workstations. This month, it has only
  occurred once and it was from the build server.
 

 What percentage of your commits are from the build server?

 Is there anything noteworthy about commits that were in progress around
 the time the bug occurred?  (their svn:date's would be near the time
 stamp in the httpd log)

  Each of these times, the error has occurred in different parts of
  the repository.
 
  Replies above. Sorry about the delay in replying. I have been really
  busy of late. I will try and get the results this week, if not, it
  will most likely be next week.
 

 No problem.


I just discovered we are having this problem as well.  Since upgrading from
1.6.17 to 1.7.2 on the evening of 2/21 we have seen this error on 10
different machines.  9 of those are build machines, so they would have been
creating tags.  I have a suspicion that all of the builds were trying to
create tags by copying from a working copy as opposed to a URL to URL copy,
but I need to confirm.  I know at least some of them were.  The other 1
machine received the error when trying to commit merge results.  The
details of the merge are not available to me since the user reverted the
local merge results.

Let me know if I can provide any more info to help solve this problem.

Justin


Re: predecessor count for the root node-revision is wrong message

2012-02-29 Thread Daniel Shahaf
Justin, Jason,

Some things you could do are:

- What RA method do you use?  svn:// or http://?

- Are the failing revisions always small (eg: just a URL-URL copy),
  or always large (eg: results of a merge)?

- Do you have any caching enabled at the OS filesystem layer or
  below it?

- Did you configure Subversion to use memcached?

- Did you configure a maximum cache size for Subversion?

- Could you try setting the maximum cache size to zero?  (svnserve:
  --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)

  This doesn't entirely disable caching, actually -- it just disables
  the new 'membuffer' cache backend, reverting to another older
  backend called 'inprocess'.

@Justin -- thanks for the information.  I've asked Jason upthread some
questions (eg, whether there patterns of commits concurrent to the
failing ones), which likely apply to you too.

Daniel

Justin Johnson wrote on Wed, Feb 29, 2012 at 09:22:40 -0600:
 On Tue, Feb 28, 2012 at 3:07 AM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800:
   Replies above. Sorry about the delay in replying. I have been really
   busy of late. I will try and get the results this week, if not, it
   will most likely be next week.
  
 
  No problem.
 
 
 I just discovered we are having this problem as well.  Since upgrading from
 1.6.17 to 1.7.2 on the evening of 2/21 we have seen this error on 10
 different machines.  9 of those are build machines, so they would have been
 creating tags.  I have a suspicion that all of the builds were trying to
 create tags by copying from a working copy as opposed to a URL to URL copy,
 but I need to confirm.  I know at least some of them were.  The other 1
 machine received the error when trying to commit merge results.  The
 details of the merge are not available to me since the user reverted the
 local merge results.
 
 Let me know if I can provide any more info to help solve this problem.
 
 Justin


Re: predecessor count for the root node-revision is wrong message

2012-02-29 Thread Justin Johnson
On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote:

 Justin, Jason,

 Some things you could do are:

 - What RA method do you use?  svn:// or http://?


http://


 - Are the failing revisions always small (eg: just a URL-URL copy),
  or always large (eg: results of a merge)?


As mentioned before, so far it appears to be 1) create a tag by copying an
entire working copy of a branch to a URL, and 2) commit merge results for
an entire branch.


 - Do you have any caching enabled at the OS filesystem layer or
  below it?


No.


 - Did you configure Subversion to use memcached?


No.


 - Did you configure a maximum cache size for Subversion?


No.


 - Could you try setting the maximum cache size to zero?  (svnserve:
  --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)


Apache is our server, so this is not applicable.


  This doesn't entirely disable caching, actually -- it just disables
  the new 'membuffer' cache backend, reverting to another older
  backend called 'inprocess'.

 @Justin -- thanks for the information.  I've asked Jason upthread some
 questions (eg, whether there patterns of commits concurrent to the
 failing ones), which likely apply to you too.

 Daniel

 Justin Johnson wrote on Wed, Feb 29, 2012 at 09:22:40 -0600:
  On Tue, Feb 28, 2012 at 3:07 AM, Daniel Shahaf danie...@elego.de
 wrote:
   Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800:
Replies above. Sorry about the delay in replying. I have been really
busy of late. I will try and get the results this week, if not, it
will most likely be next week.
   
  
   No problem.
  
 
  I just discovered we are having this problem as well.  Since upgrading
 from
  1.6.17 to 1.7.2 on the evening of 2/21 we have seen this error on 10
  different machines.  9 of those are build machines, so they would have
 been
  creating tags.  I have a suspicion that all of the builds were trying to
  create tags by copying from a working copy as opposed to a URL to URL
 copy,
  but I need to confirm.  I know at least some of them were.  The other 1
  machine received the error when trying to commit merge results.  The
  details of the merge are not available to me since the user reverted the
  local merge results.
 
  Let me know if I can provide any more info to help solve this problem.
 
  Justin



Re: predecessor count for the root node-revision is wrong message

2012-02-29 Thread Daniel Shahaf
Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600:
 On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de wrote:
  - Are the failing revisions always small (eg: just a URL-URL copy),
   or always large (eg: results of a merge)?
 
 
 As mentioned before, so far it appears to be 1) create a tag by copying an
 entire working copy of a branch to a URL, and 2) commit merge results for
 an entire branch.
 

That's not clear enough.  Could you show 'log -qv' of those revisions?

A wc-to-URL copy could touch just one or two files (compare
`svn log -qv --stop-on-copy 
http://svn.apache.org/repos/asf/subversion/tags/1.7.3`)
or a full tree (http://subversion.apache.org/faq.html#in-place-import).
Which is it?

 
  - Could you try setting the maximum cache size to zero?  (svnserve:
   --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)
 
 
 Apache is our server, so this is not applicable.

SVNInMemoryCacheSize is applicable.


Re: predecessor count for the root node-revision is wrong message

2012-02-29 Thread Daniel Shahaf
Justin Johnson wrote on Wed, Feb 29, 2012 at 11:11:18 -0600:
 On Wed, Feb 29, 2012 at 10:35 AM, Daniel Shahaf danie...@elego.de wrote:
 
  Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600:
   On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de
  wrote:
- Are the failing revisions always small (eg: just a URL-URL copy),
 or always large (eg: results of a merge)?
   
   
   As mentioned before, so far it appears to be 1) create a tag by copying
  an
   entire working copy of a branch to a URL, and 2) commit merge results for
   an entire branch.
  
 
  That's not clear enough.  Could you show 'log -qv' of those revisions?
 
  A wc-to-URL copy could touch just one or two files (compare
  `svn log -qv --stop-on-copy
  http://svn.apache.org/repos/asf/subversion/tags/1.7.3`)
  or a full tree (http://subversion.apache.org/faq.html#in-place-import).
  Which is it?
 
 
 The commits fail, so there is no revision to run this against.  Other tags
 that have succeeded seem to just have one added path that is a copy of the
 branch at revision x.  Does that answer your question?
 

Yes, thanks.

 
  
- Could you try setting the maximum cache size to zero?  (svnserve:
 --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)
   
   
   Apache is our server, so this is not applicable.
 
  SVNInMemoryCacheSize is applicable.
 
 
 Sorry, I missed that one.  We have not specified SVNInMemoryCacheSize, so
 we're using the default.

... so please try SVNInMemoryCacheSize 0, and see if that makes the
issue less frequent.


Re: predecessor count for the root node-revision is wrong message

2012-02-29 Thread Justin Johnson
On Wed, Feb 29, 2012 at 11:22 AM, Daniel Shahaf danie...@elego.de wrote:

 Justin Johnson wrote on Wed, Feb 29, 2012 at 11:11:18 -0600:
  On Wed, Feb 29, 2012 at 10:35 AM, Daniel Shahaf danie...@elego.de
 wrote:
 
   Justin Johnson wrote on Wed, Feb 29, 2012 at 10:25:38 -0600:
On Wed, Feb 29, 2012 at 10:15 AM, Daniel Shahaf danie...@elego.de
   wrote:
 - Are the failing revisions always small (eg: just a URL-URL copy),
  or always large (eg: results of a merge)?


As mentioned before, so far it appears to be 1) create a tag by
 copying
   an
entire working copy of a branch to a URL, and 2) commit merge
 results for
an entire branch.
   
  
   That's not clear enough.  Could you show 'log -qv' of those revisions?
  
   A wc-to-URL copy could touch just one or two files (compare
   `svn log -qv --stop-on-copy
   http://svn.apache.org/repos/asf/subversion/tags/1.7.3`)
   or a full tree (http://subversion.apache.org/faq.html#in-place-import
 ).
   Which is it?
  
  
  The commits fail, so there is no revision to run this against.  Other
 tags
  that have succeeded seem to just have one added path that is a copy of
 the
  branch at revision x.  Does that answer your question?
 

 Yes, thanks.

 
   
 - Could you try setting the maximum cache size to zero?  (svnserve:
  --memory-cache-size=0; mod_dav_svn: SVNInMemoryCacheSize 0)


Apache is our server, so this is not applicable.
  
   SVNInMemoryCacheSize is applicable.
  
 
  Sorry, I missed that one.  We have not specified SVNInMemoryCacheSize, so
  we're using the default.

 ... so please try SVNInMemoryCacheSize 0, and see if that makes the
 issue less frequent.


I'm a dork.  I will do so once I take care of the appropriate change
control I have to deal with.  Thanks.


Re: predecessor count for the root node-revision is wrong message

2012-02-28 Thread Daniel Shahaf
Daniel Shahaf wrote on Tue, Feb 28, 2012 at 07:17:04 +0200:
 Stefan Sperling wrote on Tue, Feb 28, 2012 at 03:18:35 +0100:
  On Mon, Feb 27, 2012 at 03:25:00PM -0800, Jason Wong wrote:
   I guess I am wondering that if this is the case, then why is it that
   if the check-in fails, and then we manually check it in again using
   tortoisesvn, that it works the second time?
  
  As far as I know, we don't know what the underlying problem is yet.
  So I'm afraid I cannot answer all of your questions in a satisfactory
  manner because I don't know the answers.
 
 I'm not really surprised by this situation.  I guess Jason means the
 same set of tree/text/prop/revprop changes when he says it, but by
 the nature of the bug it may well have to do with some concurrency
 issues on the server --- i.e., it depends on more than just the contents
 being committed.
 
  Daniel might know more -- he has been following the problem more closely
  than me. And I hope he will correct me if I'm making any wrong or
  misleading statements in this post :)
 
 Yeah, I'm following this thread :)
 
   We have been using svn for a while now and I am
   wondering what this means that for 1.6, that this issue has been
   occurring from communcations between 1.6 client and 1.6 server.
  
  It has probably been happening since before the 1.7 release.
  The problem was first discovered in the ASF repository. The first
  occurrence there has been traced to a time before 1.7.0 was released.
  At the time the ASF server was still running 1.6, and the first commit
  that introduced the problem in the ASF repository was very likely
  a 1.6 client.
  
 
 AI We should check whether it _was_ a 1.6 client..
 
   Also, is this bug something that svnadmin verify will not detect?
   The last time we ran svnadmin verify, it said all was good.
  
  Apparently, svnadmin verify won't find it.
  This should probably be fixed. But the priority right now is to
  understand why it is happening in the first place.
  
 
 AI The 'verify' part is a relatively low-hanging fruit, though.
 

Done: r1294479 and followups.  See also r1294470 which improves the
error message.

The latter will hopefully be in 1.7.4.  svn_fs_verify() is not present
in 1.7.x so the r1294479 set of changes cannot easily be backported.

 AI File an issue for the pred-count bug

Issue #4129:
http://subversion.tigris.org/issues/show_bug.cgi?id=4129


Re: predecessor count for the root node-revision is wrong message

2012-02-27 Thread Jason Wong
On Thu, Feb 16, 2012 at 12:14 PM, Daniel Shahaf danie...@elego.de wrote:


 The output from these two tells me two things:

 1. The minfo-cnt value is reasonable (within a typical ballpark).
 That's relevant since minfo-cnt abnormalities were seen in another
 instance of the bug.

 2. Everything else looks correct: the 'id:'/'pred:' headers are accurate,
 and the 'count:' header was incremented correctly.  The 'count:' header
 does, however, indicate that your repository has _in the past_ triggered
 an instance of the bug.

This is true. We have seen the bug happen before. The first occurence
of this that we had seen was Dec. 7th, 2011, a few days after we went
from 1.6.16 to 1.7.1. That was the first time we had seen that happen.
At the time, we did not know about the cause and the developer who
had encountered the error didn't report it and was able to work
around it. From the Apache logs we have:

[Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor
count for the root node-revision is wrong: found 59444,
committing r59478  [409, #160004]
[Wed Dec 07 15:33:47 2011] [error] [client 10.2.3.2] predecessor
count for the root node-revision is wrong: found 59482,
committing r59516  [409, #160004]
[Wed Dec 07 15:35:19 2011] [error] [client 10.2.3.3] predecessor
count for the root node-revision is wrong: found 59488,
committing r59522  [409, #160004]
[Wed Dec 07 15:44:10 2011] [error] [client 10.2.3.4] predecessor
count for the root node-revision is wrong: found 59505,
committing r59539  [409, #160004]

Of the ips above, the last line is from the build machine. The others
were from developer workstations. I mentioned the most recent two
times first as we were actually aware of the issue at that time and
it was recent so we knew to start looking into it. Between Dec. 7 and
Jan. 31, the bug has occurred 12 times, 3 of those times from the
build server. The rest are from workstations. This month, it has only
occurred once and it was from the build server.

Each of these times, the error has occurred in different parts of
the repository.


 In a bit more detail: the value of the 'count:' header should be equal to
 the revision number given as the third argument to dump-noderev.pl.
 (That revision number is also embedded in the 'id:' header, and is
 practically guaranteed to be embedded in the 'text:' header as well.)
 So, there are two things you can do to help us identify the bug:

 1. Hunt for past instance of the bug, identify what revisions triggered
 it, and try and identify a common pattern to those revisions.  (This
 basically calls for running 'dump-noderev.pl $REPOS /' in a loop and
 looking for non-sequential 'count:' or 'pred:' headers in the output for
 a pair of successive revisions.)

I will try and see if I can get this done this week.

 2. Look for new instances of the bug.  You could periodically scan for
 new instances of the bug, or implement a post-commit hook such as the
 following (written for unix-like systems, sorry):

 [[[
 # look for a corruption or two
 minfo_cnt() {
  dump-noderev.pl $REPOS / $1 | sed -ne 's/minfo-cnt: //p'
 }
 PREV_REV=`expr $REV - 1`
 if expr `minfo_cnt $PREV_REV` - `minfo_cnt $REV` | grep ... /dev/null; 
 then
  # echo an error to stderr and mail the admin
  exit 1
 fi

 skipped_root_noderevs() {
  expr $1 - `dump-noderev.pl $REPOS / $1 | sed -ne 's/^count: //p'`
 }
 if [ `skipped_root_noderevs $PREV_REV` -ne `skipped_root_noderevs $REV` 
 ]; then
  # echo an error to stderr and mail the admin
  exit 2
 fi
 ]]]


I will talk to the build team here about the post-commit hook. We have had
the bug occur again since my last reply.


 Replied above.  The summary is that you have indeed ran into the bug,
 but for some reason not in r61852 but sometime before that, (and why
 did r61852 trigger the syslog error anyway?  Good question) and now
 we're at the point of trying to identify the cause of the bug --- at
 least circumstantially.

 Thanks for your help so far,

 Daniel

Hi Daniel.

Replies above. Sorry about the delay in replying. I have been really
busy of late. I will try and get the results this week, if not, it
will most likely be next week.

Thanks

Jason.


Re: predecessor count for the root node-revision is wrong message

2012-02-27 Thread Stefan Sperling
On Mon, Feb 27, 2012 at 07:36:39AM -0800, Jason Wong wrote:
 This is true. We have seen the bug happen before. The first occurence
 of this that we had seen was Dec. 7th, 2011, a few days after we went
 from 1.6.16 to 1.7.1. That was the first time we had seen that happen.
 At the time, we did not know about the cause and the developer who
 had encountered the error didn't report it and was able to work
 around it. From the Apache logs we have:
 
   [Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor
   count for the root node-revision is wrong: found 59444,
   committing r59478  [409, #160004]

Just to be clear: These errors emitted by the 1.7.1 server prevent the
bug from corrupting new revisions. With a 1.6 server the problem would
go unnoticed and create bad revision data.

When this corruption occurs, the repository still works.
But the history links for affected revisions are incorrect.


Re: predecessor count for the root node-revision is wrong message

2012-02-27 Thread Jason Wong
On Mon, Feb 27, 2012 at 8:09 AM, Stefan Sperling s...@elego.de wrote:
 On Mon, Feb 27, 2012 at 07:36:39AM -0800, Jason Wong wrote:
 This is true. We have seen the bug happen before. The first occurence
 of this that we had seen was Dec. 7th, 2011, a few days after we went
 from 1.6.16 to 1.7.1. That was the first time we had seen that happen.
 At the time, we did not know about the cause and the developer who
 had encountered the error didn't report it and was able to work
 around it. From the Apache logs we have:

   [Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor
   count for the root node-revision is wrong: found 59444,
   committing r59478  [409, #160004]

 Just to be clear: These errors emitted by the 1.7.1 server prevent the
 bug from corrupting new revisions. With a 1.6 server the problem would
 go unnoticed and create bad revision data.

 When this corruption occurs, the repository still works.
 But the history links for affected revisions are incorrect.

Hi Stephan.

So I think I misunderstood why the error messages were occurring.
I had thought that there was a condition done by this check (in 1.7),
that was erroneously causing svn to reject the attempt to check-in.

I guess I am wondering that if this is the case, then why is it that
if the check-in fails, and then we manually check it in again using
tortoisesvn, that it works the second time?

So the errors prevent the bug from corrupting new revisions? Is this
something between the 1.7 versions or would this have been in 1.6
versions as well? We have been using svn for a while now and I am
wondering what this means that for 1.6, that this issue has been
occurring from communcations between 1.6 client and 1.6 server.

Also, is this bug something that svnadmin verify will not detect?
The last time we ran svnadmin verify, it said all was good.

If it is the case that this bug has been occuring for a long time,
what are the implications of the history links for affected
revisions? When you say the history links are incorrect, does it
just put in a random value or is it actually unreadable values?
Does this mean subsequent revisions that occur after these bad
revisions will propagate this bad information?

A developer asked me to pose the following question. If he was to
open a bad revision, would the client fail and give an error prompt
or would it display history information which could belong to other
files?

Thanks.

Jason.


Re: predecessor count for the root node-revision is wrong message

2012-02-27 Thread Stefan Sperling
On Mon, Feb 27, 2012 at 03:25:00PM -0800, Jason Wong wrote:
 So I think I misunderstood why the error messages were occurring.
 I had thought that there was a condition done by this check (in 1.7),
 that was erroneously causing svn to reject the attempt to check-in.

The purpose of this error is to abort a commit in a situation where
Subversion would otherwise create a commit that has the wrong idea
about the number of predecessor revisions of the repository's root node.

The root node is changed in every commit. So for revision N, the
number of predecessor revisions should add up to N (including revision
zero). Somehow, this count sometimes ends up being wrong during a
commit, and such commits are now rejected. However, when checking
the count for sanity, Subversion accounts for the fact that older
revisions might have gotten the count wrong. I.e. the check only prevents
the count from becoming even more wrong in future revisions than it
already is.

 I guess I am wondering that if this is the case, then why is it that
 if the check-in fails, and then we manually check it in again using
 tortoisesvn, that it works the second time?

As far as I know, we don't know what the underlying problem is yet.
So I'm afraid I cannot answer all of your questions in a satisfactory
manner because I don't know the answers.
Daniel might know more -- he has been following the problem more closely
than me. And I hope he will correct me if I'm making any wrong or
misleading statements in this post :)

 So the errors prevent the bug from corrupting new revisions?

Yes. This error is a sanity check.
Right now, it's a sort of good thing to be getting the error.
Just not as good as a proper fix for the, as of yet unknown, underlying
problem that's triggering the sanity check.

 Is this something between the 1.7 versions or would this have been in 1.6
 versions as well?

We currently assume that is also happens with 1.6.

 We have been using svn for a while now and I am
 wondering what this means that for 1.6, that this issue has been
 occurring from communcations between 1.6 client and 1.6 server.

It has probably been happening since before the 1.7 release.
The problem was first discovered in the ASF repository. The first
occurrence there has been traced to a time before 1.7.0 was released.
At the time the ASF server was still running 1.6, and the first commit
that introduced the problem in the ASF repository was very likely
a 1.6 client.

 Also, is this bug something that svnadmin verify will not detect?
 The last time we ran svnadmin verify, it said all was good.

Apparently, svnadmin verify won't find it.
This should probably be fixed. But the priority right now is to
understand why it is happening in the first place.

 If it is the case that this bug has been occuring for a long time,
 what are the implications of the history links for affected
 revisions?

'svn log' skips some revisions where it shouldn't. This is the
only harmful effect of this bug, as far as I know.

 When you say the history links are incorrect, does it
 just put in a random value or is it actually unreadable values?

I don't know if it is random or somehow predetermined. That depends
on how the wrong predecessor value actually comes about.

 Does this mean subsequent revisions that occur after these bad
 revisions will propagate this bad information?

Yes, the predecessor count of subsequent revisions is off by
some constant value.

 A developer asked me to pose the following question. If he was to
 open a bad revision, would the client fail and give an error prompt
 or would it display history information which could belong to other
 files?

So far this hasn't triggered any noticeable errors apart from missing
items when viewing the log history of a path. Once the problem was
discovered by inspection of 'svn log' output, a sanity check was put
in place to prevent the problem from happening in new revisions.
This check is what you are hitting. What we need to do now is to figure
out why it's happening, and then fix that problem.


Re: predecessor count for the root node-revision is wrong message

2012-02-27 Thread Daniel Shahaf
Stefan Sperling wrote on Tue, Feb 28, 2012 at 03:18:35 +0100:
 On Mon, Feb 27, 2012 at 03:25:00PM -0800, Jason Wong wrote:
  I guess I am wondering that if this is the case, then why is it that
  if the check-in fails, and then we manually check it in again using
  tortoisesvn, that it works the second time?
 
 As far as I know, we don't know what the underlying problem is yet.
 So I'm afraid I cannot answer all of your questions in a satisfactory
 manner because I don't know the answers.

I'm not really surprised by this situation.  I guess Jason means the
same set of tree/text/prop/revprop changes when he says it, but by
the nature of the bug it may well have to do with some concurrency
issues on the server --- i.e., it depends on more than just the contents
being committed.

 Daniel might know more -- he has been following the problem more closely
 than me. And I hope he will correct me if I'm making any wrong or
 misleading statements in this post :)

Yeah, I'm following this thread :)

  We have been using svn for a while now and I am
  wondering what this means that for 1.6, that this issue has been
  occurring from communcations between 1.6 client and 1.6 server.
 
 It has probably been happening since before the 1.7 release.
 The problem was first discovered in the ASF repository. The first
 occurrence there has been traced to a time before 1.7.0 was released.
 At the time the ASF server was still running 1.6, and the first commit
 that introduced the problem in the ASF repository was very likely
 a 1.6 client.
 

AI We should check whether it _was_ a 1.6 client..

  Also, is this bug something that svnadmin verify will not detect?
  The last time we ran svnadmin verify, it said all was good.
 
 Apparently, svnadmin verify won't find it.
 This should probably be fixed. But the priority right now is to
 understand why it is happening in the first place.
 

AI The 'verify' part is a relatively low-hanging fruit, though.

AI File an issue for the pred-count bug

  If it is the case that this bug has been occuring for a long time,
  what are the implications of the history links for affected
  revisions?
 
 'svn log' skips some revisions where it shouldn't. This is the
 only harmful effect of this bug, as far as I know.
 

As I'd mentioned upthread the ASF repository triggering this bug also
has bogus minfo-cnt headers.  It's not known yet whether that is the
same bug, a separate bug, or a bug in something other than svn.

(that's also why I'm not filing an issue for the minfo-cnt bug yet --
it's not clear that the bug is in svn.)

  When you say the history links are incorrect, does it
  just put in a random value or is it actually unreadable values?
 
 I don't know if it is random or somehow predetermined. That depends
 on how the wrong predecessor value actually comes about.
 

The links are not garbage values -- they point to an existing location
(node-rev) in history.  They don't point to the right location.

  Does this mean subsequent revisions that occur after these bad
  revisions will propagate this bad information?
 
 Yes, the predecessor count of subsequent revisions is off by
 some constant value.
 

Indeed; but that's just error propagation, rather than a bug.
Practically it implies fixing the effects of the bug will involve
changing all the historical metadata.  (This probably means dump/load.
There is a way to implement this by doing in-place surgery (see l1488)
but it is of the Don't try this at home variety.)

The propagation of wrong counts, and the skipping of revisions during
backwards history walk, are both expected behaviours from the DAG layer
given the DAG in question (which includes a replacement of the root
node).  In English: the lower level APIs are performing correctly given
the violating-the-invariants-of-higher-layers state of the filesystem.

  A developer asked me to pose the following question. If he was to
  open a bad revision, would the client fail and give an error prompt
  or would it display history information which could belong to other
  files?
 
 So far this hasn't triggered any noticeable errors apart from missing
 items when viewing the log history of a path. Once the problem was

Indeed.  And that's about all the direct effects it can have.  (Perhaps
some code is doing that history walk internally as part of another
higher-level operation, though.)

I guess minfo-cnt bug mentioned earlier should normally be very obvious,
as the values are off by an order of magnitude.  If they are too high
then the tree walk to find svn:mergeinfo-ful nodes will not stop at the
root of trees that lack those nodes (so, inefficiency but not
incorrectness); if it's too low then it will ignore the svn:mergeinfo on
some nodes, so I expect merges will visibly misperform (but Stefan will
know better).

 discovered by inspection of 'svn log' output, a sanity check was put
 in place to prevent the problem from happening in new revisions.
 This check is what you are hitting. What 

Re: predecessor count for the root node-revision is wrong message

2012-02-16 Thread Jason Wong
On Wed, Feb 15, 2012 at 6:15 PM, Daniel Shahaf danie...@elego.de wrote:
 Jason Wong wrote on Wed, Feb 15, 2012 at 10:20:23 -0800:
 On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote:
  On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote:
  Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
  Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:
 
  Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
  http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
  line that uses the 'head' command.
 
  There is a second use of 'head', which you shouldn't delete.  So
  instead, just get head.exe from the same place as the other two, or use
  the following kind of statement:
 
  Or install CygWin and run the scripts from inside CygWin. This does
  present end-of-line issues, so be very careful about using svn:eol
  native properties.
 
 
 my $line = do {
 open FOO, perl -V 21 |;
 FOO;
 };
 
  Lastly, there's a 'sed' invocation that uses single-quoted arguments.
  All it does is print the input up to the first empty line --- feel
  free to implement it differently.  (One way:
 
 my @lines = split /\n/, `command | goes | here`;
 $_ and print or last for @lines;
 
  Both of these examples could do with some error checking.)
 
  Daniel
  (yes, there's also a neater way to do this without split(). but it's
  not a Perl class here)

 Hello.

 Sorry for the delay. Here is an update of what I have done since
 the last time I posted.

 I have run svn log -q ^/ on the respository and it came back with
 no missing revisions.


 I stand corrected, then.  I've confirmed on another instance of the bug
 that 'svn log -q ^/' does not behave abnormally when the bug is present.
 Sorry for the misinformation.

 Question to devs: what operation will walk the predecessor links for
 the root fspath? (and can therefore be used to identify instances of
 the bug)

 Since I first posted, each of the projects we have tried to build
 that had failed have since successfully been built without any changes
 on our side.


 What is the significance of this?  I don't know how your build process
 interacts with Subversion.

 I was having an issue with converting the script to run in windows as
 I was only getting the first line returned so I set up cygwin.

 I ran the script against both of the revisions (61815 and 61852) in
 mentioned in the Apache error log and the output was the same for
 each.

 Commands:
dump-noderev.pl /repository
 /project/binaries/release/phase1/iteration/81/trunk 61815
dump-noderev.pl /repository
 /project/binaries/release/phase1/iteration/81/trunk 61852

 Output:
id: 9-45362.0-61242.r61424/0
type: dir
pred: 9-45362.0-60310/0

 Are you sure that's the value of the pred: field?  It contains only
 one ., instead of two.


I missed a part of it, you are right. here is the full pred line:
   pred: 9-45362.0-60310.r60310/0

count: 43
text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
cpath: /project/binaries/release/phase1/iteration/81/trunk
copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk

 Is this information helpful? Let me know if this tells you anything. Thanks


 The fact that the output is identical suggests that the
 /project/binaries/release/phase1/iteration/81/trunk tree hasn't changed
 between those two revisions (or that there was a directory replace above
 it).

 However, this is the error you report:

 [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count 
 for the root node-revision is wrong: found 61815, committing r61852  [409, 
 #160004]

 The metadata this error complains about will be output by these two
 commands:

  ./dump-noderev.pl /repository / 61851
--

id: 0.0.r61851/33470
type: dir
pred: 0.0.r61850/3844
count: 61818
text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
cpath: /
copyroot: 0 /
minfo-cnt: 25685
--

  ./dump-noderev.pl /repository / 61852
--

id: 0.0.r61852/27663
type: dir
pred: 0.0.r61851/33470
count: 61819
text: 61852 26417 1233 1233 712fec619d55677e67aca8f7aa4ceb97
cpath: /
copyroot: 0 /
minfo-cnt: 25685



 Jason.

 Cheers,

 Daniel


Hi Daniel

Thanks for the quick reply. I have posted the results from the two
commands you have asked me to run above as well as the full pred
value that was incomplete.

Let me know if you need any other information.
Thanks.

Jason


Re: predecessor count for the root node-revision is wrong message

2012-02-16 Thread Daniel Shahaf
Jason Wong wrote on Thu, Feb 16, 2012 at 11:42:42 -0800:
 On Wed, Feb 15, 2012 at 6:15 PM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Wed, Feb 15, 2012 at 10:20:23 -0800:
  On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote:
   On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote:
   Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
   Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:
  
   Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
   http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
   line that uses the 'head' command.
  
   There is a second use of 'head', which you shouldn't delete.  So
   instead, just get head.exe from the same place as the other two, or use
   the following kind of statement:
  
   Or install CygWin and run the scripts from inside CygWin. This does
   present end-of-line issues, so be very careful about using svn:eol
   native properties.
  
  
  my $line = do {
  open FOO, perl -V 21 |;
  FOO;
  };
  
   Lastly, there's a 'sed' invocation that uses single-quoted arguments.
   All it does is print the input up to the first empty line --- feel
   free to implement it differently.  (One way:
  
  my @lines = split /\n/, `command | goes | here`;
  $_ and print or last for @lines;
  
   Both of these examples could do with some error checking.)
  
   Daniel
   (yes, there's also a neater way to do this without split(). but it's
   not a Perl class here)
 
  Hello.
 
  Sorry for the delay. Here is an update of what I have done since
  the last time I posted.
 
  I have run svn log -q ^/ on the respository and it came back with
  no missing revisions.
 
 
  I stand corrected, then.  I've confirmed on another instance of the bug
  that 'svn log -q ^/' does not behave abnormally when the bug is present.
  Sorry for the misinformation.
 
  Question to devs: what operation will walk the predecessor links for
  the root fspath? (and can therefore be used to identify instances of
  the bug)
 
  Since I first posted, each of the projects we have tried to build
  that had failed have since successfully been built without any changes
  on our side.
 
 
  What is the significance of this?  I don't know how your build process
  interacts with Subversion.
 
  I was having an issue with converting the script to run in windows as
  I was only getting the first line returned so I set up cygwin.
 
  I ran the script against both of the revisions (61815 and 61852) in
  mentioned in the Apache error log and the output was the same for
  each.
 
  Commands:
 dump-noderev.pl /repository
  /project/binaries/release/phase1/iteration/81/trunk 61815
 dump-noderev.pl /repository
  /project/binaries/release/phase1/iteration/81/trunk 61852
 
  Output:
 id: 9-45362.0-61242.r61424/0
 type: dir
 pred: 9-45362.0-60310/0
 
  Are you sure that's the value of the pred: field?  It contains only
  one ., instead of two.
 
 
 I missed a part of it, you are right. here is the full pred line:
pred: 9-45362.0-60310.r60310/0
 

Okay.  Did /project/binaries/release/phase1/iteration/81/trunk@61852, or
anything under it, change between r60311 and r61423?  (I'm intentionally
using the peg rev notation here --- these id's do not change when
a parent of the path in question gets copied.)

 count: 43
 text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
 cpath: /project/binaries/release/phase1/iteration/81/trunk
 copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk
 
  Is this information helpful? Let me know if this tells you anything. Thanks
 
 
  The fact that the output is identical suggests that the
  /project/binaries/release/phase1/iteration/81/trunk tree hasn't changed
  between those two revisions (or that there was a directory replace above
  it).
 
  However, this is the error you report:
 
  [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor 
  count for the root node-revision is wrong: found 61815, committing r61852  
  [409, #160004]
 
  The metadata this error complains about will be output by these two
  commands:
 
   ./dump-noderev.pl /repository / 61851
 --
 
 id: 0.0.r61851/33470
 type: dir
 pred: 0.0.r61850/3844
 count: 61818
 text: 61851 32225 1232 1232 7555349571e297c23e647cc2441d5b8f
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25685
 --
 
   ./dump-noderev.pl /repository / 61852
 --
 
 id: 0.0.r61852/27663
 type: dir
 pred: 0.0.r61851/33470
 count: 61819
 text: 61852 26417 1233 1233 712fec619d55677e67aca8f7aa4ceb97
 cpath: /
 copyroot: 0 /
 minfo-cnt: 25685
 
 

The output from these two tells me two things:

1. The minfo-cnt value is reasonable (within a typical ballpark).
That's relevant since minfo-cnt abnormalities were seen in another
instance of the bug.

2. 

Re: predecessor count for the root node-revision is wrong message

2012-02-15 Thread Jason Wong
On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote:
 On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote:
 Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
 Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:

 Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
 http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
 line that uses the 'head' command.

 There is a second use of 'head', which you shouldn't delete.  So
 instead, just get head.exe from the same place as the other two, or use
 the following kind of statement:

 Or install CygWin and run the scripts from inside CygWin. This does
 present end-of-line issues, so be very careful about using svn:eol
 native properties.


    my $line = do {
        open FOO, perl -V 21 |;
        FOO;
    };

 Lastly, there's a 'sed' invocation that uses single-quoted arguments.
 All it does is print the input up to the first empty line --- feel
 free to implement it differently.  (One way:

    my @lines = split /\n/, `command | goes | here`;
    $_ and print or last for @lines;

 Both of these examples could do with some error checking.)

 Daniel
 (yes, there's also a neater way to do this without split(). but it's
 not a Perl class here)

Hello.

Sorry for the delay. Here is an update of what I have done since
the last time I posted.

I have run svn log -q ^/ on the respository and it came back with
no missing revisions.

Since I first posted, each of the projects we have tried to build
that had failed have since successfully been built without any changes
on our side.

I was having an issue with converting the script to run in windows as
I was only getting the first line returned so I set up cygwin.

I ran the script against both of the revisions (61815 and 61852) in
mentioned in the Apache error log and the output was the same for
each.

Commands:
   dump-noderev.pl /repository
/project/binaries/release/phase1/iteration/81/trunk 61815
   dump-noderev.pl /repository
/project/binaries/release/phase1/iteration/81/trunk 61852

Output:
   id: 9-45362.0-61242.r61424/0
   type: dir
   pred: 9-45362.0-60310/0
   count: 43
   text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
   cpath: /project/binaries/release/phase1/iteration/81/trunk
   copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk

Is this information helpful? Let me know if this tells you anything. Thanks

Jason.


Re: predecessor count for the root node-revision is wrong message

2012-02-15 Thread Daniel Shahaf
Jason Wong wrote on Wed, Feb 15, 2012 at 10:20:23 -0800:
 On Wed, Feb 8, 2012 at 6:22 PM, Nico Kadel-Garcia nka...@gmail.com wrote:
  On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote:
  Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
  Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:
 
  Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
  http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
  line that uses the 'head' command.
 
  There is a second use of 'head', which you shouldn't delete.  So
  instead, just get head.exe from the same place as the other two, or use
  the following kind of statement:
 
  Or install CygWin and run the scripts from inside CygWin. This does
  present end-of-line issues, so be very careful about using svn:eol
  native properties.
 
 
     my $line = do {
         open FOO, perl -V 21 |;
         FOO;
     };
 
  Lastly, there's a 'sed' invocation that uses single-quoted arguments.
  All it does is print the input up to the first empty line --- feel
  free to implement it differently.  (One way:
 
     my @lines = split /\n/, `command | goes | here`;
     $_ and print or last for @lines;
 
  Both of these examples could do with some error checking.)
 
  Daniel
  (yes, there's also a neater way to do this without split(). but it's
  not a Perl class here)
 
 Hello.
 
 Sorry for the delay. Here is an update of what I have done since
 the last time I posted.
 
 I have run svn log -q ^/ on the respository and it came back with
 no missing revisions.
 

I stand corrected, then.  I've confirmed on another instance of the bug
that 'svn log -q ^/' does not behave abnormally when the bug is present.
Sorry for the misinformation.

Question to devs: what operation will walk the predecessor links for
the root fspath? (and can therefore be used to identify instances of
the bug)

 Since I first posted, each of the projects we have tried to build
 that had failed have since successfully been built without any changes
 on our side.
 

What is the significance of this?  I don't know how your build process
interacts with Subversion.

 I was having an issue with converting the script to run in windows as
 I was only getting the first line returned so I set up cygwin.
 
 I ran the script against both of the revisions (61815 and 61852) in
 mentioned in the Apache error log and the output was the same for
 each.
 
 Commands:
dump-noderev.pl /repository
 /project/binaries/release/phase1/iteration/81/trunk 61815
dump-noderev.pl /repository
 /project/binaries/release/phase1/iteration/81/trunk 61852
 
 Output:
id: 9-45362.0-61242.r61424/0
type: dir
pred: 9-45362.0-60310/0

Are you sure that's the value of the pred: field?  It contains only
one ., instead of two.

count: 43
text: 58741 121716266 218 218 74eb31e90880ba1345fc49252ca6efe6
cpath: /project/binaries/release/phase1/iteration/81/trunk
copyfrom: 61423 /project/binaries/release/phase1/iteration/80/trunk
 
 Is this information helpful? Let me know if this tells you anything. Thanks
 

The fact that the output is identical suggests that the
/project/binaries/release/phase1/iteration/81/trunk tree hasn't changed
between those two revisions (or that there was a directory replace above
it).

However, this is the error you report:

    [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor count 
for the root node-revision is wrong: found 61815, committing r61852  [409, 
#160004]

The metadata this error complains about will be output by these two
commands:

  ./dump-noderev.pl /repository / 61851
  ./dump-noderev.pl /repository / 61852

 Jason.

Cheers,

Daniel



Re: predecessor count for the root node-revision is wrong message

2012-02-08 Thread Jason Wong
Hello and thank you for replying.

On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote:
 Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800:
 Any help/comments would be appreciated. Thank you.


 As I said, I'd be interested in isolating the cause of these errors.
 Is there anything common to revisions that triggered the bug (as
 explained above)?  Are they concomitant with concurrent writes (commits,
 propedits, 'svn lock' operations, 'svnadmin pack' operations)?  What
 version of svn does your server run (1.7.1?)?  What operating system
 does your server run?  Is there anything noteworthy about its
 filesystems or disks?

I am working with our lead developer to come up with more details
on our build process. I will post this when I get it.
Our svn repository is 1.7.1 and is hosted on Apache 2.2.21 on a
Windows 2003 server. The server has running RAID5 with SCSI disks.

Because my systems are on Windows, I don't think the perl script
you had sent me will run as there are a couple commands in it
that are called which I don't have. Do you have any suggestions
for how I can run the script?

In the meantime, I am running svn log -q and will go though the
output to scan for missing revisions. I will let you know those
results when I have them.

Thank you.

Jason Wong


Re: predecessor count for the root node-revision is wrong message

2012-02-08 Thread Daniel Shahaf
Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:
 Hello and thank you for replying.
 
 On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote:
  Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800:
  Any help/comments would be appreciated. Thank you.
 
 
  As I said, I'd be interested in isolating the cause of these errors.
  Is there anything common to revisions that triggered the bug (as
  explained above)?  Are they concomitant with concurrent writes (commits,
  propedits, 'svn lock' operations, 'svnadmin pack' operations)?  What
  version of svn does your server run (1.7.1?)?  What operating system
  does your server run?  Is there anything noteworthy about its
  filesystems or disks?
 
 I am working with our lead developer to come up with more details
 on our build process. I will post this when I get it.
 Our svn repository is 1.7.1 and is hosted on Apache 2.2.21 on a
 Windows 2003 server. The server has running RAID5 with SCSI disks.
 

Okay.  We've seen this happen on svn.apache.org, which today runs
httpd-2.3/FreeBSD/zfs, and was probably running this or a similar stack
when the problems triggered there.

 Because my systems are on Windows, I don't think the perl script
 you had sent me will run as there are a couple commands in it
 that are called which I don't have. Do you have any suggestions
 for how I can run the script?
 

Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
line that uses the 'head' command.

Or just reimplement it in your language of choice.  Basically it runs
'svnlook', parses out the noderev-id as '%s.%s.%s', parses \3 as
'r%d.%d', and opens revision file \1 at offset \2 --- accounting for the
fact that that revision may have been 'svnadmin pack'ed.

If I spoke too fast, feel free to ask.

 In the meantime, I am running svn log -q and will go though the
 output to scan for missing revisions. I will let you know those
 results when I have them.
 

Thanks,

Daniel

 Thank you.
 
 Jason Wong


Re: predecessor count for the root node-revision is wrong message

2012-02-08 Thread Daniel Shahaf
Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
 Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:
  Hello and thank you for replying.
  
  On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote:
   Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800:
   Any help/comments would be appreciated. Thank you.
  
  
   As I said, I'd be interested in isolating the cause of these errors.
   Is there anything common to revisions that triggered the bug (as
   explained above)?  Are they concomitant with concurrent writes (commits,
   propedits, 'svn lock' operations, 'svnadmin pack' operations)?  What
   version of svn does your server run (1.7.1?)?  What operating system
   does your server run?  Is there anything noteworthy about its
   filesystems or disks?
  
  I am working with our lead developer to come up with more details
  on our build process. I will post this when I get it.
  Our svn repository is 1.7.1 and is hosted on Apache 2.2.21 on a
  Windows 2003 server. The server has running RAID5 with SCSI disks.
  
 
 Okay.  We've seen this happen on svn.apache.org, which today runs
 httpd-2.3/FreeBSD/zfs, and was probably running this or a similar stack
 when the problems triggered there.
 
  Because my systems are on Windows, I don't think the perl script
  you had sent me will run as there are a couple commands in it
  that are called which I don't have. Do you have any suggestions
  for how I can run the script?
  
 
 Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
 http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
 line that uses the 'head' command.
 
 Or just reimplement it in your language of choice.  Basically it runs
 'svnlook', parses out the noderev-id as '%s.%s.%s', parses \3 as
 'r%d.%d', and opens revision file \1 at offset \2 --- accounting for the

Typo fix: 'r%d.%d' - 'r%d/%d'

 fact that that revision may have been 'svnadmin pack'ed.
 
 If I spoke too fast, feel free to ask.
 
  In the meantime, I am running svn log -q and will go though the
  output to scan for missing revisions. I will let you know those
  results when I have them.
  
 
 Thanks,
 
 Daniel
 
  Thank you.
  
  Jason Wong


Re: predecessor count for the root node-revision is wrong message

2012-02-08 Thread Daniel Shahaf
Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
 Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:
  Hello and thank you for replying.
  
  On Tue, Feb 7, 2012 at 4:04 PM, Daniel Shahaf danie...@elego.de wrote:
   Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800:
   Any help/comments would be appreciated. Thank you.
  
  
   As I said, I'd be interested in isolating the cause of these errors.
   Is there anything common to revisions that triggered the bug (as
   explained above)?  Are they concomitant with concurrent writes (commits,
   propedits, 'svn lock' operations, 'svnadmin pack' operations)?  What
   version of svn does your server run (1.7.1?)?  What operating system
   does your server run?  Is there anything noteworthy about its
   filesystems or disks?
  
  I am working with our lead developer to come up with more details
  on our build process. I will post this when I get it.
  Our svn repository is 1.7.1 and is hosted on Apache 2.2.21 on a
  Windows 2003 server. The server has running RAID5 with SCSI disks.
  
 
 Okay.  We've seen this happen on svn.apache.org, which today runs
 httpd-2.3/FreeBSD/zfs, and was probably running this or a similar stack
 when the problems triggered there.
 
  Because my systems are on Windows, I don't think the perl script
  you had sent me will run as there are a couple commands in it
  that are called which I don't have. Do you have any suggestions
  for how I can run the script?
  
 
 Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
 http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
 line that uses the 'head' command.

There is a second use of 'head', which you shouldn't delete.  So
instead, just get head.exe from the same place as the other two, or use
the following kind of statement:

my $line = do {
open FOO, perl -V 21 |;
FOO;
};

Lastly, there's a 'sed' invocation that uses single-quoted arguments.
All it does is print the input up to the first empty line --- feel
free to implement it differently.  (One way:

my @lines = split /\n/, `command | goes | here`;
$_ and print or last for @lines;

Both of these examples could do with some error checking.)

Daniel
(yes, there's also a neater way to do this without split(). but it's
not a Perl class here)


Re: predecessor count for the root node-revision is wrong message

2012-02-08 Thread Nico Kadel-Garcia
On Wed, Feb 8, 2012 at 7:42 PM, Daniel Shahaf danie...@elego.de wrote:
 Daniel Shahaf wrote on Thu, Feb 09, 2012 at 01:46:45 +0200:
 Jason Wong wrote on Wed, Feb 08, 2012 at 15:32:05 -0800:

 Get xxd.exe from http://www.vim.org/ and cat.exe and sed.exe from
 http://gnuwin32.sf.net (or from Cygwin).  Delete from the script the
 line that uses the 'head' command.

 There is a second use of 'head', which you shouldn't delete.  So
 instead, just get head.exe from the same place as the other two, or use
 the following kind of statement:

Or install CygWin and run the scripts from inside CygWin. This does
present end-of-line issues, so be very careful about using svn:eol
native properties.


    my $line = do {
        open FOO, perl -V 21 |;
        FOO;
    };

 Lastly, there's a 'sed' invocation that uses single-quoted arguments.
 All it does is print the input up to the first empty line --- feel
 free to implement it differently.  (One way:

    my @lines = split /\n/, `command | goes | here`;
    $_ and print or last for @lines;

 Both of these examples could do with some error checking.)

 Daniel
 (yes, there's also a neater way to do this without split(). but it's
 not a Perl class here)


Re: predecessor count for the root node-revision is wrong message

2012-02-07 Thread Daniel Shahaf
Jason Wong wrote on Tue, Feb 07, 2012 at 13:23:10 -0800:
 Hello.
 
 I have recently run into an issue with my subversion system (1.7.1)
 where a specific component I am trying to build has failed. I have
 had sucessful builds of this project before this issue happened since
 we upgraded to 1.7.1. I am currently getting by this by using
 TortoiseSVN 1.7.1 to do the check-in of the files left in the
 workspace.
 
 This issue seems to be intermittent as it does not happen every time.
 I am using subversion 1.7.1 windows binaries and subversion is
 hosting on Apache 2.2.21.
 
 I have run svnadmin verify against the directory and it has come
 back clean. I was wondering what type of events can cause this to
 happen? Are there any resolutions?
 
 Here are some entries from the apache log for the last time this
 issue happened.
 
 The Apache error log shows the following lines:
 
     [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] Could not
 MERGE resource /repository/!svn/txn/61847-1bz5 into
 /repository/project/binaries/release/phase1/iteration/81/trunk
 . [409, #0]
     [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] An error
 occurred while committing the transaction.  [409, #160004]
     [Tue Jan 31 11:37:23 2012] [error] [client 9.31.13.109] predecessor
 count for the root node-revision is wrong: found 61815,
 committing r61852  [409, #160004]

This error message indicates a bug has been detected.  The implications
of the bug are that walking backwards through history (eg, 'svn log -r
HEAD:0') may skip some revisions.

I'm interested in getting to the bottom of this.

There are two ways to identify instances of the bug: either by directly
querying the filesystem (explained below), or by running 'svn log -q ^/'
and looking for gaps in the sequences of revision numbers.

The second approch is as follows: look for revisions N such that
f(N) != f(N-1), where f(N) is 'N minus (predecessors count of the
node-revision of / in revision N)'.  To compute f(), you find the
predecessors count; it is given by the 'count:' header in the output
of the attached script:

  % ./dump-noderev.pl /tmp/svn/r1 / 1
  id: 0.0.r1/4760
  type: dir
  pred: 0.0.r0/17
  count: 1
  text: 1 4640 107 107 c34b7d0de08f48db97941642f719e2f4
  cpath: /
  copyroot: 0 /

So, in summary, there are two ways to identify revisions that trigger
the bug: by 'svn log' or by manually looking for gaps in the sequence of
node-revisions of the root of the filesystem.

I would ask you to run the attached script anyway, on the HEAD revision,
and review its output for sensibility --- in particular, what is the
value of the minfo-cnt header (if present).

     [Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Could not
 fetch resource information.  [404, #0]
     [Tue Jan 31 11:37:24 2012] [error] [client 9.31.13.109] Named
 transaction doesn't exist.  [404, #0]
 
 The following are from the Apache access log at the same time:
 
     9.31.13.109 - username [31/Jan/2012:11:37:22 -0800] MERGE
 /repository/project/binaries/release/phase1/iteration/81/trunk
 HTTP/1.1 409 281
     9.31.13.109 - username [31/Jan/2012:11:37:24 -0800] DELETE
 /repository/!svn/txn/61847-1bz5 HTTP/1.1 404 232
 
 Any help/comments would be appreciated. Thank you.
 

As I said, I'd be interested in isolating the cause of these errors.
Is there anything common to revisions that triggered the bug (as
explained above)?  Are they concomitant with concurrent writes (commits,
propedits, 'svn lock' operations, 'svnadmin pack' operations)?  What
version of svn does your server run (1.7.1?)?  What operating system
does your server run?  Is there anything noteworthy about its
filesystems or disks?

 
 Jason Wong

Thanks,

Daniel

#!/usr/local/bin/perl

use warnings;
use strict;

$ENV{uc $_} ||= lc $_ for qw/svn svnlook svnadmin/;

sub open_pack_or_rev_file {
  my ($FS, $REVISION, $OFFSET) = @_;
  my $shard = int ($REVISION / 1000);
  my $remainder = $REVISION % 1000;

  if (-e $FS/revs/$shard/$REVISION) {
return  $FS/revs/$shard/$REVISION, $OFFSET;
  } elsif (-e $FS/revs/$REVISION) {
return  $FS/revs/$REVISION, $OFFSET;
  } elsif (-e $FS/revs/$shard.pack) {
my $lineno = $remainder+1;
my $rev_offset = `cat $FS/revs/$shard.pack/manifest | sed -ne ${lineno}p`;
return  $FS/revs/$shard.pack/pack, $rev_offset + $OFFSET;
  }
}

sub main {
  my ($REPOS, $FSPATH, $REV) = @_;
  my $FS = $REPOS/db;
  $REV =~ s/^r*// if defined $REV;

  die USAGE: $0 /path/to/repos /path/in/repos [revnum] if @_ != 2 and @_ != 3;

  die Non-numeric revision number if defined $REV and $REV !~ /^\d+$/;
  die Unknown repos format: $REPOS/format if `cat $REPOS/format` !~ /^[35]$/;
  die Not FSFS: $FS/fs-type if `cat $FS/fs-type` ne fsfs\n;
  die Unknown FSFS format: $FS/format if `head -n1 $FS/format` !~ /^[12346]$/;

  my ($revision, $offset) = do {
my $REV_ARG = defined($REV) ? -r$REV : ;
#