Re: svnsync checksum error
On Wed, Nov 10, 2010 at 08:55:49PM -0500, Edward Ned Harvey wrote:
> > From: Stefan Sperling [mailto:s...@elego.de]
> >
> > > It's 100% consistent. I get the same checksum error, on the same file, every time. I have a supposed good copy of the slave repo, at rev 4050... which will fail every time at 4061 (or something like that)... The only explanation I can find is a md5sum collision going undetected, and then some larger operation has an md5sum which fails as a result. I know it's astronomically impossible, but I can't come up with any other explanation.
> >
> > So you can reproduce it reliably? That's very interesting. I'd like to try to debug this. If it's possible to arrange access to your repository data please contact me off-list. Thanks.
>
> I believe we found the cause for mine. It was hardware error, which was introduced silently into rev 4390 of my repo. But I can't speak for the other folks here... If they're having bugs, they might have bugs.
>
> One quick question though: If the system is calculating checksums, shouldn't it store the checksums for future reference? I find it very surprising that I can run svnadmin verify and no errors are detected, yet svnsync dies with a md5sum mismatch. Maybe the md5sums are only used transiently and only by svnsync?

By design, the handling of checksums is sane. Checksums are stored in the repository, and are calculated by the repository layer. A client can only tell the repository what it expects the checksum to be. When the client sends content, the repository calculates the content's checksum and compares that to the expected checksum.

See also http://svn.haxx.se/dev/archive-2010-07/0426.shtml, where Mike Pilato explains this in detail. [That thread is about a commit I made that added property content checksums to dump files. The commit was later reverted because it's just as cheap to compare the actual property contents since we treat property content as strings (but the loader doesn't do that, yet).]

I'm not sure what svnadmin verify is doing wrong in your case. But I know that there are corruptions it doesn't detect, and we're planning to improve this situation: http://subversion.tigris.org/issues/show_bug.cgi?id=3706
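The checksum handling described above can be illustrated with a toy model. This is only a sketch of the idea (the client states an expected checksum, the repository layer computes its own and compares); `ToyRepository`, `ChecksumMismatch`, and their methods are hypothetical names invented for illustration and are not Subversion's API.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when received content does not match the announced checksum."""


class ToyRepository:
    """Hypothetical stand-in for a repository layer; not Subversion's real API."""

    def __init__(self):
        # Maps path -> (content bytes, hex MD5 digest stored alongside it).
        self._files = {}

    def put(self, path, content, expected_md5):
        # The repository computes the checksum itself from the bytes received...
        actual = hashlib.md5(content).hexdigest()
        # ...and the client's value is used only as something to compare against.
        if actual != expected_md5:
            raise ChecksumMismatch(
                f"{path}: expected {expected_md5}, actual {actual}"
            )
        self._files[path] = (content, actual)

    def verify(self, path):
        """Recompute the checksum of the stored bytes and compare to the stored one."""
        content, stored = self._files[path]
        return hashlib.md5(content).hexdigest() == stored
```

In this model, a check like `verify` compares stored bytes against the checksum stored with them, while `put` compares incoming bytes against what the sender expects. That difference is one way a self-consistent repository could pass verification and still disagree with another repository.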
Re: svnsync checksum error
Stefan Sperling wrote on Thu, Nov 11, 2010 at 13:29:14 +0100:
> I'm not sure what svnadmin verify is doing wrong in your case. But I know that there are corruptions it doesn't detect, and we're planning to improve this situation: http://subversion.tigris.org/issues/show_bug.cgi?id=3706

What's the recommendation in the meantime? To use 'dump'? To use 'dump|load'? To use svnsync (...)? To manually walk the entire history and recompute all checksums?
Re: svnsync checksum error
On Thu, Nov 11, 2010 at 03:10:19PM +0200, Daniel Shahaf wrote:
> Stefan Sperling wrote on Thu, Nov 11, 2010 at 13:29:14 +0100:
> > I'm not sure what svnadmin verify is doing wrong in your case. But I know that there are corruptions it doesn't detect, and we're planning to improve this situation: http://subversion.tigris.org/issues/show_bug.cgi?id=3706
>
> What's the recommendation in the meantime? To use 'dump'? To use 'dump|load'? To use svnsync (...)? To manually walk the entire history and recompute all checksums?

I'd recommend using fsfs-verify.py.
RE: svnsync checksum error
> From: Stefan Sperling [mailto:s...@elego.de]
>
> By design, the handling of checksums is sane. Checksums are stored in the repository, and are calculated by the repository layer. A client can only tell the repository what it expects the checksum to be. When the client sends content, the repository calculates the content's checksum and compares that to the expected checksum.
>
> See also http://svn.haxx.se/dev/archive-2010-07/0426.shtml, where Mike Pilato explains this in detail. [That thread is about a commit I made that added property content checksums to dump files. The commit was later reverted because it's just as cheap to compare the actual property contents since we treat property content as strings (but the loader doesn't do that, yet).]
>
> I'm not sure what svnadmin verify is doing wrong in your case. But I know that there are corruptions it doesn't detect, and we're planning to improve this situation: http://subversion.tigris.org/issues/show_bug.cgi?id=3706

Actually, there is another option. Perhaps svnadmin verify is doing exactly the right thing ... checksums are stored in the repo, it calculates checksums and verifies them all. Perhaps it's right. Perhaps we're only *assuming* the corruption is at the slave side, while the corruption is actually at the master.

The only thing I know for sure is that there's a checksum mismatch between master and slave, for a specific file, beginning at a specific rev... Maybe the master is the one who's wrong.

Problem is, I don't know of any way to check, and determine which side is wrong. It's very labor intensive to check out all the revs from the slave, and from the master, and diff them all, to see if any other files are corrupt. But that is my plan, if I can't come up with a better idea.
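The revision-by-revision comparison described above could be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: it fetches one path at a time with `svn cat -r REV URL`, the helper names `svn_cat` and `differing_revisions` are made up for illustration, and the `fetch` parameter exists only so the comparison can be exercised without a live repository. It reports where the two sides diverge; it cannot say which side is the correct one.

```python
import hashlib
import subprocess


def svn_cat(url, rev):
    """Return the bytes of the file at `url` as of revision `rev` via `svn cat -r`."""
    result = subprocess.run(
        ["svn", "cat", "-r", str(rev), url],
        check=True,
        capture_output=True,
    )
    return result.stdout


def differing_revisions(master_url, slave_url, revisions, fetch=svn_cat):
    """Return the revisions at which master and slave serve different bytes."""
    mismatches = []
    for rev in revisions:
        master_md5 = hashlib.md5(fetch(master_url, rev)).hexdigest()
        slave_md5 = hashlib.md5(fetch(slave_url, rev)).hexdigest()
        if master_md5 != slave_md5:
            mismatches.append(rev)
    return mismatches
```

Passing only the revisions in which the file actually changed keeps the number of fetches down; covering other files means repeating the call per path.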
Re: svnsync checksum error
OSG wrote on Tue, Nov 09, 2010 at 20:58:53 -0600:
> On 11/09/2010 06:41 PM, Daniel Shahaf wrote:
> > Edward Ned Harvey wrote on Sat, Nov 06, 2010 at 20:29:18 -0400:
> > > > From: opensrcguru [mailto:opensrcg...@gmail.com]
> > > >
> > > > Today, the sync process started failing on 1 repo (all others were unaffected) on both r/o copies at the exact same time/same revision with errors similar to the following...
> > > >
> > > > Transmitting file data .svnsync: Base checksum mismatch on '/path/to/file/foo/bar':
> > > >    expected: 2f2e025c4c4855e7466799a877b3e23d
> > > >      actual: 272214b9518d352e16e7eeceeb22f573
> >
> > Can you compare the contents of /path/to/file/foo/bar between the master and mirror, as of the last revision successfully synced to the mirror?
>
> Yes, I had done that and yes, the last sync'd revs were intact and accurate.

So they are textually identical? Can you compare their checksums to the two checksums in the error message?

> > If you create a fresh mirror and svnsync it, from r0 to that revision, does the file /path/to/file/foo/bar in the fresh mirror differ from the one in the master?
>
> No, a resync from r0 to current does not result in any differences.

Meaning, a fresh resync is successful and doesn't cause any error messages? Or meaning, it results in the same error messages as before?
Re: svnsync checksum error
Edward Ned Harvey wrote on Wed, Nov 10, 2010 at 00:28:48 -0500:
> > From: Daniel Shahaf [mailto:d...@daniel.shahaf.name]
> >
> > Can you compare the contents of /path/to/file/foo/bar between the master and mirror, as of the last revision successfully synced to the mirror?
>
> The latest rev which synced without reporting any error was 5045. It was trying to go from 5045 to 5046 when it triggered the checksum failure. I checked the history of the file in question, and it was changed in ~200 different revs. But the revs of interest are: in 4390, it synced to the slave without reporting any error, however, from 4390 onward, if I checkout from the slave and master, the two files differ. And the next rev where this file was changed was 5046, which is when svnsync notices the checksum mismatch, and dies.

Okay.

> It would seem, all of this behavior could be explained by a simple undetected hardware error. During sync of 4390, the slave wrote some bits to disk, which got written wrongly. It is known that disks will do this rarely. This is one of the huge arguments in favor of ZFS and BTRFS and filesystem checksumming in general. Such filesystems detect and correct data corruption which would have otherwise passed silently... Which seems to be what happened in my case.

Yes, the question is whether this thread is just a bunch of hardware errors, or something deeper.

> All servers and clients are running 1.6.12. However, at the time when 4390 was committed... The master was 1.6.12, but the slave was probably 1.5.7
>
> > If you create a fresh mirror and svnsync it, from r0 to that revision, does the file /path/to/file/foo/bar in the fresh mirror differ from the one in the master?
>
> No problems. Although ... I didn't let it sync from rev 0. (That would be impossibly time consuming... weeks) I did as mentioned before. Transferred a backup of the master to the slave, and used it as the seed for the sync, so I only needed to sync the last 100 revs or something like that...

That would mean that the last changed revision, r4390, is contained in the seed and wasn't re-svnsync'd. If we suspect that svnsync committed a bogus r4390 to the slave, we'd better start with a slave that /doesn't/ already have a knowingly-good r4390...

Of course, you can take that backup and use it to produce a repository whose youngest revision is earlier than r4390.
Re: svnsync checksum error
On Wed, Nov 10, 2010 at 10:49 AM, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
> OSG wrote on Tue, Nov 09, 2010 at 20:58:53 -0600:
> > On 11/09/2010 06:41 PM, Daniel Shahaf wrote:
> > > Edward Ned Harvey wrote on Sat, Nov 06, 2010 at 20:29:18 -0400:
> > > > > From: opensrcguru [mailto:opensrcg...@gmail.com]
> > > > >
> > > > > Today, the sync process started failing on 1 repo (all others were unaffected) on both r/o copies at the exact same time/same revision with errors similar to the following...
> > > > >
> > > > > Transmitting file data .svnsync: Base checksum mismatch on '/path/to/file/foo/bar':
> > > > >    expected: 2f2e025c4c4855e7466799a877b3e23d
> > > > >      actual: 272214b9518d352e16e7eeceeb22f573
> > >
> > > Can you compare the contents of /path/to/file/foo/bar between the master and mirror, as of the last revision successfully synced to the mirror?
> >
> > Yes, I had done that and yes, the last sync'd revs were intact and accurate.
>
> So they are textually identical?

Yes.

> Can you compare their checksums to the two checksums in the error message?

I hadn't yet, but I can. What is being used to perform the sum (md5/sha1/???)?

> > > If you create a fresh mirror and svnsync it, from r0 to that revision, does the file /path/to/file/foo/bar in the fresh mirror differ from the one in the master?
> >
> > No, a resync from r0 to current does not result in any differences.
>
> Meaning, a fresh resync is successful and doesn't cause any error messages? Or meaning, it results in the same error messages as before?

Correct. A new/fresh resync from r0 (including the previously troubled revision) to latest completes successfully with no errors. That process was the last in my troubleshooting process and is how I worked around the problem.

--

In my case, I do not believe it to be hardware related because I had two r/o copies that exhibited the same behavior at the same rev at the same time. That is, unless there was a hardware issue on the source copy. Although possible, pretty unlikely.
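On the question of which sum to compute: the two values in the error message are 32 hexadecimal digits long, which is the length of an MD5 digest (a SHA-1 digest would be 40 digits), and other posts in this thread refer to them as md5sums. A minimal sketch for comparing a file's contents against a reported value follows; the function names are hypothetical and the content is assumed to have been obtained separately, for example by saving the output of `svn cat`.

```python
import hashlib


def digests(content):
    """Return the hex MD5 and SHA-1 digests of `content` (bytes)."""
    return {
        "md5": hashlib.md5(content).hexdigest(),
        "sha1": hashlib.sha1(content).hexdigest(),
    }


def matching_algorithms(content, reported):
    """Return the names of the algorithms whose digest of `content` equals `reported`."""
    wanted = reported.strip().lower()
    return [name for name, value in digests(content).items() if value == wanted]
```

If the copy from the master matches the "expected" value and the copy from the mirror matches the "actual" value (or the other way round), that indicates which repository each checksum in the message came from.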
Re: svnsync checksum error
On 11/10/2010 1:39 PM, opensrcguru wrote:
> Correct. A new/fresh resync from r0 (including the previously troubled revision) to latest completes successfully with no errors. That process was the last in my troubleshooting process and is how I worked around the problem.
>
> --
>
> In my case, I do not believe it to be hardware related because I had two r/o copies that exhibited the same behavior at the same rev at the same time. That is, unless there was a hardware issue on the source copy. Although possible, pretty unlikely.

I was able to fix mine by dumping up to a revision before the last few changes to the file with the error, loading that back and tweaking the properties that tell svnsync where to continue.

I agree that a hardware error is pretty unlikely here. In my case it was a large zip file where the problem happened. Is there any chance there could have been a problem in the binary diff computation in a 1.6.x release version? I'm not exactly sure what version would have been running when the error happened but I copied things over to a machine with 1.6.13 for the repair and it did not duplicate the problem.

--
Les Mikesell
lesmikes...@gmail.com
Re: svnsync checksum error
On Sun, Nov 07, 2010 at 12:48:01PM -0500, Edward Ned Harvey wrote:
> I do think it's a bug, but I was never able to find enough info to make it into a bug report. I kept all the good and bad versions of the repository... I ran the svnadmin verify all over the place (which is enormously time consuming) ... svnadmin dump | svnadmin load ... Everything I can think of. Never got any error in any way, except by repeating the svnsync from the master.

I think it's a bug, too. We (elego) have seen this svnsync checksum error at a customer site, too. Never figured out how to reproduce it.

> It's 100% consistent. I get the same checksum error, on the same file, every time. I have a supposed good copy of the slave repo, at rev 4050... which will fail every time at 4061 (or something like that)... The only explanation I can find is a md5sum collision going undetected, and then some larger operation has an md5sum which fails as a result. I know it's astronomically impossible, but I can't come up with any other explanation.

So you can reproduce it reliably? That's very interesting. I'd like to try to debug this. If it's possible to arrange access to your repository data please contact me off-list.

Thanks.
Stefan
RE: svnsync checksum error
> From: Stefan Sperling [mailto:s...@elego.de]
>
> > It's 100% consistent. I get the same checksum error, on the same file, every time. I have a supposed good copy of the slave repo, at rev 4050... which will fail every time at 4061 (or something like that)... The only explanation I can find is a md5sum collision going undetected, and then some larger operation has an md5sum which fails as a result. I know it's astronomically impossible, but I can't come up with any other explanation.
>
> So you can reproduce it reliably? That's very interesting. I'd like to try to debug this. If it's possible to arrange access to your repository data please contact me off-list. Thanks.

I believe we found the cause for mine. It was hardware error, which was introduced silently into rev 4390 of my repo. But I can't speak for the other folks here... If they're having bugs, they might have bugs.

One quick question though: If the system is calculating checksums, shouldn't it store the checksums for future reference? I find it very surprising that I can run svnadmin verify and no errors are detected, yet svnsync dies with a md5sum mismatch. Maybe the md5sums are only used transiently and only by svnsync?
Re: svnsync checksum error
Edward Ned Harvey wrote on Sat, Nov 06, 2010 at 20:29:18 -0400:
> > From: opensrcguru [mailto:opensrcg...@gmail.com]
> >
> > Today, the sync process started failing on 1 repo (all others were unaffected) on both r/o copies at the exact same time/same revision with errors similar to the following...
> >
> > Transmitting file data .svnsync: Base checksum mismatch on '/path/to/file/foo/bar':
> >    expected: 2f2e025c4c4855e7466799a877b3e23d
> >      actual: 272214b9518d352e16e7eeceeb22f573

Can you compare the contents of /path/to/file/foo/bar between the master and mirror, as of the last revision successfully synced to the mirror?

If you create a fresh mirror and svnsync it, from r0 to that revision, does the file /path/to/file/foo/bar in the fresh mirror differ from the one in the master?

What versions of everything are you using? What format are the repositories? (What are the contents of the files $REPOS_DIR/db/fs-type and $REPOS_DIR/db/format?)

> I recently had the same problem. I never found any cause for it, but I did manage to deal with it somewhat better than you did. On the master, I did svnadmin hotcopy, then I tarred up the backup and sent it to the slave, and extracted it. I had to configure the slave hook scripts, and the revprop rev 0 properties, and then I was able to svnsync to the slave again.
>
> The main point of difference ... No need to wait for 65k commits to transfer. Since it's starting from a recent backup, it's enormously faster.
RE: svnsync checksum error
> From: Daniel Shahaf [mailto:d...@daniel.shahaf.name]
>
> Can you compare the contents of /path/to/file/foo/bar between the master and mirror, as of the last revision successfully synced to the mirror?

The latest rev which synced without reporting any error was 5045. It was trying to go from 5045 to 5046 when it triggered the checksum failure. I checked the history of the file in question, and it was changed in ~200 different revs. But the revs of interest are: in 4390, it synced to the slave without reporting any error, however, from 4390 onward, if I checkout from the slave and master, the two files differ. And the next rev where this file was changed was 5046, which is when svnsync notices the checksum mismatch, and dies.

It would seem, all of this behavior could be explained by a simple undetected hardware error. During sync of 4390, the slave wrote some bits to disk, which got written wrongly. It is known that disks will do this rarely. This is one of the huge arguments in favor of ZFS and BTRFS and filesystem checksumming in general. Such filesystems detect and correct data corruption which would have otherwise passed silently... Which seems to be what happened in my case.

All servers and clients are running 1.6.12. However, at the time when 4390 was committed... The master was 1.6.12, but the slave was probably 1.5.7

> If you create a fresh mirror and svnsync it, from r0 to that revision, does the file /path/to/file/foo/bar in the fresh mirror differ from the one in the master?

No problems. Although ... I didn't let it sync from rev 0. (That would be impossibly time consuming... weeks) I did as mentioned before. Transferred a backup of the master to the slave, and used it as the seed for the sync, so I only needed to sync the last 100 revs or something like that...
RE: svnsync checksum error
> From: opensrcguru [mailto:opensrcg...@gmail.com]
>
> Today, the sync process started failing on 1 repo (all others were unaffected) on both r/o copies at the exact same time/same revision with errors similar to the following...
>
> Transmitting file data .svnsync: Base checksum mismatch on '/path/to/file/foo/bar':
>    expected: 2f2e025c4c4855e7466799a877b3e23d
>      actual: 272214b9518d352e16e7eeceeb22f573

I recently had the same problem. I never found any cause for it, but I did manage to deal with it somewhat better than you did. On the master, I did svnadmin hotcopy, then I tarred up the backup and sent it to the slave, and extracted it. I had to configure the slave hook scripts, and the revprop rev 0 properties, and then I was able to svnsync to the slave again.

The main point of difference ... No need to wait for 65k commits to transfer. Since it's starting from a recent backup, it's enormously faster.
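The seeding procedure described above might look roughly like the following. This is a sketch, not the poster's actual commands: the post does not name the rev 0 properties, so the three `svn:sync-*` names used here are an assumption about what was meant, the function name and all paths and URLs are hypothetical, and the tar/transfer/extract step and the hook-script setup are left out. The function only builds argument lists and runs nothing.

```python
def seed_mirror_commands(master_path, backup_path, master_url, mirror_url,
                         master_uuid, youngest_rev):
    """Build argv lists for seeding an svnsync mirror from a hotcopy.

    Assumption: the mirror's bookkeeping lives in revision properties on r0
    (svn:sync-from-url, svn:sync-from-uuid, svn:sync-last-merged-rev).
    Setting them requires a pre-revprop-change hook on the mirror that
    allows the change.
    """
    props = {
        "svn:sync-from-url": master_url,
        "svn:sync-from-uuid": master_uuid,
        # Youngest revision contained in the hotcopy; syncing resumes after it.
        "svn:sync-last-merged-rev": str(youngest_rev),
    }
    # Run on the master; the backup is then copied to the mirror host by hand.
    commands = [["svnadmin", "hotcopy", master_path, backup_path]]
    for name, value in props.items():
        commands.append(
            ["svn", "propset", "--revprop", "-r", "0", name, value, mirror_url]
        )
    # Finally, let svnsync transfer only the revisions newer than the seed.
    commands.append(["svnsync", "sync", mirror_url])
    return commands
```

The UUID and youngest revision would have to be looked up first (for instance with `svnlook uuid` and `svnlook youngest`). As noted elsewhere in the thread, a seed taken this way carries over whatever the master already contains, so it does not re-test revisions that are already in the backup.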
RE: svnsync checksum error
> -----Original Message-----
> From: Terry Inzauro [mailto:opensrcg...@gmail.com]
>
> I've found a handful of other cases similar to ours. Do you think a bug report is warranted or is this unique to our configurations?

I do think it's a bug, but I was never able to find enough info to make it into a bug report. I kept all the good and bad versions of the repository... I ran the svnadmin verify all over the place (which is enormously time consuming) ... svnadmin dump | svnadmin load ... Everything I can think of. Never got any error in any way, except by repeating the svnsync from the master.

It's 100% consistent. I get the same checksum error, on the same file, every time. I have a supposed good copy of the slave repo, at rev 4050... which will fail every time at 4061 (or something like that)... The only explanation I can find is a md5sum collision going undetected, and then some larger operation has an md5sum which fails as a result. I know it's astronomically impossible, but I can't come up with any other explanation.
Re: svnsync checksum error
On 11/06/2010 07:29 PM, Edward Ned Harvey wrote:
> > From: opensrcguru [mailto:opensrcg...@gmail.com]
> >
> > Today, the sync process started failing on 1 repo (all others were unaffected) on both r/o copies at the exact same time/same revision with errors similar to the following...
> >
> > Transmitting file data .svnsync: Base checksum mismatch on '/path/to/file/foo/bar':
> >    expected: 2f2e025c4c4855e7466799a877b3e23d
> >      actual: 272214b9518d352e16e7eeceeb22f573
>
> I recently had the same problem. I never found any cause for it, but I did manage to deal with it somewhat better than you did. On the master, I did svnadmin hotcopy, then I tarred up the backup and sent it to the slave, and extracted it. I had to configure the slave hook scripts, and the revprop rev 0 properties, and then I was able to svnsync to the slave again.
>
> The main point of difference ... No need to wait for 65k commits to transfer. Since it's starting from a recent backup, it's enormously faster.

Yes, that sounds quite a bit easier/quicker. I didn't realise the r/o copies maintained by svnsync were that similar to the r/w copies they get their data from. Thank you for the information.

I've found a handful of other cases similar to ours. Do you think a bug report is warranted or is this unique to our configurations?

kind regards,
OSG
svnsync checksum error
List,

I've got about 20 repos that have been successfully syncing (with svnsync) to two read-only copies for a few months. The r/w copy and both r/o copies are located on a local LAN (different subnets separated by firewalls).

Today, the sync process started failing on 1 repo (all others were unaffected) on both r/o copies at the exact same time/same revision with errors similar to the following...

Transmitting file data .svnsync: Base checksum mismatch on '/path/to/file/foo/bar':
   expected: 2f2e025c4c4855e7466799a877b3e23d
     actual: 272214b9518d352e16e7eeceeb22f573

I successfully removed the uncommitted transactions (svnadmin rmtxns reponame `svnadmin lstxns reponame`) and attempted the re-sync, to no avail. svnadmin verify returned no errors.

I ended up re-creating the r/o repo and then re-syncing all 65k commits to the repos (which takes a while...)

Software binaries from Collabnet:
r/w version   = svn/svnsync, version 1.6.13 (r1002816)
r/o 1 version = svn/svnsync, version 1.6.13 (r1002816)
r/o 2 version = svn/svnsync, version 1.6.13 (r1002816)

Is there a better approach to resolving the issue? Am I running into a known issue?

Any help/insight would be greatly appreciated.

OSG
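When several mirrors are synced from cron, it can help to pull the path and the two checksums out of the failure output automatically. The following is a small sketch that matches the message format exactly as quoted in this post; the function name is hypothetical, and other Subversion versions may word the message differently.

```python
import re

# Pattern based solely on the error text quoted in this thread.
MISMATCH_RE = re.compile(
    r"Base checksum mismatch on '(?P<path>[^']*)':\s*"
    r"expected:\s*(?P<expected>[0-9a-f]{32})\s*"
    r"actual:\s*(?P<actual>[0-9a-f]{32})"
)


def parse_checksum_mismatch(output):
    """Return a dict with path, expected and actual, or None if no mismatch is found."""
    match = MISMATCH_RE.search(output)
    return match.groupdict() if match else None
```

The extracted values can then be logged per repository, which makes it easier to tell whether two mirrors failed on the same path with the same pair of checksums.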