Is there a way to dump the checksums from a svn repo?
Is there a way to dump the checksums from a svn repo?

What I'm doing at the moment on masters and slaves is
$> svnadmin verify
and
$> sqlite $repo/db/rep-cache.db "select hash,revision from rep_cache"

and then additionally comparing the sqlite output from master and slaves.

Since rep-cache is not used during read requests, it would be nice to have, for example, a parameter for svnadmin verify that outputs the checksums so they can be compared between master and slaves.

Is there a way to do this, for example via the Python/Perl API?

Thanks for every answer and code snippet ...

-- 
Regards,
olli
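The manual comparison described above could be scripted roughly as follows. This is only a sketch: it assumes that rep-cache.db is a SQLite file with a rep_cache table exposing the hash and revision columns used in the query above, and the function names are made up for illustration.

```python
import sqlite3


def dump_rep_cache(db_path):
    """Return the set of (hash, revision) rows stored in a rep-cache.db file."""
    conn = sqlite3.connect(db_path)
    try:
        # Same query as the sqlite command line call in the message above.
        rows = conn.execute("SELECT hash, revision FROM rep_cache").fetchall()
    finally:
        conn.close()
    return set(rows)


def compare_rep_caches(master_db, slave_db):
    """Report rows that exist on only one side (hypothetical helper)."""
    master = dump_rep_cache(master_db)
    slave = dump_rep_cache(slave_db)
    return {
        "only_on_master": sorted(master - slave),
        "only_on_slave": sorted(slave - master),
    }
```

Calling compare_rep_caches() with the paths of two rep-cache.db copies returns two lists; both are empty when the dumps agree. As noted in the message, rep-cache is not consulted for reads, so agreement here says nothing about what a mirror actually serves.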
Re: Is there a way to dump the checksums from a svn repo?
Hello olli hauer,
on Sunday, 25 November 2012 at 20:18 you wrote:

> Thanks for every answer and code snippet ...

I'm interested in which problem you are trying to solve with your approach. What's the reason behind it? Maybe there are other ways to accomplish what you want.

Best regards,

Thorsten Schöning

-- 
Thorsten Schöning E-Mail:thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme http://www.AM-SoFT.de/

Phone...05151- 9468- 55
Fax...05151- 9468- 88
Mobile..0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow
Re: Is there a way to dump the checksums from a svn repo?
On 2012-11-25 21:49, Thorsten Schöning wrote:
> Hello olli hauer,
> on Sunday, 25 November 2012 at 20:18 you wrote:
>
>> Thanks for every answer and code snippet ...
>
> I'm interested in which problem you try to solve with your approach?
> What's the reason behind it? Maybe there are other ways to accomplish
> what you want.
>
> Best regards,
>
> Thorsten Schöning
>

Sorry for the delay ... I will try to explain some of my thoughts.

Suppose you have one svn master from which dedicated slaves are syncing. Both the master and the first slaves are under your control; so far so good.

Now some additional mirrors, which are not under your full control, sync from the slaves to help offload traffic.

Someone hacks one of the additional mirrors, modifies a revision, and adjusts the checksum (as described in many places that explain how to fix a corrupt repo) so that it looks OK even with svnadmin verify.

If you have a million revisions, it will be hard to detect such an issue. Wouldn't it be nice to be able to calculate the checksums regularly so they can be compared with the upstream checksums?

Another method to detect such a thing would be to rsync the repo, first with a dry run and then with a live sync, but svnsync is preferred.

-- 
Regards,
olli
Re: Is there a way to dump the checksums from a svn repo?
Hello olli hauer,
on Wednesday, 28 November 2012 at 22:45 you wrote:

> Someone hacks one of the additional mirrors, modifies a revision and adjust the
> checksum (as described on many places how-to fix a corrupt repo) so it looks OK
> even with svnadmin verify.

Sounds interesting, but if the mirrors that are not under your full control have already been hacked, how can you trust the checksums produced locally by svnadmin? You can't: you can't trust the mirror in any way, and svnadmin could have been manipulated, too. You would need to get the data back into a trusted environment and check it from there. Your solution also wouldn't scale, because you would have to recalculate all checksums and compare all revisions all over again. There would never be a point in time at which you could say that the first million revisions are definitely OK and rely on that in the future.

I would think in another direction and use digital signatures to detect changes made to revisions after they have been approved as consistent with the master. Get the unsigned revisions from the mirrors, compare them file by file, using hashes, with the revisions you trust, and if everything is OK, sign them. Depending on your mirrors and the security you need, you wouldn't even need to copy the data; just make it accessible for read access via ssh or whatever. The benefit is that you could use tools that are already available and would only need to check unsigned revisions, while the integrity of the already signed revisions can be checked really fast and whenever you like. The signature information for each revision file or checked block, however you implement such an approach, can even be stored on the untrusted mirrors. That is no problem, as nobody other than you and whoever you trust is able to create valid signatures.

Just an idea, as signatures were made for exactly such purposes, where one has to detect data manipulation of any kind.

Besides that, maybe have a look at the mirroring products of WanDisco; it's possible that they already have a solution.

Best regards,

Thorsten Schöning
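The approve-then-sign idea above might look roughly like this. Note the substitution: the Python standard library has no public-key signatures, so this sketch uses an HMAC (a symmetric, keyed checksum) as a stand-in. A real setup would presumably use an asymmetric scheme so that the signing key never has to leave the trusted side. All names here are hypothetical, and revisions are modelled simply as bytes keyed by revision number.

```python
import hashlib
import hmac


def file_digest(data: bytes) -> str:
    """Hash of one revision file's content."""
    return hashlib.sha256(data).hexdigest()


def sign_revision(key: bytes, revision: int, data: bytes) -> str:
    """Create a tag binding the revision number to the content hash."""
    message = f"{revision}:{file_digest(data)}".encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_revision(key: bytes, revision: int, data: bytes, tag: str) -> bool:
    """Check an existing tag without consulting the trusted copy."""
    return hmac.compare_digest(sign_revision(key, revision, data), tag)


def audit_mirror(key, trusted, mirror, tags):
    """Return revisions that fail the check; newly approved ones get a tag.

    trusted and mirror map revision number -> bytes, tags maps revision
    number -> tag and is updated in place.
    """
    suspicious = []
    for rev in sorted(mirror):
        if rev in tags:
            # Already approved once: only the tag has to be re-verified.
            if not verify_revision(key, rev, mirror[rev], tags[rev]):
                suspicious.append(rev)
        elif rev in trusted and file_digest(mirror[rev]) == file_digest(trusted[rev]):
            # First contact: compare against the trusted copy, then sign.
            tags[rev] = sign_revision(key, rev, mirror[rev])
        else:
            suspicious.append(rev)
    return suspicious
```

The point of the design is the first branch: once a revision has a tag, later audits no longer need the trusted copy for that revision, which addresses the scaling concern raised above.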
Re: Is there a way to dump the checksums from a svn repo?
On Thu, Nov 29, 2012 at 1:59 AM, Thorsten Schöning wrote:
> Hello olli hauer,
> on Wednesday, 28 November 2012 at 22:45 you wrote:
>
>> Someone hacks one of the additional mirrors, modifies a revision and adjust the
>> checksum (as described on many places how-to fix a corrupt repo) so it looks OK
>> even with svnadmin verify.
>
> Sounds interesting, but if the mirrors not under your full control
> already have been hacked how can you trust the locally produced
> checksums by svnadmin? You can't as you can't trust the mirror in any
> way, svnadmin could be manipulated, too, you would need to get the
> data to a trustful environment again and check it from there.

For things where the file representation is the same, I just use an 'rsync -nv' against a known-good copy to verify integrity, and it runs pretty quickly. But the copy built by svnsync doesn't necessarily get stored the same way, does it?

-- 
Les Mikesell
lesmikes...@gmail.com
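For readers without rsync at hand, the dry-run comparison against a known-good copy can be approximated as below. This is a sketch, not what rsync does internally: it hashes every file on both sides, and the function names are invented for the example. It is only meaningful where both copies are expected to be stored identically on disk.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def changed_files(known_good, candidate):
    """List paths that are missing on one side or differ in content."""
    good = tree_digests(known_good)
    cand = tree_digests(candidate)
    return sorted(p for p in good.keys() | cand.keys() if good.get(p) != cand.get(p))
```

An empty result from changed_files() means the two trees hold the same files with the same content.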
Re: Is there a way to dump the checksums from a svn repo?
Les Mikesell wrote on Thu, Nov 29, 2012 at 09:59:47 -0600:
> On Thu, Nov 29, 2012 at 1:59 AM, Thorsten Schöning wrote:
> > Hello olli hauer,
> > on Wednesday, 28 November 2012 at 22:45 you wrote:
> >
> >> Someone hacks one of the additional mirrors, modifies a revision and adjust the
> >> checksum (as described on many places how-to fix a corrupt repo) so it looks OK
> >> even with svnadmin verify.
> >
> > Sounds interesting, but if the mirrors not under your full control
> > already have been hacked how can you trust the locally produced
> > checksums by svnadmin? You can't as you can't trust the mirror in any
> > way, svnadmin could be manipulated, too, you would need to get the
> > data to a trustful environment again and check it from there.
>
> For things where the file representation is the same, I just use an
> 'rsync -nv' against a known-good copy to verify integrity and it runs
> pretty quickly. But, the copy built by svnsync doesn't necessarily
> get stored the same way, does it?

I think in 1.8/fsfs it will be byte-for-byte identical (except rep-cache.db, but you can remove that file without consequences).

There was a dev@ thread by philipm about this not too long ago.
Re: Is there a way to dump the checksums from a svn repo?
olli hauer writes:

> Is there a way to dump the checksums from a svn repo?
>
> What I'm doing at the moment on masters and slaves is
> $> svnadmin verify
> and
> $> sqlite $repo/db/rep-cache.db "select hash,revision from rep_cache"
>
> then additional comparing the sqlite output from master and slaves.
>
> Since rep-cache is not used during read requests it would be nice to have
> for example a parameter for svnadmin verify to output the checksums so
> they can be compared between master and slaves.
>
> Is there way for example via the python/perl API?
>
> Thanks for every answer and code snippet ...

I did it in C, but I suppose you might be able to use the Python bindings. I did

  svn_fs_open()
  svn_fs_revision_root(N)
  svn_repos_replay2(N-1)

which drove an editor from rN-1 to rN, and the editor did nothing except extract the checksum from the close_file callback.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download
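The shape of such a "do nothing but collect checksums" editor can be modelled in a few lines. To be clear, this sketch does not use the Subversion bindings at all: fake_replay() is a hypothetical stand-in for the replay driver, and the callback names merely mirror those of the delta editor mentioned above. It only illustrates the pattern of ignoring every callback except close_file.

```python
class ChecksumCollector:
    """Editor-like object that records the checksum passed to close_file."""

    def __init__(self):
        self.checksums = {}

    def open_root(self, base_revision):
        # Batons are opaque to the driver; a path string is enough here.
        return ""

    def add_file(self, path, parent_baton):
        return path

    def open_file(self, path, parent_baton):
        return path

    def close_file(self, file_baton, text_checksum):
        # The only callback that does any work.
        self.checksums[file_baton] = text_checksum


def fake_replay(changed_files, editor):
    """Hypothetical driver: feeds (path, checksum) pairs to the editor."""
    root = editor.open_root(None)
    for path, checksum in changed_files:
        baton = editor.add_file(path, root)
        editor.close_file(baton, checksum)


collector = ChecksumCollector()
fake_replay([("A/f", "checksum-of-f"), ("A/g", "checksum-of-g")], collector)
```

After the replay, collector.checksums maps each touched path to the checksum the driver reported for it, which is the per-revision output one would then compare between master and mirror.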
Re: Is there a way to dump the checksums from a svn repo?
Daniel Shahaf writes:

> Les Mikesell wrote on Thu, Nov 29, 2012 at 09:59:47 -0600:
>> But, the copy built by svnsync doesn't necessarily
>> get stored the same way, does it?
>
> I think in 1.8/fsfs it will byte-for-byte identical. (except
> rep-cache.db, but you can remove that file without consequences)
>
> There was a dev@ thread by philipm about this not too long ago.

No, an svnsync mirror is usually not identical to the master. It does contain the same versioned data, but the representation of that data is different. For example, every failed commit on the master will bump the fsfs sequence number, and that will cause the node-revision-ids to be different.
Re: Is there a way to dump the checksums from a svn repo?
Philip Martin wrote on Thu, Nov 29, 2012 at 18:26:04 +:
> Daniel Shahaf writes:
>
> > Les Mikesell wrote on Thu, Nov 29, 2012 at 09:59:47 -0600:
> >> But, the copy built by svnsync doesn't necessarily
> >> get stored the same way, does it?
> >
> > I think in 1.8/fsfs it will byte-for-byte identical. (except
> > rep-cache.db, but you can remove that file without consequences)
> >
> > There was a dev@ thread by philipm about this not too long ago.
>
> No, an svnsync mirror is usually not identical to the master. It does
> contain the same versioned data but the representation of that data is
> different. For example, every failed commit on the master will bump the
> fsfs sequence number and that will cause the node-revision-ids to be
> different.

Node-revision-id's in revisions don't embed transaction id's...

For example, the noderev header (yes, header, not just id) of /subversion/trunk/notes is identical between svn.us and svn.eu.
Re: Is there a way to dump the checksums from a svn repo?
Daniel Shahaf writes:

> Philip Martin wrote on Thu, Nov 29, 2012 at 18:26:04 +:
>> Daniel Shahaf writes:
>>
>> > Les Mikesell wrote on Thu, Nov 29, 2012 at 09:59:47 -0600:
>> >> But, the copy built by svnsync doesn't necessarily
>> >> get stored the same way, does it?
>> >
>> > I think in 1.8/fsfs it will byte-for-byte identical. (except
>> > rep-cache.db, but you can remove that file without consequences)
>> >
>> > There was a dev@ thread by philipm about this not too long ago.
>>
>> No, an svnsync mirror is usually not identical to the master. It does
>> contain the same versioned data but the representation of that data is
>> different. For example, every failed commit on the master will bump the
>> fsfs sequence number and that will cause the node-revision-ids to be
>> different.
>
> Node-revision-id's in revisions don't embed transaction id's...
>
> For example the noderev header (yes, header, not just id) of
> /subversion/trunk/notes is identical between svn.us and svn.eu.

OK. But the sequence number differences do show up in other places:

  svnadmin create repo
  svn mkdir -mm file://`pwd`/repo/A     # r1
  svn mkdir -mm file://`pwd`/repo/A     # fail
  svn mkdir -mm file://`pwd`/repo/A/B   # r2
  svnadmin create repo2
  svnadmin dump repo | svnadmin load repo2
  diff repo/db/revs/0/2 repo2/db/revs/0/2
  37c37
  < _1.0.t1-2 add-dir false false /A/B
  ---
  > _1.0.t1-1 add-dir false false /A/B

Further, node-revision-ids can vary for other reasons. Representations in the revision files are in whatever order the client sends representations to the server. There are no defined orders for clients to use, so it is quite likely that commits to the master and the mirror will use different orders:

  mkdir zz
  echo foo > zz/f
  echo bar > zz/g
  echo zigzig > zz/F
  echo zagzag > zz/G
  svnadmin create repo
  svn mkdir -mm file://`pwd`/repo/A
  svnadmin create repo2
  svnsync init file://`pwd`/repo2 file://`pwd`/repo
  svnsync sync file://`pwd`/repo2

I see orders:

  repo/db/revs/0/1:  foo, zigzig, zagzag, bar
  repo2/db/revs/0/1: zigzig, zagzag, foo, bar

That affects the offsets in the text: lines, often changing the line length, which in turn affects the position of the subsequent nodes, and the position of the nodes affects the node-revision-ids.
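The effect of the ordering on offsets can be shown with a toy model. This is a deliberate simplification: it treats a revision file as nothing more than the raw contents concatenated in arrival order, which is an assumption made for illustration and not the actual fsfs layout. The function name is likewise invented.

```python
def representation_offsets(representations):
    """Return the byte offset at which each named content would start."""
    offsets = {}
    position = 0
    for name, data in representations:
        offsets[name] = position
        position += len(data)
    return offsets


# The four file contents from the example above, in the two observed orders.
master_order = [("foo", b"foo\n"), ("zigzig", b"zigzig\n"),
                ("zagzag", b"zagzag\n"), ("bar", b"bar\n")]
mirror_order = [("zigzig", b"zigzig\n"), ("zagzag", b"zagzag\n"),
                ("foo", b"foo\n"), ("bar", b"bar\n")]

master_offsets = representation_offsets(master_order)
mirror_offsets = representation_offsets(mirror_order)
```

Even in this simplified model, the same four contents end up at different offsets on the two sides, so anything derived from those offsets differs as well.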
Re: Is there a way to dump the checksums from a svn repo?
Philip Martin writes:

> mkdir zz
> echo foo > zz/f
> echo bar > zz/g
> echo zigzig > zz/F
> echo zagzag > zz/G
> svnadmin create repo
> svn mkdir -mm file://`pwd`/repo/A

Oops! That should be import, not mkdir:

  svn import -mm zz file://`pwd`/repo/A

> svnadmin create repo2
> svnsync init file://`pwd`/repo2 file://`pwd`/repo
> svnsync sync file://`pwd`/repo2
Re: Is there a way to dump the checksums from a svn repo?
Philip Martin wrote on Thu, Nov 29, 2012 at 19:13:11 +:
> Daniel Shahaf writes:
>
> > Philip Martin wrote on Thu, Nov 29, 2012 at 18:26:04 +:
> >> Daniel Shahaf writes:
> >>
> >> > Les Mikesell wrote on Thu, Nov 29, 2012 at 09:59:47 -0600:
> >> >> But, the copy built by svnsync doesn't necessarily
> >> >> get stored the same way, does it?
> >> >
> >> > I think in 1.8/fsfs it will byte-for-byte identical. (except
> >> > rep-cache.db, but you can remove that file without consequences)
> >> >
> >> > There was a dev@ thread by philipm about this not too long ago.
> >>
> >> No, an svnsync mirror is usually not identical to the master. It does
> >> contain the same versioned data but the representation of that data is
> >> different. For example, every failed commit on the master will bump the
> >> fsfs sequence number and that will cause the node-revision-ids to be
> >> different.
> >
> > Node-revision-id's in revisions don't embed transaction id's...
> >
> > For example the noderev header (yes, header, not just id) of
> > /subversion/trunk/notes is identical between svn.us and svn.eu.
>
> OK. But the sequence number differences do show up in other places:
>
> Further, node-revision-ids can vary for other reasons. Representations
> in the revision files are in whatever order the client sends
> representations to the server. There are no defined orders for clients
> to use so it is quite likely that commits to the master and the mirror
> will use different orders:

> That affects the offsets in the text: lines, often changing the line
> length, which in turn affects the position of the subsequent nodes, and
> the position of the nodes affects the node-revision-ids.
>

Yes, that's exactly what your thread <87mx2hw607@stat.home.lan> was about. I thought in the end that patch got committed?

> svnadmin create repo
> svn mkdir -mm file://`pwd`/repo/A     # r1
> svn mkdir -mm file://`pwd`/repo/A     # fail
> svn mkdir -mm file://`pwd`/repo/A/B   # r2
> svnadmin create repo2
> svnadmin dump repo | svnadmin load repo2
> diff repo/db/revs/0/2 repo2/db/revs/0/2
> 37c37
> < _1.0.t1-2 add-dir false false /A/B
> ---
> > _1.0.t1-1 add-dir false false /A/B
>

Well, that answers the question: revision files are not byte-for-byte identical.

I wonder, though, if we should be rewriting these to use the revfile noderev id's? If not to avoid _* id's in revfiles, then to make the revfiles deterministic by using the ("stable") revfile noderev id's --- for the reasons given in your linked thread.
Re: Is there a way to dump the checksums from a svn repo?
Daniel Shahaf writes:

>> Further, node-revision-ids can vary for other reasons. Representations
>> in the revision files are in whatever order the client sends
>> representations to the server. There are no defined orders for clients
>> to use so it is quite likely that commits to the master and the mirror
>> will use different orders:
>
>> That affects the offsets in the text: lines, often changing the line
>> length, which in turn affects the position of the subsequent nodes, and
>> the position of the nodes affects the node-revision-ids.
>
> Yes, that's exactly what your thread <87mx2hw607@stat.home.lan> was
> about. I thought in the end that patch got committed?

That was committed, but it's not quite the same problem. That thread was about revision file differences caused by the server itself. When comparing commits on a master and a slave, there can also be differences caused by the client.
Re: Is there a way to dump the checksums from a svn repo?
On 2012-11-29 20:13, Philip Martin wrote:
> Daniel Shahaf writes:
>
>> Philip Martin wrote on Thu, Nov 29, 2012 at 18:26:04 +:
>>> Daniel Shahaf writes:
>>>
>>>> Les Mikesell wrote on Thu, Nov 29, 2012 at 09:59:47 -0600:
>>>>> But, the copy built by svnsync doesn't necessarily
>>>>> get stored the same way, does it?
>>>>
>>>> I think in 1.8/fsfs it will byte-for-byte identical. (except
>>>> rep-cache.db, but you can remove that file without consequences)
>>>>
>>>> There was a dev@ thread by philipm about this not too long ago.
>>>
>>> No, an svnsync mirror is usually not identical to the master. It does
>>> contain the same versioned data but the representation of that data is
>>> different. For example, every failed commit on the master will bump the
>>> fsfs sequence number and that will cause the node-revision-ids to be
>>> different.
>>
>> Node-revision-id's in revisions don't embed transaction id's...
>>
>> For example the noderev header (yes, header, not just id) of
>> /subversion/trunk/notes is identical between svn.us and svn.eu.
>
> OK. But the sequence number differences do show up in other places:
>
> svnadmin create repo
> svn mkdir -mm file://`pwd`/repo/A     # r1
> svn mkdir -mm file://`pwd`/repo/A     # fail
> svn mkdir -mm file://`pwd`/repo/A/B   # r2
> svnadmin create repo2
> svnadmin dump repo | svnadmin load repo2
> diff repo/db/revs/0/2 repo2/db/revs/0/2
> 37c37
> < _1.0.t1-2 add-dir false false /A/B
> ---
>> _1.0.t1-1 add-dir false false /A/B
>
> Further, node-revision-ids can vary for other reasons. Representations
> in the revision files are in whatever order the client sends
> representations to the server. There are no defined orders for clients
> to use so it is quite likely that commits to the master and the mirror
> will use different orders:
>
> mkdir zz
> echo foo > zz/f
> echo bar > zz/g
> echo zigzig > zz/F
> echo zagzag > zz/G
> svnadmin create repo
> svn mkdir -mm file://`pwd`/repo/A
> svnadmin create repo2
> svnsync init file://`pwd`/repo2 file://`pwd`/repo
> svnsync sync file://`pwd`/repo2
>
> I see orders:
>
> repo/db/revs/0/1:  foo, zigzig, zagzag, bar
> repo2/db/revs/0/1: zigzig, zagzag, foo, bar
>
> That affects the offsets in the text: lines, often changing the line
> length, which in turn affects the position of the subsequent nodes, and
> the position of the nodes affects the node-revision-ids.
>

That's what I also see with svnsync, especially for revisions with a lot of files in the initial commit. (Master and mirror run the same OS and are installed with exactly the same packages, and it happens no matter whether I sync over svn or http(s).)
Re: Is there a way to dump the checksums from a svn repo?
On 2012-11-29 19:24, Philip Martin wrote:
> olli hauer writes:
>
>> Is there a way to dump the checksums from a svn repo?
>>
>> What I'm doing at the moment on masters and slaves is
>> $> svnadmin verify
>> and
>> $> sqlite $repo/db/rep-cache.db "select hash,revision from rep_cache"
>>
>> then additional comparing the sqlite output from master and slaves.
>>
>> Since rep-cache is not used during read requests it would be nice to have
>> for example a parameter for svnadmin verify to output the checksums so
>> they can be compared between master and slaves.
>>
>> Is there way for example via the python/perl API?
>>
>> Thanks for every answer and code snippet ...
>
> I did it in C but I suppose you might be able to use the Python
> bindings. I did
>
> svn_fs_open()
> svn_fs_revision_root(N)
> svn_repos_replay2(N-1)
>
> which drove an editor from rN-1 rto rN and the editor did nothing except
> extract the checksum from the close_file callback.
>

Thanks for the hint, I will do some tests with your promised snippet.
Re: Is there a way to dump the checksums from a svn repo?
Philip Martin wrote on Thu, Nov 29, 2012 at 18:24:38 +:
> olli hauer writes:
>
> > Is there a way to dump the checksums from a svn repo?
> >
> > What I'm doing at the moment on masters and slaves is
> > $> svnadmin verify
> > and
> > $> sqlite $repo/db/rep-cache.db "select hash,revision from rep_cache"
> >
> > then additional comparing the sqlite output from master and slaves.
> >
> > Since rep-cache is not used during read requests it would be nice to have
> > for example a parameter for svnadmin verify to output the checksums so
> > they can be compared between master and slaves.
> >
> > Is there way for example via the python/perl API?
> >
> > Thanks for every answer and code snippet ...
>
> I did it in C but I suppose you might be able to use the Python
> bindings. I did
>
> svn_fs_open()
> svn_fs_revision_root(N)
> svn_repos_replay2(N-1)
>
> which drove an editor from rN-1 rto rN and the editor did nothing except
> extract the checksum from the close_file callback.

This will only give you the precalculated checksum stored as a metadata attribute within the backend --- it's not going to checksum the file on the fly to compute the actual checksum.
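The distinction between a stored checksum and one computed on the fly can be made concrete as follows. This sketch assumes the stored value is an MD5 hex digest of the file's full text and that the content has already been fetched by some other means; both function names are invented for the example.

```python
import hashlib


def recompute_md5(content: bytes) -> str:
    """Checksum computed from the actual bytes, not read from metadata."""
    return hashlib.md5(content).hexdigest()


def checksum_matches(stored_checksum: str, content: bytes) -> bool:
    """True if the stored metadata agrees with the recomputed checksum."""
    return recompute_md5(content) == stored_checksum.lower()
```

If an attacker changes both the content and the stored attribute, as in the scenario discussed earlier in the thread, this check still passes on the tampered mirror, so the recomputed values would have to be compared against those from a trusted copy.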