Re: propagating repo corruption across clone
On Wed, Mar 27, 2013 at 2:51 PM, Rich Fromm wrote:
> Nevertheless, I will try to contact Jeff and point him at this. My initial
> reading of his blog posts definitely gave me the impression that this was a
> --mirror vs. not issue, but it really sounds like his main problem was using
> --local.

Actually, I wasn't using --local; I just wasn't using --no-local, as I mixed it up with --no-hardlinks (which lies somewhere between the two, but is still not the same as using file://). It's entirely possible that you read the posts before I updated them.

Also, keep in mind, if you're evaluating what you're doing, that the system we have was not set up to be a backup system. It was set up to be a mirror system. With proper sanity checking it would have acted in a decent capacity as a backup system, but that is neither why nor how it was set up.
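To make the distinction concrete, here is a minimal sketch of the three ways a local clone can move data (the /srv/src.git path is hypothetical); only the last goes through the "git aware" transport that can notice corruption:

    git clone /srv/src.git a                  # default: hardlinks objects, no integrity checks
    git clone --no-hardlinks /srv/src.git b   # byte-for-byte copy; still bypasses the transport
    git clone --no-local /srv/src.git c       # forces the git-aware transport, as file:// does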
Re: propagating repo corruption across clone
On Tue, Mar 26, 2013 at 11:47 PM, Junio C Hamano wrote:
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users. If you made
> byte-for-byte copy of corrupt repository, it wouldn't make any
> difference if the first "checkout" notices it.

Hi,

Several days ago I had actually already updated the post to indicate that my testing methodology was incorrect as a result of mixing up --no-hardlinks and --no-local, and I pointed to this thread.

I will say that we did see corrupted repos on the downstream mirrors. I don't have an explanation for it, and I have not been able to reproduce it either. My only potential guess (untested) is that when the corruption was detected the clone aborted but left the already-transferred objects in place locally. Again, untested -- I mention it only because it's my only potential explanation :-)

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

Interesting; I'd known about receive.fsckObjects but not the transfer and fetch variants. Thanks for the pointer.
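For reference, a sketch of the three related knobs (--global writes them to ~/.gitconfig; per git-config(1), the transfer setting acts as the fallback when the other two are unset):

    git config --global transfer.fsckObjects true   # fallback for both directions
    git config --global fetch.fsckObjects true      # applied when fetching or cloning
    git config --global receive.fsckObjects true    # applied when receiving a push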
Re: propagating repo corruption across clone
On Wed, Mar 27, 2013 at 03:49:38PM -0400, Jeff King wrote:
> On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:
>
> > But I'm still somewhat confused about what is and is not checked under what
> > conditions. Consider the three statements:
> >
> >   # 1
> >   git clone --mirror myuser@myhost:my_repo
> >
> >   # 2
> >   git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
> >
> >   # 3
> >   git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git fsck
> >
> > Are 2 and 3 equivalent? Or is there an increasing level of checking that
> > occurs from 1 to 2, and from 2 to 3? My guess is the latter, but perhaps
> > this could be clearer in the man pages.
>
> 2 and 3 are not exactly equivalent, in that they are implemented
> slightly differently, but I do not know offhand of any case that would
> pass 2 but not 3. We do check reachability with transfer.fsckObjects.

Oh, and in the case of #1, I think we would already find corruption, in that index-pack will expand and check the sha1 of each object we receive. The transfer.fsckObjects check adds some semantic checks as well (e.g., making sure author identities are well-formed).

Clone will not currently detect missing objects and reachability without transfer.fsckObjects set, but that is IMHO a bug; fetch will notice it, and clone should behave the same way.

-Peff
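In other words, a paranoid backup could combine #2 and #3 -- a sketch using the same placeholder host and repo as above, keeping the clone only if both steps succeed:

    git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo &&
        git --git-dir=my_repo.git fsck --full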
Re: propagating repo corruption across clone
On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:
> But I'm still somewhat confused about what is and is not checked under what
> conditions. Consider the three statements:
>
>   # 1
>   git clone --mirror myuser@myhost:my_repo
>
>   # 2
>   git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
>
>   # 3
>   git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git fsck
>
> Are 2 and 3 equivalent? Or is there an increasing level of checking that
> occurs from 1 to 2, and from 2 to 3? My guess is the latter, but perhaps
> this could be clearer in the man pages.

2 and 3 are not exactly equivalent, in that they are implemented slightly differently, but I do not know offhand of any case that would pass 2 but not 3. We do check reachability with transfer.fsckObjects.

-Peff
Re: propagating repo corruption across clone
Rich Fromm writes:
> Apologies if my questions are considered slightly off topic -- I'm not
> positive if this is supposed to be a list for developers, and not users.

The list is for both users and developers.

> However, I think there may be room for some additional clarity in the docs.
> The --local option in git-clone(1) says "When the repository to clone from
> is on a local machine, this flag bypasses the normal "git aware" transport
> mechanism". But there's no mention of the consequences of this transport
> bypass.

Yeah, I would not mind a patch to update the documentation for "clone --local" and the rsync transport to say something about byte-for-byte copying of a broken repository.

Thanks.
Re: propagating repo corruption across clone
Junio C Hamano wrote:
> If you use --local, that is equivalent to "cp -R". Your corruption
> in the source will faithfully be byte-for-byte copied to the
> destination. If you do not
> ...
> transport layer will notice
> object corruption.
> ...
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users.

Just to clarify, I don't know Jeff Mitchell personally, and I'm not affiliated with the KDE project. I happened to have recently implemented a backup strategy for a different codebase that relies on `git clone --mirror` to take the actual snapshots of the live repos, and I read about Jeff's experiences; that's why I started following this discussion. Apologies if my questions are considered slightly off topic -- I'm not positive if this is supposed to be a list for developers, and not users.

Nevertheless, I will try to contact Jeff and point him at this. My initial reading of his blog posts definitely gave me the impression that this was a --mirror vs. not issue, but it really sounds like his main problem was using --local.

However, I think there may be room for some additional clarity in the docs. The --local option in git-clone(1) says "When the repository to clone from is on a local machine, this flag bypasses the normal "git aware" transport mechanism". But there's no mention of the consequences of this transport bypass. There's also no mention of this in the "GIT URLS" section that discusses transport protocols, and I also don't see anything noting it in either of these sections of the git book:

http://git-scm.com/book/en/Git-on-the-Server-The-Protocols
http://git-scm.com/book/en/Git-Internals-Transfer-Protocols
Re: propagating repo corruption across clone
Jonathan Nieder wrote:
> Is the "[transfer] fsckObjects" configuration on the host executing the
> clone set to true?

I hadn't been setting it at all, and according to git-config(1) it defaults to false, so the answer is no. It looks like setting it might be a good idea.

But I'm still somewhat confused about what is and is not checked under what conditions. Consider the three statements:

  # 1
  git clone --mirror myuser@myhost:my_repo

  # 2
  git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo

  # 3
  git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git fsck

Are 2 and 3 equivalent? Or is there an increasing level of checking that occurs from 1 to 2, and from 2 to 3? My guess is the latter, but perhaps this could be clearer in the man pages.

git-config(1) says that with transfer.fsckObjects set (if fetch.fsckObjects and receive.fsckObjects are not explicitly set), "git-fetch-pack will check all fetched objects" and "git-receive-pack will check all received objects". git-fsck(1), meanwhile, says "git-fsck tests SHA1 and general object sanity, and it does full tracking of the resulting reachability and everything else." The latter sounds like a stronger statement to me. But if that's true, perhaps the relevant section(s) of git-config(1) should explicitly note that this is not equivalent to a full git-fsck?
Re: propagating repo corruption across clone
On Wed, Mar 27, 2013 at 8:33 PM, Junio C Hamano wrote:
> Sitaram Chamarty writes:
>
>> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano wrote:
>>
>>> To be paranoid, you may want to set transfer.fsckObjects to true,
>>> perhaps in your ~/.gitconfig.
>>
>> do we have any numbers on the overhead of this?
>>
>> Even a "guesstimate" will do...
>
> On a reasonably slow machine:
>
>   $ cd /project/git/git.git && git repack -a -d
>   $ ls -hl .git/objects/pack/*.pack
>   -r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack
>
>   $ cd .. && git --bare init junk && cd junk
>   $ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
>   real    0m13.873s
>   user    0m21.345s
>   sys     0m2.248s
>
> That's about 3.2 MB/s?
>
> Compare that with the speed your other side feeds you (or your line
> speed could be the limiting factor) and decide how much you value
> your data.

Thanks. I was also interested in the overhead on the server, just as a percentage. I have no idea why, but when I did some tests a long time ago I got upwards of 40% or so. Now when I try these tests for git.git:

  cd git
  git init --bare
  # git config transfer.fsckobjects true
  git fetch file:///full/path/to/git.git refs/*:refs/*

the difference in elapsed time is 18s -> 22s, so about 22%, and in CPU time 31s -> 37s, so about 20%. I didn't measure disk access increases, but I guess 20% is not too bad.

Is it likely to be linear in the size of the repo, by and large?
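For anyone repeating the measurement, a rough sketch of the comparison (paths are placeholders; run once with the config line commented out and once with it enabled, and compare the timings):

    git init --bare test.git
    git --git-dir=test.git config transfer.fsckObjects true   # toggle this between runs
    time git --git-dir=test.git fetch file:///full/path/to/git.git 'refs/*:refs/*'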
Re: propagating repo corruption across clone
Sitaram Chamarty writes:
> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano wrote:
>
>> To be paranoid, you may want to set transfer.fsckObjects to true,
>> perhaps in your ~/.gitconfig.
>
> do we have any numbers on the overhead of this?
>
> Even a "guesstimate" will do...

On a reasonably slow machine:

  $ cd /project/git/git.git && git repack -a -d
  $ ls -hl .git/objects/pack/*.pack
  -r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack

  $ cd .. && git --bare init junk && cd junk
  $ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
  real    0m13.873s
  user    0m21.345s
  sys     0m2.248s

That's about 3.2 MB/s?

Compare that with the speed your other side feeds you (or your line speed could be the limiting factor) and decide how much you value your data.
Re: propagating repo corruption across clone
On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano wrote:
> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

do we have any numbers on the overhead of this?

Even a "guesstimate" will do...
Re: propagating repo corruption across clone
Rich Fromm writes:
> Jeff King wrote
>> Fundamentally the problem is
>> that the --local transport is not safe from propagating corruption, and
>> should not be used if that's a requirement.
>
> I've read Jeff Mitchell's blog post, his update, relevant parts of the
> git-clone(1) man page, and a decent chunk of this thread, and I'm still not
> clear on one thing. Is the danger of `git clone --mirror` propagating
> corruption only true when using the --local option?

If you use --local, that is equivalent to "cp -R". Your corruption in the source will faithfully be byte-for-byte copied to the destination.

If you do not (and in your case you have two different machines), then unless you are using the long-deprecated rsync transport (which again is the same as "cp -R"), the transport layer will notice object corruption. See Jeff's analysis earlier in the thread.

If you are lucky (or unlucky, depending on how you look at it), the corruption in your object store may affect objects that are needed to check out the version at the tip of the history, and the "git checkout" that happens as the last step of cloning may loudly complain -- but that just means you can immediately notice the breakage in that case. You may be unlucky, and the corruption may not affect objects that are needed to check out the tip. The initial checkout will then succeed as if nothing is wrong, but the corruption in your object store is still there nevertheless. "git log -p --all" or "git fsck" will certainly be unhappy.

The difference between --mirror and no --mirror is a red herring. You may want to ask Jeff Mitchell to remove the mention of it; it only adds to the confusion without helping users. If you made a byte-for-byte copy of a corrupt repository, it wouldn't make any difference whether the first "checkout" notices it.

To be paranoid, you may want to set transfer.fsckObjects to true, perhaps in your ~/.gitconfig.
Re: propagating repo corruption across clone
Hi,

Rich Fromm wrote:

> The host executing the clone
> command is different from the host on which the remote repository lives,
> and I am using ssh as a transport protocol. If there is corruption, can I
> or can I not expect the clone operation to fail and return a non-zero exit
> value? If I can not expect this, is the workaround to run `git fsck` on the
> resulting clone?

Is the "[transfer] fsckObjects" configuration on the host executing the clone set to true?
Re: propagating repo corruption across clone
Jeff King wrote:
> Fundamentally the problem is
> that the --local transport is not safe from propagating corruption, and
> should not be used if that's a requirement.

I've read Jeff Mitchell's blog post, his update, relevant parts of the git-clone(1) man page, and a decent chunk of this thread, and I'm still not clear on one thing. Is the danger of `git clone --mirror` propagating corruption only true when using the --local option?

Specifically, in my case I'm using `git clone --mirror`, but I'm *not* using --local, nor am I using --no-hardlinks. The host executing the clone command is different from the host on which the remote repository lives, and I am using ssh as the transport protocol. If there is corruption, can I or can I not expect the clone operation to fail and return a non-zero exit value? If I can not expect this, is the workaround to run `git fsck` on the resulting clone?
Re: propagating repo corruption across clone
On Tue, Mar 26, 2013 at 09:59:42PM, Philip Oakley wrote:
> Which way does `git bundle create file.bundle --all` perform after the changes
> for both the 'transport' checking and being reliable during updates?

Bundles are treated at a fairly low level the same as a remote that provides us a particular set of refs and a packfile. So we should get the same protections via index-pack, and still run check_everything_connected on it, just as we would with a fetch over the git protocol. I didn't test it, though.

-Peff
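A sketch of that use, run inside the repository being backed up (file names are arbitrary): `git bundle verify` checks that the bundle is well-formed and that its prerequisites are satisfied, and a later clone from it unpacks via index-pack like any other fetch:

    git bundle create file.bundle --all   # snapshot all refs into a single file
    git bundle verify file.bundle         # sanity-check it before archiving
    git clone file.bundle restored        # restore; objects pass through index-pack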
Re: propagating repo corruption across clone
From: "Jeff King" Sent: Tuesday, March 26, 2013 4:55 PM On Tue, Mar 26, 2013 at 09:43:01AM -0400, Jeff Mitchell wrote: On Mon, Mar 25, 2013 at 4:07 PM, Jeff King wrote: > On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote: >> For commit corruptions, the --no-hardlinks, non --mirror case >> refused >> to create the new repository and exited with an error code of 128. >> The >> --no-hardlinks, --mirror case spewed errors to the console, yet >> *still* created the new clone *and* returned an error code of >> zero. > > I wasn't able to reproduce this; can you post a succint test case? [...link to tar.gz...] Once you extract that, you should be able to run a clone using paths (not file://) with --no-hardlinks --mirror and replicate the behavior I saw. FYI, I'm on Git 1.8.2. Thanks for providing an example. The difference is the same "--mirror implies --bare" issue; the non-bare case dies during the checkout (even before my patches, as the corruption is not in a blob, but rather in the HEAD commit object itself). You can replace --mirror with --bare and see the same behavior. The troubling part is that we see errors in the bare case, but do not die. Those errors all come from upload-pack, the "sending" side of a clone/fetch. Even though we do not transfer the objects via the git protocol, we still invoke upload-pack to get the ref list (and then copy the objects themselves out-of-band). What happens is that upload-pack sees the errors while trying to see if the object is a tag that can be peeled (the server advertises both tags and the objects they point to). It does not distinguish between "errors did not let me peel this object" and "this object is not a tag, and therefore there is nothing to peel". We could change that, but I'm not sure whether it is a good idea. I think the intent is that upload-pack's ref advertisement would remain resilient to corruption in the repository (e.g., even if that commit is corrupt, you can still fetch the rest of the data). We should not worry about advertising broken objects, because we will encounter the same error when we actually do try to send the objects. Dying at the advertisement phase would be premature, since we do not yet know what the client will request. The problem, of course, is that the --local optimization _skips_ the part where we actually ask upload-pack for data, and instead blindly copies it. So this is the same issue as usual, which is that the local transport is not thorough enough to catch corruption. It seems like a failing in this case, because upload-pack does notice the problem, but that is only luck; if the corruption were in a non-tip object, it would not notice it at all. So trying to die on errors in the ref advertisement would just be a band-aid. Fundamentally the problem is that the --local transport is not safe from propagating corruption, and should not be used if that's a requirement. -Peff -- Which way does `git bundle file.bundl --all` perform after the changes for both the 'transport' checking and being reliable during updates. Is it an option for creating an archivable file that can be used for a later `clone`? I wasn't sure if the bundle capability had been considered. Philip -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: propagating repo corruption across clone
On Tue, Mar 26, 2013 at 09:43:01AM -0400, Jeff Mitchell wrote:
> On Mon, Mar 25, 2013 at 4:07 PM, Jeff King wrote:
> > On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
> > > For commit corruptions, the --no-hardlinks, non --mirror case refused
> > > to create the new repository and exited with an error code of 128. The
> > > --no-hardlinks, --mirror case spewed errors to the console, yet
> > > *still* created the new clone *and* returned an error code of zero.
> >
> > I wasn't able to reproduce this; can you post a succinct test case?
>
> [...link to tar.gz...]
>
> Once you extract that, you should be able to run a clone using paths
> (not file://) with --no-hardlinks --mirror and replicate the behavior
> I saw. FYI, I'm on Git 1.8.2.

Thanks for providing an example. The difference is the same "--mirror implies --bare" issue; the non-bare case dies during the checkout (even before my patches, as the corruption is not in a blob, but rather in the HEAD commit object itself). You can replace --mirror with --bare and see the same behavior.

The troubling part is that we see errors in the bare case, but do not die. Those errors all come from upload-pack, the "sending" side of a clone/fetch. Even though we do not transfer the objects via the git protocol, we still invoke upload-pack to get the ref list (and then copy the objects themselves out-of-band).

What happens is that upload-pack sees the errors while trying to see if the object is a tag that can be peeled (the server advertises both tags and the objects they point to). It does not distinguish between "errors did not let me peel this object" and "this object is not a tag, and therefore there is nothing to peel".

We could change that, but I'm not sure whether it is a good idea. I think the intent is that upload-pack's ref advertisement would remain resilient to corruption in the repository (e.g., even if that commit is corrupt, you can still fetch the rest of the data). We should not worry about advertising broken objects, because we will encounter the same error when we actually do try to send the objects. Dying at the advertisement phase would be premature, since we do not yet know what the client will request.

The problem, of course, is that the --local optimization _skips_ the part where we actually ask upload-pack for data, and instead blindly copies it. So this is the same issue as usual, which is that the local transport is not thorough enough to catch corruption. It seems like a failing in this case, because upload-pack does notice the problem, but that is only luck; if the corruption were in a non-tip object, it would not notice it at all. So trying to die on errors in the ref advertisement would just be a band-aid. Fundamentally the problem is that the --local transport is not safe from propagating corruption, and should not be used if that's a requirement.

-Peff
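The double advertisement described above is easy to see from the client side -- a sketch, with "origin" as a placeholder remote; each annotated tag is listed twice, once as the tag object and once peeled:

    git ls-remote origin 'refs/tags/*'
    # <sha1>  refs/tags/v1.0      <- the tag object itself
    # <sha1>  refs/tags/v1.0^{}   <- the commit it peels to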
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 4:07 PM, Jeff King wrote:
> On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
> > For commit corruptions, the --no-hardlinks, non --mirror case refused
> > to create the new repository and exited with an error code of 128. The
> > --no-hardlinks, --mirror case spewed errors to the console, yet
> > *still* created the new clone *and* returned an error code of zero.
>
> I wasn't able to reproduce this; can you post a succinct test case?

This actually seems hard to reproduce. I found this during testing with an existing repository on-disk, but when I tried creating a new repository with some commit objects, and modifying one of the commit objects the same way I had modified an object in the previous repository, I was unable to reproduce it.

I do have the original repository though, so I'll tar.gz it up so that you can have exactly the same content as I do. It's about 40MB and you can grab it here: https://www.dropbox.com/s/e8dhedmpd1a1axs/tomahawk-corrupt.tar.gz (MD5 sum: cde8a43233db5d649932407891f8366b).

Once you extract that, you should be able to run a clone using paths (not file://) with --no-hardlinks --mirror and replicate the behavior I saw. FYI, I'm on Git 1.8.2.

Thanks,
Jeff
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 10:56 PM, Jeff King wrote:
> On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King wrote:
>>> There are basically three levels of transport that can be used on a
>>> local machine:
>>>
>>>   1. Hard-linking (very fast, no redundancy).
>>>
>>>   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>>>      data, but does not check the integrity of the original).
>>>
>>>   3. Regular git transport, creating a pack (slowest, but should include
>>>      redundancy checks).
>>>
>>> Using --no-hardlinks turns off (1), but leaves (2) as an option. I
>>> think the documentation in "git clone" could use some improvement in
>>> that area.
>>
>> Not only git-clone. How git-fetch and git-push verify the new pack
>> should also be documented. I don't think many people outside the
>> contributor circle know what is done (and maybe how) when data is
>> received from outside.
>
> I think it's less of a documentation issue there, though, because they
> _only_ do (3). There is no option to do anything else, so there is
> nothing to warn the user about in terms of tradeoffs.
>
> I agree that in general git's handling of corruption could be documented
> somewhere, but I'm not sure where.

I think either a section in git-fsck.txt or git.txt. Probably the former, as people who read it are probably more concerned about corruption.
--
Duy
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:

> I think what was conflating the issue in my testing is that with
> --mirror it implies --bare, so there would be checking of the objects
> when the working tree was being created, hence --mirror won't show the
> error a normal clone will -- it's not a transport question, it's just
> a matter of the normal clone doing more and so having more data run
> through checks.

Exactly.

> However, there are still problems. For blob corruptions, even in this
> --no-hardlinks, non --mirror case where an error was found, the exit
> code from the clone was 0. I can see this tripping up all sorts of
> automated scripts or repository GUIs that ignore the output and only
> check the error code, which is not an unreasonable thing to do.

Yes, this is a bug. I'll post a series in a minute which fixes it.

> For commit corruptions, the --no-hardlinks, non --mirror case refused
> to create the new repository and exited with an error code of 128. The
> --no-hardlinks, --mirror case spewed errors to the console, yet
> *still* created the new clone *and* returned an error code of zero.

I wasn't able to reproduce this; can you post a succinct test case?

> It seems that when there is an "error" as opposed to a "fatal" it
> doesn't affect the status code on a clone; I'd argue that it ought to.

Agreed completely. The current behavior is buggy.

-Peff
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 01:01:59PM -0700, Junio C Hamano wrote:
> Jeff King writes:
>
> > We _do_ see a problem during the checkout phase, but we don't propagate
> > a checkout failure to the exit code from clone. That is bad in general,
> > and should probably be fixed. Though it would never find corruption of
> > older objects in the history, anyway, so checkout should not be relied
> > on for robustness.
>
> It is obvious that we should exit with non-zero status when we see a
> failure from the checkout, but do we want to nuke the resulting
> repository as in the case of normal transport failure? A checkout
> failure might be due to being under quota for the object store but
> running out of quota upon populating the working tree, in which case
> we probably do not want to.

I'm just running through my final tests on a large-ish patch series which deals with this (among other issues). I had the same thought, though we do already die on a variety of checkout errors. I left it as a die() for now, but I think we should potentially address it with a further patch.

> >   $ git init non-local && cd non-local && git fetch ..
> >   remote: Counting objects: 3, done.
> >   remote: Total 3 (delta 0), reused 3 (delta 0)
> >   Unpacking objects: 100% (3/3), done.
> >   fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
> >   error: .. did not send all necessary objects
> >
> > we do notice.
>
> Yes, it is OK to add a connectedness check to "git clone".

That's in my series, too. Unfortunately, in the local clone case, it slows down the clone considerably (since we otherwise would not have to traverse the objects at all).

-Peff
Re: propagating repo corruption across clone
Jeff King writes:

> We _do_ see a problem during the checkout phase, but we don't propagate
> a checkout failure to the exit code from clone. That is bad in general,
> and should probably be fixed. Though it would never find corruption of
> older objects in the history, anyway, so checkout should not be relied
> on for robustness.

It is obvious that we should exit with non-zero status when we see a failure from the checkout, but do we want to nuke the resulting repository as in the case of normal transport failure? A checkout failure might be due to being under quota for the object store but running out of quota upon populating the working tree, in which case we probably do not want to.

> We do not notice the sha1 mis-match on the sending side (which we could,
> if we checked the sha1 as we were sending). We do not notice the broken
> object graph during the receive process either. I would have expected
> check_everything_connected to handle this, but we don't actually call it
> during clone! If you do this:
>
>   $ git init non-local && cd non-local && git fetch ..
>   remote: Counting objects: 3, done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Unpacking objects: 100% (3/3), done.
>   fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
>   error: .. did not send all necessary objects
>
> we do notice.

Yes, it is OK to add a connectedness check to "git clone".

> And one final check:
>
>   $ git -c transfer.fsckobjects=1 clone --no-local . fsck
>   Cloning into 'fsck'...
>   remote: Counting objects: 3, done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Receiving objects: 100% (3/3), done.
>   error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
>   fatal: object of unexpected type
>   fatal: index-pack failed
>
> Fscking the incoming objects does work, but of course it comes at a cost
> in the normal case (for linux-2.6, I measured an increase in CPU time
> with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
> it is probably overkill for finding corruption; index-pack already
> recognizes bit corruption inside an object, and
> check_everything_connected can detect object graph problems much more
> cheaply.
>
> One thing I didn't check is bit corruption inside a packed object that
> still correctly zlib inflates. check_everything_connected will end up
> reading all of the commits and trees (to walk them), but not the blobs.

Correct.

> So I think at the very least we should:
>
>   1. Make sure clone propagates errors from checkout to the final exit
>      code.
>
>   2. Teach clone to run check_everything_connected.

I agree with both, but with a slight reservation on the former (see above).

Thanks.
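A rough hand-run equivalent of the check_everything_connected walk discussed above: this traverses all commits and trees reachable from any ref and fails on a missing or unreadable object, though (like the built-in check) it does not inflate blob contents:

    git rev-list --objects --all > /dev/null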
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 11:56 AM, Jeff King wrote:
> On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King wrote:
>>> There are basically three levels of transport that can be used on a
>>> local machine:
>>>
>>>   1. Hard-linking (very fast, no redundancy).
>>>
>>>   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>>>      data, but does not check the integrity of the original).
>>>
>>>   3. Regular git transport, creating a pack (slowest, but should include
>>>      redundancy checks).
>>>
>>> Using --no-hardlinks turns off (1), but leaves (2) as an option. I
>>> think the documentation in "git clone" could use some improvement in
>>> that area.
>>
>> Not only git-clone. How git-fetch and git-push verify the new pack
>> should also be documented. I don't think many people outside the
>> contributor circle know what is done (and maybe how) when data is
>> received from outside.
>
> I think it's less of a documentation issue there, though, because they
> _only_ do (3). There is no option to do anything else, so there is
> nothing to warn the user about in terms of tradeoffs.
>
> I agree that in general git's handling of corruption could be documented
> somewhere, but I'm not sure where.

Hi there,

First of all, thanks for the analysis, it's much appreciated. It's good to know that we weren't totally off-base in thinking that a naive copy may be out of sync, as small as the chances are (certainly we wouldn't have known the right ordering).

I think what was conflating the issue in my testing is that --mirror implies --bare. A normal clone also creates a working tree, and the objects are checked as that working tree is populated; hence --mirror won't show an error that a normal clone will. It's not a transport question, it's just a matter of the normal clone doing more and so having more data run through checks.

However, there are still problems. For blob corruptions, even in this --no-hardlinks, non --mirror case where an error was found, the exit code from the clone was 0. I can see this tripping up all sorts of automated scripts or repository GUIs that ignore the output and only check the error code, which is not an unreasonable thing to do.

For commit corruptions, the --no-hardlinks, non --mirror case refused to create the new repository and exited with an error code of 128. The --no-hardlinks, --mirror case spewed errors to the console, yet *still* created the new clone *and* returned an error code of zero.

It seems that when there is an "error" as opposed to a "fatal" it doesn't affect the status code on a clone; I'd argue that it ought to. If Git knows that the source repository has problems, it ought to be reflected in the status code so that scripts performing clones have a normal way to detect this and alert a user/sysadmin/whoever. Even if a particular cloning method doesn't perform all sanity checks, if it finds something in the sanity checks it *does* perform, this should be trumpeted, loudly, regardless of transport mechanism and regardless of whether a user is watching the process or a script is.

Thanks,
Jeff
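The automation at risk here is as simple as the following sketch ($SRC and $DST are placeholders); anything that prints "error:" but still exits 0 is silently treated as a good backup:

    if git clone --mirror "$SRC" "$DST"; then
        echo "backup ok"
    else
        echo "backup FAILED" >&2
        exit 1
    fi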
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King wrote:
> > There are basically three levels of transport that can be used on a
> > local machine:
> >
> >   1. Hard-linking (very fast, no redundancy).
> >
> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
> >      data, but does not check the integrity of the original).
> >
> >   3. Regular git transport, creating a pack (slowest, but should include
> >      redundancy checks).
> >
> > Using --no-hardlinks turns off (1), but leaves (2) as an option. I
> > think the documentation in "git clone" could use some improvement in
> > that area.
>
> Not only git-clone. How git-fetch and git-push verify the new pack
> should also be documented. I don't think many people outside the
> contributor circle know what is done (and maybe how) when data is
> received from outside.

I think it's less of a documentation issue there, though, because they _only_ do (3). There is no option to do anything else, so there is nothing to warn the user about in terms of tradeoffs.

I agree that in general git's handling of corruption could be documented somewhere, but I'm not sure where.

-Peff
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 9:56 PM, Jeff King wrote:
> There are basically three levels of transport that can be used on a
> local machine:
>
>   1. Hard-linking (very fast, no redundancy).
>
>   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>      data, but does not check the integrity of the original).
>
>   3. Regular git transport, creating a pack (slowest, but should include
>      redundancy checks).
>
> Using --no-hardlinks turns off (1), but leaves (2) as an option. I
> think the documentation in "git clone" could use some improvement in
> that area.

Not only git-clone. How git-fetch and git-push verify the new pack should also be documented. I don't think many people outside the contributor circle know what is done (and maybe how) when data is received from outside.
--
Duy
Re: propagating repo corruption across clone
On Mon, Mar 25, 2013 at 09:43:23AM -0400, Jeff Mitchell wrote:

> > But I haven't seen exactly what the corruption is, nor exactly what
> > commands they used to clone. I've invited the blog author to give more
> > details in this thread.
>
> The syncing was performed via a clone with git clone --mirror (and a
> git:// URL) and updates with git remote update.

OK. That should be resilient to corruption, then[1].

> So I should mention that my experiments after the fact were using
> local paths, but with --no-hardlinks.

Yeah, we will do a direct copy in that case, and there is nothing to prevent corruption propagating.

> If you're saying that the transport is where corruption is supposed to
> be caught, then it's possible that we shouldn't see corruption
> propagate on an initial mirror clone across git://, and that something
> else was responsible for the trouble we saw with the repositories that
> got cloned after-the-fact.

Right. Either it was something else, or there is a bug in git's protections (but I haven't been able to reproduce anything likely).

> But then I'd argue that this is non-obvious. In particular, when using
> --no-hardlinks, I wouldn't expect that behavior to be different with a
> straight path and with file://.

There are basically three levels of transport that can be used on a local machine:

  1. Hard-linking (very fast, no redundancy).

  2. Byte-for-byte copy (medium speed, makes a separate copy of the
     data, but does not check the integrity of the original).

  3. Regular git transport, creating a pack (slowest, but should include
     redundancy checks).

Using --no-hardlinks turns off (1), but leaves (2) as an option. I think the documentation in "git clone" could use some improvement in that area.

> Something else: apparently one of my statements prompted joeyh to
> think about potential issues with backing up live git repos
> (http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
> Looking at that post made me realize that, when we were doing our
> initial thinking about the system three years ago, we made an
> assumption that, in fact, taking a .tar.gz of a repo as it's in the
> process of being written to or garbage collected or repacked could be
> problematic. This isn't a totally baseless assumption, as I once had a
> git repository suffer corruption when a sudden power outage hit while
> I was updating it. (It could totally have been the filesystem, of
> course, although it was a journaled file system.)

Yes, if you take a snapshot of a repository with rsync or tar, it may be in an inconsistent state. Using the git protocol should always be robust, but if you want to do it with other tools, you need to follow a particular order:

  1. copy the refs (refs/ and packed-refs) first

  2. copy everything else (including objects/)

That covers the case where somebody is pushing an update simultaneously (you _may_ get extra objects in step 2 that they have not yet referenced, but you will never end up with a case where you are referencing objects that you did not yet transfer).

If it's possible that the repository might be repacked during your transfer, I think the issue is a bit trickier, as there's a moment where the new packfile is renamed into place, and then the old ones are deleted. Depending on the timing and how your readdir() implementation behaves with respect to new and deleted entries, it might be possible to miss both the new one appearing and the old ones disappearing. This is quite a tight race to catch, I suspect, but if you were to rsync objects/pack twice in a row, that would be sufficient.

For pruning, I think you could run into the opposite situation: you grab the refs, somebody updates them with a history rewind (or branch deletion), then somebody prunes and objects go away. Again, the timing on this race is quite tight and it's unlikely in practice. I'm not sure of a simple way to eliminate it completely.

Yet another option is to simply rsync the whole thing and then "git fsck" the result. If it's not 100% good, just re-run the rsync. This is simple and should be robust, but is more CPU intensive (you'll end up re-checking all of the data on each update).

> So, we decided to use Git's built-in capabilities of consistency
> checking to our advantage (with, as it turns out, a flaw in our
> implementation). But the question remains: are we wrong about thinking
> that rsyncing or tar.gz'ing live repositories in the middle of being
> pushed to/gc'd/repacked could result in a bogus backup?

No, I think you are right. If you do the refs-then-objects ordering, that saves you from most of it, but I do think there are still some races that exist during repacking or pruning.

-Peff

[1] I mentioned that clone-over-git:// is resilient to corruption. I think that is true for bit corruption, but my tests did show that we are not as careful about checking graph connectivity during clone as we are with fetch.
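As a concrete sketch of that ordering (src.git and backup.git are placeholder paths; the repeated pack-directory pass narrows the repack race, and fsck backstops the rest):

    mkdir -p backup.git
    rsync -a src.git/packed-refs src.git/refs backup.git/                  # 1. refs first
    rsync -a --exclude=/refs --exclude=/packed-refs src.git/ backup.git/   # 2. everything else
    rsync -a src.git/objects/pack/ backup.git/objects/pack/                # repeat for the repack race
    git --git-dir=backup.git fsck --full || echo "inconsistent snapshot; re-run the rsync" >&2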
Re: propagating repo corruption across clone
On Sun, Mar 24, 2013 at 3:23 PM, Jeff King wrote:
> On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King wrote:
>> >
>> > I don't have details on the KDE corruption, or why it wasn't detected
>> > (if it was one of the cases I mentioned above, or a more subtle issue).
>>
>> One thing worth mentioning is this part of the article:
>>
>> "Originally, mirrored clones were in fact not used, but non-mirrored
>> clones on the anongits come with their own set of issues, and are more
>> prone to getting stopped up by legitimate, authenticated force pushes,
>> ref deletions, and so on – and if we set the refspec such that those
>> are allowed through silently, we don't gain much."
>>
>> So the only reason they were even using --mirror was because they were
>> running into those problems with fetching.

With a normal fetch, we actually *wanted* things like force updates and ref deletions to propagate, because we have not just Gitolite's checks but our own checks on the servers, and wanted that to be considered the authenticated source. Besides just daily use and preventing cruft, we wanted to ensure that such actions propagated so that if a branch was removed because it contained personal information, accidental commits, or a security issue (for instance), the branch was removed on the anongits too, within a timely fashion.

> I think the --mirror thing is a red herring. It should not be changing
> the transport used, and that is the part of git that is expected to
> catch such corruption.
>
> But I haven't seen exactly what the corruption is, nor exactly what
> commands they used to clone. I've invited the blog author to give more
> details in this thread.

The syncing was performed via a clone with git clone --mirror (and a git:// URL) and updates with git remote update.

So I should mention that my experiments after the fact were using local paths, but with --no-hardlinks. If you're saying that the transport is where corruption is supposed to be caught, then it's possible that we shouldn't see corruption propagate on an initial mirror clone across git://, and that something else was responsible for the trouble we saw with the repositories that got cloned after-the-fact.

But then I'd argue that this is non-obvious. In particular, when using --no-hardlinks, I wouldn't expect that behavior to be different with a straight path and with file://.

Something else: apparently one of my statements prompted joeyh to think about potential issues with backing up live git repos (http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/). Looking at that post made me realize that, when we were doing our initial thinking about the system three years ago, we made an assumption that, in fact, taking a .tar.gz of a repo as it's in the process of being written to or garbage collected or repacked could be problematic. This isn't a totally baseless assumption, as I once had a git repository suffer corruption when a sudden power outage hit while I was updating it. (It could totally have been the filesystem, of course, although it was a journaled file system.)

So, we decided to use Git's built-in capabilities of consistency checking to our advantage (with, as it turns out, a flaw in our implementation). But the question remains: are we wrong about thinking that rsyncing or tar.gz'ing live repositories in the middle of being pushed to/gc'd/repacked could result in a bogus backup?

Thanks,
Jeff
Re: propagating repo corruption across clone
On Sun, Mar 24, 2013 at 02:31:33PM -0400, Jeff King wrote:
> Fscking the incoming objects does work, but of course it comes at a cost
> in the normal case (for linux-2.6, I measured an increase in CPU time
> with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
> it is probably overkill for finding corruption; index-pack already
> recognizes bit corruption inside an object, and
> check_everything_connected can detect object graph problems much more
> cheaply.

AFAIK, the standard checks index-pack has to do, plus checking that the object graph has no broken links (and that every ref points to something valid), will catch everything except:

- SHA-1 collisions between corrupt objects and clean ones.
- Corrupted refs (that still point to something valid).
- "Born-corrupted" objects.

> One thing I didn't check is bit corruption inside a packed object that
> still correctly zlib inflates. check_everything_connected will end up
> reading all of the commits and trees (to walk them), but not the blobs.

Checking that everything is connected will (modulo SHA-1 collisions) save you there, at least if pack v3 is used as the transport stream.

-Ilari
Re: propagating repo corruption across clone
On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King wrote:
> >
> > I don't have details on the KDE corruption, or why it wasn't detected
> > (if it was one of the cases I mentioned above, or a more subtle issue).
>
> One thing worth mentioning is this part of the article:
>
> "Originally, mirrored clones were in fact not used, but non-mirrored
> clones on the anongits come with their own set of issues, and are more
> prone to getting stopped up by legitimate, authenticated force pushes,
> ref deletions, and so on – and if we set the refspec such that those
> are allowed through silently, we don’t gain much."
>
> So the only reason they were even using --mirror was because they were
> running into those problems with fetching.

I think the --mirror thing is a red herring. It should not be changing the transport used, and that is the part of git that is expected to catch such corruption.

But I haven't seen exactly what the corruption is, nor exactly what commands they used to clone. I've invited the blog author to give more details in this thread.

> So aside from the problems with --mirror I think we should have
> something that updates your local refs to be exactly like they are on
> the other end, i.e. deletes some, non-fast-forwards others etc.
> (obviously behind several --force options and so on). But such an
> option *wouldn't* accept corrupted objects.

That _should_ be how "git fetch --prune +refs/*:refs/*" behaves (and that refspec is set up when you use "--mirror"; we should probably have it turn on --prune, too, but I do not think you can do so via a config option currently).

-Peff
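A sketch of that fetch-based mirror (the URL is a placeholder): the forced refspec makes local refs exactly match the remote, --prune deletes the ones that went away, and corrupt objects are still rejected by the transport:

    git init --bare mirror.git
    git --git-dir=mirror.git fetch --prune git://example.org/repo.git '+refs/*:refs/*'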
Re: propagating repo corruption across clone
On Sun, Mar 24, 2013 at 7:31 PM, Jeff King wrote:
>
> I don't have details on the KDE corruption, or why it wasn't detected
> (if it was one of the cases I mentioned above, or a more subtle issue).

One thing worth mentioning is this part of the article:

  "Originally, mirrored clones were in fact not used, but non-mirrored
  clones on the anongits come with their own set of issues, and are more
  prone to getting stopped up by legitimate, authenticated force pushes,
  ref deletions, and so on – and if we set the refspec such that those
  are allowed through silently, we don’t gain much."

So the only reason they were even using --mirror was because they were running into those problems with fetching.

So aside from the problems with --mirror, I think we should have something that updates your local refs to be exactly like they are on the other end, i.e. deletes some, non-fast-forwards others, etc. (obviously behind several --force options and so on). But such an option *wouldn't* accept corrupted objects.

That would give KDE and other parties a safe way to do exact repo mirroring like this; it wouldn't protect them from someone maliciously deleting all the refs in all the repos, but it would prevent FS corruption from propagating.