On Tue, Oct 28, 2008 at 7:39 PM, David Roundy <[EMAIL PROTECTED]> wrote: > On Tue, Oct 28, 2008 at 03:36:32PM -0700, Eric Kow wrote: >> Salvo 6! >> >> Sun Oct 26 00:01:40 BST 2008 Don Stewart <[EMAIL PROTECTED]> >> * Add LANGUAGE pragmas for explicit language extensions >> >> Sun Oct 26 16:40:09 GMT 2008 [EMAIL PROTECTED] >> * add a get_unrecorded_in_files to check for unrecorded changes in a >> subset of working directory >> >> Sun Oct 26 19:06:12 GMT 2008 [EMAIL PROTECTED] >> * make get_unrecorded_private work with type witnesses again >> >> Sun Oct 26 19:46:36 GMT 2008 [EMAIL PROTECTED] >> * make whatsnew use the lstat-saving functions to scan the working copy > > These look very appealing, and have some nice ideas, but are also buggy. :( > >> ] hunk ./src/Darcs/Repository/Internal.lhs 417 >> ftf <- filetype_function >> Sealed pend <- read_pending repository >> debugMessage "diffing dir..." >> + -- the unsafeCoerceP below is necessary to be able to concatenate >> + -- pend with NilFL to form dif. See http://hpaste.org/11480 >> let diffs = if null files >> hunk ./src/Darcs/Repository/Internal.lhs 420 >> - then [smart_diff opts ftf cur work] >> - else catMaybes (map (diff_at_path opts ftf cur work) (map >> fn2fp files)) >> - let workdiff = foldl (+>+) NilFL diffs >> + then smart_diff opts ftf cur work >> + else let diffsPerFile = catMaybes (map (diff_at_path opts >> ftf cur work) (map fn2fp files)) >> + in foldr (+>+) (unsafeCoerceP NilFL) diffsPerFile >> dif = if AnyOrder `elem` opts > > There may be other bugs, but this one is sufficient to cause trouble. This > unsafeCoerceP really is unsafe. Combining patches together in this way has > serious potential for repository corruption! In particular, if a file is > referenced more than once in the paths, you can get multiple copies of the > same change.
It's not actually the unsafeCoerceP that makes this case unsafe, and I'll explain why I think that. The unsafeCoerceP is just there to deal with the problem in this psuedo code: foo :: a -> FL Prim C(x y) foo _ = NilFL foo a = -- do something legit with 'a' to create a sequence The problem being that this won't actually type check unless x = y. Basically, our type signature prevents foo from returning a list, such as the empty one, that claims to not modify context. If we add unsafeCoerceP on the NilFL the problem 'goes away'. I'm calling this the 'foo idiom' for now. You might argue that adding unsafeCoerceP to foo is unsafe in terms of Haskell's types but this argument depends on two reasons: 1) We have unsafeCoerce inside (=\/=) and (=/\=) and with it you can use foo to reimplement unsafeCoerce. 2) x and y are unconstrained by a relationship to the input of foo. Now, while we shouldn't encourage functions that make it possible to implement unsafeCoerce the above foo idiom isn't really a problem by itself. The usefulness of functions like foo is evidenced by unsafe functions like smart_diff. Being able to create a patch, with its context, 'out of thin air' is useful and we just have to be careful about it because of the next point. The next level of unsafety is the one we assign to x and y. We are embedding our patch theory constraints in Haskell's types and sometimes we can't do this flawlessly. Sometimes what we want to do is type safe in Haskell's type system but does not respect the invariants of patch theory. In this case the flaw is that smart_diff is really unsafe_smart_diff. Our type signature for smart_diff is equivalent to replacing 'smart_diff' with '(unsafeCoerceP . smart_diff)'. Again, the problem here isn't with the unsafeCoerceP on NilFL. Which is my way of saying, the problem isn't the fact that we are doing something that can be used to implement unsafeCoerce (while admitting that it is undesirable). The real problem is that darcs has an invariant that our type encoding is ignorant of. That invariant, the one you point out, seems to be about multiple copies of the same change. I agree that we need to fix the code above to respect that invariant. > The get_unrecorded_in_files idea is great, and looks like the right sort of > interface, but concatenating patches from separate diffs is way too > fragile. A better approach would be to pass all the interesting files into > a diff_at_paths. Okay, so that pushes the invariant down a level. I can't say for sure, but I think you're saying that makes it easier to deal with. I certainly can't think of a way off the top of my head to make a context threaded version of smart_diff. Well, with the possible exception of returning Sealed (FL Prim C(x)) like we do with readPatch, which is flawed but harder to misuse. And one more exception for dependent types; I'm pretty sure Agda could represent this even though I don't have an implementation to prove it. Generalizing your suggestion I think this is what you're saying: To make our flawed type encoding easier to work with, pass all the information necessary to construct the sequence to just one function. So that instead of O(n) calls to a (patch theory) unsafe function (smart_diff or diff_at_path) which can be threaded/concatenated in a (patch theory) unsafe way we make O(1) (patch theory) unsafe call(s). The code above is already sound in Haskell's type system (but could be made unsound by introducing (=\/=) or (=/\=)). Note: I'm hence forth making a distinction between things that are patch-theory unsafe and things that are haskell-type-system unsafe. Where the latter is sufficient for the former. It would also be handy to make a special case for the foo idiom in the places where it is audited and known to be used safely. > I'm right now pushing a test that demonstrates this bug in whatsnew. > (Except it passes, since I'm not applying these patches just yet.) Ah, tests are good. They help us document these invariants. Patch review is good, but I would certainly feel more comfortable with test centric patch acceptance over patch-review centric. I certainly miss a lot of things when I review patches. Also the research I read in my first year of graduate school, about psychology of programming, agrees. If I recall correctly Nancy Pennington has some great studies on the subject. We could of course have the best of both worlds when we get the code more factored so that we have a small kernel that must be correct at all costs and libraries built on top by contributors that just want feature X where X is tolerant to bugs, eg., doesn't corrupt data (like interactive change viewing, which is buggy by the way but it won't cause corruption, mild confusion at best, I made a ticket for this but I can't find it). > P.P.S. It's nice to see a case where the type witnesses really did catch us > a bug... Note that this code is right at the border between where the type > witnesses protect us and where they don't, since Diff itself isn't > protected by type witnesses. Fortunately, it was on the safe side of the > border, or I might not have caught it... also fortunately, the bit on the > diff side of the border is sufficiently simple so as to be easily > reviewable. I disagree that we are on the useful side of the type witnesses here. I've made my case for why above. It is good that you saw the unsafeCoerceP and started investigating, but it's really the diff_at_path and smart_diff that should have caused the audit in this case. Here I think it's unfair to say that unsafeCoerceP is the culprit, given it's usage. It's solving a different problem, namely the foo idiom. I'd argue that since I also reviewed this during the sprint but couldn't spot the bug that this is evidence that an invariant is missing from the type encoding. I'm not arguing that the type witness stuff is without use. It's incredibly handy, I just think we have to be careful about attribution here :) Thanks, Jason _______________________________________________ darcs-users mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-users
