Re: [darcs-users] darcs-hs/hashed-storage review

Ganesh Sittampalam Wed, 05 Aug 2009 16:38:00 -0700

On Wed, 5 Aug 2009, Petr Rockai wrote:

Ganesh Sittampalam <[email protected]> writes:

> My most important question is about the dividing line between> hashed-storage and darcs-hs. The biggest weakness of hashed-storage as> a separate package is that it has functions that are specific to Darcs> repositories. To my mind, it either needs to abandon this knowledge,> perhaps by abstracting it over a typeclass that is meaningful in> itself, or some or all of hashed-storage needs to move into the darcs> tree proper. Otherwise, changes to some parts of darcs will continue> to require lockstep changes to hashed-storage - which is ok when Petr> is developing the two together intensively on his own, but not really> when other people are involved or over a longer timeframe. One option> would be to move it all into the darcs tree but distribute it as a> separate cabal package with a more darcs-specific name.

Well, I have done a bunch of work on separating the darcs-specific bits.For now, they live in the library, because they are an important testingresource. However, out of Storage.Hashed.Darcs, there are only a fewdarcs-specific bits living in Storage.Hashed. It also seems thatabstracting the darcs-specific bits is workable.

I however expect this to be useful outside of darcs proper as well. Thelock-step development is a problem, but I am not sure how much it can beeliminated. Probably something for further discussion.

I guess the question is what the intended use cases for hashed-storageoutside darcs are. I can see two possibilities:


(a) General hash-structured storage library. It feels like it should be

generally useful but the only concrete idea I can think of would be forreading git repos. In this situation hashed-storage really does belong asan independent library.


(b) A way of reading darcs repos (and perhaps writing them, for the
brave). This would be an important step towards a stable darcs API, and
clearly should be a separate cabal package, but if it's just that, I think
it should be called something darcs-specific (darcs-storage?) and live in
the darcs tree.

I'm not saying that either of these use cases have to exist prior tomerging into mainline, just that it has to be understood (and documented)what the expectations are.

> - In various places (e.g. deleting on windows, corrupt pending etc) a> file gets renamed to a fixed different name, which could cause> problems if it already exists. There should be some standard scheme to> avoid collisions.

The corrupt pending behaviour is lifted from pre-existing darcs code.For deleting on windows, this indeed needs a better solution, but thecurrent one mostly works for what its worth: if the old file exists, itis quite unlikely to be in use. I have asked elsewhere about disposingof files on win32, the only problem with _darcs/trash is thathashed-storage can't know about it.
> - We need to sort out haskell_policy, the ratification of readFile all> over the place is just silly. Replacing it with hlint as has been> suggested sounds like a very good idea.

Yes, I just didn't have the time to implement the hlint replacement yet.If people feel this is important, I can try to get this done before endof SoC.

No, I just mentioned it in passing. I think this is a better standalonetask that would just distract from the immediately important stuff, unlessit's very quick or easy.

> - I think treeDiff is actually relatively unlikely to be wrong because> it delegate s actual line by line diffing to Patch.Prim which hasn't> changed. Storage.Hashed.Diff exists but seems unused, what's the> status and plans there?

The line/line diff is not a problem -- as you say, this is done throughhunk canonization. The problem is that unsafeDiff does unhealthy amountof magic about empty files, tacking newlines here and there. I havetried to simplify the logic while retaining the behaviour, but bugscould have happened, even though unlikely, as I'd expect these to showup on the testsuite. (I previously got it wrong and it has shown up...but, you never know.)

I guess some QuickCheck is in order then, but it's not necessarily crucialthat treeDiff actually does replicate unsafeDiff, so long as it doesactually properly represent the diffs (otherwise users might be left withunrecorded changes which would be very weird/confusing).

> - Presumably biggest risk is random bugs with invalid indexes. An> explanation of when the index can become invalid would be helpful> (there may be one in the code that I haven't spotted yet).I recall stating this somewhere, but it could be in a mail or IRC. I'lladd that to the inline documentation:
-- | Mark the existing index as invalid. This has to be called whenever the
-- listing of pristine changes and will cause darcs to update the index next
-- time it tries to read it. (NB. This is about files added and removed from
-- pristine: changes to file content in either pristine or working are handled
-- transparently by the index reading code.)
The consequences of a missed invalidate would be that the file isconsidered deleted or added to the working copy, respectively,regardless of its actual state. I suppose recording such state couldlead to inconsistent repository, even. We should indeed double-checkthat the index is invalidated properly (this should always happen beforethe pristine change happens, as extra invalidations are harmless).


Also, darcs check should check the index (and darcs repair fix it).

> - Storage.Hashed.Index: code is full of magic numbers
Yes, this module is a mess. :| I'll clean that up on HEAD, since I havealready done some improvements there (representing hashes in pure binaryform instead of ascii-hex strings). See also later about "finish".
> - Petr and I have already discussed overloading Tree and TreeIO over> the container (IO). This would make writing test rigs easier, and mean> that code that processes Tree could be polymorphic where appropriate.> Overloaded code would typically constrain the container to be in Monad> or perhaps Applicative.This is actually done in HEAD. Seehttp://repos.mornfall.net/hashed-storage/dist/doc/html/hashed-storage/Storage-Hashed-Monad.html


Cool, it looks good.

> - Win32 removeFile workaround: not used consistently, should be hidden> behind some standard abstraction?Definitely, I just tacked this on in the beta cycle for 2.3 to maketests pass on windows.
> - I don't really understand what virtualTreeIO does and doesn't do.
We can discuss this, or I can try to improve the haddocks.

I've read through it again and I now understand what's going on. The factthat IO actions (or whatever monad) are all over the place - in the Tree,in the sync operation, and as a lower layer of the monad - is ratherconfusing, though I can see why it's that way now I've got to grips withit. I think the main improvement to the haddocks that would have helped meis to change "get somehow reflected in the actual real filesystem" to"might get somehow reflected ...".

> - I think TreeIO should be abstract, i.e. a newtype, unless there is> some value in exposing the fact that it is a StateT.
Yes, you are probably right. It was just me being lazy.

> - hashedTreeIO never deletes things, so what removes dead stuff?

Well, hashedTreeIO is currently not used. Well, it is used by darcscheck/repair in darcs-hs, which does its own cleanup. This is howeversomething that will need addressing.

OK, I'm now rather lost about what is needed for what. I'd have expectedhashedTreeIO to be needed for everything that touches pristine in a hashedrepo. Is this just a work in progress?

> - I'm very dubious about semantics of finish - why doesn't expandPath> call it? Presumably the documented usage semantics of readIndex have> something to do with this, but it feels fragile, and I only sort of> understand the whole dirs_changed/finish stuff in readIndex'You are right about being dubious. I however haven't found a better wayto reasonably implement index updates. In retrospect, it may have beenbetter to just provide a specialised "expand" for the index tree, thatwould also do the update. I'll check if there's a reason preventingthat, and if no, I'll go for it in hashed-storage 0.4 and just get ridof finish.


That would definitely be an improvement, but readIndex would still return
a Tree that wasn't quite like normal Trees, which makes me uncomfortable.
We should discuss a bit more what it's trying to achieve (perhaps on IRC).

Ganesh
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] darcs-hs/hashed-storage review

Reply via email to