Re: [darcs-users] darcs-hs/hashed-storage review

Petr Rockai Tue, 04 Aug 2009 15:40:02 -0700

First of all, thanks for the review work! I am sure this will help a lot with
filling in the gaps in hashed-storage/darcs-hs, as well as moving forward with
merging into mainline.


Ganesh Sittampalam <[email protected]> writes:
> My most important question is about the dividing line between hashed-storage
> and darcs-hs. The biggest weakness of hashed-storage as a separate package is
> that it has functions that are specific to Darcs repositories. To my mind, it
> either needs to abandon this knowledge, perhaps by abstracting it over a
> typeclass that is meaningful in itself, or some or all of hashed-storage needs
> to move into the darcs tree proper. Otherwise, changes to some parts of darcs
> will continue to require lockstep changes to hashed-storage - which is ok when
> Petr is developing the two together intensively on his own, but not really 
> when
> other people are involved or over a longer timeframe. One option would be to
> move it all into the darcs tree but distribute it as a separate cabal package
> with a more darcs-specific name.
Well, I have done a bunch of work on separating the darcs-specific bits. For
now, they live in the library, because they are an important testing
resource. However, out of Storage.Hashed.Darcs, there are only a few
darcs-specific bits living in Storage.Hashed. It also seems that abstracting
the darcs-specific bits is workable.

I however expect this to be useful outside of darcs proper as well. The
lock-step development is a problem, but I am not sure how much it can be
eliminated. Probably something for further discussion.

> - The naming of Darcs.Gorsvet is obscure and a barrier to future understanding
> of the code
Yes, the plan is to move everything to proper places -- Darcs.Repository, and
probably elsewhere. The Gorsvet module should go away, eventually -- it's there
to clearly separate the hashed-storage code from the rest (but this is now
blurred in darcs-hs, and I expect Gorsvet to start disappearing soon,
especially if we can start merging the code into mainline).

> - In Darcs.Gorsvet, is there any reason not to use bracket in
> mInCurrentDirectory (as a comment suggests)?
No, other than that I didn't get around to fix that and that I always have to
spend a little while wondering how to spell out the bracket. I'll fix it.

> - The floatPath function from Storage.Hashed is documented as being unsafe and
> then used all over Darcs.Gorsvet. Needs an explanation of what's unsafe and 
> why
> it's ok to use it in Darcs.Gorsvet.
Yes. :\ The AnchoredPath is supposed to represent paths in repository. Taking
an arbitrary string and claiming it is a repository-relative path is basically
unsafe. I rely on darcs to only pass the right kind of paths my way -- it's not
optimal, and ideally, there'd be only a single point where things pass through
floatPath, but that's currently not practical, as it would require extensive
changes to darcs path handling. I'll at least try to document this in the
meantime.

> - In various places (e.g. deleting on windows, corrupt pending etc) a file 
> gets
> renamed to a fixed different name, which could cause problems if it already
> exists. There should be some standard scheme to avoid collisions.
The corrupt pending behaviour is lifted from pre-existing darcs code. For
deleting on windows, this indeed needs a better solution, but the current one
mostly works for what its worth: if the old file exists, it is quite unlikely
to be in use. I have asked elsewhere about disposing of files on win32, the
only problem with _darcs/trash is that hashed-storage can't know about it.

> - We need to sort out haskell_policy, the ratification of readFile all over 
> the
> place is just silly. Replacing it with hlint as has been suggested sounds like
> a very good idea.
Yes, I just didn't have the time to implement the hlint replacement yet. If
people feel this is important, I can try to get this done before end of SoC.

> - I think treeDiff is actually relatively unlikely to be wrong because it
> delegate s actual line by line diffing to Patch.Prim which hasn't
> changed. Storage.Hashed.Diff exists but seems unused, what's the status and
> plans there?
The line/line diff is not a problem -- as you say, this is done through hunk
canonization. The problem is that unsafeDiff does unhealthy amount of magic
about empty files, tacking newlines here and there. I have tried to simplify
the logic while retaining the behaviour, but bugs could have happened, even
though unlikely, as I'd expect these to show up on the testsuite. (I previously
got it wrong and it has shown up... but, you never know.)

> - unrecordedChanges - Why do IgnoreTimes and LookForAdds lead to ignoring the
> index?
IgnoreTimes has to -- we cannot rely on the hashes, since they in turn rely on
timestamps. For LookForAdds, this is because I haven't yet implemented anything
to combine the plain tree and the indexed tree. The index does not have entries
for files that are neither in pending nor in pristine -- we have to look in the
actual working copy. Of course, it would be good to tack the applicable hashes
from the index onto the plain tree. This is something that I plan to fix. A
suitable Tree union operation may be the right answer.

> - Presumably biggest risk is random bugs with invalid indexes. An explanation
> of when the index can become invalid would be helpful (there may be one in the
> code that I haven't spotted yet).
I recall stating this somewhere, but it could be in a mail or IRC. I'll add
that to the inline documentation:

-- | Mark the existing index as invalid. This has to be called whenever the
-- listing of pristine changes and will cause darcs to update the index next
-- time it tries to read it. (NB. This is about files added and removed from
-- pristine: changes to file content in either pristine or working are handled
-- transparently by the index reading code.)

The consequences of a missed invalidate would be that the file is considered
deleted or added to the working copy, respectively, regardless of its actual
state. I suppose recording such state could lead to inconsistent repository,
even. We should indeed double-check that the index is invalidated properly
(this should always happen before the pristine change happens, as extra
invalidations are harmless).

> - Storage.Hashed.Index: code is full of magic numbers
Yes, this module is a mess. :| I'll clean that up on HEAD, since I have already
done some improvements there (representing hashes in pure binary form instead
of ascii-hex strings). See also later about "finish".

> - Petr and I have already discussed overloading Tree and TreeIO over the
> container (IO). This would make writing test rigs easier, and mean that code
> that processes Tree could be polymorphic where appropriate. Overloaded code
> would typically constrain the container to be in Monad or perhaps Applicative.
This is actually done in HEAD. See
http://repos.mornfall.net/hashed-storage/dist/doc/html/hashed-storage/Storage-Hashed-Monad.html

> - Win32 removeFile workaround: not used consistently, should be hidden behind
> some standard abstraction?
Definitely, I just tacked this on in the beta cycle for 2.3 to make tests pass
on windows.

> - I don't really understand what virtualTreeIO does and doesn't do.
We can discuss this, or I can try to improve the haddocks.

> - I think TreeIO should be abstract, i.e. a newtype, unless there is some 
> value
> in exposing the fact that it is a StateT.
Yes, you are probably right. It was just me being lazy.

> - hashedTreeIO never deletes things, so what removes dead stuff?
Well, hashedTreeIO is currently not used. Well, it is used by darcs
check/repair in darcs-hs, which does its own cleanup. This is however something
that will need addressing.

> - why don't the two cases of hashedTreeIO.updateFile use the same code path?
Probably historical reasons. This code needs at least one more pass. I also
still don't know about the memory leak -- I can't reproduce it with
check/repair, but I distinctly recall that implementing "pull" on top of
hashedTreeIO leaked memory like a sieve. Could be a bug in my pull
implementation, too.

> - I'm very dubious about semantics of finish - why doesn't expandPath call it?
> Presumably the documented usage semantics of readIndex have something to do
> with this, but it feels fragile, and I only sort of understand the whole
> dirs_changed/finish stuff in readIndex'
You are right about being dubious. I however haven't found a better way to
reasonably implement index updates. In retrospect, it may have been better to
just provide a specialised "expand" for the index tree, that would also do the
update. I'll check if there's a reason preventing that, and if no, I'll go for
it in hashed-storage 0.4 and just get rid of finish.

Yours,
   Petr.
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] darcs-hs/hashed-storage review

Reply via email to