On Fri, Aug 05, 2005 at 04:38:04PM +0200, Florian Weimer wrote:
> * David Roundy:
> > I.e. rather than caching to avoid transport, I'd like to avoid
> > downloading any data we don't need.  I don't see any reason why we
> > should need zsyncish optimizations for fetching the inventory,
> > unless perhaps the inventory is very large because there aren't any
> > tags.
> 
> My benchmark is John Goerzen's fptools repository (created from
> fptools/GHC CVS, see <http://darcs.complete.org/fptools/>).
> 
> Your suggestion seems to imply that I wouldn't have to download 4
> megabyte of inventory data if John tagged his repository regularly.
> Is this true?  Below, you mentioned something about push not splitting
> the inventory, would this be relevant in this case?

Ah, this is an automatically generated repository.  If he tagged it
regularly and ran optimize on the main server, there wouldn't be a
large inventory to download each time.
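
Concretely, tagging and optimizing would just mean running something
like the following on the server every so often (the tag name here is
only an example):

  darcs tag fptools-2005-08
  darcs optimize

That splits the inventory at the tag, so a pull only needs to fetch the
short section written since then.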

It looks like the fptools repo was already tagged and optimized sometime
before July 12, so unless the dates on the server are wrong, the
slowness you're seeing may be due to some other issue.

Are you pulling into an unmodified repository, or one with local patches?
It may be that you need to run optimize locally, or even optimize
--reorder, in order to benefit from the inventory-splitting.  If neither of
these solves the problem, we've probably got an inefficiency somewhere that
can be improved.  Even if one of them does solve the problem, we still
*ought* to be able to avoid downloading the 4MB of old history, since you
already have all that information locally.
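
For the local step, that would be something like

  darcs optimize --reorder

in your copy of the repository, which should move your local-only
patches to the end so that the prefix you share with the server lines up
with its split inventory.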

The code to be improved is most likely in Depends, and improving it would
also help us better handle --partial repositories.  Ideally, we'd have
functions like get_common_and_uncommon which are smart enough to avoid
reading one of the two sides when possible, but which can fall back to
working in the other direction when necessary.  The problem, of course, is
that it's hard to know where the best tradeoff lies (i.e. how much
commutation of local patches is worth the trouble to avoid downloading more
inventory... especially since there is no cheap way to determine how much
commutation will be required).
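
To make that tradeoff a bit more concrete, here is a toy Haskell sketch.
None of this is the real Depends code; PatchName, commonAndUncommon and
the stopping heuristic are simplified stand-ins made up purely for
illustration:

  import qualified Data.Set as Set

  type PatchName = String   -- toy stand-in for darcs's PatchInfo

  -- Split the remote patch names fetched so far into the ones we already
  -- have locally ("common") and the ones we'd still need ("uncommon").
  commonAndUncommon :: [PatchName] -> [PatchName]
                    -> ([PatchName], [PatchName])
  commonAndUncommon local remote =
      ( filter (`Set.member` localSet) remote
      , filter (`Set.notMember` localSet) remote )
    where localSet = Set.fromList local

  -- The tradeoff in question: if everything in the newest chunk of the
  -- remote inventory is already known locally, we could stop fetching the
  -- older chunks, saving bandwidth at the risk of having to commute local
  -- patches later if that guess turns out to be wrong.
  canStopFetching :: [PatchName] -> [PatchName] -> Bool
  canStopFetching local chunk =
      all (`Set.member` Set.fromList local) chunk

  main :: IO ()
  main = print (commonAndUncommon ["tag1","a","b","local-only"]
                                  ["c","a","b"])

The hard part is deciding when something like canStopFetching is safe to
trust, which is exactly the commutation-cost question above.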

The other side of the solution would be to implement hashed inventories,
which would make (optionally) caching both remote patches and remote
inventories cheap and easy.
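
To sketch why hashing makes caching so easy (this isn't a design, and the
cache path and helper below are invented for the example): if every
inventory and patch file is named by a hash of its contents, a cached copy
can never go stale, so the cache check is just a file-existence test.

  import qualified Data.ByteString.Lazy as BL
  import Data.Digest.Pure.SHA (sha1, showDigest)   -- from the "SHA" package
  import System.Directory (createDirectoryIfMissing, doesFileExist)
  import System.FilePath ((</>))

  cacheDir :: FilePath
  cacheDir = "_darcs/cache"   -- invented location, just for the example

  -- Fetch a file that is named by the hash of its contents, looking in a
  -- local cache first.  Since the name pins down the contents, a cache hit
  -- never needs a freshness check against the server.
  fetchHashed :: (String -> IO BL.ByteString)   -- downloader (any transport)
              -> String                         -- hash name of the file
              -> IO BL.ByteString
  fetchHashed download name = do
      createDirectoryIfMissing True cacheDir
      let path = cacheDir </> name
      hit <- doesFileExist path
      if hit
        then BL.readFile path
        else do
          contents <- download name
          if showDigest (sha1 contents) == name
            then do
              BL.writeFile path contents
              return contents
            else fail ("hash mismatch fetching " ++ name)
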
-- 
David Roundy
http://www.darcs.net

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
