On Mon, Mar 02, 2015 at 11:38:38AM -0500, Richard Hipp wrote:
> On 3/2/15, Joerg Sonnenberger <jo...@britannica.bec.de> wrote:
> > On Mon, Mar 02, 2015 at 07:30:44AM -0500, Richard Hipp wrote:
> >> So I was thinking, could Fossil 2.0 be enhanced in ways to support
> >> scaling to the point where it works on really massive projects?
> >
> > I think the single biggest practical issue right now still goes back to
> > the baseline manifests not being efficient enough. Would you consider
> > changing the rules to allow truly incremental manifests? I agree that
> > having full manifests is sometimes nicer, but I think those would be
> > built on demand and cached separately. I believe that is the majority of
> > the current meta data, which matters a lot whenever a rebuild happens.
> >
> 
> The current mechanism is to have periodic full baseline manifests, and
> then have deltas against those baselines in between.  Hence, no more
> than two artifacts ever need to be decoded in order to access a
> manifest - the baseline and its delta.

I know. The manifest contains two parts: non-file content and the file
list. For delta manifests, the file list is encoded as changes relative
to the baseline.
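
To make the current scheme concrete, here is a rough Python sketch of
how a file list is resolved from a baseline plus one delta. The dict
layout and the name resolve_file_list are made up for illustration and
are not Fossil's actual data structures; a value of None stands in for
a file that the delta removes.

```python
# Hypothetical in-memory model of manifests; not Fossil's real code.
def resolve_file_list(baseline, delta=None):
    """Return {filename: hash} for a check-in.

    baseline: full file list, filename -> content hash
    delta:    changes relative to the baseline, filename -> content hash,
              with None meaning the file was removed
    """
    files = dict(baseline)  # copy so the baseline stays untouched
    if delta:
        for name, file_hash in delta.items():
            if file_hash is None:
                files.pop(name, None)   # file removed relative to baseline
            else:
                files[name] = file_hash  # file added or changed
    return files


baseline = {"src/main.c": "a1", "src/util.c": "b2", "README": "c3"}
delta = {"src/util.c": "b9", "README": None, "src/new.c": "d4"}
resolved = resolve_file_list(baseline, delta)
```

Only two inputs are ever needed, but the baseline itself is still a
full file list and has to be parsed in full.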

> Are you proposing to have deltas of deltas, so that a potentially
> large number of artifacts need to be decoded in order to reconstruct
> the complete manifest?

I think we have two different situations when it comes to accessing the
file list:

(1) Getting the full list. This is primarily used for initial checks and
as part of the status handling of checkouts, maybe also for the web view.

(2) Getting the changes relative to another checkin. This is what update
and similar commands are interested in.

The problem with the baseline encoding is that it still has a high
degree of redundancy. While delta compression removes a good chunk of
the overhead in terms of disk space, rebuild still has to process the
full amount. That's a significant part for a large tree. My suggestion
is to store a plain file delta in the manifest. Let's call this a
pure delta manifest. Rebuild parsing is then linear in the number of
changed files. The plink table is a direct mapping of the pure delta
manifest; they hold effectively the same data. To keep the performance
of case (1) above, a new full manifest table is stored separately and
computed on demand. That can be either during rebuild or on first
access. Heuristics like "X commits since last full manifest" can be
applied. This is a (local) cache: no need to transfer it via the sync
protocol, no need to preserve it during rebuild either.
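
A minimal Python sketch of the idea, under some loud assumptions: every
checkin has a single parent (merges are ignored), the class Repo and
all of its method names are invented for this example, and full_every
stands in for the "X commits since last full manifest" heuristic. None
of this is existing Fossil code.

```python
# Hypothetical model of pure delta manifests plus a local full-list cache.
class Repo:
    def __init__(self, full_every=3):
        self.parent = {}    # checkin id -> parent id (None for the root)
        self.changes = {}   # checkin id -> {filename: hash, None = removed}
        self.cache = {}     # local only: checkin id -> full file list
        self.full_every = full_every

    def commit(self, cid, parent, changes):
        # A pure delta manifest stores only the files changed vs. the parent.
        self.parent[cid] = parent
        self.changes[cid] = dict(changes)

    def diff_to_parent(self, cid):
        # Case (2): cost is linear in the number of changed files.
        return self.changes[cid]

    def full_list(self, cid):
        # Case (1): walk back to the nearest cached full list, then replay.
        chain = []
        cur = cid
        while cur is not None and cur not in self.cache:
            chain.append(cur)
            cur = self.parent[cur]
        files = dict(self.cache[cur]) if cur is not None else {}
        for steps, c in enumerate(reversed(chain), start=1):
            for name, file_hash in self.changes[c].items():
                if file_hash is None:
                    files.pop(name, None)
                else:
                    files[name] = file_hash
            if steps % self.full_every == 0:
                self.cache[c] = dict(files)  # heuristic checkpoint
        return files

    def rebuild(self):
        # The cache is derived data, so a rebuild can simply drop it.
        self.cache.clear()


repo = Repo(full_every=2)
repo.commit("c1", None, {"a": "1", "b": "2"})
repo.commit("c2", "c1", {"b": "3"})
repo.commit("c3", "c2", {"a": None, "c": "4"})
repo.commit("c4", "c3", {"d": "5"})
full = repo.full_list("c4")
```

The cached full lists never leave the local repository in this sketch,
and dropping them in rebuild() loses nothing that cannot be recomputed
from the pure deltas.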

Joerg