On Mon, Mar 02, 2015 at 11:38:38AM -0500, Richard Hipp wrote:
> On 3/2/15, Joerg Sonnenberger <jo...@britannica.bec.de> wrote:
> > On Mon, Mar 02, 2015 at 07:30:44AM -0500, Richard Hipp wrote:
> >> So I was thinking, could Fossil 2.0 be enhanced in ways to support
> >> scaling to the point where it works on really massive projects?
> >
> > I think the single biggest practical issue right now still goes back to
> > the baseline manifests not being efficient enough. Would you consider
> > changing the rules to allow truly incremental manifests? I agree that
> > having full manifests is sometimes nicer, but I think those would be
> > built on-demand and cached separately. I believe that is the majority of
> > the current meta data, which matters a lot whenever a rebuild happens.
> >
>
> The current mechanism is to have periodic full baseline manifests, and
> then have deltas against those baselines in between. Hence, no more
> than two artifacts ever need to be decoded in order to access a
> manifest - the baseline and its delta.

I know. The manifest contains two parts: non-file content and the file
list. For delta manifests, the file list is encoded as changes relative
to the baseline.

> Are you proposing to have deltas of deltas, so that a potentially
> large number of artifacts need to be decoded in order to reconstruct
> the complete manifest?

I think we have two different situations when it comes to accessing the
file list:

(1) Getting the full list. This is primarily used for initial checks and
as part of the status handling of checkouts, maybe also for the web view.

(2) Getting the changes relative to another checkin. This is what update
etc. is interested in.

The problem with the baseline encoding is that it still has a high
degree of redundancy. While delta compression removes a good chunk of
the overhead in terms of disk space, rebuild still has to process the
full amount. That's a significant part for a large tree.

My suggestion is to store a plain file delta in the manifest. Let's call
this a pure delta manifest. Rebuild parsing is then linear in the number
of changed files. The plink table is a direct mapping of the pure delta
manifest; they have effectively the same data.

To keep the performance of case (1) above, a new full manifest table is
stored separately and computed on demand. That can be either during
rebuild or on first access. Heuristics like "X commits since last full
manifest" can be applied. This is a (local) cache, so there is no need
to transfer it via the sync protocol and no need to preserve it during
rebuild either.

Joerg
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
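
The proposal above can be illustrated with a rough sketch. It is written in Python rather than Fossil's C, and every name in it (PureDelta, ManifestCache, full_manifest, changes_since_parent, snapshot_every) is hypothetical and not part of Fossil. It assumes each checkin has a single parent and ignores merges and renames. Each checkin stores only its changed files; the full file list for case (1) is rebuilt on demand by walking back to the nearest cached full list, and a cache entry is kept every few commits as one possible form of the "X commits since last full manifest" heuristic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PureDelta:
    """Hypothetical pure delta manifest: only the files changed by one checkin."""
    parent: Optional[str]                 # parent checkin id, None for the root
    changes: Dict[str, Optional[str]]     # path -> new content hash, None = deleted


@dataclass
class ManifestCache:
    """Local, disposable cache of full file lists (assumed never synced)."""
    deltas: Dict[str, PureDelta]
    snapshot_every: int = 3               # heuristic: "X commits since last full manifest"
    full: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def full_manifest(self, checkin: str) -> Dict[str, str]:
        """Case (1): reconstruct the complete file list for a checkin."""
        # Walk towards the root until a cached full list (or the root) is reached.
        chain = []
        cur: Optional[str] = checkin
        while cur is not None and cur not in self.full:
            chain.append(cur)
            cur = self.deltas[cur].parent
        files = dict(self.full[cur]) if cur is not None else {}
        # Replay the pure deltas oldest-first on top of the starting point.
        for depth, cid in enumerate(reversed(chain), start=1):
            for path, digest in self.deltas[cid].changes.items():
                if digest is None:
                    files.pop(path, None)
                else:
                    files[path] = digest
            # Keep a cached full list every snapshot_every replayed checkins.
            if depth % self.snapshot_every == 0:
                self.full[cid] = dict(files)
        return files

    def changes_since_parent(self, checkin: str) -> Dict[str, Optional[str]]:
        """Case (2): the delta is stored directly, so no reconstruction is needed."""
        return self.deltas[checkin].changes
```

In this sketch, dropping the `full` dictionary loses nothing, since it can always be recomputed from the deltas. The cost of case (1) is bounded by the distance to the nearest cached list, while a rebuild only needs to read the changed files of each checkin.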