On 17.02.2014 18:36, Julian Foad wrote:
> Marc Strapetz wrote:
>
>>> ... I'll dig into the cache code ...
>>
>> I did that now and the storage is quite simple: we have a main file
>> which contains the diff (added, removed) for every path in every
>> revision and a revision-based index file with constant record length (to
>> quickly locate entries in the main file).
>>
>> This storage allows to efficiently query for the mergeinfo diff for a
>> path in a certain revision. That's sufficient to build the merge arrows.
>> Assembling the complete mergeinfo for a certain revision is hard with
>> this cache, but actually not necessary for our use case.
>>
>> Hence an API like the following should work well for us:
>>
>> interface MergeinfoDiffCallback {
>>   void mergeinfoDiff(int revision,
>>                      Map<String, Mergeinfo> pathToAddedMergeinfo,
>>                      Map<String, Mergeinfo> pathToRemovedMergeinfo);
>> }
>>
>> void getMergeinfoDiff(String rootPath,
>>                       long fromRev, long toRev,
>>                       MergeinfoDiffCallback callback)
>>   throws ClientException;
>>
>> This should give us all mergeinfo which affects any path at or below
>> rootPath.
>>
>> When disregarding our particular use case, a more consistent API could be:
>>
>> void getMergeinfoDiff(Iterable<String> paths,
>>                       long fromRev, long toRev,
>>                       Mergeinfo.Inheritance inherit,
>>                       boolean includeDescendants,
>>                       MergeinfoDiffCallback callback)
>>   throws ClientException;
>
> I want to discourage callers from knowing or caring how the mergeinfo is
> stored, so I want to leave out the 'inherit' parameter.
>
> I also think it makes sense not to offer the options of ignoring descendants
> (that is, subtree mergeinfo), or specifying multiple paths. After all, this
> is not a low level API to be used for implementing the mergeinfo subsystem,
> it's a high level query.
>
> So let's use the simpler version that's sufficient for your use case.

That will be fine.

>> The mergeinfo diff should be received starting at fromRev and ending at
>> toRev. No callback is expected if there is no mergeinfo diff for a
>> certain revision. Depending on the server-side storage, we may require
>> to always have fromRev >= toRev or always fromRev <= toRev. If it
>> doesn't matter, better have always fromRev <= toRev (for reasons given
>> below).
>
> The same procedure could work either forwards or backwards, it doesn't really
> matter as long as you know which way it is going. Often it is useful to know
> about the more recent changes first, and have the option to look back right
> to revision 0 if necessary.

From the cache perspective it's easier to build the cache starting at r0:
then the cache files will contain information for older revisions at lower
positions. This allows us to crop the files easily at a certain revision and
rebuild them from there. That's something we do if a log message is
modified from within the GUI (it might not play a role for mergeinfo,
though).

Anyway, I agree that receiving mergeinfo for more recent revisions first
is reasonable as well. Hence, if you say the effort is the same, could we
allow both: fromRev <= toRev, in which case we will receive mergeinfo in
ascending order, and fromRev > toRev, in which case it will be in
descending order?

>> Regarding the usage, let's assume always fromRev <= toRev, then we will
>> invoke
>>
>> getMergeinfoDiff(cacheRoot, 0, head, callback)
>>
>> This should start returning mergeinfo diff immediately, starting at
>> revision 0, so we quickly make at least a bit of progress. Now, if the
>> cache building process is shutdown and restarted later, it will resume
>> with the latest known revision:
>>
>> getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)
>>
>> This procedure will be performed until we have caught up with head.
>> Note, that the latestKnownRevision is the last revision for which we
>> have received a callback. Depending on the server-side storage, this may
>> be different from the current revision which the server is currently
>> processing at the time the cache building process is shutdown. Hence it
>> will be important that ranges for which no mergeinfo diff is present
>> will be processed quickly on the server-side, otherwise we could run
>> into some kind of endless loop, if the cache building process is
>> shutdown and resumed frequently.
>
> Yes -- if the server takes a long time to work its way through a large range
> of (say a million) revisions where there are no mergeinfo changes, there is
> no graceful way to stop the procedure part way through, and no way to
> discover how far it has searched when you kill it. Maybe that is not
> important. There is a client-side work-around: request ranges of say a
> thousand revisions at a time, and then you can easily keep track of how many
> of these requests have been completed.

OK, that will work.

-Marc
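
The client-side work-around discussed above (requesting a fixed number of
revisions at a time and remembering which requests have completed) could
look roughly like the sketch below. It is only an illustration under
several assumptions: getMergeinfoDiff is the signature proposed in this
thread, not an existing API; MergeinfoSource, ChunkedCatchUp and CatchUpDemo
are hypothetical names; the Mergeinfo type is replaced by a plain String;
the callback takes a long revision to match fromRev/toRev; and
ClientException is left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Callback as proposed in the thread; Mergeinfo is replaced by String (assumption).
interface MergeinfoDiffCallback {
    void mergeinfoDiff(long revision,
                       Map<String, String> pathToAddedMergeinfo,
                       Map<String, String> pathToRemovedMergeinfo);
}

// Hypothetical stand-in for whatever client object would offer getMergeinfoDiff.
interface MergeinfoSource {
    void getMergeinfoDiff(String rootPath, long fromRev, long toRev,
                          MergeinfoDiffCallback callback);
}

final class ChunkedCatchUp {
    /** Splits [fromRev, toRev] into inclusive ranges of at most chunkSize revisions. */
    static List<long[]> chunks(long fromRev, long toRev, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = fromRev; start <= toRev; start += chunkSize) {
            long end = Math.min(start + chunkSize - 1, toRev);
            result.add(new long[] {start, end});
        }
        return result;
    }

    /**
     * Requests one chunk at a time, in ascending order, and returns the last
     * revision of the last completed chunk (resumeFrom - 1 if nothing was requested).
     */
    static long catchUp(MergeinfoSource source, String rootPath,
                        long resumeFrom, long head, long chunkSize,
                        MergeinfoDiffCallback callback) {
        long completedThrough = resumeFrom - 1;
        for (long[] range : chunks(resumeFrom, head, chunkSize)) {
            source.getMergeinfoDiff(rootPath, range[0], range[1], callback);
            // A real cache would persist this value here, so that a restart
            // can resume at completedThrough + 1 even if the chunk was empty.
            completedThrough = range[1];
        }
        return completedThrough;
    }
}

final class CatchUpDemo {
    public static void main(String[] args) {
        // Fake server for the demo: mergeinfo changed only in r3 and r1500.
        TreeMap<Long, String> changes = new TreeMap<>();
        changes.put(3L, "/branches/b:2");
        changes.put(1500L, "/branches/b:3-1499");
        MergeinfoSource fake = (root, from, to, cb) ->
            changes.subMap(from, true, to, true).forEach((rev, info) ->
                cb.mergeinfoDiff(rev, Map.of(root, info), Map.of()));

        long done = ChunkedCatchUp.catchUp(fake, "/trunk", 0, 2499, 1000,
            (rev, added, removed) -> System.out.println("r" + rev + " added " + added));
        System.out.println("completed through r" + done);
    }
}
```

Compared with resuming at latestKnownRevision, the resume point here is the
end of the last completed chunk, so long stretches without mergeinfo
changes still count as progress after a restart.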