Re: RFE: API for an efficient retrieval of server-side mergeinfo data

Marc Strapetz Tue, 18 Feb 2014 02:59:24 -0800

On 17.02.2014 18:36, Julian Foad wrote:
> Marc Strapetz wrote:
> 
>>> ... I'll dig into the cache code ...
>>
>> I did that now and the storage is quite simple: we have a main file
>> which contains the diff (added, removed) for every path in every
>> revision and a revision-based index file with constant record length (to
>> quickly locate entries in the main file).
>>
>> This storage allows to efficiently query for the mergeinfo diff for a
>> path in a certain revision. That's sufficient to build the merge arrows.
>> Assembling the complete mergeinfo for a certain revision is hard with
>> this cache, but actually not necessary for our use case.
>>
>> Hence an API like the following should work well for us:
>>
>> interface MergeinfoDiffCallback {
>>   void mergeinfoDiff(int revision,
>>                      Map<String, Mergeinfo> pathToAddedMergeinfo,
>>                      Map<String, Mergeinfo> pathToRemovedMergeinfo);
>> }
>>
>> void getMergeinfoDiff(String rootPath,
>>                       long fromRev, long toRev,
>>                       MergeinfoDiffCallback callback)
>>                       throws ClientException;
>>
>> This should give us all mergeinfo which affects any path at or below
>> rootPath.
>>
>> When disregarding our particular use case, a more consistent API could be:
>>
>> void getMergeinfoDiff(Iterable<String> paths,
>>                       long fromRev, long toRev,
>>                       Mergeinfo.Inheritance inherit,
>>                       boolean includeDescendants,    
>>                       MergeinfoDiffCallback callback)
>>                       throws ClientException;
> 
> I want to discourage callers from knowing or caring how the mergeinfo is 
> stored, so I want to leave out the 'inherit' parameter.
> 
> I also think it makes sense not to offer the options of ignoring descendants 
> (that is, subtree mergeinfo), or specifying multiple paths. After all, this 
> is not a low level API to be used for implementing the mergeinfo subsystem, 
> it's a high level query.
> 
> So let's use the simpler version that's sufficient for your use case.


That will be fine.

>> The mergeinfo diff should be received starting at fromRev and ending at
>> toRev. No callback is expected if there is no mergeinfo diff for a
>> certain revision. Depending on the server-side storage, we may require
>> to always have fromRev >= toRev or always fromRev <= toRev. If it
>> doesn't matter, better have always fromRev <= toRev (for reasons given
>> below).
> 
> The same procedure could work either forwards or backwards, it doesn't really 
> matter as long as you know which way it is going. Often it is useful to know 
> about the more recent changes first, and have the option to look back right 
> to revision 0 if necessary.

From a cache perspective it's easier to build the cache starting at r0:
the cache files will then contain information for older revisions at
lower positions. This makes it easy to crop the files at a certain
revision and rebuild them from there. That's something we do if a log
message is modified from within the GUI (it might not play a role for
mergeinfo, though). Anyway, I agree that receiving mergeinfo for more
recent revisions first is reasonable as well. Hence, if you say the
effort is the same, could we allow both: fromRev <= toRev, in which case
we will receive mergeinfo in ascending order, and fromRev > toRev, in
which case it will be in descending order?
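
To illustrate why ascending order makes cropping cheap, here is a
minimal in-memory sketch. It is not our actual cache format: the class
name RevisionIndexSketch, the method names, and the assumption that each
index record is a single 8-byte offset into the main file are all made
up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: a main "file" plus an index with constant record length. */
final class RevisionIndexSketch {
    // Assumption: one index record per revision, holding an 8-byte offset.
    static final int RECORD_LENGTH = Long.BYTES;

    private byte[] main = new byte[0];   // stands in for the main file
    private byte[] index = new byte[0];  // stands in for the index file

    /** Appends the entry for the next revision; revisions arrive in ascending order from r0. */
    void append(byte[] entry) {
        int oldMainLength = main.length;
        // Record where this revision's entry starts in the main file.
        index = Arrays.copyOf(index, index.length + RECORD_LENGTH);
        ByteBuffer.wrap(index).putLong(index.length - RECORD_LENGTH, oldMainLength);
        // Append the entry itself to the main file.
        main = Arrays.copyOf(main, oldMainLength + entry.length);
        System.arraycopy(entry, 0, main, oldMainLength, entry.length);
    }

    long revisionCount() {
        return index.length / RECORD_LENGTH;
    }

    /** Constant-time lookup: the record for revision r sits at r * RECORD_LENGTH. */
    long offsetOf(long revision) {
        return ByteBuffer.wrap(index).getLong((int) (revision * RECORD_LENGTH));
    }

    byte[] entryOf(long revision) {
        int start = (int) offsetOf(revision);
        int end = revision + 1 < revisionCount()
                ? (int) offsetOf(revision + 1)
                : main.length;
        return Arrays.copyOfRange(main, start, end);
    }

    /** Drops the given revision and everything newer, so the cache can be rebuilt from there. */
    void cropAt(long revision) {
        if (revision >= revisionCount()) {
            return; // nothing stored for this revision yet
        }
        // Older revisions sit at lower positions, so cropping is a plain truncation.
        main = Arrays.copyOf(main, (int) offsetOf(revision));
        index = Arrays.copyOf(index, (int) (revision * RECORD_LENGTH));
    }
}
```

With descending order the newest data would sit at the lowest positions,
and cropping would mean rewriting both files instead of truncating them.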

>> Regarding the usage, let's assume always fromRev <= toRev, then we will
>> invoke
>>
>> getMergeinfoDiff(cacheRoot, 0, head, callback)
>>
>> This should start returning mergeinfo diff immediately, starting at
>> revision 0, so we quickly make at least a bit of progress. Now, if the
>> cache building process is shutdown and restarted later, it will resume
>> with the latest known revision:
>>
>> getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)
>>
>> This procedure will be performed until we have caught up with head.
>> Note, that the latestKnownRevision is the last revision for which we
>> have received a callback. Depending on the server-side storage, this may
>> be different from the current revision which the server is currently
>> processing at the time the cache building process is shutdown. Hence it
>> will be important that ranges for which no mergeinfo diff is present
>> will be processed quickly on the server-side, otherwise we could run
>> into some kind of endless loop, if the cache building process is
>> shutdown and resumed frequently.
> 
> Yes -- if the server takes a long time to work its way through a large range 
> of (say a million) revisions where there are no mergeinfo changes, there is 
> no graceful way to stop the procedure part way through, and no way to 
> discover how far it has searched when you kill it. Maybe that is not 
> important. There is a client-side work-around: request ranges of say a 
> thousand revisions at a time, and then you can easily keep track of how many 
> of these requests have been completed.

OK, that will work.
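
For the record, the work-around could look roughly like the sketch
below. RangeFetcher is a hypothetical stand-in for the proposed
getMergeinfoDiff(rootPath, fromRev, toRev, callback) call (it does not
exist in any binding), and ChunkedCacheBuilder and the chunk size are
likewise just illustrative. It assumes ascending order and that a
request returning normally means the whole range has been processed.

```java
/** Hypothetical stand-in for one getMergeinfoDiff request over [fromRev, toRev]. */
interface RangeFetcher {
    void fetch(long fromRev, long toRev);
}

/** Illustrative only: catches up with head in fixed-size revision ranges. */
final class ChunkedCacheBuilder {
    private final RangeFetcher fetcher;
    private final long chunkSize;
    // Highest revision whose range request has completed; -1 means nothing yet.
    private long lastCompletedRev;

    ChunkedCacheBuilder(RangeFetcher fetcher, long chunkSize, long lastCompletedRev) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.fetcher = fetcher;
        this.chunkSize = chunkSize;
        this.lastCompletedRev = lastCompletedRev;
    }

    long lastCompletedRev() {
        return lastCompletedRev;
    }

    /** Requests (lastCompletedRev, head] chunk by chunk, recording progress after each chunk. */
    void catchUp(long head) {
        while (lastCompletedRev < head) {
            long from = lastCompletedRev + 1;
            long to = Math.min(head, from + chunkSize - 1);
            fetcher.fetch(from, to);
            // Only reached if the request completed; a real cache would persist this value.
            lastCompletedRev = to;
        }
    }
}
```

Because progress is recorded per completed request rather than per
callback, a long range without any mergeinfo changes still advances the
resume point, so a frequently restarted build loses at most one chunk of
work.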

-Marc
