> On 10 Apr 2017, at 22:54, Torsten Bögershausen <tbo...@web.de> wrote:
> 
> On 2017-04-09 21:11, Lars Schneider wrote:
> []
>> +------------------------
>> +packet:          git> command=smudge
>> +packet:          git> pathname=path/testfile.dat
>> +packet:          git> delay-able=1
>> +packet:          git> 0000
>> +packet:          git> CONTENT
>> +packet:          git> 0000
>> +packet:          git< status=delayed
>> +packet:          git< 0000
>> +packet:          git> delay-id=1
>> +packet:          git> 0000
>> +packet:          git< status=success
>> +packet:          git< 0000
> 
> (not sure if this was mentioned before)
> If a filter uses the delayed feature, I would read its
> response in the style:
> "Hello Git, I need some time to process this one, but as I have
> CPU capacity available, please send another blob
> so I can chew on them in parallel."
> 
> Can we save one round trip?
> 
> packet:          git> command=smudge
> packet:          git> pathname=path/testfile.dat
> packet:          git> delay-id=1
> packet:          git> 0000
> packet:          git> CONTENT
> packet:          git> 0000
> packet:          git< status=delayed # this means: Git, please feed more
> packet:          git> 0000

Actually, this is how I implemented it first.

However, I didn't like it because we associate a
pathname with a delay-id. If the filter does not
delay the item, then we would associate a different
pathname with the same delay-id in the next request.
Therefore, I think it is better to present the delay-id
*only* to the filter if the item is actually delayed.

I would be surprised if the extra round trip impacted
performance in any meaningful way.
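For illustration, here is a minimal sketch (Python; the pkt-line encoder is hand-rolled and the helper names are my own, not part of any real filter) of the filter-side responses in the two-round-trip exchange quoted above: first `status=delayed`, then a plain `status=success` once Git sends back the delay-id.

```python
def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits giving the total length
    (4-byte header + payload), followed by the payload."""
    return b"%04x" % (len(payload) + 4) + payload

FLUSH = b"0000"  # flush packet, terminates a section

def delayed_response() -> bytes:
    # Round trip 1: tell Git this blob will be delivered later.
    return pkt_line(b"status=delayed\n") + FLUSH

def delay_id_ack() -> bytes:
    # Round trip 2: acknowledge the delay-id packet Git sent back.
    return pkt_line(b"status=success\n") + FLUSH
```

A real filter would write these bytes to stdout; the sketch only builds them so the framing is visible.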


> # Git feeds the next blob.
> # This may be repeated some rounds.
> # (We may want to restrict the number of rounds for Git, see below)
> # After some rounds, the filter needs to signal:
> # no more fresh blobs please, collect some data and I can free memory
> # and after that I am able to get a fresh blob.
> packet:          git> command=smudge
> packet:          git> pathname=path/testfile.dat
> packet:          git> delay-id=2
> packet:          git> 0000
> packet:          git> CONTENT
> packet:          git> 0000
> packet:          git< status=pleaseWait
> packet:          git> 0000
> 
> # Now Git needs to ask for ready blobs.

We could do this, but I think it would only complicate
the protocol. I expect the filter to spool results to
disk or something similar.
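To make the "spool to disk" idea concrete, here is a rough sketch (Python; `DelayedStore` and its methods are hypothetical, not part of any real filter) of a filter writing finished smudge results to temporary files keyed by delay-id, so its memory use stays bounded:

```python
import os
import tempfile

class DelayedStore:
    """Hypothetical sketch: spool finished smudge results to disk
    instead of holding them in memory, keyed by Git's delay-id."""

    def __init__(self):
        self.dir = tempfile.mkdtemp(prefix="smudge-spool-")
        self.ready = {}  # delay-id -> spool file path

    def finish(self, delay_id: int, content: bytes) -> None:
        # Called when the slow smudge operation completes.
        path = os.path.join(self.dir, str(delay_id))
        with open(path, "wb") as f:
            f.write(content)
        self.ready[delay_id] = path

    def fetch(self, delay_id: int) -> bytes:
        # Called when Git re-requests the blob; frees the spool file.
        path = self.ready.pop(delay_id)
        with open(path, "rb") as f:
            data = f.read()
        os.unlink(path)
        return data
```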


>> +------------------------
>> +
>> +If the filter supports the "delay" capability then it must support the
>> +"list_available_blobs" command. If Git sends this command, then the
>> +filter is expected to return a list of "delay_ids" of blobs that are
>> +available. The list must be terminated with a flush packet followed
>> +by a "success" status that is also terminated with a flush packet. If
>> +no blobs for the delayed paths are available yet, then the filter is
>> +expected to block the response until at least one blob becomes
>> +available. The filter can tell Git that it has no more delayed blobs
>> +by sending an empty list.
>> +------------------------
>> +packet:          git> command=list_available_blobs
>> +packet:          git> 0000
>> +packet:          git< 7
> 
> Is the "7" the same as the "delay-id=1" from above?

Yes! Sorry, I will make this clearer in the next round.


> It may be easier to understand, even if it costs some bytes, to answer instead
> packet:          git< delay-id=1

Agreed!


> (And at this point, may I suggest changing "delay-id" into "request-id=1"?)

If there is no objection by another reviewer then I am happy to change it.


>> +packet:          git< 13
> 
> Same question here: is this the delay-id ?

Yes.


>> +packet:          git< 0000
>> +packet:          git< status=success
>> +packet:          git< 0000
>> +------------------------
>> +
>> +After Git has received the "delay_ids", it will request the corresponding
>> +blobs again. These requests contain a "delay-id" and an empty content
>> +section. The filter is expected to respond with the smudged content
>> +in the usual way as explained above.
>> +------------------------
>> +packet:          git> command=smudge
>> +packet:          git> pathname=test-delay10.a
>> +packet:          git> delay-id=0
> 
> Minor question: Where does the "id=0" come from ?

That's the delay-id (aka request-id) that Git gave to the filter
on the first request (which was delayed).
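A sketch of the filter's reply to such a re-request, matching the packet trace quoted above (Python; `smudged_reply` is a made-up helper, and a real filter would split large content over multiple packets):

```python
def pkt_line(payload: bytes) -> bytes:
    # pkt-line: 4-hex-digit total length (header + payload), then payload
    return b"%04x" % (len(payload) + 4) + payload

FLUSH = b"0000"

def smudged_reply(content: bytes) -> bytes:
    # status=success, flush, content packet(s), flush, and a final
    # flush, exactly as in the trace above.
    return (pkt_line(b"status=success\n") + FLUSH
            + pkt_line(content) + FLUSH
            + FLUSH)
```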


>> +packet:          git> 0000
>> +packet:          git> 0000  # empty content!
>> +packet:          git< status=success
>> +packet:          git< 0000
>> +packet:          git< SMUDGED_CONTENT
>> +packet:          git< 0000
>> +packet:          git< 0000
> 
> OK, good.
> 
> The question is: what happens next?
> 
> Several things, kind of in parallel, but we need to prioritize and serialize:
> - Send the next blob
> - Fetch ready blobs
> - And of course: ask for more ready blobs.
> (it looks as if Peff and Jakub had useful comments already,
>  so I can stop here?)

I would like to keep the mechanism as follows:

1. Git sends all blobs to the filter
2. Git fetches blobs until we are done

@Taylor: Do you think that would be OK for LFS?
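As a sketch of that two-phase scheme (Python; the `filt` object and its `smudge`/`list_available_blobs`/`fetch` methods are hypothetical stand-ins for the protocol messages, not a real API):

```python
def drive_filter(filt, blobs):
    """Phase 1: send all blobs to the filter, remembering which ones
    it delayed.  Phase 2: ask for available blobs until we are done."""
    delayed = {}  # delay-id -> blob
    for delay_id, blob in enumerate(blobs):
        if filt.smudge(delay_id, blob) == "delayed":
            delayed[delay_id] = blob
    while delayed:
        # list_available_blobs is expected to block until >= 1 is ready
        for delay_id in filt.list_available_blobs():
            delayed.pop(delay_id).content = filt.fetch(delay_id)
    return blobs
```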


> In general, Git should not have an unlimited number of blobs outstanding,
> as memory constraints may apply.
> There may be a config variable for the number of outstanding blobs,
> (similar to the window size in other protocols) and a variable
> for the number of "send bytes in outstanding blobs"
> (similar to window size (again!) in e.g. TCP)
> 
> The number of outstanding blobs may be less important; it is more
> important to monitor the number of bytes we keep in memory in some way.
> 
> Something like "we set a limit of 500K of outstanding data"; once we are
> above the limit, don't send any new blobs.

I don't expect the filter to keep everything in memory. If it runs
out of memory, then I expect it to spool to disk. This keeps the
protocol simple.
If that turns out to be insufficient, we could improve it later.


Thanks,
Lars
