On 2016-07-28 at 15:29, Jeff King wrote:
> On Thu, Jul 28, 2016 at 09:16:18AM +0200, Lars Schneider wrote:
>
>> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
>> solution (gmane is down - otherwise I would have given you the links).
>
> FWIW, I think there are arguments for transmitting size + content
> (namely, that it is simpler); the downside is that it doesn't allow
> streaming.

And that it requires the filter to know the size of its output upfront
(which, as I wrote, might be easy to compute from the size of the input
and data stored elsewhere, or might require generating the whole output
first).

I don't know how parallel Git is, but if it is parallel enough, and
other limits do not apply (limited number of CPU cores, I/O limits),
then without streaming the new filter protocol might be slower, unless
startup time dominates (MS Windows?):

Current parallel:

  | startup | processing 1 |
  | startup | processing 2 |
  | startup | processing 3 |
  | startup | processing 4 |

Protocol v2:

  | startup | processing 1 | processing 2 | processing 3 | processing 4 |

>
> So I think there are two viable alternatives:
>
>   1. Total size of data in ASCII decimal, newline, then that many bytes
>      of content.
>
>   2. No size header, then a series of pkt-lines followed by a flush
>      packet.

    3. Optional size header[2][3], then a series of pkt-lines followed
       by a flush packet[4].

[2] Git should always provide the size, because it is easy to do, and
    I think quite cheap (stored with the blob, stored in the index, or
    a stat() on the file away). The filter can provide the size if it
    is easy to calculate, or an approximation of the size / a size
    hint[5]; this helps to avoid reallocation.

[3] It is also a place where the filter can report error conditions
    that are known before it starts processing a file.

[4] On the one hand, you need to catch cases where the real size is
    larger or smaller than the size sent upfront; on the other hand,
    it might be the place to send warnings and errors... unless we
    use the stderr of the process (but then there is a deadlock
    problem, I think).

[5] I suggest one of:

      <size as ascii decimal>
      "approx" SPC <size as ascii decimal>
      "unknown"
      "fail"

> And you should choose between the two based on whether it's more
> important to allow streaming, or more important to make the filter
> implementations simple[1].
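
To make the comparison concrete, here is a minimal Python sketch of the
framings discussed above. It is only an illustration, not a proposed
implementation: all function names are made up for this example, the
size header parser follows the grammar suggested in [5], and the
pkt-line layout (four hex digits giving the length including the
prefix itself, with "0000" as the flush packet) is the one Git already
uses elsewhere.

```python
import io

FLUSH_PKT = b"0000"
MAX_PAYLOAD = 65516  # 65520-byte pkt-line limit minus the 4-byte length prefix


def frame_with_size(content: bytes) -> bytes:
    """Alternative 1: total size in ASCII decimal, newline, then the content."""
    return str(len(content)).encode("ascii") + b"\n" + content


def frame_pkt_lines(chunks) -> bytes:
    """Alternative 2: one pkt-line per chunk, terminated by a flush packet."""
    out = io.BytesIO()
    for chunk in chunks:
        # Split oversized chunks; empty chunks produce no packet at all.
        for start in range(0, len(chunk), MAX_PAYLOAD):
            payload = chunk[start:start + MAX_PAYLOAD]
            out.write(b"%04x" % (len(payload) + 4))
            out.write(payload)
    out.write(FLUSH_PKT)
    return out.getvalue()


def read_pkt_lines(stream):
    """Yield payloads until the flush packet; the total size is never needed."""
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("stream ended before the flush packet")
        length = int(header.decode("ascii"), 16)
        if length == 0:
            return  # flush packet: end of content
        if length < 4:
            raise ValueError("invalid pkt-line length")
        payload = stream.read(length - 4)
        if len(payload) != length - 4:
            raise EOFError("truncated pkt-line")
        yield payload


def parse_size_header(line: str):
    """Alternative 3 header, per [5]: returns (kind, size or None)."""
    if line in ("unknown", "fail"):
        return (line, None)
    if line.startswith("approx "):
        return ("approx", int(line[len("approx "):]))
    return ("exact", int(line))  # raises ValueError on anything else
```

With alternative 1 the writer must have the whole output (or at least
its length) in hand before sending the first byte, while the pkt-line
reader can hand each payload to its consumer as soon as it arrives. A
receiver for alternative 3 would additionally have to compare the
number of bytes actually read against an "exact" size from the header,
which is the check mentioned in [4].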
>
> Any solution that is in between those (like sending a size header and
> then using pktlines anyway) is sacrificing simplicity but not getting
> the streaming benefits.
>
> -Peff
>
> [1] I haven't thought hard enough about it to have a real opinion. My
>     gut says to go with the streaming, just because we've had to
>     retrofit streaming in other areas when dealing with blobs, so I
>     think we'll end up there eventually. So choosing a simpler protocol
>     like (1) would probably mean eventually implementing a next-version
>     protocol that does (2), and having to support both.
>
> PS Jakub asked for links, but gmane is down. Here are the relevant threads:
>
>   http://public-inbox.org/git/20160720134916.gb19...@sigill.intra.peff.net
>
>   http://public-inbox.org/git/20160722154900.19477-1-larsxschneider%40gmail.com/t/#u