Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-29 Thread Junio C Hamano
Junio C Hamano  writes:

> If you are depending on a single topic in 'next', it is better to
> build on the tip of that topic, not on 'next', if you can figure out
> where the tip is.  In practice, while we are exchanging patches via
> e-mail, there should be no noticeable difference either way [*1*],
> ...

And the forgotten foot note would have said something like this:

 *1* If you are depending only on a single topic in 'next', and
 other topics in 'next' do not interfere with your work, then by
 definition, your patches that apply cleanly on 'next' ought to
 apply cleanly on the tip of that single topic.

;-)


Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-29 Thread Junio C Hamano
Lars Schneider  writes:

>> That's right. There might be some code sharing opportunity with Ben's
>> code that is already in "next":
>> https://github.com/git/git/blob/next/convert.c#L660-L677
>> 
>> Would it be useful for you if I send v5 with the changes rebased 
>> onto "next"?
>
> Hi Junio,
>
> sorry for bugging you again, but Ben's topic did not make it to "master"
> today. Is it OK if I rebase my topic onto "next" and resend?

Sorry, your earlier question was lost in the noise and I should have
picked it up during my last sweep of leftover bits.

If you are depending on a single topic in 'next', it is better to
build on the tip of that topic, not on 'next', if you can figure out
where the tip is.  In practice, while we are exchanging patches via
e-mail, there should be no noticeable difference either way [*1*],
but once you start throwing a complex and long series, you may want
to publish it to a public repository for reviewers and the
maintainer to pull, and that workflow might give us an easier way to
review, but a topic based on 'next' will never have a chance to be
pulled to be merged for real, as merging its tip to 'master' means
it will bring all other junk that may not ready.  So if you anticipate
that to happen someday, practicing to build on things that are only
needed (e.g. if you depend on two topics, you may start by merging
them on top of 'master' and then building your change on top) is a
good idea.

No matter what you do, please mention on top of what you built your
work.

Thanks.


Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-29 Thread Lars Schneider

> On 23 May 2017, at 10:43, Lars Schneider  wrote:
> 
> 
>> On 23 May 2017, at 07:22, Junio C Hamano  wrote:
>> 
>> Lars Schneider  writes:
>> 
> + sigchain_pop(SIGPIPE);
> +
> + if (err || errno == EPIPE) {
 
 This looks strange, at first glance.
 Do we set errno to 0 before ?
 Or is there a trick that EPIPE can only be reached,
 if it is "our" error ?
>>> 
>>> You are right and I'll fix it! 
>>> Thanks for reminding me! 
>>> Peff also noticed that some time ago:
>>> http://public-inbox.org/git/20170411200520.oivytvlzkdu7b...@sigill.intra.peff.net/
>> 
>> Ben Peart's bp/sub-process-convert-filter topic also had the same
>> EPIPE issues in its earlier incarnation, IIRC.  I haven't looked at
>> this topic for some time, but I wonder if we can share code with it.
> 
> That's right. There might be some code sharing opportunity with Ben's
> code that is already in "next":
> https://github.com/git/git/blob/next/convert.c#L660-L677
> 
> Would it be useful for you if I send v5 with the changes rebased 
> onto "next"?

Hi Junio,

sorry for bugging you again, but Ben's topic did not make it to "master"
today. Is it OK if I rebase my topic onto "next" and resend?

Thanks,
Lars

Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-23 Thread Lars Schneider

> On 23 May 2017, at 07:22, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
 +  sigchain_pop(SIGPIPE);
 +
 +  if (err || errno == EPIPE) {
>>> 
>>> This looks strange, at first glance.
>>> Do we set errno to 0 before ?
>>> Or is there a trick that EPIPE can only be reached,
>>> if it is "our" error ?
>> 
>> You are right and I'll fix it! 
>> Thanks for reminding me! 
>> Peff also noticed that some time ago:
>> http://public-inbox.org/git/20170411200520.oivytvlzkdu7b...@sigill.intra.peff.net/
> 
> Ben Peart's bp/sub-process-convert-filter topic also had the same
> EPIPE issues in its earlier incarnation, IIRC.  I haven't looked at
> this topic for some time, but I wonder if we can share code with it.

That's right. There might be some code sharing opportunity with Ben's
code that is already in "next":
https://github.com/git/git/blob/next/convert.c#L660-L677

Would it be useful for you if I send v5 with the changes rebased 
onto "next"?

Thanks,
Lars




Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Junio C Hamano
Lars Schneider  writes:

>>> +   sigchain_pop(SIGPIPE);
>>> +
>>> +   if (err || errno == EPIPE) {
>> 
>> This looks strange, at first glance.
>> Do we set errno to 0 before ?
>> Or is there a trick that EPIPE can only be reached,
>> if it is "our" error ?
>
> You are right and I'll fix it! 
> Thanks for reminding me! 
> Peff also noticed that some time ago:
> http://public-inbox.org/git/20170411200520.oivytvlzkdu7b...@sigill.intra.peff.net/

Ben Peart's bp/sub-process-convert-filter topic also had the same
EPIPE issues in its earlier incarnation, IIRC.  I haven't looked at
this topic for some time, but I wonder if we can share code with it.



Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Lars Schneider

> On 22 May 2017, at 21:58, Torsten Bögershausen  wrote:
> 
> On 2017-05-22 15:50, Lars Schneider wrote:
>> +
>> +int async_query_available_blobs(const char *cmd, struct string_list 
>> *delayed_paths)
>> +{
>> +int err;
>> +char *line;
>> +struct cmd2process *entry;
>> +struct child_process *process;
>> +struct strbuf filter_status = STRBUF_INIT;
>> +
>> +entry = find_multi_file_filter_entry(_process_map, cmd);
>> +if (!entry) {
>> +error("external filter '%s' is not available anymore although "
>> +  "not all paths have been filtered", cmd);
>> +return 0;
>> +}
>> +process = >process;
>> +sigchain_push(SIGPIPE, SIG_IGN);
>> +
>> +err = packet_write_fmt_gently(
>> +process->in, "command=list_available_blobs\n");
>> +if (err)
>> +goto done;
>> +
>> +err = packet_flush_gently(process->in);
>> +if (err)
>> +goto done;
>> +
>> +for (;;) {
>> +const char* pre = "pathname=";
>> +const int pre_len = strlen(pre);
>> +line = packet_read_line(process->out, NULL);
>> +if (!line)
>> +break;
>> +err = strlen(line) <= pre_len || strncmp(line, pre, pre_len);
>> +if (err)
>> +goto done;
>> +string_list_insert(delayed_paths, xstrdup(line+pre_len));
>> +}
>> +
>> +read_multi_file_filter_status(process->out, _status);
>> +err = strcmp(filter_status.buf, "success");
>> +
>> +done:
>> +sigchain_pop(SIGPIPE);
>> +
>> +if (err || errno == EPIPE) {
> 
> This looks strange, at first glance.
> Do we set errno to 0 before ?
> Or is there a trick that EPIPE can only be reached,
> if it is "our" error ?

You are right and I'll fix it! 
Thanks for reminding me! 
Peff also noticed that some time ago:
http://public-inbox.org/git/20170411200520.oivytvlzkdu7b...@sigill.intra.peff.net/

Thanks,
Lars

Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Lars Schneider

> On 22 May 2017, at 21:52, Torsten Bögershausen  wrote:
> 
> On 2017-05-22 15:50, Lars Schneider wrote:
>> +After Git received the pathnames, it will request the corresponding
>> +blobs again. These requests contain a pathname and an empty content
>> +section. The filter is expected to respond with the smudged content
>> +in the usual way as explained above.
>> +
>> +packet:  git> command=smudge
>> +packet:  git> pathname=path/testfile.dat
>> +packet:  git> 
>> +packet:  git>   # empty content!
>> +packet:  git< status=success
>> +packet:  git< 
>> +packet:  git< SMUDGED_CONTENT
>> +packet:  git< 
>> +packet:  git< 
>> +
> 
> The documentation mentions "" 2 times.
> Is this a bug in the docu ? Or a feature which may need a comment ?

The first  marks the end of the content and the second 
marks the end of an empty status list.

Explained in the existing protocol here:
https://github.com/git/git/blob/10c78a162fa821ee85203165b805ff46be454091/Documentation/gitattributes.txt#L451-L457
https://github.com/git/git/blob/10c78a162fa821ee85203165b805ff46be454091/Documentation/gitattributes.txt#L464

For clarity I should probably change it to this:

...
packet:  git< status=success
packet:  git< 
packet:  git< SMUDGED_CONTENT
packet:  git< 
packet:  git<   # empty list, keep "status=success" unchanged!

Thanks,
Lars

Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Torsten Bögershausen
On 2017-05-22 15:50, Lars Schneider wrote:
> +
> +int async_query_available_blobs(const char *cmd, struct string_list 
> *delayed_paths)
> +{
> + int err;
> + char *line;
> + struct cmd2process *entry;
> + struct child_process *process;
> + struct strbuf filter_status = STRBUF_INIT;
> +
> + entry = find_multi_file_filter_entry(_process_map, cmd);
> + if (!entry) {
> + error("external filter '%s' is not available anymore although "
> +   "not all paths have been filtered", cmd);
> + return 0;
> + }
> + process = >process;
> + sigchain_push(SIGPIPE, SIG_IGN);
> +
> + err = packet_write_fmt_gently(
> + process->in, "command=list_available_blobs\n");
> + if (err)
> + goto done;
> +
> + err = packet_flush_gently(process->in);
> + if (err)
> + goto done;
> +
> + for (;;) {
> + const char* pre = "pathname=";
> + const int pre_len = strlen(pre);
> + line = packet_read_line(process->out, NULL);
> + if (!line)
> + break;
> + err = strlen(line) <= pre_len || strncmp(line, pre, pre_len);
> + if (err)
> + goto done;
> + string_list_insert(delayed_paths, xstrdup(line+pre_len));
> + }
> +
> + read_multi_file_filter_status(process->out, _status);
> + err = strcmp(filter_status.buf, "success");
> +
> +done:
> + sigchain_pop(SIGPIPE);
> +
> + if (err || errno == EPIPE) {

This looks strange, at first glance.
Do we set errno to 0 before ?
Or is there a trick that EPIPE can only be reached,
if it is "our" error ?


> + if (!strcmp(filter_status.buf, "error")) {
> + /* The filter signaled a problem with the file. */
> + } else {
> + /*
> +  * Something went wrong with the protocol filter.
> +  * Force shutdown and restart if another blob requires
> +  * filtering.
> +  */
> + error("external filter '%s' failed", cmd);
> + kill_multi_file_filter(_process_map, entry);
> + }
> + }
> + return !err;
> +}
> +



Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Torsten Bögershausen
On 2017-05-22 15:50, Lars Schneider wrote:
> +After Git received the pathnames, it will request the corresponding
> +blobs again. These requests contain a pathname and an empty content
> +section. The filter is expected to respond with the smudged content
> +in the usual way as explained above.
> +
> +packet:  git> command=smudge
> +packet:  git> pathname=path/testfile.dat
> +packet:  git> 
> +packet:  git>   # empty content!
> +packet:  git< status=success
> +packet:  git< 
> +packet:  git< SMUDGED_CONTENT
> +packet:  git< 
> +packet:  git< 
> +

The documentation mentions "" 2 times.
Is this a bug in the docu ? Or a feature which may need a comment ?



Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Taylor Blau
I have no remaining concerns about the protocol specification in terms of
implementing a filter with this capability.


[PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Lars Schneider
Some `clean` / `smudge` filters might require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol (introduced in edcc858) to accept the
status "delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.

Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.

Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt |  65 +-
 builtin/checkout.c  |   3 +
 cache.h |   6 +-
 convert.c   | 131 +++-
 convert.h   |  21 +
 entry.c | 110 +--
 t/t0021-conversion.sh   |  74 
 t/t0021/rot13-filter.pl | 189 ++--
 unpack-trees.c  |   2 +
 9 files changed, 507 insertions(+), 94 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4736483865..ec92d3e3fa 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -425,8 +425,8 @@ packet:  git< capability=clean
 packet:  git< capability=smudge
 packet:  git< 
 
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,69 @@ the protocol then Git will stop the filter process and 
restart it
 with the next file that needs to be processed. Depending on the
 `filter..required` flag Git will interpret that as error.
 
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.
 
+Delay
+^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "can-delay" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet.
+
+packet:  git> command=smudge
+packet:  git> pathname=path/testfile.dat
+packet:  git> can-delay=1
+packet:  git> 
+packet:  git> CONTENT
+packet:  git> 
+packet:  git< status=delayed
+packet:  git< 
+
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of pathnames of blobs that are
+available. The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list.
+
+packet:  git> command=list_available_blobs
+packet:  git> 
+packet:  git< pathname=path/testfile.dat
+packet:  git< pathname=path/otherfile.dat
+packet:  git< 
+packet:  git< status=success
+packet:  git< 
+
+
+After Git received the pathnames, it will request the corresponding
+blobs again. These requests contain a pathname and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+
+packet:  git> command=smudge
+packet:  git> pathname=path/testfile.dat
+packet:  git> 
+packet:  git>   # empty content!
+packet:  git< status=success
+packet:  git< 
+packet:  git< SMUDGED_CONTENT
+packet:  git< 
+packet:  git< 
+
+
+Example
+^^^
+