Re: [PATCH v1 3/3] convert: add filter..useProtocol option

Lars Schneider Mon, 25 Jul 2016 13:26:09 -0700

On 25 Jul 2016, at 00:36, Ramsay Jones <[email protected]> wrote:


> On 24/07/16 18:16, Lars Schneider wrote:
>> 
>> On 23 Jul 2016, at 01:19, Ramsay Jones <[email protected]> wrote:
>> 
>>> On 22/07/16 16:49, [email protected] wrote:
>>>> From: Lars Schneider <[email protected]>
>>>> 
>>>> Git's clean/smudge mechanism invokes an external filter process for every
>>>> single blob that is affected by a filter. If Git filters a lot of blobs
>>>> then the startup time of the external filter processes can become a
>>>> significant part of the overall Git execution time.
>>>> 
>>>> This patch adds the filter.<driver>.useProtocol option which, if enabled,
>>>> keeps the external filter process running and processes all blobs with
>>>> the following protocol over stdin/stdout.
>>>> 
>>>> 1. Git starts the filter on first usage and expects a welcome message
>>>> with protocol version number:
>>>>    Git <-- Filter: "git-filter-protocol\n"
>>>>    Git <-- Filter: "version 1"
>>> 
>>> Hmm, I was a bit surprised to see a 'filter' talk first (but so long as the
>>> interaction is fully defined, I guess it doesn't matter).
>>> 
>>> [If you wanted to check for a version, you could add a "version" command
>>> instead, just like "clean" and "smudge".]
>> 
>> It was a conscious decision to have the `filter` talk first. My reasoning 
>> was:
>> 
>> (1) I want a reliable way to distinguish the existing filter protocol 
>> ("single-shot 
>> invocation") from the new one ("long running"). I don't think there would be 
>> a
>> situation where the existing protocol would talk first. Therefore the users 
>> would
>> not accidentally mix them with a possibly half working, undetermined, 
>> outcome.
> 
> If an 'single-shot' filter were incorrectly configured, instead of a new one, 
> then
> the interaction could last a little while - since it would result in 
> deadlock! ;-)
> 
> [If Git talks first instead, configuring a 'single-shot' filter _may_ still 
> result
> in a deadlock - depending on pipe size, etc.]

Do you think this is an issue that needs to be addressed in the first version?
If yes, I would probably look into "select" to specify a timeout for the filter.
However, wouldn't the current "single-shot" clean/smudge filter block in the 
same way if they don't write anything?


>> (2) In the future we could extend the pipe protocol (see $gmane/297994, it's 
>> very
>> interesting). A filter could check Git's version and then pick the most 
>> appropriate
>> filter protocol on startup.
>> 
>> 
>>> [...]
>>>> +static struct cmd2process *start_protocol_filter(const char *cmd)
>>>> +{
>>>> +  int ret = 1;
>>>> +  struct cmd2process *entry = NULL;
>>>> +  struct child_process *process = NULL;
>>>> +  struct strbuf nbuf = STRBUF_INIT;
>>>> +  struct string_list split = STRING_LIST_INIT_NODUP;
>>>> +  const char *argv[] = { NULL, NULL };
>>>> +  const char *header = "git-filter-protocol\nversion";
>>>> +
>>>> +  entry = xmalloc(sizeof(*entry));
>>>> +  hashmap_entry_init(entry, strhash(cmd));
>>>> +  entry->cmd = cmd;
>>>> +  process = &entry->process;
>>>> +
>>>> +  child_process_init(process);
>>>> +  argv[0] = cmd;
>>>> +  process->argv = argv;
>>>> +  process->use_shell = 1;
>>>> +  process->in = -1;
>>>> +  process->out = -1;
>>>> +
>>>> +  if (start_command(process)) {
>>>> +          error("cannot fork to run external persistent filter '%s'", 
>>>> cmd);
>>>> +          return NULL;
>>>> +  }
>>>> +  strbuf_reset(&nbuf);
>>>> +
>>>> +  sigchain_push(SIGPIPE, SIG_IGN);
>>>> +  ret &= strbuf_read_once(&nbuf, process->out, 0) > 0;
>>> 
>>> Hmm, how much will be read into nbuf by this single call?
>>> Since strbuf_read_once() makes a single call to xread(), with
>>> a len argument that will probably be 8192, you can not really
>>> tell how much it will read, in general. (xread() does not
>>> guarantee how many bytes it will read.)
>>> 
>>> In particular, it could be less than strlen(header).
>> 
>> As mentioned to Torsten in $gmane/300156, I will add a newline
>> and then read until I find the second newline. That should solve
>> the problem, right?
>> 
>> (You wrote in $gmane/300119 that I should ignore your email but
>> I think you have a valid point here ;-)
> 
> Heh, as I said, it was late and I was trying to do several things
> at once. (I am updating 3 installations of Linux Mint 17.3 to Linux
> Mint 18 - I decided to do a complete re-install, since I needed to
> change partition sizes anyway. I have only just got email back up ...)
> 
> I stopped commenting on the patch early but, after sending the first
> email, I decided to scan the rest of your patch before going to bed
> and noticed something which would invalidate my comments ...
> 
>> 
>> 
>>>> [...]
>>>> +  sigchain_push(SIGPIPE, SIG_IGN);
>>>> +  switch (entry->protocol) {
>>>> +          case 1:
>>>> +                  if (fd >= 0 && !src) {
>>>> +                          ret &= fstat(fd, &fileStat) != -1;
>>>> +                          len = fileStat.st_size;
>>>> +                  }
>>>> +                  strbuf_reset(&nbuf);
>>>> +                  strbuf_addf(&nbuf, "%s\n%s\n%zu\n", filter_type, path, 
>>>> len);
>>>> +                  ret &= write_str_in_full(process->in, nbuf.buf) > 1;
>>> 
>>> why not write_in_full(process->in, nbuf.buf, nbuf.len) ?
>> OK, this would save a "strlen" call. Do you think such a function could be 
>> of general
>> use? If yes, then I would add:
>> 
>> static inline ssize_t write_strbuf_in_full(int fd, struct strbuf *str)
>> {
>>      return write_in_full(fd, str->buf, str->len);
>> }
> 
> [I don't have strong feelings either way (but I suspect it's not worth it).]

OK


>>>> +                  if (len > 0) {
>>>> +                          if (src)
>>>> +                                  ret &= write_in_full(process->in, src, 
>>>> len) == len;
>>>> +                          else if (fd >= 0)
>>>> +                                  ret &= copy_fd(fd, process->in) == 0;
>>>> +                          else
>>>> +                                  ret &= 0;
>>>> +                  }
>>>> +
>>>> +                  strbuf_reset(&nbuf);
>>>> +                  while (xread(process->out, &c, 1) == 1 && c != '\n')
>>>> +                          strbuf_addchars(&nbuf, c, 1);
>>>> +                  nbuf_len = (size_t)strtol(nbuf.buf, &strtol_end, 10);
>>>> +                  ret &= (strtol_end != nbuf.buf && errno != ERANGE);
>>>> +                  strbuf_reset(&nbuf);
>>>> +                  if (nbuf_len > 0)
>>>> +                          ret &= strbuf_read_once(&nbuf, process->out, 
>>>> nbuf_len) == nbuf_len;
>>> 
>>> Again, how many bytes will be read?
>>> Note, that in the default configuration, a _maximum_ of
>>> MAX_IO_SIZE (8MB or SSIZE_MAX, whichever is smaller) bytes
>>> will be read.
> 
> ... In particular, your 2GB test case should not have worked, so
> I assumed that I had missed a loop somewhere ...

Thanks a lot for this comment. The 2GB test case was bogus... v2
will have a much improved version :-)


>> Would something like this be more appropriate?
>> 
>> strbuf_reset(&nbuf);
>> if (nbuf_len > 0) {
>>    strbuf_grow(&nbuf, nbuf_len);
>>    ret &= read_in_full(process->out, nbuf.buf, nbuf_len) == nbuf_len;
>> }
> 
> ... and this looks better. [Note: this comment would apply equally to the
> version message.]

And it works better with large files, too :D


> [Hmm, now can I remember which packages I need to install ...]

:-)


Thanks,
Lars--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1 3/3] convert: add filter..useProtocol option

Reply via email to