Re: [PATCH v1 3/3] convert: add filter..useProtocol option

Lars Schneider Sun, 24 Jul 2016 10:17:37 -0700

On 23 Jul 2016, at 01:19, Ramsay Jones <ram...@ramsayjones.plus.com> wrote:


> On 22/07/16 16:49, larsxschnei...@gmail.com wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Git's clean/smudge mechanism invokes an external filter process for every
>> single blob that is affected by a filter. If Git filters a lot of blobs
>> then the startup time of the external filter processes can become a
>> significant part of the overall Git execution time.
>> 
>> This patch adds the filter.<driver>.useProtocol option which, if enabled,
>> keeps the external filter process running and processes all blobs with
>> the following protocol over stdin/stdout.
>> 
>> 1. Git starts the filter on first usage and expects a welcome message
>> with protocol version number:
>>      Git <-- Filter: "git-filter-protocol\n"
>>      Git <-- Filter: "version 1"
> 
> Hmm, I was a bit surprised to see a 'filter' talk first (but so long as the
> interaction is fully defined, I guess it doesn't matter).
> 
> [If you wanted to check for a version, you could add a "version" command
> instead, just like "clean" and "smudge".]

It was a conscious decision to have the `filter` talk first. My reasoning was:

(1) I want a reliable way to distinguish the existing filter protocol 
("single-shot 
invocation") from the new one ("long running"). I don't think there would be a
situation where the existing protocol would talk first. Therefore the users 
would
not accidentally mix them with a possibly half working, undetermined, outcome.

(2) In the future we could extend the pipe protocol (see $gmane/297994, it's 
very
interesting). A filter could check Git's version and then pick the most 
appropriate
filter protocol on startup.


> [...]
>> +static struct cmd2process *start_protocol_filter(const char *cmd)
>> +{
>> +    int ret = 1;
>> +    struct cmd2process *entry = NULL;
>> +    struct child_process *process = NULL;
>> +    struct strbuf nbuf = STRBUF_INIT;
>> +    struct string_list split = STRING_LIST_INIT_NODUP;
>> +    const char *argv[] = { NULL, NULL };
>> +    const char *header = "git-filter-protocol\nversion";
>> +
>> +    entry = xmalloc(sizeof(*entry));
>> +    hashmap_entry_init(entry, strhash(cmd));
>> +    entry->cmd = cmd;
>> +    process = &entry->process;
>> +
>> +    child_process_init(process);
>> +    argv[0] = cmd;
>> +    process->argv = argv;
>> +    process->use_shell = 1;
>> +    process->in = -1;
>> +    process->out = -1;
>> +
>> +    if (start_command(process)) {
>> +            error("cannot fork to run external persistent filter '%s'", 
>> cmd);
>> +            return NULL;
>> +    }
>> +    strbuf_reset(&nbuf);
>> +
>> +    sigchain_push(SIGPIPE, SIG_IGN);
>> +    ret &= strbuf_read_once(&nbuf, process->out, 0) > 0;
> 
> Hmm, how much will be read into nbuf by this single call?
> Since strbuf_read_once() makes a single call to xread(), with
> a len argument that will probably be 8192, you can not really
> tell how much it will read, in general. (xread() does not
> guarantee how many bytes it will read.)
> 
> In particular, it could be less than strlen(header).

As mentioned to Torsten in $gmane/300156, I will add a newline
and then read until I find the second newline. That should solve
the problem, right?

(You wrote in $gmane/300119 that I should ignore your email but
I think you have a valid point here ;-)


>> [...]
>> +    sigchain_push(SIGPIPE, SIG_IGN);
>> +    switch (entry->protocol) {
>> +            case 1:
>> +                    if (fd >= 0 && !src) {
>> +                            ret &= fstat(fd, &fileStat) != -1;
>> +                            len = fileStat.st_size;
>> +                    }
>> +                    strbuf_reset(&nbuf);
>> +                    strbuf_addf(&nbuf, "%s\n%s\n%zu\n", filter_type, path, 
>> len);
>> +                    ret &= write_str_in_full(process->in, nbuf.buf) > 1;
> 
> why not write_in_full(process->in, nbuf.buf, nbuf.len) ?
OK, this would save a "strlen" call. Do you think such a function could be of 
general
use? If yes, then I would add:

static inline ssize_t write_strbuf_in_full(int fd, struct strbuf *str)
{
        return write_in_full(fd, str->buf, str->len);
}


>> +                    if (len > 0) {
>> +                            if (src)
>> +                                    ret &= write_in_full(process->in, src, 
>> len) == len;
>> +                            else if (fd >= 0)
>> +                                    ret &= copy_fd(fd, process->in) == 0;
>> +                            else
>> +                                    ret &= 0;
>> +                    }
>> +
>> +                    strbuf_reset(&nbuf);
>> +                    while (xread(process->out, &c, 1) == 1 && c != '\n')
>> +                            strbuf_addchars(&nbuf, c, 1);
>> +                    nbuf_len = (size_t)strtol(nbuf.buf, &strtol_end, 10);
>> +                    ret &= (strtol_end != nbuf.buf && errno != ERANGE);
>> +                    strbuf_reset(&nbuf);
>> +                    if (nbuf_len > 0)
>> +                            ret &= strbuf_read_once(&nbuf, process->out, 
>> nbuf_len) == nbuf_len;
> 
> Again, how many bytes will be read?
> Note, that in the default configuration, a _maximum_ of
> MAX_IO_SIZE (8MB or SSIZE_MAX, whichever is smaller) bytes
> will be read.
Would something like this be more appropriate?

strbuf_reset(&nbuf);
if (nbuf_len > 0) {
    strbuf_grow(&nbuf, nbuf_len);
    ret &= read_in_full(process->out, nbuf.buf, nbuf_len) == nbuf_len;
}


Thanks for the review,
Lars

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1 3/3] convert: add filter..useProtocol option

Reply via email to