Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-22 Thread Junio C Hamano
Christian Couder  writes:

> On Fri, Oct 21, 2016 at 2:18 AM, Junio C Hamano  wrote:
>>
>> If I were guiding a topic that introduce this feature from scratch
>> today, I would probably suggest a pattern based approach, e.g.  a
>> built-in "[-A-Za-z0-9]+:" [*1*] may be the default prefix that is
>> used to recognize the beginning of a trailer, and a user or a
>> project that wants "Bug #538" would be allowed to add an additional
>> pattern, e.g. "Bug *#", that recognises a custom trailer line that
>> is used by the project.
>
> When we designed the separator mechanism, we had the following discussions:
>
> https://public-inbox.org/git/xmqqa9a1d6xn@gitster.dls.corp.google.com/
> https://public-inbox.org/git/xmqqmwcuzyqx@gitster.dls.corp.google.com/
>
> They made me think that you were against too much flexibility, so I
> removed functionality that allowed to put separators into the ".key"
> config options, and now you are saying that we botched the thing and
> that you would like more flexibility of this kind back.

Correct.  Pay attention to the fact that I said _we_ botched.

If an initial design made by a topic author is crappy, that may be
author's botch.  Once a topic goes through a review cycle by getting
reviewed, rerolled, re-reviewed, ... to the point that those
involved accept the result, and we later realize that it was not
good, the botch no longer is author's alone.  If it is shipped as
part of a release, then it is not just the authors and the reviewers
but everybody.  We collectively stopped at a place that was not
ideal and share the blame ;-).

> Anyway I think it is still possible to add back such kind of
> functionality in a backward compatible way for example by adding
> ".extendedKey" config options.

Yup, or with trailer.keyPattern that is multi-values, or with any
number of alternatives.


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-22 Thread Christian Couder
On Fri, Oct 21, 2016 at 2:18 AM, Junio C Hamano  wrote:
> Jonathan Tan  writes:
>
>> That is true - I think we can take the allowed separators as an
>> argument (meaning that we can have different behavior for file parsing
>> and command line parsing), and since we already have that string, we
>> can use strcspn. I'll try this out in the next reroll.
>
> Sounds good.  Thanks.
>
>
> The following is a tangent that I think this topic should ignore,
> but we may want to revisit it sometime later.
>
> I think the design of the "separator" mechanism is one of the things
> we botched in the current system.  If I recall correctly, this was
> introduced to allow people write "Bug# 538" in the trailer section
> and get it recognised as a valid trailer.
>
> When I say that this was a botched design, I do not mean to say that
> we should have instead forced projects to adopt "Bug: 538" format.
> The design is botched because the users' wish to allow "Bug# 538" or
> "Bug #538" by setting separators to ":#" from the built-in ":" does
> not mean that they would want "Signed-off-by# me " to
> be accepted.
>
> If I were guiding a topic that introduce this feature from scratch
> today, I would probably suggest a pattern based approach, e.g.  a
> built-in "[-A-Za-z0-9]+:" [*1*] may be the default prefix that is
> used to recognize the beginning of a trailer, and a user or a
> project that wants "Bug #538" would be allowed to add an additional
> pattern, e.g. "Bug *#", that recognises a custom trailer line that
> is used by the project.

When we designed the separator mechanism, we had the following discussions:

https://public-inbox.org/git/xmqqa9a1d6xn@gitster.dls.corp.google.com/
https://public-inbox.org/git/xmqqmwcuzyqx@gitster.dls.corp.google.com/

They made me think that you were against too much flexibility, so I
removed functionality that allowed to put separators into the ".key"
config options, and now you are saying that we botched the thing and
that you would like more flexibility of this kind back.

Anyway I think it is still possible to add back such kind of
functionality in a backward compatible way for example by adding
".extendedKey" config options.


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-22 Thread Christian Couder
On Fri, Oct 21, 2016 at 12:45 AM, Junio C Hamano  wrote:
> Jonathan Tan  writes:
>
>> If we do that, there is also the necessity of creating a string that
>> combines the separators and '=' (I guess '\n' is not necessary now,
>> since all the lines are null terminated). I'm OK either way.
>>
>> (We could cache that string, although I would think that if we did
>> that, we might as well write the loop manually, like in this patch.)
>
> I wonder if there is a legit reason to look for '=' in the first
> place.  "Signed-off-by= Jonathan Tan " does not look
> like a valid trailer line to me.
>
> Isn't that a remnant of lazy coding in the original that tried to
> share a single parser for contents and command line options or
> something?

I think the relevant discussion was this one:

https://public-inbox.org/git/20140915.080429.1739849931027469667.chrisc...@tuxfamily.org/


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Junio C Hamano
Jonathan Tan  writes:

> That is true - I think we can take the allowed separators as an
> argument (meaning that we can have different behavior for file parsing
> and command line parsing), and since we already have that string, we
> can use strcspn. I'll try this out in the next reroll.

Sounds good.  Thanks.


The following is a tangent that I think this topic should ignore,
but we may want to revisit it sometime later.

I think the design of the "separator" mechanism is one of the things
we botched in the current system.  If I recall correctly, this was
introduced to allow people write "Bug# 538" in the trailer section
and get it recognised as a valid trailer.

When I say that this was a botched design, I do not mean to say that
we should have instead forced projects to adopt "Bug: 538" format.
The design is botched because the users' wish to allow "Bug# 538" or
"Bug #538" by setting separators to ":#" from the built-in ":" does
not mean that they would want "Signed-off-by# me " to
be accepted.

If I were guiding a topic that introduce this feature from scratch
today, I would probably suggest a pattern based approach, e.g.  a
built-in "[-A-Za-z0-9]+:" [*1*] may be the default prefix that is
used to recognize the beginning of a trailer, and a user or a
project that wants "Bug #538" would be allowed to add an additional
pattern, e.g. "Bug *#", that recognises a custom trailer line that
is used by the project.


[Footnote]

*1* Or more lenient "[A-Za-z0-9][- A-Za-z0-9]*:".


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan

On 10/20/2016 03:45 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


If we do that, there is also the necessity of creating a string that
combines the separators and '=' (I guess '\n' is not necessary now,
since all the lines are null terminated). I'm OK either way.

(We could cache that string, although I would think that if we did
that, we might as well write the loop manually, like in this patch.)


I wonder if there is a legit reason to look for '=' in the first
place.  "Signed-off-by= Jonathan Tan " does not look
like a valid trailer line to me.

Isn't that a remnant of lazy coding in the original that tried to
share a single parser for contents and command line options or
something?


That is true - I think we can take the allowed separators as an argument 
(meaning that we can have different behavior for file parsing and 
command line parsing), and since we already have that string, we can use 
strcspn. I'll try this out in the next reroll.


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Junio C Hamano
Jonathan Tan  writes:

> If we do that, there is also the necessity of creating a string that
> combines the separators and '=' (I guess '\n' is not necessary now,
> since all the lines are null terminated). I'm OK either way.
>
> (We could cache that string, although I would think that if we did
> that, we might as well write the loop manually, like in this patch.)

I wonder if there is a legit reason to look for '=' in the first
place.  "Signed-off-by= Jonathan Tan " does not look
like a valid trailer line to me.

Isn't that a remnant of lazy coding in the original that tried to
share a single parser for contents and command line options or
something?


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan

On 10/20/2016 03:40 PM, Jonathan Tan wrote:

On 10/20/2016 03:14 PM, Junio C Hamano wrote:

Stefan Beller  writes:


+static int find_separator(const char *line)
+{
+   const char *c;
+   for (c = line; ; c++) {
+   if (!*c || *c == '\n')
+   return -1;
+   if (*c == '=' || strchr(separators, *c))
+   return c - line;
+   }


I was about to suggest this function can be simplified and maybe
even inlined by the use of strspn or strcspn, but I think manual
processing of the string is fine, too, as it would not really be
shorter.


Hmm, I fear that iterating over a line one-byte-at-a-time and
running strchr(separators, *c) on it for each byte has a performance
implication over running a single call to strcspn(line, separators).


If we do that, there is also the necessity of creating a string that
combines the separators and '=' (I guess '\n' is not necessary now,
since all the lines are null terminated). I'm OK either way.

(We could cache that string, although I would think that if we did that,
we might as well write the loop manually, like in this patch.)


Actually I guess we could generate the separators_and_equal string 
whenever we obtain new separators from the config. I'll do this in the 
next reroll.


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan

On 10/20/2016 03:14 PM, Junio C Hamano wrote:

Stefan Beller  writes:


+static int find_separator(const char *line)
+{
+   const char *c;
+   for (c = line; ; c++) {
+   if (!*c || *c == '\n')
+   return -1;
+   if (*c == '=' || strchr(separators, *c))
+   return c - line;
+   }


I was about to suggest this function can be simplified and maybe
even inlined by the use of strspn or strcspn, but I think manual
processing of the string is fine, too, as it would not really be shorter.


Hmm, I fear that iterating over a line one-byte-at-a-time and
running strchr(separators, *c) on it for each byte has a performance
implication over running a single call to strcspn(line, separators).


If we do that, there is also the necessity of creating a string that 
combines the separators and '=' (I guess '\n' is not necessary now, 
since all the lines are null terminated). I'm OK either way.


(We could cache that string, although I would think that if we did that, 
we might as well write the loop manually, like in this patch.)


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Junio C Hamano
Stefan Beller  writes:

>> +static int find_separator(const char *line)
>> +{
>> +   const char *c;
>> +   for (c = line; ; c++) {
>> +   if (!*c || *c == '\n')
>> +   return -1;
>> +   if (*c == '=' || strchr(separators, *c))
>> +   return c - line;
>> +   }
>
> I was about to suggest this function can be simplified and maybe
> even inlined by the use of strspn or strcspn, but I think manual
> processing of the string is fine, too, as it would not really be shorter.

Hmm, I fear that iterating over a line one-byte-at-a-time and
running strchr(separators, *c) on it for each byte has a performance
implication over running a single call to strcspn(line, separators).


Re: [PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Stefan Beller
On Thu, Oct 20, 2016 at 2:39 PM, Jonathan Tan  wrote:
> The parse_trailer function has a few modes of operation, all depending
> on whether the separator is present in its input, and if yes, the
> separator's position. Some of these modes are failure modes, and these
> failure modes are handled differently depending on whether the trailer
> line was sourced from a file or from a command-line argument.
>
> Extract a function to find the separator, allowing the invokers of
> parse_trailer to determine how to handle the failure modes instead of
> making parse_trailer do it.
>
> Signed-off-by: Jonathan Tan 
> ---
>  trailer.c | 70 
> +++
>  1 file changed, 48 insertions(+), 22 deletions(-)
>
> diff --git a/trailer.c b/trailer.c
> index 99018f8..137a3fb 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -543,29 +543,40 @@ static int token_matches_item(const char *tok, struct 
> arg_item *item, int tok_le
> return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 
> 0;
>  }
>
> -static int parse_trailer(struct strbuf *tok, struct strbuf *val,
> -const struct conf_info **conf, const char *trailer)
> +/*
> + * Return the location of the first separator or '=' in line, or -1 if 
> either a
> + * newline or the null terminator is reached first.
> + */
> +static int find_separator(const char *line)
> +{
> +   const char *c;
> +   for (c = line; ; c++) {
> +   if (!*c || *c == '\n')
> +   return -1;
> +   if (*c == '=' || strchr(separators, *c))
> +   return c - line;
> +   }

I was about to suggest this function can be simplified and maybe
even inlined by the use of strspn or strcspn, but I think manual
processing of the string is fine, too, as it would not really be shorter.


[PATCH v4 5/8] trailer: clarify failure modes in parse_trailer

2016-10-20 Thread Jonathan Tan
The parse_trailer function has a few modes of operation, all depending
on whether the separator is present in its input, and if yes, the
separator's position. Some of these modes are failure modes, and these
failure modes are handled differently depending on whether the trailer
line was sourced from a file or from a command-line argument.

Extract a function to find the separator, allowing the invokers of
parse_trailer to determine how to handle the failure modes instead of
making parse_trailer do it.

Signed-off-by: Jonathan Tan 
---
 trailer.c | 70 +++
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/trailer.c b/trailer.c
index 99018f8..137a3fb 100644
--- a/trailer.c
+++ b/trailer.c
@@ -543,29 +543,40 @@ static int token_matches_item(const char *tok, struct 
arg_item *item, int tok_le
return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val,
-const struct conf_info **conf, const char *trailer)
+/*
+ * Return the location of the first separator or '=' in line, or -1 if either a
+ * newline or the null terminator is reached first.
+ */
+static int find_separator(const char *line)
+{
+   const char *c;
+   for (c = line; ; c++) {
+   if (!*c || *c == '\n')
+   return -1;
+   if (*c == '=' || strchr(separators, *c))
+   return c - line;
+   }
+}
+
+/*
+ * Obtain the token, value, and conf from the given trailer.
+ *
+ * separator_pos must not be 0, since the token cannot be an empty string.
+ *
+ * If separator_pos is -1, interpret the whole trailer as a token.
+ */
+static void parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer,
+int separator_pos)
 {
-   size_t len;
-   struct strbuf seps = STRBUF_INIT;
struct arg_item *item;
int tok_len;
struct list_head *pos;
 
-   strbuf_addstr(, separators);
-   strbuf_addch(, '=');
-   len = strcspn(trailer, seps.buf);
-   strbuf_release();
-   if (len == 0) {
-   int l = strlen(trailer);
-   while (l > 0 && isspace(trailer[l - 1]))
-   l--;
-   return error(_("empty trailer token in trailer '%.*s'"), l, 
trailer);
-   }
-   if (len < strlen(trailer)) {
-   strbuf_add(tok, trailer, len);
+   if (separator_pos != -1) {
+   strbuf_add(tok, trailer, separator_pos);
strbuf_trim(tok);
-   strbuf_addstr(val, trailer + len + 1);
+   strbuf_addstr(val, trailer + separator_pos + 1);
strbuf_trim(val);
} else {
strbuf_addstr(tok, trailer);
@@ -587,8 +598,6 @@ static int parse_trailer(struct strbuf *tok, struct strbuf 
*val,
break;
}
}
-
-   return 0;
 }
 
 static void add_trailer_item(struct list_head *head, char *tok, char *val)
@@ -631,11 +640,22 @@ static void process_command_line_args(struct list_head 
*arg_head,
 
/* Add an arg item for each trailer on the command line */
for_each_string_list_item(tr, trailers) {
-   if (!parse_trailer(, , , tr->string))
+   int separator_pos = find_separator(tr->string);
+   if (separator_pos == 0) {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(, tr->string);
+   strbuf_trim();
+   error(_("empty trailer token in trailer '%.*s'"),
+ (int) sb.len, sb.buf);
+   strbuf_release();
+   } else {
+   parse_trailer(, , , tr->string,
+ separator_pos);
add_arg_item(arg_head,
 strbuf_detach(, NULL),
 strbuf_detach(, NULL),
 conf);
+   }
}
 }
 
@@ -775,11 +795,17 @@ static int process_input_file(FILE *outfile,
 
/* Parse trailer lines */
for (i = trailer_start; i < trailer_end; i++) {
-   if (lines[i]->buf[0] != comment_line_char &&
-   !parse_trailer(, , NULL, lines[i]->buf))
+   int separator_pos;
+   if (lines[i]->buf[0] == comment_line_char)
+   continue;
+   separator_pos = find_separator(lines[i]->buf);
+   if (separator_pos >= 1) {
+   parse_trailer(, , NULL, lines[i]->buf,
+ separator_pos);
add_trailer_item(head,
 strbuf_detach(, NULL),