Re: [PATCH] pretty-print: de-tabify indented logs to make things line up properly

Junio C Hamano Sat, 19 Mar 2016 09:50:50 -0700

Linus Torvalds <[email protected]> writes:

> From: Linus Torvalds <[email protected]>
> Date: Wed, 16 Mar 2016 09:15:53 -0700
> Subject: [PATCH] pretty-print: de-tabify indented logs to make things line up 
> properly
>
> This should all line up:
>
>   Column 1    Column 2
>   --------    --------
>   A           B
>   ABCD                EFGH
>   SPACES        Instead of Tabs
>
> Even with multi-byte UTF8 characters:
>
>   Column 1    Column 2
>   --------    --------
>   Ä           B
>   åäö         100
>   A Møøse     once bit my sister..
>
> Signed-off-by: Linus Torvalds <[email protected]>
> ---
>
> This seems to work for me, and while there is some cost, it's minimal. 
> Doing a "git log > /dev/null" of the current git tree is about 1% slower 
> because of the tab-finding. A tree with a lot of tabs in the commit 
> messages would be more noticeable, because then you actually end up 
> hitting the whole "how wide is this" issue.
>
> (But if the tabs are all at the beginning of a line, you'd still be ok 
> and avoid the utf8 width calculations).
>
> Comments?


I stared at it for a while, and didn't spot anything wrong with it.

I did wonder about two things, though:

 (1) if turning your "preparation; do { ... } while()" into
     "while () { }" would make the result a bit easier to read;

 (2) if we can somehow eliminate duplication of "tab + 1" (spelled
     differently on the previous line as "1+tab"), the end result
     may get easier to follow.

but both are minor.
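
To illustrate both points together, here is a rough standalone sketch
of the loop shape I have in mind.  It is not a patch against pretty.c:
"struct outbuf", out_add(), out_addchars() and detab_line() are
hypothetical stand-ins so that it compiles outside git, the width is
simply the byte count (an ASCII-only assumption, where your patch
calls pp_utf8_width() and gives up on a negative result), and unlike
pp_handle_indent() it always emits the whole line instead of
returning 0/1 to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf: a small bounded buffer. */
struct outbuf {
	char buf[256];
	size_t len;
};

/* Append n bytes, silently truncating if the buffer is full. */
static void out_add(struct outbuf *o, const char *s, size_t n)
{
	size_t room = sizeof(o->buf) - 1 - o->len;

	if (n > room)
		n = room;
	memcpy(o->buf + o->len, s, n);
	o->len += n;
	o->buf[o->len] = '\0';
}

/* Append the character c, n times. */
static void out_addchars(struct outbuf *o, char c, size_t n)
{
	while (n--)
		out_add(o, &c, 1);
}

/*
 * Indent the line and expand its tabs to 8-column tab stops.
 * A plain while loop replaces "preparation; do { } while ()",
 * and "consumed" is computed once instead of spelling the same
 * quantity as both "1+tab-line" and "tab + 1".
 */
static void detab_line(struct outbuf *o, size_t indent,
		       const char *line, size_t linelen)
{
	const char *tab;

	out_addchars(o, ' ', indent);

	while ((tab = memchr(line, '\t', linelen)) != NULL) {
		size_t width = tab - line;	/* bytes == columns (ASCII only) */
		size_t consumed = width + 1;	/* the text plus the tab itself */

		/* Output the data and the de-tabified tab */
		out_add(o, line, width);
		out_addchars(o, ' ', 8 - (width & 7));

		/* Skip over the printed part */
		line += consumed;
		linelen -= consumed;
	}

	/* Nothing left to align after the last tab */
	out_add(o, line, linelen);
}
```

With a helper shaped like this, the caller would not need to reset
linelen to 0 after the call, but that changes the interface of your
patch, so take it only as an illustration.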

>  pretty.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index 92b2870a7eab..0b40457f99f0 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1629,6 +1629,76 @@ void pp_title_line(struct pretty_print_context *pp,
>       strbuf_release(&title);
>  }
>  
> +static int pp_utf8_width(const char *start, const char *end)
> +{
> +     int width = 0;
> +     size_t remain = end - start;
> +
> +     while (remain) {
> +             int n = utf8_width(&start, &remain);
> +             if (n < 0 || !start)
> +                     return -1;
> +             width += n;
> +     }
> +     return width;
> +}
> +
> +/*
> + * pp_handle_indent() prints out the indentation, and
> + * perhaps the whole line (without the final newline)
> + *
> + * Why "perhaps"? If there are tabs in the indented line
> + * it will print it out in order to de-tabify the line.
> + *
> + * But if there are no tabs, we just fall back on the
> + * normal "print the whole line".
> + */
> +static int pp_handle_indent(struct strbuf *sb, int indent,
> +                          const char *line, int linelen)
> +{
> +     const char *tab;
> +
> +     strbuf_addchars(sb, ' ', indent);
> +
> +     tab = memchr(line, '\t', linelen);
> +     if (!tab)
> +             return 0;
> +
> +     do {
> +             int width = pp_utf8_width(line, tab);
> +
> +             /*
> +              * If it wasn't well-formed utf8, or it
> +              * had characters with badly defined
> +              * width (control characters etc), just
> +              * give up on trying to align things.
> +              */
> +             if (width < 0)
> +                     break;
> +
> +             /* Output the data .. */
> +             strbuf_add(sb, line, tab - line);
> +
> +             /* .. and the de-tabified tab */
> +             strbuf_addchars(sb, ' ', 8-(width & 7));
> +
> +             /* Skip over the printed part .. */
> +             linelen -= 1+tab-line;
> +             line = tab + 1;
> +
> +             /* .. and look for the next tab */
> +             tab = memchr(line, '\t', linelen);
> +     } while (tab);
> +
> +     /*
> +      * Print out everything after the last tab without
> +      * worrying about width - there's nothing more to
> +      * align.
> +      */
> +     strbuf_add(sb, line, linelen);
> +     return 1;
> +}
> +
>  void pp_remainder(struct pretty_print_context *pp,
>                 const char **msg_p,
>                 struct strbuf *sb,
> @@ -1652,8 +1722,10 @@ void pp_remainder(struct pretty_print_context *pp,
>               first = 0;
>  
>               strbuf_grow(sb, linelen + indent + 20);
> -             if (indent)
> -                     strbuf_addchars(sb, ' ', indent);
> +             if (indent) {
> +                     if (pp_handle_indent(sb, indent, line, linelen))
> +                             linelen = 0;
> +             }
>               strbuf_add(sb, line, linelen);
>               strbuf_addch(sb, '\n');
>       }