Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-20 Thread Matthieu Moy
Marc Branchaud  writes:

> On 16-03-15 09:02 PM, Stefan Beller wrote:
>> On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller  wrote:
>>>
>>> Instead of converting to whitespaces in Git, we could make use of the
>>> set_tabs capability for ttys and setup the terminal for having tabs align
>>> to 12,+8,+8,+8...
>> 
>> Or rather read in the existing tabs configuration and shift it by a constant.
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

Hardly: the same pager would be used to display the commit message (with
a 4-space prefix) and the diff (with a 1-char +/- prefix). So the
offsets in these two places would need to be different, and I don't
think we can change that on the fly.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-19 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 7:27 AM, Marc Branchaud  wrote:
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

I don't think that's a good idea at least by default, since then it
will break things like running diff in emacs capture mode.

So even if you're in a terminal, you can't assume that you can munge
the output too much.

Of course, if colorization is on, you might as well pretty-print the
diff by indenting things properly too, since the end result isn't
going to be used as a _diff_.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-19 Thread Duy Nguyen
On Wed, Mar 16, 2016 at 9:27 PM, Marc Branchaud  wrote:
> On 16-03-15 09:02 PM, Stefan Beller wrote:
>> On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller  wrote:
>>>
>>> Instead of converting to whitespaces in Git, we could make use of the
>>> set_tabs capability for ttys and setup the terminal for having tabs align
>>> to 12,+8,+8,+8...
>>
>> Or rather read in the existing tabs configuration and shift it by a constant.
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

Yes, please! Even if we only activate this when stdout is tty. Much
better output.

> That opens a bit of a deeper problem, because not all the files in a single
> repo necessarily use the same tab size.

Maybe we can pass --tab-width=xx to git?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-19 Thread Marc Branchaud
On 16-03-15 09:02 PM, Stefan Beller wrote:
> On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller  wrote:
>>
>> Instead of converting to whitespaces in Git, we could make use of the
>> set_tabs capability for ttys and setup the terminal for having tabs align
>> to 12,+8,+8,+8...
> 
> Or rather read in the existing tabs configuration and shift it by a constant.

Could this also help with diff output, where the leading + or - mars the
indentation in a similar way?

That opens a bit of a deeper problem, because not all the files in a single
repo necessarily use the same tab size.

M.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-18 Thread Junio C Hamano
Linus Torvalds  writes:

> On Wed, Mar 16, 2016 at 7:27 AM, Marc Branchaud  wrote:
>>
>> Could this also help with diff output, where the leading + or - mars the
>> indentation in a similar way?
>
> I don't think that's a good idea at least by default, since then it
> will break things like running diff in emacs capture mode.
>
> So even if you're in a terminal, you can't assume that you can munge
> the output too much.
>
> Of course, if colorization is on, you might as well pretty-print the
> diff by indenting things properly too, since the end result isn't
> going to be used as a _diff_.

I agree that I will never use such an end result as a diff, but I
may still be tempted to cut-and-paste individual lines after '^+'
when resurrecting a WIP topic that does not rebase very well, so I
agree with you that the output shouldn't be munged by default even
though I think it is OK to have an option..


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Jeff King
On Tue, Mar 15, 2016 at 09:57:16PM -0700, Linus Torvalds wrote:

> On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamano  wrote:
> >
> > It also ignores that byte counts of non-HT bytes may not necessarily
> > match display columns.  There is utf8_width() function exported from
> > utf8.c for that purpose.
> 
> Hmm. I did that to now make it horribly slower. Doing the
> per-character widths really does horrible things for performance.
> 
> That said, I think I can make it do the fast thing for lines that have
> no tabs at all, which is the big bulk of it. So it would do the width
> calculations only in the rare "yes, I found a tab" case.
> 
> I already wrote it in a way that non-tab lines end up just looping
> over the line looking for a tab and then fall back to the old code.
> 
> I might just have to make that behavior a bit more explicit. Let me
> stare at it a bit, but it won't happen until tomorrow, I think.

I wondered about performance when reading yours, and measured a straight
"git log >/dev/null" on the kernel at about 3% slower (best-of-five and
average). We can get most of that back by sticking a

memchr(line, '\t', linelen);

in front and skipping the loop if it returns NULL. You can also
implement your loop in terms of memchr() calls to skip to the next tab,
but I don't know if it's worth it. It's really only worth spending time
optimizing the non-tab case (and even then, only worth it for trivial
things like "use the already optimized-as-hell memchr").

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamano  wrote:
>
> It also ignores that byte counts of non-HT bytes may not necessarily
> match display columns.  There is utf8_width() function exported from
> utf8.c for that purpose.

Hmm. I did that to now make it horribly slower. Doing the
per-character widths really does horrible things for performance.

That said, I think I can make it do the fast thing for lines that have
no tabs at all, which is the big bulk of it. So it would do the width
calculations only in the rare "yes, I found a tab" case.

I already wrote it in a way that non-tab lines end up just looping
over the line looking for a tab and then fall back to the old code.

I might just have to make that behavior a bit more explicit. Let me
stare at it a bit, but it won't happen until tomorrow, I think.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Junio C Hamano
Linus Torvalds  writes:

> Here's a first try at it. It does tab expansion only for the cases
> that indent the commit message, so for things like "pretty=email" it
> makes no difference at all.

It also ignores that byte counts of non-HT bytes may not necessarily
match display columns.  There is utf8_width() function exported from
utf8.c for that purpose.

I think it is fine to make it default for the pretty=medium, but it
would be nicer to introduce a new command line option --no-untabify
to optionally disable the expansion.

>  pretty.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/pretty.c b/pretty.c
> index 92b2870a7eab..dcd6105d1eb2 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1652,8 +1652,26 @@ void pp_remainder(struct pretty_print_context *pp,
>   first = 0;
>  
>   strbuf_grow(sb, linelen + indent + 20);
> - if (indent)
> + if (indent) {
> + int i, last = 0, pos = 0;
> +
>   strbuf_addchars(sb, ' ', indent);
> + for (i = 0; i < linelen; i++) {
> + int expand;
> + if (line[i] != '\t')
> + continue;
> + strbuf_add(sb, line+last, i-last);
> + pos += i-last;
> + expand = (pos + 8) & ~7;
> + strbuf_addchars(sb, ' ', expand - pos);
> + pos = expand;
> + last = i+1;
> + }
> +
> + // Handle the tail non-tab content
> + line += last;
> + linelen -= last;
> + }
>   strbuf_add(sb, line, linelen);
>   strbuf_addch(sb, '\n');
>   }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 5:48 PM, Linus Torvalds
 wrote:
> On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano  wrote:
>>
>> Wouldn't it be nicer to do this kind of thing at the output side?  A
>> stupid way would be to have an option to indent the log text with a
>> tab instead of 4-spaces, but a more sensible way would be to keep
>> the visual 4-space-indent and do the expand-tab for each line of
>> text.
>
> Yes, that would certainly work for me too. It's just the "ascii boxes
> don't line up" that is problematic..

Ok, that actually turns out to be pretty easy.

Here's a first try at it. It does tab expansion only for the cases
that indent the commit message, so for things like "pretty=email" it
makes no difference at all.

However, it hardcodes the hardtab size to 8, and makes this all
unconditional. That's how a vt100 terminal works, but fancer terminals
may allow you set actual tab-stops, and if you're in emacs you can
probably do just about anything)

Comments? This does make the kernel commit 0dc8c730c98a look fine, so
it would make me happy.

I can write a commit log etc if people think this is ok, but right now
this is just a silly raw patch in the attachement..

   Linus
 pretty.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/pretty.c b/pretty.c
index 92b2870a7eab..dcd6105d1eb2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1652,8 +1652,26 @@ void pp_remainder(struct pretty_print_context *pp,
first = 0;
 
strbuf_grow(sb, linelen + indent + 20);
-   if (indent)
+   if (indent) {
+   int i, last = 0, pos = 0;
+
strbuf_addchars(sb, ' ', indent);
+   for (i = 0; i < linelen; i++) {
+   int expand;
+   if (line[i] != '\t')
+   continue;
+   strbuf_add(sb, line+last, i-last);
+   pos += i-last;
+   expand = (pos + 8) & ~7;
+   strbuf_addchars(sb, ' ', expand - pos);
+   pos = expand;
+   last = i+1;
+   }
+
+   // Handle the tail non-tab content
+   line += last;
+   linelen -= last;
+   }
strbuf_add(sb, line, linelen);
strbuf_addch(sb, '\n');
}


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Stefan Beller
On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller  wrote:
>
> Instead of converting to whitespaces in Git, we could make use of the
> set_tabs capability for ttys and setup the terminal for having tabs align
> to 12,+8,+8,+8...

Or rather read in the existing tabs configuration and shift it by a constant.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Stefan Beller
On Tue, Mar 15, 2016 at 5:48 PM, Linus Torvalds
 wrote:
> On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano  wrote:
>>
>> Wouldn't it be nicer to do this kind of thing at the output side?  A
>> stupid way would be to have an option to indent the log text with a
>> tab instead of 4-spaces, but a more sensible way would be to keep
>> the visual 4-space-indent and do the expand-tab for each line of
>> text.
>
> Yes, that would certainly work for me too. It's just the "ascii boxes
> don't line up" that is problematic..

I would also rather side to correct the display side rather than the
applying side. [I still want to send and apply patches with git written in
the white space language ;]

Instead of converting to whitespaces in Git, we could make use of the
set_tabs capability for ttys and setup the terminal for having tabs align
to 12,+8,+8,+8... such that it looks like an indented terminal.
That way we would also preserve the tabs in case you'd want to
copy the message as is.

Thanks,
Stefan

>
> (Also people including example source code etc, where the first tab
> looks wildly different from the second one)
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano  wrote:
>
> Wouldn't it be nicer to do this kind of thing at the output side?  A
> stupid way would be to have an option to indent the log text with a
> tab instead of 4-spaces, but a more sensible way would be to keep
> the visual 4-space-indent and do the expand-tab for each line of
> text.

Yes, that would certainly work for me too. It's just the "ascii boxes
don't line up" that is problematic..

(Also people including example source code etc, where the first tab
looks wildly different from the second one)

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Junio C Hamano
Linus Torvalds  writes:

> Do people hate that idea? I may not get around to it for a while (it's
> the kernel merge window right now), but I can write the patch
> eventually - I just wanted to do an RFC first.

Wouldn't it be nicer to do this kind of thing at the output side?  A
stupid way would be to have an option to indent the log text with a
tab instead of 4-spaces, but a more sensible way would be to keep
the visual 4-space-indent and do the expand-tab for each line of
text.

That way, your viewing of existing commits that use 8-space HT to
align (and worse yet, mixture of 8-space HT and 8 spaces and assume
the end result would align in the output) would become more pleasant
without you having to run filter-branch ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Linus Torvalds
On Tue, Mar 15, 2016 at 5:23 PM, Stefan Beller  wrote:
>
> Could you point at some example to better understand the problem?

So in the kernel repo, I just randomly looked for tabs that show this
problem, and take for example commit
ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd.

You can see how the original lined up, by doing

git show --pretty=email ff9a9b4c4334

because the email format doesn't indent the commit message. But if you do just

git show ff9a9b4c4334

and get the usual indentation, you'll see things not line up at all.

In case you don't want to bother with the kernel repo, here's what it
looks like:

email format:

--- snip snip 8< ---
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.43.8 seconds
 4.5-rc13.7 seconds
 4.5-rc1 + first patch  3.3 seconds
 4.5-rc1 + first 3 patches  3.1 seconds
 4.5-rc1 + all patches  2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels1.86 seconds
--- snip snip 8< ---

Normal "git show" format:

--- snip snip 8< ---
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.43.8 seconds
 4.5-rc13.7 seconds
 4.5-rc1 + first patch  3.3 seconds
 4.5-rc1 + first 3 patches  3.1 seconds
 4.5-rc1 + all patches  2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels1.86 seconds
--- snip snip 8< ---

which hopefully clarifies.

In the above case, it really isn't very annoying. It's just slightly
ugly. In some other cases, it can get quite hard to see what's up, but
the ones that come through me I actually tend to try to edit, so many
of them have been corrected.

For other examples (again, in the kernel), look at 19b2c30d3cce, or
0dc8c730c98a.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Randall S. Becker
On March 15, 2016 8:17 PM Linus Torvalds wrote:
> So I end up doing this manually when I notice, but I was wondering ig maybe
> git could just have an option to "git am" and friends to de-tabify the commit
> message.
> 
> It's particularly noticeable when people line things up using tabs (for the
> kernel, it's often things like "cpu1 does X, cpu2 does Y"), and then when you
> do "git log" it looks like a unholy mess, because the 4-char indentation of 
> the
> log message ends up causing those things to not line up at all after all.
> 
> The natural thing to do would be to pass in a "tab size" parameter to
> strbuf_stripspace(), and default it to 0 (for no change), but have some way to
> let people say "expand tabs to spaces at 8-character tab-stops" or similar
> (but let people use different tab-stops if they want).
> 
> Do people hate that idea? I may not get around to it for a while (it's the
> kernel merge window right now), but I can write the patch eventually - I just
> wanted to do an RFC first.

Speaking partly as a consumer of the comments and partly as someone who 
generates the commits through APIs, I would ask that the commit tab handling 
semantic be more formalized than just tab size to strbuf_stripspace(). While it 
might seem a bit unfair to have to worry about non-git git clients, the 
detabbing can impact the other commit implementers (e.g., SourceTree, EGit, 
JGit, and the raft of process automation bits out there using JGit for cool 
stuff). Personally, I would prefer to have a normalized behaviour so that any 
bit of automation building a commit message would have a specific definition to 
go to (and hopefully comply with) in order to properly format the message for 
posterity and across all consumers. It might also be useful to have some 
ability to be presentation-compatible with legacy commits (done after this type 
of enhancement) so that a reasonable presentation can be done for those 8 year 
old commits that still have embedded tabs. Personally, I don't encourage tabs 
in commits myself and do see the value of this, but is this really restricted 
just to git am?

Just my $0.02,

Randall

-- Brief whoami: NonStop developer since approximately 
UNIX(421664400)/NonStop(2112884442)
-- In my real life, I talk too much.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

2016-03-15 Thread Stefan Beller
On Tue, Mar 15, 2016 at 5:16 PM, Linus Torvalds
 wrote:
> Do people hate that idea? I may not get around to it for a while (it's
> the kernel merge window right now), but I can write the patch
> eventually - I just wanted to do an RFC first.

Could you point at some example to better understand the problem?

Thanks,
Stefan

>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html