Re: [PATH] [CLEANUP] Remove trailing whitespace characters

2023-09-11 Thread Arthur Cohen

On 9/11/23 16:43, Eric Gallager via Gcc-patches wrote:

On Mon, Sep 11, 2023 at 9:43 AM Jakub Jelinek via Gcc-patches
 wrote:


On Mon, Sep 11, 2023 at 09:27:48AM -0400, David Malcolm via Gcc-patches wrote:

On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote:

When going through the code, I saw a lot of trailing whitespace
characters so I decided to write a small script that would remove
them. I didn't expect there would be so many though... Not sure if
patch with so many changes are accepted like this or if I should send
more focused one.


I'm not sure either.

Some notes on the patch:


IMHO testsuite shouldn't be touched at all, there are certainly tests
which test whether such sources are handled correctly.

Non-C/C++ sources shouldn't be changed this way either.

The ^L stuff should be preserved, not removed.

And even with that, I'm not sure it is a good idea to change it because
it will be a nightmare for git blame.



Some git hosting services have added support for special files to
ignore revisions like this in git blame, for example, on GitHub, it's
called .git-blame-ignore-revs:
https://github.blog/changelog/2022-03-24-ignore-commits-in-the-blame-view-beta/
See for example:
https://github.com/cooljeanius/highlight.js/blob/main/.git-blame-ignore-revs


This is also something you can setup locally, without changing the remote:

`git config blame.ignoreRevsFile `

so this could be a good step to add to the `gcc-git-customization.sh` 
script if such a file was created.



The usual way of fixing up formatting if it was committed in a broken way
is only when one is touching with real code changes something, fixing up
formatting on it or around it is fine.

If we decide to fix formatting in bulk, I think we should have a flag day
and change also other formatting mistakes at the same time (say 8 spaces
instead of tabs for start of line indentation (before first non-blank
character), = at the end of line except for static var initializers, etc.
But to make that worthwhile, it would be better to then have a pre-commit
hook that would enforce formatting.  And, we haven't managed to come up with
something like that yet.

 Jakub


Kindly,

Arthur


OpenPGP_0x1B3465B044AD9C65.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATH] [CLEANUP] Remove trailing whitespace characters

2023-09-11 Thread Eric Gallager via Gcc-patches
On Mon, Sep 11, 2023 at 9:43 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Sep 11, 2023 at 09:27:48AM -0400, David Malcolm via Gcc-patches wrote:
> > On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote:
> > > When going through the code, I saw a lot of trailing whitespace
> > > characters so I decided to write a small script that would remove
> > > them. I didn't expect there would be so many though... Not sure if
> > > patch with so many changes are accepted like this or if I should send
> > > more focused one.
> >
> > I'm not sure either.
> >
> > Some notes on the patch:
>
> IMHO testsuite shouldn't be touched at all, there are certainly tests
> which test whether such sources are handled correctly.
>
> Non-C/C++ sources shouldn't be changed this way either.
>
> The ^L stuff should be preserved, not removed.
>
> And even with that, I'm not sure it is a good idea to change it because
> it will be a nightmare for git blame.
>

Some git hosting services have added support for special files to
ignore revisions like this in git blame, for example, on GitHub, it's
called .git-blame-ignore-revs:
https://github.blog/changelog/2022-03-24-ignore-commits-in-the-blame-view-beta/
See for example:
https://github.com/cooljeanius/highlight.js/blob/main/.git-blame-ignore-revs

> The usual way of fixing up formatting if it was committed in a broken way
> is only when one is touching with real code changes something, fixing up
> formatting on it or around it is fine.
>
> If we decide to fix formatting in bulk, I think we should have a flag day
> and change also other formatting mistakes at the same time (say 8 spaces
> instead of tabs for start of line indentation (before first non-blank
> character), = at the end of line except for static var initializers, etc.
> But to make that worthwhile, it would be better to then have a pre-commit
> hook that would enforce formatting.  And, we haven't managed to come up with
> something like that yet.
>
> Jakub
>


Re: [PATH] [CLEANUP] Remove trailing whitespace characters

2023-09-11 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 11, 2023 at 09:27:48AM -0400, David Malcolm via Gcc-patches wrote:
> On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote:
> > When going through the code, I saw a lot of trailing whitespace
> > characters so I decided to write a small script that would remove
> > them. I didn't expect there would be so many though... Not sure if
> > patch with so many changes are accepted like this or if I should send
> > more focused one.
> 
> I'm not sure either.
> 
> Some notes on the patch:

IMHO testsuite shouldn't be touched at all, there are certainly tests
which test whether such sources are handled correctly.

Non-C/C++ sources shouldn't be changed this way either.

The ^L stuff should be preserved, not removed.

And even with that, I'm not sure it is a good idea to change it because
it will be a nightmare for git blame.

The usual way of fixing up formatting if it was committed in a broken way
is only when one is touching with real code changes something, fixing up
formatting on it or around it is fine.

If we decide to fix formatting in bulk, I think we should have a flag day
and change also other formatting mistakes at the same time (say 8 spaces
instead of tabs for start of line indentation (before first non-blank
character), = at the end of line except for static var initializers, etc.
But to make that worthwhile, it would be better to then have a pre-commit
hook that would enforce formatting.  And, we haven't managed to come up with
something like that yet.

Jakub



Re: [PATH] [CLEANUP] Remove trailing whitespace characters

2023-09-11 Thread Guillaume Gomez via Gcc-patches
Hi David.

Thanks for the feedback! Well, I think at this point, better to get
"global approval" on the idea and on what you suggested (which I agree
with) before I make any additional changes considering how big the
patch is.
Let's wait for others to also express their opinion here.


Re: [PATH] [CLEANUP] Remove trailing whitespace characters

2023-09-11 Thread David Malcolm via Gcc-patches
On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote:
> When going through the code, I saw a lot of trailing whitespace
> characters so I decided to write a small script that would remove
> them. I didn't expect there would be so many though... Not sure if
> patch with so many changes are accepted like this or if I should send
> more focused one.

I'm not sure either.

Some notes on the patch:

- the ChangeLog sensibly makes use of "Likewise", but for the initial
file in each ChangeLog it incorrectly also says "Likewise".  When these
are copied into the individual ChangeLog files by the "Daily bump"
cronjob, the Subject line from the commit won't be visible [1], so the
thing that "Likewise" refers to won't be present.  So that initial file
in each category should read "Remove trailing whitespace characters".

- the patch touches the testsuite.  Note that not all source files in
the testsuite are UTF-8 encoded, and we want the testsuite to contain a
variety of source formatting idioms (and examples of badly formatted
source code).

- some of our source files use U+000C, the form feed character, and the
patch eliminates these.  I think this is an old convention used to
indicate a major change of topic within the source file.  Perhaps it
leads to a page break when printing the source file?  Personally I
dislike this convention, and feel a suitable big comment line would be
clearer such as:

/* Name of new topic.

   General comments about the new topic, where useful.  */

for such "high-level" source file organizational bounaries, but perhaps
people like and use the form feed characters?

Hope this is constructive
Dave

[1] see e.g. a134b6ce8e5c589f8c1b58cdf124cd4a916b0e8c


> 
> Anyway, for posterity, here is the python script I used:
> 
> ```
> from os import listdir
> from os.path import isfile, join
> 
> 
> def clean_file(p):
>     if not p.endswith(".cc") and not p.endswith(".h"):
>     return
>     with open(p, 'r', encoding='utf8') as f:
>     content = f.read().split('\n')
>     updated = 0
>     i = 0
>     while i < len(content):
>     s = content[i].rstrip()
>     if s != content[i]:
>     updated += 1
>     content[i] = s
>     i += 1
>     if updated == 0:
>     return
>     with open(p, 'w', encoding='utf8') as f:
>     f.write('\n'.join(content))
> 
> 
> def recur_read(p):
>     for f in listdir(p):
>     full_path = join(p, f)
>     if isfile(full_path):
>     clean_file(full_path)
>     else:
>     recur_read(full_path)
> 
> recur_read(".")
> ```
> 
> Cordially.