Re: [PATH] [CLEANUP] Remove trailing whitespace characters
On 9/11/23 16:43, Eric Gallager via Gcc-patches wrote: On Mon, Sep 11, 2023 at 9:43 AM Jakub Jelinek via Gcc-patches wrote: On Mon, Sep 11, 2023 at 09:27:48AM -0400, David Malcolm via Gcc-patches wrote: On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote: When going through the code, I saw a lot of trailing whitespace characters so I decided to write a small script that would remove them. I didn't expect there would be so many though... Not sure if patch with so many changes are accepted like this or if I should send more focused one. I'm not sure either. Some notes on the patch: IMHO testsuite shouldn't be touched at all, there are certainly tests which test whether such sources are handled correctly. Non-C/C++ sources shouldn't be changed this way either. The ^L stuff should be preserved, not removed. And even with that, I'm not sure it is a good idea to change it because it will be a nightmare for git blame. Some git hosting services have added support for special files to ignore revisions like this in git blame, for example, on GitHub, it's called .git-blame-ignore-revs: https://github.blog/changelog/2022-03-24-ignore-commits-in-the-blame-view-beta/ See for example: https://github.com/cooljeanius/highlight.js/blob/main/.git-blame-ignore-revs This is also something you can setup locally, without changing the remote: `git config blame.ignoreRevsFile ` so this could be a good step to add to the `gcc-git-customization.sh` script if such a file was created. The usual way of fixing up formatting if it was committed in a broken way is only when one is touching with real code changes something, fixing up formatting on it or around it is fine. If we decide to fix formatting in bulk, I think we should have a flag day and change also other formatting mistakes at the same time (say 8 spaces instead of tabs for start of line indentation (before first non-blank character), = at the end of line except for static var initializers, etc. But to make that worthwhile, it would be better to then have a pre-commit hook that would enforce formatting. And, we haven't managed to come up with something like that yet. Jakub Kindly, Arthur OpenPGP_0x1B3465B044AD9C65.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATH] [CLEANUP] Remove trailing whitespace characters
On Mon, Sep 11, 2023 at 9:43 AM Jakub Jelinek via Gcc-patches wrote: > > On Mon, Sep 11, 2023 at 09:27:48AM -0400, David Malcolm via Gcc-patches wrote: > > On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote: > > > When going through the code, I saw a lot of trailing whitespace > > > characters so I decided to write a small script that would remove > > > them. I didn't expect there would be so many though... Not sure if > > > patch with so many changes are accepted like this or if I should send > > > more focused one. > > > > I'm not sure either. > > > > Some notes on the patch: > > IMHO testsuite shouldn't be touched at all, there are certainly tests > which test whether such sources are handled correctly. > > Non-C/C++ sources shouldn't be changed this way either. > > The ^L stuff should be preserved, not removed. > > And even with that, I'm not sure it is a good idea to change it because > it will be a nightmare for git blame. > Some git hosting services have added support for special files to ignore revisions like this in git blame, for example, on GitHub, it's called .git-blame-ignore-revs: https://github.blog/changelog/2022-03-24-ignore-commits-in-the-blame-view-beta/ See for example: https://github.com/cooljeanius/highlight.js/blob/main/.git-blame-ignore-revs > The usual way of fixing up formatting if it was committed in a broken way > is only when one is touching with real code changes something, fixing up > formatting on it or around it is fine. > > If we decide to fix formatting in bulk, I think we should have a flag day > and change also other formatting mistakes at the same time (say 8 spaces > instead of tabs for start of line indentation (before first non-blank > character), = at the end of line except for static var initializers, etc. > But to make that worthwhile, it would be better to then have a pre-commit > hook that would enforce formatting. And, we haven't managed to come up with > something like that yet. > > Jakub >
Re: [PATH] [CLEANUP] Remove trailing whitespace characters
On Mon, Sep 11, 2023 at 09:27:48AM -0400, David Malcolm via Gcc-patches wrote: > On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote: > > When going through the code, I saw a lot of trailing whitespace > > characters so I decided to write a small script that would remove > > them. I didn't expect there would be so many though... Not sure if > > patch with so many changes are accepted like this or if I should send > > more focused one. > > I'm not sure either. > > Some notes on the patch: IMHO testsuite shouldn't be touched at all, there are certainly tests which test whether such sources are handled correctly. Non-C/C++ sources shouldn't be changed this way either. The ^L stuff should be preserved, not removed. And even with that, I'm not sure it is a good idea to change it because it will be a nightmare for git blame. The usual way of fixing up formatting if it was committed in a broken way is only when one is touching with real code changes something, fixing up formatting on it or around it is fine. If we decide to fix formatting in bulk, I think we should have a flag day and change also other formatting mistakes at the same time (say 8 spaces instead of tabs for start of line indentation (before first non-blank character), = at the end of line except for static var initializers, etc. But to make that worthwhile, it would be better to then have a pre-commit hook that would enforce formatting. And, we haven't managed to come up with something like that yet. Jakub
Re: [PATH] [CLEANUP] Remove trailing whitespace characters
Hi David. Thanks for the feedback! Well, I think at this point, better to get "global approval" on the idea and on what you suggested (which I agree with) before I make any additional changes considering how big the patch is. Let's wait for others to also express their opinion here.
Re: [PATH] [CLEANUP] Remove trailing whitespace characters
On Sun, 2023-09-10 at 16:36 +0200, Guillaume Gomez wrote: > When going through the code, I saw a lot of trailing whitespace > characters so I decided to write a small script that would remove > them. I didn't expect there would be so many though... Not sure if > patch with so many changes are accepted like this or if I should send > more focused one. I'm not sure either. Some notes on the patch: - the ChangeLog sensibly makes use of "Likewise", but for the initial file in each ChangeLog it incorrectly also says "Likewise". When these are copied into the individual ChangeLog files by the "Daily bump" cronjob, the Subject line from the commit won't be visible [1], so the thing that "Likewise" refers to won't be present. So that initial file in each category should read "Remove trailing whitespace characters". - the patch touches the testsuite. Note that not all source files in the testsuite are UTF-8 encoded, and we want the testsuite to contain a variety of source formatting idioms (and examples of badly formatted source code). - some of our source files use U+000C, the form feed character, and the patch eliminates these. I think this is an old convention used to indicate a major change of topic within the source file. Perhaps it leads to a page break when printing the source file? Personally I dislike this convention, and feel a suitable big comment line would be clearer such as: /* Name of new topic. General comments about the new topic, where useful. */ for such "high-level" source file organizational bounaries, but perhaps people like and use the form feed characters? Hope this is constructive Dave [1] see e.g. a134b6ce8e5c589f8c1b58cdf124cd4a916b0e8c > > Anyway, for posterity, here is the python script I used: > > ``` > from os import listdir > from os.path import isfile, join > > > def clean_file(p): > if not p.endswith(".cc") and not p.endswith(".h"): > return > with open(p, 'r', encoding='utf8') as f: > content = f.read().split('\n') > updated = 0 > i = 0 > while i < len(content): > s = content[i].rstrip() > if s != content[i]: > updated += 1 > content[i] = s > i += 1 > if updated == 0: > return > with open(p, 'w', encoding='utf8') as f: > f.write('\n'.join(content)) > > > def recur_read(p): > for f in listdir(p): > full_path = join(p, f) > if isfile(full_path): > clean_file(full_path) > else: > recur_read(full_path) > > recur_read(".") > ``` > > Cordially.