On 2018-05-02 00:13, Andrew Morton wrote:
> On Thu, 26 Apr 2018 22:24:44 +0300 Alexey Dobriyan <[email protected]> 
> wrote:
> 
>>> The LOC argument also does not sound very convincing.
>>
>> When was the last time you did -80 kLOC patch for free?
> 
> That would be the way to do it - sell the idea to Linus, send him a
> script to do it then stand back.  The piecemeal approach is ongoing
> pain.
> 

FWIW, it's not just removing some identifiers from cpp's hash tables, it
also reduces I/O: Due to our header mess, we have some cyclic includes,
e.g. mm.h -> memremap.h -> mm.h. While parsing mm.h, cpp sees the #define
_LINUX_MM_H, then goes parsing memremap.h, but since it hasn't reached
the end of mm.h yet (seeing that there's nothing but comments outside
the #ifndef/#endif pair), it hasn't had a chance to set the internal
flag for mm.h, so it goes slurping in mm.h again. Obviously, the
definedness of _LINUX_MM_H at that point means it "only" has to parse
those 87K for comments and matching up #ifs, #ifdefs, #endifs etc. With
#pragma once, the flag gets set for mm.h immediately, so the #include
from memremap.h is entirely ignored. This can easily be verified with
strace. And mm.h is not the only header getting read twice.
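
For anyone who wants to repeat the strace check on something smaller than mm.h, here is a minimal sketch. The header names (a.h, b.h), the guard macros and the scratch directory are made up for illustration, and it assumes strace and a cc driver are installed. If the description above holds for your compiler, the second count should come out lower than the first, but treat the numbers as something to observe rather than a guarantee.

```shell
#!/bin/sh
# Build a two-header include cycle in a scratch directory (hypothetical names).
demo_dir=$(mktemp -d)
cat > "$demo_dir/a.h" <<'EOF'
#ifndef DEMO_A_H
#define DEMO_A_H
#include "b.h"
int a(void);
#endif
EOF
cat > "$demo_dir/b.h" <<'EOF'
#ifndef DEMO_B_H
#define DEMO_B_H
#include "a.h"
int b(void);
#endif
EOF
printf '#include "a.h"\n' > "$demo_dir/main.c"

# Print how many open/openat calls on a.h the compiler made; -f follows
# the child processes the cc driver spawns. Prints "?" if strace fails.
count_opens() {
    ( cd "$demo_dir" || exit 1
      strace -f -e trace=file -o trace.log cc -E main.c >/dev/null 2>&1 ||
          { echo "?"; exit 0; }
      grep -c 'open.*a\.h"' trace.log )
}

if command -v strace >/dev/null 2>&1 && command -v cc >/dev/null 2>&1; then
    echo "include guard: a.h opened $(count_opens) time(s)"
    # Swap the guard in a.h for #pragma once and measure again.
    printf '#pragma once\n#include "b.h"\nint a(void);\n' > "$demo_dir/a.h"
    echo "pragma once:   a.h opened $(count_opens) time(s)"
else
    echo "strace or cc not found; skipping the measurement"
fi
```

The guard in b.h is left alone on purpose, so the only difference between the two runs is how a.h protects itself.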

I had some "extract the include guard" line noise lying around, so I
hacked up the below if someone wants to play some more with this. A few
not-very-careful kbuild timings didn't show anything significant, but
both the before and after times were way too noisy, and I only patched
include/linux/*.h.

Anyway, the first order of business is to figure out which ones to leave
alone. We have a bunch of #ifndef THAT_ONE #error "don't include
$this_one directly". The brute-force way is to simply record all macros
which are checked for definedness at least twice.

git grep -h -E '^\s*#\s*if(.*defined\s*\(|n?def)\s*[A-Za-z0-9_]+' |
  grep -o -E '[A-Za-z_][A-Za-z_0-9]*' | sort | uniq --repeated > multest.txt
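
To see what that pipeline actually records, here is a small sketch that feeds a few hand-written directive lines through the same grep/sort/uniq stages instead of git grep. The sample macros are invented, [[:space:]] stands in for \s as the portable spelling, and uniq -d is the short form of --repeated.

```shell
#!/bin/sh
# Fix the collation order so the sorted output is predictable.
export LC_ALL=C

# Stand-in for the "git grep -h" output: a handful of made-up directives.
sample_lines() {
cat <<'EOF'
#ifndef _LINUX_FOO_H
#define _LINUX_FOO_H
#ifndef _LINUX_BAR_H
#if !defined(_LINUX_BAR_H)
# ifdef CONFIG_DEMO
EOF
}

# Keep only definedness tests, split them into identifiers, and keep the
# identifiers that occur at least twice.
repeated=$(sample_lines |
    grep -E '^[[:space:]]*#[[:space:]]*if(.*defined[[:space:]]*\(|n?def)[[:space:]]*[A-Za-z0-9_]+' |
    grep -o -E '[A-Za-z_][A-Za-z_0-9]*' |
    sort | uniq -d)

printf '%s\n' "$repeated"
# prints:
# _LINUX_BAR_H
# ifndef
```

_LINUX_BAR_H is tested twice and so is flagged, while _LINUX_FOO_H is tested once and is not. The directive keyword ifndef lands in the list as well, since the second grep splits whole lines into identifiers; that is harmless as long as no include guard is named after a keyword.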

But there's also stuff like arch/x86/boot/compressed/kaslr.c that plays
games with pre-defining _EXPORT_H to avoid parsing export.h when it
inevitably gets included. Oh well, just add the list of macros that have
at least two definitions.

git grep -h -E '^\s*#\s*define\s+[A-Za-z0-9_]+' |
  grep -o -E '^\s*#\s*define\s+[A-Za-z0-9_]+' |
  grep -oE '[A-Za-z0-9_]+' | sort | uniq --repeated > muldef.txt

With those, one can just do

cat muldef.txt multest.txt | scripts/replace_ig.pl ...

This ends up detecting a lot of copy-pasting (e.g.
__LINUX_MFD_MAX8998_H), as well as lots of headers that for no obvious
reason do not have an include guard. Oh, and once.h has a redundant \.

Rasmus

wear sunglasses...

=== scripts/replace_ig.pl ===

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

my %preserve;

sub strip_comments {
    my $txt = shift;

    # Line continuations are handled before comment stripping, so
    # <slash> <backslash> <newline> <star> actually starts a comment,
    # and a // comment can swallow the following line. Let's just
    # assume nobody has modified the #if control flow using such dirty
    # tricks when we do a more naive line-by-line parsing below to
    # actually remove the include guard deffery.
    $txt =~ s/\\\n//g;

    # http://stackoverflow.com/a/911583/722859
    $txt =~ s{
                 /\*         ##  Start of /* ... */ comment
                 [^*]*\*+    ##  Non-* followed by 1-or-more *'s
                 (?:
                     [^/*][^*]*\*+
                 )*          ##  0-or-more things which don't start with /
                 ##    but do end with '*'
                 /           ##  End of /* ... */ comment

             |
                 //     ## Start of // comment
                 [^\n]* ## Anything which is not a newline
                 (?=\n) ## End of // comment; use look-ahead to avoid consuming the newline

             |         ##     OR  various things which aren't comments:

                 (
                     "           ##  Start of " ... " string
                     (?:
                         \\.           ##  Escaped char
                     |               ##    OR
                         [^"\\]        ##  Non "\
                     )*
                     "           ##  End of " ... " string

                 |         ##     OR

                     '           ##  Start of ' ... ' string
                     (
                         \\.           ##  Escaped char
                     |               ##    OR
                         [^'\\]        ##  Non '\
                     )*
                     '           ##  End of ' ... ' string

                 |         ##     OR

                     .           ##  Any other char
                     [^/"'\\]*   ##  Chars which don't start a comment, string or escape
                 )
         }{defined $1 ? $1 : " "}gxse;

    return $txt;
}

sub include_guard {
    my $txt = shift;
    my @lines = (split /^/, $txt);
    my $i = 0;
    my $level = 1;
    my $name;

    # The first non-empty line must be an #ifndef or an #if !defined().
    ++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i == @lines);
    goto not_found
        if (!($lines[$i] =~ m/^\s*#\s*ifndef\s+(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*$/) &&
            !($lines[$i] =~ m/^\s*#\s*if\s+!\s*defined\s*\(\s*(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*\)\s*$/));
    $name = $+{name};

    # The next non-empty line must be a #define of that macro.
    1 while (++$i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i == @lines);
    goto not_found if !($lines[$i] =~ m/^\s*#\s*define\s+\b$name\b/);

    # Now track #ifs and #endifs. #elifs and #elses don't change the level.
    while (++$i < @lines && $level > 0) {
        if ($lines[$i] =~ m/^\s*#\s*(?:if|ifdef|ifndef)\b/) {
            $level++;
        } elsif ($lines[$i] =~ m/^\s*#\s*endif\b/) {
            $level--;
        }
    }
    goto not_found if ($level > 0); # issue a warning?
    # Check that the rest of the file consists of empty lines.
    ++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
    goto not_found if ($i < @lines);
    return $name;

 not_found:
    return undef;
}

sub do_file {
    my $fn = shift;
    my $src = read_file($fn);
    my $ig = include_guard(strip_comments($src));
    if (not defined $ig) {
        printf STDERR "%s: no include guard\n", $fn;
        return;
    }
    if (exists $preserve{$ig}) {
        printf STDERR "%s: include guard %s exempted\n", $fn, $ig;
        return;
    }

    # OK, the entire text should match this horrible regexp.
    if ($src =~ m{
  (.*?) # arbitrary stuff before #ifndef
  (^\s*\#\s*if(?:\s*!\s*defined\s*\(\s*$ig\s*\)|ndef\s*$ig) .*? \n #
  (?:^\s*\n)*
   ^\s*\#\s*define\s*$ig .*? \n) # 2/3 of include guard
  (.*(?=^\s*\#\s*endif)) # body of file
  (^\s*\#\s*endif .*? \n) # last 1/3
  (.*) # rest of file (trailing comments)
        }smx) {
        my $pre = $1;
        my $define = $2;
        my $body = $3;
        my $endif = $4;
        my $post = $5;
        $body =~ s/\n[ \t]*\n$/\n/g;
        $src = $pre . "#pragma once\n";
        $src .= $body . $post;
    } else {
        printf STDERR "%s: has include guard %s, but I failed to replace it with #pragma once\n",
            $fn, $ig;
        return;
    }
    write_file($fn, $src);
}

while (<STDIN>) {
    chomp;
    $preserve{$_} = 1;
}

for (@ARGV) {
    do_file($_);
}
