On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <j...@perches.com> wrote:
> > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > > moved the repeated word test to check for more file types. But after
> > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > new warnings of the type:
> > >
> > > Perhaps instead of adding more content checks so that
> > > word boundaries are not something like \S but also
> > > not punctuation so that content like
> > >
> > >	git git://
> > >	@size size
> > >
> > > does not match?
> > >
> > >
> > Hi,
> > So currently the words are trimmed of non alphabets before the check:
> >
> > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> >	my $first = $1;
> >	my $second = $2;
> >
> > where, the word_pattern is:
> >	my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
>
> I'm familiar.
>
> > So do you perhaps recommend modifying this word pattern to
> > include the punctuation as well rather than trimming them off?
>
> Not really, perhaps use the capture group position
> markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> substr could be used to see what characters are
> before and after the word matches.

Perhaps something like:
---
 scripts/checkpatch.pl | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fab38b493cef..a65eb40a5539 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3054,15 +3054,25 @@ sub process {
 
 				my $first = $1;
 				my $second = $2;
+				my $start_pos = $-[1];
+				my $end_pos = $+[2];
 
 				if ($first =~ /(?:struct|union|enum)/) {
 					pos($rawline) += length($first) + length($second) + 1;
 					next;
 				}
 
-				next if ($first ne $second);
+				next if (lc($first) ne lc($second));
 				next if ($first eq 'long');
 
+				my $start_char = "";
+				my $end_char = "";
+				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
+				$end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
+
+				next if ($start_char =~ /^\S$/);
+				next if ($end_char !~ /^[\.\,\s]?$/);
+
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
 				    $fix) {
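
For anyone who wants to try the boundary check outside of checkpatch,
here is a minimal standalone sketch of the same idea. It is only an
illustration under a few assumptions: repeated_words() is a hypothetical
helper that does not exist in checkpatch.pl, it takes a plain string
instead of $rawline, and it leaves out the struct/union/enum and 'long'
special cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same word pattern as quoted above from checkpatch.pl.
my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';

# Hypothetical helper (not part of checkpatch.pl): return the repeated
# words in $line that are delimited by whitespace on the left and by
# whitespace, '.' or ',' (or end of line) on the right.
sub repeated_words {
	my ($line) = @_;
	my @found;

	while ($line =~ /\b($word_pattern) (?=($word_pattern))/g) {
		my ($first, $second) = ($1, $2);

		# @- and @+ hold the start and end offsets of each capture
		# group, so $-[1] is where the first word begins and $+[2]
		# is the offset just past the second word.
		my $start_pos = $-[1];
		my $end_pos = $+[2];

		next if (lc($first) ne lc($second));

		my $start_char = "";
		my $end_char = "";
		$start_char = substr($line, $start_pos - 1, 1) if ($start_pos > 0);
		$end_char = substr($line, $end_pos, 1) if (length($line) > $end_pos);

		# Skip things like "@size size" and "git git://".
		next if ($start_char =~ /^\S$/);
		next if ($end_char !~ /^[\.\,\s]?$/);

		push(@found, $first);
	}
	return @found;
}

# Print whatever the sketch flags for a few sample lines.
foreach my $line ("this is the the test.", "T:\tgit git://example", "@size size of buffer") {
	my @hits = repeated_words($line);
	print "$line => [@hits]\n";
}
```

With this, the two MAINTAINERS/kernel-doc style lines should no longer
be flagged, while an ordinary "the the" in running text still is.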