Re: [PATCH v2] checkpatch: look for common misspellings
On Fri, Sep 12, 2014 at 1:45 PM, Joe Perches wrote:
> On Fri, 2014-09-12 at 13:09 +0900, Masanari Iida wrote:
>> Test with "reseting" case, codespell found 21, grep found 26.
>
> Hello Masanari.
>
> How did codespell find any uses of reseting?
> What version of codespell are you using?
> (I tested with 1.7)
>
> Looking at the git tree for codespell,
> https://github.com/lucasdemarchi/codespell.git
> the dictionary there doesn't have reseting.
>

Joe,
First of all, I use the codespell 1.4 scripts with my own dictionary,
which is based on 1.4. So I believe "reseting" was added by me some
time ago.

> If I add reseting->resetting to the dictionary,
> then codespell finds the same 31 uses that
> git grep -i does.
>

My codespell 1.4 is case sensitive.
That's why we saw slightly different results.

Masanari
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
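The gap between the two counts is what one would expect when one tool matches case-sensitively and the other does not. A minimal sketch of that effect follows; the sample text and the helper name count_matching_lines are made up for illustration and are not taken from codespell or the kernel tree.

```python
import re


def count_matching_lines(text, word, ignore_case):
    """Count lines containing `word`, roughly like `grep word | wc -l`.

    With ignore_case=True this behaves like `grep -i`.
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(re.escape(word), flags)
    return sum(1 for line in text.splitlines() if pattern.search(line))


# Hypothetical sample input, not real kernel source.
SAMPLE = (
    "reseting the PHY\n"
    "/* Reseting DMA engine */\n"
    "#define RESETING_FLAG 0x1\n"
    "nothing wrong here\n"
)

if __name__ == "__main__":
    print("case-sensitive:  ", count_matching_lines(SAMPLE, "reseting", False))
    print("case-insensitive:", count_matching_lines(SAMPLE, "reseting", True))
```

Only the lowercase spelling is seen by the case-sensitive pass, while the capitalized and all-caps variants are only picked up when case is ignored.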
Re: [PATCH v2] checkpatch: look for common misspellings
On Fri, 2014-09-12 at 13:09 +0900, Masanari Iida wrote:
> Test with "reseting" case, codespell found 21, grep found 26.

Hello Masanari.

How did codespell find any uses of reseting?
What version of codespell are you using?
(I tested with 1.7)

Looking at the git tree for codespell,
https://github.com/lucasdemarchi/codespell.git
the dictionary there doesn't have reseting.

If I add reseting->resetting to the dictionary,
then codespell finds the same 31 uses that
git grep -i does.
Re: [PATCH v2] checkpatch: look for common misspellings
Talking about codespell, it detected 76 "informations" in 3.17-rc4.
"grep -R informations * | wc -l" found 120 typos.

Test with "occured", codespell found 46, grep found 110.
Test with "reseting" case, codespell found 21, grep found 26.

So I expect about half of the incoming typos will be detected by the
tool, and be fixed.

Masanari
Re: [PATCH v2] checkpatch: look for common misspellings
On Thu, 2014-09-11 at 09:19 +0200, Geert Uytterhoeven wrote:
> On Thu, Sep 11, 2014 at 12:52 AM, Andrew Morton wrote:
> > On Mon, 8 Sep 2014 11:15:24 -0700 Kees Cook wrote:
> >> Check for misspellings, based on Debian's lintian list. Several false
> >> positives were removed, and several additional words added that were
[]
> > I have a feeling this is going to be a rat hole and that
> > scripts/spelling.txt will grow to consume the planet. Oh well, whatev.
>
> What about making checkpatch use the codespell dictionary if codespell
> is installed?
>
> Codespell is in Ubuntu 14.04LTS (but not in 12.04LTS).

I'm a little concerned about false positives
if that's done, but it seems simple enough.

Maybe both of:

codespell: /usr/share/codespell/dictionary.txt
lintian:   /usr/share/lintian/data/spelling/corrections
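One way the "use whichever dictionaries are installed" idea could look is sketched below. This is only an illustration under assumptions: the two paths are the ones named in the message above and may not exist on a given system, the separators ("->" for codespell, "||" for the lintian-style list) are the ones mentioned elsewhere in this thread, and load_dictionaries is a hypothetical helper, not something checkpatch provides.

```python
import os

# Paths as suggested in the message above; whether they exist depends on
# the distribution. The separators are assumptions based on this thread.
CANDIDATES = [
    ("/usr/share/codespell/dictionary.txt", "->"),
    ("/usr/share/lintian/data/spelling/corrections", "||"),
]


def load_dictionaries(candidates=CANDIDATES):
    """Merge every dictionary file that is present into one typo -> fix map.

    Missing files are skipped silently. When two files disagree about a
    typo, the entry from the earlier file wins.
    """
    merged = {}
    for path, separator in candidates:
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            for raw in handle:
                line = raw.strip()
                # Skip blanks, comments, and lines without the separator.
                if not line or line.startswith("#") or separator not in line:
                    continue
                typo, fix = line.split(separator, 1)
                merged.setdefault(typo.strip().lower(), fix.strip())
    return merged


if __name__ == "__main__":
    print(len(load_dictionaries()), "entries loaded")
```

The sketch does not prune anything, so it would inherit the false-positive concern raised above: a larger merged dictionary would still need filtering before being used on kernel patches.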
Re: [PATCH v2] checkpatch: look for common misspellings
On Thu, Sep 11, 2014 at 12:19 AM, Geert Uytterhoeven wrote:
> On Thu, Sep 11, 2014 at 12:52 AM, Andrew Morton wrote:
>> On Mon, 8 Sep 2014 11:15:24 -0700 Kees Cook wrote:
>>
>>> Check for misspellings, based on Debian's lintian list. Several false
>>> positives were removed, and several additional words added that were
>>> common in the kernel:
>>>
>>>     backword backwords
>>>     invalide valide
>>>     recieves
>>>     singed unsinged
>>>
>>> While going back and fixing existing spelling mistakes isn't a high
>>> priority, it'd be nice to try to catch them before they hit the tree.
>>
>> I have a feeling this is going to be a rat hole and that
>> scripts/spelling.txt will grow to consume the planet. Oh well, whatev.
>
> What about making checkpatch use the codespell dictionary if codespell
> is installed?
>
> Codespell is in Ubuntu 14.04LTS (but not in 12.04LTS).

It's probably not a bad idea, but given the level of pruning that's
been needed already to keep down the false positive rate, I'm nervous
about a larger "general" corpus.

-Kees

--
Kees Cook
Chrome OS Security
Re: [PATCH v2] checkpatch: look for common misspellings
On Thu, Sep 11, 2014 at 12:52 AM, Andrew Morton wrote:
> On Mon, 8 Sep 2014 11:15:24 -0700 Kees Cook wrote:
>
>> Check for misspellings, based on Debian's lintian list. Several false
>> positives were removed, and several additional words added that were
>> common in the kernel:
>>
>>     backword backwords
>>     invalide valide
>>     recieves
>>     singed unsinged
>>
>> While going back and fixing existing spelling mistakes isn't a high
>> priority, it'd be nice to try to catch them before they hit the tree.
>
> I have a feeling this is going to be a rat hole and that
> scripts/spelling.txt will grow to consume the planet. Oh well, whatev.

What about making checkpatch use the codespell dictionary if codespell
is installed?

Codespell is in Ubuntu 14.04LTS (but not in 12.04LTS).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v2] checkpatch: look for common misspellings
On Wed, 2014-09-10 at 15:52 -0700, Andrew Morton wrote:
> Have a kernel joke:
[]
> @@ -553,6 +553,7 @@ jeffies||jiffies
> +kubys|linus

Gimmu Smftre///
Re: [PATCH v2] checkpatch: look for common misspellings
On Mon, 8 Sep 2014 11:15:24 -0700 Kees Cook wrote:

> Check for misspellings, based on Debian's lintian list. Several false
> positives were removed, and several additional words added that were
> common in the kernel:
>
>     backword backwords
>     invalide valide
>     recieves
>     singed unsinged
>
> While going back and fixing existing spelling mistakes isn't a high
> priority, it'd be nice to try to catch them before they hit the tree.

I have a feeling this is going to be a rat hole and that
scripts/spelling.txt will grow to consume the planet. Oh well, whatev.

Have a kernel joke:

--- a/scripts/spelling.txt~checkpatch-look-for-common-misspellings-fix
+++ a/scripts/spelling.txt
@@ -553,6 +553,7 @@ jeffies||jiffies
 juse||just
 jus||just
 kown||known
+kubys|linus
 langage||language
 langauage||language
 langauge||language
_
Re: [PATCH v2] checkpatch: look for common misspellings
On Wed, 2014-09-10 at 13:37 +0900, Masanari Iida wrote:
> Hello Joe, Kees,

Hello Masanari-san.

> Sorry for the late reply.
> I was on holiday when the version 1 patch discussions were posted.

No worries, holidays are far more important than
patches like this...

These patches are simple niceties, not fixes for
bugs, so review and acceptance timing is not urgent.

> I am using codespell ( https://github.com/lucasdemarchi/codespell/ ).
> The codespell has its own typo dictionary.
> The dictionary format is
>
> typo->good               (1 candidate)
> typo->good1,good2,       (multiple candidates)
> typo->good, comment      (1 candidate with special remark)
>
> It's similar to your typo||good format.
>
> The license of the codespell is GPLv2 according to COPYING file in tar ball.
>
> Compare number of typo samples in dictionary.
> Your dictionary : 1033
> codespell-1.4 : 4261
> codespell-1.4 + my adding 5245
> Your dictionary + codespell-1.4 + my adding - remove duplicate: 5742
>
> Latest version of codespell is 1.7.
> My dictionary is based on codespell-1.4. So I use the number as of 1.4.
>
> I can provide my typo samples under GPLv2 license.

Thanks. Any additions you have to the dictionary
would be gladly welcomed.

Using a common format for the dictionary and any
suggested corrections would be good too.

Maybe the dictionary and code should be changed
to use the codespell format. It seems a bit more
flexible than the lintian form.

I do not know if one project is more active than
the other, but perhaps that should be the deciding
factor. Or maybe just Kees' preference...

Merging all these together might not be a good
solution though.

Right now, the checkpatch spelling code uses
word boundaries that include an underscore.

checkpatch spelling tests are done on each of the
4 segments of a #define like "PREFIX_PREFERED_SEG_ABC",
so the misspelling of PREFERED is found.

Some sifting of the dictionary is still necessary
to eliminate some common prefixes and so avoid too
many false positives.

For example, "ths" was dropped because it's a
prefix used by several modules even though it's
a somewhat frequent typo.
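The underscore behaviour described above can be illustrated outside Perl. The sketch below is a loose Python transliteration of the matching idea in the posted patch (any character other than a letter or '@' separates words); the two-entry dictionary and the names find_typos and match_case are hypothetical, and the plain \b variant is included only for contrast.

```python
import re

# Hypothetical two-entry dictionary, typo -> fix, standing in for
# scripts/spelling.txt.
SPELLING_FIX = {"prefered": "preferred", "reseting": "resetting"}

ALTERNATION = "|".join(re.escape(word) for word in SPELLING_FIX)

# Boundary classes as in the posted patch: anything that is not a letter
# or '@' ends a word, so '_' and digits act as separators.
CHECKPATCH_RE = re.compile(
    r"(?:^|[^a-z@])(" + ALTERNATION + r")(?:$|[^a-z@])", re.IGNORECASE
)

# For contrast: \b treats '_' as a word character, so identifier
# segments are not split.
WORD_BOUNDARY_RE = re.compile(r"\b(" + ALTERNATION + r")\b", re.IGNORECASE)


def match_case(typo, fix):
    """Give the suggested fix the same capitalization as the typo."""
    if re.fullmatch(r"[A-Z]+", typo):
        return fix.upper()
    if typo[0].isupper():
        return fix[0].upper() + fix[1:]
    return fix


def find_typos(line):
    """Return (typo, suggested_fix) pairs found in one line of text."""
    return [
        (m.group(1), match_case(m.group(1), SPELLING_FIX[m.group(1).lower()]))
        for m in CHECKPATCH_RE.finditer(line)
    ]


if __name__ == "__main__":
    define = "#define PREFIX_PREFERED_SEG_ABC 1"
    print(find_typos(define))
    print(WORD_BOUNDARY_RE.findall(define))
```

Because '_' counts as a separator here, every segment of an identifier is checked on its own. That is also why a short entry such as "ths" would fire on every identifier using it as a prefix, which matches the reason given above for dropping it.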
Re: [PATCH v2] checkpatch: look for common misspellings
Hello Joe, Kees,

Sorry for the late reply.
I was on holiday when the version 1 patch discussions were posted.

I am using codespell ( https://github.com/lucasdemarchi/codespell/ ).
The codespell has its own typo dictionary.
The dictionary format is

typo->good               (1 candidate)
typo->good1,good2,       (multiple candidates)
typo->good, comment      (1 candidate with special remark)

It's similar to your typo||good format.

The license of the codespell is GPLv2 according to COPYING file in tar ball.

Compare number of typo samples in dictionary.
Your dictionary : 1033
codespell-1.4 : 4261
codespell-1.4 + my adding 5245
Your dictionary + codespell-1.4 + my adding - remove duplicate: 5742

Latest version of codespell is 1.7.
My dictionary is based on codespell-1.4. So I use the number as of 1.4.

I can provide my typo samples under GPLv2 license.

Masanari
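Since the two formats are close, converting entries between them is mostly mechanical. The following is a rough sketch that assumes the three line forms listed above are the only ones that occur; parse_codespell_line and to_checkpatch_entry are hypothetical helper names, not part of codespell or checkpatch.

```python
def parse_codespell_line(line):
    """Split one 'typo->...' line into (typo, candidates, comment).

    Returns None for blank lines and lines without '->'.
    Assumed forms, as listed in the message above:
      typo->good             one candidate
      typo->good1,good2,     several candidates (trailing comma)
      typo->good, comment    one candidate plus a remark
    """
    line = line.strip()
    if not line or "->" not in line:
        return None
    typo, rest = line.split("->", 1)
    fields = [field.strip() for field in rest.split(",")]
    if rest.rstrip().endswith(","):
        # Trailing comma: every non-empty field is a candidate.
        candidates, comment = [f for f in fields if f], ""
    elif len(fields) > 1:
        # No trailing comma: the last field is a remark.
        candidates, comment = [f for f in fields[:-1] if f], fields[-1]
    else:
        candidates, comment = [f for f in fields if f], ""
    return typo.strip(), candidates, comment


def to_checkpatch_entry(line):
    """Convert a codespell line to 'typo||good', or None if ambiguous."""
    parsed = parse_codespell_line(line)
    if parsed is None:
        return None
    typo, candidates, _comment = parsed
    if len(candidates) != 1:
        # typo||good has room for exactly one suggestion.
        return None
    return f"{typo}||{candidates[0]}"


if __name__ == "__main__":
    for sample in ("reseting->resetting", "typo->good1,good2,", "typo->good, comment"):
        print(sample, "=>", to_checkpatch_entry(sample))
```

Entries with several candidates are dropped rather than guessed at, because the typo||good form carries only a single correction.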
Re: [PATCH v2] checkpatch: look for common misspellings
On Mon, 2014-09-08 at 11:15 -0700, Kees Cook wrote:
> Check for misspellings, based on Debian's lintian list. Several false
> positives were removed, and several additional words added that were
> common in the kernel:
>
>     backword backwords
>     invalide valide
>     recieves
>     singed unsinged
>
> While going back and fixing existing spelling mistakes isn't a high
> priority, it'd be nice to try to catch them before they hit the tree.

Seems sensible enough.

Acked-by: Joe Perches
[PATCH v2] checkpatch: look for common misspellings
Check for misspellings, based on Debian's lintian list. Several false
positives were removed, and several additional words added that were
common in the kernel:

    backword backwords
    invalide valide
    recieves
    singed unsinged

While going back and fixing existing spelling mistakes isn't a high
priority, it'd be nice to try to catch them before they hit the tree.

In the 13830 commits between 3.15 and 3.16, the script would have noticed
560 spelling mistakes. The top 25 are shown here:

$ git log --pretty=oneline v3.15..v3.16 | wc -l
13830

$ git log --format='%H' v3.15..v3.16 | \
   while read commit ; do \
     echo "commit $commit" ; \
     git log --format=email --stat -p -1 $commit | \
       ./scripts/checkpatch.pl --types=typo_spelling --no-summary - ; \
   done | tee spell_v3.15..v3.16.txt | grep "may be misspelled" | \
   awk '{print $2}' | tr A-Z a-z | sort | uniq -c | sort -rn
     21 'seperate'
     17 'endianess'
     15 'sucess'
     13 'noticable'
     11 'occured'
     11 'accomodate'
     10 'interrup'
      9 'prefered'
      8 'unecessary'
      8 'explicitely'
      7 'supress'
      7 'overriden'
      7 'immediatly'
      7 'funtion'
      7 'defult'
      7 'childs'
      6 'succesful'
      6 'splitted'
      6 'specifc'
      6 'reseting'
      6 'recieve'
      6 'changable'
      5 'tmis'
      5 'singed'
      5 'preceeding'

Thanks to Joe Perches for rewrites, suggestions, additional misspelling
entries, and testing.

Signed-off-by: Kees Cook
---
v2:
- Joe Perches made several improvements, including:
  - relocated test to catch commit messages
  - handle alternative capitalizations
  - catch all mistakes in a line
  - additional misspelling fix entries
---
 scripts/checkpatch.pl |   44 ++-
 scripts/spelling.txt  | 1042 +
 2 files changed, 1085 insertions(+), 1 deletion(-)
 create mode 100644 scripts/spelling.txt

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index b385bcbbf2f5..d0ac3d30d93e 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -9,7 +9,8 @@ use strict;
 use POSIX;
 
 my $P = $0;
-$P =~ s@.*/@@g;
+$P =~ s@(.*)/@@g;
+my $D = $1;
 
 my $V = '0.32';
 
@@ -43,6 +44,7 @@ my $configuration_file = ".checkpatch.conf";
 my $max_line_length = 80;
 my $ignore_perl_version = 0;
 my $minimum_perl_version = 5.10.0;
+my $spelling_file = "$D/spelling.txt";
 
 sub help {
 	my ($exitcode) = @_;
@@ -429,6 +431,29 @@ our $allowed_asm_includes = qr{(?x:
 )};
 # memory.h: ARM has a custom one
 
+# Load common spelling mistakes and build regular expression list.
+my $misspellings;
+my @spelling_list;
+my %spelling_fix;
+open(my $spelling, '<', $spelling_file)
+    or die "$P: Can't open $spelling_file for reading: $!\n";
+while (<$spelling>) {
+	my $line = $_;
+
+	$line =~ s/\s*\n?$//g;
+	$line =~ s/^\s*//g;
+
+	next if ($line =~ m/^\s*#/);
+	next if ($line =~ m/^\s*$/);
+
+	my ($suspect, $fix) = split(/\|\|/, $line);
+
+	push(@spelling_list, $suspect);
+	$spelling_fix{$suspect} = $fix;
+}
+close($spelling);
+$misspellings = join("|", @spelling_list);
+
 sub build_types {
 	my $mods = "(?x: \n" . join("|\n ", @modifierList) . "\n)";
 	my $all = "(?x: \n" . join("|\n ", @typeList) . "\n)";
@@ -2212,6 +2237,23 @@ sub process {
 			    "8-bit UTF-8 used in possible commit log\n" . $herecurr);
 		}
 
+# Check for various typo / spelling mistakes
+		if ($in_commit_log || $line =~ /^\+/) {
+			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:$|[^a-z@])/gi) {
+				my $typo = $1;
+				my $typo_fix = $spelling_fix{lc($typo)};
+				$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
+				$typo_fix = uc($typo_fix) if ($typo =~ /^[A-Z]+$/);
+				my $msg_type = \&WARN;
+				$msg_type = \&CHK if ($file);
+				if (&{$msg_type}("TYPO_SPELLING",
+						 "'$typo' may be misspelled - perhaps '$typo_fix'?\n" . $herecurr) &&
+				    $fix) {
+					$fixed[$fixlinenr] =~ s/(^|[^A-Za-z@])($typo)($|[^A-Za-z@])/$1$typo_fix$3/;
+				}
+			}
+		}
+
 # ignore non-hunk lines and lines being removed
 		next if (!$hunk_line || $line =~ /^-/);
 
diff --git a/scripts/spelling.txt b/scripts/spelling.txt
new file mode 100644
index ..fc7fd52b5e03
--- /dev/null
+++ b/scripts/spelling.txt
@@ -0,0 +1,1042 @@
+# Originally from Debian's Lintian tool. Various false positives have been
+# removed, and various additions have been made as they've been discovered
+# in the kernel source.
+#
+# License: GPLv2
+#
+# The