Control: clone -1 -2
Control: tag -1 + confirmed pending
Control: retitle -2 lintian: New spelling corrections should be automatically checked against an American and a British English dictionary
Control: severity -2 wishlist

Hi Andreas,

Andreas Beckmann wrote:
> 'licence' is a valid (mostly british) variant of license

Yep, noticed this as well before I saw your bug report. Already fixed in
https://salsa.debian.org/lintian/lintian/-/commit/7d801b2c9c88683051afe0937b46f065cb8873a2

> Perhaps (new) spelling corrections should be automatically checked
> against an american and a british english dictionary and carefully
> reconsidered if they are found?

Good idea! Cloning the bug report for that accordingly as this is a
separate thing.

Still don't have an idea how to actually do that, but I guess it will
be part of the test suite, not a commit hook.

> Without implying to delete all the matches (I haven't heard most of the
> matching words and would need to look up their meaning...):
> 
> $ grep -v ^# /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1 |
> while read word ; do grep "^$word\$" /usr/share/dict/american-english \
> /usr/share/dict/british-english ; done

Thanks for figuring out this nice little command! I will, though, try to
optimize it so that it doesn't call grep once per word, using something like:

  grep -Fw -f <(grep -v '^#' /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1) \
    /usr/share/dict/american-english /usr/share/dict/british-english
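One detail worth checking in the one-pass version: the original loop anchors each word with "^$word\$", so it only matches whole lines, while grep -w matches a key as a whole word anywhere in a line. With one-word-per-line lists, -w may therefore also hit forms like "word's", and -x is the closer equivalent of the loop. A small sketch on hypothetical sample files (stand-ins for the real corrections file and word lists, not their actual contents):

```shell
#!/bin/bash
# Sketch comparing grep -Fw and grep -Fx for the one-pass lookup.
# The sample files below are hypothetical stand-ins for the real
# corrections file and the /usr/share/dict word lists.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical corrections sample: misspelling first, '|'-separated.
cat > "$tmp/corrections" <<'EOF'
# comment lines are skipped
licence||license
teh||the
EOF

# Hypothetical word list, one word per line.
printf '%s\n' licence "licence's" the > "$tmp/words"

# Print the misspellings: first '|'-separated field of non-comment lines.
keys() { grep -v '^#' "$1" | cut -d '|' -f 1; }

# -w: a key matches as a whole word anywhere in a line,
#     so "licence" also hits the line "licence's".
grep -Fw -f <(keys "$tmp/corrections") "$tmp/words"

# -x: a key must match the entire line, like the loop's "^$word\$".
grep -Fx -f <(keys "$tmp/corrections") "$tmp/words"
```

On these samples the -w query prints two lines (licence and licence's) and the -x query prints only licence, so switching to -x may reduce the hit count somewhat without losing any of the loop's matches.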

I now wonder if we should use wamerican/wbritish or
wamerican-insane/wbritish-insane for that. Maybe wamerican/wbritish is
a good start, and if we still get too many false positives, we can
extend it to use wamerican-insane/wbritish-insane. (The latter will
probably also take longer. But then again, with my optimized query
above it takes less than a second even on a 7-year-old laptop,
and it yields about 350 hits.)

Some comments about some of those you found:

> /usr/share/dict/american-english:bellow
> /usr/share/dict/british-english:bellow
> /usr/share/dict/american-english:singed
> /usr/share/dict/british-english:singed

Would keep these. The chances that it is a misspelling of "below" or
"signed" are IMHO much higher than the chance that it is used in
Debian in its actual meaning.

So if we write a test for this, we should probably list the
exceptions we want to keep in that test.
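A test along those lines could look roughly like the following sketch. The function name check_corrections and the separate exceptions file (one word per line, e.g. "bellow" and "singed") are hypothetical, not part of lintian's actual test suite. The dictionaries are passed as arguments, so trying wamerican-insane/wbritish-insane instead of wamerican/wbritish only means passing different files. It assumes GNU grep, which reads patterns from standard input with "-f -".

```shell
# Hypothetical test helper: fail if a misspelling listed in the
# corrections file is a valid dictionary word, unless it is listed
# in the exceptions file (one word per line).
# Usage: check_corrections CORRECTIONS EXCEPTIONS DICT...
check_corrections() {
    local corrections=$1 exceptions=$2
    shift 2
    local hits
    # Misspellings found as whole lines in any dictionary (-x, no
    # filenames via -h), minus the accepted exceptions, deduplicated.
    hits=$(grep -v '^#' "$corrections" | cut -d '|' -f 1 \
        | grep -Fxh -f - "$@" \
        | grep -Fxv -f "$exceptions" \
        | sort -u)
    if [ -n "$hits" ]; then
        printf 'valid dictionary words listed as misspellings:\n%s\n' "$hits" >&2
        return 1
    fi
}
```

It might be invoked as check_corrections /usr/share/lintian/data/spelling/corrections exceptions.txt /usr/share/dict/american-english /usr/share/dict/british-english, where exceptions.txt is a hypothetical file kept next to the test.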

> /usr/share/dict/american-english:convertor
> /usr/share/dict/british-english:convertor
> /usr/share/dict/american-english:dependance
> /usr/share/dict/american-english:dependant
> /usr/share/dict/british-english:dependant
> /usr/share/dict/american-english:extravert
> /usr/share/dict/british-english:extravert
> /usr/share/dict/american-english:extraverts
> /usr/share/dict/british-english:extraverts
> /usr/share/dict/american-english:licence
> /usr/share/dict/british-english:licence
> /usr/share/dict/american-english:miniscule
> /usr/share/dict/british-english:miniscule
> /usr/share/dict/american-english:venders
> /usr/share/dict/american-english:vender
> /usr/share/dict/american-english:want's
> /usr/share/dict/british-english:want's

These should probably be removed. They all look like alternative
spellings, either historic or local.
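If we do remove them, something like this hypothetical helper (drop_corrections is not an existing lintian tool) could filter the entries by their first '|'-separated field while leaving comment lines alone. It is only a sketch: it prints the result to standard output rather than editing the file in place.

```shell
# Hypothetical helper: print CORRECTIONS without the entries whose
# misspelling (first '|'-separated field) is given as an argument.
# Comment lines are always kept.
# Usage: drop_corrections CORRECTIONS WORD...
drop_corrections() {
    local corrections=$1
    shift
    awk -F'|' -v drop="$*" '
        # Build a lookup set from the space-separated word list.
        BEGIN { n = split(drop, w, " "); for (i = 1; i <= n; i++) skip[w[i]] = 1 }
        # Print comments and every entry whose key is not in the set.
        /^#/ || !($1 in skip)
    ' "$corrections"
}
```

For example, drop_corrections corrections licence vender "want's" > corrections.new, followed by reviewing the diff before replacing the original file.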

Not sure about the remaining ones.

                Regards, Axel
-- 
 ,''`.  |  Axel Beckert <a...@debian.org>, https://people.debian.org/~abe/
: :' :  |  Debian Developer, ftp.ch.debian.org Admin
`. `'   |  4096R: 2517 B724 C5F6 CA99 5329  6E61 2FF9 CD59 6126 16B5
  `-    |  1024D: F067 EA27 26B9 C3FC 1486  202E C09E 1D89 9593 0EDE
