Control: clone -1 -2
Control: tag -1 + confirmed pending
Control: retitle -2 lintian: New spelling corrections should be automatically checked against an american and a british english dictionary
Control: severity -2 wishlist

Hi Andreas,

Andreas Beckmann wrote:
> 'licence' is a valid (mostly british) variant of license

Yep, noticed this as well before I saw your bug report. Already fixed in
https://salsa.debian.org/lintian/lintian/-/commit/7d801b2c9c88683051afe0937b46f065cb8873a2

> Perhaps (new) spelling corrections should be automatically checked
> against an american and a british english dictionary and carefully
> reconsidered if they are found?

Good idea! Cloning the bug report for that accordingly as this is a
separate thing.

I still don't have an idea how to actually do that, but I guess it will
be part of the test suite, not a commit hook.

> Without implying to delete all the matches (I haven't heard most of the
> matching words and would need to look up their meaning...):
>
> $ grep -v ^# /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1 |
>   while read word ; do grep "^$word\$" /usr/share/dict/american-english
>   /usr/share/dict/british-english ; done

Thanks for figuring out this nice little command! I will try to optimize
it, though, so that it does not call grep for each word but uses
something like:

  grep -Fw -f <(grep -v '^#' /usr/share/lintian/data/spelling/corrections | cut -d '|' -f 1) /usr/share/dict/american-english /usr/share/dict/british-english

I now wonder if we should use wamerican/wbritish or
wamerican-insane/wbritish-insane for that. Maybe wamerican/wbritish is a
good start, and if we still get too many false positives, we can extend
it to use wamerican-insane/wbritish-insane. (The latter will probably
also take longer. But then again, with my optimized query above it takes
less than a second on a 7-year-old laptop. And it yields about 350
hits.)

Some comments about some of those you found:

> /usr/share/dict/american-english:bellow
> /usr/share/dict/british-english:bellow
> /usr/share/dict/american-english:singed
> /usr/share/dict/british-english:singed

Would keep these.
The chances that it is a misspelling of "below" or "signed" are IMHO
much higher than the chance that it is used in Debian in its actual
meaning.

So in case we write a test for this, we should probably list exceptions
we want to keep in that test.

> /usr/share/dict/american-english:convertor
> /usr/share/dict/british-english:convertor
> /usr/share/dict/american-english:dependance
> /usr/share/dict/american-english:dependant
> /usr/share/dict/british-english:dependant
> /usr/share/dict/american-english:extravert
> /usr/share/dict/british-english:extravert
> /usr/share/dict/american-english:extraverts
> /usr/share/dict/british-english:extraverts
> /usr/share/dict/american-english:licence
> /usr/share/dict/british-english:licence
> /usr/share/dict/american-english:miniscule
> /usr/share/dict/british-english:miniscule
> /usr/share/dict/american-english:venders
> /usr/share/dict/american-english:vender
> /usr/share/dict/american-english:want's
> /usr/share/dict/british-english:want's

These should probably be removed. They all look like alternative
spellings, either historic or local.

Not sure about the remaining ones.

		Regards, Axel
-- 
 ,''`.  |  Axel Beckert <a...@debian.org>, https://people.debian.org/~abe/
: :' :  |  Debian Developer, ftp.ch.debian.org Admin
`. `'   |  4096R: 2517 B724 C5F6 CA99 5329  6E61 2FF9 CD59 6126 16B5
  `-    |  1024D: F067 EA27 26B9 C3FC 1486  202E C09E 1D89 9593 0EDE
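
The dictionary check with an exception list, as discussed above, could
perhaps look roughly like the following. This is only a minimal sketch
and not how lintian's test suite does it: the function name dict_hits
and the separate exceptions file are hypothetical, the corrections file
is assumed to keep the typo in the first '|'-separated field (as in the
cut call quoted above), and grep is assumed to accept "-f -" for reading
patterns from standard input (GNU grep does). It uses -x (whole-line
match) instead of -w to mirror the anchored "^$word$" match of the
original loop.

```shell
# Hypothetical helper, not part of lintian: print every typo from a
# corrections file that is also a complete entry in one of the given
# word lists, unless it is listed in the exceptions file.
# Usage: dict_hits CORRECTIONS EXCEPTIONS DICT...
dict_hits() {
    corrections=$1
    exceptions=$2
    shift 2
    # Take the first field of each non-comment line as the typo.
    grep -v '^#' "$corrections" | cut -d '|' -f 1 | sort -u |
        # -F: fixed strings, -x: whole-line match, -h: no file name prefix
        grep -Fxh -f - "$@" | sort -u |
        # Drop the words we deliberately want to keep as corrections.
        grep -Fxv -f "$exceptions"
}
```

A call such as

  dict_hits /usr/share/lintian/data/spelling/corrections keep.txt /usr/share/dict/american-english /usr/share/dict/british-english

would then print only the remaining candidates for review, where
keep.txt (a made-up name) holds one word per line, e.g. "bellow" and
"singed". A test could fail whenever that output is non-empty.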