Re: [ovs-dev] [PATCH] checkpatch.py: Load multiple codespell dictionaries.

Aaron Conole Tue, 18 Feb 2025 05:30:34 -0800

Roi Dayan <[email protected]> writes:

> On 12/02/2025 19:18, Aaron Conole wrote:
>> Hi Roi,
>> 
>> Roi Dayan via dev <[email protected]> writes:
>> 
>>> Load dictionary_code.txt in addition to the default dictionary.
>> 
>> The code dictionary isn't loaded by default with codespell 
>> (codespell_lib/_codespell.py)::
>> 
>>   _builtin_default = "clear,rare"
>> 
>> And there are some questionable conversions in that dictionary (like
>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>> could make sense, but perhaps we should be more careful when adding the
>> others.
>> 
>> Can you add the rationale for turning these on?  I think it's okay to
>> turn on more than one codespell dict, but we should consider the
>> individual dictionaries, too.
>
> I don't think it matters what is loaded by default or not as the script
> uses enchant and not codespell.


Yes, but the point is the codespell authors don't think that this
dictionary is a good default.
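
To make that concrete, here is a minimal sketch of how the builtin names
in that default string line up with the data files Roi lists below.  The
rule it uses ('clear' is the plain dictionary.txt, any other name is
dictionary_<name>.txt) is an assumption inferred from the file names, and
builtin_files() is a hypothetical helper, not codespell API.

```python
import os


def builtin_files(builtin, data_dir="data"):
    """Return dictionary paths for a comma-separated builtin string.

    Assumption (not codespell's own code): 'clear' is the plain
    dictionary.txt and any other name becomes dictionary_<name>.txt.
    """
    paths = []
    for name in builtin.split(","):
        name = name.strip()
        if not name:
            continue
        suffix = "" if name == "clear" else "_" + name
        paths.append(os.path.join(data_dir, "dictionary%s.txt" % suffix))
    return paths


# codespell's default quoted above: the code dictionary is not part of it.
default_files = builtin_files("clear,rare")
```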

> Also, don't look at the conversions; they aren't used, since we don't
> use codespell. In the code below, each line is stripped to keep only the
> final wording, which is added to enchant as allowed words.
>
> I also looked at the others again, and I think most of the words are
> already in the enchant dictionary, but loading them won't hurt.
> I do think we can skip dictionary_en-GB_to_en-US.txt, for example,
> since we use the enchant en_US dictionary, which should be more or less
> equivalent. The other one has more unique words, which I think is what
> we can say in the commit message.
>
> What do you think?

Yes, as you noted, most of the words are already there.  I ran through
many of the RHS spellings, and they already appear.  The only ones we
are not already getting are:

  * copiable
  * clonable
  * subpatches
  * traceback
  * tracebacks
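
Checking which RHS spellings enchant does not already accept can be
scripted.  This is a rough sketch, assuming the codespell
"misspelling->correction1, correction2," line format.  missing_words() is
a hypothetical helper; it takes the checker as a callable so it can be fed
pyenchant's Dict.check or any other predicate.

```python
def missing_words(dictionary_lines, is_known):
    """Return sorted RHS correction words that is_known() rejects."""
    missing = set()
    for line in dictionary_lines:
        if "->" not in line:
            continue  # skip blank or malformed lines
        rhs = line.split("->", 1)[1]
        for word in rhs.strip(", \n").split(","):
            word = word.strip()
            # Skip empty pieces and multi-word entries (e.g. 'a lot').
            if word and " " not in word and not is_known(word):
                missing.add(word)
    return sorted(missing)


# Usage with pyenchant (assumes enchant is installed and the path exists):
#   import enchant
#   with open(path_to_dictionary_code_txt) as f:
#       print(missing_words(f, enchant.Dict("en_US").check))
```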

Just 5 words, and they are not universally agreed-upon spellings.  For
example, if I use something like Wiktionary (not the most authoritative
source, I agree):

  https://en.wiktionary.org/wiki/clonable#English

It says that 'cloneable' is an alternative form used in a computing
context.  Enchant suggests 'clone able' or 'clone-able'.

Likewise, there isn't a single accepted spelling of copiable (and enchant
behaves similarly for it, as it does for subpatches).

So I guess 'traceback' and 'tracebacks' are the only ones where there
isn't any ambiguity.

Anyway, I guess it's okay to add this, but we should probably consider
looking at all the dictionaries and seeing which ones make sense to add
as well.  Otherwise, it's quite a bit of change for something that
could be done by just adding the words above directly (i.e., you make 7
lines of change here, vs. adding the words to extra_keywords).
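
For comparison, a sketch of the extra_keywords route.  It assumes, as the
existing code appears to, that the keywords end up added to the enchant
dictionary for the session; EXTRA_CODE_WORDS, session_words() and
add_extra_keywords() are hypothetical names, and the dictionary object is
only assumed to provide pyenchant's add_to_session().

```python
# The five words listed above, which dictionary_code.txt would contribute.
EXTRA_CODE_WORDS = ['copiable', 'clonable', 'subpatches',
                    'traceback', 'tracebacks']


def session_words(extra_keywords):
    """Return extra_keywords plus the code words, without duplicates."""
    words = list(extra_keywords)
    for word in EXTRA_CODE_WORDS:
        if word not in words:
            words.append(word)
    return words


def add_extra_keywords(spell_check_dict, extra_keywords):
    """Add every keyword to a pyenchant-like Dict for this session only."""
    for kw in session_words(extra_keywords):
        spell_check_dict.add_to_session(kw)
```

Keeping the words in a plain list also makes it easy to drop the
contested ones (copiable, clonable, subpatches) later.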

> Files I see in codespell path:
>
> dictionary_code.txt
> dictionary_en-GB_to_en-US.txt
> dictionary_informal.txt
> dictionary_names.txt
> dictionary_rare.txt
> dictionary.txt
> dictionary_usage.txt
>
>
>> 
>>> Signed-off-by: Roi Dayan <[email protected]>
>>> Acked-by: Salem Sol <[email protected]>
>>> ---
>>>  utilities/checkpatch.py | 14 ++++++++------
>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>> index f8caeb811604..9571380c291f 100755
>>> --- a/utilities/checkpatch.py
>>> +++ b/utilities/checkpatch.py
>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>  def open_spell_check_dict():
>>>      import enchant
>>>  
>>> +    codespell_files = []
>>>      try:
>>>          import codespell_lib
>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>> -        if not os.path.exists(codespell_file):
>>> -            codespell_file = ''
>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>> +            if os.path.exists(fn):
>>> +                codespell_files.append(fn)
>>>      except:
>>> -        codespell_file = ''
>>> +        pass
>>>  
>>>      try:
>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>  
>>>          spell_check_dict = enchant.Dict("en_US")
>>>  
>>> -        if codespell_file:
>>> -            with open(codespell_file) as f:
>>> +        for fn in codespell_files:
>>> +            with open(fn) as f:
>>>                  for line in f.readlines():
>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>                      for word in words:
>> 
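
If more dictionaries end up enabled, the hard-coded list in the patch
could instead be a glob over codespell's data directory with an explicit
skip list, e.g. leaving out dictionary_en-GB_to_en-US.txt as suggested
above.  A rough sketch only: codespell_dictionaries(), corrections() and
SKIP are hypothetical names, and the parsing mirrors the patch's
split('>') line (plus a guard for lines without '->'), not codespell's
own parser.

```python
import glob
import os

# Hypothetical skip list: en-GB -> en-US fixes should already be covered
# by enchant's en_US dictionary.
SKIP = {'dictionary_en-GB_to_en-US.txt'}


def codespell_dictionaries(data_dir, skip=SKIP):
    """Return sorted codespell dictionary files in data_dir, minus skip."""
    paths = sorted(glob.glob(os.path.join(data_dir, 'dictionary*.txt')))
    return [p for p in paths if os.path.basename(p) not in skip]


def corrections(line):
    """Return the RHS words of one codespell entry, as the patch does."""
    if '->' not in line:
        return []  # guard: the patch would raise IndexError here
    rhs = line.strip().split('>')[1].strip(', ')
    return [w.strip() for w in rhs.split(',')]


# Usage (assumes codespell_lib is importable):
#   import codespell_lib
#   data = os.path.join(os.path.dirname(codespell_lib.__file__), 'data')
#   for fn in codespell_dictionaries(data):
#       with open(fn) as f:
#           for line in f:
#               words = corrections(line)
```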

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
