Roi Dayan <[email protected]> writes:

> On 18/02/2025 15:30, Aaron Conole wrote:
>> Roi Dayan <[email protected]> writes:
>> 
>>> On 12/02/2025 19:18, Aaron Conole wrote:
>>>> Hi Roi,
>>>>
>>>> Roi Dayan via dev <[email protected]> writes:
>>>>
>>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>>
>>>> The code dictionary isn't loaded by default with codespell 
>>>> (codespell_lib/_codespell.py)::
>>>>
>>>>   _builtin_default = "clear,rare"
>>>>
>>>> And there are some questionable conversions in that dictionary (like
>>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>>> could make sense, but perhaps we should be more careful when adding the
>>>> others.
>>>>
>>>> Can you add the rationale for turning these on?  I think it's okay to
>>>> turn on more than one codespell dict, but we should consider the
>>>> individual dictionaries, too.
>>>
>>> I don't think it matters what is loaded by default or not as the script
>>> uses enchant and not codespell.
>> 
>> Yes, but the point is the codespell authors don't think that this
>> dictionary is a good default.
>> 
>>> Also, the conversions don't matter here: they aren't used, since we
>>> don't run codespell. The code below strips each line down to the final
>>> wording and adds it to enchant as an allowed word.
>>>
>>> I looked again at the others as well, and I think most of the words are
>>> already in the enchant dictionary, but loading them won't do any harm.
>>> I do think we can skip the main dictionary_en-GB_to_en-US.txt, for example,
>>> as we use the enchant en_US dictionary, which should be more or less
>>> equivalent. The other one has more unique words, which I think is what we
>>> can say in the commit message.
>>>
>>> What do you think?
>> 
>> Yes, as you noted, most of the words are already there.  I actually ran
>> through many of the RHS spellings, and they already appear.  In fact,
>> the only ones we are not already getting are:
>> 
>>   * copiable
>>   * clonable
>>   * subpatches
>>   * traceback
>>   * tracebacks
>> 
>> Just 5 words and they are not actually universally agreed upon
>> spellings.  For example, if I use something like wiktionary (not the
>> most authoritative source, I agree):
>> 
>>   https://en.wiktionary.org/wiki/clonable#English
>> 
>> It says that 'cloneable' is an alternative form used in computing
>> contexts.  Enchant suggests 'clone able' or 'clone-able'.
>> 
>> Likewise, there isn't an accepted form of copiable (and enchant does
>> similar, including with subpatches).
>> 
>> So I guess 'traceback' and 'tracebacks' are the only ones for which
>> there isn't any ambiguity.
>> 
>> Anyway, I guess it's okay to add, but we should probably consider
>> looking at all the dictionaries and seeing which ones make sense to add
>> as well.  Otherwise, it's quite a bit of change here for something that
>> could be done by just adding the words above directly (ie: you make 7
>> lines of change here, vs adding words to extra_keywords).
>> 
>
> Yes, but this change means that newer versions of codespell, with
> potential updates to the dictionary, get picked up automatically.
>
> I looked a bit in the other dictionaries.
> We probably don't want the main one dictionary_en-GB_to_en-US.txt as we
> use enchant for core words.
> Also we probably won't need dictionary_usage.txt, dictionary_rare.txt,
> dictionary_names.txt as they seem to be more for spelling mistakes rather
> than introducing words.
>
> So the only exceptions are dictionary.txt, which is already loaded, and
> dictionary_code.txt, which seems to add those more accepted words,
> as you noted.
>
> So I don't think we need to add the others. From here we can keep
> updating the internal list.
>
> What do you think?

I've been thinking about it, and I think it could be useful to have this
facility.  Can you make the dictionary selection configurable via the
command line as well (similar to the codespell option)?
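
A rough sketch of what such an option could look like follows.  It is
only an illustration under stated assumptions: the option name
--codespell-dicts, the helper names, and the use of argparse are all
hypothetical (none of them is existing checkpatch.py code), and the
short names are inferred from the file names listed in this thread.

```python
import argparse
import os

# Hypothetical short names for the data files listed in this thread.
# Using "clear" for dictionary.txt is an assumption of this sketch.
DICT_FILES = {
    "clear": "dictionary.txt",
    "code": "dictionary_code.txt",
    "informal": "dictionary_informal.txt",
    "names": "dictionary_names.txt",
    "rare": "dictionary_rare.txt",
    "usage": "dictionary_usage.txt",
    "en-GB_to_en-US": "dictionary_en-GB_to_en-US.txt",
}


def parse_dict_names(value):
    """Turn 'clear,code' into ['clear', 'code'], rejecting unknown names."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in DICT_FILES]
    if unknown:
        raise argparse.ArgumentTypeError(
            "unknown dictionary: %s" % ", ".join(unknown))
    return names


def codespell_dict_paths(data_dir, names):
    """Return the paths of the selected dictionaries that actually exist."""
    paths = []
    for name in names:
        fn = os.path.join(data_dir, DICT_FILES[name])
        if os.path.exists(fn):
            paths.append(fn)
    return paths


def build_parser():
    parser = argparse.ArgumentParser()
    # The default mirrors the two files the patch loads.
    parser.add_argument("--codespell-dicts", type=parse_dict_names,
                        default=["clear", "code"],
                        help="comma-separated codespell dictionaries")
    return parser
```

With something like this, the list returned by codespell_dict_paths()
would take the place of the hard-coded file names in the loop.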

>>> Files I see in codespell path:
>>>
>>> dictionary_code.txt
>>> dictionary_en-GB_to_en-US.txt
>>> dictionary_informal.txt
>>> dictionary_names.txt
>>> dictionary_rare.txt
>>> dictionary.txt
>>> dictionary_usage.txt
>>>
>>>
>>>>
>>>>> Signed-off-by: Roi Dayan <[email protected]>
>>>>> Acked-by: Salem Sol <[email protected]>
>>>>> ---
>>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>>> index f8caeb811604..9571380c291f 100755
>>>>> --- a/utilities/checkpatch.py
>>>>> +++ b/utilities/checkpatch.py
>>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>>  def open_spell_check_dict():
>>>>>      import enchant
>>>>>  
>>>>> +    codespell_files = []
>>>>>      try:
>>>>>          import codespell_lib
>>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>>> -        if not os.path.exists(codespell_file):
>>>>> -            codespell_file = ''
>>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>>> +            if os.path.exists(fn):
>>>>> +                codespell_files.append(fn)
>>>>>      except:
>>>>> -        codespell_file = ''
>>>>> +        pass
>>>>>  
>>>>>      try:
>>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>>  
>>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>>  
>>>>> -        if codespell_file:
>>>>> -            with open(codespell_file) as f:
>>>>> +        for fn in codespell_files:
>>>>> +            with open(fn) as f:
>>>>>                  for line in f.readlines():
>>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>>                      for word in words:
>>>>
>> 
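
For reference, the line-parsing step in the quoted hunk can be tried out
on its own.  The sketch below is a standalone approximation, not the
checkpatch.py code: the function name allowed_words is hypothetical, the
skipping of lines without '>' is an added guard, and it collects the
words into a set rather than adding them to an enchant dictionary.  It
assumes dictionary lines of the form "misspelling->word, word,".

```python
def allowed_words(lines):
    """Collect the right-hand-side words from codespell-style lines."""
    words = set()
    for line in lines:
        line = line.strip()
        # Added guard (not in the quoted hunk): skip lines without '>'.
        if '>' not in line:
            continue
        # Same expression as the quoted hunk: keep what follows the '>',
        # drop trailing commas and spaces, then split on commas.
        for word in line.split('>')[1].strip(', ').split(','):
            word = word.strip()
            if word:
                words.add(word)
    return words


# Made-up sample lines in the assumed format.
sample = ["abandonned->abandoned", "ther->there, their, the, other,"]
print(sorted(allowed_words(sample)))
```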

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
