On 05/03/2025 17:25, Aaron Conole wrote:
> Roi Dayan <[email protected]> writes:
> 
>> On 18/02/2025 15:30, Aaron Conole wrote:
>>> Roi Dayan <[email protected]> writes:
>>>
>>>> On 12/02/2025 19:18, Aaron Conole wrote:
>>>>> Hi Roi,
>>>>>
>>>>> Roi Dayan via dev <[email protected]> writes:
>>>>>
>>>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>>>
>>>>> The code dictionary isn't loaded by default with codespell 
>>>>> (codespell_lib/_codespell.py)::
>>>>>
>>>>>   _builtin_default = "clear,rare"
>>>>>
>>>>> And there are some questionable conversions in that dictionary (like
>>>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>>>> could make sense, but perhaps we should be more careful when adding the
>>>>> others.
>>>>>
>>>>> Can you add the rationale for turning these on?  I think it's okay to
>>>>> turn on more than one codespell dict, but we should consider the
>>>>> individual dictionaries, too.
>>>>
>>>> I don't think it matters what is loaded by default or not as the script
>>>> uses enchant and not codespell.
>>>
>>> Yes, but the point is the codespell authors don't think that this
>>> dictionary is a good default.
>>>
>>>> Also, the conversions themselves don't matter here since we don't use
>>>> codespell. The code below strips each entry down to the final wording
>>>> and adds those words to enchant as allowed words.
>>>>
>>>> I looked again at the others as well, and I think most of the words are
>>>> already in the enchant dictionary, but loading them won't hurt.
>>>> I do think we can skip the main dictionary_en-GB_to_en-US.txt for example
>>>> as we use the enchant en_US dictionary which should be equal more or less.
>>>> The other has more unique words which I think this is what we can say in
>>>> the commit message.
>>>>
>>>> What do you think?
>>>
>>> Yes, as you noted most of the words are already there.  I actually ran
>>> through many of the RHS spellings, and they already appear (as you
>>> noted).  Actually, the only ones we are not already getting are:
>>>
>>>   * copiable
>>>   * clonable
>>>   * subpatches
>>>   * traceback
>>>   * tracebacks
>>>
>>> Just 5 words and they are not actually universally agreed upon
>>> spellings.  For example, if I use something like wiktionary (not the
>>> most authoritative source, I agree):
>>>
>>>   https://en.wiktionary.org/wiki/clonable#English
>>>
>>> It says that 'cloneable' is an alternative form used in computing
>>> context.  Enchant suggests 'clone able' or 'clone-able'
>>>
>>> Likewise, there isn't an accepted form of copiable (and enchant does
>>> similar, including with subpatches).
>>>
>>> So I guess 'traceback' and 'tracebacks' are the only ones for which
>>> there isn't any ambiguity.
>>>
>>> Anyway, I guess it's okay to add, but we should probably consider
>>> looking at all the dictionaries and seeing which ones make sense to add
>>> as well.  Otherwise, it's quite a bit of change here for something that
>>> could be done by just adding the words above directly (ie: you make 7
>>> lines of change here, vs adding words to extra_keywords).
>>>
>>
>> Yes, but this change lets us pick up dictionary updates from newer
>> versions of codespell automatically.
>>
>> I looked a bit in the other dictionaries.
>> We probably don't want the main one dictionary_en-GB_to_en-US.txt as we
>> use enchant for core words.
>> Also we probably won't need dictionary_usage.txt, dictionary_rare.txt,
>> dictionary_names.txt as they seem to be more for spelling mistakes rather
>> than introducing words.
>>
>> So the only exception is dictionary.txt which is already loaded and
>> dictionary_code.txt which seems to add those more accepted words
>> like you noted.
>>
>> So I don't think we need to add the others. From here we can keep
>> updating the internal list.
>>
>> What do you think?
> 
> I've been thinking about it, and I think it could be useful to have this
> facility.  Can you make the dictionary selection also configurable via
> command line (similar to codespell option)?
> 

Yes, sending v2.
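
A minimal sketch of how such a selection might look, not the actual v2
patch. The option name `--dictionaries`, the short dictionary names, and the
helper `select_dict_files` are assumptions made for illustration, and
argparse is used only to keep the example short. The file names are the ones
listed from the codespell data directory further down in this thread.

```python
import argparse
import os

# Short names are assumptions for this sketch; the file names are the ones
# listed from the codespell data directory in this thread.
DICT_FILES = {
    "clear": "dictionary.txt",
    "code": "dictionary_code.txt",
    "en-GB_to_en-US": "dictionary_en-GB_to_en-US.txt",
    "informal": "dictionary_informal.txt",
    "names": "dictionary_names.txt",
    "rare": "dictionary_rare.txt",
    "usage": "dictionary_usage.txt",
}

# Mirrors the two files the patch loads unconditionally.
DEFAULT_DICTS = "clear,code"


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical option name, not an existing checkpatch.py flag.
    parser.add_argument("--dictionaries", default=DEFAULT_DICTS,
                        help="comma-separated dictionary names")
    return parser.parse_args(argv)


def select_dict_files(names, data_dir, exists=os.path.exists):
    """Map comma-separated names to the dictionary files that exist."""
    files = []
    for name in names.split(","):
        name = name.strip()
        if not name:
            continue
        if name not in DICT_FILES:
            raise ValueError("unknown dictionary: %s" % name)
        path = os.path.join(data_dir, DICT_FILES[name])
        # Skip missing files quietly, as the patch does.
        if exists(path):
            files.append(path)
    return files
```

With something like this, the hard-coded list in open_spell_check_dict()
would be replaced by the result of select_dict_files(), and the default
keeps today's behavior of the patch.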

>>>> Files I see in codespell path:
>>>>
>>>> dictionary_code.txt
>>>> dictionary_en-GB_to_en-US.txt
>>>> dictionary_informal.txt
>>>> dictionary_names.txt
>>>> dictionary_rare.txt
>>>> dictionary.txt
>>>> dictionary_usage.txt
>>>>
>>>>
>>>>>
>>>>>> Signed-off-by: Roi Dayan <[email protected]>
>>>>>> Acked-by: Salem Sol <[email protected]>
>>>>>> ---
>>>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>>>> index f8caeb811604..9571380c291f 100755
>>>>>> --- a/utilities/checkpatch.py
>>>>>> +++ b/utilities/checkpatch.py
>>>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>>>  def open_spell_check_dict():
>>>>>>      import enchant
>>>>>>  
>>>>>> +    codespell_files = []
>>>>>>      try:
>>>>>>          import codespell_lib
>>>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>>>> -        if not os.path.exists(codespell_file):
>>>>>> -            codespell_file = ''
>>>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>>>> +            if os.path.exists(fn):
>>>>>> +                codespell_files.append(fn)
>>>>>>      except:
>>>>>> -        codespell_file = ''
>>>>>> +        pass
>>>>>>  
>>>>>>      try:
>>>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>>>  
>>>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>>>  
>>>>>> -        if codespell_file:
>>>>>> -            with open(codespell_file) as f:
>>>>>> +        for fn in codespell_files:
>>>>>> +            with open(fn) as f:
>>>>>>                  for line in f.readlines():
>>>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>>>                      for word in words:
>>>>>
>>>
> 
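
For reference, the right-hand-side extraction that the quoted patch performs
on each dictionary line can be sketched as below. The helper name
`allowed_words` and the sample lines in the usage note are made up for
illustration; the only assumption about the file format is the
`wrong->right, right2,` shape implied by the split('>') in the patch. Unlike
the one-liner in the patch, this variant tolerates lines without an arrow
and drops empty entries.

```python
def allowed_words(line):
    """Return the right-hand-side words of one 'wrong->right, ...' line."""
    if "->" not in line:
        # Tolerate blank or malformed lines instead of raising IndexError.
        return []
    rhs = line.strip().split("->", 1)[1]
    # Drop surrounding whitespace and the empty item left by a trailing comma.
    return [word.strip() for word in rhs.split(",") if word.strip()]
```

For example, allowed_words("teh->the") gives ['the'], and
allowed_words("abd->and, bad,") gives ['and', 'bad'].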

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
