Roi Dayan <[email protected]> writes:

> On 18/02/2025 15:30, Aaron Conole wrote:
>> Roi Dayan <[email protected]> writes:
>>
>>> On 12/02/2025 19:18, Aaron Conole wrote:
>>>> Hi Roi,
>>>>
>>>> Roi Dayan via dev <[email protected]> writes:
>>>>
>>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>>
>>>> The code dictionary isn't loaded by default with codespell
>>>> (codespell_lib/_codespell.py)::
>>>>
>>>>   _builtin_default = "clear,rare"
>>>>
>>>> And there are some questionable conversions in that dictionary (like
>>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>>> could make sense, but perhaps we should be more careful when adding the
>>>> others.
>>>>
>>>> Can you add the rationale for turning these on?  I think it's okay to
>>>> turn on more than one codespell dict, but we should consider the
>>>> individual dictionaries, too.
>>>
>>> I don't think it matters what is loaded by default or not as the script
>>> uses enchant and not codespell.
>>
>> Yes, but the point is the codespell authors don't think that this
>> dictionary is a good default.
>>
>>> Also don't look at the conversions as it's not being used since we don't
>>> use codespell. In the code below it's being stripped to take only the
>>> final wording and add to enchant as allowed words.
>>>
>>> I looked again also in the others and I think most of the words already in
>>> enchant dictionary but loading them won't harm.
>>> I do think we can skip the main dictionary_en-GB_to_en-US.txt for example
>>> as we use the enchant en_US dictionary which should be equal more or less.
>>> The other has more unique words which I think this is what we can say in the
>>> commit message.
>>>
>>> What do you think?
>>
>> Yes, as you noted most of the words are already there.  I actually ran
>> through many of the RHS spellings, and they already appear (as you
>> noted).
>> Actually, we only are not already getting:
>>
>>  * copiable
>>  * clonable
>>  * subpatches
>>  * traceback
>>  * tracebacks
>>
>> Just 5 words and they are not actually universally agreed upon
>> spellings.  For example, if I use something like wiktionary (not the
>> most authoritative source, I agree):
>>
>>   https://en.wiktionary.org/wiki/clonable#English
>>
>> It says that 'cloneable' is an alternative form used in computing
>> context.  Enchant suggests 'clone able' or 'clone-able'
>>
>> Likewise, there isn't an accepted form of copiable (and enchant does
>> similar, including with subpatches).
>>
>> So I guess 'traceback' and 'tracebacks' are for sure the ones that there
>> isn't yet any ambiguity.
>>
>> Anyway, I guess it's okay to add, but we should probably consider
>> looking at all the dictionaries and seeing which ones make sense to add
>> as well.  Otherwise, it's quite a bit of change here for something that
>> could be done by just adding the words above directly (ie: you make 7
>> lines of change here, vs adding words to extra_keywords).
>>
>
> yes but this change allows newer versions of codespell with potential
> updates to the dictionary to catch in.
>
> I looked a bit in the other dictionaries.
> We probably don't want the main one dictionary_en-GB_to_en-US.txt as we
> use enchant for core words.
> Also we probably won't need dictionary_usage.txt, dictionary_rare.txt,
> dictionary_names.txt as they seem to be more for spelling mistakes rather
> than introducing words.
>
> So the only exception is dictionary.txt which is already loaded and
> dictionary_code.txt which seems to add those more accepted words
> like you noted.
>
> So I don't think we need to add the others. from here we can keep
> updating the internal list.
>
> What do you think?
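
For reference, the parsing step discussed above (keeping only the
right-hand side of each codespell entry, so the conversions themselves
are ignored) can be sketched roughly as below.  This is a standalone
illustration and not the checkpatch.py code itself: the function name
allowed_words is made up, and the 'foo->bar, baz,' entry is a
hypothetical line, used only to show an entry with more than one
right-hand-side word.

```python
def allowed_words(lines):
    """Collect the right-hand-side words of 'wrong->right[, right2,]' lines.

    Hypothetical helper for illustration; it mirrors the idea of taking
    only the final wording and treating it as an allowed word.
    """
    words = []
    for line in lines:
        line = line.strip()
        if '->' not in line:
            # Skip blank or malformed lines instead of raising IndexError.
            continue
        rhs = line.split('->', 1)[1]
        for word in rhs.strip(', ').split(','):
            word = word.strip()
            if word:
                words.append(word)
    return words


if __name__ == '__main__':
    # 'uint->unit' and 'stdio->studio' are the conversions mentioned in
    # this thread; the last entry is a made-up multi-word example.
    sample = ['uint->unit', 'stdio->studio', 'foo->bar, baz,', '']
    print(allowed_words(sample))
```

In the script the resulting words would then be handed to enchant as
allowed words, so the left-hand side (the "misspelling") never matters.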

I've been thinking about it, and I think it could be useful to have this
facility.  Can you also make the dictionary selection configurable via
the command line (similar to the codespell option)?

>>> Files I see in codespell path:
>>>
>>> dictionary_code.txt
>>> dictionary_en-GB_to_en-US.txt
>>> dictionary_informal.txt
>>> dictionary_names.txt
>>> dictionary_rare.txt
>>> dictionary.txt
>>> dictionary_usage.txt
>>>
>>>
>>>>
>>>>> Signed-off-by: Roi Dayan <[email protected]>
>>>>> Acked-by: Salem Sol <[email protected]>
>>>>> ---
>>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>>> index f8caeb811604..9571380c291f 100755
>>>>> --- a/utilities/checkpatch.py
>>>>> +++ b/utilities/checkpatch.py
>>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>>  def open_spell_check_dict():
>>>>>      import enchant
>>>>>
>>>>> +    codespell_files = []
>>>>>      try:
>>>>>          import codespell_lib
>>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>>> -        codespell_file = os.path.join(codespell_dir, 'data',
>>>>> 'dictionary.txt')
>>>>> -        if not os.path.exists(codespell_file):
>>>>> -            codespell_file = ''
>>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>>> +            if os.path.exists(fn):
>>>>> +                codespell_files.append(fn)
>>>>>      except:
>>>>> -        codespell_file = ''
>>>>> +        pass
>>>>>
>>>>>      try:
>>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>>
>>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>>
>>>>> -        if codespell_file:
>>>>> -            with open(codespell_file) as f:
>>>>> +        for fn in codespell_files:
>>>>> +            with open(fn) as f:
>>>>>                  for line in f.readlines():
>>>>>                      words = line.strip().split('>')[1].strip(',
>>>>> ').split(',')
>>>>>                      for word in words:
>>>>
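
To make the command-line request a bit more concrete, here is a rough
sketch of the kind of thing I mean.  Everything in it is an assumption
for illustration: the option name --codespell-dictionaries, the helper
names, and the use of argparse are all hypothetical (checkpatch.py's own
option handling may well look different), and the default of
'clear,code' simply matches the two files the patch loads.  The naming
scheme ('clear' for dictionary.txt, otherwise dictionary_<name>.txt)
follows the file list quoted above.

```python
import argparse
import os


def dictionary_filename(name):
    # 'clear' is codespell's name for the main dictionary.txt; the other
    # names map to dictionary_<name>.txt, as in the file list above.
    return 'dictionary.txt' if name == 'clear' else 'dictionary_%s.txt' % name


def parse_dictionary_names(argv):
    # Hypothetical option; not an existing checkpatch.py flag.
    parser = argparse.ArgumentParser()
    parser.add_argument('--codespell-dictionaries', default='clear,code',
                        help='comma-separated codespell dictionary names')
    args = parser.parse_args(argv)
    return [n.strip() for n in args.codespell_dictionaries.split(',')
            if n.strip()]


def select_codespell_files(data_dir, names):
    # Keep only the dictionaries that exist in the installed codespell
    # data directory, so a missing file is skipped rather than fatal.
    files = []
    for name in names:
        fn = os.path.join(data_dir, dictionary_filename(name))
        if os.path.exists(fn):
            files.append(fn)
    return files
```

With something like this, open_spell_check_dict() would iterate over the
result of select_codespell_files() instead of a hard-coded list, and the
default keeps today's behavior of the patch.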
