brian.gallagher <oss.brn...@gmail.com> added the comment:

Just giving this a bump, in case it has been forgotten about. I've posted a
patch at https://github.com/python/cpython/pull/18983.

It adds a new parameter "ignorecase" to get_close_matches(). If set to True,
the SequenceMatcher compares characters case-insensitively (as determined by
str.lower()).

The benefit of this keyword, as opposed to letting the application handle the
normalization, is that it saves memory. If the application has to normalize
the strings and supply a separate list to get_close_matches(), it also has to
maintain a mapping between each normalized string and its original.

As an example:

>>> from difflib import get_close_matches
>>> word = 'apple'
>>> possibilities = ['apPLE', 'APPLE', 'APE', 'Banana', 'Fruit', 'PEAR',
...                  'CoCoNuT']
>>> normalized_possibilities = {p.lower(): p for p in possibilities}
>>> result = get_close_matches(word, normalized_possibilities.keys())
>>> result
['apple', 'ape']
>>> normalized_result = [normalized_possibilities[r] for r in result]
>>> normalized_result
['APPLE', 'APE']

By letting the SequenceMatcher handle the casing on the fly, we could
potentially save a large amount of memory when someone provides a huge list to
get_close_matches().

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39891>
_______________________________________
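
For readers who want to see what "handling the casing on the fly" could look like with today's difflib, here is a minimal sketch. It is not the code from the pull request: close_matches_ignorecase is a hypothetical helper name, and the body only mirrors the general shape of get_close_matches() using public SequenceMatcher methods. Each candidate is lowercased one at a time, so no normalized copy of the list and no mapping back to the originals is kept.

```python
from difflib import SequenceMatcher
from heapq import nlargest


def close_matches_ignorecase(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical helper (an assumption, not the patch in the PR).

    Returns up to n of the original strings from possibilities whose
    lowercased form is similar enough to the lowercased word.
    """
    matcher = SequenceMatcher()
    matcher.set_seq2(word.lower())
    scored = []
    for candidate in possibilities:
        # Lowercase a single candidate; the temporary is discarded
        # after scoring, and only the original string is stored.
        matcher.set_seq1(candidate.lower())
        # Cheap upper bounds first, full ratio only if they pass.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff):
            score = matcher.ratio()
            if score >= cutoff:
                scored.append((score, candidate))
    # Best scores first; ties are broken by comparing the strings.
    return [candidate for _, candidate in nlargest(n, scored)]


possibilities = ['apPLE', 'APPLE', 'APE', 'Banana', 'Fruit', 'PEAR',
                 'CoCoNuT']
print(close_matches_ignorecase('apple', possibilities))
```

With the list from the example above this prints ['apPLE', 'APPLE', 'APE']. Unlike the dict-based workaround, both 'apPLE' and 'APPLE' survive, because nothing is keyed on the lowercased string.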