New submission from Jim Jewett <jimjjew...@gmail.com>:

> http://hg.python.org/cpython/rev/0b5ce36a7a24
> changeset:   74515:0b5ce36a7a24


> +   Casefolding is similar to lowercasing but more aggressive because it is
> +   intended to remove all case distinctions in a string. For example, the German
> +   lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already
> +   lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold`
> +   converts it to ``"ss"``.

Perhaps add the recommendation to canonicalize as well.

A complete, though possibly too long, attempt is below:


Casefolding is similar to lowercasing but more aggressive because it is 
intended to remove all case distinctions in a string. For example, the German 
lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already 
lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold` converts 
it to ``"ss"``.  Note that most case-insensitive matches should also match
compatibility-equivalent characters.
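
For illustration only, a minimal sketch of the difference between the two
methods. It assumes Python 3.3 or later, where str.casefold() exists; the
sample words are arbitrary examples, not part of the proposed wording.

```python
# 'ß' is already lowercase, so lower() leaves it unchanged.
lowered = "ß".lower()
# casefold() applies full case folding and expands it to "ss".
folded = "ß".casefold()

# lower() alone does not make these two spellings compare equal ...
lower_match = "Straße".lower() == "STRASSE".lower()
# ... while casefold() does.
fold_match = "Straße".casefold() == "STRASSE".casefold()

print(lowered, folded, lower_match, fold_match)
```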

The casefolding algorithm is described in section 3.13 of the Unicode Standard.
Per definition D146, a compatibility caseless match can be achieved by:

    from unicodedata import normalize

    def caseless_compat(string):
        # D146: NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))
        nfd_string = normalize("NFD", string)
        nfkd1_string = normalize("NFKD", nfd_string.casefold())
        return normalize("NFKD", nfkd1_string.casefold())
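
A usage sketch, again assuming Python 3.3 or later. The helper name
caseless_compat_equal and the sample strings are hypothetical, chosen only
for illustration; caseless_compat is repeated so the snippet runs on its own.

```python
from unicodedata import normalize

def caseless_compat(string):
    # Same steps as the function above: NFD, casefold, NFKD, casefold, NFKD.
    nfd_string = normalize("NFD", string)
    nfkd1_string = normalize("NFKD", nfd_string.casefold())
    return normalize("NFKD", nfkd1_string.casefold())

def caseless_compat_equal(a, b):
    # Hypothetical helper: compare two strings by their folded forms.
    return caseless_compat(a) == caseless_compat(b)

fullwidth_a = "\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A

# casefold() alone keeps the fullwidth form, so this comparison fails.
plain_fold_match = fullwidth_a.casefold() == "a".casefold()

# With the compatibility normalization steps, the two strings match.
compat_match = caseless_compat_equal(fullwidth_a, "a")

# Precomposed U+00C9 versus 'e' followed by COMBINING ACUTE ACCENT.
canonical_match = caseless_compat_equal("\u00c9", "e\u0301")

print(plain_fold_match, compat_match, canonical_match)
```

The fullwidth letter is the interesting case here: it differs from "a" only
by compatibility equivalence, which casefold() by itself does not remove.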

----------
assignee: docs@python
components: Documentation
messages: 151644
nosy: Jim.Jewett, benjamin.peterson, docs@python
priority: normal
severity: normal
status: open
title: Further improve casefold documentation
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13828>
_______________________________________