On Mon, Sep 5, 2016 at 9:40 AM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> But, 'CAP' appears in 'CAPITAL', which gives more than 1800 matches:
>
> >>> for c in range(32, 0x110000):
> ...     try:
> ...         name = unicodedata.name(chr(c))
> ...     except ValueError:
> ...         continue
> ...     if 'CAP' in name:
> ...         print(c, name)
> ...
> 65 LATIN CAPITAL LETTER A
> 66 LATIN CAPITAL LETTER B
> ..
> .. many other lines, mostly with CAPITAL in them ..
> ..
> 917593 TAG LATIN CAPITAL LETTER Y
> 917594 TAG LATIN CAPITAL LETTER Z
> >>>

FWIW, hex is much more common for displaying Unicode codepoints than
decimal is. So I'd print it like this (incorporating the 'not CAPITAL'
filter):

>>> import unicodedata
>>> for c in range(32, 0x110000):
...     try:
...         name = unicodedata.name(chr(c))
...     except ValueError:
...         continue
...     if 'CAP' in name and 'CAPITAL' not in name:
...         print("U+%04X %s" % (c, name))
...
U+20E3 COMBINING ENCLOSING KEYCAP
U+2293 SQUARE CAP
U+2410 SYMBOL FOR DATA LINK ESCAPE
U+241B SYMBOL FOR ESCAPE
U+2651 CAPRICORN
U+2E3F CAPITULUM
U+A2B9 YI SYLLABLE CAP
U+CC42 HANGUL SYLLABLE CAP
U+101D3 PHAISTOS DISC SIGN CAPTIVE
U+1D10A MUSICAL SYMBOL DA CAPO
U+1F306 CITYSCAPE AT DUSK
U+1F393 GRADUATION CAP
U+1F3D4 SNOW CAPPED MOUNTAIN
U+1F3D9 CITYSCAPE
U+1F51F KEYCAP TEN
U+1F74E ALCHEMICAL SYMBOL FOR CAPUT MORTUUM
>>>

This takes advantage of %04X giving a minimum, but not a maximum, of
four digits :)

ChrisA
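
In case the width behaviour of %04X isn't obvious, here is a minimal sketch that isolates it. The helper name format_codepoint is made up for this illustration and is not part of unicodedata or any other library. The 0 flag pads with zeros, the 4 is only a minimum field width, and nothing is ever truncated, so codepoints outside the BMP just come out with five or six digits.

```python
def format_codepoint(c):
    """Render an integer code point in the conventional U+XXXX style.

    Hypothetical helper, for illustration only.
    """
    # %04X zero-pads to at least four uppercase hex digits but never
    # truncates, so larger values simply produce a longer string.
    return "U+%04X" % c


for c in (0x41, 0x20E3, 0x1F393):
    print(format_codepoint(c))
```

That should print U+0041, U+20E3 and U+1F393, one per line: the first is padded out to four digits, the last is left at its natural five.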