On Jan 14, 11:48 am, Stefan Behnel <stefan...@behnel.de> wrote:
> Sadly, the OP did not clearly state that the required feature
> is really not supported by "textwrap" and in what way textwrap
> behaves differently. That would have helped in answering.

Oh, textwrap doesn’t work for arbitrary Unicode text at all. For example, it separates combining sequences:

    >>> s = "tiếng Việt"  # precomposed
    >>> len(s)
    10
    >>> s = "tiếng Việt"  # combining
    >>> len(s)  # number of unicode characters; ≠ line length
    14
    >>> print(textwrap.fill(s, width=4))  # breaks sequences
    tiê
    ng
    Viê
    t

It also doesn’t know about double-width characters:

    >>> s1 = "日本語のテキト"
    >>> s2 = "12345678901234"  # both s1 and s2 use 14 columns
    >>> print(textwrap.fill(s1, width=7))
    日本語のテキト
    >>> print(textwrap.fill(s2, width=7))
    1234567
    8901234

It doesn’t know about non-ASCII punctuation:

    >>> print(textwrap.fill("abc-def", width=5))  # ASCII minus-hyphen
    abc-
    def
    >>> print(textwrap.fill("abc‐def", width=5))  # true hyphen U+2010
    abc‐d
    ef

It doesn’t know East Asian filling rules (though this is perhaps pushing it a bit beyond textwrap’s goals):

    >>> print(textwrap.fill("日本語、中国語", width=3))
    日本語
    、中国    # should avoid linebreak before CJK punctuation
    語

And it generally doesn’t try to pick good places to break lines at all: it just assumes that 1 character = 1 column and that breaking on ASCII whitespace and hyphens is enough.

We can’t really blame textwrap for that. It is a very simple module, and Unicode line breaking gets complex fast (that’s why the consortium provides a ready-made algorithm). It’s just that, with Python 3’s emphasis on Unicode support, I was surprised not to be able to find a UAX #14 implementation. I thought someone would surely have written one and I simply couldn’t find it, so I asked precisely that.
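
To make the "1 character = 1 column" problem concrete, here is a rough sketch that uses only the standard unicodedata module. The names display_width, clusters and hard_wrap are hypothetical helpers made up for illustration; they are not part of textwrap or any other library. It assumes that characters with a nonzero canonical combining class take no column and that East Asian Wide/Fullwidth characters take two, which is only an approximation. It is not UAX #14: it ignores whitespace, hyphens and CJK punctuation rules entirely.

```python
import unicodedata


def display_width(text):
    """Approximate number of terminal columns used by text (assumption-based)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on the previous character: 0 columns.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # Wide / Fullwidth characters take two columns
        else:
            width += 1
    return width


def clusters(text):
    """Group each base character with the combining marks that follow it.

    This is a simplification of real grapheme cluster segmentation.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def hard_wrap(text, width):
    """Cut text into lines of at most `width` columns, never inside a cluster.

    Only a column-aware hard cut; it does not look for good break points.
    """
    lines, current, used = [], "", 0
    for cluster in clusters(text):
        w = display_width(cluster)
        if current and used + w > width:
            lines.append(current)
            current, used = "", 0
        current += cluster
        used += w
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    decomposed = unicodedata.normalize("NFD", "ti\u1ebfng Vi\u1ec7t")
    print(len(decomposed), display_width(decomposed))
    print(display_width("日本語のテキト"), display_width("12345678901234"))
    for line in hard_wrap("日本語のテキト", 7):
        print(line)
```

With these helpers the decomposed Vietnamese string measures the same as the precomposed one, and the Japanese string is cut by columns rather than by code points. Choosing where a break is actually allowed is the part that still needs UAX #14.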