Re: Is Unicode support so hard...
On Sunday, April 21, 2013 at 1:12:43 AM UTC+8, jmfauth wrote:
> In a previous post, http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# , Chris “Kwpolska” Warrick wrote: “Is Unicode support so hard, especially in the 21st century?”
> --
> Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).
>
> But, - I can say, as usual - people prefer to spend their time to make a better Unicode than Unicode and it usually fails. Python does not escape this rule.
> -
> I'm busy with TeX (unicode engine variant), fonts and typography. This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well... I can almost say, a delight.
>
> jmf
> Unicode lover

Supporting Unicode is easy in the language itself. Supporting it across a whole platform, however, also involves the OS and the display and input hardware, which most of the time are not free.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Is Unicode support so hard...
On 4/20/2013 9:37 PM, rusi wrote:
> I believe that the recent correction in unicode performance followed jmf's grumbles

No, the correction followed upon his accurate report of a regression last August, which was unfortunately mixed in with grumbles and inaccurate claims. Others separated out and verified the accurate report. I reported it to pydev and enquired as to its necessity. I believe Mark opened the tracker issue, and the two people who worked on optimizing 3.3 a year ago fairly quickly came up with two different patches. The several-month delay afterwards was a matter of testing and picking the best approach.
Re: Is Unicode support so hard...
On 21/04/2013 10:02, Terry Jan Reedy wrote:
> On 4/20/2013 9:37 PM, rusi wrote:
>> I believe that the recent correction in unicode performance followed jmf's grumbles
>
> No, the correction followed upon his accurate report of a regression last August, which was unfortunately mixed in with grumbles and inaccurate claims. Others separated out and verified the accurate report. I reported it to pydev and enquired as to its necessity. I believe Mark opened the tracker issue, and the two people who worked on optimizing 3.3 a year ago fairly quickly came up with two different patches. The several-month delay afterwards was a matter of testing and picking the best approach.

I'd again like to point out that all I did was raise the issue. It was based on data provided by Steven D'Aprano and confirmed by Serhiy Storchaka.

--
If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence
Is Unicode support so hard...
In a previous post, http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# , Chris “Kwpolska” Warrick wrote: “Is Unicode support so hard, especially in the 21st century?”
--
Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).

But, - I can say, as usual - people prefer to spend their time to make a better Unicode than Unicode and it usually fails. Python does not escape this rule.
-
I'm busy with TeX (unicode engine variant), fonts and typography. This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well... I can almost say, a delight.

jmf
Unicode lover
Re: Is Unicode support so hard...
On 4/20/2013 1:12 PM, jmfauth wrote:
> In a previous post, http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# , Chris “Kwpolska” Warrick wrote: “Is Unicode support so hard, especially in the 21st century?”
> --
> Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).
>
> But, - I can say, as usual - people prefer to spend their time to make a better Unicode than Unicode and it usually fails. Python does not escape this rule.
> -
> I'm busy with TeX (unicode engine variant), fonts and typography. This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well... I can almost say, a delight.
>
> jmf
> Unicode lover

I'm totally confused about what you are saying. What does "make a better Unicode than Unicode" mean? Are you saying that Python is guilty of this? In what way? Can you provide specifics? Or are you saying that you like how Python has implemented it? "FSR is failing ... a delight"? I don't know what you mean.

--Ned.
Re: Is Unicode support so hard...
On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:
> On 4/20/2013 1:12 PM, jmfauth wrote:
>> In a previous post, http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# , Chris “Kwpolska” Warrick wrote: “Is Unicode support so hard, especially in the 21st century?”
>> --
>> Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).
>>
>> But, - I can say, as usual - people prefer to spend their time to make a better Unicode than Unicode and it usually fails. Python does not escape this rule.
>> -
>> I'm busy with TeX (unicode engine variant), fonts and typography. This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well... I can almost say, a delight.
>>
>> jmf
>> Unicode lover
>
> I'm totally confused about what you are saying. What does "make a better Unicode than Unicode" mean? Are you saying that Python is guilty of this? In what way? Can you provide specifics? Or are you saying that you like how Python has implemented it? "FSR is failing ... a delight"? I don't know what you mean.
>
> --Ned.

Don't bother trying to figure this out. jmfauth has been hijacking every thread that mentions Unicode to complain about the flexible string representation introduced in Python 3.3. Apparently, having proper Unicode semantics (indexing is based on code points, not UTF-16 code units) at the expense of performance when calling .replace on the only non-ASCII or non-BMP character in the string is a horrible bug.
Re: Is Unicode support so hard...
On Sun, Apr 21, 2013 at 3:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:
> I'm totally confused about what you are saying. What does "make a better Unicode than Unicode" mean? Are you saying that Python is guilty of this? In what way? Can you provide specifics? Or are you saying that you like how Python has implemented it? "FSR is failing ... a delight"? I don't know what you mean.

You're not familiar with jmf? He's one of our resident trolls. Allow me to summarize Python 3's Unicode support...

From 3.0 up to and including 3.2.x, Python could be built as either "narrow" or "wide". A wide build consumes four bytes per character in every string, which is rather wasteful (given that very few strings actually NEED that); a narrow build gets some things wrong. (I'm using a 2.7 here as I don't have a narrow-build 3.x handy; the same considerations apply, though.)

Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> len(u"asdf\U00012345qwer")
10
>>> u"asdf\U00012345qwer"[8]
u'e'

In a narrow build, strings are stored in UTF-16, so astral characters count as two. This means that a program will behave unexpectedly differently on different platforms (other languages, such as ECMAScript, actually *mandate* UTF-16; at least this means you can depend on this otherwise-bizarre behaviour regardless of what platform you're on), and I have to say this is counter-intuitive.

Enter Python 3.3 and PEP 393 strings. Now *EVERY* Python build is, conceptually, wide. (I'm not sure how PEP 393 applies to other Pythons - Jython, PyPy, etc - so assume that whenever I refer to "Python", I'm restricting this to CPython.) The underlying representation might be more efficient, but to the script, it's exactly the same as a wide build. If a string has no characters that demand more width, it'll be stored nice and narrow.
(It's the same technique that Pike has been using for a while, so it's a proven system; in any case, we know that this is going to work, it's just a question of performance - it adds a fixed overhead.) Great! We save memory in Python programs. Wonderful! Right?

Enter jmf. No, it's not wonderful, because OBVIOUSLY Python is now America-centric, because now the full Unicode range is divided into "these ones get stored in 1 byte per char, these in 2, these in 4". Clearly that's making life way worse for everyone else. Also, compared to the narrow build that jmf was previously using, this uses heaps MORE space in the stupid micro-benchmarks that he keeps on trotting out, because he has just one astral character in a sea of ASCII. And that's totally what programs are doing all the time, too.

Never mind that basic operations like length, slicing, etc are no longer buggy, no, Python has taken a terrible step backwards here. Oh, and check this out:

def munge(s):
    """Move characters around in a string."""
    l=len(s)//4
    return s[:l]+s[l*2:l*3]+s[l:l*2]+s[l*3:]

>>> munge("asdfqwerzxcv1234")
'asdfzxcvqwer1234'

Looks fine.

>>> munge(u"asd\U00012345we\U00034567xc\U00023456bla")
u'asd\U00012167xc\U00023745we\U00034456bla'

Where'd those characters come from? I was just moving stuff around, right? I can't get new characters out of it... can I?

Flash forward to current date, and jmf has hijacked so many threads to moan about PEP 393 that I'm actually happy about this one, simply because he gave it a new subject line and one appropriate to a discussion about Unicode.

ChrisA
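The narrow-build results quoted above can be reproduced on a current Python 3 by doing the slicing on UTF-16 code units explicitly. This is only a rough sketch, not something from the original posts: the helper names `to_utf16_units` and `from_utf16_units` are made up for illustration, and it assumes the "surrogatepass" error handler is accepted by the UTF-16 codec (Python 3.4 or later).

```python
def to_utf16_units(s):
    """Split a str into the 16-bit code units a narrow build would store."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_utf16_units(units):
    """Reassemble code units into a str, pairing up surrogates where possible."""
    data = b"".join(unit.to_bytes(2, "little") for unit in units)
    return data.decode("utf-16-le", "surrogatepass")


def munge(s):
    """Move quarters of a sequence around (same logic as in the post)."""
    l = len(s) // 4
    return s[:l] + s[l * 2:l * 3] + s[l:l * 2] + s[l * 3:]


text = "asd\U00012345we\U00034567xc\U00023456bla"

# Python 3.3+ (PEP 393): slicing operates on whole code points.
wide_result = munge(text)

# Simulated narrow build: slicing operates on UTF-16 code units, so the
# surrogate pairs are cut in half and re-paired with the wrong partner.
units = to_utf16_units(text)
narrow_result = from_utf16_units(munge(units))

print(len(text), len(units))
print(ascii(wide_result))
print(ascii(narrow_result))
```

The three astral characters survive intact in `wide_result`, whereas `narrow_result` contains three different astral characters built from mismatched surrogate halves, which is where the "new" characters in the 2.7 session come from.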
Re: Is Unicode support so hard...
On Sat, Apr 20, 2013 at 8:02 PM, Benjamin Kaplan benjamin.kap...@case.edu wrote:
> On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:
>> On 4/20/2013 1:12 PM, jmfauth wrote:
>>> In a previous post, http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# , Chris “Kwpolska” Warrick wrote: “Is Unicode support so hard, especially in the 21st century?”
>>> --
>>> Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).
>>>
>>> But, - I can say, as usual - people prefer to spend their time to make a better Unicode than Unicode and it usually fails. Python does not escape this rule.
>>> -
>>> I'm busy with TeX (unicode engine variant), fonts and typography. This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well... I can almost say, a delight.
>>>
>>> jmf
>>> Unicode lover
>>
>> I'm totally confused about what you are saying. What does "make a better Unicode than Unicode" mean? Are you saying that Python is guilty of this? In what way? Can you provide specifics? Or are you saying that you like how Python has implemented it? "FSR is failing ... a delight"? I don't know what you mean.
>>
>> --Ned.
>
> Don't bother trying to figure this out. jmfauth has been hijacking every thread that mentions Unicode to complain about the flexible string representation introduced in Python 3.3. Apparently, having proper Unicode semantics (indexing is based on code points, not UTF-16 code units) at the expense of performance when calling .replace on the only non-ASCII or non-BMP character in the string is a horrible bug.

Don’t forget the original context: this was a short remark to a guy I was responding to. His newsgroups software (slrn according to the headers) mangled the encoding of U+201C and U+201D in my From field, turning them into three question marks each.
And jmf started a rant, as usual…

PS. There are two fancy Unicode characters around. Can you find both of them, jmf?

--
Kwpolska http://kwpolska.tk | GPG KEY: 5EAAEA16
stop html mail | always bottom-post
http://asciiribbon.org | http://caliburn.nl/topposting.html
Re: Is Unicode support so hard...
On 20/04/2013 19:02, Benjamin Kaplan wrote:
> On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:
>> On 4/20/2013 1:12 PM, jmfauth wrote:
>>> In a previous post, http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# , Chris “Kwpolska” Warrick wrote: “Is Unicode support so hard, especially in the 21st century?”
>>> --
>>> Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).
>>>
>>> But, - I can say, as usual - people prefer to spend their time to make a better Unicode than Unicode and it usually fails. Python does not escape this rule.
>>> -
>>> I'm busy with TeX (unicode engine variant), fonts and typography. This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well... I can almost say, a delight.
>>>
>>> jmf
>>> Unicode lover
>>
>> I'm totally confused about what you are saying. What does "make a better Unicode than Unicode" mean? Are you saying that Python is guilty of this? In what way? Can you provide specifics? Or are you saying that you like how Python has implemented it? "FSR is failing ... a delight"? I don't know what you mean.
>>
>> --Ned.
>
> Don't bother trying to figure this out. jmfauth has been hijacking every thread that mentions Unicode to complain about the flexible string representation introduced in Python 3.3. Apparently, having proper Unicode semantics (indexing is based on code points, not UTF-16 code units) at the expense of performance when calling .replace on the only non-ASCII or non-BMP character in the string is a horrible bug.

He can't complain about performance for the .replace issue any more, as it's been fixed: http://bugs.python.org/issue16061

Sadly he'll almost certainly have more edge cases up his sleeve while continuing to ignore minor issues like memory saving and correctness.

--
If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython.
Mark Lawrence
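For anyone who wants to check the .replace case on their own interpreter, a micro-benchmark along these lines is one way to do it. It is a sketch under assumptions, not the benchmark used in the tracker issue: the helper name `time_replace`, the 1001-character test string and the repeat counts are all arbitrary choices, and the absolute numbers depend entirely on the machine and the Python version.

```python
import timeit


def time_replace(text, old, new, number=2000):
    """Best-of-three wall time, in seconds, for `number` calls to text.replace."""
    timer = timeit.Timer(lambda: text.replace(old, new))
    return min(timer.repeat(repeat=3, number=number))


# One replaceable character at the end of a sea of ASCII.
ascii_text = "a" * 1000 + "e"

same_width = time_replace(ascii_text, "e", "f")        # result still fits in 1 byte per char
widening = time_replace(ascii_text, "e", "\u20ac")     # result needs 2 bytes per char
astral = time_replace(ascii_text, "e", "\U00012345")   # result needs 4 bytes per char

print(f"same width : {same_width:.6f} s")
print(f"to 2 bytes : {widening:.6f} s")
print(f"to 4 bytes : {astral:.6f} s")
```

Comparing the three timings across interpreter versions shows whether replacing into a wider representation is noticeably slower on a given build.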
Re: Is Unicode support so hard...
Hi jmf,

> This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well...

This is too vague for me. Which string representation should Python use?

1) UTF-32
2) UTF-8
3) Python 3.3 -- 1, 2, or 4 bytes per character decided at runtime
4) Python 3.2 -- 2 or 4 bytes per character decided at Python build time
5) Something else

Neil
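The options in the list above can be compared on payload size with a few lines of Python. This is a back-of-the-envelope sketch: `fsr_width` and `payload_sizes` are hypothetical helper names, the figures count character data only (no object header or terminator), and the "3.3 flexible" figure follows the 1, 2, or 4 bytes per character rule stated in the list rather than measuring a real interpreter.

```python
def fsr_width(s):
    """Bytes per code point under the 1/2/4 rule: chosen by the widest character."""
    highest = max(map(ord, s), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def payload_sizes(s):
    """Approximate character-data size in bytes under each representation."""
    return {
        "utf-32": 4 * len(s),
        "utf-8": len(s.encode("utf-8")),
        "3.3 flexible": fsr_width(s) * len(s),
        "3.2 narrow": len(s.encode("utf-16-le")),  # 2 bytes per UTF-16 code unit
        "3.2 wide": 4 * len(s),
    }


samples = {
    "pure ASCII": "a" * 100,
    "BMP only": "\u20ac" * 100,
    "one astral char in ASCII": "a" * 99 + "\U00012345",
}

for label, sample in samples.items():
    print(label, payload_sizes(sample))
```

The last sample is the one-astral-character-in-ASCII case: there the flexible representation costs as much as UTF-32, while for pure ASCII it costs a quarter of it.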
Re: Is Unicode support so hard...
On 04/20/2013 11:14 AM, Chris Angelico wrote:
> Flash forward to current date, and jmf has hijacked so many threads to moan about PEP 393 that I'm actually happy about this one, simply because he gave it a new subject line and one appropriate to a discussion about Unicode.

+1000
Re: Is Unicode support so hard...
On Apr 21, 4:03 am, Neil Hodgson nhodg...@iinet.net.au wrote:
> Hi jmf,
>
>> This gives me plenty of ideas to test the flexible string representation (FSR). I should recognize this FSR is failing particularly very well...
>
> This is too vague for me. Which string representation should Python use?
>
> 1) UTF-32
> 2) UTF-8
> 3) Python 3.3 -- 1, 2, or 4 bytes per character decided at runtime
> 4) Python 3.2 -- 2 or 4 bytes per character decided at Python build time
> 5) Something else

jmf recommends UTF-8. Apart from the fact that UTF-8 would be less (time) performant in all cases, and far more so in cases like indexing, the fact that jmf says so makes it more ridiculous.

According to jmf python sucks up to ASCII (those big bad Americans… of whom Steven is the first…) whereas unicode is the true international/universal standard.

I guess the irony is clear to all (except jmf) given that:
- it's unicode that sucks up to ASCII by carefully conforming in the first 128 positions, including the completely useless control chars; python just implements the standard
- UTF-8 is an ASCII-biased unicode-compression method, viz. UTF-8 is most space-efficient on ASCII at the cost of being generally time-inefficient
- all jmf's beefs (as far as I remember) are variations on the theme: time-inefficiency is equivalent to non-unicode-compliant

In short he manifests a dog-in-the-manger mindset: since the whole world will never speak French (grief, mope, grumble, thrash…), everyone should pay for the Chinese character set's size even if they are monolingually English.

All that said… I believe that the recent correction in unicode performance followed jmf's grumbles (Mark, please correct me if I am wrong). So the python community can be thankful to jmf even if he insists on laboring under bizarre political hallucinations.

[Written from India, where a monolingual person is as rare as a palm tree on a polar cap]
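The point about indexing can be made concrete. With a variable-width encoding such as UTF-8, finding the n-th character means walking the bytes from the start, whereas a fixed-width layout can jump straight to it. The sketch below is illustrative only: `utf8_char_at` is a made-up helper, it assumes its input is valid UTF-8, and it says nothing about how any real string implementation works internally.

```python
def utf8_char_at(data, index):
    """Return the index-th code point of valid UTF-8 `data` by linear scan."""
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    pos = 0    # byte offset of the current character
    seen = 0   # number of characters skipped so far
    while pos < len(data):
        lead = data[pos]
        # The lead byte alone tells us how many bytes the character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError("code point index out of range")


encoded = "a\xe9\u20ac\U00012345z".encode("utf-8")
print([ascii(utf8_char_at(encoded, i)) for i in range(5)])

# ASCII compatibility: each of the first 128 code points encodes to a single identical byte.
print(all(chr(i).encode("utf-8") == bytes([i]) for i in range(128)))
```

The cost of each lookup grows with the index, which is the time-for-space trade-off the list above describes.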
Re: Is Unicode support so hard...
On Sat, 20 Apr 2013 18:37:00 -0700, rusi wrote:
> According to jmf python sucks up to ASCII (those big bad Americans… of whom Steven is the first…)

Watch who you're calling an American, mate.

--
Steven
Re: Is Unicode support so hard...
On Sun, Apr 21, 2013 at 1:36 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> On Sat, 20 Apr 2013 18:37:00 -0700, rusi wrote:
>> According to jmf python sucks up to ASCII (those big bad Americans… of whom Steven is the first…)
>
> Watch who you're calling an American, mate.

I think he knows, and that's why he said it. You and I are foremost among Americans who are destroying Python.

ChrisA