Re: Is Unicode support so hard...

2013-04-21 Thread 88888 Dihedral
On Sunday, April 21, 2013 at 1:12:43 AM UTC+8, jmfauth wrote:
 In a previous post,

 http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
 ,

 Chris “Kwpolska” Warrick wrote:

 “Is Unicode support so hard, especially in the 21st century?”

 --

 Unicode is not really complicated and it works very well (more
 than two decades of development if you take into account
 iso-14).

 But, - I can say, as usual - people prefer to spend their
 time to make a better Unicode than Unicode and it usually
 fails. Python does not escape this rule.

 -

 I'm busy with TeX (unicode engine variant), fonts and typography.
 This gives me plenty of ideas to test the flexible string
 representation (FSR). I should recognize this FSR is failing
 particularly very well...

 I can almost say, a delight.

 jmf
 Unicode lover

Supporting Unicode is easy in the language itself.
But supporting Unicode on a platform also involves
the OS and the display and input hardware devices,
which are not free most of the time.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is Unicode support so hard...

2013-04-21 Thread Terry Jan Reedy

On 4/20/2013 9:37 PM, rusi wrote:


I believe that the recent correction in unicode performance followed
jmf's grumbles


No, the correction followed upon his accurate report of a regression,
last August, which was unfortunately mixed in with grumbles and
inaccurate claims. Others separated out and verified the accurate
report. I reported it to pydev and enquired as to its necessity; I
believe Mark opened the tracker issue, and the two people who worked on
optimizing 3.3 a year ago fairly quickly came up with two different
patches. The several-month delay afterwards was a matter of testing and
picking the best approach.





Re: Is Unicode support so hard...

2013-04-21 Thread Mark Lawrence

On 21/04/2013 10:02, Terry Jan Reedy wrote:

On 4/20/2013 9:37 PM, rusi wrote:


I believe that the recent correction in unicode performance followed
jmf's grumbles


No, the correction followed upon his accurate report of a regression,
last August, which was unfortunately mixed in with grumbles and
inaccurate claims. Others separated out and verified the accurate
report. I reported it to pydev and enquired as to its necessity, I
believe Mark opened the tracker issue, and the two people who worked on
optimizing 3.3 a year ago fairly quickly came up with two different
patches. The several month delay after was a matter of testing and
picking the best approach.




I'd again like to point out that all I did was raise the issue.  It was 
based on data provided by Steven D'Aprano and confirmed by Serhiy Storchaka.


--
If you're using GoogleCrap™ please read this 
http://wiki.python.org/moin/GoogleGroupsPython.


Mark Lawrence



Is Unicode support so hard...

2013-04-20 Thread jmfauth
In a previous post,

http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
,

Chris “Kwpolska” Warrick wrote:

“Is Unicode support so hard, especially in the 21st century?”

--

Unicode is not really complicated and it works very well (more
than two decades of development if you take into account
iso-14).

But, - I can say, as usual - people prefer to spend their
time to make a better Unicode than Unicode and it usually
fails. Python does not escape this rule.

-

I'm busy with TeX (unicode engine variant), fonts and typography.
This gives me plenty of ideas to test the flexible string
representation (FSR). I should recognize this FSR is failing
particularly very well...

I can almost say, a delight.

jmf
Unicode lover


Re: Is Unicode support so hard...

2013-04-20 Thread Ned Batchelder

On 4/20/2013 1:12 PM, jmfauth wrote:

In a previous post,

http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
,

Chris “Kwpolska” Warrick wrote:

“Is Unicode support so hard, especially in the 21st century?”

--

Unicode is not really complicate and it works very well (more
than two decades of development if you take into account
iso-14).

But, - I can say, as usual - people prefer to spend their
time to make a better Unicode than Unicode and it usually
fails. Python does not escape to this rule.

-

I'm busy with TeX (unicode engine variant), fonts and typography.
This gives me plenty of ideas to test the flexible string
representation (FSR). I should recognize this FSR is failing
particulary very well...

I can almost say, a delight.

jmf
Unicode lover

I'm totally confused about what you are saying.  What does "make a
better Unicode than Unicode" mean?  Are you saying that Python is guilty
of this?  In what way?  Can you provide specifics?  Or are you saying
that you like how Python has implemented it?  "FSR is failing ... a
delight"?  I don't know what you mean.


--Ned.


Re: Is Unicode support so hard...

2013-04-20 Thread Benjamin Kaplan
On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:
 On 4/20/2013 1:12 PM, jmfauth wrote:

 In a previous post,


 http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
 ,

 Chris “Kwpolska” Warrick wrote:

 “Is Unicode support so hard, especially in the 21st century?”

 --

 Unicode is not really complicate and it works very well (more
 than two decades of development if you take into account
 iso-14).

 But, - I can say, as usual - people prefer to spend their
 time to make a better Unicode than Unicode and it usually
 fails. Python does not escape to this rule.

 -

 I'm busy with TeX (unicode engine variant), fonts and typography.
 This gives me plenty of ideas to test the flexible string
 representation (FSR). I should recognize this FSR is failing
 particulary very well...

 I can almost say, a delight.

 jmf
 Unicode lover

 I'm totally confused about what you are saying.  What does make a better
 Unicode than Unicode mean?  Are you saying that Python is guilty of this?
 In what way?  Can you provide specifics?  Or are you saying that you like
 how Python has implemented it?  FSR is failing ... a delight?  I don't
 know what you mean.

 --Ned.

Don't bother trying to figure this out. jmfauth has been hijacking
every thread that mentions Unicode to complain about the flexible
string representation introduced in Python 3.3. Apparently, having
proper Unicode semantics (indexing is based on code points, not
UTF-16 code units) at the expense of performance when calling .replace
on the only non-ASCII or non-BMP character in the string is a horrible
bug.


Re: Is Unicode support so hard...

2013-04-20 Thread Chris Angelico
On Sun, Apr 21, 2013 at 3:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:
 I'm totally confused about what you are saying.  What does make a better
 Unicode than Unicode mean?  Are you saying that Python is guilty of this?
 In what way?  Can you provide specifics?  Or are you saying that you like
 how Python has implemented it?  FSR is failing ... a delight?  I don't
 know what you mean.

You're not familiar with jmf? He's one of our resident trolls. Allow
me to summarize Python 3's Unicode support...

From 3.0 up to and including 3.2.x, Python could be built as either
narrow or wide. A wide build consumes four bytes per character in
every string, which is rather wasteful (given that very few strings
actually NEED that); a narrow build gets some things wrong. (I'm using
a 2.7 here as I don't have a narrow-build 3.x handy; the same
considerations apply, though.)

Python 2.7.4 (default, Apr  6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> len(u"asdf\U00012345qwer")
10
>>> u"asdf\U00012345qwer"[8]
u'e'

In a narrow build, strings are stored in UTF-16, so astral characters
count as two. This means that a program will behave unexpectedly
differently on different platforms (other languages, such as
ECMAScript, actually *mandate* UTF-16; at least this means you can
depend on this otherwise-bizarre behaviour regardless of what platform
you're on), and I have to say this is counter-intuitive.
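
On Python 3.3 or later, the same string can be measured both ways, which makes the narrow-build count above easy to reproduce. This is only a minimal sketch, assuming a Python 3.3+ interpreter; the helper name utf16_units is made up for illustration and is not part of any library.

```python
def utf16_units(s):
    """Count the UTF-16 code units needed for s.

    This is the number a narrow build reported from len().
    """
    return len(s.encode("utf-16-le")) // 2


s = "asdf\U00012345qwer"
code_points = len(s)     # one per character on Python 3.3+
units = utf16_units(s)   # the astral character needs a surrogate pair

print(code_points, units)
```

The two numbers differ by exactly one here because the string contains a single character outside the BMP.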

Enter Python 3.3 and PEP 393 strings. Now *EVERY* Python build is,
conceptually, wide. (I'm not sure how PEP 393 applies to other Pythons
- Jython, PyPy, etc - so assume that whenever I refer to Python, I'm
restricting this to CPython.) The underlying representation might be
more efficient, but to the script, it's exactly the same as a wide
build. If a string has no characters that demand more width, it'll be
stored nice and narrow. (It's the same technique that Pike has been
using for a while, so it's a proven system; in any case, we know that
this is going to work, it's just a question of performance - it adds a
fixed overhead.) Great! We save memory in Python programs. Wonderful!
Right?
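
A rough way to watch the 1/2/4-byte storage in action is to grow a string by one character and see how much sys.getsizeof changes. This is a sketch that assumes CPython 3.3 or later (other implementations may report different numbers or none at all); bytes_per_char is a hypothetical helper name.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string of ch by one."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


# ASCII, Latin-1, BMP and astral sample characters.
samples = ("a", "\xe9", "\u20ac", "\U00012345")
widths = [bytes_per_char(ch) for ch in samples]

for ch, width in zip(samples, widths):
    print("U+%04X" % ord(ch), width)
```

On CPython this should report 1, 1, 2 and 4 bytes respectively: the width is chosen per string from the widest character it contains.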

Enter jmf. No, it's not wonderful, because OBVIOUSLY Python is now
America-centric, because now the full Unicode range is divided into
"these ones get stored in 1 byte per char, these in 2, these in 4".
Clearly that's making life way worse for everyone else. Also, compared
to the narrow build that jmf was previously using, this uses heaps
MORE space in the stupid micro-benchmarks that he keeps on trotting
out, because he has just one astral character in a sea of ASCII. And
that's totally what programs are doing all the time, too. Never mind
that basic operations like length, slicing, etc are no longer buggy,
no, Python has taken a terrible step backwards here.

Oh, and check this out:

>>> def munge(s):
    """Move characters around in a string."""
    l=len(s)//4
    return s[:l]+s[l*2:l*3]+s[l:l*2]+s[l*3:]

>>> munge("asdfqwerzxcv1234")
'asdfzxcvqwer1234'

Looks fine.

>>> munge(u"asd\U00012345we\U00034567xc\U00023456bla")
u'asd\U00012167xc\U00023745we\U00034456bla'

Where'd those characters come from? I was just moving stuff around,
right? I can't get new characters out of it... can I?
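
For anyone without a narrow build at hand, the effect can be imitated on Python 3.4+ by doing the same shuffle on UTF-16 code units. This is a sketch under that assumption, not how a narrow build was implemented; munge_utf16 is a made-up name, and the slicing simply splits surrogate pairs the way narrow-build indexing did.

```python
def munge(s):
    """Move characters around in a string (the function from the post)."""
    l = len(s) // 4
    return s[:l] + s[l*2:l*3] + s[l:l*2] + s[l*3:]


def munge_utf16(s):
    """Apply the same shuffle to UTF-16 code units instead of code points."""
    units = s.encode("utf-16-le", "surrogatepass")  # 2 bytes per code unit
    n = len(units) // 2
    l = n // 4

    def part(a, b):
        # Slice by code-unit index, which may cut a surrogate pair in half.
        return units[2 * a:2 * b]

    shuffled = part(0, l) + part(2 * l, 3 * l) + part(l, 2 * l) + part(3 * l, n)
    return shuffled.decode("utf-16-le", "surrogatepass")


s = "asd\U00012345we\U00034567xc\U00023456bla"
by_code_point = munge(s)       # what Python 3.3+ does
by_code_unit = munge_utf16(s)  # what a narrow build effectively did

print(sorted(by_code_point) == sorted(s))  # same characters, reordered
print(sorted(by_code_unit) == sorted(s))   # halves recombined into new characters
```

The code-point version only rearranges the characters it was given; the code-unit version pairs the high half of one astral character with the low half of another, which is where the "new" characters come from.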

Flash forward to current date, and jmf has hijacked so many threads to
moan about PEP 393 that I'm actually happy about this one, simply
because he gave it a new subject line and one appropriate to a
discussion about Unicode.

ChrisA


Re: Is Unicode support so hard...

2013-04-20 Thread Chris “Kwpolska” Warrick
On Sat, Apr 20, 2013 at 8:02 PM, Benjamin Kaplan
benjamin.kap...@case.edu wrote:
 On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder n...@nedbatchelder.com 
 wrote:
 On 4/20/2013 1:12 PM, jmfauth wrote:

 In a previous post,


 http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
 ,

 Chris “Kwpolska” Warrick wrote:

 “Is Unicode support so hard, especially in the 21st century?”

 --

 Unicode is not really complicate and it works very well (more
 than two decades of development if you take into account
 iso-14).

 But, - I can say, as usual - people prefer to spend their
 time to make a better Unicode than Unicode and it usually
 fails. Python does not escape to this rule.

 -

 I'm busy with TeX (unicode engine variant), fonts and typography.
 This gives me plenty of ideas to test the flexible string
 representation (FSR). I should recognize this FSR is failing
 particulary very well...

 I can almost say, a delight.

 jmf
 Unicode lover

 I'm totally confused about what you are saying.  What does make a better
 Unicode than Unicode mean?  Are you saying that Python is guilty of this?
 In what way?  Can you provide specifics?  Or are you saying that you like
 how Python has implemented it?  FSR is failing ... a delight?  I don't
 know what you mean.

 --Ned.

 Don't bother trying to figure this out. jmfauth has been hijacking
 every thread that mentions Unicode to complain about the flexible
 string representation introduced in Python 3.3. Apparently, having
 proper Unicode semantics (indexing is based on characters, not code
 points) at the expense of performance when calling .replace on the
 only non-ASCII or BMP character in the string is a horrible bug.

Don’t forget the original context: this was a short remark to a guy I
was responding to.  His newsgroups software (slrn according to the
headers) mangled the encoding of U+201C and U+201D in my From field,
turning them into three question marks each.  And jmf started a rant,
as usual…

PS. There are two fancy Unicode characters around.  Can you find both
of them, jmf?

--
Kwpolska http://kwpolska.tk | GPG KEY: 5EAAEA16
stop html mail| always bottom-post
http://asciiribbon.org| http://caliburn.nl/topposting.html


Re: Is Unicode support so hard...

2013-04-20 Thread Mark Lawrence

On 20/04/2013 19:02, Benjamin Kaplan wrote:

On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder n...@nedbatchelder.com wrote:

On 4/20/2013 1:12 PM, jmfauth wrote:


In a previous post,


http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
,

Chris “Kwpolska” Warrick wrote:

“Is Unicode support so hard, especially in the 21st century?”

--

Unicode is not really complicate and it works very well (more
than two decades of development if you take into account
iso-14).

But, - I can say, as usual - people prefer to spend their
time to make a better Unicode than Unicode and it usually
fails. Python does not escape to this rule.

-

I'm busy with TeX (unicode engine variant), fonts and typography.
This gives me plenty of ideas to test the flexible string
representation (FSR). I should recognize this FSR is failing
particulary very well...

I can almost say, a delight.

jmf
Unicode lover


I'm totally confused about what you are saying.  What does make a better
Unicode than Unicode mean?  Are you saying that Python is guilty of this?
In what way?  Can you provide specifics?  Or are you saying that you like
how Python has implemented it?  FSR is failing ... a delight?  I don't
know what you mean.

--Ned.


Don't bother trying to figure this out. jmfauth has been hijacking
every thread that mentions Unicode to complain about the flexible
string representation introduced in Python 3.3. Apparently, having
proper Unicode semantics (indexing is based on characters, not code
points) at the expense of performance when calling .replace on the
only non-ASCII or BMP character in the string is a horrible bug.



He can't complain about performance for the .replace issue any more, as
it's been fixed: http://bugs.python.org/issue16061


Sadly he'll almost certainly have more edge cases up his sleeve while 
continuing to ignore minor issues like memory saving and correctness.


--
If you're using GoogleCrap™ please read this 
http://wiki.python.org/moin/GoogleGroupsPython.


Mark Lawrence



Re: Is Unicode support so hard...

2013-04-20 Thread Neil Hodgson

   Hi jmf,


This gives me plenty of ideas to test the flexible string
representation (FSR). I should recognize this FSR is failing
particulary very well...


   This is too vague for me.

   Which string representation should Python use?
1) UTF-32
2) UTF-8
3) Python 3.3 -- 1, 2, or 4 bytes per character decided at runtime
4) Python 3.2 -- 2 or 4 bytes per character decided at Python build time
5) Something else

   Neil


Re: Is Unicode support so hard...

2013-04-20 Thread Ethan Furman

On 04/20/2013 11:14 AM, Chris Angelico wrote:

Flash forward to current date, and jmf has hijacked so many threads to
moan about PEP 393 that I'm actually happy about this one, simply
because he gave it a new subject line and one appropriate to a
discussion about Unicode.


+1000


Re: Is Unicode support so hard...

2013-04-20 Thread rusi
On Apr 21, 4:03 am, Neil Hodgson nhodg...@iinet.net.au wrote:
     Hi jmf,

  This gives me plenty of ideas to test the flexible string
  representation (FSR). I should recognize this FSR is failing
  particulary very well...

     This is too vague for me.

     Which string representation should Python use?
 1) UTF-32
 2) UTF-8
 3) Python 3.3 -- 1, 2, or 4 bytes per character decided at runtime
 4) Python 3.2 -- 2 or 4 bytes per character decided at Python build time
 5) Something else

jmf recommends UTF-8.

Apart from the fact that UTF-8 would be less (time) performant in all
cases and more extremely so in cases like indexing, the fact that jmf
says so makes it more ridiculous.
According to jmf python sucks up to ASCII (those big bad Americans… of
whom Steven is the first…) whereas unicode is the true
international/universal standard.

I guess the irony is clear to all (except jmf) given that:
- it's unicode that sucks up to ASCII by carefully conforming in the
first 128 positions including the completely useless control chars;
python just implements the standard
- UTF-8 is an ASCII-biased unicode-compression method viz UTF-8 is
most space-efficient on ASCII at the cost of being generally
time-inefficient
- All jmf's beefs (as far as I remember) are variations on the theme:
time-inefficiency is equivalent to non-unicode-compliant
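
The space/time trade-off in the points above can be sketched in a few lines. This is only an illustration, assuming Python 3; encoded_sizes and utf8_index are hypothetical helpers, and the linear scan stands in for what any plain UTF-8 string type has to do to find the n-th character.

```python
def encoded_sizes(text):
    """Bytes needed to store text in each fixed encoding scheme."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def utf8_index(data, n):
    """Return the n-th code point of UTF-8 bytes by scanning from the start."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for i, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a new code point starts
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


print(encoded_sizes("hello"))       # ASCII: UTF-8 is the smallest
print(encoded_sizes("\U00012345"))  # astral: all three need 4 bytes

data = "a\xe9\u20ac\U00012345z".encode("utf-8")
print(utf8_index(data, 3) == "\U00012345")
```

UTF-8 wins on size for ASCII text, but finding character n means walking the bytes from the beginning, whereas a fixed-width representation can jump straight to it.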

In short he manifests a dog-in-the-manger mindset:
Since the whole world will never speak French (grief, mope, grumble,
thrash…) everyone should pay for the Chinese character set's size even
if they are monolingually English

All that said…

I believe that the recent correction in unicode performance followed
jmf's grumbles
(Mark please correct me if I am wrong)
So python community can be thankful to jmf even if he insists on
laboring under bizarre political hallucinations.

[Written from India where a monolingual person is as rare as a
palmtree on a polecap]


Re: Is Unicode support so hard...

2013-04-20 Thread Steven D'Aprano
On Sat, 20 Apr 2013 18:37:00 -0700, rusi wrote:

 According to jmf python sucks up to ASCII (those big bad Americans… of
 whom Steven is the first…) 

Watch who you're calling an American, mate.


-- 
Steven


Re: Is Unicode support so hard...

2013-04-20 Thread Chris Angelico
On Sun, Apr 21, 2013 at 1:36 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 On Sat, 20 Apr 2013 18:37:00 -0700, rusi wrote:

 According to jmf python sucks up to ASCII (those big bad Americans… of
 whom Steven is the first…)

 Watch who you're calling an American, mate.

I think he knows, and that's why he said it. You and I are foremost
among Americans who are destroying Python.

ChrisA