On 2 Apr, 18:57, rusi <rustompm...@gmail.com> wrote:
> On Apr 2, 8:17 pm, Ethan Furman <et...@stoneleaf.us> wrote:
>
> > Simmons (too many Steves!), I know you're new so don't have all the history
> > with jmf that many of us do, but consider that the original post was about
> > numbers, had nothing to do with characters or unicode *in any way*, and yet
> > jmf still felt the need to bring unicode up.
>
> Just for reference, here is the starting para of Chris' original mail
> that started this thread.
>
> > The Python 3 merge of int and long has effectively penalized
> > small-number arithmetic by removing an optimization. As we've seen
> > from PEP 393 strings (jmf aside), there can be huge benefits from
> > having a single type with multiple representations internally. Is
> > there value in making the int type have a machine-word optimization in
> > the same way?
>
> ie it mentions numbers, strings, PEP 393 *AND jmf.*  So while it is
> true that jmf has been butting in with trollish behavior into
> completely unrelated threads with his unicode rants, that cannot be
> said for this thread.

-----

That's because you did not understand the analogy, int/long <-> FSR.

Another illustration:

>>> def AddOne(i):
...     if 0 < i <= 100:
...         return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
...     elif 100 < i <= 1000:
...         return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
...     else:
...         return i + 1
...

Does it work? Yes.
Is it "correct"? That can be discussed.
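
For what it's worth, a quick check of the "works" part. This is only a sketch: add_one is the function above under a hypothetical snake_case name, and the tested range is an arbitrary choice. All three branches give the same result as i + 1; only the amount of work differs per range.

```python
def add_one(i):
    # Same three branches as AddOne above: identical result, different work.
    if 0 < i <= 100:
        return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
    elif 100 < i <= 1000:
        return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
    else:
        return i + 1

# Compare against the plain i + 1 over a range covering all three branches.
all_match = all(add_one(i) == i + 1 for i in range(-10, 2001))
print(all_match)
```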

Now replace i by a char, a representative of each "subset"
of the FSR, select a method where the FSR behaves badly,
and take a look at what happens.


>>> import timeit
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
>>> timeit.repeat("'a' * 1000 + '9'")
[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]
>>>

>>> timeit.repeat("'a' * 1000 + '€'")
[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
>>> timeit.repeat("'a' * 1000 + '\u2345'")
[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]
>>>

>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[1.237314015362017, 1.2226262553064657, 1.21994619397816]
>>> timeit.repeat("'œ' * 1000 + '\U00010002'")
[1.245773635836997, 1.2303978424029651, 1.2258257877430765]
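
The three groups of timings correspond to the three internal widths described in PEP 393 (1, 2 or 4 bytes per code point, chosen from the largest code point in the string). A rough sketch of that selection rule, where fsr_width is a hypothetical helper written for illustration and not a CPython API:

```python
def fsr_width(s):
    """Bytes per code point that the PEP 393 layout would pick for s."""
    top = max(map(ord, s), default=0)  # largest code point; 0 for ''
    if top < 0x100:      # ASCII / latin-1 range
        return 1
    if top < 0x10000:    # rest of the BMP
        return 2
    return 4             # supplementary planes

# One sample per group of timings: 'z', EURO SIGN, and U+10001 after 'œ'.
samples = ['a' * 1000 + 'z',
           'a' * 1000 + '\u20ac',
           '\u0153' * 1000 + '\U00010001']
widths = [fsr_width(s) for s in samples]
print(widths)
```

In each test the concatenation produces a result of the width shown, so the last character alone decides which representation the whole 1001-character string ends up in.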

Where does it come from? Simple: the FSR breaks the
simple rules used in all coding schemes (unicode or not):
1) a unique set of chars
2) the "same" algorithm for all chars.

And again, that's why utf-8 works so smoothly.

The "corporates" who understood this very well and
wanted to incorporate, let's say, the characters used
in the French language had no choice but to
create new coding schemes (e.g. mac-roman, cp1252).

In unicode, the "latin-1" range is a real plague.

After years of experience, I'm still fascinated to see that
the corporates have solved this issue easily while "free
software" is still relying on latin-1.
I have never managed to find an explanation.

Even the TeX folks, when they shifted to the Cork
encoding in 199?, were aware of this and consequently
provided special package(s).

No offense, this is in my mind why "corporate software"
will always be "corporate software" and "hobbyist software"
will always stay at the level of "hobbyist software".

A French Windows user who understands nothing about the
coding of characters, assuming he is even aware of its
existence (!), certainly has no problem.


Fascinating how it is possible to use Python to teach,
to illustrate, to explain the coding of the characters. No?


jmf

-- 
http://mail.python.org/mailman/listinfo/python-list
