On 12/03/2016 12:13, Marko Rauhamaa wrote:
> BartC <b...@freeuk.com>:

>> If you're looking at fast processing of language source code (in a
>> thread partly about efficiency), then you cannot ignore the fact that
>> the vast majority of characters being processed are going to have
>> ASCII codes.

> I don't know why you would optimize for inputting program source code.
> Text in general has left ASCII behind a long time ago. Just go to
> Wikipedia and click on any of the other languages.
>
> Why, look at the *English* page on Hillary Clinton:
>
>     Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born
>     October 26, 1947) is an American politician.
>     <URL: https://en.wikipedia.org/wiki/Hillary_Clinton>
>
> You couldn't get past the first sentence in ASCII.

I saved that page locally as a .htm file in UTF-8 encoding and ran a modified version of my benchmark over it: 99.7% of the bytes had ASCII codes. The remaining 0.3% were presumably parts of multi-byte UTF-8 sequences, so the proportion of non-ASCII *characters* would be even lower, since each such character accounts for two or more of those bytes.
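
Anyone who wants to reproduce that kind of measurement can do it in a few lines. A minimal Python 3 sketch (my own code, not the actual benchmark; the filename is just an assumption):

    # Estimate how much of a UTF-8 file is ASCII, by bytes and by characters.
    # Sketch only; "Hillary_Clinton.htm" is an assumed local filename.
    def ascii_proportions(path):
        with open(path, "rb") as f:
            data = f.read()
        # Byte level: in UTF-8 every byte below 0x80 is an ASCII character;
        # bytes >= 0x80 only ever occur inside multi-byte sequences.
        ascii_bytes = sum(1 for b in data if b < 0x80)
        # Character level: decode, then count code points below U+0080.
        text = data.decode("utf-8")
        ascii_chars = sum(1 for c in text if ord(c) < 0x80)
        return ascii_bytes / len(data), ascii_chars / len(text)

    byte_ratio, char_ratio = ascii_proportions("Hillary_Clinton.htm")
    print("ASCII bytes: {:.1%}, ASCII characters: {:.1%}".format(byte_ratio, char_ratio))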

I then saved the Arabic version of the page, which, when rendered, is visually about 99% Arabic script. Yet the .htm file was still 80% ASCII!
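
There's no great mystery to that: the HTML markup, attribute names and embedded scripts are all plain ASCII, while each Arabic letter takes only two bytes in UTF-8, so the markup dominates the byte count. A made-up snippet (mine, not taken from the page) shows the effect:

    # Illustration only: minimal markup around the Arabic word for "hello".
    snippet = '<p dir="rtl">مرحبا</p>'
    data = snippet.encode("utf-8")
    ascii_bytes = sum(1 for b in data if b < 0x80)
    print(ascii_bytes, len(data))   # 17 of 27 bytes are ASCII, about 63%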

So what were you saying about ASCII being practically obsolete ... ?

--
Bartc