On 12/03/2016 12:13, Marko Rauhamaa wrote:
> BartC <b...@freeuk.com>:
>> If you're looking at fast processing of language source code (in a
>> thread partly about efficiency), then you cannot ignore the fact that
>> the vast majority of characters being processed are going to have
>> ASCII codes.
>
> I don't know why you would optimize for inputting program source code.
> Text in general has left ASCII behind a long time ago. Just go to
> Wikipedia and click on any of the other languages.
>
> Why, look at the *English* page on Hillary Clinton:
>
> Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born
> October 26, 1947) is an American politician.
> <URL: https://en.wikipedia.org/wiki/Hillary_Clinton>
>
> You couldn't get past the first sentence in ASCII.
I saved that page locally as a .htm file in UTF-8 encoding. I ran a
modified version of my benchmark, and it appeared that 99.7% of the
bytes had ASCII codes. The other 0.3% were presumably parts of multi-byte
sequences, so the actual proportion of non-ASCII characters would be
even lower.
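
A minimal Python sketch of that kind of measurement, assuming the saved page is valid UTF-8. `ascii_stats` is a hypothetical helper, not the benchmark used above. It reports both the byte share and the character share, which shows why the second can only be higher: in UTF-8 every ASCII character is one byte, while every other character takes two to four.

```python
import sys


def ascii_stats(data: bytes) -> tuple[float, float]:
    """Return (fraction of ASCII bytes, fraction of ASCII characters).

    Assumes `data` is valid UTF-8 (hypothetical helper for illustration).
    """
    if not data:
        return (1.0, 1.0)
    # Bytes below 0x80 are ASCII; all bytes of a multi-byte sequence are >= 0x80.
    ascii_bytes = sum(1 for b in data if b < 0x80)
    text = data.decode("utf-8")
    ascii_chars = sum(1 for ch in text if ord(ch) < 0x80)
    return ascii_bytes / len(data), ascii_chars / len(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ascii_stats.py page.htm  (file names are placeholders)
    with open(sys.argv[1], "rb") as f:
        byte_frac, char_frac = ascii_stats(f.read())
    print(f"ASCII bytes: {byte_frac:.1%}, ASCII characters: {char_frac:.1%}")
```

For example, "aé" encodes to three bytes, so only 1/3 of the bytes are ASCII, yet half of the characters are.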
I then saved the Arabic version of the page, which visually, when
rendered, consists of 99% Arabic script. But the .htm file was still 80%
ASCII!
So what were you saying about ASCII being practically obsolete ... ?
--
Bartc