On Friday, May 27, 2016 at 9:39:19 PM UTC+5:30, Random832 wrote:
> On Fri, May 27, 2016, at 11:53, Rustom Mody wrote:
> > And coding systems are VERY political.
> > Sure what characters are put in (and not) is political
> > But more invisible but equally political is the collating order.
> >
> > eg No one understands what jmf's gripes are... My guess is that a Euro
> > costs 3 times a Dollar.
> >
> > >>> "€".encode("UTF-8")
> > b'\xe2\x82\xac'
> > >>> "$".encode("UTF-8")
> > b'$'
> >
> > [Its another matter that this is not the evil deed of python but of
> > UTF-8!]
>
> AIUI jmf's issue is that python's string type (nothing to do with UTF-8)
> doesn't treat all strings equally. Strings that are only in Latin-1
> (including your dollar example) have only one byte per character,
> whereas strings with BMP characters have two bytes per character (he
> also has some more difficult to understand objections to the large fixed
> overhead and the cached UTF-8 version [which ASCII strings don't have])
Yeah, I know, and my choice of UTF-8 encode was probably not felicitous.
Consider instead:

>>> ord('$')
36
>>> ord('€')
8364
>>> bin(ord('$'))
'0b100100'
>>> bin(ord('€'))
'0b10000010101100'
>>>

This shows that '$' costs 6 bits whereas '€' costs 14.

In idealized, simplified models like Turing models, where numbers are
written in unary,
  3 is 111
  7 is 1111111
  100, 8364 etc I won't try to write, but you get the idea!
it's quite clear that bigger numbers cost more than smaller ones.

With current hardware the cost would seem to be flat for everything
< 2³² (or even 2⁶⁴).
But that's only an optical illusion, because beyond that the cost rises
jaggedly, slowly but monotonically, typically log-linearly
[which, AIUI, is jmf's principal error].

Which also means that if the Chinese were to have more say in the design
of Unicode/UTF-8, they would likely not waste swathes of prime real estate
on almost-never-used control characters just in the name of ASCII
compliance.

IOW ANY coding standard makes choices that are essentially political.
Unicode just happens to be (currently) politically correct.
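
For anyone who wants to poke at this, here is a small Python 3 sketch of the
same point. The helper names cost_in_bits and utf8_bytes are my own
(hypothetical, not from any library), and the two sample characters other
than '$' and '€' are only there to land in the remaining UTF-8 length
classes.

```python
def cost_in_bits(ch):
    """Bits needed to write the code point of ch in plain binary."""
    return ord(ch).bit_length()


def utf8_bytes(ch):
    """Bytes that UTF-8 spends on the single character ch."""
    return len(ch.encode("utf-8"))


# '$' and '€' are the examples from this thread; 'é' and U+1F600 are
# extra samples (my choice) covering the 2-byte and 4-byte UTF-8 classes.
for ch in ("$", "é", "€", "\U0001F600"):
    print(f"U+{ord(ch):04X}  bits={cost_in_bits(ch):2d}  utf8_bytes={utf8_bytes(ch)}")
```

Both numbers grow with the code point, in steps rather than smoothly:
'$' comes out at 6 bits and 1 byte, '€' at 14 bits and 3 bytes.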