Re: How do I display unicode value stored in a string variable using ord()

Paul Rubin Sun, 19 Aug 2012 23:28:14 -0700

Steven D'Aprano <[email protected]> writes:
> Paul Rubin already told you about his experience using OCR to generate 
> multiple terabytes of text, and how he would not be happy if that was 
> stored in UCS-4.


That particular text was stored on disk as compressed XML that had UTF-8
in the data fields, but I think Roy is right that it would have
compressed to around the same size in UCS-4.  Converting it to UCS-4 on
input, however, would have bloated the memory footprint, and that was
the issue of concern to me.
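To make the disk-vs-memory distinction concrete, here is a minimal sketch (not the actual OCR pipeline) that compares raw and zlib-compressed byte counts for the same text in UTF-8 and in UCS-4. It assumes UTF-32-LE as a stand-in for UCS-4 (the "-le" variant avoids the 4-byte BOM that plain "utf-32" prepends), and the sample text is a hypothetical, mostly-ASCII placeholder. The raw size is what you pay in memory after decoding into a fixed-width form; the compressed size is closer to what you pay on disk.

```python
import zlib

def encoded_sizes(text):
    """Return (raw_bytes, zlib_compressed_bytes) for UTF-8 and UCS-4 encodings of text."""
    sizes = {}
    # UTF-32-LE is used here as an assumed stand-in for UCS-4: 4 bytes per code point, no BOM.
    for name, codec in (("utf-8", "utf-8"), ("ucs-4", "utf-32-le")):
        raw = text.encode(codec)
        sizes[name] = (len(raw), len(zlib.compress(raw)))
    return sizes

# Hypothetical, mostly-ASCII sample standing in for OCR output.
sample = "The quick brown fox jumps over the lazy dog. " * 1000

for name, (raw, packed) in encoded_sizes(sample).items():
    print(f"{name:6} raw={raw:7d} bytes  compressed={packed:6d} bytes")
```

For ASCII-only input the raw UCS-4 size is exactly four times the UTF-8 size; how close the compressed sizes come depends on the data, so run it on your own text rather than taking the ratio on faith.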

> Pittance or not, I do not believe that people will widely abandon compact 
> storage formats like UTF-8 and Latin-1 for UCS-4 any time soon.

Looking at http://www.icu-project.org/ the C++ classes seem to use
UTF-16, sort of like Python 3.2 :(.  I'm not certain of this though.
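For anyone wondering why UTF-16 is the awkward middle ground: code points above U+FFFF need a surrogate pair (two 16-bit code units), which is why len() on a Python 3.2 narrow build could report 2 for a single character. The sketch below uses only Python's own codecs (it says nothing about ICU's internals) to count UTF-16 code units and UTF-8 bytes for a few sample characters; the helper name utf16_code_units is made up for illustration.

```python
def utf16_code_units(text):
    """Count UTF-16 code units in text (hypothetical helper, for illustration only)."""
    # Each UTF-16 code unit is 2 bytes; the "-le" codec avoids emitting a BOM.
    return len(text.encode("utf-16-le")) // 2

# ASCII, Latin-1, BMP, and an astral (non-BMP) character: MUSICAL SYMBOL G CLEF.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    print(f"U+{ord(ch):04X}: UTF-16 units={utf16_code_units(ch)}, "
          f"UTF-8 bytes={len(ch.encode('utf-8'))}")

# The astral character is stored as the surrogate pair D834 DD1E.
print("\U0001D11E".encode("utf-16-be").hex())
```

Only the last character needs two UTF-16 units; everything in the Basic Multilingual Plane fits in one.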
-- 
http://mail.python.org/mailman/listinfo/python-list
