On Sun, 12 Feb 2012 17:27:34 -0500, Roy Smith wrote:

> In article <mailman.5739.1329084873.27778.python-l...@python.org>,
> Chris Angelico <ros...@gmail.com> wrote:
>
>> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy <tjre...@udel.edu> wrote:
>> > The situation before ascii is like where we ended up *before*
>> > unicode. Unicode aims to replace all those byte encodings and
>> > character sets with *one* byte encoding for *one* character set,
>> > which will be a great simplification. It is the idea of ascii applied
>> > on a global rather than local basis.
>>
>> Unicode doesn't deal with byte encodings; UTF-8 is an encoding, but so
>> are UTF-16, UTF-32, and as many more as you could hope for. But broadly
>> yes, Unicode IS the solution.
>
> I could hope for one and only one, but I know I'm just going to be
> disappointed. The last project I worked on used UTF-8 in most places,
> but also used some C and Java libraries which were only available for
> UTF-16. So it was transcoding hell all over the place.
Um, surely the solution to that is to always call a simple wrapper
function to the UTF-16 code to handle the transcoding? What do the
Design Patterns people call it, a facade? No, an adapter. (I never
remember the names...)

Instead of calling library.foo() which only outputs UTF-16, write a
wrapper myfoo() which calls foo, captures its output and transcodes it
to UTF-8. You have to do that once (per function), but now it works
from everywhere, so long as you remember to always call myfoo instead
of foo.

> Hopefully, we will eventually reach the point where storage is so cheap
> that nobody minds how inefficient UTF-32 is and we all just start using
> that. Life will be a lot simpler then. No more transcoding, a string
> will be just as many bytes as it is characters, and everybody will be
> happy again.

I think you mean 4 times as many bytes as characters. Unless you have
32 bit bytes :)

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
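[A minimal sketch of the adapter idea described above, in Python. The names `legacy_foo` and `myfoo` are hypothetical stand-ins; the real library.foo() would be whatever UTF-16-only function you are wrapping.]

```python
def legacy_foo():
    # Stand-in for a library function that can only produce
    # UTF-16-encoded bytes (here with a leading BOM, as Python's
    # generic "utf-16" codec emits).
    return "caf\u00e9 \u2115".encode("utf-16")

def myfoo():
    # The adapter: decode the library's UTF-16 output (the codec
    # consumes the BOM automatically), then re-encode as UTF-8.
    # Callers never have to know UTF-16 was ever involved.
    return legacy_foo().decode("utf-16").encode("utf-8")

text = myfoo().decode("utf-8")
print(text)  # the original string, now carried everywhere as UTF-8
```

Write one such wrapper per library function and the transcoding hell collapses to a single, well-tested boundary layer.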
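[To make the 4x figure concrete, a quick check in Python. Using the "utf-32-le" codec avoids the 4-byte BOM that the generic "utf-32" codec prepends, so the byte count is exactly four per code point.]

```python
s = "hello"
print(len(s))                       # 5 characters
print(len(s.encode("utf-32-le")))   # 20 bytes: 4 bytes per character
```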