On Thu, Feb 14, 2008 at 08:59:29PM -0000, Ted Pedersen wrote:
> You seem to be saying there is a better option than "use locale",

Yes - make use of the unicode capabilities of perl.

> which I'm more than willing to believe. However, what I can't estimate
> at present is how difficult or time consuming it would be to modify
> NSP in the way you describe. We'll certainly follow up on your hints

It is more time consuming than the "use locale" way. Of course. But given
NSP's codebase, it's a task that is doable in reasonable time.

> The advantage of "use locale" is that it seems to solve at least some
> problems, and it's a fairly simple modification to make. So as
> imperfect as it might be, it seems better than what we have now.

This advantage is illusory - unfortunately. Illusory in the sense that
the "some problems" it seems to solve depend on a well set up environment
on the OS side. Which isn't always the case.

Moreover, "use locale" will - in most cases - give you good results for
languages that match the locale environment on a given machine.
That is: If a user on a "Czech host" with a correctly set up Czech locale
tries to process Czech text, it will be ok. However, if the same user on
the same host tries to process Turkish text: *boom*.

> Further comments discussions on use locale versus other alternatives
> is more than welcome, and would in fact be appreciated.

I wonder why the original author had problems with a Catalan text
anyway. The only two viable encodings for Catalan I know of are
iso-8859-1 and windows-1252. iso-8859-1 should give him no problem,
because that's what NSP has been created and (mostly) tested with.
Probably he caught a win-1252 encoded text, which could cause the
problems he described.

The effort to get a perl application unicode-clean isn't that high - at
least it isn't higher than twiddling with locales. You just have to
catch all input streams (where data comes in) and all output streams
(obviously, where the application spills data) and decode (input) and
encode (output) the data respectively.
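
As a rough sketch of the decode-on-input / encode-on-output idea (this is
not NSP code; the helper names read_text and write_text, the sample bytes,
and the choice of cp1252 in and UTF-8 out are assumptions made up for
illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw input bytes into perl character strings.
# The caller has to state the encoding; it is never guessed.
sub read_text {
    my ($bytes, $encoding) = @_;
    # FB_CROAK: die on malformed input instead of silently substituting.
    # LEAVE_SRC: do not modify the caller's byte string.
    return decode($encoding, $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical helper: turn character strings back into bytes on the way out.
sub write_text {
    my ($chars, $encoding) = @_;
    return encode($encoding, $chars, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Assumed sample input: windows-1252 bytes for the word "País" in curly quotes.
# 0x93 and 0x94 are the curly quotes in cp1252, 0xED is i-acute.
my $raw  = "\x93Pa\xEDs\x94";
my $text = read_text($raw, 'cp1252');     # 6 characters

# All processing happens on characters, independent of the host locale.
my @tokens = $text =~ /(\p{L}+)/g;        # one token: "País"
my $out    = write_text(lc $tokens[0], 'UTF-8');   # 5 bytes: "país" in UTF-8
```

For file handles, the same thing can be done with I/O layers, e.g.
open(my $fh, '<:encoding(cp1252)', $file), so that every read is decoded
automatically.
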

See http://search.cpan.org/~dankogai/Encode-2.23/Encode.pm

You must - and this is a mandatory requirement - always know what
encoding your input data are in. Without this, no reliable processing
can be guaranteed.

--
Kind regards,

Dipl.-Inf. Richard Jelinek

- The PetaMem Group - Prague/Nuremberg - www.petamem.com -
-= 2007-09-25: 49235653 Mind Units =-