On January 5, 2003 at 05:42, Jarkko Hietaniemi wrote:

> > This is Bad Juju (tm). It _guarantees_ script breakage (potentially
> > silently!) for Unix people doing _anything_ but ASCII text manipulation.  
> 
> I repeat: I don't think you can do "more than ASCII" by hanging tooth
> and nail to the "everything is bytes" credo.

This statement assumes that everyone is working with characters.  Many
people apply regexes and other operators (substr, index, et al.)
directly to binary data.
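To make the binary-data case concrete, here is a minimal Perl sketch of
the kind of byte-oriented code I mean.  The buffer contents are made up
for illustration (a PNG-style signature followed by a chunk header), and
the sketch assumes the data is a plain byte string, not characters.

```perl
use strict;
use warnings;

# Illustrative sample buffer: the 8-byte PNG signature, a 4-byte
# big-endian length (13), then the chunk type "IHDR".
my $buf = "\x89PNG\r\n\x1a\n" . pack('N', 13) . 'IHDR';

# substr and index count bytes here, because $buf holds only bytes.
my $sig = substr($buf, 0, 8);
print 'PNG signature: ', ($sig eq "\x89PNG\r\n\x1a\n" ? 'yes' : 'no'), "\n";

my $pos = index($buf, 'IHDR');
print "IHDR found at byte offset $pos\n";    # offset 12

# A regex over the same bytes: capture the 4-byte length field.
if ($buf =~ /\A.{8}(.{4})IHDR/s) {
    print 'chunk length: ', unpack('N', $1), "\n";    # 13
}

# When such data comes from a file, binmode() with no layer argument
# makes the handle suitable for binary data, e.g. ($path is hypothetical):
#   open(my $fh, '<', $path) or die "open: $!";
#   binmode($fh);
```

Code like this only works if the strings stay bytes; if an I/O layer
silently decodes the input as UTF-8, the offsets and the regex no longer
mean what the author intended.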

> I repeat: all your filehandles are still 'binary' unless you either
> explicitly (binmode) or implicitly (locale) command them not to be.
> If you try to push Unicode (data marked as UTF-8, such as characters
> beyond 255) on such a filehandle, you'll get a 'Wide character' warning.
> If you do not like the locale implicit switching, reset your locale
> to something without /utf-?8/i in it before running the script.

I think this reasoning is flawed because it assumes the author of
the script has complete control over the environment the script runs
in.  Scripts are often run by other people, in environments the author
does not control, so older programs can quietly break or behave
differently.

According to the perllocale manpage, the locale should have no effect
unless the 'use locale' pragma is in effect.  Benjamin's script does
not appear to use the pragma, so even if the environment has a UTF-8
locale, the script should be unaffected.

--ewh
