Is -C useless?

2008-09-11 Thread Paul LeoNerd Evans
Consider the following:

  #!/usr/bin/perl -COL

  print "Hello w\xe9rld\n";

  $ perl test.pl 
  Too late for "-COL" option at test.pl line 1.

WTF?? If that's "too late", where else can I put it?

If I remove it from the file:

  $ perl -COL test.pl
  Hello wérld

But of course now STDOUT isn't UTF-8.

  $ perl test.pl
  Hello w�ld

This can be fixed by:

  #!/usr/bin/perl

  binmode STDOUT, ":utf8" if $ENV{LANG} =~ m/\.UTF-8$/;

  print "Hello w\xe9rld\n";

  $ perl test.pl
  Hello wérld

But of course, this is exactly the behaviour that -COL is supposed to
provide, and yet it is "too late" by the time the shebang line is read.
Clearly it is not too late, because I can do it even later at runtime.

Can anyone offer any insight here?

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Is -C useless?

2008-09-11 Thread Philip Skinner
On Thu, 2008-09-11 at 13:11 +0100, Paul LeoNerd Evans wrote:
> Consider the following:
> 
>   #!/usr/bin/perl -COL
> 
>   print "Hello w\xe9rld\n";
> 
>   $ perl test.pl 
>   Too late for "-COL" option at test.pl line 1.
> 
> WTF?? If that's "too late", where else can I put it?
> 
> If I remove it from the file:
> 
>   $ perl -COL test.pl
>   Hello wérld
> 
> But of course now STDOUT isn't UTF-8.
> 
>   $ perl test.pl
>   Hello w�ld
> 
> This can be fixed by:
> 
>   #!/usr/bin/perl
> 
>   binmode STDOUT, ":utf8" if $ENV{LANG} =~ m/\.UTF-8$/;
> 
>   print "Hello w\xe9rld\n";
> 
>   $ perl test.pl
>   Hello wérld
> 
> But of course, this is the exact behaviour that -COL is supposed to
> provide; and yet is "too late" by the shebang time.. Yet, clearly not
> because I can do it even later at runtime.
> 
> Can anyone offer any insight here?
> 

Maybe:

chmod 755 test.pl
./test.pl

?




Re: Is -C useless?

2008-09-11 Thread Paul Orrock



Paul LeoNerd Evans wrote:

> Consider the following:
>
>   #!/usr/bin/perl -COL
>
>   print "Hello w\xe9rld\n";
>
>   $ perl test.pl
>   Too late for "-COL" option at test.pl line 1.


Ermmm... that's because by the time perl reaches the -COL switch on the
#! line, you're already inside the perl interpreter that you started from
the command line.


I think you meant

$ chmod 755 test.pl
$ ./test.pl

Which gives me the output you want : Hello wérld

Unless I've missed something, which is highly likely

regards,

Paul

--
Paul Orrock  Digital Craftsmen
Lead SysAdmin www.digitalcraftsmen.net
Exmouth House, 3 Pine Street, London, EC1R 0JH
Tel: 020 7183 1410  Fax: 020 7099 5140


Re: Is -C useless?

2008-09-11 Thread Nicholas Clark
On Thu, Sep 11, 2008 at 01:52:31PM +0100, Paul LeoNerd Evans wrote:

> By the way, this is perl 5.10. I think it used to work on 5.8.8.

For some value of "work" which was actually "silently fail"

Nicholas Clark


Re: Is -C useless?

2008-09-11 Thread Paul LeoNerd Evans
On Thu, 11 Sep 2008 13:24:34 +0100
Paul Orrock <[EMAIL PROTECTED]> wrote:

> $ chmod 755 test.pl
> $ ./test.pl
> 
> Which gives me the output you want : Hello wérld
> 
> Unless I've missed something, which is highly likely

  $ ll test.pl 
  -rwxr-xr-x 1 leo leo 48 2008-09-11 13:51 test.pl

  $ cat test.pl 
  #!/usr/bin/perl -COL

  print "Hello w\xe9rld\n";

  $ ./test.pl 
  Too late for "-COL" option at ./test.pl line 1.

By the way, this is perl 5.10. I think it used to work on 5.8.8.

Which perl version are you running?

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Is -C useless?

2008-09-11 Thread Nicholas Clark
On Thu, Sep 11, 2008 at 02:03:41PM +0100, Nicholas Clark wrote:
> On Thu, Sep 11, 2008 at 01:52:31PM +0100, Paul LeoNerd Evans wrote:
> 
> > By the way, this is perl 5.10. I think it used to work on 5.8.8.
> 
> For some value of "work" which was actually "silently fail"

Post in haste, repent at leisure.

I think that it fails on (at least) a significant subset of what it says
it does if -C is first processed by the perl interpreter after it has
opened the script file, rather than being passed on the command line when
the kernel (or user) invokes the perl interpreter.
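
One way to check which -C settings perl actually picked up is to look at
${^UNICODE}. A minimal sketch, assuming the bit values listed in perlrun;
the helper name unicode_flag_letters is made up for this illustration:

```perl
use strict;
use warnings;

# Bit values for the individual -C letters, as documented in perlrun.
# (S is shorthand for I+O+E and D for i+o, so they are not listed.)
my @FLAGS = (
    [ I => 1 ],  [ O => 2 ],  [ E => 4 ],  [ i => 8 ],
    [ o => 16 ], [ A => 32 ], [ L => 64 ], [ a => 256 ],
);

# Hypothetical helper: turn a ${^UNICODE}-style bitmask into -C letters.
sub unicode_flag_letters {
    my ($bits) = @_;
    return join '', map { $_->[0] } grep { $bits & $_->[1] } @FLAGS;
}

my $letters = unicode_flag_letters(${^UNICODE});
print "-C flags in effect: ", ($letters eq '' ? '(none)' : $letters), "\n";
```

Running this once as "perl -COL script.pl" and once as "perl script.pl"
should show whether the switch reached the interpreter.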

The change I remember had happened turns out to be this one:

http://public.activestate.com/cgi-bin/perlbrowse?patch_num=24070&show_patch=Show+Patch

as a result of this bug report:

http://rt.perl.org/rt3//Public/Bug/Display.html?id=34087


I wasn't involved in any of this, so I don't fully understand the issues.

Nicholas Clark


Re: Is -C useless?

2008-09-11 Thread Paul Johnson
On Thu, Sep 11, 2008 at 01:52:31PM +0100, Paul LeoNerd Evans wrote:
> On Thu, 11 Sep 2008 13:24:34 +0100
> Paul Orrock <[EMAIL PROTECTED]> wrote:
> 
> > $ chmod 755 test.pl
> > $ ./test.pl
> > 
> > Which gives me the output you want : Hello wérld
> > 
> > Unless I've missed something, which is highly likely
> 
>   $ ll test.pl 
>   -rwxr-xr-x 1 leo leo 48 2008-09-11 13:51 test.pl
> 
>   $ cat test.pl 
>   #!/usr/bin/perl -COL
> 
>   print "Hello w\xe9rld\n";
> 
>   $ ./test.pl 
>   Too late for "-COL" option at ./test.pl line 1.
> 
> By the way, this is perl 5.10. I think it used to work on 5.8.8.

Ah, then you'll have read pod/perl5100delta.pod.

  The B<-C> option can no longer be used on the C<#!> line. It wasn't
  working there anyway, since the standard streams are already set up
  at this point in the execution of the perl interpreter. You can use
  binmode() instead to get the desired behaviour.

Which is just what you seem to have discovered ;-)
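
For reference, a minimal sketch of the binmode() route. The helper name
utf8_bytes_for is made up for this example, and an in-memory filehandle
stands in for STDOUT so that the effect of the layer can be inspected:

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through a handle that has had a UTF-8
# encoding layer applied with binmode(), and return the bytes written.
sub utf8_bytes_for {
    my ($text) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buffer;
}

# The equivalent call on the real handle would be:
#   binmode STDOUT, ':encoding(UTF-8)';
my $text  = "w\xe9rld";
my $bytes = utf8_bytes_for($text);
printf "%d characters became %d bytes\n", length($text), length($bytes);
```

The point is only that the layer can be applied at any time after the
handle exists, which is why binmode() still works at runtime.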

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net



Re: Is -C useless?

2008-09-11 Thread Paul LeoNerd Evans
On Thu, 11 Sep 2008 15:19:17 +0200
Paul Johnson <[EMAIL PROTECTED]> wrote:

> Which is just what you seem to have discovered ;-)

In that case, I might have to consider the following:

 package utf8::LocaleAware;

 my $converted = 0;

 sub import
 {
   return if $converted;
   $converted = 1;

   return unless $ENV{LANG} =~ m/\.UTF-8/;

   binmode STDIN, ":utf8";
   binmode STDOUT, ":utf8";

   require Encode;
   map { $_ = Encode::decode_utf8 $_ } @ARGV, values %ENV;
 }


Then just begin every program with

 #!/usr/bin/perl
 use utf8::LocaleAware;

But.. you know.. Surely there must be _some_ way to make -C not
useless?

Hell, even this might work:

 #!/usr/bin/perl
 BEGIN { exec $^X, "-C", @ARGV unless ${^UNICODE} }

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Is -C useless?

2008-09-11 Thread Paul LeoNerd Evans
On Thu, 11 Sep 2008 15:19:17 +0200
Paul Johnson <[EMAIL PROTECTED]> wrote:

> since the standard streams are already set up
>   at this point in the execution of the perl interpreter. You can use
>   binmode() instead to get the desired behaviour.

Waaait a moment.

Why can't -C just call binmode itself?

This is getting stupider by the moment.

I've even tried this; it JustWorks:

 #!/usr/bin/perl
 BEGIN { exec $^X, "-COL", "-f", $0, @ARGV unless ${^UNICODE} }

 print "Hello w\xe9rld\n";

 $ ./test-unicode.pl 
 Hello wérld

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Is -C useless?

2008-09-11 Thread Dagfinn Ilmari Mannsåker
Paul LeoNerd Evans <[EMAIL PROTECTED]> writes:

> On Thu, 11 Sep 2008 15:19:17 +0200
> Paul Johnson <[EMAIL PROTECTED]> wrote:
>
>> Which is just what you seem to have discovered ;-)
>
> In that case, I might have to consider the following:
>
>  package utf8::LocaleAware;
>
>  my $converted = 0;
>
>  sub import
>  {
>    return if $converted;
>    $converted = 1;
>
>    return unless $ENV{LANG} =~ m/\.UTF-8/;

You want LC_CTYPE, which can be inherited from LANG or overridden by
LC_ALL. Also, not all the world is UTF-8.

You want something like this:

require I18N::Langinfo;
require Encode;

# Ask the C library which character encoding the current locale uses.
my $encoding = I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
binmode STDIN, ":encoding($encoding)";
binmode STDOUT, ":encoding($encoding)";
binmode STDERR, ":encoding($encoding)";

# Decode the command-line arguments and environment values in place.
map { $_ = Encode::decode($encoding, $_) } @ARGV, values %ENV;


>    binmode STDIN, ":utf8";
>    binmode STDOUT, ":utf8";
>
>    require Encode;
>    map { $_ = Encode::decode_utf8 $_ } @ARGV, values %ENV;
>  }

-- 
ilmari
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen


Re: Is -C useless?

2008-09-11 Thread Paul LeoNerd Evans
On Fri, 12 Sep 2008 00:34:15 +0100
[EMAIL PROTECTED] (Dagfinn Ilmari Mannsåker) wrote:

> You want LC_CTYPE, which can be inherited from LANG or overridden by
> LC_ALL. Also, not all the world is UTF-8.

I could want all sorts of things.

I'm simply pointing out there's no reason why perl can't implement -C by
the time it reads line 1 of the program script. That is not "too late" as
its message would otherwise indicate. My demonstration using binmode
already shows that.

Why can't perl just call binmode itself on STDIN/STDOUT/STDERR by the
time it gets that far, instead of throwing a wobbly and telling me to do
it myself. That doesn't sound very DWIM to me

And furthermore, binmode / decode_utf8 only get us round the IOE / A
flags respectively. There is, to my knowledge, no perl code that can set
the default UTF-8ness of new filehandles, the way that -Cio does.
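
For what it's worth, the open pragma may come close for the i/o part: it
sets default layers for open() calls within its lexical scope, which is
not the same thing as the program-wide effect of -Cio. A minimal sketch,
with bytes_written_by_default a made-up helper name and a temporary file
used purely for demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a temporary file using an open()
# that names no layer, then return the raw bytes that ended up on disk.
sub bytes_written_by_default {
    my ($text) = @_;

    # Lexically scoped: only open() calls inside this sub that do not
    # name a layer themselves pick up :utf8 for output.
    use open OUT => ':utf8';

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "close failed: $!";

    # Read back with an explicit :raw layer to see the actual bytes.
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $bytes = bytes_written_by_default("w\xe9rld");
printf "wrote %d bytes\n", length($bytes);
```

Because the pragma is lexical, code in other files or modules is not
affected, so it is a partial substitute at best.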

And all of this presumes that whoever calls binmode knows how to test the
environment properly. It's come to my attention that this might be more
accurate:

  binmode STDOUT, ":utf8" if grep m/utf-?8/i, @ENV{qw(LANG LC_MESSAGES LC_ALL)};

And even then I'm not sure it's right. Which really just proves my
point...

The -C...L flag _already_ implements the correct logic. It's just not
useful as it is...

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Is -C useless?

2008-09-11 Thread Paul LeoNerd Evans
On Fri, 12 Sep 2008 00:48:06 +0100
Paul LeoNerd Evans <[EMAIL PROTECTED]> wrote:

>   binmode STDOUT, ":utf8" if grep m/utf-?8/i, @ENV{qw(LANG LC_MESSAGES LC_ALL)};
> 
> And even then I'm not sure it's right.

Actually it's still not... We have to take the first defined one, not try
our best to find one:

  binmode STDOUT, ":utf8" if ( $ENV{LC_ALL} || $ENV{LC_MESSAGES} || $ENV{LANG} ) =~ m/utf-?8/i

which -technically- still breaks because we could have  LC_ALL=0  in our
environment, but I think it's close enough.
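
A sketch of the "first one that is set wins" logic that also copes with a
value such as LC_ALL=0, by testing for defined and non-empty rather than
for truth. It checks LC_CTYPE rather than LC_MESSAGES, as suggested
earlier in the thread; the helper name env_selects_utf8 is made up, and
the regex is still only a heuristic:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the environment in %$env selects a
# UTF-8 locale for character handling, 0 otherwise. The first variable
# that is set and non-empty decides, in the order LC_ALL, LC_CTYPE, LANG.
sub env_selects_utf8 {
    my ($env) = @_;
    for my $name (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$name};
        next unless defined($value) && length($value);
        return $value =~ m/utf-?8/i ? 1 : 0;
    }
    return 0;    # nothing set
}

my $message = env_selects_utf8(\%ENV) ? "UTF-8 locale" : "not a UTF-8 locale";
print "$message\n";
```

Taking the environment as a hash reference keeps the check testable
without touching the real %ENV.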

> Which really just proves my point...

  ... even more.

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Is -C useless?

2008-09-11 Thread Dagfinn Ilmari Mannsåker
Paul LeoNerd Evans <[EMAIL PROTECTED]> writes:

> On Fri, 12 Sep 2008 00:48:06 +0100
> Paul LeoNerd Evans <[EMAIL PROTECTED]> wrote:
>
>>   binmode STDOUT, ":utf8" if grep m/utf-?8/i, @ENV{qw(LANG LC_MESSAGES LC_ALL)};
>> 
>> And even then I'm not sure it's right.
>
> Actually it's still not... We have to take the first defined one, not try
> our best to find one:
>
>   binmode STDOUT, ":utf8" if ( $ENV{LC_ALL} || $ENV{LC_MESSAGES} || $ENV{LANG} ) =~ m/utf-?8/i
>
> which -technically- still breaks because we could have  LC_ALL=0  in our
> environment, but I think it's close enough.

Or you could just use I18N::Langinfo.

-- 
ilmari
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen


Re: Is -C useless?

2008-09-11 Thread Paul Johnson
On Fri, Sep 12, 2008 at 12:28:22AM +0100, Paul LeoNerd Evans wrote:
> On Thu, 11 Sep 2008 15:19:17 +0200
> Paul Johnson <[EMAIL PROTECTED]> wrote:
> 
> > since the standard streams are already set up
> >   at this point in the execution of the perl interpreter. You can use
> >   binmode() instead to get the desired behaviour.
> 
> Waaait a moment.
> 
> Why can't -C just call binmode itself?
> 
> This is getting stupider by the moment.

Well.  As I recall (in other words, don't blame me if I'm wrong), the
problem wasn't that it couldn't be made to work, but rather that no one
could be found who had the time, ability and inclination to make it work.
But stopping it being broken was easier.

Perhaps you are that person?  You certainly seem to be at least one third of
the way there.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net