Hi.

I cannot really think of a reason why Perl itself would do something
different in either case.  And in your tests, it was verified that
PERL_UNICODE itself is still set right under mod_perl.  So it must be
that mod_perl somehow overrides the basic Perl setting.  Maybe mod_perl
needs to do something with the filehandles, because some of them might
be connected to Apache?
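
One way to see what is actually happening on a filehandle is to list its PerlIO layers. The following is only a minimal sketch: the scratch file and the helper name layers_of are made up for illustration, and it says nothing about what mod_perl does internally. It prints the value of ${^UNICODE} next to the layers of a handle opened with the defaults and one opened with an explicit :utf8 layer, so the same lines could be run both standalone and inside a handler and then compared.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the PerlIO layers on a handle as one string.
sub layers_of {
    my ($fh) = @_;
    return join(' ', PerlIO::get_layers($fh));
}

# Scratch file, used only so that there is something to open.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Cannot close $path: $!";

# One handle with whatever the defaults are, one with an explicit layer.
open(my $default_fh, '<', $path)      or die "Cannot open $path: $!";
open(my $utf8_fh,    '<:utf8', $path) or die "Cannot open $path: $!";

print "PERL_UNICODE flags: ${^UNICODE}\n";
print "default open:       ", layers_of($default_fh), "\n";
print "explicit :utf8:     ", layers_of($utf8_fh), "\n";
```

If the "default open" line shows a utf8 layer in the standalone script but not under mod_perl, that would at least confirm that the difference is in the default layers and not in the data.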

Anyhow, I am out of my depth now, so let's call on a real mod_perl guru,
if one is around.

By the way:
In the meantime I have tried the same thing under Apache 2.x/mod_perl 2.x,
and I seem to have the same problem.

I have one more question : where exactly do you set PERL_UNICODE ?



Rob French wrote:
Hi André,

Yes, I tried that as well and it worked as expected (UTF-8 flag is
set). Explicit PerlIO layer decoding works in both the non-mod_perl
and mod_perl tests. It seems only the default PERL_UNICODE setting is
ignored in mod_perl even though it is set.

Rgrds,
Rob

On Wed, Mar 19, 2008 at 3:01 AM, André Warnier <[EMAIL PROTECTED]> wrote:
Hi.

 Perl's handling of Unicode (and of character sets in general) is
 extremely clever and powerful.
 But it can sometimes be a bit counter-intuitive.

 In any case, it seems to me that the evaluation of the PERL_UNICODE
 environment variable is a "Perl thing" rather than a "mod_perl thing",
 and that mod_perl per se should not interfere with it.  But maybe
 mod_perl does some magic on filehandles in general which interferes, who
 knows ?

 Maybe the first thing to do is to ascertain that the problem is really
 due to a mishandling of the PERL_UNICODE environment variable, or
 something else.  I propose a simple test :
 Instead of relying on the PERL_UNICODE variable, what happens when you
 change the open() statement as follows :

  > open(FH, '<:utf8',"/tmp/utf8.txt");

 thus explicitly setting a UTF-8 decoding layer for the stream FH,
 instead of relying on PERL_UNICODE.
 Does your follow-up test then indicate that the utf8 flag for $var is set ?

 Note : even with the decoding layer set, that does not necessarily mean
 that all data you read will end up with the utf8 flag set.  It depends
 on the data.  But in your case, if you are really using the same file
 data in both tests you show below, then it seems a valid test.
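
A minimal sketch of this test, under stated assumptions: a scratch file stands in for /tmp/utf8.txt, and the helper name slurp_with_layer is hypothetical. It reads the same two bytes (the UTF-8 encoding of U+00E6) once with :raw and once with :utf8, and reports the length and the utf8 flag of what was read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes of U+00E6 to a scratch file (stand-in for /tmp/utf8.txt).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xC3\xA6";
close($tmp_fh) or die "Cannot close $path: $!";

# Hypothetical helper: read the whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($file, $layer) = @_;
    open(my $fh, "<$layer", $file) or die "Cannot open $file: $!";
    local $/;                      # slurp mode, restored on scope exit
    my $data = <$fh>;
    close($fh);
    return $data;
}

for my $layer (':raw', ':utf8') {
    my $var = slurp_with_layer($path, $layer);
    printf "%-6s length=%d flagged=%s\n",
        $layer, length($var), (utf8::is_utf8($var) ? 'yes' : 'no');
}
```

With :raw the data stays as two undecoded bytes; with :utf8 it becomes a single flagged character. Neither result depends on PERL_UNICODE, which is the point of the test.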

 André




 Rob French wrote:
 > I have recently started converting one of our webapps to make it fully
 > UTF-8 compliant. All input/output from the webapp will be encoded as
 > UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
 > enable UTF-8 flagging on all input/output streams. This works with
 > standalone Perl scripts like the one below (the /tmp/utf8.txt file
 > contains a single character, U+00E6 LATIN SMALL LETTER AE):
 >
 > #!/usr/bin/perl -w
 >
 > use strict;
 > use Encode;
 >
 > print "PERL_UNICODE Value: ${^UNICODE}\n";
 > open(FH, "</tmp/utf8.txt");
 > undef $/;
 > my $var = <FH>;
 > close(FH);
 >
 > print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
 > exit;
 >
 > The resulting output after setting my PERL_UNICODE env var to SDA is:
 >
 > PERL_UNICODE Value: 63
 > Flagged as UTF8? 1
 >
 > Which is correct. Perl processed the input stream (open) as UTF-8 and
 > flagged it accordingly.
 >
 > Unfortunately if I put the exact same open call in my mod_perl
 > TransHandler $var is not flagged as UTF-8. The resulting output when
 > run in the TransHandler is:
 >
 > PERL_UNICODE Value: 63
 > Flagged as UTF8?
 >
 > The input stream is not processed as UTF-8 and not flagged internally
 > as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
 > then everything works as expected. It appears as if mod_perl is
 > ignoring the PERL_UNICODE env variable and not processing my input
 > streams as UTF-8.
 >
 > Thanks in advance.
 >
 > Cheers
 >
 >
 >
 >
 > Environment details below:
 >
 > Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
 >   Platform:
 >     osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
 > archname=i386-linux-thread-multi
 >     uname='linux hs20-bc1-4.build.redhat.com
 > 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
 > i686 i386 gnulinux '
 >     config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
 > -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
 > [EMAIL PROTECTED] -Dcc=gcc -Dcf_by=Red Hat, Inc.
 > -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
 > -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
 > -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
 > -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
 > -Dinstallusrbinperl -Ubincompat5005 -Uversiononly
 > -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
 > 5.8.0'
 >     hint=recommended, useposix=true, d_sigaction=define
 >     usethreads=define use5005threads=undef useithreads=define
 > usemultiplicity=define
 >     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
 >     use64bitint=undef use64bitall=undef uselongdouble=undef
 >     usemymalloc=n, bincompat5005=undef
 >   Compiler:
 >     cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
 > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
 > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
 >     optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
 >     cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
 > -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
 >     ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)',
 > gccosandvers=''
 >     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
 >     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
 >     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
 > lseeksize=8
 >     alignbytes=4, prototype=define
 >   Linker and Libraries:
 >     ld='gcc', ldflags =' -L/usr/local/lib'
 >     libpth=/usr/local/lib /lib /usr/lib
 >     libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
 >     perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
 >     libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
 >     gnulibc_version='2.3.4'
 >   Dynamic Linking:
 >     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
 > -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
 >     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
 >
 >
 > Characteristics of this binary (from libperl):
 >   Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
 > USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
 >   Built under linux
 >   Compiled at Jul 24 2006 18:28:10
 >   @INC:
 >     /usr/lib/perl5/5.8.5/i386-linux-thread-multi
 >     /usr/lib/perl5/5.8.5
 >     /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
 >     /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
 >     /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
 >     /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
 >     /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
 >     /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
 >     /usr/lib/perl5/site_perl/5.8.5
 >     /usr/lib/perl5/site_perl/5.8.4
 >     /usr/lib/perl5/site_perl/5.8.3
 >     /usr/lib/perl5/site_perl/5.8.2
 >     /usr/lib/perl5/site_perl/5.8.1
 >     /usr/lib/perl5/site_perl/5.8.0
 >     /usr/lib/perl5/site_perl
 >     /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
 >     /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
 >     /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
 >     /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
 >     /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
 >     /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
 >     /usr/lib/perl5/vendor_perl/5.8.5
 >     /usr/lib/perl5/vendor_perl/5.8.4
 >     /usr/lib/perl5/vendor_perl/5.8.3
 >     /usr/lib/perl5/vendor_perl/5.8.2
 >     /usr/lib/perl5/vendor_perl/5.8.1
 >     /usr/lib/perl5/vendor_perl/5.8.0
 >     /usr/lib/perl5/vendor_perl
 >     .
 > mod_perl version: 1.30
 >


