We also do everything in UTF-8 where I work (content only; our source code is
in ISO-8859-1), and we support many different languages. We never use any
Apache configuration for this or make any explicit reference to the OS locale
being used. As of Perl 5.8, Perl stores character strings internally as UTF-8
where needed, although it will not treat incoming bytes as UTF-8 unless you
decode them or set an I/O layer. I do know that there have been issues with the
core Encode module (which handles much of the character encoding control) and
UTF-8 character data in versions 5.8.2-5.8.6 (there are bugs filed against it),
but I think most have been fixed as of 5.8.7. Anyway, you are probably safe
with 5.8.8, but we still use 5.8.1.
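
For illustration only, here is a minimal sketch of what "saying otherwise" can look like with the core Encode module: bytes are decoded into characters on the way in and encoded back to UTF-8 on the way out. The sample bytes and variable names are made up for this example and are not taken from our code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for a four-letter Arabic word (8 bytes on the wire).
my $bytes = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85";

# decode() turns the byte string into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# encode() turns the characters back into UTF-8 bytes for output.
my $out = encode('UTF-8', $chars);

printf "bytes: %d, characters: %d, round trip ok: %s\n",
    length($bytes), length($chars), ($out eq $bytes ? 'yes' : 'no');
```

If the byte and character counts come out the same, the data was probably never decoded and Perl is treating it as plain bytes.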

Our stack is:
RedHat Linux 7.2
Apache 1.3x
mod_perl 1.29
HTML::Mason + HTML::Template
Perl 5.8.1

... and we have had no issues with UTF-8 corruption of our content.

I'm not sure whether this helps you or not.
- Jeff


----- Original Message ----
From: Tamer Embaby <[EMAIL PROTECTED]>
To: modperl@perl.apache.org
Sent: Wednesday, April 4, 2007 1:59:37 AM
Subject: UTF-8 encoding problems under Apache 2 with mod_perl 2.

All,

I have a character encoding problem with my environment:

$ uname -a
SunOS vulcano 5.10 Generic_118844-26 i86pc i386 i86pc

Server: Apache/2.0.58 (Unix) mod_perl/2.0.3 Perl/v5.8.4

I'm hosting a commercial application using mod_perl. The site we are
dealing with has Arabic characters, so I changed the following in Apache
to add support for the UTF-8 charset:

AddDefaultCharset UTF-8

The application itself doesn't handle character set encoding: I verified
with the vendor that they don't do anything with character encoding, and
they confirmed that their application works fine with the same settings,
so the problem is with my environment.

Somehow something is transforming characters with code values above 0x7f
into HTML character entities (&#XX;), so the document with Arabic letters
arrives at the browser corrupted.
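
To make the symptom concrete, the following is a rough sketch of the kind of transformation described above. It is not the vendor's code or anything known to run on this server; `to_numeric_entities` is a hypothetical helper written only for this illustration. It shows that the numbers inside the entities differ depending on whether the text was decoded to characters or left as raw UTF-8 bytes, which may help narrow down where the conversion happens.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace every character above 0x7f with a
# decimal numeric character reference such as &#1587;
sub to_numeric_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

my $bytes = "\xD8\xB3\xD9\x84";          # UTF-8 bytes of two Arabic letters
my $chars = decode('UTF-8', $bytes);     # the same text as two characters

# Decoded text: one entity per character, still readable in a browser.
print to_numeric_entities($chars), "\n"; # &#1587;&#1604;

# Undecoded bytes: one entity per byte, which renders as garbage.
print to_numeric_entities($bytes), "\n"; # &#216;&#179;&#217;&#132;
```

Entities with values in the 128-255 range in the page source would suggest that something is escaping the UTF-8 bytes one at a time rather than whole characters.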

I started to suspect that either Apache or mod_perl is doing that.
Apache itself is capable of serving static files with UTF-8 encoding
correctly (without transforming UTF-8 characters into HTML character
entities).

Below is additional info about my server.

Would anyone have an idea about what might be causing this, and how to
correct it?

I have a hunch that it has something to do with the locale passed to
mod_perl, and that I should be using "PerlPassEnv LANG" or something.
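
One way to test that hunch might be to print the locale settings the Perl process actually sees. The sketch below assumes nothing about the application; `locale_report` is a hypothetical helper name, and the same code could be run from the shell and from inside the server to compare the two.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: collect the locale-related environment variables
# and the LC_CTYPE locale currently in effect for this process.
sub locale_report {
    my %report = map { $_ => (defined $ENV{$_} ? $ENV{$_} : '(unset)') }
                 qw(LANG LC_ALL LC_CTYPE);

    # Calling setlocale() with only a category queries the current value.
    my $ctype = setlocale(LC_CTYPE);
    $report{'setlocale(LC_CTYPE)'} = defined $ctype ? $ctype : '(unknown)';

    return \%report;
}

my $report = locale_report();
print "$_ = $report->{$_}\n" for sort keys %$report;
```

If the values inside the server match the shell output, the locale is probably not what differs between the two environments.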

Any pointers are appreciated.

Thanks,
Tamer

----- INFO BEGIN -----
$ ../../bin/apachectl -l
Compiled in modules:
  core.c
  mod_access.c
  mod_auth.c
  mod_include.c
  mod_log_config.c
  mod_env.c
  mod_setenvif.c
  prefork.c
  http_core.c
  mod_mime.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_cgi.c
  mod_negotiation.c
  mod_dir.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_so.c

$ locale -a
C
POSIX
de
es
fi
fr
iso_8859_1
nl
ru
sl

$ locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=solaris, osvers=2.10, archname=i86pc-solaris-64int
    uname='sunos localhost 5.10 i86pc i386 i86pc'
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERRNO',
    optimize='-O2 -fno-strict-aliasing',
    cppflags=''
    ccversion='GNU gcc', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =''
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc
    perllibs=-lsocket -lnsl -ldl -lm -lc
    libc=/lib/libc.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-R /usr/perl5/5.8.4/lib/i86pc-solaris-64int/CORE'
    cccdlflags='-fPIC', lddlflags='-G'


Characteristics of this binary (from libperl):
  Compile-time options: USE_64_BIT_INT USE_LARGE_FILES
  Locally applied patches:
        22667 The optree builder was looping when constructing the ops ...
        22715 Upgrade to FileCache 1.04
        22733 Missing copyright in the README.
        22746 fix a coredump caused by rv2gv not fully converting a PV ...
        22755 Fix 29149 - another UTF8 cache bug hit by substr.
        22774 [perl #28938] split could leave an array without ...
        22775 [perl #29127] scalar delete of empty slice returned garbage
        22776 [perl #28986] perl -e "open m" crashes Perl
        22777 add test for change #22776 ("open m" crashes Perl)
        22778 add test for change #22746 ([perl #29102] Crash on assign ...
        22781 [perl #29340] Bizarre copy of ARRAY make sure a pad op's ...
        22796 [perl #29346] Double warning for int(undef) and abs(undef) ...
        22818 BOM-marked and (BOMless) UTF-16 scripts not working
        22823 [perl #29581] glob() misses a lot of matches
        22827 Smoke [5.9.2] 22818 FAIL(F) MSWin32 WinXP/.Net SP1 (x86/1 cpu)
        22830 [perl #29637] Thread creation time is hypersensitive
        22831 improve hashing algorithm for ptr tables in perl_clone: ...
        22839 [perl #29790] Optimization busted: '@a = "b", sort @a' ...
        22850 [PATCH] 'perl -v' fails if local_patches contains code snippets
        22852 TEST needs to ignore SCM files
        22886 Pod::Find should ignore SCM files and dirs
        22888 Remove redundant %SIG assignments from FileCache
        23006 [perl #30509] use encoding and "eq" cause memory leak
        23074 Segfault using HTML::Entities
        23106 Numeric comparison operators mustn't compare addresses of ...
        23320 [perl #30066] Memory leak in nested shared data structures ...
        23321 [perl #31459] Bug in read()
  Built under solaris
  Compiled at Jan 21 2005 15:48:11
  @INC:
    /usr/perl5/5.8.4/lib/i86pc-solaris-64int
    /usr/perl5/5.8.4/lib
    /usr/perl5/site_perl/5.8.4/i86pc-solaris-64int
    /usr/perl5/site_perl/5.8.4
    /usr/perl5/site_perl
    /usr/perl5/vendor_perl/5.8.4/i86pc-solaris-64int
    /usr/perl5/vendor_perl/5.8.4
    /usr/perl5/vendor_perl

----- INFO END -----

--

Tamer Embaby <[EMAIL PROTECTED]>

" f u cn rd ths, u cn gt a gd jb n cmptr prgrmmng. "




