Where I work, all of our content is in UTF-8 (the source code itself is in ISO-8859-1), and we support many different languages. We don't use any Apache configuration for this, and we never make any explicit reference to the OS locale. As of Perl 5.8, Perl can represent character strings internally as UTF-8, but data coming in from outside is treated as bytes until you decode it or put an encoding layer on the filehandle. I do know that there have been issues with the core Encode module (which handles much of the character encoding control) and UTF-8 character data in versions 5.8.2 through 5.8.6 (there are bugs filed against it), but I think most were fixed as of 5.8.7. You are probably safe with 5.8.8, though we still use 5.8.1.
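For what it's worth, here is a minimal sketch of handling the encoding explicitly with Encode, assuming the content arrives as UTF-8 bytes. The helper names bytes_to_text and text_to_bytes are made up for illustration; they are not from our code or from any module, and this is not necessarily what your vendor's application does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 bytes from outside (request, file,
# database) into a Perl character string. FB_CROAK makes malformed
# input die instead of being silently replaced; LEAVE_SRC keeps
# Encode from modifying the input scalar.
sub bytes_to_text {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical helper: turn a character string back into UTF-8 bytes
# just before it is written out to the client.
sub text_to_bytes {
    my ($text) = @_;
    return encode('UTF-8', $text, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# The Arabic word "salam" as raw UTF-8 bytes (8 bytes, 4 characters).
my $bytes = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85";
my $text  = bytes_to_text($bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($text);
print "round trip ok\n" if text_to_bytes($text) eq $bytes;
```

If the two lengths differ as expected and the round trip succeeds on your 5.8.4 box, then Encode itself is probably behaving, and the entity conversion is more likely happening somewhere else in the stack.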
Our stack is:

  RedHat Linux 7.2
  Apache 1.3x
  mod_perl 1.29
  HTML::Mason + HTML::Template
  Perl 5.8.1

... and we have had no issues with UTF-8 corruption of our content. I'm not sure whether this helps you or not.

- Jeff

----- Original Message ----
From: Tamer Embaby <[EMAIL PROTECTED]>
To: modperl@perl.apache.org
Sent: Wednesday, April 4, 2007 1:59:37 AM
Subject: UTF-8 encoding problems under Apache 2 with mod_perl 2.

All,

I have a character encoding problem with my environment:

$ uname -a
SunOS vulcano 5.10 Generic_118844-26 i86pc i386 i86pc

Server: Apache/2.0.58 (Unix) mod_perl/2.0.3 Perl/v5.8.4

I'm hosting a commercial application using mod_perl. The site we are dealing with has Arabic characters, so I changed the following in Apache to add support for the UTF-8 charset:

AddDefaultCharset UTF-8

The application itself doesn't handle character set encoding: I checked with the vendor, who said they don't do anything with character encoding and confirmed that their application works fine with the same settings, so the problem is with my environment.

Something is transforming characters with values above 0x7f into HTML character entities (&#XX;), so documents with Arabic letters arrive at the browser corrupted.

I started to suspect that either Apache or mod_perl is doing this, although Apache itself serves static UTF-8 files correctly (without transforming UTF-8 characters into HTML character entities).

Additional info about my server is below.

Would anyone have an idea what might be causing this, and how to correct it? I have a hunch that it has something to do with the locale passed to mod_perl, and that I should be using "PerlPassEnv LANG" or something similar.

Any pointers are appreciated.

Thanks,
Tamer

----- INFO BEGIN -----
$ ../../bin/apachectl -l
Compiled in modules:
  core.c
  mod_access.c
  mod_auth.c
  mod_include.c
  mod_log_config.c
  mod_env.c
  mod_setenvif.c
  prefork.c
  http_core.c
  mod_mime.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_cgi.c
  mod_negotiation.c
  mod_dir.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_so.c

$ locale -a
C
POSIX
de
es
fi
fr
iso_8859_1
nl
ru
sl

$ locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=solaris, osvers=2.10, archname=i86pc-solaris-64int
    uname='sunos localhost 5.10 i86pc i386 i86pc'
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERRNO',
    optimize='-O2 -fno-strict-aliasing',
    cppflags=''
    ccversion='GNU gcc', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =''
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc
    perllibs=-lsocket -lnsl -ldl -lm -lc
    libc=/lib/libc.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-R /usr/perl5/5.8.4/lib/i86pc-solaris-64int/CORE'
    cccdlflags='-fPIC', lddlflags='-G'

Characteristics of this binary (from libperl):
  Compile-time options: USE_64_BIT_INT USE_LARGE_FILES
  Locally applied patches:
    22667 The optree builder was looping when constructing the ops ...
    22715 Upgrade to FileCache 1.04
    22733 Missing copyright in the README.
    22746 fix a coredump caused by rv2gv not fully converting a PV ...
    22755 Fix 29149 - another UTF8 cache bug hit by substr.
    22774 [perl #28938] split could leave an array without ...
    22775 [perl #29127] scalar delete of empty slice returned garbage
    22776 [perl #28986] perl -e "open m" crashes Perl
    22777 add test for change #22776 ("open m" crashes Perl)
    22778 add test for change #22746 ([perl #29102] Crash on assign ...
    22781 [perl #29340] Bizarre copy of ARRAY make sure a pad op's ...
    22796 [perl #29346] Double warning for int(undef) and abs(undef) ...
    22818 BOM-marked and (BOMless) UTF-16 scripts not working
    22823 [perl #29581] glob() misses a lot of matches
    22827 Smoke [5.9.2] 22818 FAIL(F) MSWin32 WinXP/.Net SP1 (x86/1 cpu)
    22830 [perl #29637] Thread creation time is hypersensitive
    22831 improve hashing algorithm for ptr tables in perl_clone: ...
    22839 [perl #29790] Optimization busted: '@a = "b", sort @a' ...
    22850 [PATCH] 'perl -v' fails if local_patches contains code snippets
    22852 TEST needs to ignore SCM files
    22886 Pod::Find should ignore SCM files and dirs
    22888 Remove redundant %SIG assignments from FileCache
    23006 [perl #30509] use encoding and "eq" cause memory leak
    23074 Segfault using HTML::Entities
    23106 Numeric comparison operators mustn't compare addresses of ...
    23320 [perl #30066] Memory leak in nested shared data structures ...
    23321 [perl #31459] Bug in read()
  Built under solaris
  Compiled at Jan 21 2005 15:48:11
  @INC:
    /usr/perl5/5.8.4/lib/i86pc-solaris-64int
    /usr/perl5/5.8.4/lib
    /usr/perl5/site_perl/5.8.4/i86pc-solaris-64int
    /usr/perl5/site_perl/5.8.4
    /usr/perl5/site_perl
    /usr/perl5/vendor_perl/5.8.4/i86pc-solaris-64int
    /usr/perl5/vendor_perl/5.8.4
    /usr/perl5/vendor_perl
----- INFO END -----

--
Tamer Embaby <[EMAIL PROTECTED]>
" f u cn rd ths, u cn gt a gd jb n cmptr prgrmmng. "