Hello Edin,

I don't know whether your proposal is logical or not. I see the problem in its historical context: it shouldn't rely on the mbstring extension too much, because it's been there since mbstring (formerly called jstring) was introduced.
But it's a problem that the internal encoding is not necessarily the same as the output encoding when mb_output_handler is enabled. In that sense, giving mbstring.internal_encoding higher priority seems quite natural to me. In addition, there's a hack in mbstring.c that overrides the Content-Type header regardless of the SAPI setting when the output handler is enabled through the ini setting.

I think the real issue is that we have two similar options that will likely stay separate as long as the Zend Engine's parser doesn't support various charsets, at least those that the current version of mbstring can handle. You may point out that we already have --enable-zend-multibyte, but IMO it's virtually a hack, and it should be integrated into the core at a lower level in a future version.

BTW, a temporary solution would be to give each setting a priority, like:

 1. MBSTRG(internal_encoding)
 2. SG(default_charset)
 3. The system's locale setting

How about this option?

Moriyoshi

"Edin Kadribasic" <[EMAIL PROTECTED]> wrote:

> Hey Moriyoshi,
>
> Sorry for my late entry into the debate, but I ran into the
> htmlentities() default charset problem today. I wonder why you
> opted to use an mbstring ini setting (thus making this nice feature
> mbstring-dependent) when we have the "default_charset" ini setting.
>
> It just sounds more logical to me to use SG(default_charset) for the
> default charset of htmlentities(). Your thoughts?
>
> Edin
>
> ----- Original Message -----
> From: "Moriyoshi Koizumi" <[EMAIL PROTECTED]>
> To: "Wez Furlong" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Thursday, October 17, 2002 7:48 AM
> Subject: Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c
>
>
> > Yep, as far as I read the archives, I haven't found any discussions on
> > the charset-related backward-compatibility problems. So I wrote
> > "*exactly* about this issue".
> >
> > You may want to redirect me to bug #9392
> > (http://bugs.php.net/bug.php?id=9392), but it doesn't seem to help...
> >
> > In addition, I found that determining the internal charset by LC_CTYPE
> > is dangerous because setlocale() is not thread-safe in some libc
> > implementations (glibc seems to be one of them).
> >
> > I'm going to read the archives more carefully, though I think even
> > handling the charset in phpinfo() will lead to the same discussion in
> > the future.
> >
> >
> > Moriyoshi Koizumi
> >
> > "Wez Furlong" <[EMAIL PROTECTED]> wrote:
> >
> > > Search the archives for the discussion.
> > > phpinfo could determine the charset as your patch does at the start,
> > > and then pass the info in php_escape_html_entities.
> > >
> > > Seems easy to me.
> > >
> > > --Wez.
> > >
> > > On 10/16/02, "Moriyoshi Koizumi" <[EMAIL PROTECTED]> wrote:
> > > > Wez Furlong <[EMAIL PROTECTED]> wrote:
> > > > > Unfortunately, we absolutely must remain 100% backwards compatible with
> > > > > htmlentities(), so this patch should not be applied.
> > > >
> > > > Were there any discussions exactly about this issue? Though there may
> > > > be some historical reason, I don't understand why 100% backwards
> > > > compatibility is required for htmlentities(). The patched
> > > > htmlentities() behaves the same way under the default configuration,
> > > > and IMHO defaulting to ISO-8859-1 is quite meaningless for scripts
> > > > that use charsets other than it.
> > > >
> > > > Hmm... otherwise I would suggest an mbstring function like
> > > > mb_htmlentities(), but that would sound like reinventing the same
> > > > wheel...
> > > >
> > > > > However, I don't see a problem with making phpinfo determine the charset
> > > > > and passing that on to the internal htmlentities function?
> > > >
> > > > The problem is that php_info_html_esc() in ext/standard/info.c calls
> > > > php_escape_html_entities() with no charset information specified.
> > > > Without the patch, every character is treated as ISO-8859-1, even if
> > > > a fetched byte is actually just the first byte of a multibyte
> > > > character.
> > > >
> > > >
> > > > Moriyoshi Koizumi
> > > >
> > > > --
> > > > PHP Development Mailing List <http://www.php.net/>
> > > > To unsubscribe, visit: http://www.php.net/unsub.php