On Tue, Mar 13, 2012 at 1:52 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
> 2012/3/13 Rasmus Lerdorf <ras...@lerdorf.com>:
> > On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
> >> I thought default_charset became UTF-8, so I was expecting
> >> following HTTP header.
> >>
> >> content-type text/html; charset=UTF-8
> >>
> >> However, I got empty charset (missing 'charset=UTF-8').
> >> So I looked up to source and found the line in SAPI.h
> >>
> >> 293 #define SAPI_DEFAULT_CHARSET ""
> >>
> >> Empty string should be "UTF-8", isn't it?
> >
> > No, we can't force an output charset on people since it would end up
> > breaking a lot of sites.
>
> Right, so may be for the next major release? 5.5.0?
>
> As the first XSS advisory in 2000 states, explicitly setting char coding will
> prevent certain XSS. Recent browsers have much better encoding handing,
> but setting encoding explicitly is better for security still.
>
> > PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:
> >
> >     if (charset_hint == NULL)
> >         return cs_8859_1;
> >
> > and in 5.4 we have:
> >
> >     if (charset_hint == NULL)
> >         return cs_utf_8;
> >
> > So there is no difference in their guessing when there is no hint, the
> > only difference is that in 5.4 we choose utf8 and in 5.3 we choose
> > 8859-1 in that case.
>
> I got this with 5.3
> <?php
> echo htmlentities('<日本語UTF-8>',ENT_QUOTES);
> echo htmlentities('<日本語UTF-8>',ENT_QUOTES, 'UTF-8');
>
> <æ�¥æ�¬èª�UTF8 > ><日本語UTF-8>
>
> So people migrating from 5.3 to 5.4 should not have problems.
> Migration older than 5.3 to 5.4 will be problematic.
>
> I always set all parameters for htmlentities/htmlspecialchars, therefore
> I haven't noticed this was changed from 5.3. They may be migrating from
> 5.2 or older. (RHEL5 uses 5.1)
>
> Since PHP does not have default multibyte module, it may be good for having
>
> input_encoding
> internal_encoding
> output_encoding
>
> I would then propose to make mbstring compile time mandatory.
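
For illustration only (this is not code from the thread), the practice described in the quoted message, always passing the charset explicitly, might look like the sketch below. It assumes the script file itself is saved as UTF-8; the variable names are hypothetical.

```php
<?php
// Sketch only: pass the charset explicitly so the result does not depend
// on the built-in default (ISO-8859-1 in 5.3, UTF-8 in 5.4, per the
// quoted message). Assumes this file is saved as UTF-8.
$input = '<日本語UTF-8>';

// Declare the output charset explicitly instead of relying on the
// empty SAPI_DEFAULT_CHARSET mentioned in the quoted message.
ini_set('default_charset', 'UTF-8');

// With an explicit third argument the multibyte characters pass through
// untouched; only the markup-significant characters are converted.
$escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n"; // prints &lt;日本語UTF-8&gt;
```

Code written this way behaves the same on 5.3 and 5.4, since neither call falls back to a version-specific default.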

I'm against yet another global ini setting. I find the existing ini settings confusing enough without adding one more, which would moreover mirror mbstring's own settings (and add even more confusion).

Why not make ext/mbstring mandatory at compile time for all future PHP versions, as pcre and spl are? We need multibyte handling either way. The Zend Engine takes advantage of mbstring for internal encoding as well, so I have probably missed something about why it is still possible to --disable-mbstring (or not add --enable-mbstring) when compiling. Does it have a huge performance impact?

Thank you :)

Julien.P