On 03/12/2012 12:51 PM, Stas Malyshev wrote:
> Hi!
> 
>> But you can't necessarily hardcode the encoding if you are writing
>> portable code. That's a bit like hardcoding a timezone. In order to
>> write portable code you need to give people the ability to localize it.
> 
> No, it's not like timezone at all. I have to support all timezones in a
> global app, but I don't have to internally support every encoding on
> Earth - having everything internally in UTF-8 works quite well, and a
> lot of applications do exactly that - they have everything internally in
> UTF-8 and convert only when importing or exporting the data. I don't
> see anything in using UTF-8 throughout the app/library that makes it
> non-portable. However, if we allow the defaults of
> htmlspecialchars() etc. to be changed, that essentially makes having
> defaults useless, as I'd have to explicitly specify UTF-8 each time -
> otherwise it's a gamble what encoding I'd actually get.

If everything were UTF-8 we wouldn't have any of these issues.
Unfortunately that isn't the case. The question is what to do with apps
that need to deal with non-UTF-8 data. Are we going to provide any help
to them beyond just telling them to convert everything to UTF-8?

We took steps in 5.4 to improve htmlspecialchars to understand more
encodings, and we have the concepts of script_encoding and
internal_encoding, which are used both in the engine and in mbstring.
Currently internal_encoding isn't checked by htmlspecialchars. If you
pass it '' it checks script_encoding and default_charset, which is a bit
odd since neither directly relates to the encoding of the internal data
you are feeding to it. So maybe a way to tackle this is to use the
mbstring internal encoding, when it is set, as the htmlspecialchars
default when it is called without an encoding arg.
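
As a rough userland sketch of that fallback (not a patch to the engine):
escape_html() below is a hypothetical helper name, and it assumes the
mbstring internal encoding is one that htmlspecialchars actually
supports. An explicit encoding argument always wins; otherwise the
mbstring internal encoding is used, with UTF-8 as the last resort when
mbstring is not loaded.

```php
<?php
// Hypothetical helper, not part of PHP: pick the encoding for
// htmlspecialchars() from an explicit argument first, then from the
// mbstring internal encoding, then fall back to UTF-8.
function escape_html($text, $encoding = null)
{
    if ($encoding === null) {
        // Assumption: whatever mb_internal_encoding() reports is an
        // encoding htmlspecialchars() supports (UTF-8, ISO-8859-1, ...).
        $encoding = function_exists('mb_internal_encoding')
            ? mb_internal_encoding()
            : 'UTF-8';
    }

    return htmlspecialchars($text, ENT_QUOTES, $encoding);
}

// Explicit encoding: no dependence on any ini setting.
echo escape_html('<b>"hi"</b>', 'UTF-8'), "\n";

// No encoding argument: follows the mbstring internal encoding if present.
echo escape_html('<b>"hi"</b>'), "\n";
```

Code that must not depend on configuration, which is Stas's concern,
would keep passing the encoding explicitly; the fallback only matters
for calls that omit it.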

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
