php-i18n Digest 5 Oct 2009 01:30:56 -0000 Issue 426
Topics (messages 1325 through 1328):
PHP support of Unicode?
1325 by: Gunnar Vestergaard
1326 by: Rasmus Lerdorf
1327 by: Tex Texin
Re: iconv - bad encoding
1328 by: Moriyoshi Koizumi
----------------------------------------------------------------------
--- Begin Message ---
When using PHP to write content in my local language and my neighbouring
countries' languages, ISO 8859-1 has been a sufficient character
encoding. But is it possible to use PHP with other languages? I mean,
does PHP support Unicode at present? As I understand it, the
following statement is true:
"PHP supports Unicode only as long as it is encoded as UTF-8"
Is that correct, or does PHP also support UTF-16?
--- End Message ---
--- Begin Message ---
Gunnar Vestergaard wrote:
> When using PHP to write content in my local language and my neighbouring
> countries' languages, ISO 8859-1 has been a sufficient character
> encoding. But is it possible to use PHP with other languages? I mean,
> does PHP support Unicode at present? As I understand it, the
> following statement is true:
> "PHP supports Unicode only as long as it is encoded as UTF-8"
>
> Is that correct, or does PHP also support UTF-16?
It depends on what you are doing. PCRE, our regex library, only speaks
UTF-8 and there are functions like json_encode() that assume UTF-8 as
well. If you are just doing pass-through stuff, you can use whatever
you want. It is only if you want to manipulate the text in some manner
that you need to worry about the encoding.
-Rasmus
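
The distinction between pass-through and manipulation can be sketched as
follows. This is only an illustration: the sample word and the variable
names are assumptions, not part of the original message. It copies bytes
untouched, then feeds the same word to PCRE in UTF-8 mode and to
json_encode() in two different encodings.

```php
<?php
// "główna" spelled out byte by byte in two encodings (illustrative data).
$utf8   = "g\xC5\x82\xC3\xB3wna"; // UTF-8: ł = C5 82, ó = C3 B3
$latin2 = "g\xB3\xF3wna";         // ISO 8859-2: ł = B3, ó = F3

// Pass-through: the bytes are copied unchanged, so the encoding is irrelevant.
$copy = $latin2;

// PCRE in UTF-8 mode (the /u modifier) counts characters in valid UTF-8 ...
$utf8Chars = preg_match_all('/./u', $utf8, $m);

// ... and reports an error for bytes that are not valid UTF-8.
$latin2Chars = @preg_match_all('/./u', $latin2, $m);
$pcreError   = preg_last_error();

// json_encode() expects UTF-8 as well; valid input survives a round trip.
$json      = json_encode($utf8);
$roundTrip = json_decode($json);

echo strlen($utf8), " bytes, ", $utf8Chars, " characters\n";
var_dump($latin2Chars === false, $pcreError === PREG_BAD_UTF8_ERROR);
```

Checking preg_last_error() after a failed match is one way to tell a
"no match" result from "the input was not UTF-8".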
--- End Message ---
--- Begin Message ---
Rasmus, Gunnar,
When Rasmus says "manipulate", it means to me "modify". Perhaps that is not
Rasmus' meaning.
However, you may need to be aware of the encoding if you are testing,
comparing, searching values, etc. as well.
For example a case-insensitive search would need to be aware of the encoding
to use the right values for upper and lower case.
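
A small sketch of such a case-insensitive search, assuming the mbstring
extension is available and the strings are UTF-8 (the sample strings and
variable names are made up for illustration):

```php
<?php
// Assumes the mbstring extension; strings are UTF-8, written as byte escapes.
$haystack = "Strona G\xC5\x81\xC3\x93WNA"; // "Strona GŁÓWNA"
$needle   = "g\xC5\x82\xC3\xB3wna";        // "główna"

// Encoding-aware: the encoding is named, so Ł/ł and Ó/ó can be case-folded.
$pos     = mb_stripos($haystack, $needle, 0, 'UTF-8');
$lowered = mb_strtolower($haystack, 'UTF-8');

// Byte-oriented: stripos() knows nothing about UTF-8 and may miss the match,
// because the upper- and lower-case letters differ in their second byte.
$bytePos = stripos($haystack, $needle);

var_dump($pos, $lowered === "strona " . $needle, $bytePos);
```

The character offset returned by mb_stripos() counts characters, not
bytes, so it should only be passed on to other mb_* functions.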
More generally you should be aware of the encoding, label it properly, and
potentially convert encodings appropriately to/from processes or I/O that
may require another encoding.
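
As a rough sketch of "be aware, label, convert", again assuming mbstring
and assuming the input is known to be ISO 8859-1 (the variable names and
the header string are illustrative only):

```php
<?php
// Assumes mbstring; the input is known (not guessed) to be ISO 8859-1.
$latin1 = "\xE6\xF8\xE5"; // "æøå" in ISO 8859-1

// Be aware: these bytes are not valid UTF-8 as they stand.
$isUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Convert: to UTF-8 for processing, and to UTF-16 if a consumer needs it.
$utf8  = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

// Label: the header a web page would send so the browser decodes it right.
$contentType = 'Content-Type: text/html; charset=UTF-8';

echo bin2hex($latin1), ' -> ', bin2hex($utf8), ' -> ', bin2hex($utf16), "\n";
```

The same characters occupy three, six, and six bytes respectively; only
the label tells a reader which interpretation applies.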
My answer to Gunnar's question is that UTF-8 is a perfectly valid encoding
form of Unicode (UTF-16 and UTF-32 being others), and you don't need to
favor UTF-16. PHP 5.3 has more functions for internationalization that are
UTF-8 and locale based (the intl extension, bundled since 5.3). You might
look into those.
tex
-----Original Message-----
From: Rasmus Lerdorf [mailto:[email protected]]
Sent: Saturday, October 03, 2009 9:59 AM
To: Gunnar Vestergaard
Cc: [email protected]
Subject: Re: [PHP-I18N] PHP support of Unicode?
Gunnar Vestergaard wrote:
> When using PHP to write content in my local language and my neighbouring
> countries' languages, ISO 8859-1 has been a sufficient character
> encoding. But is it possible to use PHP with other languages? I mean,
> does PHP support Unicode at present? As I understand it, the
> following statement is true:
> "PHP supports Unicode only as long as it is encoded as UTF-8"
>
> Is that correct, or does PHP also support UTF-16?
It depends on what you are doing. PCRE, our regex library, only speaks
UTF-8 and there are functions like json_encode() that assume UTF-8 as
well. If you are just doing pass-through stuff, you can use whatever
you want. It is only if you want to manipulate the text in some manner
that you need to worry about the encoding.
-Rasmus
--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
--- End Message ---
--- Begin Message ---
Are you sure that $text was really encoded in UTF-8 when the output
was garbled? Try dumping bin2hex($text) in both environments to see if
there's any difference.
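
Something along these lines might do for the dump. The helper name
describeBytes() and the sample values are hypothetical; in practice $text
would be the incoming variable:

```php
<?php
// Hypothetical helper: show the raw bytes and whether they form valid UTF-8.
function describeBytes($text)
{
    return array(
        'hex'     => bin2hex($text),
        // An empty pattern with /u matches only if $text is valid UTF-8.
        'is_utf8' => @preg_match('//u', $text) === 1,
    );
}

// Sample values standing in for $text: the same phrase in two encodings.
$asUtf8   = "strona g\xC5\x82\xC3\xB3wna"; // UTF-8
$asLatin2 = "strona g\xB3\xF3wna";         // ISO 8859-2

print_r(describeBytes($asUtf8));
print_r(describeBytes($asLatin2));
```

If the hex dump under Apache matches the one from the command line, the
input is the same in both cases and the difference lies elsewhere.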
Regards,
Moriyoshi
2009/10/2 Jarosek <[email protected]>:
> Hello
>
> I noticed a problem using iconv, but investigation showed that it is
> not exactly iconv itself, but something like the PHP encoding.
>
> From the beginning:
>
> I have an incoming variable $text;
> it contains some Polish diacritics: 'strona główna'.
> I expect to remove the diacritics: 'strona glowna'.
>
> So I use iconv: iconv('utf-8', 'us-ascii//TRANSLIT', $text)
>
> and unfortunately I get: 'strona g??wna' ... but only under Apache, in
> the browser.
>
> The same file, command, etc. run from bash gives: 'strona glowna' ... OK
> So I thought: "bad config", but I double-checked and the configs are
> identical... (cli and apache2)
>
> Best part: the first 2 or 3 times after starting the computer (Debian),
> it didn't work until I did: (apache start && apache stop - not apache
> restart), then it worked. Now it doesn't work any more. No other config
> (except PHP's) was changed.
>
> Any ideas?
--- End Message ---