php-i18n Digest 2 Oct 2009 06:15:27 -0000 Issue 425
Topics (messages 1323 through 1324):
Re: Problem using international characters
1323 by: Nisse Engström
iconv - bad encoding
1324 by: Jarosek
----------------------------------------------------------------------
--- Begin Message ---
On Mon, 29 Jun 2009 06:52:23 +0100, Nicholas Robinson wrote:
> I have a mysql database and use a php application that captures, stores,
> retrieves and displays data correctly - including French language words
> with accents. It has been running for around five years. I've recently
> written an extension that creates an openoffice writer document using
> this data. Everything works apart from these wretched French
> characters!!! If I unzip the odt package and examine content.xml, then
> the characters are wrong - but simply cutting and pasting correct ones
> in gives me a working document, so the error is definitely in the way I
> am creating the content using php.
>
> An example of the problem is Côte. As I've just typed it, the o has a
> circumflex accent or 'hat' on it. Within the odt file, the o-circumflex
> is shown as Ã´. Piping this to od -c gives 303 203 302 264. If I take
> the o-circumflex character from gnome charmap and od -c this, then I get
> 303 264. If I copy the character from my php/web app then it is correct.
> Where are these two middle bytes coming from? I've tried various
> combinations of mbstring functions and ini file settings but without
> joy.
Hexadecimal is easier on my eyes, so:

    303 203 302 264 == c3 83 c2 b4
    303 264         == c3 b4

These are UTF-8 encodings:

    <c3 83><c2 b4> == U+00C3 (LATIN CAPITAL LETTER A WITH TILDE),
                      U+00B4 (ACUTE ACCENT)
    <c3 b4>        == U+00F4 (LATIN SMALL LETTER O WITH CIRCUMFLEX)

In other words, somewhere in the process, a perfectly fine
UTF-8 encoded character:

    <c3 b4> (U+00F4)

has been (incorrectly) converted from ISO-8859-1 (or similar)
to UTF-8, resulting in:

    <c3 83><c2 b4> (U+00C3, U+00B4)
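
The effect is easy to reproduce in isolation. The following is only a minimal sketch: it assumes the iconv extension is available, and it does not claim to show where the extra conversion happens in the application in question. It takes the two correct bytes, converts them as if they were ISO-8859-1, and prints the resulting bytes in hex.

```php
<?php
// "ô" (U+00F4) as correctly encoded UTF-8: bytes c3 b4.
$good = "\xC3\xB4";

// Treating those two bytes as ISO-8859-1 and converting to UTF-8
// turns each byte into its own two-byte UTF-8 sequence.
$double = iconv('ISO-8859-1', 'UTF-8', $good);
echo bin2hex($double), "\n"; // c383c2b4

// Undoing the extra step recovers the original two bytes.
$fixed = iconv('UTF-8', 'ISO-8859-1', $double);
echo bin2hex($fixed), "\n"; // c3b4
```

Any function that converts ISO-8859-1 to UTF-8 would have the same effect when applied to data that is already UTF-8.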
Perhaps this gives you some idea of what's going wrong.
/Nisse
--- End Message ---
--- Begin Message ---
Hello
I noticed a problem when using iconv, but investigation showed that the
cause is not exactly iconv itself, but something like PHP's encoding setup.
From the beginning:
I have an incoming variable $text;
it contains some Polish diacritics: 'strona główna'.
I want to remove the diacritics: 'strona glowna'.
So I use iconv: iconv('utf-8', 'us-ascii//TRANSLIT', $text)
and unfortunately I get: 'strona g??wna' ... but only under Apache, in
the browser.
The same file, command, etc. run from bash gives: 'strona glowna' ... OK
So I thought: "bad config", but I double-checked and the configs are
identical (CLI and apache2).
Best part: the first 2 or 3 times after starting the computer (Debian), it
didn't work until I did: (apache start && apache stop - not apache restart),
then it worked. Now it doesn't work any more. No other config (except
PHP) was changed.
Any ideas?
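
One thing that may be worth checking, offered as a guess and not as a confirmed diagnosis: with glibc's iconv, the result of //TRANSLIT depends on the LC_CTYPE locale of the running process, and a web server is often started with a different locale environment than an interactive shell. The sketch below prints the current LC_CTYPE setting and retries the conversion after switching the locale temporarily. The helper name strip_diacritics and the locale name en_US.UTF-8 are assumptions for illustration; the locale has to be installed on the system for the switch to take effect.

```php
<?php
// Hypothetical helper, not part of the original report.
// $locale is an assumed locale name; it must be installed on the system.
function strip_diacritics($text, $locale = 'en_US.UTF-8')
{
    // Passing '0' queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // Returns false (and changes nothing) if the locale is unavailable,
    // in which case iconv runs under the previous locale.
    setlocale(LC_CTYPE, $locale);

    $ascii = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);

    // Restore whatever was active before.
    setlocale(LC_CTYPE, $previous);

    return $ascii;
}

// Compare this line between the CLI and the Apache run.
echo setlocale(LC_CTYPE, '0'), "\n";

// 'strona główna' written with explicit UTF-8 bytes.
echo strip_diacritics("strona g\xC5\x82\xC3\xB3wna"), "\n";
```

If the first line of output differs between bash and Apache, that difference would be a plausible place to look next.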
--- End Message ---