php-i18n Digest 2 Oct 2009 06:15:27 -0000 Issue 425

Topics (messages 1323 through 1324):

Re: Problem using international characters
        1323 by: Nisse Engström

iconv - bad encoding
        1324 by: Jarosek


----------------------------------------------------------------------
--- Begin Message ---
On Mon, 29 Jun 2009 06:52:23 +0100, Nicholas Robinson wrote:

> I have a mysql database and use a php application that captures, stores,
> retrieves and displays data correctly - including French language words
> with accents. It has been running for around five years. I've recently
> written an extension that creates an openoffice writer document using
> this data. Everything works apart from these wretched French
> characters!!! If I unzip the odt package and examine content.xml, then
> the characters are wrong - but simply cutting and pasting correct ones
> in gives me a working document, so the error is definitely in the way I
> am creating the content using php.
> 
> An example of the problem is Côte. As I've just typed it, the o has a
> circumflex accent or 'hat' on it. Within the odt file, the o-circumflex
> is shown as Ã´. Piping this to od -c gives 303 203 302 264. If I take
> the o-circumflex character from gnome charmap and od -c this, then I get
> 303 264. If I copy the character from my php/web app then it is correct.
> Where are these two middle bytes coming from? I've tried various
> combinations of mbstring functions and ini file settings but without
> joy.

Hexadecimal is easier on my eyes, so:

  303 203 302 264  ==  c3 83 c2 b4
  303 264          ==  c3 b4

These are UTF-8 encodings:

  <c3 83><c2 b4>  == U+00C3 (LATIN CAPITAL LETTER A WITH TILDE),
                     U+00B4 (ACUTE ACCENT)
  <c3 b4>         == U+00F4 (LATIN SMALL LETTER O WITH CIRCUMFLEX)
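The octal-to-hex-to-code-point decoding above can be checked mechanically. The sketch below is a Python illustration (not part of the original thread): it takes the `od -c` octal values quoted in the message, turns them into bytes, decodes them as UTF-8 and prints each code point's Unicode name. The helper name `describe` is hypothetical.

```python
import unicodedata

# Byte values as printed by `od -c` (octal), copied from the quoted message.
corrupted_octal = ["303", "203", "302", "264"]
correct_octal = ["303", "264"]

def describe(octal_bytes):
    """Convert od -c octal output to bytes, then list the UTF-8 code points."""
    raw = bytes(int(o, 8) for o in octal_bytes)
    text = raw.decode("utf-8")
    names = [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]
    return raw.hex(" "), names

for label, octal in [("corrupted", corrupted_octal), ("correct", correct_octal)]:
    hex_bytes, code_points = describe(octal)
    print(label, hex_bytes, code_points)
```

For the corrupted sequence this prints `c3 83 c2 b4` with U+00C3 and U+00B4; for the correct one, `c3 b4` with U+00F4.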


In other words, somewhere in the process, a perfectly fine
UTF-8 encoded character:

  <c3 b4> (U+00F4)

has been (incorrectly) converted from ISO-8859-1 (or similar)
to UTF-8, resulting in:

  <c3 83><c2 b4> (U+00C3, U+00B4)
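A minimal Python sketch of the same mistake, assuming the extra step is an ISO-8859-1 to UTF-8 conversion applied to data that is already UTF-8 (in PHP, for example, calling utf8_encode() on a UTF-8 string has this effect; the thread does not say which call is responsible). The function names `double_encode` and `repair` are hypothetical. `repair` only undoes exactly one such round and will fail or corrupt text that was not double-encoded.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as ISO-8859-1, then encoded to UTF-8 again."""
    utf8 = text.encode("utf-8")            # "ô" -> b'\xc3\xb4'
    misread = utf8.decode("iso-8859-1")    # each byte becomes one code point: "Ã´"
    return misread.encode("utf-8")         # -> b'\xc3\x83\xc2\xb4'

def repair(data: bytes) -> str:
    """Undo one round of double encoding (only valid if the data really was double-encoded)."""
    return data.decode("utf-8").encode("iso-8859-1").decode("utf-8")

broken = double_encode("Côte")
print(broken.hex(" "))   # 43 c3 83 c2 b4 74 65
print(repair(broken))    # Côte
```

In practice it is better to find and remove the extra conversion than to repair the output afterwards.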


Perhaps this gives you some idea of what's going wrong.


/Nisse

--- End Message ---
--- Begin Message ---
Hello

I noticed a problem using iconv, but investigation showed that it is not
exactly iconv itself but something to do with PHP's encoding handling.

From the beginning:

I have an incoming variable $text containing some Polish diacritics:
'strona główna'. I want to remove the diacritics to get 'strona glowna'.

So I use iconv: iconv('utf-8', 'us-ascii//TRANSLIT', $text)

and unfortunately I get 'strona g??wna', but only under Apache, in the
browser.

The same file and command run from bash give 'strona glowna', which is
correct. So I thought "bad config", but I double-checked and the configs
(CLI and apache2) are identical.

The best part: two or three times after booting the computer (Debian), it
didn't work until I did "apache start && apache stop" (not "apache
restart"); then it worked. Now it doesn't work any more. No other
configuration (except PHP's) was changed.

Any ideas?
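For comparison, here is a locale-independent sketch of the same goal in Python. It swaps in Unicode NFKD decomposition in place of iconv's //TRANSLIT, so it illustrates the idea rather than reproducing what iconv does. Whether //TRANSLIT succeeds is commonly reported to depend on the process's LC_CTYPE locale, which can differ between a shell and an Apache process; treat that as a lead to check, not a confirmed diagnosis. Note that 'ł' (U+0142) has no Unicode decomposition, so it needs an explicit mapping. The table `NO_DECOMPOSITION` and the function `strip_diacritics` are hypothetical, Polish-only illustrations.

```python
import unicodedata

# Hypothetical table for letters that have no Unicode decomposition;
# NFKD cannot split these into base letter + combining mark.
NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})

def strip_diacritics(text: str) -> str:
    """Remove diacritics without depending on the process locale."""
    text = text.translate(NO_DECOMPOSITION)
    # NFKD splits e.g. "ó" into "o" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("strona główna"))  # strona glowna
```

Characters outside the table that do not decompose pass through unchanged, so the result is not guaranteed to be pure ASCII.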




--- End Message ---
