[Savane-dev] [bug #1981] pretend to be utf8 but is not

Tobias Toedter Thu, 24 Mar 2005 01:43:01 -0800

Follow-up Comment #29, bug #1981 (project savane):

"DIG proposal is interesting. Unfortunately, since the whole utf8 stuff
already happened on the trunk, I do not know what we should do exactly."


I don't know what you mean by this. If you are referring to the changes I made
WRT ngettext, I can assure you that no UTF-8 issues are mixed into the current
code. Are there any other problem areas?

"- Are we sure we have only iso-8859-1 content?"

Well, not 100% sure. Most users will have browsed with iso-8859-1 (or
iso-8859-15). I guess only a very small fraction (if any) used another
encoding. The reason is simply that savane did not work properly with
charsets other than iso-8859-1(5). It didn't even display the affected
languages, such as Russian, Japanese and Korean.

But I'm afraid that there might be some small fraction of database content
which is not pure iso-8859-1. This content will most probably not display
correctly after the conversion.
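
To make the risk concrete, here is a minimal Python sketch of a one-time
conversion that assumes every stored value is iso-8859-1. It is not savane
code (savane is PHP); the function name and the sample strings are made up
for illustration. A value that was really stored in another charset, KOI8-R
in this example, converts without any error but comes out as garbage.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret stored bytes as iso-8859-1 and re-encode them as UTF-8."""
    # Every byte value is a valid iso-8859-1 character, so this never fails,
    # even when the assumption about the source charset is wrong.
    return raw.decode("iso-8859-1").encode("utf-8")


# A value that really was stored as iso-8859-1 survives the conversion.
german = "Grüße".encode("iso-8859-1")
assert latin1_to_utf8(german).decode("utf-8") == "Grüße"

# A value stored as KOI8-R is silently turned into mojibake.
russian = "Привет".encode("koi8-r")
assert latin1_to_utf8(russian).decode("utf-8") != "Привет"
```

Because the wrong case raises no error, such rows could only be found by
inspecting the data, not by watching for conversion failures.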

"- Are we sure only utf-8 will be put in the database afterwards?"

I think so, yes. The reason is that the webpages will be in true UTF-8
encoding, so the browser will send back any forms in UTF-8, too, if the
program is not totally brain-dead. So the data savane gets from the user is
already in UTF-8 encoding.
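
For the brain-dead case, submitted data could in principle be checked before
it is stored. A rough Python sketch of such a check follows; the helper name
is hypothetical and nothing like it is claimed to exist in savane.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the submitted bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A form value sent by a browser that honours the page encoding.
assert is_valid_utf8("Grüße".encode("utf-8"))

# The same text sent as iso-8859-1 is not valid UTF-8 and would be rejected.
assert not is_valid_utf8("Grüße".encode("iso-8859-1"))
```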

"But Savane will not convert stuff to utf8 each time it loads a page, will it?
If so, isn't it resources (cpu) waste -- since we'll want utf8 anyway,
converting everything once and for all would save resources?"

Well, yes and no ...

The calls to gettext() have been doing this conversion since the beginning of
savane; it happens in the gettext library compiled into PHP. The conversion
takes place whenever the encoding of the output locale differs from the
encoding of the corresponding .mo file.

The CPU overhead is probably rather small, but that is a guess and I might be
wrong. As an example, this conversion currently has to be done for German:
the .mo file is in UTF-8, but the webpages are in iso-8859-1.
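
Conceptually, the per-message step looks like the following Python sketch.
This only illustrates the idea and is not the gettext library's actual code;
the function name and the sample message are assumptions.

```python
def to_output_charset(msg: bytes, catalog_charset: str, output_charset: str) -> bytes:
    """Convert a catalog message to the output charset, if the two differ."""
    if catalog_charset.lower() == output_charset.lower():
        # Same encoding on both sides: no conversion work is needed.
        return msg
    return msg.decode(catalog_charset).encode(output_charset)


# German catalog in UTF-8, pages served as iso-8859-1: conversion is needed.
msg = "Übersicht".encode("utf-8")
assert to_output_charset(msg, "UTF-8", "iso-8859-1") == "Übersicht".encode("iso-8859-1")

# Catalog and pages both in UTF-8: the message is passed through untouched.
assert to_output_charset(msg, "UTF-8", "utf-8") is msg
```

Once both the pages and the .mo files are in UTF-8, every message takes the
pass-through branch.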

To minimize this need for conversion, it's certainly possible to require the
.po files to use UTF-8. This is a matter of internal policy of savane and of
stepping on the translators' toes ;-)

"If this item can be closed by the end of the week, that should not delay the
release. Would it be possible?"

Oops, sorry. Obviously, it was already too late yesterday. Somehow, I read
"the end of the week" as "the end of April". If you're talking about two or
three days left, I very much favor DIG's proposal. I don't feel good about
forcing this conversion through in the next few hours, especially because it
involves a *huge* amount of valuable data. After all, the database is what
savane is all about, the frontend is just for accessing it ... (that might be
a little exaggerated).

So, in conclusion, I think DIG has had a very good idea. Mathieu, do you think
that we can release 1.0.7 really shortly after 1.0.6? Like, say, four weeks?
And this release should really just include the transition to UTF-8, nothing
else.

    _______________________________________________________

Reply to this item at:

  <http://gna.org/bugs/?func=detailitem&item_id=1981>

_______________________________________________
  Message sent via/by Gna!
  http://gna.org/


_______________________________________________
Savane-dev mailing list
[email protected]
https://mail.gna.org/listinfo/savane-dev
