Bug#297074: Temporary file created by poedit should be of the same encoding as the file being edited

Marcin Owsiany Sun, 27 Feb 2005 16:43:19 -0800

On Mon, Feb 28, 2005 at 12:55:11AM +0100, Boris Yakobowski wrote:
> On Sun, Feb 27, 2005 at 11:26:06PM +0100, Marcin Owsiany wrote:
> > As far as I know, potool knows nothing about encodings, so it should be
> > completely transparent to them, and just pass text from the po file in
> > whatever encoding it is, unchanged, to the temp file, and back. But I
> > may be wrong.
> 
> Yes


Ah, so the issue is not that poedit performs some inappropriate
recoding, but that $EDITOR decides to interpret a file containing only
US-ASCII as iso-8859-15, and not as UTF-8. Then, after you input
some non-US-ASCII characters (which emacs encodes as iso-8859-15),
poedit merges a UTF-8 file with an iso-8859-15 file.
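
The failure can be reproduced in a few lines of Python. This is a minimal
sketch, assuming that a naive byte-level concatenation stands in for
poedit's merge step (the real merge logic is not reproduced here) and that
emacs saved the user's edit as iso-8859-15:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The original PO entry, stored as UTF-8.
original = 'msgstr "Fermer"\n'.encode("utf-8")

# The user's edit: the temp file looked like plain US-ASCII, so the editor
# picked iso-8859-15 and wrote "é" as the single byte 0xE9.
edited = 'msgstr "Déjà vu"\n'.encode("iso-8859-15")

# Stand-in for the merge: the bytes are simply combined, with no recoding.
merged = original + edited

print(is_valid_utf8(original))  # True
print(is_valid_utf8(merged))    # False
```

The merged result is no longer valid UTF-8, even though each half was
encoded "correctly" from its own point of view.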

> but I find the current behavior unsatisfactory because it is the
> responsibility of the user to set the correct encoding for the temporary
> file. Otherwise it is appended as is, in an incorrect way; in my case emacs
> saw the temporary file as an iso-8859-15 file (which was technically
> correct), and then poedit merged it as is with the original utf8 po file.

The problem is that the temporary file which poedit creates does not
have any metadata which would indicate its encoding. Therefore emacs is
free to choose whatever encoding it feels is appropriate. And since on
creation the file contains pure US-ASCII, emacs chooses iso-8859-15.

> So
> there are two ways to correct this in my opinion:
> - the temporary file is created with the correct encoding

What do you mean by "with the correct encoding"? The problem is exactly
that for pure US-ASCII input, its iso-8859-15 and UTF-8 representations
are _exactly_ the same. So technically speaking, it _does_ have the
correct encoding.
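
A quick check in Python (a sketch for illustration, not part of poedit)
shows the point: for US-ASCII text the two encodings produce byte-for-byte
identical output, so nothing in the temp file's content can tell the editor
which one was intended. They only diverge once a non-ASCII character
appears:

```python
text = 'msgid "Open file"\nmsgstr ""\n'  # pure US-ASCII, like the fresh temp file

# Identical bytes: the content alone cannot reveal the intended encoding.
print(text.encode("utf-8") == text.encode("iso-8859-15"))  # True

# Non-ASCII characters are where the encodings diverge ("é" and the euro sign).
for ch in ("é", "€"):
    print(ch.encode("utf-8"), ch.encode("iso-8859-15"))
# b'\xc3\xa9' b'\xe9'
# b'\xe2\x82\xac' b'\xa4'
```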

> - the temporary file is converted after it has been saved, before being
> merged.

Since automagic detection of encoding (based just on the data) seems a
very risky business, in order to perform a conversion, two things would
be needed:
 - a specification of the target encoding (could be easily retrieved
   from the original po file Content-Type: header)
 - a specification of the source encoding, i.e. "what encoding $EDITOR
   chose to save your input in". I can't see how that could be done for
   any editor in general.
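
A sketch of what such a conversion step could look like in Python. The
function names are hypothetical, and the regular expression assumes the
usual "Content-Type: text/plain; charset=..." header line. Note that the
source encoding still has to be supplied by the caller, which is exactly
the part that cannot be determined for an arbitrary editor:

```python
import re

# Matches the charset in a PO header line such as
#   "Content-Type: text/plain; charset=UTF-8\n"
_CHARSET_RE = re.compile(rb"Content-Type:\s*text/plain;\s*charset=([A-Za-z0-9_.:-]+)")


def po_charset(po_bytes: bytes, default: str = "ASCII") -> str:
    """Return the target encoding declared in the PO file header."""
    # The header itself is ASCII, so it can be searched before decoding.
    m = _CHARSET_RE.search(po_bytes)
    return m.group(1).decode("ascii") if m else default


def recode(data: bytes, source_enc: str, target_enc: str) -> bytes:
    """Convert the editor's output from source_enc to target_enc.

    source_enc cannot be inferred reliably from the data; the caller must
    know which encoding the editor actually used.
    """
    return data.decode(source_enc).encode(target_enc)


po_file = b'msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset=UTF-8\\n"\n'
edited = 'msgstr "Déjà vu"\n'.encode("iso-8859-15")

target = po_charset(po_file)                   # "UTF-8"
fixed = recode(edited, "iso-8859-15", target)  # assumes we know emacs used Latin-9
print(fixed.decode("utf-8") == 'msgstr "Déjà vu"\n')  # True
```

The target side is easy; the hard-coded "iso-8859-15" in the last step is
the piece poedit has no general way of knowing.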

However, I can see a third possibility, namely to have poedit prepend a
Content-Type header, which would hopefully force $EDITOR into using the
correct encoding (i.e. the one matching the initial po file) for the
following input.
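
One way this could be sketched in Python: wrap the temp-file contents in a
header entry declaring the PO file's charset, write the file in that
charset, and strip the header again before merging. The helper names and
the exact header layout are assumptions, and whether a given $EDITOR
actually honors the declaration depends entirely on the editor:

```python
# Hypothetical header entry mirroring the one in the original PO file.
HEADER = 'msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset={charset}\\n"\n\n'


def add_header(body: str, charset: str) -> str:
    """Prepend a header entry declaring charset to the temp-file body."""
    return HEADER.format(charset=charset) + body


def strip_header(text: str, charset: str) -> str:
    """Remove the header entry again before the text is merged back."""
    header = HEADER.format(charset=charset)
    if not text.startswith(header):
        raise ValueError("charset header was changed or removed in the editor")
    return text[len(header):]


def write_tempfile(path: str, body: str, charset: str) -> None:
    """Write the temp file, header included, in the PO file's own encoding."""
    with open(path, "w", encoding=charset, newline="") as f:
        f.write(add_header(body, charset))


text = add_header('msgid "Close"\nmsgstr ""\n', "UTF-8")
print(text.splitlines()[2])  # "Content-Type: text/plain; charset=UTF-8\n"
```

Refusing to merge when the header has been altered keeps a user's edit of
the marker from silently ending up in the po file.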

> I think just about anything would work; besides, it is highly locale- and
> emacs/whatever-dependent, unfortunately...

By the way, doesn't something like:

LC_CTYPE=fr_FR.UTF-8 poedit blah.po

provide a workaround? I guess that should force $EDITOR into using UTF-8 as
the tempfile encoding.
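
The same workaround could also be applied around the editor call itself,
roughly as in the following Python sketch. The run_editor name is
hypothetical, and the fr_FR.UTF-8 default simply mirrors the example
above; it only has an effect if that locale is installed. LC_ALL is
removed because, when set, it overrides LC_CTYPE:

```python
import os
import shlex
import subprocess


def run_editor(path, locale_name="fr_FR.UTF-8", editor_cmd=None):
    """Run the editor on path with LC_CTYPE forced to locale_name.

    editor_cmd is an argv list; by default it is built from $EDITOR.
    Returns the editor's exit status.
    """
    env = dict(os.environ)
    env.pop("LC_ALL", None)  # LC_ALL, if set, would override LC_CTYPE
    env["LC_CTYPE"] = locale_name
    if editor_cmd is None:
        editor_cmd = shlex.split(env.get("EDITOR", "vi"))
    return subprocess.run([*editor_cmd, path], env=env).returncode
```

For example, run_editor(tmp_path) would open the temp file in $EDITOR
with a UTF-8 character-type locale, without changing the locale of the
rest of the session.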

Marcin
-- 
Marcin Owsiany <[EMAIL PROTECTED]>             http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216  FE67 DA2D 0ACA FC5E 3F75  D6F6 3A0D 8AA0 60F4 1216
