On Thursday, 7 June 2012 14:58:03, Oswald Buddenhagen wrote:
> > Imagine a Qt application run from the command-line with:
> >     qtapp *
> >
> > In that directory there is a filename with broken encoding. The shell
> > will not recode it (which is why I don't buy the command-line encoding
> > argument).
> yeah, too bad.

Don't be so quick to dismiss it.

> > The Qt application should be expected to work and interpret that
> > argument properly.
>
> and how? have you read the part about the encoding being mount point
> specific?

I have. This applies only to filesystems that store filenames in Unicode, like
VFAT, NTFS and ISO9660+Joliet (and possibly UDF). For those, the user is expected
to mount the filesystem with the proper option so that the filenames are
rendered into the locale's encoding. If the proper setup was done, those
filenames will not be a problem. If it was done improperly, then we fall into
the next case.

The problem arises only with filesystems that store plain 8-bit C strings
rather than Unicode filenames, like all the Unix filesystems. For those, there's no
concept of locale. Filenames are simply arbitrary data and can contain any
byte but two: null and slash.
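To make that concrete, here is a minimal Python sketch (not Qt code), assuming a POSIX system where a single path component is handled as a `bytes` object. The helper name `is_valid_name_component` is hypothetical; it checks only the two forbidden bytes and deliberately does no decoding, because the kernel doesn't either.

```python
def is_valid_name_component(name: bytes) -> bool:
    """Return True if `name` can be a single filename on a typical Unix
    filesystem: non-empty and free of the two forbidden bytes, NUL and '/'.
    (Hypothetical helper; real filesystems also impose a length limit such
    as NAME_MAX, which is ignored here.)"""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


# Latin-1 "é" (0xE9) is a perfectly legal filename byte...
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'
print(is_valid_name_component(latin1_name))      # True

# ...even though that name is not valid UTF-8.
try:
    latin1_name.decode("utf-8")
except UnicodeDecodeError:
    print("not decodable as UTF-8")
```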

> how does the application know whether the caller did 8-bit
> pass-through or actually did the right thing and recoded to the locale
> encoding (which would be the case for example when you paste a correctly
> decoded filename from a gui to the command line)?

Like this: the application assumes that an 8-bit input contains exactly the
bytes obtained from the OS's 8-bit API. That's what most applications do
today, and it's what the shell does when it expands *. For the same reason, an
application *should* produce the same 8-bit form in its output. Not another.
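The "same 8-bit form" requirement can be illustrated with a small Python sketch (plain Python, not a Qt API), assuming a UTF-8 locale. It compares two of Python's codec error handlers: `replace`, which substitutes U+FFFD and loses the original byte, and `surrogateescape` (PEP 383), which carries undecodable bytes through as lone surrogates. The latter is only one possible lossless scheme, named here for illustration; `round_trips` is a hypothetical helper.

```python
def round_trips(raw: bytes, errors: str, encoding: str = "utf-8") -> bool:
    """Decode `raw` as an application might when reading a filename, then
    re-encode it as it would when writing the name back out, and report
    whether the original bytes survived unchanged."""
    text = raw.decode(encoding, errors=errors)           # what the app "sees"
    return text.encode(encoding, errors=errors) == raw   # what it writes back


name = b"caf\xe9.txt"  # Latin-1 bytes on disk, UTF-8 locale
print(round_trips(name, "replace"))          # False: 0xE9 became U+FFFD
print(round_trips(name, "surrogateescape"))  # True: original bytes restored
```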

There's no such thing as a "correctly decoded filename from a GUI". The GUI is
still using the OS's 8-bit API and is subject to the same decoding problems as
the command-line application. For that reason, both applications need to make
the same encoding decisions.
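A short Python illustration of why those decisions have to match (plain Python, not Qt; the two encodings are picked only for the example): the same on-disk bytes yield different text depending on which encoding a tool assumes, so a name displayed by one tool won't match the same name typed into another.

```python
# The bytes a UTF-8 system actually stores for "café.txt".
raw = "caf\u00e9.txt".encode("utf-8")  # b'caf\xc3\xa9.txt'

# Two tools reading the same directory entry with different assumptions.
cli_view = raw.decode("utf-8")    # follows the UTF-8 locale
gui_view = raw.decode("latin-1")  # assumes some other encoding

print(cli_view)              # café.txt
print(gui_view)              # cafÃ©.txt
print(cli_view == gui_view)  # False
```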

> this is simply a no-win situation, and by trying to work around it you
> make it only worse by introducing unpredictability into the game.

Agreed. That's why I've given up on solving this problem completely.

My solution:

Filenames that cannot be decoded in the locale's encoding are filesystem
corruption. Qt applications do not need to handle them. Leave that for system
administration applications.
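A minimal Python sketch of that policy, assuming UTF-8 as the locale encoding: names that decode cleanly are handed to the application as text, and everything else is set aside as raw bytes rather than guessed at. The function name `partition_by_locale` is hypothetical.

```python
def partition_by_locale(names: list[bytes], encoding: str = "utf-8"):
    """Split raw directory entries into names the application handles
    (decoded text) and names it treats as out-of-locale 'corruption'
    (left as raw bytes for an administration tool to deal with)."""
    usable: list[str] = []
    rejected: list[bytes] = []
    for raw in names:
        try:
            usable.append(raw.decode(encoding))  # strict decoding, no guessing
        except UnicodeDecodeError:
            rejected.append(raw)
    return usable, rejected


# Names as they might come from os.listdir(b"."), for example.
usable, rejected = partition_by_locale([b"notes.txt", b"caf\xe9.txt"])
print(usable)    # ['notes.txt']
print(rejected)  # [b'caf\xe9.txt']
```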

> the lesser evil is imo assuming correct locale encoding when actually
> interpreting external input, being consistent within the qt realm when
> dealing with i/o functions, and having functions for 8-bit pass-through
> when dealing with external things which are just passed along
> (qprocessenvironment already has this; it should be possible to do the
> same for cmdline args by having laszlo's work integrate with qprocess as
> well).

I agree with that too, which is why I am telling João that the idea that
filesystem encoding != locale encoding is insane. It simply cannot be
implemented properly.
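The split Oswald describes (interpret external input in the locale encoding, but pass external data along as raw bytes, as QProcessEnvironment does for the environment) could look roughly like this in Python. This is a sketch under assumptions: `Arg` and `child_argv` are hypothetical names, not Qt APIs and not Laszlo's work, and UTF-8 is assumed as the locale encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    """A command-line argument kept in both forms: the exact bytes received
    (for pass-through) and a locale-decoded string (for interpretation)."""
    raw: bytes
    encoding: str = "utf-8"  # assumed locale encoding

    @property
    def text(self) -> str:
        # Interpreting: assume the locale encoding. Undecodable bytes become
        # U+FFFD, so this form is for display and parsing only.
        return self.raw.decode(self.encoding, errors="replace")


def child_argv(program: bytes, args: list[Arg]) -> list[bytes]:
    """Pass-through: build a child process's argv from the original bytes,
    never from the decoded text, so nothing is lost along the way."""
    return [program] + [a.raw for a in args]


args = [Arg(b"--verbose"), Arg(b"caf\xe9.txt")]
print(child_argv(b"ls", args))  # [b'ls', b'--verbose', b'caf\xe9.txt']
```

On POSIX systems a list of `bytes` like this can be passed directly to `subprocess.run`, so the child receives exactly what the parent was given.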

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden


_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
