Re: Filename Encoding

John Darrington Sun, 22 Dec 2013 08:52:53 -0800

On Tue, Dec 10, 2013 at 12:38:04PM -0800, Ben Pfaff wrote:

     in syntax and the output engine, we tend to convert everything we
     receive externally into UTF-8 for internal processing, and then convert
     back to other encodings as necessary.


I'm not sure this was the best decision.  How do we know which encoding
we should convert back to?

Consider this scenario:

On a GNU/Linux system (where the filesystem is encoding-agnostic) there
exist two files, which I shall call fileA and fileB.

Let us assume that the bytes which comprise the name of fileA happen to be
valid UTF-8.  Let us also assume that the bytes which comprise the name of
fileB happen to be valid ISO-8859-1.  Further, let us assume that when the
name of fileB is converted from ISO-8859-1 to UTF-8, the result happens to
be identical to the name of fileA.
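
The scenario above can be reproduced with plain byte strings. This is only
a sketch: the variable names and the helper to_internal_utf8 are made up
for illustration (they are not PSPP code), and the filename is the one used
as an example later in this message.

```python
# Illustrative only: two distinct on-disk names that become identical
# once both are converted to UTF-8 for internal processing.
text = "Äpfelfaß.sav"

name_a = text.encode("utf-8")        # fileA: bytes that are valid UTF-8
name_b = text.encode("iso-8859-1")   # fileB: bytes that are valid ISO-8859-1


def to_internal_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical stand-in for 'convert external input to UTF-8'."""
    return raw.decode(source_encoding).encode("utf-8")


internal_a = to_internal_utf8(name_a, "utf-8")
internal_b = to_internal_utf8(name_b, "iso-8859-1")

# On disk the two names differ, but internally they are the same string.
collide = (name_a != name_b) and (internal_a == internal_b)
```

To the filesystem, name_a and name_b are two different files.  After
conversion nothing distinguishes them.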

In "normal" applications the question does not arise: filenames are simply
byte strings.  However, because we convert everything in syntax to UTF-8
(for example: GET FILE="Äpfelfaß.sav".), we no longer know the original
encoding of that filename, and so cannot tell which bytes to pass to the
filesystem.
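
To make the ambiguity concrete, here is a rough sketch of what converting
back would involve.  The function candidate_disk_names and its default list
of encodings are assumptions for illustration only; nothing like this
exists in PSPP as far as this message is concerned, and it is not offered
as a fix.

```python
def candidate_disk_names(syntax_name, encodings=("utf-8", "iso-8859-1")):
    """Return every distinct byte string the UTF-8 syntax name could
    correspond to on disk, one per encoding able to represent it."""
    candidates = []
    for enc in encodings:
        try:
            raw = syntax_name.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the name at all
        if raw not in candidates:
            candidates.append(raw)
    return candidates


# A non-ASCII name maps to more than one possible on-disk name ...
candidates = candidate_disk_names("Äpfelfaß.sav")

# ... whereas a pure ASCII name is unambiguous under these encodings.
ascii_only = candidate_disk_names("apples.sav")
```

For the non-ASCII name there are two candidates, and if both files exist
there is nothing in the syntax to say which one was meant.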

I don't know how to solve this problem.

J'

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
