On Tue, Dec 10, 2013 at 12:38:04PM -0800, Ben Pfaff wrote:
> In syntax and the output engine, we tend to convert everything we
> receive externally into UTF-8 for internal processing, and then
> convert back to other encodings as necessary.
I'm not sure this was the best decision. How do we know which encoding we should convert back to?

Consider this scenario: on a GNU/Linux system (where the filesystem is encoding-agnostic) there exist two files, which I shall call fileA and fileB. Let us assume that the bytes comprising the name of fileA happen to be valid UTF-8, and that the bytes comprising the name of fileB happen to be valid ISO-8859-1. Further, let us assume that when the name of fileB is converted from ISO-8859-1 to UTF-8, the result is identical to the name of fileA.

For "normal" applications the question never arises, because filenames are simply byte strings. However, because we convert everything in syntax to UTF-8 (for example, GET FILE="Äpfelfaß.sav".), we no longer know the original encoding of that filename, and so cannot tell whether it refers to fileA or fileB.

I don't know how to solve this problem.

J'

-- 
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
_______________________________________________
pspp-dev mailing list
pspp-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-dev