John Darrington <[EMAIL PROTECTED]> writes:

> Since we have absolutely no idea of the locale in which a system file
> was created, I think we should simply take it on trust that the
> variable names and strings within a file are valid ones.

Do you think we can assume that variable names are encoded in UTF-8?
Then it is fairly easy to convert variable names to/from the current
locale on system file input/output.

I have not experimented with non-ASCII variable names in SPSS.  A few
experiments might turn up the encoding.

> Thus lines such as [...]
> need to be excised from sfm-read.c --- we don't know the locale in
> which the file was written, so we don't know how isalpha/islower etc ought
> to behave when reading.

I think it'd still be a good idea to sanity-check variable names,
assuming that we can figure out the variable name encoding used in
system files.

> Similarly, I think that sfm-write should also not use any
> ctype functions.  Let's just assume that the dictionary and
> casefiles are valid ones.

I don't think sfm-write validates anything in the dictionary
currently.

> Instead, let's do all that sort of checking in the lexer, and the
> output routines.  Thus,
>
>     DATA LIST LIST /Äpfel *.
>
> Will give an error (or perhaps just a warning) in the default "C"
> locale, but continue happily if the LC_CTYPE locale has been set to
> say "de_DE".  Similarly, if I generate output from a system file which
> was created in the "de_DE" locale, but my current locale is "en_US",
> then the output routine will generate a warning when it encounters a
> variable name for which isalpha returns false.

Is that the way that other languages with support for
internationalization parse variable names?  e.g. how does Java work?
I must admit that I have a pretty weak grasp of how this sort of thing
is supposed to work.

> So you're probably right, we'd need to audit the code for files which
> currently use ctype (I had a look, it's about 12 files), and decide
> whether they really should honour LC_CTYPE.

[...]

-- 
"To the engineer, the world is a toy box full of sub-optimized and
 feature-poor toys."
        --Scott Adams

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
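
A minimal sketch of the kind of locale-independent sanity check discussed
in this message, under the unconfirmed assumption that variable names in
system files are UTF-8.  The function name and the exact rules (non-empty,
no leading ASCII digit, no ASCII control characters or spaces, well-formed
UTF-8) are hypothetical and are not taken from sfm-read.c.  The point is
only that the check uses explicit byte ranges rather than isalpha() and
friends, so its result does not change with LC_CTYPE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PSPP.  Returns true if NAME is a
   non-empty, well-formed UTF-8 string that does not start with an ASCII
   digit and contains no ASCII control characters or spaces.  No ctype
   function is called, so the answer is the same in every locale. */
bool
name_is_plausible_utf8 (const char *name)
{
  const unsigned char *p = (const unsigned char *) name;

  assert (name != NULL);
  if (*p == '\0' || (*p >= '0' && *p <= '9'))
    return false;

  while (*p != '\0')
    {
      size_t n_trail;           /* Continuation bytes still expected. */
      unsigned int min_cp;      /* Smallest code point for this length. */
      unsigned int cp;          /* Code point being decoded. */

      if (*p < 0x80)
        {
          /* ASCII byte: reject control characters, space, and DEL. */
          if (*p <= 0x20 || *p == 0x7f)
            return false;
          p++;
          continue;
        }
      else if ((*p & 0xe0) == 0xc0)
        n_trail = 1, min_cp = 0x80, cp = *p & 0x1f;
      else if ((*p & 0xf0) == 0xe0)
        n_trail = 2, min_cp = 0x800, cp = *p & 0x0f;
      else if ((*p & 0xf8) == 0xf0)
        n_trail = 3, min_cp = 0x10000, cp = *p & 0x07;
      else
        return false;           /* Stray continuation or invalid lead byte. */

      p++;
      for (; n_trail > 0; n_trail--, p++)
        {
          /* A terminating NUL also fails this test, so we never read
             past the end of a truncated sequence. */
          if ((*p & 0xc0) != 0x80)
            return false;
          cp = (cp << 6) | (*p & 0x3f);
        }

      /* Reject overlong forms, surrogates, and out-of-range values. */
      if (cp < min_cp || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return false;
    }
  return true;
}
```

With this sketch, "Äpfel" encoded as UTF-8 (bytes C3 84 70 66 65 6C) is
accepted whatever the current locale is, while the same name encoded as
ISO-8859-1 (bytes C4 70 66 65 6C) is rejected as malformed, which would
at least hint that the file uses some other encoding.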
