John Darrington <[EMAIL PROTECTED]> writes:

> Since we have absolutely no idea of the locale in which a system file
> was created, I think we should simply take it on trust that the
> variable names and strings within a file are valid ones.  

Do you think we can assume that variable names are encoded in
UTF-8?  Then it is fairly easy to convert variable names to/from
the current locale on system file input/output.
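
Here is a minimal sketch of that conversion. It assumes names are stored as UTF-8 and uses POSIX iconv(3). The helper name recode_string is hypothetical and is not an existing PSPP function. On input you would convert from "UTF-8" to the locale's charset, for example the one reported by nl_langinfo (CODESET) after setlocale (LC_ALL, ""). On output you would convert the other way.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of PSPP): converts the NUL-terminated
   string IN from charset FROM to charset TO using iconv(3).  Returns a
   malloc'd buffer followed by 4 zero bytes, storing the converted length
   (excluding those bytes) in *OUT_LEN.  Returns NULL if the conversion is
   unsupported, IN is not valid in FROM, or iconv reports any other error
   (e.g. a character not representable in TO, on implementations that
   treat that as an error). */
char *
recode_string (const char *to, const char *from, const char *in,
               size_t *out_len)
{
  assert (to != NULL && from != NULL && in != NULL && out_len != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (in);
  /* 4 output bytes per input byte covers UTF-8 <-> single-byte charsets
     and UTF-16/UTF-32 (including a BOM).  Running out of room is treated
     as failure rather than growing the buffer, to keep the sketch short. */
  size_t cap = 4 * in_left + 16;
  char *out = malloc (cap);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) in;    /* iconv's prototype takes char **. */
  char *outp = out;
  size_t out_left = cap - 4;  /* Reserve room for the terminator. */
  if (iconv (cd, &inp, &in_left, &outp, &out_left) == (size_t) -1
      /* Flush any pending shift sequence (a no-op for stateless charsets). */
      || iconv (cd, NULL, NULL, &outp, &out_left) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }
  iconv_close (cd);

  *out_len = (size_t) (outp - out);
  memset (outp, 0, 4);
  return out;
}
```

On input this would be called as recode_string (locale_charset, "UTF-8", name, &len). A NULL result is the natural place to warn about a name that cannot be converted.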

I have not experimented with non-ASCII variable names in SPSS.  A
few experiments might turn up the encoding.

> Thus lines such as
[...]
> need to be excised from sfm-read.c --- we don't know the locale in
> which the file was written, so we don't know how isalpha/islower etc. ought
> to behave when reading.

I think it'd still be a good idea to sanity-check variable names,
assuming that we can figure out the variable name encoding used
in system files.

> Similarly, I think that sfm-write should also not use any
> ctype functions. Let's just assume that the dictionary and
> casefiles are valid ones.

I don't think sfm-write validates anything in the dictionary
currently.

> Instead, let's do all that sort of checking in the lexer, and the
> output routines.  Thus, 
>
>  DATA LIST LIST /Äpfel *.
>
> will give an error (or perhaps just a warning) in the default "C"
> locale, but continue happily if the LC_CTYPE locale has been set to
> say "de_DE".  Similarly, if I generate output from a system file which
> was created in the "de_DE" locale, but my current locale is "en_US",
> then the output routine will generate a warning when it encounters a
> variable name for which isalpha returns false.

Is that the way that other languages with support for
internationalization parse variable names?  For example, how does
Java handle identifiers?  I must admit that I have a pretty weak
grasp of how this sort of thing is supposed to work.
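
The lexer-side check proposed in the quoted text could be sketched as follows. The identifier is decoded with the current LC_CTYPE locale's multibyte encoding and classified with iswalpha/iswalnum in that same locale. The function name and the exact letter/digit/'_' rule are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical lexer check: NAME is decoded using the current LC_CTYPE
   locale (the program is assumed to call setlocale (LC_CTYPE, "") at
   startup).  It must start with a character that locale calls alphabetic
   and continue with alphanumerics or '_'.  The same bytes may therefore be
   accepted under one locale and rejected under another, such as "C". */
bool
lex_id_valid_in_locale (const char *name)
{
  assert (name != NULL);
  mbstate_t state;
  memset (&state, 0, sizeof state);

  size_t left = strlen (name);
  if (left == 0)
    return false;

  bool first = true;
  while (left > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, name, left, &state);
      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        return false;  /* invalid or truncated sequence (NUL can't occur) */
      bool ok = first ? iswalpha ((wint_t) wc)
                      : iswalnum ((wint_t) wc) || wc == L'_';
      if (!ok)
        return false;
      first = false;
      name += n;
      left -= n;
    }
  return true;
}
```

Without a setlocale call the "C" locale applies, so a UTF-8 name such as "Äpfel" is rejected. After setlocale (LC_CTYPE, "") in a UTF-8 German locale it would normally be accepted. That is exactly the locale dependence described above.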

> So you're probably right: we'd need to audit the code for files which
> currently use ctype (I had a look; it's about 12 files), and decide
> whether they really should honour LC_CTYPE.  [...]
-- 
"To the engineer, the world is a toy box full of sub-optimized and
 feature-poor toys."
--Scott Adams


_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
