> On Sun, 17 Apr 2005, Jan T. Kim wrote:
>
>> On Sun, Apr 17, 2005 at 12:38:10PM +0100, Prof Brian Ripley wrote:
>>> These are some points stimulated by reading about C history (and
>>> related in their implementation).
>>>
>>>
>>> 1) On some platforms
>>>
>>> > as.integer("0xA")
>>> [1] 10
>>>
>>> but not all (not on Solaris nor Windows).  We do not define what is
>>> allowed, and rely on the OS's implementation of strtod (yes, not
>>> strtol).  It seems that glibc does allow hex: C99 mandates it but C89
>>> seems not to allow it.
>>>
>>> I think that was a mistake, and strtol should have been used.  Then C89
>>> does mandate the handling of hex constants and also octal ones.  So
>>> changing to strtol would change the meaning of as.integer("011").
>>
>> I think interpretation of a leading "0" as a prefix indicating an octal
>> representation should indeed be avoided. People not familiar with C will
>> have a hard time understanding and getting used to this concept, and
>> in addition, it happens way too often that numeric data are provided
>> left-padded with zeros.

I agree with this:  011 should be 11, it should not be 9.

>>> Proposal: we handle this ourselves and define what values are
>>> acceptable, namely for as.integer:
>>>
>>> [+|-][0-9]+
>>> NA
>>> 0[x|X][0-9A-Fa-f]+
>>
>> It can be a somewhat mixed blessing if the string representation of
>> numeric values contains information about their base, in the form of
>> the 0x prefix in this case.
>>
>> The base argument (#3) of C's strtol function can be set to a base
>> explicitly or to 0, which gives the prefix-based "auto-selection"
>> behaviour. On the R level, such a base argument (to as.integer) could be
>> included and a default could be set.
>
> A lot of this is internal, not at R level.
>
>> Personally, I would be equally happy with the default being 0
>> (auto-select) or 10. Considering the perhaps limited spread of
>> familiarity with C's "0x" idiom, I somewhat favour a consistent and
>> "stubborn" decimal behaviour (base defaults to 10), though.
>
> Some people already rely on it, and those who don't know about it are
> unlikely to ever enter what they think is an illegal value, surely?

As long as we document it, I think the 0x prefix is fine.

We should provide a way to use other bases on input and output.  This
could be through format specifiers, but it would be enough to have a
pair of dedicated functions to do the conversions.

Duncan Murdoch

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
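
For what it's worth, the proposal quoted above could be sketched in C
roughly as follows.  This is only a minimal sketch and not R's actual
implementation: the name parse_int_strict and its return codes are
hypothetical, the sign is treated as optional, and the hex character
class is read as [0-9A-Fa-f].  The input is validated by hand first, so
that strtol never gets to apply its own rules for whitespace, octal or
"0x" prefixes.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not R's code).
 * Accepts only:
 *     [+|-][0-9]+           decimal, sign optional, leading zeros allowed
 *     NA
 *     0[x|X][0-9A-Fa-f]+    hexadecimal
 * Returns 1 and stores the value in *out on success,
 *         0 if the string is exactly "NA",
 *        -1 if the string is not acceptable or is out of range for long.
 */
int parse_int_strict(const char *s, long *out)
{
    const char *digits = s;
    const char *p;
    char *end;
    int base = 10;
    long v;

    if (strcmp(s, "NA") == 0)
        return 0;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        digits = s + 2;                 /* skip the "0x" prefix ourselves */
    } else if (s[0] == '+' || s[0] == '-') {
        digits = s + 1;                 /* sign is only allowed for decimal */
    }

    if (*digits == '\0')
        return -1;                      /* need at least one digit */

    /* Every remaining character must be a digit of the chosen base. */
    for (p = digits; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (base == 16 ? !isxdigit(c) : !isdigit(c))
            return -1;
    }

    /* Base is always explicit (10 or 16), never 0, so "011" stays decimal. */
    errno = 0;
    v = strtol(base == 16 ? digits : s, &end, base);
    if (errno == ERANGE || *end != '\0')
        return -1;

    *out = v;
    return 1;
}
```

With this, "011" parses as 11 and "0xA" as 10, whereas
strtol("011", NULL, 0) returns 9 because base 0 selects octal for a
leading "0".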