Using the (unsigned int)(unsigned char) cast in isspace() resolved the
problem in my Windows build. I put some Rprintf statements into
isBlankString, and for type.convert("\247") it printed

   *s=-89 (4294967207 if unsigned)
   8=isspace(*s)
   8=isspace((unsigned int)*s)
   0=isspace((unsigned int)(unsigned char)*s)

I think the 8 is the value of a random bit of memory (the out-of-range
argument presumably indexes outside the classification table).
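To make the numbers above concrete, here is a minimal C sketch. It assumes a 32-bit unsigned int and uses signed char explicitly, so the result does not depend on the compiler's default char signedness. The helper names (widen_naive, widen_safe, is_space_byte) are made up for illustration and are not part of R. The C standard requires the argument of the <ctype.h> functions to be representable as an unsigned char or equal to EOF, which is why the inner cast matters.

```c
#include <ctype.h>

/* Naive widening: a negative signed char is sign-extended first,
 * so -89 becomes 4294967207 when unsigned int is 32 bits wide. */
unsigned int widen_naive(signed char c)
{
    return (unsigned int)c;
}

/* Double cast: reinterpret the value as a byte in 0..255 first,
 * then widen. -89 becomes 167 (0xA7, octal 247). */
unsigned int widen_safe(signed char c)
{
    return (unsigned int)(unsigned char)c;
}

/* Hypothetical helper: classify one byte without ever handing
 * isspace() a negative or out-of-range value. Returns 1 or 0. */
int is_space_byte(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

Calling widen_naive and widen_safe on the byte "\247" gives the two values printed above for *s, and is_space_byte only ever passes isspace() a value in the range 0..255.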
When I converted S+ to use full 8-bit characters I ran into the same
problem. The is<class> macros in <ctype.h> all take an int argument
whose value must be representable as an unsigned char (or be EOF), so
if char was signed you had to do the double cast to avoid sign
extension. Whoever designed the interface either didn't worry about
8-bit characters or had chars that were unsigned by default.

It doesn't look like any of the isspace calls in R do this double
casting.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> Sent: Friday, April 10, 2009 2:50 PM
> To: William Dunlap
> Cc: r-b...@r-project.org; Raberger, Stefan
> Subject: Re: [Rd] type.convert (PR#13646)
>
> William Dunlap wrote:
> > You may have to use
> >    (unsigned int)(unsigned char)*s++
> > instead of just
> >    (unsigned int)*s++
> > to avoid the sign extension.
>
> Thanks again,
>
> I probably won't be doing the change since I don't have a Windows build
> environment around, and I'm a bit superstitious about fixing bugs that I
> cannot see...
>
> Let me just filter this information into the bug repository for now.
>
> -pd
>
> >> -----Original Message-----
> >> From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> >> Sent: Friday, April 10, 2009 1:41 PM
> >> To: William Dunlap
> >> Cc: r-devel@r-project.org
> >> Subject: Re: [Rd] type.convert (PR#13646)
> >>
> >> William Dunlap wrote:
> >>> I can reproduce the difference that Stefan saw, depending
> >>> on whether or not I start Rgui with the flags
> >>>    --no-environ --no-Rconsole
> >>> I think it boils down to the isBlankString() function.
> >>> For the string "\247" it returns 1 when those flags are
> >>> not present and 0 when they are.
> >>> isBlankString does use some locale-specific functions:
> >>>    Rboolean isBlankString(const char *s)
> >>>    {
> >>>    #ifdef SUPPORT_MBCS
> >>>        if(mbcslocale) {
> >>>            wchar_t wc; int used; mbstate_t mb_st;
> >>>            mbs_init(&mb_st);
> >>>            while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
> >>>                if(!iswspace(wc)) return FALSE;
> >>>                s += used;
> >>>            }
> >>>        } else
> >>>    #endif
> >>>            while (*s)
> >>>                if (!isspace((int)*s++)) return FALSE;
> >>>        return TRUE;
> >>>    }
> >>>
> >>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
> >>> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
> >>> in both sessions. 'Process Explorer' shows that the 2 sessions
> >>> have the same dll's opened.
> >>
> >> Thanks for that analysis Bill!
> >>
> >> Stefan was in "German_Austria.1252" which I don't think is multibyte, so
> >> only the else-clause should be relevant, pointing the finger rather
> >> squarely at isspace(). Googling indicates that others have been caught
> >> out by signed/unsigned char issues there. Should this possibly rather read
> >>
> >>    if (!isspace((unsigned int)*s++)) return FALSE;
> >>
> >> ??
> >>
> >>> > sessionInfo()
> >>> R version 2.8.1 (2008-12-22)
> >>> i386-pc-mingw32
> >>>
> >>> locale:
> >>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >>>
> >>> attached base packages:
> >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>> I did the test with a dll compiled from
> >>>    #include <R.h>
> >>>    #include <R_ext/Utils.h>
> >>>
> >>>    void test_isBlankString(char **s, int *res)
> >>>    {
> >>>        *res = isBlankString(*s) ;
> >>>    }
> >>>
> >>> and called by .C("test_isBlankString","\247",-1L)
> >>>
> >>> I don't see the difference while running a version of 2.9.0(devel)
> >>> compiled locally on 11 March 2009 (from svn rev 48116).
> >>>
> >>>> -----Original Message-----
> >>>> From: r-devel-boun...@r-project.org
> >>>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Peter Dalgaard
> >>>> Sent: Friday, April 10, 2009 2:03 AM
> >>>> To: Raberger, Stefan
> >>>> Cc: r-b...@r-project.org; r-de...@stat.math.ethz.ch
> >>>> Subject: Re: [Rd] type.convert (PR#13646)
> >>>>
> >>>> Raberger, Stefan wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> each of the four PCs actually has the same locale setting:
> >>>>>
> >>>>> > Sys.setlocale("LC_CTYPE")
> >>>>> [1] "German_Austria.1252"
> >>>>>
> >>>>> (all the other settings returned by invoking Sys.getlocale()
> >>>>> are identical as well).
> >>>>>
> >>>>> Just to be sure (because it's displayed incorrectly in my
> >>>>> browser on the bugtracking page): the character inside the
> >>>>> type.convert function ought to be a "section"-sign (HTML Code
> >>>>> &sect; or &#167;, in R "\247", and not a dot ".").
> >>>>
> >>>> I saw it correctly.
> >>>> It's "\302\247" in UTF8 locales, which is of course
> >>>> the reason I suspected locale settings, but I can't seem to
> >>>> trigger the NA behaviour.
> >>>>
> >>>> I'm at a loss here, but some ideas:
> >>>>
> >>>> In the cases where it returns NA, what type is it? (I.e.
> >>>> storage.mode(type.convert(....)))
> >>>>
> >>>> What do you get from
> >>>>
> >>>> > charToRaw("§")
> >>>> [1] c2 a7
> >>>>
> >>>> (a7, presumably, but better check).
> >>>>
> >>>> -p
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> >>>>> Sent: Thursday, April 9, 2009 7:26 PM
> >>>>> To: Raberger, Stefan
> >>>>> Cc: r-de...@stat.math.ethz.ch; r-b...@r-project.org
> >>>>> Subject: Re: [Rd] type.convert (PR#13646)
> >>>>>
> >>>>> s.raber...@innovest.at wrote:
> >>>>>> Full_Name: Stefan Raberger
> >>>>>> Version: 2.8.1
> >>>>>> OS: Windows XP
> >>>>>> Submission from: (NULL) (213.185.163.242)
> >>>>>>
> >>>>>>
> >>>>>> Hi there,
> >>>>>>
> >>>>>> I recently noticed some strange behaviour of the command "type.convert",
> >>>>>> depending on the startup mode used. But there also seems to be different
> >>>>>> behaviour on different PCs (all running the same OS and the same version of R).
> >>>>>>
> >>>>>> On PC1:
> >>>>>> When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
> >>>>>> --no-init-file --no-environ) and try to convert, the result is
> >>>>>>
> >>>>>> > type.convert("§")
> >>>>>> [1] NA
> >>>>>>
> >>>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file --no-init-file
> >>>>>> --no-environ --no-Rconsole) instead, the result is
> >>>>>>
> >>>>>> > type.convert("§")
> >>>>>> [1] §
> >>>>>> Levels: §
> >>>>>>
> >>>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the result is
> >>>>>> always NA, independent of the startup mode used, and on PC4 it's always §.
> >>>>>> What's the result I should expect R to return, and why is it different in so
> >>>>>> many cases?
> >>>>>
> >>>>> Which locale does R think it is in in the four cases?
> >>>>> (Sys.setlocale("LC_CTYPE"), I think).
> >>>>>
> >>>>> Might well not be a bug (so please don't file it as one).
> >>>>>
> >>>>>> Any help is much appreciated!
> >>>>>> Regards, Stefan
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-devel@r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>> --
> >>>>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >>>>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >>>>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> >>>> ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
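
For reference, the change discussed in this thread, applied to the single-byte branch only, might look like the sketch below. This is not R's actual patch: is_blank_single_byte is a hypothetical stand-in for the else-branch of isBlankString(), it returns int rather than Rboolean so that it compiles without the R headers, and the multibyte (SUPPORT_MBCS) branch is left out entirely.

```c
#include <ctype.h>

/* Hypothetical stand-in for the non-multibyte branch of isBlankString().
 * Returns 1 if every byte of s is whitespace (or s is empty), else 0. */
int is_blank_single_byte(const char *s)
{
    while (*s) {
        /* Go through unsigned char so that bytes >= 0x80 reach
         * isspace() as 128..255 instead of as negative values. */
        unsigned char c = (unsigned char)*s++;
        if (!isspace(c))
            return 0;
    }
    return 1;
}
```

In the default "C" locale, is_blank_single_byte("\247") returns 0 whether the compiler treats plain char as signed or unsigned, which is the consistent behaviour Stefan was asking for.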