> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of wdun...@tibco.com
> Sent: Friday, April 10, 2009 4:00 PM
> To: r-de...@stat.math.ethz.ch
> Cc: r-b...@r-project.org
> Subject: Re: [Rd] type.convert (PR#13646)
>
> Using the (unsigned int)(unsigned char) in isspace()
> resolved the problem in my Windows build.

(int)(unsigned char) is the proper thing, since isspace is declared
to be int isspace(int).  The (unsigned int)(unsigned char) will work
because C does the unsigned int -> int conversion automatically when
the prototype is present and that conversion doesn't change the value
of the thing.

> I put some Rprintf
> statements into isBlankString and for type.convert("\247")
> it printed
>   *s=-89 (4294967207 if unsigned)
>   8=isspace(*s)
>   8=isspace((unsigned int)*s)
>   0=isspace((unsigned int)(unsigned char)*s)
> I think the 8 is the value of a random bit of memory.
>
> When I converted S+ to use full 8-bit characters I ran
> into the same problem.  The is<class> macros in <ctype.h>
> all take an unsigned int argument and if char was signed you had
> to do the double cast to avoid sign extension.  Whoever
> designed the interface either didn't worry about 8-bit characters
> or had chars that were unsigned by default.
>
> It doesn't look like any of the isspace calls in R do
> this double casting.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> > Sent: Friday, April 10, 2009 2:50 PM
> > To: William Dunlap
> > Cc: r-b...@r-project.org; Raberger, Stefan
> > Subject: Re: [Rd] type.convert (PR#13646)
> >
> > William Dunlap wrote:
> > > You may have to use
> > >    (unsigned int)(unsigned char)*s++
> > > instead of just
> > >    (unsigned int)*s++
> > > to avoid the sign extension.
> >
> > Thanks again,
> >
> > I probably won't be doing the change since I don't have a
> > Windows build environment around, and I'm a bit superstitious
> > about fixing bugs that I cannot see...
> >
> > Let me just filter this information into the bug repository for now.
> >
> > -pd
> >
> > >
> > > Bill Dunlap
> > > TIBCO Software Inc - Spotfire Division
> > > wdunlap tibco.com
> > >
> > >> -----Original Message-----
> > >> From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> > >> Sent: Friday, April 10, 2009 1:41 PM
> > >> To: William Dunlap
> > >> Cc: r-devel@r-project.org
> > >> Subject: Re: [Rd] type.convert (PR#13646)
> > >>
> > >> William Dunlap wrote:
> > >>> I can reproduce the difference that Stefan saw, depending
> > >>> on whether or not I start Rgui with the flags
> > >>>    --no-environ --no-Rconsole
> > >>> I think it boils down to the isBlankString() function.
> > >>> For the string "\247" it returns 1 when those flags are
> > >>> not present and 0 when they are.  isBlankString does use
> > >>> some locale-specific functions:
> > >>>    Rboolean isBlankString(const char *s)
> > >>>    {
> > >>>    #ifdef SUPPORT_MBCS
> > >>>        if(mbcslocale) {
> > >>>            wchar_t wc; int used; mbstate_t mb_st;
> > >>>            mbs_init(&mb_st);
> > >>>            while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
> > >>>                if(!iswspace(wc)) return FALSE;
> > >>>                s += used;
> > >>>            }
> > >>>        } else
> > >>>    #endif
> > >>>            while (*s)
> > >>>                if (!isspace((int)*s++)) return FALSE;
> > >>>        return TRUE;
> > >>>    }
> > >>>
> > >>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
> > >>> XP SP3.  The outputs of sessionInfo() and Sys.getenv() are the same
> > >>> in both sessions.  'Process Explorer' shows that the 2 sessions
> > >>> have the same dll's opened.
> > >> Thanks for that analysis Bill!
> > >>
> > >> Stefan was in "German_Austria.1252" which I don't think is
> > >> multibyte, so only the else-clause should be relevant, pointing the
> > >> finger rather squarely at isspace().  Googling indicates that others
> > >> have been caught out by signed/unsigned char issues there.
> > >> Should this possibly rather read
> > >>
> > >>     if (!isspace((unsigned int)*s++)) return FALSE;
> > >>
> > >> ??
> > >>
> > >>> > sessionInfo()
> > >>> R version 2.8.1 (2008-12-22)
> > >>> i386-pc-mingw32
> > >>>
> > >>> locale:
> > >>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> > >>> attached base packages:
> > >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> > >>> I did the test with a dll compiled from
> > >>>    #include <R.h>
> > >>>    #include <R_ext/Utils.h>
> > >>>
> > >>>    void test_isBlankString(char **s, int *res)
> > >>>    {
> > >>>        *res = isBlankString(*s) ;
> > >>>    }
> > >>>
> > >>> and called by .C("test_isBlankString","\247",-1L)
> > >>>
> > >>> I don't see the difference while running a version of 2.9.0(devel)
> > >>> compiled locally on 11 March 2009 (from svn rev 48116).
> > >>>
> > >>> Bill Dunlap
> > >>> TIBCO Software Inc - Spotfire Division
> > >>> wdunlap tibco.com
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: r-devel-boun...@r-project.org
> > >>>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Peter Dalgaard
> > >>>> Sent: Friday, April 10, 2009 2:03 AM
> > >>>> To: Raberger, Stefan
> > >>>> Cc: r-b...@r-project.org; r-de...@stat.math.ethz.ch
> > >>>> Subject: Re: [Rd] type.convert (PR#13646)
> > >>>>
> > >>>> Raberger, Stefan wrote:
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> each of the four PCs actually has the same locale setting:
> > >>>>>
> > >>>>> > Sys.setlocale("LC_CTYPE")
> > >>>>> [1] "German_Austria.1252"
> > >>>>>
> > >>>>> (all the other settings returned by invoking Sys.getlocale() are identical as well).
> > >>>>> Just to be sure (because it's displayed incorrectly in my browser
> > >>>>> on the bugtracking page): the character inside the type.convert
> > >>>>> function ought to be a "section"-sign (HTML code &sect; or &#167;,
> > >>>>> in R "\247", and not a dot ".").
> > >>>>
> > >>>> I saw it correctly. It's "\302\247" in UTF8 locales, which is
> > >>>> of course the reason I suspected locale settings, but I can't
> > >>>> seem to trigger the NA behaviour.
> > >>>>
> > >>>> I'm at a loss here, but some ideas:
> > >>>>
> > >>>> In the cases where it returns NA, what type is it? (I.e.
> > >>>> storage.mode(type.convert(....)))
> > >>>>
> > >>>> What do you get from
> > >>>>
> > >>>> > charToRaw("§")
> > >>>> [1] c2 a7
> > >>>>
> > >>>> (a7, presumably, but better check).
> > >>>>
> > >>>> -p
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> > >>>>> Sent: Thursday, 09 April 2009 19:26
> > >>>>> To: Raberger, Stefan
> > >>>>> Cc: r-de...@stat.math.ethz.ch; r-b...@r-project.org
> > >>>>> Subject: Re: [Rd] type.convert (PR#13646)
> > >>>>>
> > >>>>> s.raber...@innovest.at wrote:
> > >>>>>> Full_Name: Stefan Raberger
> > >>>>>> Version: 2.8.1
> > >>>>>> OS: Windows XP
> > >>>>>> Submission from: (NULL) (213.185.163.242)
> > >>>>>>
> > >>>>>>
> > >>>>>> Hi there,
> > >>>>>>
> > >>>>>> I recently noticed some strange behaviour of the command "type.convert",
> > >>>>>> depending on the startup mode used. But there also seems to be different
> > >>>>>> behaviour on different PCs (all running the same OS and the same version of R).
> > >>>>>> On PC1:
> > >>>>>> When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
> > >>>>>> --no-init-file --no-environ) and try to convert, the result is
> > >>>>>>
> > >>>>>> > type.convert("§")
> > >>>>>> [1] NA
> > >>>>>>
> > >>>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file --no-init-file
> > >>>>>> --no-environ --no-Rconsole) instead, the result is
> > >>>>>>
> > >>>>>> > type.convert("§")
> > >>>>>> [1] §
> > >>>>>> Levels: §
> > >>>>>>
> > >>>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the result is
> > >>>>>> always NA, independent of the startup mode used, and on PC4 it's always §.
> > >>>>>> What's the result I should expect R to return, and why is it different in so
> > >>>>>> many cases?
> > >>>>> Which locale does R think it is in in the four cases?
> > >>>>> (Sys.setlocale("LC_CTYPE"), I think).
> > >>>>>
> > >>>>> Might well not be a bug (so please don't file it as one).
> > >>>>>
> > >>>>>> Any help is much appreciated!
> > >>>>>> Regards, Stefan
> > >>>>>>
> > >>>>>> ______________________________________________
> > >>>>>> R-devel@r-project.org mailing list
> > >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> > >>>> --
> > >>>>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> > >>>>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> > >>>>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> > >>>> ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
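
The sign-extension problem discussed in this thread can be sketched in a few lines of C. This is only an illustration, not R's source: widen_naive, widen_safe and is_blank_string are hypothetical names introduced here. The parameter is declared signed char so that the behaviour matches a platform where plain char is signed (as in the Windows build above), and an 8-bit char is assumed.

```c
#include <ctype.h>

/* Hypothetical helper: widens a character the way isspace((int)*s) does.
   A byte such as 0xA7 stored in a signed char is sign-extended to -89. */
static int widen_naive(signed char c)
{
    return (int)c;
}

/* Hypothetical helper: goes through unsigned char first, so the result
   is always in the range 0..UCHAR_MAX that the <ctype.h> functions accept. */
static int widen_safe(signed char c)
{
    return (int)(unsigned char)c;
}

/* Hypothetical single-byte-locale counterpart of the else-branch quoted
   above, using the double cast. Returns 1 if every byte of s is
   whitespace (including the empty string), 0 otherwise. */
static int is_blank_string(const char *s)
{
    while (*s) {
        /* Never hand isspace() a negative value other than EOF:
           that is undefined behaviour in C. */
        if (!isspace((int)(unsigned char)*s++))
            return 0;
    }
    return 1;
}
```

With these definitions, widen_naive(-89) yields -89, which lies outside the range isspace() is defined for, while widen_safe(-89) yields 167 (0xA7). In the default "C" locale, is_blank_string("\247") then returns 0 rather than depending on whatever isspace() happens to do with a negative argument.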