Re: [Rd] type.convert (PR#13646)
Raberger, Stefan wrote:
> Hi Peter,
>
> each of the four PCs actually has the same locale setting:
>
>     Sys.setlocale("LC_CTYPE")
>     [1] "German_Austria.1252"
>
> (all the other settings returned by invoking Sys.getlocale() are identical as well).
>
> Just to be sure (because it's displayed incorrectly in my browser on the bug-tracking page): the character inside the type.convert function ought to be a section sign (HTML code &#167; or &sect;, in R "\247"), and not a dot (".").

I saw it correctly. It's "\302\247" in UTF-8 locales, which is of course the reason I suspected locale settings, but I can't seem to trigger the NA behaviour.

I'm at a loss here, but some ideas:

In the cases where it returns NA, what type is it? (I.e. storage.mode(type.convert()))

What do you get from

    charToRaw("§")
    [1] c2 a7

(a7, presumably, but better check).

-p

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> Sent: Thursday, 09 April 2009 19:26
> To: Raberger, Stefan
> Cc: r-de...@stat.math.ethz.ch; r-b...@r-project.org
> Subject: Re: [Rd] type.convert (PR#13646)
>
> s.raber...@innovest.at wrote:
>> Full_Name: Stefan Raberger
>> Version: 2.8.1
>> OS: Windows XP
>> Submission from: (NULL) (213.185.163.242)
>>
>> Hi there,
>>
>> I recently noticed some strange behaviour of the command type.convert, depending on the startup mode used. But there also seems to be different behaviour on different PCs (all running the same OS and the same version of R).
>>
>> On PC1: When I start R in SDI mode (RGui --no-save --no-restore --no-site-file --no-init-file --no-environ) and try to convert, the result is
>>
>>     type.convert("§")
>>     [1] NA
>>
>> If I use MDI mode (RGui --no-save --no-restore --no-site-file --no-init-file --no-environ --no-Rconsole) instead, the result is
>>
>>     type.convert("§")
>>     [1] §
>>     Levels: §
>>
>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the result is always NA, independent of the startup mode used, and on PC4 it's always §.
>>
>> What's the result I should expect R to return, and why is it different in so many cases?
>
> Which locale does R think it is in in the four cases? (Sys.setlocale("LC_CTYPE"), I think). Might well not be a bug (so please don't file it as one).
>
>> Any help is much appreciated!
>> Regards,
>> Stefan

--
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)                      FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Package (PR#13475)
I had the same normalizePath error recently on a new laptop, with a fresh install of R 2.8.1 and an attempt to install lme4.

First attempt:

    package 'Matrix' successfully unpacked and MD5 sums checked
    Error in normalizePath(path) :
      path[1]: The system cannot find the file specified

Second attempt:

    package 'Matrix' successfully unpacked and MD5 sums checked
    package 'mlmRev' successfully unpacked and MD5 sums checked
    package 'MEMSS' successfully unpacked and MD5 sums checked
    package 'lme4' successfully unpacked and MD5 sums checked
    Error in normalizePath(path) :
      path[1]: The system cannot find the file specified

The irreproducibility made me wonder... so I turned off Norton's auto-protect, which has a habit of scanning files on the fly when they are requested, and that often delays file opening. The error disappeared, at least that once and for subsequent installations of NADA and the much larger rggobi install.

The main reason for logging this post is to suggest a possible cause and workaround. But if it does turn out to be a consistent issue, perhaps it would be worth checking for timeout issues related to normalizePath or related routines in a future update?

S

Duncan Murdoch-2 wrote:
> On 1/27/2009 10:15 AM, partho_bhowm...@ml.com wrote:
>> Full_Name: Partho Bhowmick
>> Version: 2.8.1
>> OS: Windows XP
>> Submission from: (NULL) (199.43.48.131)
>>
>> While trying to install package sn (I have tried multiple mirrors), I get the following message
>>
>>     trying URL 'http://www.revolution-computing.com/cran/bin/windows/contrib/2.8/sn_0.4-10.zip'
>>     Content type 'application/zip' length 320643 bytes (313 Kb)
>>     opened URL
>>     downloaded 313 Kb
>>
>>     package 'sn' successfully unpacked and MD5 sums checked
>>     Error in normalizePath(path) :
>>       path[1]: The system cannot find the file specified
>
> It works for me. I suspect it's a permission problem or something similar on your system.
>
> Duncan Murdoch
Re: [Rd] Package (PR#13475)
S Ellison wrote:
> The main reason for logging this post is to suggest a possible cause and workaround. But if it does turn out to be a consistent issue, perhaps it would be worth checking for timeout issues related to normalizePath or related routines in a future update?

Well, you need to ask Symantec to fix Norton, hence this is the wrong address.

Best wishes,
Uwe Ligges
[Rd] Wishlist: timeout detection (was Package (PR#13475))
I don't know if detecting timeouts is feasible. There are two problems. The first is being able to tell that failing to find the file was a timeout problem. The second is distinguishing timeouts due to antivirus software from timeouts due to, e.g., missing network connections, where giving up quickly is better than hanging indefinitely.

-thomas

On Fri, 10 Apr 2009, Uwe Ligges wrote:
> S Ellison wrote:
>> But if it does turn out to be a consistent issue, perhaps it would be worth checking for timeout issues related to normalizePath or related routines in a future update?
>
> Well, you need to ask Symantec to fix Norton, hence this is the wrong address.
>
> Best wishes,
> Uwe Ligges

Thomas Lumley
Assoc. Professor, Biostatistics
tlum...@u.washington.edu
University of Washington, Seattle
Re: [Rd] type.convert (PR#13646)
I can reproduce the difference that Stefan saw, depending on whether or not I start Rgui with the flags

    --no-environ --no-Rconsole

I think it boils down to the isBlankString() function. For the string "\247" it returns 1 when those flags are not present and 0 when they are. isBlankString does use some locale-specific functions:

    Rboolean isBlankString(const char *s)
    {
    #ifdef SUPPORT_MBCS
        if(mbcslocale) {
            wchar_t wc; int used; mbstate_t mb_st;
            mbs_init(&mb_st);
            while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
                if(!iswspace(wc)) return FALSE;
                s += used;
            }
        } else
    #endif
            while (*s) if (!isspace((int)*s++)) return FALSE;
        return TRUE;
    }

I was using R 2.8.1, downloaded precompiled from CRAN, on Windows XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same in both sessions. 'Process Explorer' shows that the two sessions have the same DLLs opened.

    sessionInfo()
    R version 2.8.1 (2008-12-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

I did the test with a DLL compiled from

    #include <R.h>
    #include <R_ext/Utils.h>
    void test_isBlankString(char **s, int *res)
    {
        *res = isBlankString(*s);
    }

and called by

    .C("test_isBlankString", "\247", -1L)

I don't see the difference while running a version of 2.9.0 (devel) compiled locally on 11 March 2009 (from svn rev 48116).

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
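For readers without the R sources to hand, the multibyte branch of isBlankString quoted in this message can be approximated in standard C. The sketch below is not R's code: it swaps R's internal Mbrtowc() wrapper for the standard mbrtowc(), and both the name is_blank_mbs and the decision to treat an invalid or incomplete byte sequence as "not blank" are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>   /* memset */
#include <wchar.h>    /* mbrtowc, mbstate_t */
#include <wctype.h>   /* iswspace */

/* Hypothetical stand-in for the multibyte branch of isBlankString().
   Returns true when every character of s is whitespace in the
   current LC_CTYPE locale (an empty string counts as blank). */
bool is_blank_mbs(const char *s)
{
    mbstate_t st;
    wchar_t wc;
    size_t used;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while ((used = mbrtowc(&wc, s, MB_CUR_MAX, &st)) != 0) {
        /* (size_t)-1: invalid sequence, (size_t)-2: incomplete one.
           Assumption: neither can be part of a blank string. */
        if (used == (size_t)-1 || used == (size_t)-2)
            return false;
        if (!iswspace((wint_t)wc))
            return false;
        s += used;                      /* advance by bytes consumed */
    }
    return true;
}
```

Because iswspace() receives a decoded wide character rather than a raw char, this branch does not depend on whether plain char is signed.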
Re: [Rd] type.convert (PR#13646)
William Dunlap wrote:
> I can reproduce the difference that Stefan saw, depending on whether or not I start Rgui with the flags --no-environ --no-Rconsole. I think it boils down to the isBlankString() function. For the string "\247" it returns 1 when those flags are not present and 0 when they are.

Thanks for that analysis Bill!

Stefan was in German_Austria.1252, which I don't think is multibyte, so only the else-clause should be relevant, pointing the finger rather squarely at isspace(). Googling indicates that others have been caught out by signed/unsigned char issues there. Should this possibly rather read

    if (!isspace((unsigned int)*s++)) return FALSE;

??
Re: [Rd] type.convert (PR#13646)
William Dunlap wrote:
> You may have to use
>     (unsigned int)(unsigned char)*s++
> instead of just
>     (unsigned int)*s++
> to avoid the sign extension.

Thanks again,

I probably won't be doing the change since I don't have a Windows build environment around, and I'm a bit superstitious about fixing bugs that I cannot see...

Let me just filter this information into the bug repository for now.

-pd
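The sign-extension point can be checked in isolation. The two helpers below are hypothetical names introduced only for this sketch; they take a signed char so that the behaviour is the same whether or not plain char is signed on the compiling platform.

```c
/* What (unsigned int)*s does when plain char is signed: the value is
   first promoted to int (keeping its sign), and only then converted
   to unsigned, so a negative byte wraps around to a huge number. */
unsigned int widen_direct(signed char c)
{
    return (unsigned int)c;
}

/* The double cast: reinterpret the byte as 0..255 first, then widen.
   No sign extension can happen because unsigned char is never negative. */
unsigned int widen_via_uchar(signed char c)
{
    return (unsigned int)(unsigned char)c;
}
```

For the section sign in a Windows-1252 locale the byte is 0xA7, which is -89 as a signed char. widen_via_uchar(-89) gives 167, while widen_direct(-89) gives UINT_MAX - 88 (4294967207 with a 32-bit unsigned int), which is far outside the range that isspace() is specified to accept.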
Re: [Rd] type.convert (PR#13646)
Using the (unsigned int)(unsigned char) cast in isspace() resolved the problem in my Windows build.

I put some Rprintf statements into isBlankString and for type.convert("\247") it printed

    *s=-89 (4294967207 if unsigned)
    8=isspace(*s)
    8=isspace((unsigned int)*s)
    0=isspace((unsigned int)(unsigned char)*s)

I think the 8 is the value of a random bit of memory.

When I converted S+ to use full 8-bit characters I ran into the same problem. The isclass macros in ctype.h all take an unsigned int argument, and if char was signed you had to do the double cast to avoid sign extension. Whoever designed the interface either didn't worry about 8-bit characters or had chars that were unsigned by default.

It doesn't look like any of the isspace calls in R do this double casting.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
Re: [Rd] type.convert (PR#13646)
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of wdun...@tibco.com
Sent: Friday, April 10, 2009 4:00 PM
To: r-de...@stat.math.ethz.ch
Cc: r-b...@r-project.org
Subject: Re: [Rd] type.convert (PR#13646)

> Using the (unsigned int)(unsigned char) in isspace() resolved the problem in my Windows build.

(int)(unsigned char) is the proper thing, since isspace is declared to be int isspace(int). The (unsigned int)(unsigned char) will work because C does the unsigned int -> int conversion automatically when the prototype is present, and that conversion doesn't change the value of the thing.
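Applied to the single-byte branch of isBlankString, the (int)(unsigned char) cast might look like the following. This is a standalone sketch rather than a patch to R: is_blank_sbcs is a hypothetical name, and it returns a C99 bool instead of R's Rboolean so that it compiles without the R headers.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical single-byte counterpart of isBlankString().
   Each byte is converted to unsigned char before it reaches
   isspace(), so bytes >= 0x80 arrive as 128..255 instead of
   as negative values when plain char is signed. */
bool is_blank_sbcs(const char *s)
{
    while (*s) {
        if (!isspace((int)(unsigned char)*s++))
            return false;
    }
    return true;
}
```

With this cast the argument always lies in the range 0..UCHAR_MAX, which is what the C standard requires of isspace() arguments other than EOF, so the result no longer depends on what happens to sit in memory next to the classification table.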