Re: [Rd] type.convert (PR#13646)

2009-04-10 Thread Peter Dalgaard

Raberger, Stefan wrote:

Hi Peter,

each of the four PCs actually has the same locale setting: 


Sys.setlocale("LC_CTYPE")
[1] "German_Austria.1252"

(all the other settings returned by invoking Sys.getlocale() are identical as 
well).

Just to be sure (because it's displayed incorrectly in my browser on the
bugtracking page): the character inside the type.convert function ought to be
a section sign (HTML code &#167; or &sect;; in R "\247"), and not a dot (.).


I saw it correctly. It's \302\247 in UTF-8 locales, which is of course 
the reason I suspected locale settings, but I can't seem to trigger the 
NA behaviour.


I'm at a loss here, but some ideas:

In the cases where it returns NA, what type is it? (I.e. 
storage.mode(type.convert()))


What do you get from

 charToRaw("§")
[1] c2 a7

(a7, presumably, but better check).
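
For readers following the byte values: \302\247 (octal) is 0xC2 0xA7, the two-byte UTF-8 form of U+00A7, whereas in a single-byte code page such as 1252 the section sign is the lone byte 0xA7. A minimal C sketch of that encoding step, not taken from the thread (the helper name utf8_encode_small is made up for illustration, and it only handles code points below U+0800):

```c
#include <stddef.h>

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written to out (1 or 2),
   or 0 if the code point is outside the range this sketch handles. */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {                 /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;                        /* longer sequences are not covered here */
}
```

Calling it with 0xA7 fills the buffer with 0xC2 and 0xA7, which matches the c2 a7 shown by charToRaw() in a UTF-8 session.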

-p


-Original Message-
From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk] 
Sent: Thursday, 09 April 2009 19:26

To: Raberger, Stefan
Cc: r-de...@stat.math.ethz.ch; r-b...@r-project.org
Subject: Re: [Rd] type.convert (PR#13646)

s.raber...@innovest.at wrote:

Full_Name: Stefan Raberger
Version: 2.8.1
OS: Windows XP
Submission from: (NULL) (213.185.163.242)


Hi there, 


I recently noticed some strange behaviour of the command type.convert,
depending on the startup mode used. But there also seems to be different
behaviour on different PCs (all running the same OS and the same version of R).

On PC1:
When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
--no-init-file --no-environ) and try to convert, the result is


type.convert("§")

[1] NA

If I use MDI mode (RGui --no-save --no-restore --no-site-file --no-init-file
--no-environ --no-Rconsole) instead, the result is


type.convert("§")

[1] §
Levels: §

On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the result is
always NA, independent of the startup mode used, and on PC4 it's always §.

What's the result I should expect R to return, and why is it different in so
many cases?


Which locale does R think it is in in the four cases? 
(Sys.setlocale("LC_CTYPE"), I think).


Might well not be a bug (so please don't file it as one).


Any help is much appreciated!
Regards, Stefan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [Rd] Package (PR#13475)

2009-04-10 Thread S Ellison

I had the same normalizePath error recently on a new laptop, with a fresh
install of R 2.8.1 and an attempt to install lme4. First attempt:
package 'Matrix' successfully unpacked and MD5 sums checked
Error in normalizePath(path) : 
  path[1]: The system cannot find the file specified

Second attempt:

package 'Matrix' successfully unpacked and MD5 sums checked
package 'mlmRev' successfully unpacked and MD5 sums checked
package 'MEMSS' successfully unpacked and MD5 sums checked
package 'lme4' successfully unpacked and MD5 sums checked
Error in normalizePath(path) : 
  path[1]: The system cannot find the file specified


The irreproducibility made me wonder... so I turned off Norton's
auto-protect, which has a habit of scanning files on the fly when requested
and that often delays file opening. The error disappeared, at least that
once and for subsequent installations of NADA and the much larger rggobi
install.

The main reason for logging this post is to suggest a possible cause and
workaround. But if it does turn out to be a consistent issue, perhaps it
would be worth checking for timeout issues related to normalizePath or
related routines in a future update?

S


Duncan Murdoch-2 wrote:
 
 On 1/27/2009 10:15 AM, partho_bhowm...@ml.com wrote:
 Full_Name: Partho Bhowmick
 Version: 2.8.1
 OS: Windows XP
 Submission from: (NULL) (199.43.48.131)
 
 
 While trying to install package sn (I have tried multiple mirrors),
 I get the following message
 
 trying URL
 'http://www.revolution-computing.com/cran/bin/windows/contrib/2.8/sn_0.4-10.zip'
 Content type 'application/zip' length 320643 bytes (313 Kb)
 opened URL
 downloaded 313 Kb
 
 package 'sn' successfully unpacked and MD5 sums checked
 Error in normalizePath(path) : 
   path[1]: The system cannot find the file specified
 
 
 It works for me.  I suspect it's a permission problem or something 
 similar on your system.
 
 Duncan Murdoch
 


Re: [Rd] Package (PR#13475)

2009-04-10 Thread Uwe Ligges



S Ellison wrote:


The main reason for logging this post is to suggest a possible cause and
workaround. But if it does turn out to be a consistent issue, perhaps it
would be worth checking for timeout issues related to normalizePath or
related routines in a future update?


Well, you need to ask Symantec to fix Norton, hence this is the wrong 
address.


Best wishes,
Uwe Ligges







[Rd] Wishlist: timeout detection (was Package (PR#13475))

2009-04-10 Thread Thomas Lumley


I don't know if detecting timeouts is feasible.  There are two problems. The 
first is being able to tell that failing to find the file was a timeout 
problem. The second is distinguishing timeouts due to antivirus software from 
timeouts due to, eg, missing network connections, where giving up quickly is 
better than hanging indefinitely.

 -thomas
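
To make the trade-off concrete, here is a rough sketch of a bounded retry in C. It is not anything R does, and it cannot tell the two causes apart: it only caps how long a caller waits, so a path that is missing for good still fails after a fixed number of attempts. The names open_with_retry and pause_fn are hypothetical, and the caller is assumed to supply a platform-specific sleep through the callback.

```c
#include <stdio.h>

/* Caller-supplied wait between attempts (for example a wrapper around the
   platform's sleep call); may be NULL to retry immediately. */
typedef void (*pause_fn)(void);

/* Try to open `path` for reading at most max_attempts times.
   Returns the 1-based attempt that succeeded, or 0 if none did. */
int open_with_retry(const char *path, int max_attempts, pause_fn pause)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        FILE *f = fopen(path, "rb");
        if (f) {
            fclose(f);               /* only probing; the caller reopens later */
            return attempt;
        }
        if (pause && attempt < max_attempts)
            pause();                 /* give a scanner time to release the file */
    }
    return 0;                        /* give up quickly rather than hang */
}
```

A small max_attempts keeps the worst case short for unreachable network paths, at the cost of still failing when a scanner holds the file longer than the total wait.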




Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle



Re: [Rd] type.convert (PR#13646)

2009-04-10 Thread William Dunlap
I can reproduce the difference that Stefan saw, depending
on whether or not I start Rgui with the flags
--no-environ --no-Rconsole
I think it boils down to the isBlankString() function.
For the string \247 it returns 1 when those flags are
not present and 0 when they are.  isBlankString does use
some locale-specific functions:

Rboolean isBlankString(const char *s)
{
#ifdef SUPPORT_MBCS
    if(mbcslocale) {
        wchar_t wc; int used; mbstate_t mb_st;
        mbs_init(&mb_st);
        while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
            if(!iswspace(wc)) return FALSE;
            s += used;
        }
    } else
#endif
        while (*s)
            if (!isspace((int)*s++)) return FALSE;
    return TRUE;
}
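
The multibyte branch above leans on what appear to be R-internal helpers (mbcslocale, mbs_init, Mbrtowc). As a rough standalone sketch of the same idea using only standard C, and assuming the caller has already selected a locale with setlocale(), one might write something like the following. The name is_blank_mbs is hypothetical, and treating an invalid or truncated sequence as non-blank is a choice made for this sketch, not a description of R's behaviour.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Return 1 if every character of the multibyte string s is whitespace
   in the current locale (or s is empty), 0 otherwise. */
int is_blank_mbs(const char *s)
{
    mbstate_t st;
    wchar_t wc;
    size_t used;

    memset(&st, 0, sizeof st);                 /* initial conversion state */
    while ((used = mbrtowc(&wc, s, MB_CUR_MAX, &st)) != 0) {
        if (used == (size_t)-1 || used == (size_t)-2)
            return 0;                          /* invalid or incomplete sequence */
        if (!iswspace((wint_t)wc))
            return 0;
        s += used;                             /* advance by the bytes consumed */
    }
    return 1;                                  /* reached the terminating null */
}
```

Because the character is decoded to a wchar_t before classification, this path never hands a negative char value to a classification function.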

I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
in both sessions.  'Process Explorer' shows that the 2 sessions
have the same dll's opened.

 sessionInfo()
R version 2.8.1 (2008-12-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 
 

I did the test with a dll compiled from

#include <R.h>
#include <R_ext/Utils.h>

void test_isBlankString(char **s, int *res)
{
    *res = isBlankString(*s);
}

and called by .C("test_isBlankString", "\247", -1L)

I don't see the difference while running a version of 2.9.0(devel)
compiled locally on 11 March 2009 (from svn rev 48116).

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  


Re: [Rd] type.convert (PR#13646)

2009-04-10 Thread Peter Dalgaard

William Dunlap wrote:

I can reproduce the difference that Stefan saw, depending
on whether or not I start Rgui with the flags
--no-environ --no-Rconsole
I think it boils down to the isBlankString() function.


Thanks for that analysis Bill!

Stefan was in German_Austria.1252 which I don't think is multibyte, so 
only the else-clause should be relevant, pointing the finger rather 
squarely at isspace(). Googling indicates that others have been caught 
out by signed/unsigned char issues there. Should this possibly rather read


if (!isspace((unsigned int)*s++)) return FALSE;

??





Re: [Rd] type.convert (PR#13646)

2009-04-10 Thread p . dalgaard
William Dunlap wrote:
 You may have to use
   (unsigned int)(unsigned char)*s++
 instead of just
   (unsigned int)*s++
 to avoid the sign extension.

Thanks again,

I probably won't be doing the change since I don't have a Windows build 
environment around, and I'm a bit superstitious about fixing bugs that I 
cannot see...

Let me just filter this information into the bug repository for now.

-pd

 

Re: [Rd] type.convert (PR#13646)

2009-04-10 Thread wdunlap
Using the (unsigned int)(unsigned char) in isspace()
resolved the problem in my Windows build.  I put some Rprintf
statements into isBlankString and for type.convert("\247")
it printed
  *s=-89 (4294967207 if unsigned)
  8=isspace(*s)
  8=isspace((unsigned int)*s)
  0=isspace((unsigned int)(unsigned char)*s)
I think the 8 is the value of a random bit of memory.
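
The printed values follow from C's integer conversions. A small sketch of the three casts, assuming an 8-bit char and a 32-bit unsigned int, and using signed char explicitly to mimic a build where plain char is signed (the function names are illustrative only):

```c
#include <limits.h>

/* Plain promotion: the sign is kept, so byte 0xA7 becomes -89. */
int as_plain_int(signed char c)
{
    return (int)c;
}

/* Casting straight to unsigned int sign-extends first:
   -89 wraps to UINT_MAX - 88, i.e. 4294967207 with 32-bit unsigned int. */
unsigned int as_unsigned_int(signed char c)
{
    return (unsigned int)c;
}

/* Going through unsigned char first keeps the value in 0..UCHAR_MAX: 167. */
int via_unsigned_char(signed char c)
{
    return (int)(unsigned char)c;
}
```

Only the last form yields a value that isspace() is specified to accept (an unsigned char value or EOF). Passing -89 is undefined behaviour in standard C, which would be consistent with the stray 8.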

When I converted S+ to use full 8-bit characters I ran
into the same problem.  The isclass macros in ctype.h
all take unsigned int argument and if char was signed you had
to do the double cast to avoid sign extension.  Whoever
designed the interface either didn't worry about 8-bit characters
or had chars that were unsigned by default.

It doesn't look like any of the isspace calls in R do
this double casting.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com


Re: [Rd] type.convert (PR#13646)

2009-04-10 Thread William Dunlap
 From: r-devel-boun...@r-project.org 
 [mailto:r-devel-boun...@r-project.org] On Behalf Of wdun...@tibco.com
 Sent: Friday, April 10, 2009 4:00 PM
 To: r-de...@stat.math.ethz.ch
 Cc: r-b...@r-project.org
 Subject: Re: [Rd] type.convert (PR#13646)
 
 Using the (unsigned int)(unsigned char) in isspace()
 resolved the problem in my Windows build.  

(int)(unsigned char) is the proper thing, since isspace
is declared to be int isspace(int).

The (unsigned int)(unsigned char) will work because
C does the unsigned int -> int conversion automatically
when the prototype is present and that conversion doesn't
change the value of the thing.
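
Put together, the single-byte branch with the (int)(unsigned char) cast might look like the following sketch. It is not the patch that went into R: is_blank_string is a stand-in name, it returns a plain int rather than Rboolean, and the multibyte branch is left out.

```c
#include <ctype.h>

/* Return 1 if s is empty or consists only of whitespace in the current
   single-byte locale, 0 otherwise. */
int is_blank_string(const char *s)
{
    while (*s) {
        /* Convert through unsigned char so bytes >= 0x80 are passed as
           128..255 instead of negative values. */
        if (!isspace((int)(unsigned char)*s))
            return 0;
        s++;
    }
    return 1;
}
```

With this cast a lone 0xA7 byte is classified through the normal table lookup, so in the default "C" locale the string "\247" is reported as non-blank whether or not plain char is signed.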
