Re: [Rd] NIST StRD linear regression

2006-07-31 Thread Prof Brian Ripley
That is not an appropriate way to fit a degree-10 polynomial (in any 
language, if fitting a degree-10 polynomial is in fact an appropriate 
statistical analysis, which seems unlikely).

On Sun, 30 Jul 2006, Carnell, Rob C wrote:

 NIST maintains a repository of Statistical Reference Datasets at
 http://www.itl.nist.gov/div898/strd/.  I have been working through the
 datasets to compare R's results to their references with the hope that
 if all works well, this could become a validation package.

What does it validate?  The R user's understanding of numerical methods?

 All the linear regression datasets give results with some degree of
 accuracy except one.  The NIST model includes 11 parameters, but R will
 not compute the estimates for all 11 parameters because it finds the
 data matrix to be singular.
 
 The code I used is below.  Any help in getting R to estimate all 11
 regression parameters would be greatly appreciated.
 
 I am posting this to the R-devel list since I think that the discussion
 might involve the limitations of platform precision.
 
 I am using R 2.3.1 for Windows XP.
 
 rm(list=ls())
 require(gsubfn)

That is not needed.

 defaultPath <- "my path"
 
 data.base <- "http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA"
 
 reg.data <- paste(data.base, "/Filip.dat", sep="")
 
 model <-
 "V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I(V2^10)"
 
 filePath <- paste(defaultPath, "//NISTtest.dat", sep="")
 download.file(reg.data, filePath, quiet=TRUE)

filePath <- 
url("http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/Filip.dat")

will suffice.

 A <- read.table(filePath, skip=60, strip.white=TRUE)

 lm.data <- lm(formula(model), A)
 
 lm.data

lm(V1 ~ poly(V2, 10), A)

works.

> kappa(model.matrix(V1 ~ poly(V2, 10, raw=TRUE), A), exact=TRUE)
[1] 1.767963e+15

shows the design matrix is indeed numerically singular by the naive 
method.
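To make the contrast concrete, here is a minimal sketch (using simulated data in place of Filip.dat, so the numbers are illustrative only) comparing the conditioning of the raw-power and orthogonal-polynomial bases:

```r
## Simulated stand-in for the Filip data (V2 spans roughly -9 to -3);
## the point is the choice of basis, not the particular numbers.
set.seed(1)
A <- data.frame(V2 = seq(-9, -3, length = 82))
A$V1 <- rnorm(82)

## Raw powers V2, V2^2, ..., V2^10 are nearly collinear, so the
## condition number is enormous -- near the limit of double precision.
kappa(model.matrix(V1 ~ poly(V2, 10, raw = TRUE), A), exact = TRUE)

## poly() without raw=TRUE orthogonalizes the columns, so the condition
## number is many orders of magnitude smaller and lm() has no trouble.
kappa(model.matrix(V1 ~ poly(V2, 10), A), exact = TRUE)
```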

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NIST StRD linear regression

2006-07-31 Thread Martin Maechler
>>>>> "RobCar" == Carnell, Rob C [EMAIL PROTECTED]
>>>>>     on Sun, 30 Jul 2006 19:42:29 -0400 writes:

RobCar NIST maintains a repository of Statistical Reference
RobCar Datasets at http://www.itl.nist.gov/div898/strd/.  I
RobCar have been working through the datasets to compare
RobCar R's results to their references with the hope that
RobCar if all works well, this could become a validation
RobCar package.

RobCar All the linear regression datasets give results with
RobCar some degree of accuracy except one.  The NIST model
RobCar includes 11 parameters, but R will not compute the
RobCar estimates for all 11 parameters because it finds the
RobCar data matrix to be singular.

RobCar The code I used is below.  Any help in getting R to
RobCar estimate all 11 regression parameters would be
RobCar greatly appreciated.

RobCar I am posting this to the R-devel list since I think
RobCar that the discussion might involve the limitations of
RobCar platform precision.

RobCar I am using R 2.3.1 for Windows XP.

RobCar rm(list=ls())
RobCar require(gsubfn)

RobCar defaultPath <- "my path"

RobCar data.base <- 
"http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA"

Here is a slight improvement {note the function file.path(); and
model <- ".."; also poly(V2, 10)!},
which shows you how to tell lm() to believe in 10-digit
precision of the input data.

---

reg.data <- paste(data.base, "/Filip.dat", sep="")
filePath <- file.path(defaultPath, "NISTtest.dat")
download.file(reg.data, filePath, quiet=TRUE)

A - read.table(filePath, skip=60, strip.white=TRUE)

## If you really need high-order polynomial regression in S and R,
##  *DO* as you are told in all good books, and use orthogonal polynomials:
(lm.ok <- lm(V1 ~ poly(V2,10), data = A))
## and there is no problem
summary(lm.ok)

## But if you insist on doing nonsense 

model <- V1 ~ V2+ 
I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I(V2^10)

## MM: better:
(model <- paste("V1 ~ V2", paste("+ I(V2^", 2:10, ")", sep='', collapse='')))
(form <- formula(model))

mod.mat <- model.matrix(form, data = A)
dim(mod.mat) ## 82 11
(m.qr <- qr(mod.mat))$rank              # -> 10 (only, instead of 11)
(m.qr <- qr(mod.mat, tol = 1e-10))$rank # -> 11

(lm.def  <- lm(form, data = A))             ## last coef. is NA
(lm.plus <- lm(form, data = A, tol = 1e-10))## no NA coefficients

---


RobCar reg.data <- paste(data.base, "/Filip.dat", sep="")

RobCar model <-
RobCar "V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I(V2^10)"

RobCar filePath <- paste(defaultPath, "//NISTtest.dat", sep="")
RobCar download.file(reg.data, filePath, quiet=TRUE)

RobCar A <- read.table(filePath, skip=60, strip.white=TRUE)
RobCar lm.data <- lm(formula(model), A)

RobCar lm.data


RobCar Rob Carnell

A propos NIST StRD:
If you go further, to NONlinear regression,
and use nls(), you will see that high-quality statistics
packages such as R do *NOT* always conform to NIST -- at least
not to what NIST did about 5 years ago when I last looked.
There are many nonlinear least-squares problems where the
correct result is *NO CONVERGENCE* (because of
over-parametrization, ill-posedness, ...),
however many (cr.p) pieces of software do converge -- falsely.
I think you will find more on this topic in the monograph of
Bates and Watts (1988), but in any case,
just install and use the CRAN R package 'NISTnls' by Doug Bates,
which contains the data sets with documentation and example
calls.
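A minimal sketch of using that package (the dataset name, model, and starting values below follow NIST's published Misra1a problem statement; check the package help pages, as details may differ by version):

```r
## install.packages("NISTnls")    # once, from CRAN
library(NISTnls)

data(Misra1a)   # one of the NIST nonlinear regression reference datasets
fm <- nls(y ~ b1 * (1 - exp(-b2 * x)), data = Misra1a,
          start = c(b1 = 500, b2 = 0.0001))
summary(fm)     # compare against NIST's certified parameter values
```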

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R sources: .tgz or .tar.gz

2006-07-31 Thread Uwe Ligges
This is a minor documentation inconsistency:
While the R Installation and Administration manual speaks about the file 
R-x.y.z.tgz, this has to be replaced by R-x.y.z.tar.gz for x > 1.

Uwe Ligges

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Warning when indexing data.frame with FALSE

2006-07-31 Thread hadley wickham
> data.frame()[]
NULL data frame with 0 rows
> data.frame()[FALSE]
Warning in is.na(nm) : is.na() applied to non-(list or vector)
NULL data frame with 0 rows
> data.frame()[NULL]
Warning in is.na(nm) : is.na() applied to non-(list or vector)
NULL data frame with 0 rows

Is this a bug?  I wouldn't have expected the warning in the last two cases.

Regards,

Hadley

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Warning when indexing data.frame with FALSE

2006-07-31 Thread hadley wickham
Sorry, a better example is:

> data.frame(a=1)[FALSE]
NULL data frame with 1 rows
> data.frame(a=1)[NULL]
NULL data frame with 1 rows

vs

> data.frame()[FALSE]
Warning in is.na(nm) : is.na() applied to non-(list or vector)
NULL data frame with 0 rows
> data.frame()[NULL]
Warning in is.na(nm) : is.na() applied to non-(list or vector)
NULL data frame with 0 rows

Hadley

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Any interest in merge and by implementations specifically for so

2006-07-31 Thread Kevin B. Hendricks
Hi Tom,

 Now, try sorting and using a loop:

 > idx <- order(i)
 > xs <- x[idx]
 > is <- i[idx]
 > res <- array(NA, 1e6)
 > idx <- which(diff(is) > 0)
 > startidx <- c(1, idx+1)
 > endidx <- c(idx, length(xs))
 > f1 <- function(x, startidx, endidx, FUN = sum)  {
 +   for (j in 1:length(res)) {
 +     res[j] <- FUN(x[startidx[j]:endidx[j]])
 +   }
 +   res
 + }
 > unix.time(res1 <- f1(xs, startidx, endidx))
 [1] 6.86 0.00 7.04   NA   NA

I wonder how much time the sorting, reordering and creation of
startidx and endidx would add to this time?

Either way, your code can nicely be used to quickly create the small  
integer factors I would need if the igroup functions get integrated.   
Thanks!

 For the case of sum (or averages), you can vectorize this using  
 cumsum as
 follows. This won't work for median or max.

 > f2 <- function(x, startidx, endidx)  {
 +   cum <- cumsum(x)
 +   res <- cum[endidx]
 +   res[2:length(res)] <- res[2:length(res)] - cum[endidx[1:(length(res) - 1)]]
 +   res
 + }
 > unix.time(res2 <- f2(xs, startidx, endidx))
 [1] 0.20 0.00 0.21   NA   NA

Yes that is a quite fast way to handle sums.

 You can also use Luke Tierney's byte compiler
 (http://www.stat.uiowa.edu/~luke/R/compiler/) to speed up the loop for
 functions where you can't vectorize:

 > library(compiler)
 > f3 <- cmpfun(f1)
 Note: local functions used: FUN
 > unix.time(res3 <- f3(xs, startidx, endidx))
 [1] 3.84 0.00 3.91   NA   NA

That looks interesting.  Does it only work for specific operating  
systems and processors?  I will give it a try.

Thanks,

Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Choropleths maps of the USA states (PR#9111)

2006-07-31 Thread mlcarvalho
Dear Colleagues:

I have tried to reach Ray Brownrigg, [EMAIL PROTECTED],
but my mails have come back with the following comment:

"did not reach the following recipient(s):
[EMAIL PROTECTED] on Mon, 31 Jul 2006 12:52:39 +0100
There is no such account in the address"

===========================================================================

I have a problem with the choropleth maps of the USA states.

 In fact, when I use the following code

#

palette(gray(seq(0, .9, len=10)))

ordena.estados <- c("ALABAMA","ARIZONA","ARKANSAS","CALIFORNIA","COLORADO",
"CONNECTICUT","DELAWARE","FLORIDA","GEORGIA","IDAHO","ILLINOIS","INDIANA","IOWA",
"KANSAS","KENTUCKY","LOUISIANA","MAINE","MARYLAND","MASSACHUSETTS","MICHIGAN",
"MINNESOTA","MISSISSIPI","MISSOURI","MONTANA","NEBRASKA","NEVADA","NEW HAMPSHIRE",
"NEW JERSEY","NEW MEXICO","NEW YORK","NORTH CAROLINA","NORTH DAKOTA","OHIO",
"OKLAHOMA","OREGON","PENNSYLVANIA","RHODE ISLAND","SOUTH CAROLINA","SOUTH DAKOTA",
"TENNESSEE","TEXAS","UTAH","VERMONT","VIRGINIA","WASHINGTON","WEST VIRGINIA",
"WISCONSIN","WYOMING")

W <- c(1,10,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1)

mapa.sat <- map('state', region = ordena.estados, fill=T, col=W)

#

That should paint all the states with the same colour except ARIZONA,
but I get the UTAH state also coloured as ARIZONA.

The same thing happens with some other states: I get two states coloured
instead of one.

Is there any problem with the way I have made the ranking of states in
ordena.estados?






Thank you for any help.

Sincerely

Lucília Carvalho









[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-31 Thread Thomas Lumley
On Sat, 29 Jul 2006, Kevin B. Hendricks wrote:

 Hi Bill,

sum : igroupSums

 Okay, after thinking about this ...

 # assumes i is the small integer factor with n levels
 # v is some long vector
 # no sorting required

 igroupSums <- function(v, i) {
     sums <- rep(0, max(i))
     for (j in 1:length(v)) {
         sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
     }
     sums
 }

 If written in Fortran or C, this might be faster than using split.  It is
 at least just linear in time with the length of vector v.

For sums you should look at rowsum().  It uses a hash table in C and last 
time I looked was faster than using split(). It returns a vector of the 
same length as the input, but that would easily be fixed.

The same approach would work for min, max, range, count, mean, but not for 
arbitrary functions.
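A quick sketch of the rowsum() approach described above:

```r
x <- c(2, 3, 5, 7, 11)
g <- c("a", "b", "a", "b", "a")

## rowsum() aggregates by group in C (hash table), one row per group
rowsum(x, g)    # "a" -> 2 + 5 + 11 = 18,  "b" -> 3 + 7 = 10

## the split()-based equivalent, typically slower on long vectors
unlist(lapply(split(x, g), sum))
```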

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] bug in format.default: trim=TRUE does not always work as advertised (PR#9114)

2006-07-31 Thread tplate

DESCRIPTION OF PROBLEM:

Output from format.default sometimes has whitespace around it when using
big.mark="," and trim=TRUE.  E.g.:

   # works ok as long as big.mark is not actually used:
   > format(c(-1,1,10,999), big.mark=",", trim=TRUE)
   [1] "-1"  "1"   "10"  "999"
   # but if big.mark is used, output is justified and not trimmed:
   > format(c(-1,1,10,999,1e6), big.mark=",", trim=TRUE)
   [1] "       -1" "        1" "       10" "      999" "1,000,000"

The documentation for the argument 'trim' to format.default() states:
 trim: logical; if 'FALSE', logical, numeric and complex values are
   right-justified to a common width: if 'TRUE' the leading
   blanks for justification are suppressed.

Thus, the above behavior of including blanks for justification when 
trim=TRUE (in some situations) seems to contradict the documentation.

PROPOSED FIX:

The last call to prettyNum() in format.default() 
(src/library/base/R/format.R) has the argument

preserve.width = "common"

If this is changed to

preserve.width = if (trim) "individual" else "common"

then output is formatted correctly in the case above.

A patch for this one line is attached to this message (the patch was done 
against the released R-2.3.1 source tarball (2006/06/01); the format.R 
file in this release is no different from the one in the current snapshot 
of the devel version of R).  After making these changes, I ran make 
check-all.  I did not see any tests that break with these changes.

-- Tony Plate



--- R-2.3.1-orig/src/library/base/R/format.R	2006-04-09 16:19:19 -0600
+++ R-2.3.1/src/library/base/R/format.R	2006-07-26 15:52:42 -0600
@@ -37,7 +37,7 @@
              small.interval = small.interval,
              decimal.mark = decimal.mark,
              zero.print = zero.print,
-             preserve.width = "common")
+             preserve.width = if (trim) "individual" else "common")
           )
      }
  }


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Any interest in merge and by implementations specifically for so

2006-07-31 Thread tshort

 Hi Tom,
 
  Now, try sorting and using a loop:
 
  > idx <- order(i)
  > xs <- x[idx]
  > is <- i[idx]
  > res <- array(NA, 1e6)
  > idx <- which(diff(is) > 0)
  > startidx <- c(1, idx+1)
  > endidx <- c(idx, length(xs))
  > f1 <- function(x, startidx, endidx, FUN = sum)  {
  +   for (j in 1:length(res)) {
  +     res[j] <- FUN(x[startidx[j]:endidx[j]])
  +   }
  +   res
  + }
  > unix.time(res1 <- f1(xs, startidx, endidx))
  [1] 6.86 0.00 7.04   NA   NA
 
 I wonder how much time the sorting, reordering and creation of
 startidx and endidx would add to this time?

Done interactively, sorting and indexing seemed fast. Here are some timings:

> unix.time({idx <- order(i)
+   xs <- x[idx]
+   is <- i[idx]
+   res <- array(NA, 1e6)
+   idx <- which(diff(is) > 0)
+   startidx <- c(1, idx+1)
+   endidx <- c(idx, length(xs))
+  })
[1] 1.06 0.00 1.09   NA   NA


 That looks interesting.  Does it only work for specific operating  
 systems and processors?  I will give it a try.

No, as far as I know, it works on all operating systems. Also, it gets a
little faster if you directly put the sum in the function:

> f4 <- function(x, startidx, endidx)  {
+   for (j in 1:length(res)) {
+     res[j] <- sum(x[startidx[j]:endidx[j]])
+   }
+   res
+ }
> f5 <- cmpfun(f4)
> unix.time(res5 <- f5(xs, startidx, endidx))
[1] 2.67 0.03 2.95   NA   NA

- Tom


-- 
View this message in context: 
http://www.nabble.com/Any-interest-in-%22merge%22-and-%22by%22-implementations-specifically-for-sorted-data--tf2009595.html#a5578580
Sent from the R devel forum at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug in title() crashes R (PR#9115)

2006-07-31 Thread Robert . McGehee
Hello,

The below reliably crashes R 2.3.1:
> plot.new()
> title(1:10)
Process R segmentation fault (core dumped) ...

Also, R will crash when these vectors are much smaller, just not as
reliably.

I haven't tried this on today's snapshot, but didn't see anything in
the changelog that seems to have addressed this.

HTH,
Robert

> R.version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)

Robert McGehee
Quantitative Analyst
Geode Capital Management, LLC
53 State Street, 5th Floor | Boston, MA | 02109
Tel: 617/392-8396    Fax: 617/476-6389
mailto:[EMAIL PROTECTED]



This e-mail, and any attachments hereto, are intended for us...{{dropped}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] lazyLoadDBfetch error

2006-07-31 Thread Kimpel, Mark William

I am getting the following output when I try to load R package gmodels under 
R-devel. This does not occur in R 2.3.1. In searching the R archives it appears 
that this message regarding lazyLoadDBfetch should not be occurring in recent R 
releases.

I have redownloaded R-devel and gmodels and the problem persists.

Ideas?

Thanks,

Mark

> require(gmodels)
Loading required package: gmodels
Error in lazyLoadDBfetch(key, datafile, compressed, envhook) : 
internal error in R_decompress1
[1] FALSE

> sessionInfo()

R version 2.4.0 Under development (unstable) (2006-06-15 r38348) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools methods   stats graphics  grDevices 
utils datasets  base 

other attached packages:
gtools  gdataGOstats   Category   hgu95av2   KEGG   multtest
 xtable   RBGL   annotate GO  graph  Ruuid  limma 
   2.2.32.1.21.6.01.4.1   1.10.01.8.11.9.5
1.3-21.8.1   1.10.01.6.5   1.11.9   1.10.02.7.5 
genefilter   survival rgu34a   affy affyioBiobaseRWinEdt 
  1.10.1 2.26   1.12.0   1.10.01.1.3   1.10.01.7-4

Mark W. Kimpel MD 

 
Official Business Address:
 
Department of Psychiatry
Indiana University School of Medicine
PR M116
Institute of Psychiatric Research
791 Union Drive
Indianapolis, IN 46202
 
Preferred Mailing Address:
 
15032 Hunter Court
Westfield, IN  46074
 
(317) 490-5129 Work,  Mobile
 
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] HTTP User-Agent header

2006-07-31 Thread Robert Gentleman
should appear at an R-devel near you...
thanks Seth


Seth Falcon wrote:
 Robert Gentleman [EMAIL PROTECTED] writes:
 OK, that suggests setting at the options level would solve both of your 
 problems and that seems like the best approach. I don't really want to 
 pass this around as a parameter through the maze of functions that might 
 actually download something if we don't have to.
 
 I have an updated patch that adds an HTTPUserAgent option.  The
 default is a string like:
 
 R (2.4.0 x86_64-unknown-linux-gnu x86_64 linux-gnu)
 
 If the HTTPUserAgent option is NULL, no user agent header is added to
 HTTP requests (this is the current behavior).  This option allows R to
 use an arbitrary user agent header.
 
 The patch adds two non-exported functions to utils: 
1) defaultUserAgent - returns a string like above
2) makeUserAgent - formats content of HTTPUserAgent option for use
   as part of an HTTP request header.
 
 I've tested on OSX and Linux, but not on Windows.  When USE_WININET is
 defined, a user agent string of R was already being used.  With this
 patch, the HTTPUserAgent options is used.  I'm unsure if NULL is
 allowed.
 
 Also, in src/main/internet.c there is a comment:
   Next 6 are for use by libxml, only
 and then a definition for R_HTTPOpen.  Not sure how/when these get
 used.  The user agent for these calls remains unspecified with this
 patch.
 
 + seth
 
 
 Patch summary:
  src/include/R_ext/R-ftp-http.h   |2 +-
  src/include/Rmodules/Rinternet.h |2 +-
  src/library/base/man/options.Rd  |5 +
  src/library/utils/R/readhttp.R   |   25 +
  src/library/utils/R/zzz.R|3 ++-
  src/main/internet.c  |2 +-
  src/modules/internet/internet.c  |   37 +
  src/modules/internet/nanohttp.c  |8 ++--
  8 files changed, 66 insertions(+), 18 deletions(-)
 
 
 
 Index: src/include/R_ext/R-ftp-http.h
 ===
 --- src/include/R_ext/R-ftp-http.h(revision 38715)
 +++ src/include/R_ext/R-ftp-http.h(working copy)
 @@ -36,7 +36,7 @@
  int   R_FTPRead(void *ctx, char *dest, int len);
  void  R_FTPClose(void *ctx);
  
 -void *   RxmlNanoHTTPOpen(const char *URL, char **contentType, int 
 cacheOK);
 +void *   RxmlNanoHTTPOpen(const char *URL, char **contentType, const 
 char *headers, int cacheOK);
  int  RxmlNanoHTTPRead(void *ctx, void *dest, int len);
  void RxmlNanoHTTPClose(void *ctx);
  int  RxmlNanoHTTPReturnCode(void *ctx);
 Index: src/include/Rmodules/Rinternet.h
 ===
 --- src/include/Rmodules/Rinternet.h  (revision 38715)
 +++ src/include/Rmodules/Rinternet.h  (working copy)
 @@ -9,7 +9,7 @@
  typedef Rconnection (*R_NewUrlRoutine)(char *description, char *mode);
  typedef Rconnection (*R_NewSockRoutine)(char *host, int port, int server, 
 char *mode); 
  
 -typedef void * (*R_HTTPOpenRoutine)(const char *url, const int cacheOK);
 +typedef void * (*R_HTTPOpenRoutine)(const char *url, const char *headers, 
 const int cacheOK);
  typedef int(*R_HTTPReadRoutine)(void *ctx, char *dest, int len);
  typedef void   (*R_HTTPCloseRoutine)(void *ctx);
 
 Index: src/main/internet.c
 ===
 --- src/main/internet.c   (revision 38715)
 +++ src/main/internet.c   (working copy)
 @@ -129,7 +129,7 @@
  {
      if(!initialized) internet_Init();
      if(initialized > 0)
 -	return (*ptr->HTTPOpen)(url, 0);
 +	return (*ptr->HTTPOpen)(url, NULL, 0);
      else {
 	 error(_("internet routines cannot be loaded"));
 	 return NULL;
 Index: src/library/utils/R/zzz.R
 ===
 --- src/library/utils/R/zzz.R (revision 38715)
 +++ src/library/utils/R/zzz.R (working copy)
 @@ -9,7 +9,8 @@
       internet.info = 2,
       pkgType = .Platform$pkgType,
       str = list(strict.width = "no"),
 -     example.ask = "default")
 +     example.ask = "default",
 +     HTTPUserAgent = defaultUserAgent())
      extra <-
      if(.Platform$OS.type == "windows") {
          list(mailer = "none",
 Index: src/library/utils/R/readhttp.R
 ===
 --- src/library/utils/R/readhttp.R(revision 38715)
 +++ src/library/utils/R/readhttp.R(working copy)
 @@ -6,3 +6,28 @@
      stop("transfer failure")
  file.show(file, delete.file = delete.file, title = title, ...)
  }
 +
 +
 +
 +defaultUserAgent <- function()
 +{
 +    Rver <- paste(R.version$major, R.version$minor, sep=".")
 +    Rdetails <- paste(Rver, R.version$platform, R.version$arch,
 +                      R.version$os)
 +    paste("R (", Rdetails, ")", sep="")
 +}
 +
 +
 +makeUserAgent <- function(format = TRUE) {
 +    agent <- getOption("HTTPUserAgent")
 +    if (is.null(agent)) {
 

[Rd] building windows packages under wine/linux and cross-compiling.

2006-07-31 Thread Hin-Tak Leung
Had some fun today, and thought it might be a good idea to share
and possibly for inclusion to R/src/gnuwin32/README.packages .

Wine/linux: while R, ActiveState Perl, and mingw all work alright under 
wine, the blocking issue is Rtools' cygwin dependency. Forking
(as much of make and sh is forking sub-processes)
on posix-on-win32-on-posix currently doesn't work.
(http://wiki.winehq.org/CygwinSupport)

Cross-compiling: The instruction in R/src/gnuwin32/README.packages 
essentially works, with one missing detail: R_EXE=/usr/bin/R is also 
needed. Thus it should be:

 make R_EXE=/usr/bin/R PKGDIR=/mysources RLIB=/R/win/library \
     pkg-mypkg
 make R_EXE=/usr/bin/R PKGDIR=/mysources RLIB=/R/win/library \
     pkgcheck-mypkg

Hin-Tak Leung

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] building windows packages under wine/linux and cross-compiling.

2006-07-31 Thread Prof Brian Ripley
On Mon, 31 Jul 2006, Hin-Tak Leung wrote:

 Had some fun today, and thought it might be a good idea to share
 and possibly for inclusion to R/src/gnuwin32/README.packages .

[...]

 Cross-compiling: The instruction in R/src/gnuwin32/README.packages 
 essentially works, with one missing detail: R_EXE=/usr/bin/R is also 
 needed. Thus it should be:
 
  make R_EXE=/usr/bin/R PKGDIR=/mysources RLIB=/R/win/library \
      pkg-mypkg
  make R_EXE=/usr/bin/R PKGDIR=/mysources RLIB=/R/win/library \
      pkgcheck-mypkg

The instructions do work for those who actually follow them! That file 
says

  Edit MkRules to set BUILD=CROSS and the appropriate paths (including
  HEADER) as needed.

and the appropriate section of that file is

## === cross-compilation settings  =

ifeq ($(strip $(BUILD)),CROSS)
# Next might be mingw32- or i386-mingw32msvc- or i586-
# depending on the cross-compiler.
BINPREF=i586-mingw32-
# Set this to where the mingw32 include files are. It must be accurate.
HEADER=/users/ripley/R/cross-tools5/i586-mingw32/include
endif
# path (possibly full path) to same version of R on the host system
# R_EXE=R

and please do note the last two lines.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-31 Thread Kevin B. Hendricks
Hi Thomas,

Here is a comparison of performance times from my own igroupSums  
versus using split and rowsum:

 > x <- rnorm(2e6)
 > i <- rep(1:1e6, 2)

 > unix.time(suma <- unlist(lapply(split(x, i), sum)))
 [1] 8.188 0.076 8.263 0.000 0.000

 > names(suma) <- NULL

 > unix.time(sumb <- igroupSums(x, i))
 [1] 0.036 0.000 0.035 0.000 0.000

 > all.equal(suma, sumb)
 [1] TRUE

 > unix.time(sumc <- rowsum(x, i))
 [1] 0.744 0.000 0.742 0.000 0.000

 > sumc <- sumc[,1]
 > names(sumc) <- NULL
 > all.equal(suma, sumc)
 [1] TRUE


So my implementation of igroupSums is faster and already handles NA.   
I also have implemented igroupMins, igroupMaxs, igroupAnys,  
igroupAlls, igroupCounts, igroupMeans, and igroupRanges.

The igroup functions I implemented do not handle weights yet but do  
handle NAs properly.

Assuming I clean them up, is anyone in the R developer group interested?

Or would you rather I instead extend the rowsum approach to create
rowcount, rowmax, rowmin, etc. using a hash-function approach?

All of these approaches simply use different ways to map group
codes to integers and then do the functions the same.
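As a sketch of that shared pattern (a hypothetical helper, not Kevin's actual implementation): map arbitrary group codes to consecutive small integers, then aggregate in one vectorized pass:

```r
## counts per group via the integer-mapping trick
igroupCounts <- function(g) {
  gi <- match(g, unique(g))   # arbitrary codes -> integers 1..n
  tabulate(gi)                # one linear pass to count each level
}

igroupCounts(c("a", "b", "a", "a"))   # -> 3 1
```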

Thanks,

Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in title() crashes R (PR#9115)

2006-07-31 Thread Duncan Murdoch
On 7/31/2006 11:56 AM, [EMAIL PROTECTED] wrote:
 Hello,
 
 The below reliably crashes R 2.3.1:
  > plot.new()
  > title(1:10)
  Process R segmentation fault (core dumped) ...

This was an internal bug in do_title.  When the integer vector was 
converted to character it wasn't protected, and garbage collection in 
the middle of drawing it threw it away.  I suppose it could happen with 
any title that needed conversion, but the large size of this one made it 
more likely that GC would happen.  Soon to be fixed in R-devel and
R-patched.

Thanks for the report.

Duncan Murdoch

 
 Also, R will crash when these vectors are much smaller, just not as
 reliably.
 
 I haven't tried this on today's snapshot, but didn't see anything in
 the changelog that seems to have addressed this.
 
 HTH,
 Robert
 
  > R.version
                 _
  platform       i386-pc-mingw32
  arch           i386
  os             mingw32
  system         i386, mingw32
  status
  major          2
  minor          3.1
  year           2006
  month          06
  day            01
  svn rev        38247
  language       R
  version.string Version 2.3.1 (2006-06-01)
 
 Robert McGehee
 Quantitative Analyst
 Geode Capital Management, LLC
 53 State Street, 5th Floor | Boston, MA | 02109
 Tel: 617/392-8396    Fax: 617/476-6389
 mailto:[EMAIL PROTECTED]
 
 
 
 This e-mail, and any attachments hereto, are intended for us...{{dropped}}
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Choropleths maps of the USA states (PR#9111)

2006-07-31 Thread Chris Lawrence
On 7/31/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I have a problem with the choropleth maps of the USA states.

  In fact, when I use the following code

 #

 palette(gray(seq(0, .9, len=10)))

 ordena.estados <- c("ALABAMA","ARIZONA","ARKANSAS","CALIFORNIA","COLORADO",
 "CONNECTICUT","DELAWARE","FLORIDA","GEORGIA","IDAHO","ILLINOIS","INDIANA","IOWA",
 "KANSAS","KENTUCKY","LOUISIANA","MAINE","MARYLAND","MASSACHUSETTS","MICHIGAN",
 "MINNESOTA","MISSISSIPI","MISSOURI","MONTANA","NEBRASKA","NEVADA","NEW HAMPSHIRE",
 "NEW JERSEY","NEW MEXICO","NEW YORK","NORTH CAROLINA","NORTH DAKOTA","OHIO",
 "OKLAHOMA","OREGON","PENNSYLVANIA","RHODE ISLAND","SOUTH CAROLINA","SOUTH DAKOTA",
 "TENNESSEE","TEXAS","UTAH","VERMONT","VIRGINIA","WASHINGTON","WEST VIRGINIA",
 "WISCONSIN","WYOMING")

 W <- c(1,10,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,1,1,1)

 mapa.sat <- map('state', region = ordena.estados, fill=T, col=W)

 #

 That should paint all the states with the same colour except ARIZONA,
 but I get the UTAH state also coloured as ARIZONA.

 The same thing happens with some other states: I get two states coloured
 instead of one.

 Is there any problem with the way I have made the ranking of states in
 ordena.estados?

I believe that Michigan and Virginia (at least... there may be others)
have more than one polygon.  So... you end up not having enough colors
in the list and it wraps around.

I once put together a little recipe including a call to grep() that
handled the duplicate polygons.  Try something like:

regions <- map('state', namesonly=T)
W <- rep(1, length(regions))
W[grep('^arizona', regions, ignore.case=T)] <- 10
map('state', fill=T, col=W)

It should also work for the weird states like Michigan and Virginia
(with the appropriate change to the grep call).

Hope this helps...


Chris

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel