Re: [Rd] NIST StRD linear regression

2006-07-31 Thread Prof Brian Ripley
That is not an appropriate way to fit a degree-10 polynomial (in any 
language, if fitting a degree-10 polynomial is in fact an appropriate 
statistical analysis, which seems unlikely).

On Sun, 30 Jul 2006, Carnell, Rob C wrote:

 NIST maintains a repository of Statistical Reference Datasets at
 http://www.itl.nist.gov/div898/strd/.  I have been working through the
 datasets to compare R's results to their references with the hope that
 if all works well, this could become a validation package.

What does it validate?  The R user's understanding of numerical methods?

 All the linear regression datasets give results with some degree of
 accuracy except one.  The NIST model includes 11 parameters, but R will
 not compute the estimates for all 11 parameters because it finds the
 data matrix to be singular.
 
 The code I used is below.  Any help in getting R to estimate all 11
 regression parameters would be greatly appreciated.
 
 I am posting this to the R-devel list since I think that the discussion
 might involve the limitations of platform precision.
 
 I am using R 2.3.1 for Windows XP.
 
 rm(list=ls())
 require(gsubfn)

That is not needed.

 defaultPath - my path
 
 data.base - http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA;
 
 reg.data - paste(data.base, /Filip.dat, sep=)
 
 model -
 V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I
 (V2^10)
 
 filePath - paste(defaultPath, //NISTtest.dat, sep=)
 download.file(reg.data, filePath, quiet=TRUE)

filePath - 
url(http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/Filip.dat;)

will suffice.

 A - read.table(filePath, skip=60, strip.white=TRUE)

 lm.data - lm(formula(model), A)
 
 lm.data

lm(V1 ~ poly(V2, 10), A)

works.

 kappa(model.matrix(V1 ~ poly(V2, 10, raw=TRUE), A), exact=TRUE)
[1] 1.767963e+15

shows the design matrix is indeed numerically singular by the naive 
method.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NIST StRD linear regression

2006-07-31 Thread Martin Maechler
 RobCar == Carnell, Rob C [EMAIL PROTECTED]
 on Sun, 30 Jul 2006 19:42:29 -0400 writes:

RobCar NIST maintains a repository of Statistical Reference
RobCar Datasets at http://www.itl.nist.gov/div898/strd/.  I
RobCar have been working through the datasets to compare
RobCar R's results to their references with the hope that
RobCar if all works well, this could become a validation
RobCar package.

RobCar All the linear regression datasets give results with
RobCar some degree of accuracy except one.  The NIST model
RobCar includes 11 parameters, but R will not compute the
RobCar estimates for all 11 parameters because it finds the
RobCar data matrix to be singular.

RobCar The code I used is below.  Any help in getting R to
RobCar estimate all 11 regression parameters would be
RobCar greatly appreciated.

RobCar I am posting this to the R-devel list since I think
RobCar that the discussion might involve the limitations of
RobCar platform precision.

RobCar I am using R 2.3.1 for Windows XP.

RobCar rm(list=ls())
RobCar require(gsubfn)

RobCar defaultPath - my path

RobCar data.base - 
http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA;

Here is a slight improvement {note the function file.path(); and
model - ..; also  poly(V2, 10) !} 
which shows you how to tell lm() to believe in 10 digit
precision of input data.  

---

reg.data - paste(data.base, /Filip.dat, sep=)
filePath - file.path(defaultPath, NISTtest.dat)
download.file(reg.data, filePath, quiet=TRUE)

A - read.table(filePath, skip=60, strip.white=TRUE)

## If you really need high-order polynomial regression in S and R,
##  *DO* as you are told in all good books, and use orthogonal polynomials:
(lm.ok - lm(V1 ~ poly(V2,10), data = A))
## and there is no problem
summary(lm.ok)

## But if you insist on doing nonsense 

model - V1 ~ V2+ 
I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I(V2^10)

## MM: better:
(model - paste(V1 ~ V2, paste(+ I(V2^, 2:10, ), sep='', collapse='')))
(form - formula(model))

mod.mat - model.matrix(form, data = A)
dim(mod.mat) ## 82 11
(m.qr - qr(mod.mat ))$rank # - 10 (only, instead of 11)
(m.qr - qr(mod.mat, tol = 1e-10))$rank # - 11

(lm.def  - lm(form, data = A)) ## last coef. is NA
(lm.plus - lm(form, data = A, tol = 1e-10))## no NA coefficients

---


RobCar reg.data - paste(data.base, /Filip.dat, sep=)

RobCar model -
RobCar 
V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I
RobCar (V2^10)

RobCar filePath - paste(defaultPath, //NISTtest.dat, sep=)
RobCar download.file(reg.data, filePath, quiet=TRUE)

RobCar A - read.table(filePath, skip=60, strip.white=TRUE)
RobCar lm.data - lm(formula(model), A)

RobCar lm.data


RobCar Rob Carnell

A propos NIST StRD:
If you go further to  NONlinear regression,
and use nls(), you will see that high quality statistics
packages such as R  do *NOT* always conform to NIST -- at least
not to what NIST did about 5 years ago when I last looked.
There are many nonlinear least squares problems where the
correct result is *NO CONVERGENCE* (because of
over-parametrization, ill-posednes, ...), 
owever many (cr.p) pieces of software do converge---falsely. 
I think you find more on this topic in the monograph of 
Bates and Watts (1988), but in any case,
just install and use the CRAN R package 'NISTnls' by Doug Bates
which contains the data sets with documentation and example
calls.

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NIST StRD linear regression

2006-07-30 Thread Carnell, Rob C
NIST maintains a repository of Statistical Reference Datasets at
http://www.itl.nist.gov/div898/strd/.  I have been working through the
datasets to compare R's results to their references with the hope that
if all works well, this could become a validation package.

All the linear regression datasets give results with some degree of
accuracy except one.  The NIST model includes 11 parameters, but R will
not compute the estimates for all 11 parameters because it finds the
data matrix to be singular.

The code I used is below.  Any help in getting R to estimate all 11
regression parameters would be greatly appreciated.

I am posting this to the R-devel list since I think that the discussion
might involve the limitations of platform precision.

I am using R 2.3.1 for Windows XP.

rm(list=ls())
require(gsubfn)

defaultPath - my path

data.base - http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA;

reg.data - paste(data.base, /Filip.dat, sep=)

model -
V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I
(V2^10)

filePath - paste(defaultPath, //NISTtest.dat, sep=)
download.file(reg.data, filePath, quiet=TRUE)

A - read.table(filePath, skip=60, strip.white=TRUE)
lm.data - lm(formula(model), A)

lm.data


Rob Carnell

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel