Re: [Rd] NIST StRD linear regression
That is not an appropriate way to fit a degree-10 polynomial (in any language, if fitting a degree-10 polynomial is in fact an appropriate statistical analysis, which seems unlikely). On Sun, 30 Jul 2006, Carnell, Rob C wrote: NIST maintains a repository of Statistical Reference Datasets at http://www.itl.nist.gov/div898/strd/. I have been working through the datasets to compare R's results to their references with the hope that if all works well, this could become a validation package. What does it validate? The R user's understanding of numerical methods? All the linear regression datasets give results with some degree of accuracy except one. The NIST model includes 11 parameters, but R will not compute the estimates for all 11 parameters because it finds the data matrix to be singular. The code I used is below. Any help in getting R to estimate all 11 regression parameters would be greatly appreciated. I am posting this to the R-devel list since I think that the discussion might involve the limitations of platform precision. I am using R 2.3.1 for Windows XP. rm(list=ls()) require(gsubfn) That is not needed. defaultPath - my path data.base - http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA; reg.data - paste(data.base, /Filip.dat, sep=) model - V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I (V2^10) filePath - paste(defaultPath, //NISTtest.dat, sep=) download.file(reg.data, filePath, quiet=TRUE) filePath - url(http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/Filip.dat;) will suffice. A - read.table(filePath, skip=60, strip.white=TRUE) lm.data - lm(formula(model), A) lm.data lm(V1 ~ poly(V2, 10), A) works. kappa(model.matrix(V1 ~ poly(V2, 10, raw=TRUE), A), exact=TRUE) [1] 1.767963e+15 shows the design matrix is indeed numerically singular by the naive method. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] NIST StRD linear regression
RobCar == Carnell, Rob C [EMAIL PROTECTED] on Sun, 30 Jul 2006 19:42:29 -0400 writes: RobCar NIST maintains a repository of Statistical Reference RobCar Datasets at http://www.itl.nist.gov/div898/strd/. I RobCar have been working through the datasets to compare RobCar R's results to their references with the hope that RobCar if all works well, this could become a validation RobCar package. RobCar All the linear regression datasets give results with RobCar some degree of accuracy except one. The NIST model RobCar includes 11 parameters, but R will not compute the RobCar estimates for all 11 parameters because it finds the RobCar data matrix to be singular. RobCar The code I used is below. Any help in getting R to RobCar estimate all 11 regression parameters would be RobCar greatly appreciated. RobCar I am posting this to the R-devel list since I think RobCar that the discussion might involve the limitations of RobCar platform precision. RobCar I am using R 2.3.1 for Windows XP. RobCar rm(list=ls()) RobCar require(gsubfn) RobCar defaultPath - my path RobCar data.base - http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA; Here is a slight improvement {note the function file.path(); and model - ..; also poly(V2, 10) !} which shows you how to tell lm() to believe in 10 digit precision of input data. --- reg.data - paste(data.base, /Filip.dat, sep=) filePath - file.path(defaultPath, NISTtest.dat) download.file(reg.data, filePath, quiet=TRUE) A - read.table(filePath, skip=60, strip.white=TRUE) ## If you really need high-order polynomial regression in S and R, ## *DO* as you are told in all good books, and use orthogonal polynomials: (lm.ok - lm(V1 ~ poly(V2,10), data = A)) ## and there is no problem summary(lm.ok) ## But if you insist on doing nonsense model - V1 ~ V2+ I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I(V2^10) ## MM: better: (model - paste(V1 ~ V2, paste(+ I(V2^, 2:10, ), sep='', collapse=''))) (form - formula(model)) mod.mat - model.matrix(form, data = A) dim(mod.mat) ## 82 11 (m.qr - qr(mod.mat ))$rank # - 10 (only, instead of 11) (m.qr - qr(mod.mat, tol = 1e-10))$rank # - 11 (lm.def - lm(form, data = A)) ## last coef. is NA (lm.plus - lm(form, data = A, tol = 1e-10))## no NA coefficients --- RobCar reg.data - paste(data.base, /Filip.dat, sep=) RobCar model - RobCar V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I RobCar (V2^10) RobCar filePath - paste(defaultPath, //NISTtest.dat, sep=) RobCar download.file(reg.data, filePath, quiet=TRUE) RobCar A - read.table(filePath, skip=60, strip.white=TRUE) RobCar lm.data - lm(formula(model), A) RobCar lm.data RobCar Rob Carnell A propos NIST StRD: If you go further to NONlinear regression, and use nls(), you will see that high quality statistics packages such as R do *NOT* always conform to NIST -- at least not to what NIST did about 5 years ago when I last looked. There are many nonlinear least squares problems where the correct result is *NO CONVERGENCE* (because of over-parametrization, ill-posednes, ...), owever many (cr.p) pieces of software do converge---falsely. I think you find more on this topic in the monograph of Bates and Watts (1988), but in any case, just install and use the CRAN R package 'NISTnls' by Doug Bates which contains the data sets with documentation and example calls. Martin Maechler, ETH Zurich __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] NIST StRD linear regression
NIST maintains a repository of Statistical Reference Datasets at http://www.itl.nist.gov/div898/strd/. I have been working through the datasets to compare R's results to their references with the hope that if all works well, this could become a validation package. All the linear regression datasets give results with some degree of accuracy except one. The NIST model includes 11 parameters, but R will not compute the estimates for all 11 parameters because it finds the data matrix to be singular. The code I used is below. Any help in getting R to estimate all 11 regression parameters would be greatly appreciated. I am posting this to the R-devel list since I think that the discussion might involve the limitations of platform precision. I am using R 2.3.1 for Windows XP. rm(list=ls()) require(gsubfn) defaultPath - my path data.base - http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA; reg.data - paste(data.base, /Filip.dat, sep=) model - V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I (V2^10) filePath - paste(defaultPath, //NISTtest.dat, sep=) download.file(reg.data, filePath, quiet=TRUE) A - read.table(filePath, skip=60, strip.white=TRUE) lm.data - lm(formula(model), A) lm.data Rob Carnell __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel