David et.al.: It's a problem with poly (or rather with how it is being misused)
> mx <- as.matrix(gasoline[1:50,"NIR"]) > str(mx) AsIs [1:50, 1:401] -0.0502 -0.0442 -0.0469 -0.0467 -0.0509 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:50] "1" "2" "3" "4" ... ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ... > poly(mx[,1:5],2) ## only 5 columns Error in poly(dots[[i]], degree, raw = raw, simple = raw) : 'degree' must be less than number of unique points > out <- poly(mx[,1:5], degree =2) > dim(out) [1] 50 20 ## So this is same issue as before. But: > out <- poly(mx[,1:30],degree = 2) ## 30 columns means 30*30 =900 2nd degree terms, but there are at most 50 ## orthogonal vectors for the 50 -d space; ergo, poly() chokes rather gracelessly, which is what you saw, with the following output: rsession(2093,0x7fffe0c113c0) malloc: *** mach_vm_map(size=823564528381952) failed (error code=3) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug rsession(2093,0x7fffe0c113c0) malloc: *** mach_vm_map(size=823564528381952) failed (error code=3) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug Error: cannot allocate vector of size 767004.2 Gb Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Thu, Jul 13, 2017 at 4:36 PM, David Winsemius <dwinsem...@comcast.net> wrote: > >> On Jul 13, 2017, at 10:43 AM, Bert Gunter <bgunter.4...@gmail.com> wrote: >> >> poly(NIR, degree = 2) will work if NIR is a matrix, not a data.frame. >> The degree argument apparently *must* be explicitly named if NIR is >> not a numeric vector. AFAICS, this is unclear or unstated in ?poly. > > I still get the same error with: > > library(pld) > data(gasoline) > gasTrain <- gasoline[1:50,] > gas1 <- plsr(octane ~ poly(as.matrix(NIR), 2), ncomp = 10, data = gasTrain, > validation = "LOO") > > > Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : > invalid 'times' value > >> gas1 <- plsr(octane ~ poly(as.matrix(gasTrain$NIR), degree=2), ncomp = 10, >> data = gasTrain, validation = "CV") > Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : > invalid 'times' value > >> str(as.matrix(gasTrain$NIR)) > AsIs [1:50, 1:401] -0.0502 -0.0442 -0.0469 -0.0467 -0.0509 ... > - attr(*, "dimnames")=List of 2 > ..$ : chr [1:50] "1" "2" "3" "4" ... > ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ... > > So tried to strip the RHS down to a "simple" matrix > >> gas1 <- plsr(octane ~ poly(matrix(gasTrain$NIR, nrow=nrow(gasTrain$NIR) ), >> degree=2), ncomp = 10, data = gasTrain, validation = "CV") > Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : > invalid 'times' value > > I guess it reflects my lack of understanding of poly (which parallels my lack > of understanding of PLS.) > -- > David. >> >> >> -- Bert >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along >> and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Thu, Jul 13, 2017 at 10:15 AM, David Winsemius >> <dwinsem...@comcast.net> wrote: >>> >>>> On Jul 12, 2017, at 6:58 PM, Ng, Kelvin Sai-cheong <ks...@connect.hku.hk> >>>> wrote: >>>> >>>> Dear all, >>>> >>>> I am using the pls package of R to perform partial least square on a set of >>>> multivariate data. Instead of fitting a linear model, I want to fit my >>>> data with a quadratic function with interaction terms. But I am not sure >>>> how. I will use an example to illustrate my problem: >>>> >>>> Following the example in the PLS manual: >>>> ## Read data >>>> data(gasoline) >>>> gasTrain <- gasoline[1:50,] >>>> ## Perform PLS >>>> gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO") >>>> >>>> where octane ~ NIR is the model that this example is fitting with. >>>> >>>> NIR is a collective of variables, i.e. NIR spectra consists of 401 diffuse >>>> reflectance measurements from 900 to 1700 nm. >>>> >>>> Instead of fitting with predict.octane[i] = a[0] * NIR[0,i] + a[1] * >>>> NIR[1,i] + ... >>>> I want to fit the data with: >>>> predict.octane[i] = a[0] * NIR[0,i] + a[1] * NIR[1,i] + ... + >>>> b[0]*NIR[0,i]*NIR[0,i] + b[1] * NIR[0,i]*NIR[1,i] + ... >>>> >>>> i.e. quadratic with interaction terms. >>>> >>>> But I don't know how to formulate this. >>> >>> I did not see any terms in the model that I would have called interaction >>> terms. I'm seeing a desire for a polynomial function in NIR. For that >>> purpose, one might see if you get satisfactory results with: >>> >>> gas1 <- plsr(octane ~NIR + I(NIR^2), ncomp = 10, data = gasTrain, >>> validation = "LOO") >>> gas1 >>> >>> I first tried using poly(NIR, 2) on the RHS and it threw an error, which >>> raises concerns in my mind that this may not be a proper model. I have no >>> experience with the use of plsr or its underlying theory, so the fact that >>> this is not throwing an error is no guarantee of validity. Using this >>> construction in ordinary least squares regression has dangers with >>> inferential statistics because of the correlation of the linear and squared >>> terms as well as likely violation of homoscedasticity. >>> >>> -- >>> David. >>> >>> >>>> >>>> May I have some help please? >>>> >>>> Thanks, >>>> >>>> Kelvin >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> David Winsemius >>> Alameda, CA, USA >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius > Alameda, CA, USA > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.