David et.al.:

It's a problem with poly (or rather with how it is being misused)

> mx <- as.matrix(gasoline[1:50,"NIR"])
> str(mx)
 AsIs [1:50, 1:401] -0.0502 -0.0442 -0.0469 -0.0467 -0.0509 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:50] "1" "2" "3" "4" ...
  ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...

> poly(mx[,1:5],2)  ## only 5 columns
Error in poly(dots[[i]], degree, raw = raw, simple = raw) :
  'degree' must be less than number of unique points
> out <- poly(mx[,1:5], degree =2)
> dim(out)
[1] 50 20
## So this is same issue as before. But:

> out <- poly(mx[,1:30],degree = 2)

## 30 columns means 30*30 =900 2nd degree terms, but there are at most 50
## orthogonal vectors for the 50 -d space; ergo, poly() chokes rather
gracelessly, which is what you saw, with the following output:

rsession(2093,0x7fffe0c113c0) malloc: ***
mach_vm_map(size=823564528381952) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
rsession(2093,0x7fffe0c113c0) malloc: ***
mach_vm_map(size=823564528381952) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Error: cannot allocate vector of size 767004.2 Gb

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Jul 13, 2017 at 4:36 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Jul 13, 2017, at 10:43 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>
>> poly(NIR, degree = 2) will work if NIR is a matrix, not a data.frame.
>> The degree argument apparently  *must* be explicitly named if NIR is
>> not a numeric vector. AFAICS, this is unclear or unstated in ?poly.
>
> I still get the same error with:
>
> library(pld)
> data(gasoline)
> gasTrain <- gasoline[1:50,]
> gas1 <- plsr(octane ~ poly(as.matrix(NIR), 2), ncomp = 10, data = gasTrain, 
> validation = "LOO")
>
>
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>   invalid 'times' value
>
>> gas1 <- plsr(octane ~ poly(as.matrix(gasTrain$NIR), degree=2), ncomp = 10, 
>> data = gasTrain, validation = "CV")
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>   invalid 'times' value
>
>> str(as.matrix(gasTrain$NIR))
>  AsIs [1:50, 1:401] -0.0502 -0.0442 -0.0469 -0.0467 -0.0509 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:50] "1" "2" "3" "4" ...
>   ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...
>
> So tried to strip the RHS down to a "simple" matrix
>
>> gas1 <- plsr(octane ~ poly(matrix(gasTrain$NIR, nrow=nrow(gasTrain$NIR) ), 
>> degree=2), ncomp = 10, data = gasTrain, validation = "CV")
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>   invalid 'times' value
>
> I guess it reflects my lack of understanding of poly (which parallels my lack 
> of understanding of PLS.)
> --
> David.
>>
>>
>> -- Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Thu, Jul 13, 2017 at 10:15 AM, David Winsemius
>> <dwinsem...@comcast.net> wrote:
>>>
>>>> On Jul 12, 2017, at 6:58 PM, Ng, Kelvin Sai-cheong <ks...@connect.hku.hk> 
>>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I am using the pls package of R to perform partial least square on a set of
>>>> multivariate data.  Instead of fitting a linear model, I want to fit my
>>>> data with a quadratic function with interaction terms.  But I am not sure
>>>> how.  I will use an example to illustrate my problem:
>>>>
>>>> Following the example in the PLS manual:
>>>> ## Read data
>>>> data(gasoline)
>>>> gasTrain <- gasoline[1:50,]
>>>> ## Perform PLS
>>>> gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")
>>>>
>>>> where octane ~ NIR is the model that this example is fitting with.
>>>>
>>>> NIR is a collective of variables, i.e. NIR spectra consists of 401 diffuse
>>>> reflectance measurements from 900 to 1700 nm.
>>>>
>>>> Instead of fitting with predict.octane[i] = a[0] * NIR[0,i] + a[1] *
>>>> NIR[1,i] + ...
>>>> I want to fit the data with:
>>>> predict.octane[i] = a[0] * NIR[0,i] + a[1] * NIR[1,i] + ... +
>>>> b[0]*NIR[0,i]*NIR[0,i] + b[1] * NIR[0,i]*NIR[1,i] + ...
>>>>
>>>> i.e. quadratic with interaction terms.
>>>>
>>>> But I don't know how to formulate this.
>>>
>>> I did not see any terms in the model that I would have called interaction 
>>> terms. I'm seeing a desire for a polynomial function in NIR. For that 
>>> purpose, one might see if you get satisfactory results with:
>>>
>>> gas1 <- plsr(octane ~NIR + I(NIR^2), ncomp = 10, data = gasTrain, 
>>> validation = "LOO")
>>> gas1
>>>
>>> I first tried using poly(NIR, 2) on the RHS and it threw an error, which 
>>> raises concerns in my mind that this may not be a proper model. I have no 
>>> experience with the use of plsr or its underlying theory, so the fact that 
>>> this is not throwing an error is no guarantee of validity. Using this 
>>> construction in ordinary least squares regression has dangers with 
>>> inferential statistics because of the correlation of the linear and squared 
>>> terms as well as likely violation of homoscedasticity.
>>>
>>> --
>>> David.
>>>
>>>
>>>>
>>>> May I have some help please?
>>>>
>>>> Thanks,
>>>>
>>>> Kelvin
>>>>
>>>>      [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to