Simon,

Many thanks as always for your help.

I see and appreciate the example that you cited, but I'm having a hard 
time generalizing it to a multivariate case.  A bit about my context -- 
my dataset is response ratios; the log of a treatment over a control.  
One of my explanatory variables is treatment intensity.  When intensity 
goes to zero, the expectation of the response ratio should also go to 
zero.  Here is the model that I would like to fit:

model <- ResponseRatio ~
     s(study, bs = "re", by = intensity) +  ## study coded as a factor
     s(intensity) +
     s(x1, by = intensity) +
     s(x2, by = intensity) +
     te(x1, x2, by = intensity) +
     te(x1, intensity)

Here is the example that you gave:

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4 - 1
x <- sort(x)
f <- exp(4*x)/(1 + exp(4*x))
y <- f + rnorm(n)*0.1
plot(x, y)
dat <- data.frame(x=x,y=y)

## Create a spline basis and penalty, making sure there is a knot
## at the constraint point (0 here, but it could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]

## The 3rd coefficient is the value of the spline at the knot at 0;
## constrain it to 0 by dropping the corresponding column/row...
X <- sm$X[,-3]        ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6       ## offset term to force curve through (0, .6)

## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))

## compare to unconstrained fit...

b.u <- gam(y ~ s(x,k=9),data=dat,knots=knots)
lines(x,predict(b.u))
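To convince myself the constraint really holds, I added a small check: since the "cr" basis functions are cardinal at the knots, every *retained* column of the basis is zero at x = 0, so the fitted value there collapses to the offset. (This continues from your example above, reusing `sm` and `b`; `PredictMat` is the mgcv function for building a prediction basis from a `smoothCon` object.)

```r
## evaluate the constrained fit at the constraint point x = 0;
## drop the same (3rd) column as in the fit, then add the offset --
## all retained cr basis columns vanish at the knot at 0, so the
## prediction there is exactly the offset, 0.6
X0 <- PredictMat(sm, data.frame(x = 0))[, -3, drop = FALSE]
drop(X0 %*% coef(b)) + 0.6
```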

*My question*:  how can I extend your example to more than one smooth 
term, and to several smooth interactions, in the context of a model 
where E[ResponseRatio] must equal zero when intensity equals zero?

I am not sure that I understand exactly what is going on when you call 
smoothCon to construct `sm`.  I've written my own penalized spline 
code from scratch, but it is far less sophisticated than mgcv: it is 
basically a ridge regression in which I optimize for lambda after 
specifying a model with a lot of knots.  mgcv clearly has a lot more 
going on, and is far preferable to my rudimentary code for its handling 
of tensor products and random effects.
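For what it's worth, here is roughly what my rudimentary code does, so you can see where my understanding stops.  This is a minimal sketch on made-up data; the truncated-line basis, the number of knots, and GCV-based selection of lambda are all just illustrative choices, not a claim about how mgcv does it:

```r
## penalized spline as ridge regression: truncated-line basis with
## many knots, smoothing parameter chosen by minimizing GCV
set.seed(1)
n <- 200
x <- sort(runif(n))
y <- sin(2*pi*x) + rnorm(n)*0.2
K <- 30                                              ## number of knots
knots <- quantile(x, seq(0, 1, length = K + 2))[-c(1, K + 2)]
Z <- outer(x, knots, function(x, k) pmax(x - k, 0))  ## truncated lines
X <- cbind(1, x, Z)                                  ## design matrix
D <- diag(c(0, 0, rep(1, K)))              ## penalize knot coefs only
gcv <- function(lam) {                     ## generalized cross-validation
  A <- X %*% solve(crossprod(X) + lam * D, t(X))     ## hat matrix
  rss <- sum((y - A %*% y)^2)
  n * rss / (n - sum(diag(A)))^2
}
lam  <- optimize(gcv, c(1e-8, 1e4))$minimum
beta <- solve(crossprod(X) + lam * D, crossprod(X, y))
plot(x, y); lines(x, X %*% beta)
```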

(Also, for prediction, how can I do "by=dum" AND "by=intensity" at the 
same time?)

Many thanks,
Andrew

PS:  I am aware that interacting my model terms with the intensity 
variable makes the errors quite heteroskedastic.  I am thinking of 
using a cluster wild bootstrap to construct confidence intervals.  If a 
better way forward immediately comes to your mind -- especially if it's 
computationally cheaper -- I'd greatly appreciate it if you could share it.
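Concretely, the scheme I have in mind is the following (a sketch on a toy linear model, with one Rademacher weight drawn per study; the data and model here are made up purely to show the resampling step, not my actual GAM):

```r
## cluster wild bootstrap sketch: perturb residuals with a single
## Rademacher (+1/-1) weight per cluster, refit, take percentile CIs
set.seed(1)
G   <- 10                                 ## number of studies (clusters)
dat <- data.frame(study = rep(1:G, each = 20))
dat$x <- runif(nrow(dat))
dat$y <- 2 * dat$x + rnorm(G)[dat$study] + rnorm(nrow(dat)) * 0.3
fit <- lm(y ~ x, data = dat)
r0  <- resid(fit)
f0  <- fitted(fit)
slope <- replicate(999, {
  w  <- sample(c(-1, 1), G, replace = TRUE)[dat$study]  ## per-study weight
  yb <- f0 + w * r0                       ## perturbed response
  coef(lm(yb ~ x, data = dat))[2]
})
quantile(slope, c(0.025, 0.975))          ## percentile CI for the slope
```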

On 04/17/2013 02:16 AM, Simon Wood wrote:
> hi Andrew.
>
> gam does suppress the intercept, it's just that this doesn't force the 
> smooth through the intercept in the way that you would like. Basically 
> for the parametric component of the model '-1' behaves exactly like 
> it does in 'lm' (it's using the same code). The smooths are 'added on' 
> to the parametric component of the model, with sum to zero constraints 
> to force identifiability.
>
> There is a solution to forcing a spline through a particular point at
> http://r.789695.n4.nabble.com/Use-pcls-in-quot-mgcv-quot-package-to-achieve-constrained-cubic-spline-td4660966.html
>  
>
> (i.e. the R help thread "Re: [R] Use pcls in "mgcv" package to achieve 
> constrained cubic spline")
>
> best,
> Simon
>
> On 16/04/13 22:36, Andrew Crane-Droesch wrote:
>>>   Dear List,
>>>
>>> I've just tried to specify a GAM without an intercept -- I've got one
>>> of the (rare) cases where it is appropriate for E(y) -> 0 as X ->0.
>>> Naively running a GAM with the "-1" appended to the formula and the
>>> calling "predict.gam", I see that the model isn't behaving as expected.
>>>
>>> I don't understand why this would be.  Google turns up this old R help
>>> thread: 
>>> http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html
>>>
>>> Simon writes:
>>>
>>>      Smooth terms are constrained to sum to zero over the covariate
>>>      values. This is an identifiability constraint designed to avoid
>>>      confounding with the intercept (particularly important if you
>>>      have more than one smooth). If you remove the intercept from
>>>      your model altogether (m2) then the smooth will still sum to
>>>      zero over the covariate values, which in your case will mean
>>>      that the smooth is quite a long way from the data. When you
>>>      include the intercept (m1) then the intercept is effectively
>>>      shifting the constrained curve up towards the data, and you get
>>>      a nice fit.
>>>
>>> Why?  I haven't read Simon's book in great detail, though I have read
>>> Ruppert et al.'s Semiparametric Regression.  I don't see a reason why
>>> a penalized spline model shouldn't equal the intercept (or zero) when
>>> all of the regressors equal zero.
>>>
>>> Is anyone able to help with a bit of intuition?  Or relevant passages
>>> from a good description of why this would be the case?
>>>
>>> Furthermore, why is the "-1" formula specification accepted if it
>>> doesn't work "as intended", i.e. as it does in, for example, lm?
>>>
>>> Many thanks,
>>> Andrew
>>>
>>>
>>>
>>
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

