Dear List,

I'm just teaching myself semi-parametric techniques.  Apologies in 
advance for the long post.

I've got observational data and a longitudinal, semi-parametric model 
that I want to fit in GAM (or potentially something equivalent), and I'm 
not sure how to do it.  I'm posting this to ask whether it is possible 
to do what I want to do using "canned" commands and plotting routines.  
If not, I'd probably have to spend some time programming from scratch.

The response is modeled as a function of a few dummy variables and 
several continuous variables.  To control for time-invariant 
unobservable characteristics, I'm also including a "fixed effect" in the 
econometrics sense of the term -- an individual-specific intercept.  I 
also want to model the continuous variables flexibly -- I have no good 
priors on the proper specification for the function form.  The model is 
the following:

y_{it} = \alpha_i + \beta_1(T_{it}) + f(continuous.vars_{it}) + e_{it}

To control for unobserved time-invariant heterogeneity, I want to 
de-mean the data as follows:

y_{it}-\bar{y_i} = \beta_1(T_{it}-\bar{T_i}) + 
f(continuous.vars_{it}-\bar{continuous.vars_i}) + e_{it} - \bar{e}_i

Fitting the demeaned model should give me coefficient estimates equal to 
the non de-meaned model, including coefficient estimates on the spline 
terms.  However, there is certainly autocorrelation in the errors, and 
potentially heteroskedasticity.  The Ruppert et al textbook on 
semiparametric regression uses GLS to account for correlated errors.  I 
haven't really used GLS much and I don't think it solves the 
autocorrelation problem.  I'm more accustomed to using a cluster-robust 
"sandwich" estimator:

(X'X)^{-1}   (sum_j(X_j' e_j e_j' X_j))  (X'X)^{-1}

In a penalized spline context, this would be something like the following:

  (X'X+\lambda K)^{-1}   (sum_j(X_j' e_j e_j' X_j))   (X'X+\lambda K)^{-1}

(where J are clusters -- units on whom observations are repeated).

As far as I can tell, there is nothing theoretically wrong with this 
extension of the sandwich estimator to the semiparametric context, 
though I'm still learning all of this.  If anybody could point out any 
potential problems in the above formulation, I'd be grateful.

So unless I am missing something with the theory (which is certainly 
quite possible at this stage), my problem is with the implementation:

1.  I want to run GAM on de-meaned data, including de-meaned spline 
terms.  How would I go about doing this?  In octave, I would define a 
matrix to include spline terms, and then de-mean that whole matrix 
before fitting the model (via REML or ML, to get the smoothing 
parameters).  But if I do so manually in R and feed this matrix into 
GAM, GAM would simply take them as more data, to be splined, unless I 
specify them as linear terms.  If I do that, I don't get the graphs, 
which I want.

2.  In the graphs given by plot(gam.object), I get confidence intervals 
that correspond to standard errors that are NOT based on the sandwich 
estimator, above.  How could I get GAM to plot confidence intervals 
based on the sandwich estimator for the vcv matrix?

3.  Some of the terms are interactions.  (i.e.: T*var2).  I realize that 
GAM has tensor product capabilities, but (1) frankly I don't understand 
them yet, and (2) I want my semi-parametric fit to be comparable to 
polynomial "parametric" fits.  So, in addition to the plots given by 
GAM, I'd like to be able to make a graph that adds the estimate of the 
coefficient on T to the smooth function of T*var2, and also adds their 
standard errors.  Of course, I'd like these to be the standard errors 
estimated by the "sandwich" estimator, above.

Thanks for any help and advice, and for bearing with my long post.

Best,
Andrew


-- 

*Andrew Crane-Droesch*
Energy and Resources Group
UC Berkeley
+1 215 435 2644
andre...@berkeley.edu
skype: andrew.crane-droesch
http://andrewcd.berkeley.edu

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to