Dear List, I'm just teaching myself semi-parametric techniques. Apologies in advance for the long post.
I've got observational data and a longitudinal, semi-parametric model that I want to fit in GAM (or potentially something equivalent), and I'm not sure how to do it. I'm posting this to ask whether it is possible to do what I want to do using "canned" commands and plotting routines. If not, I'd probably have to spend some time programming from scratch. The response is modeled as a function of a few dummy variables and several continuous variables. To control for time-invariant unobservable characteristics, I'm also including a "fixed effect" in the econometrics sense of the term -- an individual-specific intercept. I also want to model the continuous variables flexibly -- I have no good priors on the proper specification for the function form. The model is the following: y_{it} = \alpha_i + \beta_1(T_{it}) + f(continuous.vars_{it}) + e_{it} To control for unobserved time-invariant heterogeneity, I want to de-mean the data as follows: y_{it}-\bar{y_i} = \beta_1(T_{it}-\bar{T_i}) + f(continuous.vars_{it}-\bar{continuous.vars_i}) + e_{it} - \bar{e}_i Fitting the demeaned model should give me coefficient estimates equal to the non de-meaned model, including coefficient estimates on the spline terms. However, there is certainly autocorrelation in the errors, and potentially heteroskedasticity. The Ruppert et al textbook on semiparametric regression uses GLS to account for correlated errors. I haven't really used GLS much and I don't think it solves the autocorrelation problem. I'm more accustomed to using a cluster-robust "sandwich" estimator: (X'X)^{-1} (sum_j(X_j' e_j e_j' X_j)) (X'X)^{-1} In a penalized spline context, this would be something like the following: (X'X+\lambda K)^{-1} (sum_j(X_j' e_j e_j' X_j)) (X'X+\lambda K)^{-1} (where J are clusters -- units on whom observations are repeated). As far as I can tell, there is nothing theoretically wrong with this extension of the sandwich estimator to the semiparametric context, though I'm still learning all of this. If anybody could point out any potential problems in the above formulation, I'd be grateful. So unless I am missing something with the theory (which is certainly quite possible at this stage), my problem is with the implementation: 1. I want to run GAM on de-meaned data, including de-meaned spline terms. How would I go about doing this? In octave, I would define a matrix to include spline terms, and then de-mean that whole matrix before fitting the model (via REML or ML, to get the smoothing parameters). But if I do so manually in R and feed this matrix into GAM, GAM would simply take them as more data, to be splined, unless I specify them as linear terms. If I do that, I don't get the graphs, which I want. 2. In the graphs given by plot(gam.object), I get confidence intervals that correspond to standard errors that are NOT based on the sandwich estimator, above. How could I get GAM to plot confidence intervals based on the sandwich estimator for the vcv matrix? 3. Some of the terms are interactions. (i.e.: T*var2). I realize that GAM has tensor product capabilities, but (1) frankly I don't understand them yet, and (2) I want my semi-parametric fit to be comparable to polynomial "parametric" fits. So, in addition to the plots given by GAM, I'd like to be able to make a graph that adds the estimate of the coefficient on T to the smooth function of T*var2, and also adds their standard errors. Of course, I'd like these to be the standard errors estimated by the "sandwich" estimator, above. Thanks for any help and advice, and for bearing with my long post. Best, Andrew -- *Andrew Crane-Droesch* Energy and Resources Group UC Berkeley +1 215 435 2644 andre...@berkeley.edu skype: andrew.crane-droesch http://andrewcd.berkeley.edu [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.