Please, I am seeking expertise and advice, and possibly leads to R packages or statistics literature.

My data: measurements of economic variables for each county of California over 37 years. My dependent variable is the square footage of office floor space permitted to be added in a county. Independent variables include, for example, the change in the number of office jobs in the same county in the same year (and in lagged years). Smaller (less populous) counties have many years in which no permits were taken out; the largest counties had at least some permitted square footage every year. Among the years and places where permits were taken out, smaller counties tend to have more permitted square footage per capita.

I imagine the relationships are as follows:

    y* = desired change in floor space = X b + e

where X are the independent variables, b their coefficients, and e is heteroskedastic (by county) and possibly autocorrelated noise. e is the sort of noise you'd expect in a time series of cross sections; there is no sampling (one observation for every county for every year studied). I must include county fixed effects (county dummies) in X because I will need them for county-specific forecasts later. The observed outcome is

    y = permits taken out, where
    y = y*   when y* > 0
    y = 0    when y* <= 0
    y has never been observed to equal 0 when population > pop*

How do folks recommend that I estimate this regression?

I could first estimate the probability of having any permits given the level of population and the change in office jobs. That could include a deterministic component: if population > pop*, then y = y*. I could try a tobit-2 model or its ML estimator, which I see has just been implemented in an R package called "sampleSelection". But in that case I would guess, in my ignorance, that my estimates would be badly biased, because it assumes specific error distributions and that the errors are homoskedastic. (I could transform the data in advance to get homoskedasticity if necessary.)
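As a sketch of what the censored model above implies for estimation, here is the log-likelihood of a Tobit (type 1) model left-censored at zero, with a separate error standard deviation for each county as one way to allow for heteroskedastic e. It is written in plain Python for illustration (the arithmetic carries over directly to R); the function names, the `log_sigma` parameterization and the toy data are hypothetical, the errors are assumed normal, and the possible autocorrelation in e is ignored. Fitting would mean maximizing this over beta and the per-county log sigmas with a numerical optimizer.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log CDF of the standard normal; erfc keeps precision in the left tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(beta, log_sigma, rows):
    """Log-likelihood of a Tobit model left-censored at zero.

    beta      -- coefficients aligned with each row's x (county dummies included)
    log_sigma -- dict: county -> log of that county's error s.d. (hypothetical
                 parameterization; the log keeps sigma positive during fitting)
    rows      -- iterable of (county, x, y), x a list of regressors, y >= 0
    """
    total = 0.0
    for county, x, y in rows:
        sigma = math.exp(log_sigma[county])
        xb = sum(b * xi for b, xi in zip(beta, x))
        if y > 0:
            # Uncensored year: y = y*, so e = y - Xb contributes its density.
            total += norm_logpdf((y - xb) / sigma) - math.log(sigma)
        else:
            # Censored year: y = 0 means y* <= 0, i.e. e <= -Xb.
            total += norm_logcdf(-xb / sigma)
    return total


# Hypothetical toy data; regressors = [county A dummy, county B dummy, office-job change].
rows = [
    ("A", [1.0, 0.0, 2.0], 5.0),
    ("A", [1.0, 0.0, -1.0], 0.0),
    ("B", [0.0, 1.0, 0.5], 1.2),
]
beta = [1.0, 0.2, 1.5]
log_sigma = {"A": math.log(2.0), "B": 0.0}
print(tobit_loglik(beta, log_sigma, rows))
```

In R, `AER::tobit()` (a wrapper around `survival::survreg()`) fits the homoskedastic version; a county-specific sigma like the one above would need a hand-written likelihood passed to a general optimizer such as `optim()`.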
I could construct an instrument for Prob(y* > 0) and multiply it by each observation of permits, avoiding distributional assumptions on e (but not on Prob(y* > 0)), but would that give me very high variance? (And is there an easier way to get the variance than algebraically working out the Newey-West estimator for two-stage method-of-moments procedures and how it applies here?) I know nothing of the non- and semi-parametric options here, but does anyone have an article or book chapter explaining whether that's the right thing to do, and how? It would be most convenient for me to use R, but I also have access to Stata and SAS.

Now another question for the statistically minded: after running this regression I will forecast, for each region, how many square feet of office space will have permits taken out each year, given expected trends in office jobs and such. This does not allow each county its own response to office jobs, since the office-jobs coefficient is pooled. Please feel free to comment on these options I am considering to allow more variation:

County-specific coefficients don't work well; I tried separate (admittedly OLS) regressions for each county and found that with only 35 or so observations per county, my variances were too large and the coefficients were insignificant and often of counterintuitive sign.

Random coefficients won't give me county-specific information, which I'll need for the forecasts.

So is this idea good? After I have coefficients from the pooled regression above, I take each coefficient b and its standard error and use them as a stochastic restriction, or Bayesian prior, for individual county regressions. That is, each county regression estimates its own b, but subject to the stochastic restriction or Bayesian prior that b is in fact the pooled b, with the prior variance of b equal to the variance estimated in the pooled regression.
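The stochastic-restriction idea can be sketched for a single coefficient (say, office jobs). With normal errors and the residual variance treated as known, Theil-Goldberger mixed estimation, or equivalently a normal prior on that one coefficient, reduces to a precision-weighted average of the county's own OLS slope and the pooled slope. This plain-Python sketch assumes one regressor plus the county intercept; the function names and all numbers are hypothetical.

```python
def ols_slope(x, y):
    """OLS slope of y on x with an intercept (the county fixed effect),
    plus the slope's estimated variance s^2 / Sxx."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance; two parameters were estimated
    return slope, s2 / sxx


def mixed_estimate(b_county, var_county, b_prior, var_prior):
    """Combine a county's own estimate with the stochastic restriction
    b ~ N(b_prior, var_prior); returns (combined estimate, its variance)."""
    w_county = 1.0 / var_county  # precision of the county's own estimate
    w_prior = 1.0 / var_prior    # precision of the restriction / prior
    var_post = 1.0 / (w_county + w_prior)
    b_post = var_post * (w_county * b_county + w_prior * b_prior)
    return b_post, var_post


# Hypothetical county data and pooled results, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]      # e.g. change in office jobs
y = [0.0, 1.0, 1.0, 3.0]      # e.g. permitted square footage
b_c, v_c = ols_slope(x, y)    # county's own slope (0.9) and its variance (0.07)
b_pooled, se_pooled = 0.5, 0.1
print(mixed_estimate(b_c, v_c, b_pooled, se_pooled ** 2))
```

The toy numbers illustrate one design point: the pooled standard error measures uncertainty about the average coefficient, not how much counties actually differ, so using it as the prior variance pulls this county's slope from 0.9 almost all the way to the pooled 0.5 (about 0.55 here). Estimating the between-county spread of the coefficient instead is what random-coefficient (empirical Bayes / mixed-model) approaches do, and their predicted county-level coefficients may be worth comparing.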
I'm thinking of what has been called Bayesian/mixed estimation here, but if I'm out of the loop on newer, better techniques, do tell. I'd think this county-specific estimation would be a simple non-tobit regression for large counties that never lack additions in any year. For small counties, I might need to run a tobit-style or instrumental-variable regression again (or whatever you folks recommend). It might be harder to estimate the probability of nonzero permits from a single small county's sample, so I might have to keep the pooled estimate of that probability.
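For the forecasts, what matters is expected permitted square footage, which combines the probability of any permits with the expected amount given that there are permits. The sketch below shows that combination under two alternative assumptions: a two-part setup in which the probability comes from a pooled probit index (so a small county can keep the pooled probability) and the amount from that county's own regression on years with permits, and, for comparison, the Tobit expectation E[y] = Phi(Xb/sigma) * Xb + sigma * phi(Xb/sigma) with normal errors. Function names and inputs are hypothetical; plain Python for illustration.

```python
import math


def norm_cdf(z):
    """Standard normal CDF (Phi)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    """Standard normal density (phi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def two_part_forecast(xg_pooled, amount_if_any):
    """Two-part forecast: P(any permits) from a pooled probit index x'g,
    times the county-specific expected square footage in years with permits."""
    return norm_cdf(xg_pooled) * amount_if_any


def tobit_forecast(xb, sigma):
    """E[y] for y = max(0, y*) with y* ~ N(xb, sigma^2):
    P(y* > 0) * xb + sigma * phi(xb / sigma)."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)


# Hypothetical inputs for one small county in one forecast year.
print(two_part_forecast(xg_pooled=0.3, amount_if_any=40000.0))
print(tobit_forecast(xb=10000.0, sigma=25000.0))
```

The two-part version estimates the probability and the amount separately, which is what makes it possible to pair a pooled probability with a county-specific amount equation; the Tobit version ties both to the same Xb and sigma.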
All thoughts are welcome and appreciated. Thanks very very much.
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.