> On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcge...@gmail.com> wrote:
>
> Hello R-helpers,
> I'd like a function that given an arbitrary formula and a data frame
> returns the residual of the dependent variable, and maintains all NA values.

What does "maintains all NA values" actually mean?

>
> Here's an example that will give me what I want if my formula is y~x1+x2+x3
> and my data frame is df:
>
> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
>
> Here's the catch, I do not want my function to ever fail due to a factor
> with only one level. A one-level factor may appear because 1) the user
> passed it in, or 2) (more common) only one factor in a term is left after
> na.exclude removes the other NA values.
>
> Here is the error I would get

From what code?

> above if one of the terms was a factor with
> one level:
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> contrasts can be applied only to factors with 2 or more levels

I am unable to create that error with the actions you describe but do not actually offer in coded form:

> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm)

Call:
lm(formula = y ~ x1 + x2 + x3, data = dfrm)

Coefficients:
(Intercept)           x1       x2TRUE           x3
   -0.16274     -0.30032           NA     -0.09093

> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
          1           2           3           4           5           6
-0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
          7           8           9          10
-0.05965653 -2.17480605  1.42917190 -0.65103650

>
> Instead of giving me an error, I'd like the function to do just what lm()
> normally does when it sees a variable with no variance, ignore the variable
> (coefficient is NA) and continue to regress out all the other variables.
> Thus if 'x2' is a factor with one variable in the above example, I'd like
> the function to return the result of:
> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> Can anyone provide me a straightforward recommendation for how to do this?
> I feel like it should be easy, but I'm honestly stuck, and my Google
> searching for this hasn't gotten anywhere.
> The key is that I'd like the
> solution to be generic enough to work with an arbitrary linear formula, and
> not substantially kludgy (like trying every combination of regression terms
> until one works) as I'll be running this a lot on big data sets and don't
> want my computation time swamped by running unnecessary regressions or
> checking for number of factors after removing NAs.
>
> Thanks in advance!
> --Robert
>
>
> PS. The Google search feature in the R-help archives appears to be down:
> http://tolstoy.newcastle.edu.au/R/

It's working for me.

>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
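
For reference, one possible way to get the behaviour asked for in the quoted question is sketched below. It is only a sketch under stated assumptions, not a method proposed anywhere in this thread: the function name resid_drop_one_level is hypothetical, the error is assumed to come from a factor (or character) variable that has a single observed value once incomplete rows are removed, and every term involving such a variable is dropped before a single call to lm(). It builds the model frame once and fits one regression; it does not handle the case where every term would be dropped.

```r
# Hypothetical helper: residuals of the response, padded with NA,
# after dropping terms that involve one-level factors.
resid_drop_one_level <- function(formula, data) {
  # Model frame on complete cases only, as lm() would see it.
  mf <- model.frame(formula, data = data, na.action = na.omit)

  # Factor or character variables with fewer than two observed values.
  one_level <- vapply(
    mf,
    function(v) (is.factor(v) || is.character(v)) && length(unique(v)) < 2L,
    logical(1)
  )
  bad <- names(mf)[one_level]

  tt <- terms(formula, data = data)
  if (length(bad) > 0L) {
    # Rows of the "factors" matrix are variables, columns are terms.
    fac <- attr(tt, "factors")
    drop_idx <- which(colSums(fac[bad, , drop = FALSE] > 0) > 0)
    if (length(drop_idx) > 0L) {
      tt <- drop.terms(tt, drop_idx, keep.response = TRUE)
    }
  }

  # One regression on the reduced formula; na.exclude pads residuals with NA.
  resid(lm(formula(tt), data = data, na.action = na.exclude))
}

# Example data: after removing incomplete rows, x2 has only level "a".
set.seed(1)
df <- data.frame(
  y  = c(rnorm(9), NA),
  x1 = rnorm(10),
  x2 = factor(c(rep("a", 8), NA, "b")),
  x3 = rnorm(10)
)
res <- resid_drop_one_level(y ~ x1 + x2 + x3, df)
```

With this example data the result should match resid(lm(y~x1+x3, data=df, na.action=na.exclude)), since x2 is the only term dropped. Note that dropping a term can bring back rows that were excluded only because of NAs in that term, which is consistent with the result requested in the question.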