Hello,

I have been toying with the survey package's withReplicates function, which 
lets users easily extend the survey package to support any weighted statistic. 
There are a number of ML algorithms in various packages that accept weights, 
and it is fairly easy to use them with withReplicates. Below is a naïve example:

library(survey)
library(rpart)
library(gbm)

data(api)

# create survey object
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

rstrat<-as.svrepdesign(dstrat)

# try rpart
predr <- as.data.frame(withReplicates(rstrat, function(w, data) {
  predict(rpart(api00~ell+meals+mobility,data=data,weights=w))
}))

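# try gbm (distribution set explicitly; otherwise gbm guesses it and prints a message)
predg <- as.data.frame(withReplicates(rstrat, function(w, data) {
  fit <- gbm(api00~ell+meals+mobility,data=data,weights=w,
             distribution="gaussian",n.trees=100)
  # pass n.trees explicitly; some gbm versions require it in predict
  predict(fit,newdata=data,n.trees=100)
}))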

# try regular svyglm
preds <- as.data.frame(predict(svyglm(api00~ell+meals+mobility,rstrat)))

head(data.frame(predr,predg,preds))

With rpart, the standard errors are absurdly large and clearly incorrect. With 
gbm, the standard errors seem reasonable. 

An old R-help post from August 2008 says that, for some survey designs, you 
can't use quantile regression with withReplicates and expect reasonable results: 
https://stat.ethz.ch/pipermail/r-help/2008-August/171620.html

Quantiles and survey statistics are messy business, so that issue may be unique 
to quantile regression. Based on that post, though, it would seem that both the 
function and the survey design need to have certain properties for 
withReplicates to generate valid SEs. This is not documented for withReplicates. 
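
For contrast, here is a minimal sketch of a case where I would expect withReplicates to behave well: weighted least squares coefficients, which change smoothly as the weights change. That smoothness is what matters is my assumption, not something the documentation states. The sketch refits lm once per set of replicate weights and compares the result with svyglm on the same replicate design; rep_fit and glm_fit are names I made up for illustration.

```r
library(survey)

data(api)

# same stratified design and replicate weights as in the example above
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)
rstrat <- as.svrepdesign(dstrat)

# weighted least squares coefficients, refit once per set of replicate weights
rep_fit <- withReplicates(rstrat, function(w, data) {
  coef(lm(api00 ~ ell + meals + mobility, data = data, weights = w))
})

# reference fit from svyglm on the same replicate design
glm_fit <- svyglm(api00 ~ ell + meals + mobility, design = rstrat)

# side-by-side comparison of estimates and standard errors
cbind(est_withReplicates = coef(rep_fit),
      se_withReplicates  = SE(rep_fit),
      est_svyglm         = coef(glm_fit),
      se_svyglm          = sqrt(diag(vcov(glm_fit))))
```

If the two sets of columns agree here but the rpart SEs are still absurd, that would suggest the problem lies with the statistic being replicated rather than with the design or the replicate weights.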

So my question is: what properties do an ML algorithm and a survey design need 
for withReplicates to generate valid SEs?

Kind Regards,
Carl Ganz
