is the `with` not passing make.formula( get( 'var_name' ) ) through to svyquantile for some reason? does this work?
MIcombine( with(des, svyquantile(~LBXTCD, .5))) if that's not it, could you make a minimal reproducible example that includes the data download? code to download and import nhanes here https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler < abichte...@toxstrategies.com> wrote: > Hello, and thank you for considering this question: > > The svystat object created with multiply imputed NHANES data files is > failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the > issue correctly, whether the behavior is expected, and whether y'all might > have any ideas for workarounds. > > I'm following T. Lumley's general method outlined here: > http://faculty.washington.edu/tlumley/old-survey/svymi.html, but with > data files I've imputed myself on the 2001/2002 biennial. Each file has > 1081 observations and no missing values. > > ### Create the survey design object with list of imputed data files > ImputedList0102. > des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, > data=imputationList(ImputedList0102), nest=TRUE) > > > ### Blood analyte of interest > var_name <- "LBXTCD" # analyte in blood serum > > ### All is well calculating the mean: > M <- with(des, svymean(make.formula(get('var_name')))) > summary(M) > Result <- MIcombine(M) > Result$coefficients > # LBXTCD > # 17.41635 > > > ### but svystat object fails to calculate a 50th percentile: > ### it fails when hard-coding the name rather than using make.formula; > ### it fails regardless of number of files or choices in handling ties or > interval type. > ### There are 16 ties in each data file. > M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles = > c(.5))) > summary(M1) > > # Length Class Mode > #[1,] 1 -none- numeric > #[2,] 1 -none- numeric > #[3,] 1 -none- numeric > > > ### The quantile is successfully calculated on one file at a time, > however, and is different for each file. > ### (had thought perhaps there was a lack-of-variance issue). The quantile > calculated on each file > ### is the same regardless of interval.type. > des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, > data=ImputedList0102[[1]], nest=TRUE) > svyquantile(make.formula(get('var_name')), des_single1, c(.5)) > # 0.5 > # LBXTCD 13.5554 > > > des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, > data=ImputedList0102[[2]], nest=TRUE) > svyquantile(make.formula(get('var_name')), des_single2, c(.5)) > # 0.5 > # LBXTCD 14.06154 > > # The number of observations exceeding the 50th percentile differs for > each file, which I can't claim to understand. > > # I removed the 16 ties, but no help. Do the ties and/or different number > of observations above/below prevent the svydesigns from being combined? > nrow(subset(ImputedList0102[[1]], LBXTCD > 13.5554)) > # [1] 516 > nrow(subset(ImputedList0102[[2]], LBXTCD > 14.06154)) > # [1] 512 > > > I'm hoping someone can point me to some gross error I'm making or another > function parameter or data manipulation or another survey-savvy method > altogether to calculate a 50th percentile across multiply imputed data > files. Thanks for any advice, > > Brennan > > www.toxstrategies.com > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.