[R] Direct Method Age-Adjustment to Complex Survey Data

Anthony Damico Fri, 10 Aug 2012 10:55:35 -0700

Hi everyone, my apologies in advance if I'm overlooking something simple in
this question.  I am trying to use R's survey package to make a direct
method age-adjustment to some complex survey data.  I have played with
postStratify, calibrate, rake, and simply multiplying the base weights by
the correct proportions - nothing seems to hit the published numbers on the
nose.


I am trying to replicate any of the following:

    the stdize and stdweight parameters from a Stata complex survey
analysis command (like this)

        svyset [w= WEIGHTVAR], psu(PSUVAR) strata(STRATAVAR) vce(linearized)
        svy: mean VARNAME, stdize(agecat) stdweight(agewt)

    the stdvar and stdwgt parameters from a SUDAAN proc descript call

    the ESTIMATE statement in a PROC SURVEYREG (used to generate adjusted
means) in SAS version 9.2


I am attempting to match figure 1 from nchs data brief #92 exactly using
R.  The data are here: http://www.cdc.gov/nchs/data/databriefs/db92_fig1.png
 and the full brief is here:
http://www.cdc.gov/nchs/data/databriefs/db92.htm  This is the National
Health and Nutrition Examination Survey, produced by the US CDC.

The CDC has published a bunch of syntax examples for generating
age-adjusted statistics with SUDAAN, SAS, and Stata.  Here's a link with
all of that syntax:
http://www.cdc.gov/nchs/tutorials/NHANES/NHANESAnalyses/agestandardization/age_standardization_intro.htm


I've provided my detailed code below that gets to the point of the age
adjustment, and then fails.  However, when I export my R data frame near
the end of the process and run a few Stata commands, I get the correct
numbers easily.  Note that the code below uses the download.file command
twice to pull the (very small) data sets onto your local computer.


Any guidance about how to perform this age-adjustment would be appreciated!

Thanks!!

Anthony Damico
Kaiser Family Foundation



require(foreign)
require(survey)

############# download the NHANES 2009-2010 demographics and total
cholesterol files ##################
tf <- tempfile()
download.file(
    "
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/demo_f.xpt"; ,
    tf ,
    mode = "wb"
)
NHANES.0910.demographics.df <- read.xport( tf )

tf <- tempfile()
download.file(
    "
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/TCHOL_F.xpt"; ,
    tf ,
    mode = "wb"
)
NHANES.0910.TCHOL_F.df <- read.xport( tf )


############# merge them together, keeping only a few columns
##################

DemoKeepVars <- c( "SEQN" , "WTMEC2YR" , "RIDSTATR" , "SDMVPSU" ,
"SDMVSTRA" , "RIDAGEYR" , "RIAGENDR" , "RIDRETH1" )
TCHOLKeepVars <- c( "SEQN" , "LBXTC" )

x <-
    merge(
        NHANES.0910.demographics.df[ , DemoKeepVars ] ,
        NHANES.0910.TCHOL_F.df[ , TCHOLKeepVars ] ,
        all = T
    )

# keep only individuals who took the "mobile examination"
x <- subset( x , RIDSTATR %in% 2 )

# make a few recodes..
x <-
    transform(
        x ,

        # high cholesterol
        HI_CHOL =
            ifelse( LBXTC >= 240 , 1 , 0 ) ,

        # four race/ethnicity categories
        race =
            ifelse( RIDRETH1 %in% 1:2 , 3 ,
            ifelse( RIDRETH1 %in% 3 , 1 ,
            ifelse( RIDRETH1 %in% 4 , 2 ,
            ifelse( RIDRETH1 %in% 5 , 4 ,
                NA ) ) ) ) ,

        # four age categories
        agecat =
            ifelse( RIDAGEYR %in% 0:19 , 0 ,
            ifelse( RIDAGEYR %in% 20:39 , 1 ,
            ifelse( RIDAGEYR %in% 40:59 , 2 ,
            ifelse( RIDAGEYR >= 60 , 3 ,
                NA ) ) ) )
    )


# 2000 census populations to adjust to
# from excel file:
http://www.cdc.gov/nchs/tutorials/nhanes/Downloads/Continuous/ageadjwt.xls
# found on page:
http://www.cdc.gov/nchs/tutorials/nhanes/nhanesanalyses/agestandardization/Task1c.htm
pop.by.age <- c( 55901 , 77670 , 72816 , 45364 )

# calculate each age group's percent of the total
share.pop.by.age <- pop.by.age / sum( pop.by.age )


x <-
    transform(
        x ,
        agewt =
            ifelse( agecat %in% 0 , share.pop.by.age[ 1 ] ,
            ifelse( agecat %in% 1 , share.pop.by.age[ 2 ] ,
            ifelse( agecat %in% 2 , share.pop.by.age[ 3 ] ,
            ifelse( agecat %in% 3 , share.pop.by.age[ 4 ] , NA ) ) ) ) )


# initiate the survey object
y <-
    svydesign(
        id = ~SDMVPSU ,
        strata = ~SDMVSTRA ,
        nest = TRUE ,
        weights = ~WTMEC2YR ,
        data = x
    )

# only perform analyses among individuals aged 20 and over
z <- subset( y , RIDAGEYR >= 20 )

# these four commands calculate the correct non-age-adjusted estimate
svymean( ~HI_CHOL , z , na.rm = T )
svyby( ~HI_CHOL , ~race , z , svymean , na.rm = T )
svyby( ~HI_CHOL , ~RIAGENDR , z , svymean , na.rm = T )
svyby( ~HI_CHOL , ~RIAGENDR+race , z , svymean , na.rm = T )
# but matching the figure exactly requires an exact age adjustment.

# create the population types vector
pop.types <-
    data.frame(
        agecat = 0:3 ,
        Freq = c( 55901 , 77670 , 72816 , 45364 )
    )


z.postStratified <- postStratify( z , ~agecat , pop.types , partial = T )

# these estimates are close, but not exactly right.
svymean( ~HI_CHOL , z.postStratified , na.rm = T )
svyby( ~HI_CHOL , ~race , z.postStratified , svymean , na.rm = T )
svyby( ~HI_CHOL , ~RIAGENDR , z.postStratified , svymean , na.rm = T )
svyby( ~HI_CHOL , ~RIAGENDR+race , z.postStratified , svymean , na.rm = T )



###########export for stata#################
# write the file to a stata file on your local drive..
write.dta( x , "c:/temp/temp.dta" )

#######STATA/MP 11.2 COMMANDS###############
use "c:\temp\temp.dta", clear
svyset [w= WTMEC2YR], psu(SDMVPSU) strata(SDMVSTRA) vce(linearized)
gen adult = 1 if agecat > 0
# these three commands produce the correct results
svy, subpop(adult): mean HI_CHOL, stdize(agecat) stdweight(agewt)
svy, subpop(adult):  mean HI_CHOL, stdize(agecat) stdweight(agewt) over(
race )
svy, subpop(adult):  mean HI_CHOL, stdize(agecat) stdweight(agewt) over(
race RIAGENDR )

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Direct Method Age-Adjustment to Complex Survey Data

Reply via email to