hi pradip, this should give you what you want
library(foreign) library(survey) tf <- tempfile() download.file( " https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip" , tf , mode = 'wb' ) z <- unzip( tf , exdir = tempdir() ) x <- read.xport( z ) names( x ) <- tolower( names( x ) ) design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x, nest=TRUE) # include missings as "No" values here design <- update(design, xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'), xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing') ) # subset out records that were missing for either variable svykappa( ~ xbpchek53 + xcholck53 , subset(design, bpchek53 > 0 & cholck53 > 0 ) ) On Mon, Jun 20, 2016 at 7:49 PM, Muhuri, Pradip (AHRQ/CFACT) < pradip.muh...@ahrq.hhs.gov> wrote: > Hello, > > My goal is to calculate the weighted kappa measure of agreement between > two factors using the R survey package. I am getting the following error > message (the console is appended below; sorry no data provided). > > > # calculate survey Kappa > > svykappa(~xbpchek53+xcholck53, design) > Error in names(probs) <- nms : > 'names' attribute [15] must be the same length as the vector [8] > > I have followed the following major steps: > > 1) Used the "haven" package to read the sas data set into R. > 2) Used the dplyr mutate() to create 2 new variables and converted to > factors [required for the svykappa()?]. > 3) Created an object (named design) using the survey design variables and > the data file. > 4) Used the svykappa() to compute the kappa measure of agreement. > > I will appreciate if someone could give me hints on how to resolve the > issue. > > Thanks, > > Pradip Muhuri > > ############### The detailed console is appended below > #################### > > > setwd ("U:/A_PSAQ") > > library(haven) > > library(dplyr) > > library(survey) > > library(srvyr) > > library(Hmisc) > > my_hc2013_data <- read_sas("pc2013.sas7bdat") > > > > # Function to convert var names in upper cases to var names in lower > cases > > lower <- function (df) { > + names(df) <- tolower(names(df)) > + df > + } > > my_hc2013_data <- lower(my_hc2013_data) > > > > # Check the contents - Hmisc package (as above) required > > # contents(my_hc2013_data) > > > > # create two new variables > > my_hc2013_data <- mutate(my_hc2013_data, > + xbpchek53 = ifelse(bpchek53 ==1, 1, > + ifelse(bpchek53 %in% 2:6, 2,NA)), > + xcholck53 = ifelse(cholck53 ==1, 1, > + ifelse(cholck53 %in% 2:6, 2,NA))) > > > > # convert the numeric variables to factors for the kappa measure > > my_hc2013_data$xbpchek53 <- as.factor(my_hc2013_data$xbpchek53) > > my_hc2013_data$xcholck53 <- as.factor(my_hc2013_data$xcholck53) > > > > # check whether the variables are factors > > is.factor(my_hc2013_data$xbpchek53) > [1] TRUE > > is.factor(my_hc2013_data$xcholck53) > [1] TRUE > > > > > > # check the data from the cross table > > addmargins(with(my_hc2013_data, table(bpchek53,xbpchek53 ))) > xbpchek53 > bpchek53 1 2 Sum > -9 0 0 0 > -8 0 0 0 > -7 0 0 0 > -1 0 0 0 > 1 19778 0 19778 > 2 0 2652 2652 > 3 0 1014 1014 > 4 0 538 538 > 5 0 737 737 > 6 0 623 623 > Sum 19778 5564 25342 > > addmargins(with(my_hc2013_data, table(cholck53,xcholck53 ))) > xcholck53 > cholck53 1 2 Sum > -9 0 0 0 > -8 0 0 0 > -7 0 0 0 > -1 0 0 0 > 1 14850 0 14850 > 2 0 3153 3153 > 3 0 1170 1170 > 4 0 696 696 > 5 0 909 909 > 6 0 3764 3764 > Sum 14850 9692 24542 > > addmargins(with(my_hc2013_data, table(xbpchek53,xcholck53 ))) > xcholck53 > xbpchek53 1 2 Sum > 1 14667 4379 19046 > 2 163 5225 5388 > Sum 14830 9604 24434 > > > > # create an object with design variables and data > > design<-svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, > data=my_hc2013_data, nest=TRUE) > > > > # calculate survey Kappa > > svykappa(~xbpchek53+xcholck53, design) > Error in names(probs) <- nms : > 'names' attribute [15] must be the same length as the vector [8] > > ################################################################# > > Pradip K. Muhuri, AHRQ/CFACT > 5600 Fishers Lane # 7N142A, Rockville, MD 20857 > Tel: 301-427-1564 > > > > > -----Original Message----- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Muhuri, > Pradip (AHRQ/CFACT) > Sent: Thursday, June 16, 2016 2:06 PM > To: David Winsemius > Cc: r-help@r-project.org > Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New > Question > > Hello David, > > Your revisions to the earlier code have given me desired results. > > library("gtools") > mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", > "prevalence_c") ] > > Thanks, > > Pradip > > > Pradip K. Muhuri, AHRQ/CFACT > 5600 Fishers Lane # 7N142A, Rockville, MD 20857 > Tel: 301-427-1564 > > > > > > -----Original Message----- > From: David Winsemius [mailto:dwinsem...@comcast.net] > Sent: Thursday, June 16, 2016 12:54 PM > To: Muhuri, Pradip (AHRQ/CFACT) > Cc: r-help@r-project.org > Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New > Question > > > > On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) < > pradip.muh...@ahrq.hhs.gov> wrote: > > > > Hello, > > > > I got 3 solutions to my earlier code. Thanks to the contributors. May > I bring your attention to a new question below (with respect to David's > solution)? > > > > 1) Thanks to Daniel Nordlund for the tips - replacing leading space > with a 0 in the data. > > > > 2) Thanks to David Winsemius for his solution with the > gtools::mixedorder function. I have added an argument to his. > > > > mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), ] > > > > 3) Thanks to Jim Lemon's for his solution. I have prepended a minus > sign to reverse the order. > > > > numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c)," > > "),"[",1)) mydata[order(-numprev), ] > > > > > > (New)Question for solution 2: > > > > I want to keep only 2 variables (say, indicator and prevalence_c) in > the output. Where to insert the additional code? Why does the following > code fail? > > > >> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), > >> c(mydata$indicator, mydata$prevalence_c) ] > > > > > Try instead just a vector of names for the second argument to "[" > > mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), > c("indicator", "prevalence_c") ] > > > Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, > decreasing = TRUE), : > > undefined columns selected > > > > ******************** > >> str(mydata) > > Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 10 variables: > > $ indicator : chr "1. Health check-up" "2. Blood cholesterol checked > " "3. Recieved flu vaccine" "4. Blood pressure checked" ... > > $ subgroup : chr "Both sexes, ages =35 yrs""| __truncated__ "Both > sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| > __truncated__ "Both sexes, ages =35 yrs""| __truncated__ ... > > $ n : num 2117 2127 2124 2135 1027 ... > > $ prevalence_c: chr "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 > (0.70)" ... > > $ prevalence_p: chr "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 > (0.88)" ... > > $ sensitivity : chr "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 > (0.27)" ... > > $ specificity : chr "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 > (3.90)" ... > > $ ppv : chr "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 > (0.63)" ... > > $ npv : chr "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 > (3.27)" ... > > $ kappa : chr "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" > "0.626 (0.035)" ... > > > > Pradip K. Muhuri, AHRQ/CFACT > > 5600 Fishers Lane # 7N142A, Rockville, MD 20857 > > Tel: 301-427-1564 > > > > > > > > > > -----Original Message----- > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Daniel > > Nordlund > > Sent: Wednesday, June 15, 2016 6:37 PM > > To: r-help@r-project.org > > Subject: Re: [R] dplyr's arrange function > > > > On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote: > >> Hello, > >> > >> I am using the dplyr's arrange() function to sort one of the many > data frames on a character variable (named "prevalence"). > >> > >> Issue: I am not getting the desired output (line 7 is the problem, > which should be the very last line in the sorted data frame) because the > sorted field is character, not numeric. > >> > >> The reproducible example and the output are appended below. > >> > >> Is there any work-around to convert/treat this character variable > (named "prevalence" in the data frame below) as numeric before using the > arrange() function within the dplyr package? > >> > >> Any hints will be appreciated. > >> > >> Thanks, > >> > >> Pradip Muhuri > >> > >> # Reproducible Example > >> > >> library("readr") > >> testdata <- read_csv( > >> "indicator, prevalence > >> 1. Health check-up, 77.2 (1.19) > >> 2. Blood cholesterol checked, 84.5 (1.14) 3. Recieved flu vaccine, > >> 50.0 (1.33) 4. Blood pressure checked, 88.7 (0.88) 5. Aspirin > >> use-problems, 11.7 (1.02) 6.Colonoscopy, 60.2 (1.41) 7. > >> Sigmoidoscopy, > >> 6.1 (0.61) 8. Blood stool test, 14.6 (1.00) 9.Mammogram, 72.6 (1.82) > >> 10. Pap Smear test, 73.3 (2.37)") > >> > >> # Sort on the character variable in descending order > >> arrange(testdata, > >> desc(prevalence)) > >> > >> # Results from Console > >> > >> indicator prevalence > >> (chr) (chr) > >> 1 4. Blood pressure checked 88.7 (0.88) > >> 2 2. Blood cholesterol checked 84.5 (1.14) > >> 3 1. Health check-up 77.2 (1.19) > >> 4 10. Pap Smear test 73.3 (2.37) > >> 5 9.Mammogram 72.6 (1.82) > >> 6 6.Colonoscopy 60.2 (1.41) > >> 7 7. Sigmoidoscopy 6.1 (0.61) > >> 8 3. Recieved flu vaccine 50.0 (1.33) > >> 9 8. Blood stool test 14.6 (1.00) > >> 10 5. Aspirin use-problems 11.7 (1.02) > >> > >> > >> Pradip K. Muhuri, AHRQ/CFACT > >> 5600 Fishers Lane # 7N142A, Rockville, MD 20857 > >> Tel: 301-427-1564 > >> > >> > >> > > > > The problem is that you are sorting a character variable. > > > >> testdata$prevalence > > [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 > (1.02)" > > [6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 > (2.37)" > >> > > > > Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a > "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending > order). If you want the character value of line 7 to sort last, it would > need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space). > > > > Hope this is helpful, > > > > Dan > > > > Daniel Nordlund > > Port Townsend, WA USA > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > David Winsemius > Alameda, CA, USA > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.