Re: [R] Stratified Random Sampling Proportional to Size
This problem in sampling::strata() comes from calling cbind() on a zero-row data.frame with a scalar number.

  > library(sampling)
  > strata(mtcars[,c("mpg","hp","gear")], strat="gear", size=c(5,5,0))
  Error in data.frame(..., check.names = FALSE) :
    arguments imply differing number of rows: 0, 1
  In addition: Warning message:
  In strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,  :
    the method is not specified; by default, the method is srswor
  > traceback()
  5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
         collapse = ", "))
  4: data.frame(..., check.names = FALSE)
  3: cbind(deparse.level, ...)
  2: cbind(r, i)
  1: strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,
         5, 0))

Changing that cbind call from
  cbind(r, i)
to
  cbind(r, rep(i, length.out=nrow(r)))
would fix it up.

cbind is not entirely consistent in what it does with a 0-row rectangular input and a scalar. With a matrix you get a 0-row result and a warning

  > m <- matrix(numeric(), nrow=0, ncol=3, dimnames=list(NULL,paste0("Col",1:3)))
  > str(cbind(m, 666))
   num[0 , 1:4]
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr [1:4] "Col1" "Col2" "Col3" ""
  Warning message:
  In cbind(m, 666) :
    number of rows of result is not a multiple of vector length (arg 2)

With a data.frame you get an error

  > str(cbind(data.frame(m), 666))
  Error in data.frame(..., check.names = FALSE) :
    arguments imply differing number of rows: 0, 1

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas Lumley
Sent: Sunday, April 28, 2013 1:31 PM
To: Jeff Newmiller
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Stratified Random Sampling Proportional to Size

It looks as though you can't sample zero observations from a stratum. If you take the example on the help page and change one of the sample sizes to zero you get exactly the same error.

From the fact that there isn't a more explicit error message, I would guess that the author just never considered the possibility that someone would have a population stratum and not sample from it.

   -thomas
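The suggested change to the cbind() call can be tried in isolation, without the sampling package. The following is a minimal base-R sketch: the helper name append_stratum and the toy columns ID_unit and Prob are assumptions made for illustration and are not taken from the package source. It only shows that recycling the scalar to nrow(r) keeps cbind() happy for an empty block of rows.

```r
# Hypothetical helper (not part of the sampling package): append a stratum
# label to a block of sampled rows without failing when the block is empty.
append_stratum <- function(r, i) {
  # rep(i, length.out = nrow(r)) is a zero-length vector for a 0-row block,
  # so data.frame() sees the same number of rows in every argument.
  cbind(r, Stratum = rep(i, length.out = nrow(r)))
}

full  <- data.frame(ID_unit = 1:3, Prob = 0.5)  # toy block with three rows
empty <- full[0, ]                              # same columns, zero rows

with_rows    <- append_stratum(full, 2)   # three rows, Stratum filled with 2
without_rows <- append_stratum(empty, 3)  # zero rows, no error

dim(with_rows)
dim(without_rows)
```

The plain cbind(empty, 3) call on the same objects reproduces the "differing number of rows: 0, 1" error shown in the traceback.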
Re: [R] Stratified Random Sampling Proportional to Size
Hi Jeff,

a & b) Points taken. Thanks for the reference too.
c) Taking the zeros out did the trick.

Dan

-----Original Message-----
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
Sent: Sunday, April 28, 2013 12:15 AM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Stratified Random Sampling Proportional to Size

a) Please post plain text

b) Please make reproducible examples (e.g. telling us how you accessed a database that we have no access to is not helpful). See ?head, ?dput and [1]

c) I don't know anything about the sampling package or the strata function, but I would recommend eliminating the rows that have zeros from the input data. E.g.:

  stratum_cp <- stratum_cp[ 0<stratum_cp$stratp, ]

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
Re: [R] Stratified Random Sampling Proportional to Size
a) Please post plain text

b) Please make reproducible examples (e.g. telling us how you accessed a database that we have no access to is not helpful). See ?head, ?dput and [1]

c) I don't know anything about the sampling package or the strata function, but I would recommend eliminating the rows that have zeros from the input data. E.g.:

  stratum_cp <- stratum_cp[ 0<stratum_cp$stratp, ]

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

On Fri, 26 Apr 2013, Lopez, Dan wrote:

> Hello R Experts,
>
> I kindly request your assistance on figuring out how to get a stratified random sampling proportional to 100.
> Below is my r code showing what I did and the error I'm getting with sampling::strata
>
> # FIRST I summarized count of records by the two variables I want to use as strata
> library(RODBC)
> library(sqldf)
> library(sampling)
> #After establishing connection I query the data and sort it by strata APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
> CURRPOP<-sqlQuery(ch,"SELECT APPT_TYP_CD_LL, EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,RET_TYP_CD_LL FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T') ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
> #ROWID is a dummy ID I added and repositioned after the strat columns for later use
> CURRPOP$ROWID<-seq(nrow(CURRPOP))
> CURRPOP<-CURRPOP[,c(1:2,11,3:10)]
>
> # My strata. Stratp is how many I want to sample from each stratum. NOTE THERE ARE SOME 0's which just means I won't sample from that group.
> stratum_cp<-sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM CURRPOP GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
> stratum_cp$stratp<-round(stratum_cp$HC/nrow(CURRPOP)*100)
> stratum_cp
>    APPT_TYP_CD_LL EMPL_TYPE   HC stratp
> 1              FA         S    1      0
> 2              FC         S    5      0
> 3              FP         S  173      3
> 4              FR         H  170      3
> 5              FX         H   49      1
> 6              FX         S   57      1
> 7              IN         H 1589     25
> 8              IN         S 3987     63
> 9              IP         H    7      0
> 10             IP         S   53      1
> 11             SA         H    8      0
> 12             SE         S   43      1
> 13             SF         H   14      0
> 14             SF         S    1      0
> 15             SG         S   10      0
> 16             ST         H  107      2
> 17             ST         S    6      0
>
> #THEN I attempted to use sampling::strata using the instructions in that package and got an error
> #I use stratum_cp$stratp for my sizes.
> s<-strata(CURRPOP,c("APPT_TYP_CD_LL","EMPL_TYPE"),size=stratum_cp$stratp,method="srswor")
> Error in data.frame(..., check.names = FALSE) :
>   arguments imply differing number of rows: 0, 1
> traceback()
> 5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
>        collapse = ", "))
> 4: data.frame(..., check.names = FALSE)
> 3: cbind(deparse.level, ...)
> 2: cbind(r, i)
> 1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
>        method = "srswor")
>
> #In lieu of a reproducible sample here is some info regarding most of my data
> dim(CURRPOP)
> [1] 6280   11
>
> #Cols w/ personal info have been removed in this output
> str(CURRPOP[,c(1:3,7:11)])
> 'data.frame':   6280 obs. of  8 variables:
>  $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
>  $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
>  $ ROWID         : int  1 2 3 4 5 6 7 8 9 10 ...
>  $ DEPTID        : int  9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
>  $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
>  $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
>  $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
>  $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...
>
> Daniel Lopez
> Workforce Analyst
> HRIM - Workforce Analytics Metrics
> Strategic Human Resources Management
> wf-analytics-metr...@lists.llnl.gov
> (925) 422-0814
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
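The allocation step and the "drop the zero rows" workaround discussed above can be sketched on a dataset everyone has. This is a base-R illustration only: mtcars with gear as the single stratum, the total of 3 units, and the names pop, alloc, keep and pop_keep are all assumptions chosen for the example, and the within-stratum draw is a plain sample.int() stand-in for simple random sampling without replacement rather than a call to sampling::strata(). Dropping the zero-size strata from the population as well as from the size vector is also an assumption, made so that sizes and strata stay aligned.

```r
set.seed(42)  # only to make the sketch repeatable

# Stand-in population: mtcars, sorted by the stratification variable.
pop <- mtcars[order(mtcars$gear), c("mpg", "hp", "gear")]

# Proportional allocation of n_total units, rounded as in the original post.
n_total <- 3
alloc <- as.data.frame(table(gear = pop$gear), responseName = "HC")
alloc$stratp <- round(alloc$HC / nrow(pop) * n_total)

# Drop strata whose allocation rounds to zero, from both the allocation
# table and the population, so that sizes and strata still line up.
keep <- alloc[alloc$stratp > 0, ]
pop_keep <- pop[as.character(pop$gear) %in% as.character(keep$gear), ]

# Base-R simple random sampling without replacement within each stratum.
picked <- unlist(lapply(seq_len(nrow(keep)), function(k) {
  rows <- which(as.character(pop_keep$gear) == as.character(keep$gear[k]))
  rows[sample.int(length(rows), keep$stratp[k])]
}))
s <- pop_keep[picked, ]

alloc          # allocation per stratum, including the zero
table(s$gear)  # units actually drawn per remaining stratum
```

Note that rounded proportional allocations need not add up to the intended total; here they sum to 2 rather than 3, so it is worth checking sum(alloc$stratp) before sampling.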
Re: [R] Stratified Random Sampling Proportional to Size
It looks as though you can't sample zero observations from a stratum. If you take the example on the help page and change one of the sample sizes to zero you get exactly the same error.

From the fact that there isn't a more explicit error message, I would guess that the author just never considered the possibility that someone would have a population stratum and not sample from it.

   -thomas

On Sun, Apr 28, 2013 at 7:14 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

> a) Please post plain text
>
> b) Please make reproducible examples (e.g. telling us how you accessed a database that we have no access to is not helpful). See ?head, ?dput and [1]
>
> c) I don't know anything about the sampling package or the strata function, but I would recommend eliminating the rows that have zeros from the input data. E.g.:
>
>   stratum_cp <- stratum_cp[ 0<stratum_cp$stratp, ]
>
> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
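Since the error raised inside strata() does not mention zero sample sizes, one option is to validate the size vector before sampling. The following is a rough base-R sketch; check_strata_sizes is a hypothetical helper written for this illustration, not a function from the sampling package, and the messages are made up.

```r
# Hypothetical pre-flight check (not part of the sampling package).
# counts: named vector or 1-d table of stratum sizes in the population
# sizes:  requested sample size per stratum, in the same order
check_strata_sizes <- function(counts, sizes) {
  n <- as.vector(counts)
  if (length(sizes) != length(n)) {
    stop("need one sample size per stratum: got ", length(sizes),
         " sizes for ", length(n), " strata")
  }
  zero <- names(counts)[sizes == 0]
  if (length(zero) > 0) {
    stop("sample size is 0 for strata: ", paste(zero, collapse = ", "),
         "; drop these strata before sampling")
  }
  too_big <- names(counts)[sizes > n]
  if (length(too_big) > 0) {
    stop("sample size exceeds stratum size for: ",
         paste(too_big, collapse = ", "))
  }
  invisible(TRUE)
}

counts <- table(mtcars$gear)             # stratum sizes: 15, 12, 5
ok  <- check_strata_sizes(counts, c(5, 5, 2))  # passes silently
msg <- tryCatch(check_strata_sizes(counts, c(5, 5, 0)),
                error = function(e) conditionMessage(e))
msg  # names the stratum with a zero sample size
```

Running such a check first turns the opaque "differing number of rows: 0, 1" failure into a message that points at the offending strata.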