[R] Stratified Random Sampling Proportional to Size

Lopez, Dan Fri, 26 Apr 2013 11:54:39 -0700

Hello R Experts,

I would appreciate your help figuring out how to draw a stratified random
sample of 100 records, allocated proportionally to stratum size.


Below is my R code showing what I did, and the error I get from
sampling::strata.

# FIRST I summarized the count of records by the two variables I want to use as strata

library(RODBC)
library(sqldf)
library(sampling)
# After establishing the connection, I query the data, sort it by the strata
# APPT_TYP_CD_LL and EMPL_TYPE, and store it in a data frame
CURRPOP <- sqlQuery(ch, "SELECT APPT_TYP_CD_LL, EMPL_TYPE, ASOFDATE, EMPLID, NAME,
  DEPTID, JOBCODE, JOBTITLE, SAL_ADMIN_PLAN, RET_TYP_CD_LL
  FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN ('R','T')
  ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
# ROWID is a dummy ID I added and moved right after the strata columns for later use
CURRPOP$ROWID <- seq_len(nrow(CURRPOP))
CURRPOP <- CURRPOP[, c(1:2, 11, 3:10)]
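The positional index `c(1:2,11,3:10)` silently picks the wrong column if the SELECT list ever changes. A small base-R sketch of the same sort-then-add-ROWID step that reorders columns by name instead is shown below. It runs on a hypothetical five-row stand-in for CURRPOP (the values are illustrative only). The sort also matters for sampling: the `strata()` help page, as far as it describes it, expects `size` in the order in which the strata appear in the input data.

```r
# Hypothetical five-row stand-in for CURRPOP; only a few columns are kept
currpop <- data.frame(
  APPT_TYP_CD_LL = c("IN", "FP", "IN", "FP", "ST"),
  EMPL_TYPE      = c("S", "S", "H", "S", "H"),
  DEPTID         = c(9825L, 9613L, 9613L, 9852L, 9772L),
  stringsAsFactors = FALSE
)

strata_cols <- c("APPT_TYP_CD_LL", "EMPL_TYPE")

# Base-R equivalent of ORDER BY APPT_TYP_CD_LL, EMPL_TYPE (ties keep input order)
currpop <- currpop[order(currpop$APPT_TYP_CD_LL, currpop$EMPL_TYPE), ]
rownames(currpop) <- NULL

# Dummy row ID, then move it directly after the strata columns by name
currpop$ROWID <- seq_len(nrow(currpop))
other_cols <- setdiff(names(currpop), c(strata_cols, "ROWID"))
currpop <- currpop[, c(strata_cols, "ROWID", other_cols)]

print(currpop)
```

Selecting by name keeps ROWID next to the strata columns even if columns are later added to or dropped from the query.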

# My strata. stratp is how many records I want sampled from each stratum. NOTE: some
# strata have a size of 0, which just means I won't sample from those groups.
stratum_cp <- sqldf("SELECT APPT_TYP_CD_LL, EMPL_TYPE, count(*) HC FROM CURRPOP
  GROUP BY APPT_TYP_CD_LL, EMPL_TYPE")
stratum_cp$stratp <- round(stratum_cp$HC / nrow(CURRPOP) * 100)

> stratum_cp
   APPT_TYP_CD_LL EMPL_TYPE   HC stratp
1              FA         S    1      0
2              FC         S    5      0
3              FP         S  173      3
4              FR         H  170      3
5              FX         H   49      1
6              FX         S   57      1
7              IN         H 1589     25
8              IN         S 3987     63
9              IP         H    7      0
10             IP         S   53      1
11             SA         H    8      0
12             SE         S   43      1
13             SF         H   14      0
14             SF         S    1      0
15             SG         S   10      0
16             ST         H  107      2
17             ST         S    6      0
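The `stratp` column is a proportional allocation, n_h = round(100 * N_h / N), where N_h is a stratum's head count (HC) and N = 6280. Plain rounding happens to sum to exactly 100 for these counts, but it is not guaranteed to in general. The minimal sketch below recomputes the allocation in base R (no sqldf) from the HC values in the printout above. For comparison it also applies a largest-remainder rule, a swapped-in technique that is not part of the original code, which always hits the target total.

```r
# Strata and head counts (HC) copied from the stratum_cp printout above
stratum_cp <- data.frame(
  APPT_TYP_CD_LL = c("FA", "FC", "FP", "FR", "FX", "FX", "IN", "IN", "IP",
                     "IP", "SA", "SE", "SF", "SF", "SG", "ST", "ST"),
  EMPL_TYPE      = c("S", "S", "S", "H", "H", "S", "H", "S", "H",
                     "S", "H", "S", "H", "S", "S", "H", "S"),
  HC             = c(1, 5, 173, 170, 49, 57, 1589, 3987, 7,
                     53, 8, 43, 14, 1, 10, 107, 6),
  stringsAsFactors = FALSE
)
n_total <- 100

# Proportional allocation as in the original code: round(n * N_h / N)
stratum_cp$stratp <- round(stratum_cp$HC / sum(stratum_cp$HC) * n_total)

# Largest-remainder allocation: floor each quota, then give the leftover
# units to the strata with the largest fractional parts
alloc_largest_remainder <- function(N_h, n) {
  quota <- N_h / sum(N_h) * n
  alloc <- floor(quota)
  leftover <- n - sum(alloc)
  extra <- order(quota - alloc, decreasing = TRUE)[seq_len(leftover)]
  alloc[extra] <- alloc[extra] + 1
  alloc
}
stratum_cp$stratp_lr <- alloc_largest_remainder(stratum_cp$HC, n_total)

colSums(stratum_cp[, c("stratp", "stratp_lr")])
```

With these counts both rules give the same sizes; they can disagree when many small strata round in the same direction.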

# THEN I attempted to use sampling::strata, following the instructions in that
# package, and got an error

# I use stratum_cp$stratp for my sizes.

> s<-strata(CURRPOP,c("APPT_TYP_CD_LL","EMPL_TYPE"),size=stratum_cp$stratp,method="srswor")
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 1
> traceback()
5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
       collapse = ", "))
4: data.frame(..., check.names = FALSE)
3: cbind(deparse.level, ...)
2: cbind(r, i)
1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
       method = "srswor")
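The traceback ends in `cbind(r, i)` inside `strata()`, and the very first stratum (FA/S) has a requested size of 0. One plausible reading, a hypothesis not checked here against the sampling source, is that a zero-size draw leaves `r` as a data frame with 0 rows, and binding a single value to it fails. The base-R sketch below reproduces exactly that error message without the sampling package; `r` and `i` are stand-ins that only mimic the names in the traceback.

```r
# Stand-in for a per-stratum result when 0 units are drawn: a 0-row data frame
r <- data.frame(APPT_TYP_CD_LL = character(0), EMPL_TYPE = character(0),
                stringsAsFactors = FALSE)
# Stand-in for the stratum number that gets bound on as an extra column
i <- 1L

# cbind() with a data frame argument goes through data.frame(), which cannot
# recycle a 0-row argument up to 1 row, so it stops with an error
msg <- tryCatch(cbind(r, i), error = conditionMessage)
print(msg)
```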



# In lieu of a reproducible example, here is some information about most of my data
> dim(CURRPOP)
[1] 6280   11
# Columns with personal information have been removed from this output

> str(CURRPOP[,c(1:3,7:11)])
'data.frame':  6280 obs. of  8 variables:
 $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
 $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
 $ ROWID         : int  1 2 3 4 5 6 7 8 9 10 ...
 $ DEPTID        : int  9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
 $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
 $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
 $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
 $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...
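The traceback points at `cbind(r, i)`, and several strata have a requested size of 0. If those zero sizes are what trips `strata()`, one possible workaround (untested here against the sampling package) is to pass it only the rows and sizes of strata with `stratp > 0`. The sketch below does the equivalent in base R so it runs without the sampling package. The helper `stratified_srswor()` is hypothetical: it draws a simple random sample without replacement within each stratum and skips strata whose size is 0. Since the real data are not available, it runs on a stand-in population rebuilt from the HC and stratp columns of the `stratum_cp` printout, holding only the two strata columns and a ROWID.

```r
# Hypothetical helper: stratified simple random sampling without replacement.
# 'sizes' is a named vector of sample sizes keyed by "APPT/EMPL" labels;
# strata with size 0 are skipped instead of being sampled.
stratified_srswor <- function(data, strata_cols, sizes) {
  key <- do.call(paste, c(data[strata_cols], sep = "/"))
  wanted <- names(sizes)[sizes > 0]
  picked <- lapply(wanted, function(k) {
    rows <- which(key == k)
    n_k <- sizes[[k]]
    if (length(rows) < n_k) stop("not enough rows in stratum ", k)
    # index through sample.int() to avoid sample()'s length-1 special case
    rows[sample.int(length(rows), n_k)]
  })
  data[sort(unlist(picked)), , drop = FALSE]
}

# Stand-in population rebuilt from the HC counts in the stratum_cp printout
counts <- data.frame(
  APPT_TYP_CD_LL = c("FA", "FC", "FP", "FR", "FX", "FX", "IN", "IN", "IP",
                     "IP", "SA", "SE", "SF", "SF", "SG", "ST", "ST"),
  EMPL_TYPE      = c("S", "S", "S", "H", "H", "S", "H", "S", "H",
                     "S", "H", "S", "H", "S", "S", "H", "S"),
  HC     = c(1, 5, 173, 170, 49, 57, 1589, 3987, 7, 53, 8, 43, 14, 1, 10, 107, 6),
  stratp = c(0, 0, 3, 3, 1, 1, 25, 63, 0, 1, 0, 1, 0, 0, 0, 2, 0),
  stringsAsFactors = FALSE
)
pop <- counts[rep(seq_len(nrow(counts)), counts$HC),
              c("APPT_TYP_CD_LL", "EMPL_TYPE")]
rownames(pop) <- NULL
pop$ROWID <- seq_len(nrow(pop))

sizes <- setNames(counts$stratp,
                  paste(counts$APPT_TYP_CD_LL, counts$EMPL_TYPE, sep = "/"))

set.seed(2013)  # arbitrary seed, only for reproducibility
s <- stratified_srswor(pop, c("APPT_TYP_CD_LL", "EMPL_TYPE"), sizes)
table(paste(s$APPT_TYP_CD_LL, s$EMPL_TYPE, sep = "/"))
```

Because the helper returns whole rows, the ROWID column of the result identifies which records were drawn.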

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
Strategic Human Resources Management
wf-analytics-metr...@lists.llnl.gov
(925) 422-0814

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.