a) Please post plain text.
b) Please make reproducible examples (e.g. telling us how you accessed a
database that we have no access to is not helpful). See ?head, ?dput and [1].
c) I don't know anything about the sampling package or the strata
function, but I would recommend eliminating the rows that have zeros from
the input data. E.g.:
stratum_cp <- stratum_cp[0 < stratum_cp$stratp, ]  # keep only strata with a nonzero sample size
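One reading of suggestion c) is that the zero-size strata need to go from both tables, not only from stratum_cp, so that the strata still present in the data line up with the entries of the size vector. A minimal sketch of that, assuming small hypothetical toy versions of CURRPOP and stratum_cp (the rows and sizes below are made up for illustration, not taken from the real data):

```r
# Hypothetical toy versions of the two tables from the question
CURRPOP <- data.frame(
  APPT_TYP_CD_LL = c("FA", "FP", "FP", "IN", "IN", "IN"),
  EMPL_TYPE      = c("S",  "S",  "S",  "H",  "H",  "H"),
  ROWID          = 1:6,
  stringsAsFactors = FALSE
)
stratum_cp <- data.frame(
  APPT_TYP_CD_LL = c("FA", "FP", "IN"),
  EMPL_TYPE      = c("S",  "S",  "H"),
  stratp         = c(0, 1, 2),
  stringsAsFactors = FALSE
)
# Drop strata whose requested sample size is zero
stratum_cp <- stratum_cp[0 < stratum_cp$stratp, ]
# Keep only population rows belonging to a remaining stratum, so the
# number of strata in the data matches length(stratum_cp$stratp)
CURRPOP <- merge(CURRPOP, stratum_cp[, c("APPT_TYP_CD_LL", "EMPL_TYPE")])
# Re-sort by the stratification variables in ascending order
CURRPOP <- CURRPOP[order(CURRPOP$APPT_TYP_CD_LL, CURRPOP$EMPL_TYPE), ]
```

merge() with no by argument joins on the column names the two tables share, here APPT_TYP_CD_LL and EMPL_TYPE, and drops the FA/S rows because that stratum no longer appears in stratum_cp.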
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
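As an illustration of the ?dput suggestion, here is a minimal sketch, assuming a small hypothetical data frame df in place of the real query result: dput() prints R code that rebuilds the object, and that text can be pasted into a post so others can run it without access to the database.

```r
# Hypothetical toy data standing in for a real query result
df <- data.frame(
  APPT_TYP_CD_LL = factor(c("FA", "FC", "FP", "IN")),
  EMPL_TYPE      = factor(c("S", "S", "S", "H")),
  ROWID          = 1:4
)
# dput() writes R code that recreates the object; capture.output()
# collects that code as text, ready to paste into an email
txt <- capture.output(dput(head(df, 3)))
cat(txt, sep = "\n")
# A reader can rebuild the same data frame from the pasted text
restored <- eval(parse(text = txt))
```

Note that head() keeps all factor levels, including ones that no longer occur in the first rows, and dput() preserves them too.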
On Fri, 26 Apr 2013, Lopez, Dan wrote:
Hello R Experts,
I kindly request your assistance in figuring out how to draw a stratified
random sample of 100 records, allocated to each stratum in proportion to its size.
Below is my R code showing what I did and the error I'm getting with
sampling::strata:
# FIRST I summarized the count of records by the two variables I want to use as strata
library(RODBC)
library(sqldf)
library(sampling)
# After establishing a connection (ch), I query the data, sort it by the strata
# APPT_TYP_CD_LL and EMPL_TYPE, and store it in a data frame
CURRPOP <- sqlQuery(ch, "SELECT APPT_TYP_CD_LL, EMPL_TYPE, ASOFDATE, EMPLID, NAME,
    DEPTID, JOBCODE, JOBTITLE, SAL_ADMIN_PLAN, RET_TYP_CD_LL
  FROM PS_EMPLOYEES_LL
  WHERE EMPL_STATUS NOT IN ('R', 'T')
  ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
# ROWID is a dummy ID I added and moved to just after the strata columns for later use
CURRPOP$ROWID <- seq_len(nrow(CURRPOP))
CURRPOP <- CURRPOP[, c(1:2, 11, 3:10)]
# My strata. stratp is how many I want to sample from each stratum. NOTE: THERE
# ARE SOME 0's, which just means I won't sample from that group.
stratum_cp <- sqldf("SELECT APPT_TYP_CD_LL, EMPL_TYPE, count(*) HC FROM CURRPOP
  GROUP BY APPT_TYP_CD_LL, EMPL_TYPE")
stratum_cp$stratp <- round(stratum_cp$HC / nrow(CURRPOP) * 100)
stratum_cp
   APPT_TYP_CD_LL EMPL_TYPE   HC stratp
1              FA         S    1      0
2              FC         S    5      0
3              FP         S  173      3
4              FR         H  170      3
5              FX         H   49      1
6              FX         S   57      1
7              IN         H 1589     25
8              IN         S 3987     63
9              IP         H    7      0
10             IP         S   53      1
11             SA         H    8      0
12             SE         S   43      1
13             SF         H   14      0
14             SF         S    1      0
15             SG         S   10      0
16             ST         H  107      2
17             ST         S    6      0
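The stratp column is a proportional allocation, n_h = round(100 * N_h / N). A quick check in R, using the HC counts exactly as printed above: with these counts the rounded sizes happen to sum to 100, and eight of the 17 strata are rounded down to 0. Rounding is not guaranteed to hit the target total in general, so it may be worth checking sum(stratp) each time.

```r
# HC counts per stratum, copied from the stratum_cp output above
HC <- c(1, 5, 173, 170, 49, 57, 1589, 3987, 7, 53, 8, 43, 14, 1, 10, 107, 6)
N <- sum(HC)                    # total population size
stratp <- round(HC / N * 100)   # proportional allocation of a sample of 100
sum(stratp)                     # total sample size after rounding
which(stratp == 0)              # strata that receive no units at all
```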
# THEN I attempted to use sampling::strata, following the instructions in that
# package, and got an error.
# I use stratum_cp$stratp for my sizes.
s <- strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp, method = "srswor")
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 1
traceback()
5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
collapse = ", "))
4: data.frame(..., check.names = FALSE)
3: cbind(deparse.level, ...)
2: cbind(r, i)
1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
method = "srswor")
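If the zero sizes are the cause, as suggested in the reply, one way to sidestep strata() altogether is to do the within-stratum sampling in base R. This is a swapped-in technique, not the sampling package's implementation: it assumes hypothetical toy tables pop and sizes and a hypothetical helper stratified_srswor(), and it simply skips any stratum whose requested size is 0.

```r
set.seed(1)
# Hypothetical toy population with two stratification variables
pop <- data.frame(
  APPT_TYP_CD_LL = rep(c("FP", "IN", "ST"), times = c(5, 10, 4)),
  EMPL_TYPE      = rep(c("S", "H", "H"), times = c(5, 10, 4)),
  ROWID          = 1:19,
  stringsAsFactors = FALSE
)
# Hypothetical requested sample size per stratum; 0 means "skip this stratum"
sizes <- data.frame(
  APPT_TYP_CD_LL = c("FP", "IN", "ST"),
  EMPL_TYPE      = c("S", "H", "H"),
  stratp         = c(2, 3, 0),
  stringsAsFactors = FALSE
)
# Hypothetical helper: simple random sampling without replacement in each stratum
stratified_srswor <- function(pop, sizes, keys, size_col = "stratp") {
  # One key string per row, e.g. "FP|S", for both tables
  pop_key  <- do.call(paste, c(pop[keys], sep = "|"))
  size_key <- do.call(paste, c(sizes[keys], sep = "|"))
  picked <- unlist(lapply(seq_len(nrow(sizes)), function(i) {
    n <- sizes[[size_col]][i]
    if (n == 0) return(integer(0))          # zero size: take nothing
    rows <- which(pop_key == size_key[i])   # row positions in stratum i
    rows[sample.int(length(rows), n)]       # draw n of them without replacement
  }))
  pop[picked, , drop = FALSE]
}
s <- stratified_srswor(pop, sizes, c("APPT_TYP_CD_LL", "EMPL_TYPE"))
table(s$APPT_TYP_CD_LL)
```

sample.int() is applied to index positions rather than calling sample(rows, n) directly, because sample() treats a single number as a range (sample(5, 1) draws from 1:5), which would misbehave for a stratum with exactly one row. Asking for more units than a stratum contains makes sample.int() stop with an error.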
# In lieu of a reproducible sample, here is some info regarding most of my data
dim(CURRPOP)
[1] 6280 11
#Cols w/ personal info have been removed in this output
str(CURRPOP[,c(1:3,7:11)])
'data.frame': 6280 obs. of 8 variables:
 $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
 $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
 $ ROWID         : int 1 2 3 4 5 6 7 8 9 10 ...
 $ DEPTID        : int 9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
 $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
 $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
 $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
 $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...
Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
Strategic Human Resources Management
wf-analytics-metr...@lists.llnl.gov
(925) 422-0814
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k