Re: [R] Stratified Random Sampling Proportional to Size

2013-04-29 Thread William Dunlap
This problem in sampling::strata() comes from calling cbind on a zero-row
data.frame with a scalar number.

   > library(sampling)
   > strata(mtcars[,c("mpg","hp","gear")], strat="gear", size=c(5,5,0))
  Error in data.frame(..., check.names = FALSE) :
    arguments imply differing number of rows: 0, 1
  In addition: Warning message:
  In strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,  :
    the method is not specified; by default, the method is srswor
   > traceback()
  5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
         collapse = ", "))
  4: data.frame(..., check.names = FALSE)
  3: cbind(deparse.level, ...)
  2: cbind(r, i)
  1: strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,
         5, 0))

Changing that cbind call from cbind(r, i) to
cbind(r, rep(i, length.out=nrow(r))) would fix it up.
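
A minimal sketch of why that change helps, using only base R. The names r and i mirror the traceback; the mtcars subsets, the r_empty/r_full variables and the "stratum" column name are assumptions made for illustration and are not taken from the sampling package source.

```r
# Stand-ins for the per-stratum result inside strata(): `r` holds the
# sampled rows and `i` is the stratum number (names from the traceback).
r_empty <- mtcars[0, c("mpg", "hp")]    # stratum with requested size 0
r_full  <- mtcars[1:5, c("mpg", "hp")]  # stratum with requested size 5
i <- 3

# Original form: capture the condition message instead of stopping.
res <- tryCatch(cbind(r_empty, i), error = function(e) conditionMessage(e))

# Suggested form: recycle i to the number of rows, which may be zero.
fixed_empty <- cbind(r_empty, stratum = rep(i, length.out = nrow(r_empty)))
fixed_full  <- cbind(r_full,  stratum = rep(i, length.out = nrow(r_full)))

dim(fixed_empty)
dim(fixed_full)
```

With the recycled vector the zero-row case produces a zero-row data.frame with one extra column, and the non-empty case behaves as before.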

cbind is not entirely consistent in what it does with a 0-row rectangular
input and a scalar.

With a matrix you get a 0-row result and a warning
   > m <- matrix(numeric(), nrow=0, ncol=3, dimnames=list(NULL,paste0("Col",1:3)))
   > str(cbind(m, 666))
    num[0 , 1:4] 
    - attr(*, "dimnames")=List of 2
     ..$ : NULL
     ..$ : chr [1:4] "Col1" "Col2" "Col3" ""
   Warning message:
   In cbind(m, 666) :
     number of rows of result is not a multiple of vector length (arg 2)

With a data.frame you get an error
   > str(cbind(data.frame(m), 666))
   Error in data.frame(..., check.names = FALSE) : 
     arguments imply differing number of rows: 0, 1
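
One way to sidestep the difference, sketched under the assumption that a constant column is all that is needed: recycle the value to nrow(x) before binding, so both kinds of input take the same path. The helper name cbind_const is hypothetical and does not exist in base R or in the sampling package.

```r
# Hypothetical helper: append a constant column, recycling the value to
# the number of rows of x so zero-row inputs are handled uniformly.
cbind_const <- function(x, value) {
  cbind(x, rep(value, length.out = nrow(x)))
}

m <- matrix(numeric(), nrow = 0, ncol = 3,
            dimnames = list(NULL, paste0("Col", 1:3)))

mat_res <- cbind_const(m, 666)              # zero-row matrix input
df_res  <- cbind_const(data.frame(m), 666)  # zero-row data.frame input

dim(mat_res)
dim(df_res)
```

Both calls return a zero-row object with four columns, because the recycled vector has length zero and therefore matches the row count of either input.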

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] Stratified Random Sampling Proportional to Size

2013-04-29 Thread Lopez, Dan
Hi Jeff,
a & b) points taken. Thanks for the reference too.
c) Taking the zeros out did the trick.

Dan


Re: [R] Stratified Random Sampling Proportional to Size

2013-04-28 Thread Jeff Newmiller

a) Please post plain text

b) Please make reproducible examples (e.g. telling us how you accessed a 
database that we have no access to is not helpful). See ?head, ?dput and 
[1]


c) I don't know anything about the sampling package or the strata 
function, but I would recommend eliminating the rows that have zeros from 
the input data. E.g.:


stratum_cp <- stratum_cp[ 0<stratum_cp$stratp, ]

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
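
As a rough illustration of the idea in (c), here is a base-R sketch that leaves out strata with a requested size of zero before drawing. It deliberately does not call sampling::strata(); the mtcars data, the sizes vector, and the names pop, sampled_strata, picked and smp are all assumptions chosen to make the example self-contained.

```r
set.seed(42)  # arbitrary seed so the draw is repeatable

# Stand-in population: mtcars with gear as the only stratification variable.
pop <- mtcars[order(mtcars$gear), ]

# Requested sample size per stratum; gear 5 is deliberately zero.
sizes <- c("3" = 5, "4" = 5, "5" = 0)

# Keep only the strata that are actually sampled.
sampled_strata <- names(sizes)[sizes > 0]

# Simple random sampling without replacement inside each remaining stratum.
picked <- unlist(lapply(sampled_strata, function(g) {
  rows <- which(pop$gear == as.numeric(g))
  rows[sample.int(length(rows), size = sizes[[g]])]
}))

smp <- pop[picked, ]
table(smp$gear)
```

The zero-size stratum never reaches the sampling step, so no zero-row piece has to be combined with the rest.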


On Fri, 26 Apr 2013, Lopez, Dan wrote:


Hello R Experts,

I kindly request your assistance on figuring out how to get a stratified 
random sampling proportional to 100.


Below is my r code showing what I did and the error I'm getting with 
sampling::strata


# FIRST I summarized count of records by the two variables I want to use
# as strata

library(RODBC)
library(sqldf)
library(sampling)
# After establishing connection I query the data and sort it by strata
# APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
CURRPOP<-sqlQuery(ch,"SELECT APPT_TYP_CD_LL,
EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,RET_TYP_CD_LL FROM
PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T') ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
# ROWID is a dummy ID I added and repositioned after the strat columns for
# later use
CURRPOP$ROWID<-seq(nrow(CURRPOP))
CURRPOP<-CURRPOP[,c(1:2,11,3:10)]

# My strata.  Stratp is how many I want to sample from each stratum. NOTE THERE
# ARE SOME 0's which just means I won't sample from that group.
stratum_cp<-sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM CURRPOP GROUP
BY APPT_TYP_CD_LL,EMPL_TYPE")
stratum_cp$stratp<-round(stratum_cp$HC/nrow(CURRPOP)*100)


> stratum_cp
   APPT_TYP_CD_LL EMPL_TYPE   HC stratp
1              FA         S    1      0
2              FC         S    5      0
3              FP         S  173      3
4              FR         H  170      3
5              FX         H   49      1
6              FX         S   57      1
7              IN         H 1589     25
8              IN         S 3987     63
9              IP         H    7      0
10             IP         S   53      1
11             SA         H    8      0
12             SE         S   43      1
13             SF         H   14      0
14             SF         S    1      0
15             SG         S   10      0
16             ST         H  107      2
17             ST         S    6      0
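
For anyone without access to the database, the counts in the table are enough to rebuild the allocation. This is only a sketch: the values are typed in from the table above, the column names follow the post, and the helper variable nonzero is an assumption added for illustration.

```r
# Stratum counts copied from the printed table.
stratum_cp <- data.frame(
  APPT_TYP_CD_LL = c("FA", "FC", "FP", "FR", "FX", "FX", "IN", "IN", "IP",
                     "IP", "SA", "SE", "SF", "SF", "SG", "ST", "ST"),
  EMPL_TYPE      = c("S", "S", "S", "H", "H", "S", "H", "S", "H",
                     "S", "H", "S", "H", "S", "S", "H", "S"),
  HC             = c(1, 5, 173, 170, 49, 57, 1589, 3987, 7,
                     53, 8, 43, 14, 1, 10, 107, 6)
)

# Proportional allocation of a sample of 100, rounded to whole records.
# sum(HC) plays the role of nrow(CURRPOP) in the original code.
stratum_cp$stratp <- round(stratum_cp$HC / sum(stratum_cp$HC) * 100)

# Strata that would actually be sampled.
nonzero <- stratum_cp[stratum_cp$stratp > 0, ]

sum(stratum_cp$HC)
sum(stratum_cp$stratp)
nrow(nonzero)
```

Small strata round down to zero here, which is where the zero sizes passed to strata() come from.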

# THEN I attempted to use sampling::strata using the instructions in that
# package and got an error

# I use stratum_cp$stratp for my sizes.

> s<-strata(CURRPOP,c("APPT_TYP_CD_LL","EMPL_TYPE"),size=stratum_cp$stratp,method="srswor")
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 1
> traceback()
5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
       collapse = ", "))
4: data.frame(..., check.names = FALSE)
3: cbind(deparse.level, ...)
2: cbind(r, i)
1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
       method = "srswor")



# In lieu of a reproducible sample here is some info regarding most of my data
> dim(CURRPOP)
[1] 6280   11
# Cols w/ personal info have been removed in this output
> str(CURRPOP[,c(1:3,7:11)])
'data.frame':  6280 obs. of  8 variables:
 $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
 $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
 $ ROWID         : int  1 2 3 4 5 6 7 8 9 10 ...
 $ DEPTID        : int  9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
 $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
 $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
 $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
 $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
Strategic Human Resources Management
wf-analytics-metr...@lists.llnl.gov
(925) 422-0814


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:jdnew...@dcn.davis.ca.us          Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Re: [R] Stratified Random Sampling Proportional to Size

2013-04-28 Thread Thomas Lumley
It looks as though you can't sample zero observations from a stratum.  If
you take the example on the help page and change one of the sample sizes to
zero you get exactly the same error.

From the fact that there isn't a more explicit error message, I would guess
that the author just never considered the possibility that someone would
have a population stratum and not sample from it.

-thomas

