On Wed, 7 Apr 2010, ONKELINX, Thierry wrote:

Dear all,

We are analysing some survey data and we are not sure if we are using
the correct syntax for our design.

The population of interest is a set of 4416 polygons with different
sizes ranging from 0.003 to 45.6 ha, 7460 ha in total. Each polygon has
a binary attribute (presence/absence) and we want to estimate the
probability of presence in the population.

We used sampling with replacement weighted by the area of the polygon.
The population was stratified using 2 variables: block and type. Each of
the 14 blocks is a 20 by 50 km geographical region. Type is a two level
factor. Not every level is present in each block. Each block has a
Status attribute with two levels: medium (9 blocks) or good (5 blocks).
Besides the overall ratio, we would like the estimate the ratio per
Status.
The samplesize per stratum was calculated with epi.stratasize() from the
epiR package. The population size in the 21 strata ranges from 1 to
1158. The sample size ranges from 0 in the blocks with very few polygons
(<20), 1 in blocks with a low number of polygon (20 - 50) and up to 25
polygons in the largest strata.

That sounds strange.  If you have a stratified sample and have set the sample 
size in some strata to be zero, you cannot possibly learn anything about those 
strata and so you can't get unbiased population estimates.   In order to get 
unbiased estimates and valid standard errors you need at least two samples per 
stratum.

You're going to have to combine some of the strata so that each stratum has at 
least two observations.  Since your design only makes sense if you assume the 
small, unsampled, strata are similar to some of the larger strata, it should be 
possible for you to combine them.


Does the syntax below represents the data structure above? Any comments
are welcome.

library(survey)
svydesign(
        id = ~ 1, #no clustering
        weights = ~ Area, #weighted by the area of the polygon
        strata = ~ Status + Block + Type,
        nest = TRUE
)

You want strata = ~interaction(Block,Type,drop=TRUE), which specifies a single 
stage of sampling in which the strata are combinations of Block and Type.  The 
fact that you need drop=TRUE is a bug, which I will fix.

# Is Area a correct weighting factor? Or should we use the area divided
by the sum of the total area (per stratum?)

It's not clear to me from your description whether the probability of sampling 
a particular region is proportional to its Area or inversely proportional to 
its Area.  If the probability is proportional to Area, the weight would be 
1/Area

 svydesign(
        id = ~ 1, #no clustering
        weights = ~ I(1/Area), #weighted by the area of the polygon
        strata = ~ interaction(Block, Type,drop=TRUE),
        nest = TRUE
 )


# The code above runs. But when we omit "Status" from the strata, then
we get an error: "a stratum has only 1 PSU". Shouldn't we get the same
error with the code above?

#with finity population correction
svydesign(
        id = ~ 1, #no clustering
        weights = ~ Area, #weighted by the area of the polygon
        strata = ~ Status + Block + Type,
        fpc ~ nStatus + nBlock + nType,
        nest = TRUE
)
#We are not sure what to use for nStatus, nBlock and nType. Is it the
number of levels of that stratum (nStatus = 2)? The number of levels in
the stratum below (nStatus = length(unique(Block)) per level of Status,
nType = number of polygons per Status:Block:Type)? The total number of
polygons in that stratum?

This is easier when you get the right strata.  There should be a single fpc 
variable, which should be equal to the number of polygons in the population for 
that stratum.


To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

Indeed.


     -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to