Depends on what you want to do with the weights once you've calculated
them.  For the most obvious application I can think of, one would have
an attribute for each stratum (e.g., the proportion P of persons in the
stratum who have a characteristic of interest, like high blood pressure)
and one would desire to estimate the proportion of that characteristic
in the population at large.  For that desideratum the weights ought to
be (or be proportional to) the frequency with which each stratum is
found in the population at large.  The overall proportion of HBP in the
population at large would then be calculated by
 SUM( P_i * w_i ) / SUM (w_i )
 where  P_i  is the proportion of persons in stratum i who have HBP
 and  w_i  is the weight assigned to that stratum.
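
A minimal sketch of that calculation in Python may make it concrete. The
stratum proportions P_i below are made-up numbers for illustration only
(they are not from the survey), and the weights are simply the sample
percentages quoted further down in this message; the function name is
likewise just a label chosen for the example.

```python
# Hypothetical HBP proportions per age stratum (illustrative, not survey data).
p = [0.05, 0.08, 0.15, 0.25, 0.35, 0.45]
# Stratum weights: here, the sample percentages reported in the question.
w = [32.89, 24.25, 18.30, 12.28, 6.90, 5.37]


def weighted_proportion(p, w):
    """Return SUM(P_i * w_i) / SUM(w_i)."""
    if len(p) != len(w):
        raise ValueError("p and w must have the same length")
    total_w = sum(w)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(pi * wi for pi, wi in zip(p, w)) / total_w


overall = weighted_proportion(p, w)
print(round(overall, 4))
```

Because the sum of the weights appears in the denominator, it makes no
difference whether the weights are given as percentages, proportions, or
raw population counts.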

For your data, this leads at once to the question, Do you know the
proportions of each age stratum in the population at large?  (I note in
passing that "the population at large" does not include all persons in
the country or region of interest to your inquiry:  you evidently have
no information on persons younger than 15.  This doesn't strike me as
unreasonable -- I merely point it out as a characteristic.)

If you have such information (from a census, or some other previous
statistical work), use those proportions (or equivalently those
population sizes) as your weights w_i.

If you don't have information of that kind, then I ask whether you
actually used a "stratified sampling design".  The fact that your data
don't have approximately equal proportions of each stratum leads me to
suspect that you actually carried out something closer to a simple
random sampling procedure, and that the percentages you report (33%,
24%, 18%, 12%, 7%, 5%) may be the best estimates you currently have for
the proportions of those strata in the population.  (You call them
"sample sizes", but they must be sample percentages -- they total 100%
within rounding error, and sample sizes would be integers.)

In which case these would be the weights you want:

> 15 - 24: 32.89
> 25 - 34: 24.25
> 35 - 44: 18.30
> 45 - 54: 12.28
> 55 - 64:  6.90
>  >= 65:   5.37

OTOH my guesses as to your intent and procedure may be wholly incorrect.


On Sun, 15 Feb 2004, Raoul Kamadjeu wrote in part (edited):

> I'm a Medical doctor, Epidemiologist, actually involved in the
> analysis of a baseline survey.
> The survey used a multilevel stratified sampling design, with the
> province as PSU and Age group as strata
> How do I weight my sample in the estimation of proportion taking into
> account the stratified sampling procedure?
> This is the contribution (percentages) of each age group to the total
> sample size:
>
> 15 - 24: 32.89
> 25 - 34: 24.25
> 35 - 44: 18.30
> 45 - 54: 12.28
> 55 - 64: 6.90
> >= 65: 5.37

> As you notice, the age groups are quite imbalanced.

Yes.  That's why I think these may actually be the prevalences of these
strata in the population, or at any rate decent estimates of those
prevalences.

> It looks necessary to weight my data to estimate a proportion (like
> prevalence of HBP).

Perhaps.  Notice that a weighted average is still an average.  If the
quantities being averaged (what I've called the P_i above) are not very
different from stratum to stratum, a weighted average will not be very
different from an equally weighted (sometimes called, illogically,
"unweighted") average.
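
To illustrate with a rough Python sketch: both sets of stratum
proportions below are invented for the purpose (they are not survey
results), and the weights are the sample percentages from the question.

```python
# Weights: the sample percentages reported in the question.
w = [32.89, 24.25, 18.30, 12.28, 6.90, 5.37]

# Two hypothetical sets of stratum proportions P_i (illustrative only).
p_similar = [0.20, 0.21, 0.19, 0.20, 0.22, 0.21]   # nearly equal across strata
p_varied = [0.05, 0.08, 0.15, 0.25, 0.35, 0.45]    # rising steeply with age


def weighted_mean(p, w):
    """SUM(P_i * w_i) / SUM(w_i)."""
    return sum(pi * wi for pi, wi in zip(p, w)) / sum(w)


def equal_mean(p):
    """Equally weighted average of the P_i."""
    return sum(p) / len(p)


# Absolute gap between the weighted and equally weighted averages.
gap_similar = abs(weighted_mean(p_similar, w) - equal_mean(p_similar))
gap_varied = abs(weighted_mean(p_varied, w) - equal_mean(p_varied))
print(gap_similar, gap_varied)
```

With the nearly equal P_i the two averages differ by well under one
percentage point; with the steeply varying P_i they differ by several.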

> How do I attribute weight (pweight) for survey analysis?

See my remarks above.

> Is it the inverse of the probability of each age group to be selected?

Normally it would be either (1) that probability in the population at
large, if you know that from an independent and reliable source;  or
(2) that probability as estimated from your sample, which might well be
the percentages you observed and reported above.  (In case you had
deliberately under-sampled some strata and over-sampled others, e.g. in
order to obtain approximately equal numbers from each stratum in your
sample, which would be desirable in some circumstances for some
purposes, you would then apply corrections to the observed
percentages.)
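
One common form such a correction takes, sketched here in Python under
assumptions of my own: the population and sample counts are hypothetical,
the sample is taken to have been drawn in equal numbers from each stratum,
and each record is weighted by the inverse of its stratum's sampling
fraction.  I do not claim this is what was done in your survey.

```python
# Hypothetical counts per stratum (illustrative only, not survey data).
population = [5000, 3000, 2000]   # N_h: persons in the population
sample = [100, 100, 100]          # n_h: persons sampled (deliberately equal)
cases = [10, 20, 40]              # sampled persons with the characteristic

# Per-record weight: inverse of the sampling fraction n_h / N_h.
record_weight = [N / n for N, n in zip(population, sample)]

# Record-level weighted estimate of the overall proportion.
weighted_cases = sum(c * rw for c, rw in zip(cases, record_weight))
weighted_total = sum(n * rw for n, rw in zip(sample, record_weight))
estimate = weighted_cases / weighted_total

# Equivalent stratum-level form: SUM(P_i * w_i) / SUM(w_i) with w_i = N_h.
p = [c / n for c, n in zip(cases, sample)]
stratum_estimate = sum(pi * N for pi, N in zip(p, population)) / sum(population)

# Naive estimate that ignores the unequal sampling fractions.
naive = sum(cases) / sum(sample)
print(estimate, stratum_estimate, naive)
```

The record-level and stratum-level calculations agree, which is the sense
in which weighting strata by their population frequencies and weighting
records by inverse sampling fractions come to the same thing.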

> That is what I did:  I created a variable called agweight.  To each age
> group I attributed a weight that is equal to the inverse of the
> probability of a record in that age group to be sampled.  I obtained
> something like this?
>
> 15 - 24: 0.67
> 25 - 34: 0.76
> 35 - 44: 0.82
> 45 - 54: 0.88
> 55 - 64: 0.93
> >=65:    0.95

Not "the inverse" but "the complement":  these proposed weights are in
fact (1 - p) for each stratum, where p = the relative frequency (or
proportion) with which that stratum is observed in your overall sample,
rounded to two decimal places.
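
The arithmetic is easy to check.  The short Python sketch below uses only
the percentages and weights posted in the question; exact decimal
arithmetic is used so that the rounding to two places is not disturbed by
binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sample percentages and proposed weights, as posted in the question.
pct = ["32.89", "24.25", "18.30", "12.28", "6.90", "5.37"]
posted = ["0.67", "0.76", "0.82", "0.88", "0.93", "0.95"]

TWO_PLACES = Decimal("0.01")


def round2(x):
    """Round a Decimal to two places, halves rounded up."""
    return x.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


# Complement: 1 - p, where p is the sample proportion.
complement = [round2(1 - Decimal(x) / 100) for x in pct]
# Inverse: 1 / p, for comparison.
inverse = [round2(100 / Decimal(x)) for x in pct]

print(complement == [Decimal(v) for v in posted])
print(inverse)
```

The complements reproduce the posted values exactly, whereas true
inverses 1/p would all be greater than 1.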

> Does this procedure sound right?

Not for what I would want to be doing with data.  But I may have
misunderstood your intentions.

> Do I set svyset pweight to the newly created variable agweight?

This looks like a technical question about your statistical package,
which the "Subject:" line identifies as Stata 6.  Sorry, I can't help
you with that.

Good luck!    -- Don Burrill.
 ------------------------------------------------------------
 Donald F. Burrill                              [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110      (603) 626-0816