Depends on what you want to do with the weights once you've calculated them. For the most obvious application I can think of, one would have an attribute for each stratum (e.g., the proportion P of persons in the stratum who have a characteristic of interest, like high blood pressure, HBP) and one would desire to estimate the proportion of that characteristic in the population at large. For that desideratum the weights ought to be (or be proportional to) the frequency with which each stratum is found in the population at large. The overall proportion of HBP in the population at large would then be calculated by

    SUM( P_i * w_i ) / SUM( w_i )

where P_i is the proportion of persons in stratum i who have HBP and w_i is the weight assigned to that stratum.
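To make the arithmetic concrete, here is a minimal sketch of that weighted proportion in Python. The weights are the age-group percentages quoted in this thread. The stratum proportions P_i are invented purely for illustration (no HBP figures were reported), and the function name is my own.

```python
# Weights w_i: the age-group percentages quoted in this thread.
weights = {
    "15-24": 32.89,
    "25-34": 24.25,
    "35-44": 18.30,
    "45-54": 12.28,
    "55-64": 6.90,
    ">=65": 5.37,
}

# HYPOTHETICAL stratum proportions P_i of persons with HBP.
# These numbers are made up for illustration; they are not survey results.
hbp_proportion = {
    "15-24": 0.05,
    "25-34": 0.10,
    "35-44": 0.18,
    "45-54": 0.27,
    "55-64": 0.38,
    ">=65": 0.50,
}


def weighted_proportion(p, w):
    """Return SUM(P_i * w_i) / SUM(w_i) over all strata."""
    if p.keys() != w.keys():
        raise ValueError("p and w must cover the same strata")
    total_weight = sum(w.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p[k] * w[k] for k in w) / total_weight


overall = weighted_proportion(hbp_proportion, weights)
print(f"Weighted overall proportion: {overall:.4f}")
```

Because the sum of the weights appears in the denominator, only the relative sizes of the w_i matter: percentages, proportions, or population counts all give the same answer.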
For your data, this leads at once to the question: do you know the proportions of each age stratum in the population at large? (I note in passing that "the population at large" does not include all persons in the country or region of interest to your inquiry: you evidently have no information on persons younger than 15. This doesn't strike me as unreasonable -- I merely point it out as a characteristic.)

If you have such information (from a census, or some other previous statistical work), use those proportions (or equivalently those population sizes) as your weights w_i.

If you don't have information of that kind, then I ask whether you actually used a "stratified sampling design". The fact that your data don't have approximately equal proportions of each stratum leads me to suspect that you actually carried out something closer to a simple random sampling procedure, and that the percentages you report (33%, 24%, 18%, 12%, 7%, 5%) may be the best estimates you currently have for the proportions of those strata in the population. (You call them "sample sizes", but they must be sample percentages -- they total 100% within rounding error, and sample sizes would be integers.) In which case these would be the weights you want:

>   15 - 24:  32.89
>   25 - 34:  24.25
>   35 - 44:  18.30
>   45 - 54:  12.28
>   55 - 64:   6.90
>   >= 65:     5.37

OTOH my guesses as to your intent and procedure may be wholly incorrect.

On Sun, 15 Feb 2004, Raoul Kamadjeu wrote in part (edited):

> I'm a Medical doctor, Epidemiologist, actually involved in the
> analysis of a baseline survey.
>
> The survey used a multilevel stratified sampling design, with the
> province as PSU and age group as strata.
>
> How do I weight my sample in the estimation of proportion taking into
> account the stratified sampling procedure?
> This is the contribution (percentages) of each age group to the total
> sample size:
>
>   15 - 24:  32.89
>   25 - 34:  24.25
>   35 - 44:  18.30
>   45 - 54:  12.28
>   55 - 64:   6.90
>   >= 65:     5.37
>
> As you notice, the age groups are quite imbalanced.

Yes. That's why I think these may actually be the prevalences of these strata in the population, or at any rate decent estimates of those prevalences.

> It looks necessary to weight my data to estimate a proportion (like
> prevalence of HBP).

Perhaps. Notice that a weighted average is still an average. If the quantities being averaged (what I've called the P_i above) are not very different from stratum to stratum, a weighted average will not be very different from an equally weighted (sometimes called, illogically, "unweighted") average.

> How do I attribute weight (pweight) for survey analysis?

See my remarks above.

> Is it the inverse of the probability of each age group to be selected?

Normally it would be either (1) that probability in the population at large, if you know that from an independent and reliable source; or (2) that probability as estimated from your sample, which might well be the percentages you observed and reported above. (In case you had deliberately under-sampled some strata and over-sampled others, e.g. in order to obtain approximately equal numbers from each stratum in your sample, which would be desirable in some circumstances for some purposes, you would then apply corrections to the observed percentages.)

> That is what I did: I created a variable called agweight. To each age
> group I attributed a weight that is equal to the inverse of the
> probability of a record in that age group to be sampled. I obtained
> something like this:
>
>   15 - 24:  0.67
>   25 - 34:  0.76
>   35 - 44:  0.82
>   45 - 54:  0.88
>   55 - 64:  0.93
>   >= 65:    0.95

Not "the inverse" but "the complement": these proposed weights are in fact (1 - p) for each stratum, where p = the relative frequency (or proportion) with which that stratum is observed in your overall sample, rounded to two decimal places.

> Does this procedure sound right?

Not for what I would want to be doing with the data. But I may have misunderstood your intentions.

> Do I set svyset pweight to the newly created variable agweight?

This looks like a technical question about your statistical package, which the "Subject:" line identifies as Stata 6. Sorry, I can't help you with that. Good luck!

 -- Don Burrill.
