Hello, I have drawn a cluster sample from an election dataset for which I already have the final results of a county-level election. I am trying to figure out the best sampling design for my data.

The structure of the dataset is:

1) polling stations (generally schools where people vote; a county has, for example, 15 polling stations)
2) inside each polling station there are voting units, where people actually vote (on average about 40 voting units per polling station)
3) for each voting unit I have the total votes by candidate (e.g., candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)

The initial sampling design is:

1) selection of 5 polling stations with PPS (based on number of voters)
2) selection of 10 voting units (SRS)

I am interested in estimating the proportion of votes by candidate (let's assume we have 3 candidates). My naive estimate would be:

votes for candidate 1 / all valid votes = proportion

e.g.
candidate 1 = 2132 / 10874 = .1961
candidate 2 = 5323 / 10874 = .4895
candidate 3 = 3419 / 10874 = .3144

In this case, the unit of analysis is voters (or votes).

If I specify the sampling design using the survey package in this way...

    design <- svydesign(id=~station + unit, fpc=~probstation + probunit, data=sample, pps="brewer")
    svyciprop(~I(candidate1/totalVotes), design)

... I am assuming that the unit of analysis is the voting unit, right? And that I am estimating an average among voting units? Should I expand my database to the individual level (voters), or do I just have to include a unit weight according to the number of voters per voting unit?

In other words, is there a way to estimate, for instance, "votes for candidate 1 / all valid votes = proportion" directly from the survey package, or do I have to expand the database to the person level (voters) and then estimate the proportion using svymean and the respective design?

I would appreciate any advice or help.

Sebastian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
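
To make the distinction in the question concrete, here is a minimal base-R sketch of the two quantities being contrasted: (a) a weighted average of unit-level proportions, where the voting unit is the unit of analysis, and (b) a ratio of weighted totals, which corresponds to "votes for candidate 1 / all valid votes". The toy data and the weight column w are assumptions made up for illustration; the column names follow the svydesign call in the post. In the survey package, svyratio(~candidate1, ~totalVotes, design) estimates a ratio of totals like (b) from unit-level rows and also reports a standard error, so it may be worth trying before expanding the data to voter level.

```r
# Hypothetical toy data: one row per sampled voting unit (values are made up).
units <- data.frame(
  station     = c(1, 1, 2, 2),
  candidate1  = c(322, 100, 50, 200),
  totalVotes  = c(533, 400, 300, 450),
  probstation = c(0.4, 0.4, 0.25, 0.25),  # assumed stage-1 (PPS) inclusion probability
  probunit    = c(0.25, 0.25, 0.2, 0.2)   # assumed stage-2 (SRS) inclusion probability
)

# Design weight: inverse of the overall inclusion probability of each voting unit
units$w <- 1 / (units$probstation * units$probunit)

# (a) Weighted mean of unit-level proportions: every voting unit counts
#     according to its design weight only, regardless of how many votes it holds
mean_of_props <- with(units, sum(w * candidate1 / totalVotes) / sum(w))

# (b) Ratio of estimated totals: estimated votes for candidate 1 divided by
#     estimated valid votes, so larger voting units contribute more
ratio_of_totals <- with(units, sum(w * candidate1) / sum(w * totalVotes))

print(c(mean_of_props = mean_of_props, ratio_of_totals = ratio_of_totals))
```

The two numbers differ whenever voting units have different numbers of valid votes, which is why the choice of estimator matters here. Only point estimates are shown; variance estimation under the two-stage design is left to the survey package.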