I've been trading e-mails with several of you and want you to know I really appreciate your time and intelligence. So far, the consensus seems to be to do nothing, since anything I could do would be no better than what I have. But...
How can I (or you, for that matter), as a reasonable person, ignore what is glaring evidence of bias in my (or your) data? What follows is a rather long description of the problem, so I apologize in advance. But it does show why it is difficult for me to ignore the bias from a practical rather than statistical point of view.

Let me be a bit more specific about the problem. We, and others like us who study volunteering on limited budgets, tend to use contract data-collection firms (we used Westat) whose record of data collection is less than sterling. The real response rates hover in the low 30s, and errors abound. On the other side of things is the Bureau of Labor Statistics, which uses the Census Bureau to collect volunteering data in a supplement to the Current Population Survey (CPS). They have a real response rate of at least 70%.

Additionally, there are differences in the methodologies. We use a random sample (one adult per household), while the CPS uses a cluster sample in which they interview every person in a "housing unit" aged 16 and up. Since a housing unit can contain more than one household, and a household can contain more than one family, their data have a high intracluster correlation. Additionally, we only take direct responses ("What did YOU do?") while they allow proxy responses ("What did your son do?"). Our sample size was 4,000 adults; theirs was around 120,000 aged 16+. In raw numbers, they show a volunteering rate of 27% and we show 44% -- a difference that has raised more than a few eyebrows in our field. Hence my project and my problem.

I can adjust their data to make it look like a random sample: eliminate proxy respondents, eliminate youth, and then take a random sample of one adult per household. Doing this yields a volunteering rate of around 32% (a sketch of this resampling appears below). But in all fairness, if I am going to adjust their cluster sample to look like a random sample -- to correct for the methodological noise -- then I also need to correct for the methodological noise in my own sample. Another way to phrase this is: how can I, a reasonable person, look at the voting bias in my data and not do something about it? (A few months ago another association did a volunteering study which showed something like a 70% volunteering rate, with 90% saying they "always vote." When I was asked to evaluate their research, I said it wasn't worth the paper it was written on.)

Therefore, rather than guessing at how to adjust for this bias, and rather than ignoring it, I'm looking for ideas on how to correct it.
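For concreteness, here are two small sketches in Python. Both are illustrative only: the column names in the first and the within-group volunteering rates in the second are hypothetical, not figures from either dataset.

First, the CPS-to-random-sample adjustment described above (drop proxy respondents, drop youth, keep one adult per housing unit), assuming "adult" means 18+ and assuming the CPS file is a pandas DataFrame with made-up columns 'hh_id', 'age', 'proxy', and 'volunteer':

    import pandas as pd

    # Toy stand-in for the CPS supplement file; hypothetical columns.
    cps = pd.DataFrame({
        "hh_id":     [1, 1, 1, 2, 2, 3],
        "age":       [44, 41, 16, 70, 68, 35],
        "proxy":     [0, 1, 0, 0, 0, 0],   # 1 = answered by someone else
        "volunteer": [1, 1, 0, 0, 1, 0],
    })

    # Drop proxy and under-18 responses, then draw one adult per
    # housing unit to mimic a one-adult-per-household RDD sample.
    direct_adults = cps[(cps["proxy"] == 0) & (cps["age"] >= 18)]
    one_per_hh = direct_adults.groupby("hh_id").sample(n=1, random_state=0)
    print(one_per_hh["volunteer"].mean())

Second, one candidate correction on our side: post-stratify on the voting question so self-reported voters carry 51% of the weight instead of 70%. The 55%/18% within-group volunteering rates are invented so that the raw 70/30 mix reproduces our 44%:

    # Two-cell post-stratification on the voting answer. 0.51 is the
    # known turnout; 0.55 and 0.18 are HYPOTHETICAL within-group
    # volunteering rates chosen so the raw mix matches the observed 44%.
    def adjusted_rate(p_vol_voter, p_vol_nonvoter, true_voter_share=0.51):
        """Volunteering rate after forcing the voter share to the
        known turnout."""
        return (true_voter_share * p_vol_voter
                + (1 - true_voter_share) * p_vol_nonvoter)

    raw = 0.70 * 0.55 + 0.30 * 0.18    # ~44%, the observed survey rate
    adj = adjusted_rate(0.55, 0.18)    # ~37% after reweighting
    print(f"raw {raw:.1%}, adjusted {adj:.1%}")

Note that the corrected number depends entirely on how differently self-reported voters and non-voters volunteer -- which is exactly the information a flat ratio adjustment throws away.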
Thanks for reading this far,
Chris

Chris Toppe, Ph.D.
Director, Philanthropic Studies
Independent Sector
[EMAIL PROTECTED]

Richard Ulrich <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> On 13 Apr 2004 06:32:25 -0700, [EMAIL PROTECTED] (Chris
> Toppe) wrote:
>
> > I have to explain my data before I can ask my question. I have survey
> > data on volunteering. The data were collected using an RDD
> > methodology. The data suffer from two problems -- non-response bias
> > (people opting out of the survey) and response bias (people giving
> > the socially acceptable answer). I can't tell the degree to which
> > either impacts my estimates, but I know they do. In addition to
> > answering questions about volunteering, the respondents were also
> > asked if they voted in the last presidential election (the data were
> > collected in the spring of 2001, not long after the election of
> > 2000). Seventy percent (70%) of the respondents said they voted,
> > which is much higher than the 51% who actually voted. I don't know
> > whether my higher voting rate is a non-response bias or a response
> > bias, just that it's too high. I also know that 44% said they
> > volunteered. With me so far?
> >
> > What I want to do is adjust the volunteering rate to correct for the
> > known bias. There is support in the literature for adjusting a sample
> > to known population parameters, something that is done frequently
> > when a sample is adjusted to fit parameters such as gender, age,
> > race, etc., but I can find nothing that talks about using an embedded
> > question proportion to adjust another proportion. In other words, I
> > want to adjust the sample so that 51% are voters, thereby gaining a
> > more accurate estimate of the percentage who are volunteers. Still
> > with me?
>
> I think you want to do some book-research. It seems to
> me that your 70%-claimed, 51% actual, may be about right,
> for the number who will *claim* to have voted in a much-
> discussed election.
>
> Does the group of 'voters' include most of the volunteers?
>
> Does this mean that the 44% will be inflated by a similar
> fraction? -- I don't know. That's why I think you want to know
> what the careful literature says, and that should be
> important in your conclusions.
>
> > I can do a simple ratio adjustment (51 is to 70 as X is to 44), but
> > that doesn't take into account the fact that some people are more
> > likely to be volunteers than are others. I've been struggling with
> > logistic regression as an approach to this, but without success.
> > Does anyone have any suggestions on how I can approach this?
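P.S. Writing the simple ratio rule out makes its hidden assumption visible. A short sketch, using only numbers already quoted in this thread:

    # The simple ratio adjustment: 51 is to 70 as X is to 44.
    ratio_adjusted = 0.44 * 0.51 / 0.70
    print(f"ratio-adjusted rate: {ratio_adjusted:.1%}")   # ~32.1%

    # Decompose the raw 44% as a 70/30 mix of the (unknown) voter and
    # non-voter volunteering rates v and n:
    #     0.44 = 0.70*v + 0.30*n
    # The ratio rule rescales the whole mix by 51/70:
    #     (51/70)*(0.70*v + 0.30*n) = 0.51*v + 0.219*n
    # Post-stratifying on the voting answer instead gives
    #     0.51*v + 0.49*n
    # The two agree only when n = 0, i.e. only if self-reported
    # non-voters never volunteer; otherwise the ratio rule understates
    # the corrected rate.

So the flat ratio is not a neutral default -- it is the special case of reweighting under a strong implicit assumption, which is one more reason to pursue the regression/reweighting route rather than the simple proportion.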
