On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
> On 08/03/2009 10:49 AM, hadley wickham wrote:
>>> More seriously : I don't think relative numbers of package downloads
>>> can be interpreted in any reasonable way, because reasons for
>>> package download have a very wide range from curiosity ("what's
>>> this ?"), fun (think "fortunes"...), to vital need (think lme4
>>> if/when a consensus on denominator DFs can be reached :-)...).
>>> What can you infer in good faith from such a mess ?
>>
>> So when we have messy data with measurement error, we should just
>> give up? Doesn't sound very statistical! ;)
>
> I think the situation is worse than messy. If a client comes in with
> data that doesn't address the question they're interested in, I think
> they are better served to be told that, than to be given an answer that
> is not actually valid. They should also be told how to design a study
> that actually does address their question.
>
> You (and others) have mentioned Google Analytics as a possible way to
> address the quality of data; that's helpful. But analyzing bad data
> will just give bad conclusions.
> Duncan Murdoch

The population of R users (which we would need to sample in order to
obtain good data) is probably more elusive than a fish population in
the ocean -- only partially visible at best, and with an unknown
proportion invisible.

At least in Fisheries research, there are long established capture
techniques (from trawling to netting to electro-fishing to ... ) which
can be deployed, for research purposes, in such a way as to potentially
reach all members of a target population, with at least a moderately
good approximation to random sampling. What have we for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R.
Each type is randomly configured, when R is first run, to be Active
or Inactive (probability of activation to be decided at the design
stage ... ). Type 1, if active, on a certain date generates an event
which brings it to the notice of R-Core (e.g. by clandestine email
or by inducing a bug report). Type 2 acts similarly on a later date.
If Type 2 acts, it carries with it information as to whether there
was a Type 1 action along with whether, apparently, the Type 1 action
"succeeded".

We then have, in effect, an analogue of the Mark-Recapture technique
of population estimation (along with the usual questions about equal
catchability and so forth).

However, since this sort of thing (which I am not proposing seriously,
only for the sake of argument) is undoubtedly unethical (and would
do R's reputation no good if it came to light), I tentatively conclude
that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.
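
For the sake of argument only, the arithmetic behind the Mark-Recapture
analogy above can be sketched as follows. This is a minimal illustration,
not a proposal: it assumes every installation is equally "catchable",
that the two cookie types activate independently, and that every report
arrives. The function names, the activation probabilities and the
simulated population size are all hypothetical. With n1 Type 1 reports,
n2 Type 2 reports, and m Type 2 reports that also record a Type 1 action,
the Lincoln-Petersen estimate is n1*n2/m, and Chapman's variant is a
common small-sample correction.

```python
import random


def lincoln_petersen(n1, n2, m):
    """Classic mark-recapture estimate: N is roughly n1 * n2 / m."""
    if m == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, which stays finite even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def simulate_reports(true_n, p1, p2, seed=0):
    """Simulate the two-cookie scheme for a hypothetical population.

    Each of true_n installations activates Type 1 with probability p1
    and, independently, Type 2 with probability p2. Returns (n1, n2, m):
    the Type 1 reports, the Type 2 reports, and the Type 2 reports that
    also carry evidence of a Type 1 action.
    """
    rng = random.Random(seed)
    n1 = n2 = m = 0
    for _ in range(true_n):
        type1 = rng.random() < p1
        type2 = rng.random() < p2
        n1 += type1
        n2 += type2
        m += type1 and type2
    return n1, n2, m


if __name__ == "__main__":
    # Illustrative values only; nothing here reflects real R usage.
    n1, n2, m = simulate_reports(true_n=50_000, p1=0.1, p2=0.1, seed=1)
    print("reports:", n1, n2, m)
    print("Lincoln-Petersen:", round(lincoln_petersen(n1, n2, m)))
    print("Chapman:", round(chapman(n1, n2, m)))
```

The "usual questions about equal catchability" show up directly here:
if some installations never report (firewalls, offline machines), or if
activation of the two types is correlated, m is biased and so is the
estimate.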

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 16:11:44
------------------------------ XFMail ------------------------------