> In revising my book Regression Modeling Strategies for a second edition, I
> am seeking a dataset for exemplifying multiple regression using least
> squares. Ideally the dataset would have 5-40 variables and 40-10000
> independent observations, and would generate significant interest for a wide
> variety of readers. For example, the topic could be political science,
> society, human suffering, sports, psychology, economics, entertainment,
> history, etc. The dataset needs to be publicly available.
I have a few datasets that might be of interest:

* Movie rankings from IMDb, https://github.com/hadley/data-movies/tree
* Prices of 50,000 round cut diamonds (included in ggplot2)
* Baby name popularity: the top 1000 names across the whole USA, 1880-2008, and the top 100 names per state, 1960-2008, https://github.com/hadley/data-baby-names/tree
* EPA fuel economy measurements for all cars tested in the US, https://github.com/hadley/data-fuel-economy/tree
* Many datasets about the US housing crisis (work in progress), https://github.com/hadley/data-housing-crisis
* 500,000 house sales in the Bay Area, https://github.com/hadley/sfhousing/tree

If any of these sound interesting, I can provide more details.

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.