Fine detective work, David. Now you can see the reasons for my frustration: a multiplicity of data sets, combined with non-existent documentation of the source of the data in journal articles (e.g. Kay 1986; Lunn and McNeil 1995).
Best,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
Date: Tuesday, March 24, 2009 10:54 pm
Subject: Re: [R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data
To: Rolf Turner <r.tur...@auckland.ac.nz>
Cc: R-help Forum <r-help@r-project.org>, Ravi Varadhan <rvarad...@jhmi.edu>

> On Mar 24, 2009, at 8:57 PM, Rolf Turner wrote:
>
> > On 25/03/2009, at 12:09 PM, Frank E Harrell Jr wrote:
> >
> > <snip>
> >
> >>> (2) Scrolling down to ``Byar and Green prostate cancer data''
> >>> appeared to get me to the right place. But I couldn't see any
> >>> signs of any ``R binary files''.
> >>
> >> Please look again. It's under the heading "R". Unfortunately I used
> >> .sav suffix for save() files in the old days.
> >
> > Ah-ha. Oh me of little faith. I have been hanging around (in
> > my current work environment) with too many SPSS users, and the
> > *.sav extension seems to be the standard for SPSS data files.
> > Whence my corrupted thinking.
> >
> >> The .xls file opened with no problem in OpenOffice; has 506 rows.
> >
> > Hmmm. When I opened it with Excel on the Mac I got a spread
> > sheet with 503 rows --- the first row being the column names,
> > so there were really 502 rows.
>
> The last "patnr" is "506" but there are only 502 lines of data. 471,
> 473, 475 and 488 are missing.
>
> And the CMU Statlib version for 2002 looks the same.
>
> The version at this site is missing more than 25 cases:
>
>
> Here are two other copies of the dataset the first of which appears
> to have those missing cases:
>
> This one has patient numbers:
>
>
> This one has a description of the fields and cites the one above but
> has not retained the patient numbers and has apparently only kept the
> 475 cases with complete data.
>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.