Alzola and Harrell discuss some of these issues in "An introduction to S and the Hmisc and Design Libraries".
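As a rough illustration of the idea of keeping a label attached to the variable itself, here is a base-R sketch. It is not the Hmisc implementation: set_label() and make_codebook() are hypothetical helper names, and patients is made-up example data. The label is stored as an attribute on the column, and the codebook is derived from the data frame on demand:

```r
# Base-R sketch: store each variable's label as an attribute on the column.
# set_label() and make_codebook() are hypothetical helpers, not Hmisc functions.

set_label <- function(data, variable, label) {
  stopifnot(is.data.frame(data), variable %in% names(data))
  attr(data[[variable]], "label") <- label
  data
}

make_codebook <- function(data) {
  stopifnot(is.data.frame(data))
  # Return the stored label, or "" when the column has none.
  get_label <- function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) "" else as.character(lab)
  }
  data.frame(
    variable = names(data),
    # Collapse multi-element class vectors (e.g. ordered factors) to one string.
    class    = vapply(data, function(x) paste(class(x), collapse = "/"), character(1)),
    type     = vapply(data, typeof, character(1)),
    label    = vapply(data, get_label, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example data.
patients <- data.frame(id = 1:3, bmi2 = c(22.5, 31.0, NA))
patients <- set_label(patients, "bmi2",
                      "BMI hand-input by clinic personnel, must be checked")

codebook <- make_codebook(patients)

# Dropping a column and rebuilding the codebook drops its row as well.
patients$bmi2 <- NULL
codebook_after <- make_codebook(patients)
```

Because the codebook is rebuilt from the data frame each time, deleting or adding a column is reflected automatically. One caveat: attributes on plain vectors are generally lost when rows are subsetted with `[`, so a fuller solution would need a class with its own subsetting method.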
-ista

On Wed, Oct 28, 2009 at 1:27 PM, Jacob Wegelin <jacobwege...@fastmail.fm> wrote:
>
> Often it is useful to keep a "codebook" to document the contents of a
> dataset. (By "dataset" I mean a rectangular structure such as a dataframe.)
>
> The codebook has as many rows as the dataset has columns (variables,
> fields). The columns (fields) of the codebook may include:
>
>   • variable name
>   • type (character, factor, integer, etc.)
>   • variable label (e.g., a variable called "bmi2" might be labeled
>     "BMI hand-input by clinic personnel, must be checked")
>   • permissible values
>   • which values indicate missing (and potentially different kinds of
>     missing)
>
> Some statistics software (e.g., SPSS and Stata) provides at least a subset
> of this kind of information automatically in a convenient form. For
> instance, in Stata one can define a "label" for a variable and it is
> thenceforth linked to the variable. In output from certain modeling and
> graphics functions, Stata by default uses the label rather than the
> variable name.
>
> Furthermore: In Stata, if "myvariable" is labeled numeric (in R lingo, a
> factor), and I type
>
>    codebook myvariable
>
> then Stata tells me, among other things, the "levels" of myvariable.
>
> Does a tool of this sort exist in R?
>
> The prompt() function is related to this, but prompt(someDataFrame) creates
> a text file on disk. The text file is associated with, but not
> unambiguously linked to, someDataFrame.
>
> The epicalc function codebook() provides a summary of a dataframe similar
> to that created by summary() but easier to read. But this is not a way to
> define and keep track of labels that are linked to variables.
>
> To link a dataframe to its codebook, one could do the following "by hand":
> Create a list, say, "somedata", where somedata$DATA is a dataframe that
> contains the data, and somedata$VARIABLE is also a dataframe, but serves as
> the codebook. For instance, the following function creates a template that
> one could subsequently edit to insert variable labels and turn into
> somedata$VARIABLE.
>
> fnJunk <- function( THESEDATA ) {
>    # From a dataframe, make the start of a codebook.
>    if(!is.data.frame(THESEDATA)) stop("!is.data.frame(THESEDATA)")
>    data.frame(
>       Variable=names(THESEDATA)
>       , class=sapply(THESEDATA, class)
>       , type=sapply(THESEDATA, typeof)
>       , label=""
>       , comment=""
>    )
> }
>
> But the following automatic behavior would be nice:
>
>   • We should be able to treat somedata exactly as we treat a dataframe,
>     so that the fact that it possesses a "codebook" is merely an added
>     benefit, not an interference with the usual tasks.
>   • If we delete a column of somedata$DATA, the associated row of
>     somedata$VARIABLE should be automatically deleted.
>   • If we add a column to somedata$DATA, the associated row should be
>     inserted in somedata$VARIABLE, and some of its fields, such as
>     variable name and type, automatically populated.
>
> It could get fancier. For instance:
>
>   • If we try to add a value to a field in somedata$DATA which is not
>     permitted by the "permissible values" listed for this field in
>     somedata$VARIABLE, we get an error.
>
> Has anyone already thought this through, maybe defined a class and
> associated methods?
>
> Thanks
>
> Jacob A. Wegelin
> Assistant Professor
> Department of Biostatistics
> Virginia Commonwealth University
> 730 East Broad Street Room 3006
> P. O. Box 980032
> Richmond VA 23298-0032
> U.S.A.
> E-mail: jwege...@vcu.edu
> URL: http://www.people.vcu.edu/~jwegelin
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org