On 5/16/08, Jim Price <[EMAIL PROTECTED]> wrote:
>
> I have stumbled across something in the Lattice package that is vexing me.
> Consider the code below:
> __________________________________________________________
>
> library(lattice)
>
> myData <- expand.grid(sub = factor(1:16), time = 1:10)
>
> myData$observed <- rnorm(nrow(myData))
> myData$fitted <- with(myData, ave(observed, sub, FUN = mean))
> myData$event.time <- with(myData, ave(observed, sub, FUN = function(.x) 10 * runif(1)))
>
> myData <- myData[order(myData$sub, myData$time),]
>
> # This version works...
> xyplot(
>   fitted + observed ~ time | sub,
>   data = myData,
>   subscripts = TRUE,
>   panel = function(..., groups = groups, subscripts = subscripts)
>   {
>     panel.xyplot(..., groups = groups, subscripts = subscripts)
>
>     event.time <- unique(myData$event.time[subscripts])
>     panel.abline(v = event.time, lty = 2, col = 'green')
>   },
>   type = c('l','p'),
>   distribute.type = TRUE,
>   as.table = TRUE
> )
>
> # ...but when you add the subset parameter it produces multiple index lines per subject
> xyplot(
>   fitted + observed ~ time | sub,
>   data = myData,
>   subset = sub %in% sample(unique(sub), 9),
>   subscripts = TRUE,
>   panel = function(..., groups = groups, subscripts = subscripts)
>   {
>     panel.xyplot(..., groups = groups, subscripts = subscripts)
>
>     event.time <- unique(myData$event.time[subscripts])
>     # print(event.time)
>     panel.abline(v = event.time, lty = 2, col = 'green')
>   },
>   type = c('l','p'),
>   distribute.type = TRUE,
>   as.table = TRUE
> )
>
> ___________________________________________________________________
>
> The (commented out) print statement I think indicates that there is a data
> reordering going on for the second example, that is causing the multiple
> index lines issue. Is there a neat solution to get the correct index lines
> per subject, or do I need a workaround? Or am I missing something
> fundamental in the code above that is causing issues?
The short answer is that the value of 'subscripts' in the panel function has
an unusual interpretation when you use the extended formula notation (as in
'fitted + observed ~ time'). You would get better insight if you also printed
out the 'subscripts' variable in each panel function.

Basically, when you use the extended notation, lattice creates an artificial
'groups' variable that is longer than the original data frame (twice as long
here: one copy for 'fitted' and one for 'observed'). Consequently, the
'subscripts' variable must also contain indices that exceed the number of rows
of the original data frame. This is not a problem in your first call, because
myData$event.time[subscripts] returns NA for the out-of-bounds indices, and
the remaining indices still point at the right rows.

When you use 'subset', things get more complicated. This artificial 'groups'
variable is computed after the subsetting, so 'subscripts' no longer refer to
the rows of the original data frame at all, not even partially. As a result,
myData$event.time[subscripts] picks up event times that belong to other
subjects, which is why you see several lines per panel.

There is no good solution to this; it is the price you pay for the usefulness
of the extended formula notation. Your reasonable options are

(1) subset your data beforehand, e.g.,

    myDataSub <- subset(myData, sub %in% sample(unique(sub), 9))

and then use myDataSub in the call, including inside the panel function
(i.e., myDataSub$event.time instead of myData$event.time);

(2) not use the extended formula (this is conceptually cleaner, I think), e.g.,

    xyplot(observed ~ time | sub, data = myData,
           subset = sub %in% sample(unique(sub), 9),
           fitted = myData$fitted,
           event.time = myData$event.time,
           panel = function(x, y, fitted, event.time, subscripts, ...) {
               panel.lines(x, fitted[subscripts], col = "black")
               panel.xyplot(x, y, ...)
               event.time <- unique(event.time[subscripts])
               panel.abline(v = event.time, lty = 2, col = 'green')
           },
           as.table = TRUE)

-Deepayan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
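The indexing behaviour described in the reply can be reproduced with a small base-R emulation, so that the panel indices are visible without drawing anything. The emulation rests on an assumed, simplified model: the extended formula is treated as stacking 'fitted' and 'observed' into one data set of 2 * nrow rows, with an artificial grouping factor, and 'subset' is applied before the stacking. This model is not lattice's internal code. The helpers stack_y(), panel_subscripts() and drop_na() are hypothetical names introduced for this illustration. For reproducibility, the sketch also makes three changes to the question's code: it uses a fixed seed, it keeps subjects 1 to 9 instead of calling sample(), and it omits the reordering step.

```r
## Hypothetical base-R emulation (NOT lattice's internal code) of how the
## extended formula 'fitted + observed ~ time | sub' affects 'subscripts'.
set.seed(42)  # fixed seed instead of the question's unseeded calls
myData <- expand.grid(sub = factor(1:16), time = 1:10)
myData$observed <- rnorm(nrow(myData))
myData$fitted <- with(myData, ave(observed, sub, FUN = mean))
myData$event.time <- with(myData,
                          ave(observed, sub, FUN = function(.x) 10 * runif(1)))

## Assumed model of the extended notation: stack the two y variables and add
## an artificial grouping factor, giving 2 * nrow(d) rows in total.
stack_y <- function(d) {
  n <- nrow(d)
  data.frame(time   = rep(d$time, 2),
             sub    = rep(d$sub, 2),
             y      = c(d$fitted, d$observed),
             groups = gl(2, n, labels = c("fitted", "observed")))
}

## Positions (in the stacked data) that a panel for subject 's' would receive.
panel_subscripts <- function(stacked, s) which(stacked$sub == s)

## Remove NAs without leaving an 'na.action' attribute behind.
drop_na <- function(x) x[!is.na(x)]

## Case 1, no subset: indices above nrow(myData) give NA, the rest are correct.
s1 <- panel_subscripts(stack_y(myData), "1")
ev1 <- unique(myData$event.time[s1])                 # one value plus NA

## Case 2, subset before stacking: positions now refer to the subsetted,
## stacked data, but are still used to index the full myData.
keep <- myData$sub %in% as.character(1:9)
s2 <- panel_subscripts(stack_y(myData[keep, ]), "1")
ev2 <- unique(drop_na(myData$event.time[s2]))        # several subjects' values

## Option (1): index the pre-subsetted data frame instead of myData.
myDataSub <- myData[keep, ]
s3 <- panel_subscripts(stack_y(myDataSub), "1")
ev3 <- unique(drop_na(myDataSub$event.time[s3]))     # a single value again

## Option (2): a single y variable, so subscripts are rows of myData itself.
s4 <- which(keep & myData$sub == "1")
ev4 <- unique(myData$event.time[s4])                 # a single value

print(list(case1 = ev1, case2 = ev2, option1 = ev3, option2 = ev4))
```

Under this model, case1 shows subject 1's event time plus an NA. case2 shows several event times taken from other subjects. Both options recover the single event time for subject 1.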