> -----Original Message-----
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> Sent: Fri 8/3/2007 1:05 PM
> To: Steven McKinney
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
>
> I've since seen your followup; a more detailed explanation may help.
> The path through the code for your argument list does not go where you
> quoted, and there is a reason for it.
>
Using a copy of "[.data.frame" with browser() I have traced the flow of execution. (My copy with the browser command is at the end of this email.)

> foo[, "FileName"]
Called from: `[.data.frame`(foo, , "FileName")
Browse[1]> n
debug: mdrop <- missing(drop)
Browse[1]> n
debug: Narg <- nargs() - (!mdrop)
Browse[1]> n
debug: if (Narg < 3) {
    if (!mdrop)
        warning("drop argument will be ignored")
    if (missing(i))
        return(x)
    if (is.matrix(i))
        return(as.matrix(x)[i])
    y <- NextMethod("[")
    cols <- names(y)
    if (!is.null(cols) && any(is.na(cols)))
        stop("undefined columns selected")
    if (any(duplicated(cols)))
        names(y) <- make.unique(cols)
    return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L)))
}
Browse[1]> n
debug: if (missing(i)) {
    if (missing(j) && drop && length(x) == 1L)
        return(.subset2(x, 1L))
    y <- if (missing(j)) x else .subset(x, j)
    if (drop && length(y) == 1L)
        return(.subset2(y, 1L))
    cols <- names(y)
    if (any(is.na(cols)))
        stop("undefined columns selected")
    if (any(duplicated(cols)))
        names(y) <- make.unique(cols)
    nrow <- .row_names_info(x, 2L)
    if (drop && !mdrop && nrow == 1L)
        return(structure(y, class = NULL, row.names = NULL))
    else return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L)))
}
Browse[1]> n
debug: if (missing(j) && drop && length(x) == 1L) return(.subset2(x, 1L))
Browse[1]> n
debug: y <- if (missing(j)) x else .subset(x, j)
Browse[1]> n
debug: if (drop && length(y) == 1L) return(.subset2(y, 1L))
Browse[1]> n
NULL

So `[.data.frame` is exiting after executing

        if (drop && length(y) == 1L)
            return(.subset2(y, 1L)) ## This returns a result before undefined columns check is done.  Is this intended?

Couldn't the error check

        cols <- names(y)
        if (any(is.na(cols)))
            stop("undefined columns selected")

be done before the above return()? What would break if the error check on column names was done before returning a NULL result due to incorrect column name spelling?
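A minimal sketch of the reordering asked about above, written as a standalone function rather than a patch to base R. The name select_columns() is hypothetical, and it only covers the column-only case (i missing). It uses the same internals seen in the trace: .subset() on an unknown name returns a one-element list whose name is NA, so checking names(y) before the drop shortcut is enough to catch a misspelling.

```r
# Hypothetical helper (not part of base R): column-only selection with the
# undefined-column check moved *before* the drop = TRUE shortcut.
select_columns <- function(x, j, drop = TRUE) {
  y <- .subset(x, j)          # plain list subset; an unknown name yields an NA name
  cols <- names(y)
  if (any(is.na(cols)))
    stop("undefined columns selected")
  if (drop && length(y) == 1L)
    return(.subset2(y, 1L))   # single column: return the bare vector
  x[j]                        # otherwise keep the data frame structure
}

foo <- data.frame(Filename = c("a", "b"), stringsAsFactors = FALSE)
select_columns(foo, "Filename")        # the character vector c("a", "b")
try(select_columns(foo, "FileName"))   # error instead of a silent NULL
```

Under this assumption only a misspelled or out-of-range column changes behaviour (an error instead of NULL); correct selections should return the same values, which is one way to probe the "what would break" question.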
Why should

> foo[, "FileName"]
NULL

differ from

> foo[seq(nrow(foo)), "FileName"]
Error in `[.data.frame`(foo, seq(nrow(foo)), "FileName") :
        undefined columns selected

Thank you for your explanations.

> Generally when you extract in R and ask for a non-existent index you get
> NA or NULL as the result (and no warning), e.g.
>
> > y <- list(x=1, y=2)
> > y[["z"]]
> NULL
>
> Because data frames 'must' have (column) names, they are a partial
> exception and when the result is a data frame you get an error if it would
> contain undefined columns.
>
> But in the case of foo[, "FileName"], the result is a single column and so
> will not have a name: there seems no reason to be different from
>
> > foo[["FileName"]]
> NULL
> > foo$FileName
> NULL
>
> which similarly select a single column.  At one time they were different
> in R, for no documented reason.
>
>
> On Fri, 3 Aug 2007, Prof Brian Ripley wrote:
>
> > You are reading the wrong part of the code for your argument list:
> >
> >> foo["FileName"]
> > Error in `[.data.frame`(foo, "FileName") : undefined columns selected
> >
> > [.data.frame is one of the most complex functions in R, and does many
> > different things depending on which arguments are supplied.
> >
> >
> > On Fri, 3 Aug 2007, Steven McKinney wrote:
> >
> >> Hi all,
> >>
> >> What are current methods people use in R to identify
> >> mis-spelled column names when selecting columns
> >> from a data frame?
> >>
> >> Alice Johnson recently tackled this issue
> >> (see [BioC] posting below).
> >>
> >> Due to a mis-spelled column name ("FileName"
> >> instead of "Filename") which produced no warning,
> >> Alice spent a fair amount of time tracking down
> >> this bug.  With my fumbling fingers I'll be tracking
> >> down such a bug soon too.
> >>
> >> Is there any options() setting or debug technique
> >> that will flag data frame column extractions that
> >> reference a non-existent column?
> >> It seems to me
> >> that the "[.data.frame" extractor used to throw an
> >> error if given a mis-spelled variable name, and I
> >> still see lines of code in "[.data.frame" such as
> >>
> >>     if (any(is.na(cols)))
> >>         stop("undefined columns selected")
> >>
> >> In R 2.5.1 a NULL is silently returned.
> >>
> >>> foo <- data.frame(Filename = c("a", "b"))
> >>> foo[, "FileName"]
> >> NULL
> >>
> >> Has something changed so that the code lines
> >>     if (any(is.na(cols)))
> >>         stop("undefined columns selected")
> >> in "[.data.frame" no longer work properly (if
> >> I am understanding the intention properly)?
> >>
> >> If not, could "[.data.frame" check an
> >> options() variable setting (say
> >> warn.undefined.colnames) and throw a warning
> >> if a non-existent column name is referenced?
> >>
> >>> sessionInfo()
> >> R version 2.5.1 (2007-06-27)
> >> powerpc-apple-darwin8.9.1
> >>
> >> locale:
> >> en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
> >>
> >> attached base packages:
> >> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> >> "base"
> >>
> >> other attached packages:
> >>      plotrix         lme4       Matrix      lattice
> >>      "2.2-3"  "0.99875-4" "0.999375-0"     "0.16-2"
> >>
> >> Steven McKinney
> >>
> >> Statistician
> >> Molecular Oncology and Breast Cancer Program
> >> British Columbia Cancer Research Centre
> >>
> >> email: smckinney +at+ bccrc +dot+ ca
> >>
> >> tel: 604-675-8000 x7561
> >>
> >> BCCRC
> >> Molecular Oncology
> >> 675 West 10th Ave, Floor 4
> >> Vancouver B.C.
> >> V5Z 1L3
> >> Canada
> >
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

"[.data.frame" <-
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1)
{
    browser()
    mdrop <- missing(drop)
    Narg <- nargs() - (!mdrop)
    if (Narg < 3) {
        if (!mdrop)
            warning("drop argument will be ignored")
        if (missing(i))
            return(x)
        if (is.matrix(i))
            return(as.matrix(x)[i])
        y <- NextMethod("[")
        cols <- names(y)
        if (!is.null(cols) && any(is.na(cols)))
            stop("undefined columns selected")
        if (any(duplicated(cols)))
            names(y) <- make.unique(cols)
        return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L)))
    }
    if (missing(i)) {
        if (missing(j) && drop && length(x) == 1L)
            return(.subset2(x, 1L))
        y <- if (missing(j))
            x
        else .subset(x, j)
        if (drop && length(y) == 1L)
            return(.subset2(y, 1L)) ## This returns a result before undefined columns check is done.  Is this intended?
        cols <- names(y)
        if (any(is.na(cols)))
            stop("undefined columns selected")
        if (any(duplicated(cols)))
            names(y) <- make.unique(cols)
        nrow <- .row_names_info(x, 2L)
        if (drop && !mdrop && nrow == 1L)
            return(structure(y, class = NULL, row.names = NULL))
        else return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 0L)))
    }
    xx <- x
    cols <- names(xx)
    x <- vector("list", length(x))
    x <- .Call("R_copyDFattr", xx, x, PACKAGE = "base")
    oldClass(x) <- attr(x, "row.names") <- NULL
    if (!missing(j)) {
        x <- x[j]
        cols <- names(x)
        if (any(is.na(cols)))
            stop("undefined columns selected")
        nxx <- structure(seq_along(xx), names = names(xx))
        sxx <- match(nxx[j], seq_along(xx))
    }
    else sxx <- seq_along(x)
    rows <- NULL
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    for (j in seq_along(x)) {
        xj <- xx[[sxx[j]]]
        x[[j]] <- if (length(dim(xj)) != 2L)
            xj[i]
        else xj[i, , drop = FALSE]
    }
    if (drop) {
        n <- length(x)
        if (n == 1L)
            return(x[[1L]])
        if (n > 1L) {
            xj <- x[[1L]]
            nrow <- if (length(dim(xj)) == 2L)
                dim(xj)[1L]
            else length(xj)
            drop <- !mdrop && nrow == 1L
        }
        else drop <- FALSE
    }
    if (!drop) {
        if (is.null(rows))
            rows <- attr(xx, "row.names")
        rows <- rows[i]
        if ((ina <- any(is.na(rows))) | (dup <- any(duplicated(rows)))) {
            if (ina)
                rows[is.na(rows)] <- "NA"
            if (dup)
                rows <- make.unique(as.character(rows))
        }
        if (any(duplicated(nm <- names(x))))
            names(x) <- make.unique(nm)
        if (is.null(rows))
            rows <- attr(xx, "row.names")[i]
        attr(x, "row.names") <- rows
        oldClass(x) <- oldClass(xx)
    }
    x
}
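On the original question of how to catch misspelled column names, one option that does not depend on how a given R version handles drop is to check the names explicitly before extracting. The sketch below is a hypothetical helper, check_columns(); it is not an options() setting or anything in base R. It uses setdiff() to find unknown names and agrep() to suggest approximate, case-insensitive matches.

```r
# Hypothetical helper (not part of base R): fail loudly when any requested
# name is not a column of `df`, suggesting near matches found by agrep().
check_columns <- function(df, wanted) {
  unknown <- setdiff(wanted, names(df))
  if (length(unknown) == 0L)
    return(invisible(TRUE))
  hints <- vapply(unknown, function(nm) {
    # approximate, case-insensitive match against the existing column names
    near <- agrep(nm, names(df), ignore.case = TRUE, value = TRUE)
    if (length(near) > 0L)
      paste0(nm, " (did you mean ", paste(near, collapse = ", "), "?)")
    else nm
  }, character(1))
  stop("undefined columns: ", paste(hints, collapse = "; "), call. = FALSE)
}

foo <- data.frame(Filename = c("a", "b"), stringsAsFactors = FALSE)
check_columns(foo, "Filename")        # passes silently
try(check_columns(foo, "FileName"))   # error suggesting "Filename"
```

Calling such a check just before foo[, "FileName"] turns the silent NULL into an explicit error, at the cost of one extra line per extraction.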