On Sun, 1 Apr 2007, John Fox wrote:

> Dear r-devel members,
>
> It's just been brought to my attention that R permits non-unique column
> names in data frames -- e.g., via assignment to names() or colnames(). This
> behaviour is consistent with the help files (as I discovered), but it's not
> consistent with the behaviour of rownames() and row.names(). For example,

?? matrices and data frames are different, but rownames() and row.names()
do the same on each class.

>
>     row.names(airquality) <- rep("a", nrow(airquality))
>
> generates an error, as does rownames().
>
>     names(airquality) <- rep("a", ncol(airquality))
>
> or even
>
>     names(airquality) <- rep("", ncol(airquality))
>
> do not.
>
> I figure that there must be some rationale for this difference, but I can't
> think of what it might be. Any thoughts?

It's part of the definition of a data frame, from long ago (White Book
p.60). Think of the row names as a 'primary key' in the sense of a
DBMS/SQL.

Why the names are not also required to be non-empty and unique is
something for the designer (and John Chambers has not (yet) replied), but
it is clearly deliberate as data.frame(check.names=FALSE) is allowed.

One possible issue is that there are many ways to set names of a data
frame, e.g. DF$name <- value can add a column, and checking them all could
be tedious. OTOH, setting row names is centralized (it is done inside
attr<-()).

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
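
For anyone who wants to check the behaviour discussed in this thread, here is a minimal sketch in R. It assumes a reasonably recent R version and the built-in airquality data set; the object names (aq, dup, empty, fixed, kept, m, row_result, rown_result) are made up for illustration and do not come from the original messages. It only exercises the assignments quoted above, plus data.frame(check.names=FALSE) and the matrix case mentioned in the reply.

```r
# Illustrative check only; all object names here are hypothetical.
aq <- airquality  # work on a copy of the built-in data set

# Duplicate row names on a data frame: expected to be rejected.
row_result <- suppressWarnings(tryCatch(
  { row.names(aq) <- rep("a", nrow(aq)); "accepted" },
  error = function(e) "rejected"
))

# The same assignment through rownames(): also expected to be rejected.
rown_result <- suppressWarnings(tryCatch(
  { rownames(aq) <- rep("a", nrow(aq)); "accepted" },
  error = function(e) "rejected"
))

# Duplicate, and even empty, column names are accepted.
dup <- airquality
names(dup) <- rep("a", ncol(dup))
empty <- airquality
names(empty) <- rep("", ncol(empty))

# data.frame() repairs names by default; check.names = FALSE keeps them as given.
fixed <- data.frame(a = 1:2, a = 3:4)
kept  <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)

# A matrix, unlike a data frame, tolerates duplicate row names.
m <- as.matrix(airquality)
rownames(m) <- rep("a", nrow(m))

print(c(row.names = row_result, rownames = rown_result))
print(names(fixed))
print(names(kept))
```

The tryCatch() wrappers are there only so the script runs to completion; at the prompt the two row-name assignments simply stop with an error.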