On Sun, 1 Apr 2007, John Fox wrote:
> Dear r-devel members,
>
> It's just been brought to my attention that R permits non-unique column
> names in data frames -- e.g., via assignment to names() or colnames(). This
> behaviour is consistent with the help files (as I discovered), but it's not
> consistent with the behaviour of rownames() and row.names(). For example,

?? Matrices and data frames are different, but rownames() and row.names()
do the same on each class.

> row.names(airquality) <- rep("a", nrow(airquality))
>
> generates an error, but

as does rownames().

> names(airquality) <- rep("a", ncol(airquality))
>
> or even
>
> names(airquality) <- rep("", ncol(airquality))
>
> do not.
>
> I figure that there must be some rationale for this difference, but I can't
> think of what it might be. Any thoughts?
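
For anyone who wants to reproduce the asymmetry, here is a minimal sketch. It assumes a reasonably recent R with the standard datasets package available. The helper `assignment_ok()` is hypothetical and exists only for this illustration. It applies each replacement function to a local copy, so `airquality` itself is never modified, and reports whether the assignment raised an error.

```r
# Hypothetical helper: TRUE if applying the replacement function `setter`
# to a local copy of `df` succeeds, FALSE if it signals an error.
assignment_ok <- function(df, setter, value) {
  tryCatch({
    # Duplicate row names may also trigger a warning before the error; hide it
    suppressWarnings(setter(df, value))
    TRUE
  }, error = function(e) FALSE)
}

aq <- datasets::airquality

assignment_ok(aq, `row.names<-`, rep("a", nrow(aq)))  # FALSE: rejected
assignment_ok(aq, `rownames<-`,  rep("a", nrow(aq)))  # FALSE: rejected
assignment_ok(aq, `names<-`,     rep("a", ncol(aq)))  # TRUE: accepted
assignment_ok(aq, `colnames<-`,  rep("a", ncol(aq)))  # TRUE: accepted
assignment_ok(aq, `names<-`,     rep("",  ncol(aq)))  # TRUE: accepted
```

Calling the replacement functions in functional form (`` `names<-`(x, value) ``) avoids a separate variable per case. The exact error and warning text for the row-name case varies between R versions.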

It's part of the definition of a data frame, from long ago (White Book
p.60). Think of the row names as a 'primary key' in the sense of a
DBMS/SQL.
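
The 'primary key' reading can be illustrated with a small sketch on invented data (`df` and `dup` are hypothetical). With unique row names, a name picks out exactly one row. With a duplicated column name, lookup by name becomes ambiguous, and `[[` or `$` quietly return the first match.

```r
# Unique row names act like a primary key: each name selects one row
df <- data.frame(x = c(10, 20, 30), row.names = c("r1", "r2", "r3"))
df["r2", "x"]   # 20

# Duplicate column names are permitted, but name-based lookup is ambiguous:
# [[ returns only the first matching column
dup <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(dup)      # "a" "a"
dup[["a"]]      # 1 2  (the second "a" column cannot be reached by name)
```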

Why the (column) names are not also required to be non-empty and unique
is a question for the designer (and John Chambers has not (yet) replied),
but it is clearly deliberate, as data.frame(check.names=FALSE) is allowed.
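
The following sketch, assuming current R, shows the default behaviour that check.names=FALSE switches off. By default, data.frame() repairs names with make.names(unique = TRUE), and that function can also be called directly on any character vector.

```r
# Default check.names = TRUE: duplicates are made unique
names(data.frame(a = 1, a = 2))                       # "a"   "a.1"

# Opting out keeps the duplicate names exactly as given
names(data.frame(a = 1, a = 2, check.names = FALSE))  # "a" "a"

# The same repair applied by hand; an empty name becomes "X"
make.names(c("a", "a", ""), unique = TRUE)            # "a"   "a.1" "X"
```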

One possible issue is that there are many ways to set the names of a data
frame, e.g. DF$name <- value can add a column, and checking them all could
be tedious. OTOH, setting row names is centralized (it is done inside
attr<-()).

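
The 'many ways to set names' point can be made concrete with a sketch, assuming current R. Each route below produces duplicate column names without complaint, so a uniqueness check would have to be added to every one of them. `has_unique_names()` is a hypothetical user-level helper, not a base R function.

```r
df <- data.frame(x = 1:3, y = 4:6)

# Hypothetical helper: TRUE if all column names are non-empty and unique
has_unique_names <- function(d) {
  nm <- names(d)
  !anyDuplicated(nm) && all(nzchar(nm))
}

via_names    <- df; names(via_names) <- c("x", "x")
via_colnames <- df; colnames(via_colnames) <- c("y", "y")
via_cbind    <- cbind(df, df)   # cbind() does not de-duplicate names
via_ctor     <- data.frame(df, df, check.names = FALSE)

sapply(list(via_names, via_colnames, via_cbind, via_ctor), has_unique_names)
# FALSE FALSE FALSE FALSE
has_unique_names(df)  # TRUE
```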
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax: +44 1865 272595
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel