Re: [Rd] Non-unique column names in data frames

2007-04-03 Thread Prof Brian Ripley

On Sun, 1 Apr 2007, John Fox wrote:

 Dear r-devel members,

 It's just been brought to my attention that R permits non-unique column
 names in data frames -- e.g., via assignment to names() or colnames(). This
 behaviour is consistent with the help files (as I discovered), but it's not
 consistent with the behaviour of rownames() and row.names(). For example,

??  matrices and data frames are different, but rownames() and row.names() 
do the same on each class.


   row.names(airquality) <- rep("a", nrow(airquality))

 generates an error, but

as does rownames().


   names(airquality) <- rep("a", ncol(airquality))

 or even

   names(airquality) <- rep("", ncol(airquality))

 do not.

 I figure that there must be some rationale for this difference, but I can't
 think of what it might be. Any thoughts?

It's part of the definition of a data frame, from long ago (White Book 
p.60).  Think of the row names as a 'primary key' in the sense of a 
DBMS/SQL.
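
To illustrate the 'primary key' analogy, here is a minimal sketch, assuming a 
recent version of R; the names aq, first_ozone and dup_result are only 
illustrative and do not come from the original messages.

```r
aq <- airquality                      # work on a copy of the built-in data set

# Unique row names can serve as a lookup key for a single row.
row.names(aq) <- paste0("obs", seq_len(nrow(aq)))
first_ozone <- aq["obs1", "Ozone"]

# Duplicated row names are rejected, so the key stays unambiguous.
# (R also issues a warning naming the non-unique value before the error.)
dup_result <- tryCatch({
  row.names(aq) <- rep("a", nrow(aq))
  "accepted"
}, error = function(e) "rejected")
```

Because the failed assignment raises an error, aq keeps its unique row names.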

Why the names are not also required to be non-empty and unique 
is something for the designer (and John Chambers has not (yet) replied), 
but it is clearly deliberate as data.frame(check.names=FALSE) is allowed.
One possible issue is that there are many ways to set names of a data 
frame, e.g. DF$name <- value can add a column, and checking them all could 
be tedious.  OTOH, setting row names is centralized (it is done inside
attr<-()).
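
The contrast on the column side can be sketched as follows, again assuming a 
recent version of R; aq, dup_names, empty_names, checked, unchecked and 
new_col are illustrative names only.

```r
aq <- airquality                      # copy of the built-in data set

# Duplicated column names are accepted ...
names(aq) <- rep("a", ncol(aq))
dup_names <- names(aq)

# ... and so are empty ones.
names(aq) <- rep("", ncol(aq))
empty_names <- names(aq)

# data.frame() repairs names by default; check.names = FALSE switches that off.
checked   <- names(data.frame(a = 1, a = 2))
unchecked <- names(data.frame(a = 1, a = 2, check.names = FALSE))

# `$<-` is one of the several routes by which a new column name can arrive.
aq$new_col <- seq_len(nrow(aq))
```

With the default check.names = TRUE the second column is renamed to "a.1"; 
with check.names = FALSE both columns keep the name "a".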

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Non-unique column names in data frames

2007-04-01 Thread John Fox

Dear r-devel members,

It's just been brought to my attention that R permits non-unique column
names in data frames -- e.g., via assignment to names() or colnames(). This
behaviour is consistent with the help files (as I discovered), but it's not
consistent with the behaviour of rownames() and row.names(). For example,

row.names(airquality) <- rep("a", nrow(airquality))

generates an error, but 

names(airquality) <- rep("a", ncol(airquality))

or even 

names(airquality) <- rep("", ncol(airquality))

do not.

I figure that there must be some rationale for this difference, but I can't
think of what it might be. Any thoughts?

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

