Re: [R] for loop over dataframe without indices

2003-12-20 Thread Gabor Grothendieck



I think I've found a problem with the by approach.  Compare:

data(iris)
by( iris, row.names(iris), function(x)x )[1:5,]

to

iris[1:5,]

It seems by has reordered the rows.
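
A possible explanation, offered as a sketch rather than a diagnosis of by() itself: the grouping vector row.names(iris) is character, and when it is turned into a factor its levels are sorted as strings, so "10" and "100" come before "2". The names rn, ordered_rn and rows_in_order are illustrative only, and the example uses split() instead of by() so that the result is a plain list.

```r
data(iris)
rn <- row.names(iris)

# Levels of a character vector are sorted as strings, not as numbers
head(levels(factor(rn)))

# Giving the levels explicitly, in the original order, keeps the rows in place
ordered_rn <- factor(rn, levels = rn)
rows_in_order <- split(iris, ordered_rn)
names(rows_in_order)[1:5]
```

Under that assumption, passing the same explicitly ordered factor as the second argument of by() should preserve the row order as well.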

 
Date: Fri, 19 Dec 2003 21:31:50 -0500 (EST) 
From: Gabor Grothendieck [EMAIL PROTECTED]
To: [EMAIL PROTECTED] 
Cc: [EMAIL PROTECTED] 
Subject: Re: [R] for loop over dataframe without indices 

 
 


Thomas, thanks for your response. It is quite nifty.

Pursuing your solutions,
I think the objective should be to reproduce the output from 
t.data.frame defined as below (note that I posted a proposal
to change t.data.frame to r-devel before I received your reply):

t.data.frame <- function( df ) {
  ll <- NULL
  for( i in 1:nrow(df) ) ll <- append( ll, list(df[i,]) )
  ll
}

Using the first 3 rows from the iris data set as our data frame,
run the following which shows that your by solution works provided
we nullify out the attributes afterwards. The do.call solution
does not appear to work, as required, since it turns the data 
frame into a matrix.

data(iris)
df <- iris[1:3,]

# Consider:

id <- function(x)x

# t.data.frame solution
zt <- t(df)

# by solution is good but it adds some junk attributes
zby <- by( df, row.names(df), id )
identical(zt,zby) # FALSE

# nullifying these attributes seems to do it
zby2 <- zby
attributes(zby2) <- NULL
identical(zt,zby2) # TRUE

# do.call doesn't work right since it appears to turn the result into a matrix
str( do.call(mapply, list(id,df) ) ) # note matrix output


Here is the result of pasting the above into R 1.8.1 on Windows 2000:

> data(iris)
> df <- iris[1:3,]
> 
> # Consider:
> 
> id <- function(x)x
> 
> # t.data.frame solution
> zt <- t(df)
> 
> # by solution is good but it adds some junk attributes
> zby <- by( df, row.names(df), id )
> identical(zt,zby)
[1] FALSE
> 
> # nullifying these attributes seems to do it
> zby2 <- zby
> attributes(zby2) <- NULL
> identical(zt,zby2)
[1] TRUE
> 
> # do.call doesn't work right since it appears to turn the result into a matrix
> str( do.call(mapply, list(id,df) ) )
 num [1:3, 1:5] 5.1 4.9 4.7 3.5 3 3.2 1.4 1.4 1.3 0.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : NULL
 


Based on your solution I think the proposal should be changed
to:

t.data.frame <- function(df) {
  z <- by( df, row.names(df), function(x)x )
  attributes(z) <- NULL
  z
}



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] for loop over dataframe without indices

2003-12-19 Thread Peter Wolf
Try:

> data(iris); df<-as.data.frame(t(iris[1:3,]))
> for(i in df) print(i)
[1] 5.1    3.5    1.4    0.2    setosa
Levels: 0.2 1.4 3.5 5.1 setosa
[1] 4.9    3.0    1.4    0.2    setosa
Levels: 0.2 1.4 3.0 4.9 setosa
[1] 4.7    3.2    1.3    0.2    setosa
Levels: 0.2 1.3 3.2 4.7 setosa
... however, not very nice

Peter Wolf

Gabor Grothendieck wrote:

Based on an off-list email conversation I had, I am concerned that
my original email was not sufficiently clear.
Recall that I wanted to use a for loop to iterate over the rows of 
a dataframe without using indices.   It's easy to do this over
the columns (for(v in df) ...) but not for rows.

What I wanted to do might be something like this.
Define a function, rows, which takes a dataframe, df, as input 
and converts it to the structure: 
list(df[1,], df[2,], ..., df[n,]) where there are n rows:

rows <- function( df ) {
  ll <- NULL
  for( i in 1:nrow(df) )
    ll <- append( ll, list(df[i,]) )
  ll
}

This allows us to iterate over the rows of df without indices like this:

data( iris )
df <- iris[1:3,] # use 1st 3 rows of iris data set as df
for( v in rows(df) ) print(v)
Of course, this involves iterating over the rows of df twice --
once within rows() and once in the for loop. Perhaps this is
the price one must pay for being able to eliminate index 
computations from a for loop or is it? Have I answered my 
own question or is there a better way to use a for loop 
over the rows of a dataframe without indices?



Re: [R] for loop over dataframe without indices

2003-12-19 Thread Gabor Grothendieck


Regarding my problem of how to use a for loop over
the rows of a dataframe without using indices,
several people mentioned using transpose and then
iterating over the columns (which were the rows)
and one person suggested apply(df,1,list);
however, both these solutions coerce the data to
different types.
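
The coercion can be seen in a small sketch, using the first three rows of iris as df (an assumption carried over from the other messages in this thread). Both t() and apply() go through a matrix, and a matrix can hold only one type, so numeric and factor columns together end up as character.

```r
data(iris)
df <- iris[1:3, ]

# t() converts the data frame to a matrix first, so mixed columns
# become a single character matrix
is.character(t(df))

# apply() over rows hands each row to the function as a character vector,
# not as a one-row data frame
apply(df, 1, class)

# Indexing the data frame directly keeps the column types
sapply(df[1, ], class)
```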

What I now realize is that the thing that is oddly
missing in R is that you can't do an apply over
the rows of a dataframe (at least not without having
it coerced to an array and the elements coerced to
possibly different types).  The documentation does
point this out.  It's not a bug but it's an omission
that seems deserving of being addressed.

Thus I propose that apply be extended to handle
data frames directly.   Any comments on this 
before I send a message to r-devel?


(In terms of my previous posting, with such an apply
one could do:

rows <- function(df) apply( df, 1, function(x)x )
for( v in rows(df) ) ... some statements involving v ...

There is still the limitation, of course, that one can
only _access_ rows of df like this.  One still needs
indices to change them.)

As an aside, should id <- function(x)x and rows, as defined
above, be predefined in R?  id certainly plays a special 
role in mathematics and it seems natural to want to iterate
over rows and not just columns of dataframes.



Re: [R] for loop over dataframe without indices

2003-12-19 Thread Thomas Lumley
On Fri, 19 Dec 2003, Gabor Grothendieck wrote:

> What I now realize is that the thing that is oddly
> missing in R is that you can't do an apply over
> the rows of a dataframe (at least not without having
> it coerced to an array and the elements coerced to
> possibly different types).  The documentation does
> point this out.  It's not a bug but it's an omission
> that seems deserving of being addressed.


Since mapply() applies a function to each 'row' of a list of vectors, you
can achieve this effect with
do.call(mapply, list(FUN,data.frame))
and also as a degenerate case of by():
by(data.frame, row.names(data.frame), FUN)

These should probably be documented under apply()


-thomas
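
A minimal sketch of the mapply() idea, under two assumptions that are not part of the message above: the columns are spliced in as separate arguments (so that mapply() steps across rows instead of across columns), and SIMPLIFY = FALSE is set so the result is not collapsed into a matrix. The helper name as_row is hypothetical.

```r
data(iris)
df <- iris[1:3, ]

# Collect the arguments of one call (one value per column) into a named list
as_row <- function(...) list(...)

# Splice the columns of df in as separate arguments to mapply();
# SIMPLIFY = FALSE stops mapply() from simplifying the result to a matrix
rows <- do.call(mapply, c(list(FUN = as_row, SIMPLIFY = FALSE), as.list(df)))

length(rows)
str(rows[[1]])
```

Each element of rows is a list with one entry per column, so the numeric columns stay numeric. By contrast, a call of the form do.call(mapply, list(FUN, df)) appears to pass the whole data frame as a single argument, in which case mapply() walks over its columns.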



Re: [R] for loop over dataframe without indices

2003-12-19 Thread Gabor Grothendieck


Thomas, thanks for your response.  It is quite nifty.

Pursuing your solutions,
I think the objective should be to reproduce the output from 
t.data.frame defined as below (note that I posted a proposal
to change t.data.frame to r-devel before I received your reply):

t.data.frame <- function( df ) {
  ll <- NULL
  for( i in 1:nrow(df) ) ll <- append( ll, list(df[i,]) )
  ll
}

Using the first 3 rows from the iris data set as our data frame,
run the following which shows that your by solution works provided
we nullify out the attributes afterwards.  The do.call solution
does not appear to work, as required, since it turns the data 
frame into a matrix.

data(iris)
df <- iris[1:3,]

# Consider:

id <- function(x)x

# t.data.frame solution
zt <- t(df)

# by solution is good but it adds some junk attributes
zby <- by( df, row.names(df), id )
identical(zt,zby) # FALSE

# nullifying these attributes seems to do it
zby2 <- zby
attributes(zby2) <- NULL
identical(zt,zby2) # TRUE

# do.call doesn't work right since it appears to turn the result into a matrix
str( do.call(mapply, list(id,df) ) ) # note matrix output


Here is the result of pasting the above into R 1.8.1 on Windows 2000:

> data(iris)
> df <- iris[1:3,]
> 
> # Consider:
> 
> id <- function(x)x
> 
> # t.data.frame solution
> zt <- t(df)
> 
> # by solution is good but it adds some junk attributes
> zby <- by( df, row.names(df), id )
> identical(zt,zby)
[1] FALSE
> 
> # nullifying these attributes seems to do it
> zby2 <- zby
> attributes(zby2) <- NULL
> identical(zt,zby2)
[1] TRUE
> 
> # do.call doesn't work right since it appears to turn the result into a matrix
> str( do.call(mapply, list(id,df) ) )
 num [1:3, 1:5] 5.1 4.9 4.7 3.5 3 3.2 1.4 1.4 1.3 0.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : NULL
 


Based on your solution I think the proposal should be changed
to:

t.data.frame <- function(df) {
  z <- by( df, row.names(df), function(x)x )
  attributes(z) <- NULL
  z
}


---

Date: Fri, 19 Dec 2003 10:03:55 -0800 (PST) 
From: Thomas Lumley [EMAIL PROTECTED]
To: Gabor Grothendieck [EMAIL PROTECTED] 
Cc: [EMAIL PROTECTED] 
Subject: Re: [R] for loop over dataframe without indices 

 
 
On Fri, 19 Dec 2003, Gabor Grothendieck wrote:

> What I now realize is that the thing that is oddly
> missing in R is that you can't do an apply over
> the rows of a dataframe (at least not without having
> it coerced to an array and the elements coerced to
> possibly different types). The documentation does
> point this out. It's not a bug but it's an omission
> that seems deserving of being addressed.


Since mapply() applies a function to each 'row' of a list of vectors, you
can achieve this effect with
 do.call(mapply, list(FUN,data.frame))
and also as a degenerate case of by():
 by(data.frame, row.names(data.frame), FUN)

These should probably be documented under apply()


 -thomas



Re: [R] for loop over dataframe without indices

2003-12-18 Thread Gabor Grothendieck


Based on an off-list email conversation I had, I am concerned that
my original email was not sufficiently clear.

Recall that I wanted to use a for loop to iterate over the rows of 
a dataframe without using indices.   It's easy to do this over
the columns (for(v in df) ...) but not for rows.

What I wanted to do might be something like this.
Define a function, rows, which takes a dataframe, df, as input 
and converts it to the structure: 
list(df[1,], df[2,], ..., df[n,]) where there are n rows:

 rows <- function( df ) {
  ll <- NULL
  for( i in 1:nrow(df) )
   ll <- append( ll, list(df[i,]) )
  ll
 }

This allows us to iterate over the rows of df without indices like this:

 data( iris )
 df <- iris[1:3,] # use 1st 3 rows of iris data set as df
 for( v in rows(df) ) print(v)

Of course, this involves iterating over the rows of df twice --
once within rows() and once in the for loop. Perhaps this is
the price one must pay for being able to eliminate index 
computations from a for loop or is it? Have I answered my 
own question or is there a better way to use a for loop 
over the rows of a dataframe without indices?
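
One way to write the same helper more compactly is sketched below. It is only a restatement, not an answer to the question: lapply() still walks the row indices internally, and the rows are still visited twice. The names rows_loop and rows_lapply are illustrative.

```r
# Loop-and-append version, as defined in the message above
rows_loop <- function(df) {
  ll <- NULL
  for (i in 1:nrow(df)) ll <- append(ll, list(df[i, ]))
  ll
}

# Same list of one-row data frames, built without growing a list step by step
rows_lapply <- function(df) lapply(seq_len(nrow(df)), function(i) df[i, ])

data(iris)
df <- iris[1:3, ]
for (v in rows_lapply(df)) print(v)
```

Unlike 1:nrow(df), seq_len(nrow(df)) yields an empty sequence for a data frame with no rows, so the lapply() version returns an empty list in that case.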

--- 
Date: Thu, 18 Dec 2003 19:20:04 -0500 
From: Gabor Grothendieck [EMAIL PROTECTED]
To: [EMAIL PROTECTED] 
Subject: for loop over dataframe without indices 




One can perform a for loop without indices over the columns
of a dataframe like this:

for( v in df ) ... some statements involving v ...

Is there some way to do this for rows other than using indices:

for( i in 1:nrow(df) ) ... some statements involving df[i,] ...

If the dataframe had only numeric entries I could transpose it
and then do it over columns but what about the general case?
