[R] applying math/stat functions to rows in data frame

2007-09-15 Thread Gerard Smits
Hi All,

There are a variety of functions that can be applied to a variable 
(column) in a data frame: mean, min, max, sd, range, IQR, etc.

I am aware of only two that work on the rows, using q1-q3 as example 
variables:

rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables

Can the standard column functions (listed in the first sentence) be 
applied to rows, with the use of correct indexes to reference the 
columns of interest?  Or, must these summary functions be programmed 
separately to work on a row?

Thanks,

Gerard




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] applying math/stat functions to rows in data frame

2007-09-15 Thread Robert A LaBudde
At 12:02 PM 9/15/2007, Gerard wrote:
>Hi All,
>
>There are a variety of functions that can be applied to a variable
>(column) in a data frame: mean, min, max, sd, range, IQR, etc.
>
>I am aware of only two that work on the rows, using q1-q3 as example
>variables:
>
>rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
>rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables
>
>Can the standard column functions (listed in the first sentence) be
>applied to rows, with the use of correct indexes to reference the
>columns of interest?  Or, must these summary functions be programmed
>separately to work on a row?

Try using t() to transpose the matrix, and then apply the column 
function of interest.
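A minimal sketch of the transpose idea, using a made-up 2 x 3 matrix m (not from the thread). One caveat it illustrates: calling mean() on the transposed matrix still returns a single number for the whole matrix, so the column function has to be applied column by column, e.g. with apply(..., 2, FUN) or colMeans(). At that point apply(m, 1, FUN) on the original matrix gives the same result without the transpose.

```r
# Made-up example matrix: row 1 is 1, 3, 5 and row 2 is 2, 4, 6
m <- matrix(1:6, nrow = 2)

# After t(), the original rows become the columns of tm
tm <- t(m)

# A plain mean() still collapses the whole matrix to one number
whole_mean <- mean(tm)

# So the column function is applied per column (MARGIN = 2) ...
col_means <- colMeans(tm)
col_sd    <- apply(tm, 2, sd)

# ... which matches applying it per row of m (MARGIN = 1), no transpose needed
row_means <- rowMeans(m)
row_sd    <- apply(m, 1, sd)

print(col_means)
print(row_sd)
```

In other words, the transpose only moves the problem from rows to columns; the per-column (or per-row) step still has to come from apply() or a col*/row* function.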


Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.    URL: http://lcfltd.com/
824 Timberlake Drive             Tel: 757-467-0954
Virginia Beach, VA 23464-3239    Fax: 757-467-2947

"Vere scire est per causas scire"



Re: [R] applying math/stat functions to rows in data frame

2007-09-15 Thread Marc Schwartz
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
> 
> There are a variety of functions that can be applied to a variable 
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.
> 
> I am aware of only two that work on the rows, using q1-q3 as example 
> variables:
> 
> rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables
> 
> Can the standard column functions (listed in the first sentence) be 
> applied to rows, with the use of correct indexes to reference the 
> columns of interest?  Or, must these summary functions be programmed 
> separately to work on a row?
> 
> Thanks,
> 
> Gerard

The answer is: it depends.

If the row can be coerced to a numeric vector, then yes. This presumes
that the data frame contains a single data type or the subset of columns
you need contains a single data type.

If the row contains multiple data types, then the row becomes a single
row data frame or a list and you would have to consider other possible
approaches.
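One possible approach for the mixed-type case (an illustration, not something proposed in the thread) is to pick out the numeric columns programmatically and then apply a row-wise function to just those. This sketch uses the built-in iris data; num_cols, num_part and row_max are illustrative names only.

```r
# Which columns of iris are numeric? (Species is a factor, so FALSE)
num_cols <- vapply(iris, is.numeric, logical(1))

# Keep only the numeric columns, so each row can be treated as a numeric vector
num_part <- iris[, num_cols]

# Any row-wise summary can now be applied, e.g. the largest value in each row
row_max <- apply(num_part, 1, max)
head(row_max)
```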

For example:

Taking the first row of the 'iris' dataset becomes a single row data
frame:

> str(iris[1, ])
'data.frame':   1 obs. of  5 variables:
 $ Sepal.Length: num 5.1
 $ Sepal.Width : num 3.5
 $ Petal.Length: num 1.4
 $ Petal.Width : num 0.2
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1

or if you set 'drop = TRUE', a list:

> str(iris[1, , drop = TRUE])
List of 5
 $ Sepal.Length: num 5.1
 $ Sepal.Width : num 3.5
 $ Petal.Length: num 1.4
 $ Petal.Width : num 0.2
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1


If however, you remove the last column Species, which is a factor, you
can coerce the remaining object to a numeric matrix:

> str(as.matrix(iris[, -5]))
 num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
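A quick base-R check of why the factor column matters: as.matrix() needs a single type for the whole matrix, so with Species included the result is a character matrix, while without it the matrix is numeric.

```r
# With the Species factor included, the common type is character
full <- as.matrix(iris)
print(typeof(full))

# Without column 5, every column is numeric, and so is the matrix
nums <- as.matrix(iris[, -5])
print(typeof(nums))
print(dim(nums))
```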



Some functions, such as rowSums(), do this coercion internally, so they
only work if every column involved can be coerced to numeric.

For example:

> rowSums(iris)
Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric


However:

> head(rowSums(iris[, -5]))
[1] 10.2  9.5  9.4  9.4 10.2 11.4
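For summaries that have no row* counterpart, the same numeric subset can be passed to apply() with MARGIN = 1. A minimal sketch on iris (the median is just an example choice of function), which also checks that apply() with sum reproduces rowSums():

```r
# Numeric part of iris (drop the Species factor in column 5)
num_part <- iris[, -5]

# apply() over rows gives the same sums as rowSums(), only more slowly
row_sum_apply <- apply(num_part, 1, sum)
row_sum_fast  <- rowSums(num_part)
print(all.equal(row_sum_apply, row_sum_fast))

# ... and it also works for functions like median(), which has no rowMedians()
# in base R
row_median <- apply(num_part, 1, median)
head(row_median)
```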


HTH,

Marc Schwartz



Re: [R] applying math/stat functions to rows in data frame

2007-09-15 Thread Gavin Simpson
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
> 
> There are a variety of functions that can be applied to a variable 
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.

But on their own, these are not equivalent to rowMeans(), rowSums(),
etc. below.

> 
> I am aware of only two that work on the rows, using q1-q3 as example 
> variables:
> 
> rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables

If you really want to apply a function to the individual rows of a
matrix-like object, then apply() is your friend:

?rowMeans states:

Details:

 These functions are equivalent to use of 'apply' with 'FUN = mean'
 or 'FUN = sum' with appropriate margins, but are a lot faster.

So see ?apply and argument 'MARGIN'. For rows use MARGIN = 1, e.g.:

dat <- matrix(runif(1000), ncol = 100)  # 10 rows x 100 columns
apply(dat, 1, mean)                     # mean of each row
rowMeans(dat)                           # same values, computed faster
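The same pattern covers the other functions from the original question (sd, IQR, range). A minimal sketch, assuming set.seed(1) only to make the random matrix reproducible. One design point: range() returns two numbers per row, so apply() returns a 2 x 10 matrix with one column per row of dat, and t() is needed to get one row per original row.

```r
set.seed(1)                               # reproducible random example
dat <- matrix(runif(1000), ncol = 100)    # 10 rows x 100 columns

row_sd  <- apply(dat, 1, sd)              # one standard deviation per row
row_iqr <- apply(dat, 1, IQR)             # one interquartile range per row

# range() gives (min, max), so apply() returns 2 x 10; t() makes it 10 x 2
row_range <- t(apply(dat, 1, range))
colnames(row_range) <- c("min", "max")

head(row_range)
```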


> 
> Can the standard column functions (listed in the first sentence) be 
> applied to rows, with the use of correct indexes to reference the 
> columns of interest?  Or, must these summary functions be programmed 
> separately to work on a row?

You can only use those functions on a column via subsetting, e.g.:

mean(dat[,4])
min(dat[,4])

If all you want is a single row (the equivalent of what you seem to be
asking) then these also work:

mean(dat[4,])
min(dat[4,])
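Combining column indexing with apply() addresses the "correct indexes to reference the columns of interest" part of the question. A minimal sketch; the column choice c(1, 3, 5) is hypothetical, and set.seed(1) is only there for reproducibility. For min and max, pmin()/pmax() over the same columns give identical results.

```r
set.seed(1)
dat <- matrix(runif(1000), ncol = 100)   # 10 rows x 100 columns

# Hypothetical "columns of interest"
cols <- c(1, 3, 5)

# Row-wise minimum over just those columns
row_min <- apply(dat[, cols], 1, min)

# The same thing with pmin(), which works element-wise across vectors
row_min_p <- pmin(dat[, 1], dat[, 3], dat[, 5])

# A single row restricted to the same columns
row4_mean <- mean(dat[4, cols])
print(row4_mean)
```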

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] applying math/stat functions to rows in data frame

2007-09-17 Thread Gerard Smits
Hi all,

Thanks for the suggestions. I have not yet tried the apply()
approach, but have tried to get the indexed version working, so far
with limited success. I realize that a transpose, as suggested,
would work, but I want to avoid that in favour of something simpler.

To repeat, the task is to apply a function across several (all numeric)
columns within each row of a data frame. I can use rowMeans, rowSums,
pmin, and pmax to successfully average, sum, and find the min/max
(var1-var4). But if I try to get the mean (var5) using, no doubt, the
incorrect syntax mean(df[,c(4,5,6,7)], na.rm=T), I do not get the mean
of the 4 columns for each row. I am not sure what is being returned as
a value.

I summarize the code below with the output: any further suggestions 
are appreciated,

Thanks Gerard


Code with output:

df <- read.table(textConnection("id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,5"), header=TRUE, sep=",")

attach(df)

df$var1<-rowMeans(cbind(q1,q2,q3,q4),na.rm=T)
df$var2<-rowSums (cbind(q1,q2,q3,q4),na.rm=T)
df$var3<-pmin(q1,q2,q3,q4, na.rm=T)
df$var4<-pmax(q1,q2,q3,q4, na.rm=T)

df$var5<-mean(df[,c(4,5,6,7)],na.rm=T)  #not doing what I want
df$var6<-sd  (df[,c(4,5,6,5)],na.rm=T)  #not doing what I want
df$var7<-min (df[,c(4,5,6,5)],na.rm=T)  #not doing what I want
df$var8<-max (df[,c(4,5,6,5)],na.rm=T)  #not doing what I want

df


Output (the problem variables are var5 to var8):

  id workshop gender q1 q2 q3 q4 var1 var2 var3 var4     var5     var6 var7 var8
1  1        1      f  1  1  5  1 2.00    8    1    5     3.25 1.488048    1    5
2  2        2      f  2  1  4  1 2.00    8    1    4     2.75 1.752549    1    5
3  3        1      f  2  2  4  3 2.75   11    2    4 4.142857 1.069045    1    5
4  4        2      f  3  1 NA  3 2.33    7    1    3     3.25 1.752549    1    5
5  5        1      m  4  5  2  4 3.75   15    2    5     3.25 1.488048    1    5
6  6        2      m  5  4  5  5 4.75   19    4    5     2.75 1.752549    1    5
7  7        1      m  5  3  4  4 4.00   16    3    5 4.142857 1.069045    1    5
8  8        2      m  4  5  5  5 4.75   19    4    5     3.25 1.752549    1    5
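Judging from the values above, mean() of a data frame (in the R of that time) returned one mean per column, and those 4 numbers (3.25, 2.75, 4.142857, 3.25 for q1 to q4) were recycled down the 8 rows to fill var5. var6 follows the same pattern with per-column standard deviations (the index c(4,5,6,5) picks q2 twice), while min() and max() reduce the whole block to a single number (1 and 5). Below is a minimal sketch of the row-wise alternative with apply(), as suggested earlier in the thread. Assumptions: the data are re-created by hand with data.frame() rather than read.table(), and the colMeans() line only reproduces the old var5 for comparison, since recent versions of R return NA with a warning for mean() of a data frame.

```r
# Hand-made copy of the posted data (same values, NA for the blank q3)
df <- data.frame(
  id       = 1:8,
  workshop = rep(1:2, 4),
  gender   = rep(c("f", "m"), each = 4),
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5)
)

# What var5 contained: one mean per column (q1..q4), recycled down 8 rows
old_var5 <- rep_len(colMeans(df[, 4:7], na.rm = TRUE), nrow(df))

# Row-wise versions: MARGIN = 1 applies each function to the q1..q4 of one row
df$var5 <- apply(df[, 4:7], 1, mean, na.rm = TRUE)
df$var6 <- apply(df[, 4:7], 1, sd,   na.rm = TRUE)
df$var7 <- apply(df[, 4:7], 1, min,  na.rm = TRUE)
df$var8 <- apply(df[, 4:7], 1, max,  na.rm = TRUE)

print(df)
```

With this approach var5, var7 and var8 should agree with rowMeans(), pmin() and pmax() (var1, var3 and var4), and attach() is not needed.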
