Re: [R] applying math/stat functions to rows in data frame

2007-09-17 Thread Gerard Smits
Hi all,

Thanks for the suggestions. I have not yet tried the apply() 
approach, but have tried to get the indexed version working, so far 
with limited success.  I realize that a transpose, as suggested, 
would work, but I want to avoid that in favor of something simpler.

To repeat, the task is to apply a function across N columns (all 
numeric) within each row of a data frame.  I can use rowMeans, rowSums, 
pmin, and pmax to successfully average, sum, and find the min/max 
(var1-var4).  But if I try to get the mean (var5) using what is no 
doubt the incorrect syntax, mean(df[,c(4,5,6,7)], na.rm=T), I do not 
get the mean of the 4 columns.  I am not sure what is being returned 
as a value.

I summarize the code below with the output: any further suggestions 
are appreciated,

Thanks Gerard


Code with output:

> df <- read.table(textConnection("id,workshop,gender,q1,q2,q3,q4
+ 1,1,f,1,1,5,1
+ 2,2,f,2,1,4,1
+ 3,1,f,2,2,4,3
+ 4,2,f,3,1, ,3
+ 5,1,m,4,5,2,4
+ 6,2,m,5,4,5,5
+ 7,1,m,5,3,4,4
+ 8,2,m,4,5,5,5"), header=TRUE, sep=",")
>
> attach(df)
>
> df$var1<-rowMeans(cbind(q1,q2,q3,q4),na.rm=T)
> df$var2<-rowSums (cbind(q1,q2,q3,q4),na.rm=T)
> df$var3<-pmin(q1,q2,q3,q4, na.rm=T)
> df$var4<-pmax(q1,q2,q3,q4, na.rm=T)
>
> df$var5<-mean(df[,c(4,5,6,7)],na.rm=T)  #not doing what I want
> df$var6<-sd  (df[,c(4,5,6,5)],na.rm=T)  #not doing what I want
> df$var7<-min (df[,c(4,5,6,5)],na.rm=T)  #not doing what I want
> df$var8<-max (df[,c(4,5,6,5)],na.rm=T)  #not doing what I want
>
> df


Output (the problem vars are var5 through var8):

  id workshop gender q1 q2 q3 q4 var1 var2 var3 var4     var5     var6 var7 var8
1  1        1      f  1  1  5  1 2.00    8    1    5     3.25 1.488048    1    5
2  2        2      f  2  1  4  1 2.00    8    1    4     2.75 1.752549    1    5
3  3        1      f  2  2  4  3 2.75   11    2    4 4.142857 1.069045    1    5
4  4        2      f  3  1 NA  3 2.33    7    1    3     3.25 1.752549    1    5
5  5        1      m  4  5  2  4 3.75   15    2    5     3.25 1.488048    1    5
6  6        2      m  5  4  5  5 4.75   19    4    5     2.75 1.752549    1    5
7  7        1      m  5  3  4  4 4.00   16    3    5 4.142857 1.069045    1    5
8  8        2      m  4  5  5  5 4.75   19    4    5     3.25 1.752549    1    5
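
For reference, the var5 values shown in the output (3.25, 2.75, 4.142857, 3.25, repeating) equal the column means of q1 to q4, which suggests that mean() was applied per column and the four results were then recycled down the rows. A minimal sketch of one way to get per-row values instead, using apply() with MARGIN = 1, follows. It assumes read.table(text = ...) in place of textConnection(), an empty field for the missing q3 value, and a helper matrix named qs that is not part of the original code.

```r
# Rebuild the example data. The empty q3 field in row 4 is read as NA.
df <- read.table(text = "id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1,,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,5", header = TRUE, sep = ",")

# Numeric matrix holding only the question columns ('qs' is a hypothetical name).
qs <- as.matrix(df[, c("q1", "q2", "q3", "q4")])

# MARGIN = 1 calls the function once per row; na.rm = TRUE is passed through.
df$var5 <- apply(qs, 1, mean, na.rm = TRUE)
df$var6 <- apply(qs, 1, sd,   na.rm = TRUE)
df$var7 <- apply(qs, 1, min,  na.rm = TRUE)
df$var8 <- apply(qs, 1, max,  na.rm = TRUE)

df
```

Selecting the columns by name rather than by position (4:7) avoids the kind of index slip that is easy to make when the same subset is typed several times.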
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] applying math/stat functions to rows in data frame

2007-09-15 Thread Robert A LaBudde
At 12:02 PM 9/15/2007, Gerard wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.
>
> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables
>
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest?  Or, must these summary functions be programmed
> separately to work on a row?

Try using t() to transpose the matrix, and then apply the column 
function of interest.
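
A minimal sketch of the transpose idea, using the first five rows of the built-in iris data as a stand-in for the poster's numeric columns (the object names x, tx, row_sd_via_t and row_sd_via_apply are illustrative only). After t(), each original row becomes a column, so a column-wise tool such as sapply() over a data frame can be used; apply() with MARGIN = 1 is shown alongside for comparison.

```r
# Numeric columns only: first five rows of the built-in iris data.
x <- iris[1:5, 1:4]

# t() returns a 4 x 5 numeric matrix; each original row is now a column.
tx <- as.data.frame(t(x))

# Column-wise standard deviation of the transposed data = row-wise sd of x.
row_sd_via_t <- sapply(tx, sd)

# The same values computed directly on the rows, without transposing.
row_sd_via_apply <- apply(x, 1, sd)

all.equal(unname(row_sd_via_t), unname(row_sd_via_apply))
```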


Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire



Re: [R] applying math/stat functions to rows in data frame

2007-09-15 Thread Marc Schwartz
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.
>
> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables
>
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest?  Or, must these summary functions be programmed
> separately to work on a row?
>
> Thanks,
>
> Gerard

The answer is: it depends.

If the row can be coerced to a numeric vector, then yes. This presumes
that the data frame contains a single data type or the subset of columns
you need contains a single data type.

If the row contains multiple data types, then the row becomes a single
row data frame or a list and you would have to consider other possible
approaches.

For example:

Taking the first row of the 'iris' dataset becomes a single row data
frame:

> str(iris[1, ])
'data.frame':   1 obs. of  5 variables:
 $ Sepal.Length: num 5.1
 $ Sepal.Width : num 3.5
 $ Petal.Length: num 1.4
 $ Petal.Width : num 0.2
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1

or if you set 'drop = TRUE', a list:

> str(iris[1, , drop = TRUE])
List of 5
 $ Sepal.Length: num 5.1
 $ Sepal.Width : num 3.5
 $ Petal.Length: num 1.4
 $ Petal.Width : num 0.2
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1


If however, you remove the last column Species, which is a factor, you
can coerce the remaining object to a numeric matrix:

> str(as.matrix(iris[, -5]))
 num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"



Some functions will do this coercion internally:

For example:

> rowSums(iris)
Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric


However:

> head(rowSums(iris[, -5]))
[1] 10.2  9.5  9.4  9.4 10.2 11.4
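
The coercion point can be checked directly. The sketch below is illustrative only: it shows that as.matrix() yields a non-numeric matrix while the factor column is present, then selects the numeric columns programmatically (num_cols and row_sd are hypothetical names) before applying a row-wise function.

```r
# With the factor column included, as.matrix() cannot produce a numeric matrix.
is.numeric(as.matrix(iris))
is.numeric(as.matrix(iris[, -5]))

# Pick out the numeric columns instead of hard-coding which one to drop.
num_cols <- sapply(iris, is.numeric)

# apply() coerces the numeric subset to a matrix and works row by row.
row_sd <- apply(iris[, num_cols], 1, sd)
head(row_sd)
```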


HTH,

Marc Schwartz



Re: [R] applying math/stat functions to rows in data frame

2007-09-15 Thread Gavin Simpson
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.

But on their own, these are not equivalents to rowMeans, rowSums etc.
below.

 
> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables

If you really want to apply a function to the individual rows of a
matrix-like object then apply() is your friend:

?rowMeans states:

Details:

 These functions are equivalent to use of 'apply' with 'FUN = mean'
 or 'FUN = sum' with appropriate margins, but are a lot faster.

So see ?apply and argument 'MARGIN'. For rows use MARGIN = 1, e.g.:

dat <- matrix(runif(1000), ncol = 100)
apply(dat, 1, mean)
rowMeans(dat)


 
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest?  Or, must these summary functions be programmed
> separately to work on a row?

You can only use those functions on a column via subsetting, e.g.:

mean(dat[,4])
min(dat[,4])

If all you want is a single row (the equivalent of what you seem to be
asking) then these also work:

mean(dat[4,])
min(dat[4,])
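
Note that dat above is a matrix, so dat[4,] is a plain numeric vector. With a data frame, a single row is still a one-row data frame, and in recent versions of R mean() on it returns NA with a warning. A minimal sketch of one workaround, assuming all selected columns are numeric, is to flatten the row with unlist() first (the data frame d and the name row2 are made up for illustration):

```r
# Small all-numeric data frame with one missing value.
d <- data.frame(q1 = c(1, 3), q2 = c(1, 1), q3 = c(5, NA), q4 = c(1, 3))

# unlist() turns the one-row data frame into a named numeric vector.
row2 <- unlist(d[2, ])

# The usual vector functions now apply to that single row.
mean(row2, na.rm = TRUE)
min(row2, na.rm = TRUE)
sd(row2, na.rm = TRUE)
```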

HTH

G

 
> Thanks,
>
> Gerard
 
 
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
