Re: [R] applying math/stat functions to rows in data frame
Hi all,

Thanks for the suggestions. I have not yet tried the apply() approach, but have tried to get the indexed version working, so far with limited success. I realize that a transpose, as suggested, would work, but I want to avoid that in favor of something simpler.

To repeat, the task is to apply a function across N rows (all numeric) in a data frame. I can use rowMeans, rowSums, pmin, and pmax to successfully average, sum, and find the min/max (var1-var4). But if I try to get the mean (var5) using what is no doubt the incorrect syntax, mean(df[,c(4,5,6,7)], na.rm=T), I do not get the mean of the 4 columns. I am not sure what is being returned as a value.

I summarize the code below with the output. Any further suggestions are appreciated.

Thanks,
Gerard

Code with output:

df <- read.table(textConnection("id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,5"), header=TRUE, sep=",")

attach(df)

df$var1 <- rowMeans(cbind(q1,q2,q3,q4),na.rm=T)
df$var2 <- rowSums (cbind(q1,q2,q3,q4),na.rm=T)
df$var3 <- pmin(q1,q2,q3,q4, na.rm=T)
df$var4 <- pmax(q1,q2,q3,q4, na.rm=T)
df$var5 <- mean(df[,c(4,5,6,7)],na.rm=T) #not doing what I want
df$var6 <- sd  (df[,c(4,5,6,5)],na.rm=T) #not doing what I want
df$var7 <- min (df[,c(4,5,6,5)],na.rm=T) #not doing what I want
df$var8 <- max (df[,c(4,5,6,5)],na.rm=T) #not doing what I want

df

Output (the problem variables are var5-var8):

  id workshop gender q1 q2 q3 q4 var1 var2 var3 var4     var5     var6 var7 var8
1  1        1      f  1  1  5  1 2.00    8    1    5 3.25     1.488048    1    5
2  2        2      f  2  1  4  1 2.00    8    1    4 2.75     1.752549    1    5
3  3        1      f  2  2  4  3 2.75   11    2    4 4.142857 1.069045    1    5
4  4        2      f  3  1 NA  3 2.33    7    1    3 3.25     1.752549    1    5
5  5        1      m  4  5  2  4 3.75   15    2    5 3.25     1.488048    1    5
6  6        2      m  5  4  5  5 4.75   19    4    5 2.75     1.752549    1    5
7  7        1      m  5  3  4  4 4.00   16    3    5 4.142857 1.069045    1    5
8  8        2      m  4  5  5  5 4.75   19    4    5 3.25     1.752549    1    5

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
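The var5 values in the output above (3.25, 2.75, 4.142857, 3.25, repeating) match the column means of q1-q4, recycled down the rows, so mean() seems to have worked column by column rather than row by row. A minimal sketch of the apply() route the poster had not yet tried is given here, assuming q1-q4 sit in columns 4-7. The helper name `q` is illustrative, the blank q3 value is written as an explicit NA, and read.table(text = ...) stands in for textConnection(). As far as I know, recent R versions return NA with a warning from mean() on a data frame instead of column means, so the original call may behave differently today.

```r
# Toy data from the post; the blank q3 in row 4 is written as NA here.
df <- read.table(text = "id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1,NA,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,5", header = TRUE, sep = ",")

# Numeric matrix of q1..q4 (assumed to be columns 4 to 7).
q <- as.matrix(df[, 4:7])

# MARGIN = 1 applies the function to each row; na.rm is passed through.
df$var5 <- apply(q, 1, mean, na.rm = TRUE)
df$var6 <- apply(q, 1, sd,   na.rm = TRUE)
df$var7 <- apply(q, 1, min,  na.rm = TRUE)
df$var8 <- apply(q, 1, max,  na.rm = TRUE)

df
```

With this, var5 should agree with the rowMeans() result in var1, and var7/var8 with the pmin()/pmax() results in var3/var4.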
Re: [R] applying math/stat functions to rows in data frame
At 12:02 PM 9/15/2007, Gerard wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.
>
> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T) #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T) #sum of multiple variables
>
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest? Or, must these summary functions be programmed
> separately to work on a row?

Try using t() to transpose the matrix, and then apply the column function of interest.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS   e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.             URL: http://lcfltd.com/
824 Timberlake Drive                      Tel: 757-467-0954
Virginia Beach, VA 23464-3239             Fax: 757-467-2947

Vere scire est per causas scire
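The transpose suggestion in this reply can be sketched roughly as follows. This is only an illustration: the matrix `m` holds the first three rows of q1-q4 from the thread's toy data, and the names `tm`, `row_sd`, and `row_iqr` are made up for the example.

```r
# Three rows of q1..q4 taken from the thread's toy data.
m <- rbind(c(1, 1, 5, 1),
           c(2, 1, 4, 1),
           c(2, 2, 4, 3))

# After transposing, each original row is a column of tm.
tm <- t(m)

# Column-wise summaries of tm are row-wise summaries of m.
row_sd  <- apply(tm, 2, sd)
row_iqr <- apply(tm, 2, IQR)

row_sd
row_iqr
```

The same numbers come out of apply(m, 1, sd) and apply(m, 1, IQR) without the transpose, so t() is mainly useful when the column function of interest cannot be handed to apply() directly.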
Re: [R] applying math/stat functions to rows in data frame
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.
>
> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T) #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T) #sum of multiple variables
>
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest? Or, must these summary functions be programmed
> separately to work on a row?
>
> Thanks,
>
> Gerard

The answer is: it depends.

If the row can be coerced to a numeric vector, then yes. This presumes that the data frame contains a single data type, or that the subset of columns you need contains a single data type.

If the row contains multiple data types, then the row becomes a single-row data frame or a list, and you would have to consider other possible approaches.

For example, taking the first row of the 'iris' dataset gives a single-row data frame:

  str(iris[1, ])
  'data.frame':   1 obs. of  5 variables:
   $ Sepal.Length: num 5.1
   $ Sepal.Width : num 3.5
   $ Petal.Length: num 1.4
   $ Petal.Width : num 0.2
   $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1

or, if you set 'drop = TRUE', a list:

  str(iris[1, , drop = TRUE])
  List of 5
   $ Sepal.Length: num 5.1
   $ Sepal.Width : num 3.5
   $ Petal.Length: num 1.4
   $ Petal.Width : num 0.2
   $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1

If, however, you remove the last column, Species, which is a factor, you can coerce the remaining object to a numeric matrix:

  str(as.matrix(iris[, -5]))
   num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

Some functions will do this coercion internally. For example:

  rowSums(iris)
  Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric

However:

  head(rowSums(iris[, -5]))
  [1] 10.2  9.5  9.4  9.4 10.2 11.4

HTH,

Marc Schwartz
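As a small illustration of the coercion point above, the sketch below selects the numeric columns of 'iris' programmatically and then applies an arbitrary row-wise function. The names `num_cols`, `m_all`, `m_num`, and `row_range` are illustrative only; nothing here is claimed to be the approach used in the thread.

```r
# TRUE for the four measurement columns, FALSE for the factor Species.
num_cols <- sapply(iris, is.numeric)

# Including the factor column forces as.matrix() to build a character matrix,
# which row-wise numeric functions cannot use.
m_all <- as.matrix(iris)

# Restricting to numeric columns gives a numeric matrix.
m_num <- as.matrix(iris[, num_cols])

# Any function of a numeric vector can now be applied per row,
# here the spread between the largest and smallest measurement.
row_range <- apply(m_num, 1, function(r) diff(range(r)))

head(row_range)
```

Checking is.numeric() on the matrix before calling apply() is a cheap way to catch the mixed-type case early.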
Re: [R] applying math/stat functions to rows in data frame
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.

But on their own, these are not equivalents to rowMeans, rowSums etc. below.

> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T) #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T) #sum of multiple variables

If you really want to apply a function to the individual rows of a matrix-like object then apply() is your friend. ?rowMeans states:

  Details:

     These functions are equivalent to use of 'apply' with 'FUN = mean'
     or 'FUN = sum' with appropriate margins, but are a lot faster.

So see ?apply and the argument 'MARGIN'. For rows use MARGIN = 1, e.g.:

  dat <- matrix(runif(1000), ncol = 100)
  apply(dat, 1, mean)
  rowMeans(dat)

> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest? Or, must these summary functions be programmed
> separately to work on a row?

You can only use those functions on a column via subsetting, e.g.:

  mean(dat[,4])
  min(dat[,4])

If all you want is a single row (the equivalent of what you seem to be asking) then these also work:

  mean(dat[4,])
  min(dat[4,])

HTH

G

> Thanks,
>
> Gerard

--
Gavin Simpson                 [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk