Re: [R] how to apply functions to unbalanced data in long format by factors......cant get by or aggregate to work
Hello R-users,

Petr's help led me to this solution to my problem.

t1 <- with(mydata, aggregate(mydata$Y, list(mydata$time, mydata$treatment, mydata$expREP, mydata$techREP), median, na.rm = TRUE)) ### find median by factors
colnames(t1) <- c("time", "treatment", "expREP", "techREP", "Y50") ### column names
newdata <- merge(mydata, t1, by.x = names(mydata)[2:5], by.y = names(t1)[1:4], all = TRUE)

Thank you,
Alan

###
Message: 97
Date: Thu, 08 Mar 2007 08:00:53 +0100
From: Petr Pikal [EMAIL PROTECTED]
Subject: Re: [R] how to apply functions to unbalanced data in long format by factors..cant get by or aggregate to work
To: ALAN SMITH [EMAIL PROTECTED], r-help@stat.math.ethz.ch
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=US-ASCII

Hi,

you can use aggregate to create a table of medians

with(mydata, aggregate(Y, list(time, treatment, expREP, techREP), median))

and find the number of repeats of each unique factor combination either by rle or by aggregate with the length function. Then you can do the replication by

norep <- rep(your.median, each = your.replicates)

Regards
Petr

[submitted question abridged]

Hello R users,

#Example data frame##
mydata <- as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer(c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class = "factor"), treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2, 1)), .Label = c("control", "trt"), class = "factor"), expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3, 3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1, 1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2, 2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2, 3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1, 1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1, 3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2, 3)), .Label = c("expREP1", "expREP2", "expREP3"), class = "factor"), techREP =
structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, 3, 3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2, 3, 1, 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2, 3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 2, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3, 1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1, 1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 1, 2, 2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, 3, 2, 3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1, 3, 2, 3, 3, 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2, 2, 3)), .Label = c("techREP1", "techREP2", "techREP3"), class = "factor"), log2Abun = c
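
A minimal sketch of the aggregate-then-merge approach from Alan's reply, run on a small made-up data frame. The name `toy` and its values are assumptions for illustration only, since the data posted to the list is cut off in the archive. It uses two grouping factors instead of four to keep the example short.

```r
# Hypothetical unbalanced long-format data (values are invented).
set.seed(1)
toy <- data.frame(
  cpdID     = rep(c(7, 8), each = 6),
  time      = factor(rep(c("24hr", "120hr"), times = 6)),
  treatment = factor(rep(c("control", "trt"), each = 3, times = 2)),
  Y         = round(rnorm(12, mean = 14), 2)
)
toy <- toy[-1, ]  # drop one row so the groups have unequal sizes

# Step 1: one median per observed factor combination.
# Passing toy["Y"] keeps only the numeric column in the summary.
med <- aggregate(toy["Y"], by = toy[c("time", "treatment")],
                 FUN = median, na.rm = TRUE)
names(med)[names(med) == "Y"] <- "Y50"

# Step 2: merge the medians back, which repeats each median
# on every row of its factor combination.
newdata <- merge(toy, med, by = c("time", "treatment"), all.x = TRUE)
```

Merging by column names rather than by position (`names(mydata)[2:5]`) is a design choice in this sketch: it keeps working if the column order changes.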
[R] how to apply functions to unbalanced data in long format by factors......cant get by or aggregate to work
Hello R users,

Problem: I do not understand how to use aggregate, by, or the appropriate apply function to perform a function on unbalanced data with more than one factor.

I have a data frame in the long format that does not contain balanced data. The ID is a unique identifier corresponding to the experimental unit that will later be examined by ANOVA, t-tests, etc. Y is the data generated from the experiment. The factors represent the differences between each sample or run measured.

str(mydata) ### sample of table at bottom of email ###
'data.frame': 129982 obs. of 6 variables:
 $ ID       : num 7 7 7 7 7 7 8 8 8 8 ...
 $ time     : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2 2 1 1 1 ...
 $ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2 1 1 2 1 1 ...
 $ expREP   : Factor w/ 3 levels "expREP1","expREP2",..: 1 1 1 3 1 1 1 1 2 2 ...
 $ techREP  : Factor w/ 3 levels "techREP1","techREP2",..: 3 2 1 1 1 3 1 3 3 2 ...
 $ Y        : num 14.4 14.1 14.2 13.8 14.1 ...

Could someone please help with doing something like the following?

1. I would like to find the median for each unique combination of factors using the data in the long format (like finding the median of a single column of data).
2. Create a new column where the median is repeated for the number of rows of the unique factor combination.
3. I would like to learn the most efficient way to do this, because I want to avoid recreating the table from scratch with many commands like the series below. I will have to perform this operation on many different data sets, some with many more factors than this example.
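
Steps 1 and 2 above can be sketched in a single call with base R's `ave()`, which applies a function within each factor combination and returns a vector as long as the input. This is an alternative to the approach discussed in the thread, not the posters' own method. The data frame `toy` and its values are made up for illustration.

```r
# Hypothetical stand-in for mydata, with unequal group sizes and one NA.
toy <- data.frame(
  ID        = c(7, 7, 7, 7, 8, 8, 8),
  time      = factor(c("24hr", "24hr", "120hr", "120hr", "24hr", "24hr", "24hr")),
  treatment = factor(c("control", "control", "trt", "trt", "control", "trt", "trt")),
  Y         = c(14.4, 14.1, 14.2, 13.8, 14.0, NA, 13.5)
)

# ave() computes the median within each time x treatment group and
# writes it back onto every row of that group.
toy$Y50 <- ave(toy$Y, toy$time, toy$treatment,
               FUN = function(y) median(y, na.rm = TRUE))
```

Because the result is already aligned with the rows of the data frame, no merge step is needed; further factors are added as extra arguments before `FUN`.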
### help me learn to use an apply or other command that will do the following #
m0 <- mydata$cpdID[mydata$time == "24hr" & mydata$treatment == "control" & mydata$expREP == "expREP1" & mydata$techREP == "techREP1"]
m1 <- mydata$Y[mydata$time == "24hr" & mydata$treatment == "control" & mydata$expREP == "expREP1" & mydata$techREP == "techREP1"]
m2 <- median(m1)
m3 <- cbind(ID = m0, time = rep("24hr", length(m1)), treatment = rep("control", length(m1)), expREP = rep("expREP1", length(m1)), techREP = rep("techREP1", length(m1)), Y = m1, Y50 = rep(m2, length(m1)))
# I would like to avoid writing the above hundreds of times ##

I am able to reshape into wide format and then find the column medians. However, restacking the data and regenerating the factors becomes very messy on data sets with 150 columns. I am able to perform this analysis in SAS easily using BY, but I would like to know how to do it in R.

I have tried these commands in a number of different variations with no luck and similar error messages:

test1 <- aggregate(mydata[,-1], list(mydata$time, mydata$treatment, mydata$expREP, mydata$techREP), median, na.rm = TRUE)
Error in median.default(X[[1]], ...) : need numeric data ### Y is numeric

test1 <- by(mydata[,-1], list(mydata$time, mydata$treatment, mydata$expREP, mydata$techREP), median, na.rm = TRUE)
Error in median.default(data[x, ], ...) : need numeric data

Thanks,
Alan

winXP, R 2.4.1

#Example data frame##
mydata <- as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer(c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class = "factor"), treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2,
2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2,
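
A short sketch of why the two calls in this message report "need numeric data", and one way around it. `mydata[,-1]` still contains the factor columns, so `aggregate` hands a factor to `median`, and `by` hands it a whole data-frame subset. The data frame `toy` below is invented for illustration.

```r
# Hypothetical miniature of the posted data (values are made up).
toy <- data.frame(
  time      = factor(c("24hr", "24hr", "120hr", "120hr")),
  treatment = factor(c("control", "control", "trt", "trt")),
  Y         = c(14.4, 14.1, 14.2, 13.8)
)

# median() refuses a factor; capture the message instead of stopping.
err <- tryCatch(median(toy$time), error = function(e) conditionMessage(e))

# aggregate(): summarise only the numeric column.
agg <- aggregate(toy["Y"], by = toy[c("time", "treatment")],
                 FUN = median, na.rm = TRUE)

# by(): FUN receives a data-frame subset, so select Y inside FUN.
res <- by(toy, toy[c("time", "treatment")],
          function(d) median(d$Y, na.rm = TRUE))
```

Factor combinations with no rows show up as `NA` in the `by` result, while `aggregate` simply omits them.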
[R] Help with tabs, and importing data
Hello,

I am new to R and I am trying to import data from microarray experiments and analyze it using the MAANOVA package and Bioconductor. I have searched this list's email archive and could not find the solution to my problem, so hopefully members of this group can help me.

OS: Windows XP, R 1.6.2

My problem: It seems that MS Excel is adding little boxes at the end of each cell when I save the file as tab delimited. Is there any way to let R know, via the sep argument, that these little boxes really mean a tab is present? I have had success using the comma-delimited format, but it seems that one of the programs I am using is designed to recognize tabs.

Another question: Is there a way to stop MS Excel (XP version) from placing these hidden boxes after each of the cells in a row, so that it places a real tab instead?

Could you please reply to my email address, because I am not a member of this list.

Thank you,
Alan Smith
Graduate Student
University of Wisconsin-Madison
Plant Breeding and Plant Genetics
[EMAIL PROTECTED]
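
A minimal sketch of reading a tab-delimited file in R, assuming the "little boxes" are simply how the viewing program displays tab characters (the message does not confirm this). The file is written to a temporary path here as a stand-in for the Excel export, and its contents are made up.

```r
# Create a small tab-delimited file in place of the Excel export.
path <- tempfile(fileext = ".txt")
writeLines(c("gene\tarray1\tarray2",
             "g1\t1.5\t2.0",
             "g2\t0.7\t1.1"), path)

# "\t" is R's escape sequence for a tab character.
dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# read.delim() is the same reader with tab-delimited defaults.
dat2 <- read.delim(path, stringsAsFactors = FALSE)

# Check whether the raw first line really contains a tab.
first_line <- readLines(path, n = 1)
has_tab <- grepl("\t", first_line, fixed = TRUE)
```

If `has_tab` is `FALSE` for the real file, the separator is something other than a tab, and inspecting the output of `readLines()` should show what it is.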