[R] Grouping sets of data, performing function and re-assigning values
Hi there,

I hope you have time to read this question and offer a suggestion or two.

My basic question is this: I have data in sets of three. I would like to combine the data from each set, perform a function (probably just taking the median and MAD), then re-assign these values to each of the original sets of data.

As a bit of background, I have performed a microscopy screen and analyzed the images using software called CellProfiler. I have three 'control' images that I would like to combine and then compare parameters from each of my 'treatment' images to the combined control data.

However, there are a couple of 'wrinkles' to my problem:

Not all the data is actually in sets of three. If there are no objects in a particular field, CellProfiler does not output anything for that image. For example, I could have the following data set:

ImageNumber  Measurement
1            5
2            7
3            8
4            3
6            9

So in this example, I would like to combine images 1 to 3 in one set and 4 to 6 in another set (image 5 was empty).

Here is another 'wrinkle': For each image, there are multiple measurements based on the number of objects in the field of view I have measured. So my data actually looks something like this:

ImageNumber  Measurement
1            4
1            5
1            5
2            6
2            7
2            8
3            8
3            9
4            1
4            1
4            6
4            3
6            10
6            9
6            5
6            8

I have attached a .csv containing a portion of the actual data (in this example, images 1 through 3 are not present).

Right now, I can split the data.frame based on the ImageNumber column and perform functions, but I can't come up with a way of combining the data in a way that might account for the absence of some of the images.

Thanks for reading, and any help and suggestions are much appreciated.

JT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping sets of data, performing function and re-assigning values
Hi all,

Since I could not attach a file to my original e-mail request, for those who want to look at an example of a data file I am working with, please use this link:

http://dl.dropbox.com/u/4637975/exampledata.csv

Thanks again,

Johnny.
Re: [R] Grouping sets of data, performing function and re-assigning values
Hi Johnny,

If I understand correctly, I think you can use cut() to create a grouping variable, and then calculate your summaries based on that. Something like:

    dat <- read.csv("~/Downloads/exampledata.csv")

    # Bin image numbers into sets of three: (0,3], (3,6], (6,9], ...
    dat$image.group <- cut(dat$a.ImageNumber,
                           breaks = seq(0, max(dat$a.ImageNumber), by = 3))

    library(plyr)

    # Add the group median to every row
    # ('Measurement' is the generic column from the example in your e-mail)
    ddply(dat, .(image.group), transform,
          measure.median = median(Measurement))

    # One row of medians per image group
    dat.med <- ddply(dat, .(image.group), summarize,
      a.AreaShape_Area.median = median(a.AreaShape_Area),
      a.Intensity_IntegratedIntensity_OrigRFP.median = median(a.Intensity_IntegratedIntensity_OrigRFP),
      a.Intensity_IntegratedIntensity_OrigGFP.median = median(a.Intensity_IntegratedIntensity_OrigGFP),
      b.Intensity_MeanIntensity_OrigGFP.median = median(b.Intensity_MeanIntensity_OrigGFP),
      EstCytoIntensity.median = median(EstCytoIntensity),
      TotalIntensity.median = median(TotalIntensity),
      NucToCytoRatio.median = median(NucToCytoRatio)
    )

Best,
Ista

On Fri, Aug 27, 2010 at 5:28 PM, Johnny Tkach johnny.tk...@utoronto.ca wrote:
> Since I could not attach a file to my original e-mail request, for those
> who want to look at an example of a data file I am working with, please
> use this link: http://dl.dropbox.com/u/4637975/exampledata.csv

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
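The cut() approach above can be tried on the small table from the original question without the CSV file or plyr. What follows is only a base-R sketch under a few assumptions: the column names are the generic `ImageNumber` and `Measurement` from the e-mail, the names `set.size`, `top` and `group.stats` are invented for the illustration, and the upper break is rounded up to a multiple of three so that a trailing image (say, image 7 when the maximum is 7) does not fall outside the last bin and get NA.

```r
# Toy data from the original question: image 5 is missing because it was empty.
dat <- data.frame(
  ImageNumber = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 6, 6, 6, 6),
  Measurement = c(4, 5, 5, 6, 7, 8, 8, 9, 1, 1, 6, 3, 10, 9, 5, 8)
)

set.size <- 3  # images per set (taken from the question)

# Round the upper break up to a multiple of set.size so that no image number
# lies beyond the last break.
top <- ceiling(max(dat$ImageNumber) / set.size) * set.size
dat$image.group <- cut(dat$ImageNumber, breaks = seq(0, top, by = set.size))

# Median and MAD of Measurement for each set of images.
# tapply() returns one entry per factor level, with NA for empty levels.
group.stats <- data.frame(
  image.group = levels(dat$image.group),
  median      = as.vector(tapply(dat$Measurement, dat$image.group, median)),
  mad         = as.vector(tapply(dat$Measurement, dat$image.group, mad))
)

print(group.stats)
```

Images 4 and 6 land in the same bin, (3,6], even though image 5 is absent, which is the behaviour the question asks for. Note that mad() in R scales by 1.4826 by default; pass `constant = 1` if the raw median absolute deviation is wanted.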
Re: [R] Grouping sets of data, performing function and re-assigning values
Hi Ista,

Thanks for the help. The 'cut' function seems to do the trick.

I'm not sure why you suggested this line of code:

    ddply(dat, .(image.group), transform, measure.median = median(Measurement))

I think I might have confused the issue by putting a 'Measurement' column in my example in the body of the e-mail, while there is no such column in the actual data. The second ddply call on the cut data seems to do the trick for taking the median of the relevant data.

However, I still have one more question. Would it be possible to assign the median data back to the original a.ImageNumber values? In this situation, the same data would be associated with images 1 through 3, another set would be associated with images 4 through 6, and so on.

For example (again, I use 'Measurement' just as a generic column):

ImageNumber  Measurement
1            1
1            2
1            3
2            2
2            2
3            4
3            3
3            3
3            4

where the median of all the 'Measurement' data is 3, and the output would be:

ImageNumber  Measurement
1            3
1            3
1            3
2            3
2            3
3            3
3            3
3            3
3            3

or

ImageNumber  Measurement
1            3
2            3
3            3

I really appreciate your help with this.

JT

Johnny Tkach, PhD
Donnelly CCBR, Department of Biochemistry
University of Toronto

On Aug 27, 2010, at 2:01 PM, Ista Zahn wrote:
> If I understand correctly, I think you can use cut() to create a grouping
> variable, and then calculate your summaries based on that.
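Both desired outputs in the example above can be reproduced on the toy table with base R's ave(), which returns a vector the same length as its input with each element replaced by its group summary. This is only a sketch, assuming that sets are consecutive runs of three image numbers; the column names `image.set` and `set.median` and the object `per.image` are made up for the illustration.

```r
# Toy data from the example: nine measurements spread over images 1 to 3.
dat <- data.frame(
  ImageNumber = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Measurement = c(1, 2, 3, 2, 2, 4, 3, 3, 4)
)

# Images 1-3 form set 1, images 4-6 form set 2, and so on.
dat$image.set <- (dat$ImageNumber - 1) %/% 3 + 1

# First desired output: every row carries the median of its whole set.
dat$set.median <- ave(dat$Measurement, dat$image.set, FUN = median)

# Second desired output: one row per image.
per.image <- unique(dat[, c("ImageNumber", "set.median")])

print(dat)
print(per.image)
```

Because the set is computed directly from the image number, a missing image (such as image 5 in the original question) simply contributes no rows and does not shift the grouping.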
Re: [R] Grouping sets of data, performing function and re-assigning values
Hi Johnny,

Something like this:

    rbind(NA, dat.med)[as.numeric(dat$image.group), ]

should do the trick (with the data you provided and Ista's code).

The key is that dat.med has one row for each level of the factor image.group that actually contains data, in the same order as the levels. The idea is to convert the factor created by cut(), which shows which row belongs to which group, into numbers (2, 2, 2, 3, 3, 3, etc.), and use those numbers to select the appropriate rows from dat.med. Since level 1 had no data (so no row in dat.med), I just added an NA row at the top using rbind().

Supposing levels 1, 13, and 15 were missing, you would need to insert rows in the appropriate positions for my code to work. This is because if you have a factor with 3 levels (like you do), when converted to numbers they will be 1, 2, and 3. Even if there is no actual data for level 1, the numeric conversions of levels 2 and 3 will still be 2 and 3. So you need to make sure that row 2 in dat.med matches level 2 in image.group, and so on for every other level.

HTH,

Josh

On Fri, Aug 27, 2010 at 12:26 PM, Johnny Tkach johnny.tk...@utoronto.ca wrote:
> However, I still have one more question. Would it be possible to assign
> the median data back to the original a.ImageNumber number. In this
> situation, the same data would be associated with images 1 through 3 and
> another set associated with images 4 through 6 and so on.
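The positional indexing described above depends on the rows of dat.med lining up with the factor levels. One way to avoid inserting NA rows by hand is to look rows up by level label with match(). The sketch below is not the code from the thread: it uses base R's aggregate() in place of ddply(), and the toy data and the column names `value` and `value.median` are hypothetical, chosen only so that images 1 through 3 are absent as in the example file.

```r
# Hypothetical toy data: images 1-3 are absent, as in the example file.
dat <- data.frame(
  a.ImageNumber = c(4, 4, 5, 6, 7, 8, 8, 9),
  value         = c(2, 4, 6, 8, 1, 3, 5, 7)
)
dat$image.group <- cut(dat$a.ImageNumber, breaks = seq(0, 9, by = 3))

# Per-group medians; groups without data (here "(0,3]") get no row.
dat.med <- aggregate(value ~ image.group, data = dat, FUN = median)
names(dat.med)[2] <- "value.median"

# Look rows up by level label instead of by position, so it does not matter
# which levels are empty or how many there are.
idx <- match(as.character(dat$image.group), as.character(dat.med$image.group))
dat$value.median <- dat.med$value.median[idx]

print(dat)
```

With the labels matched explicitly, missing levels anywhere in the sequence (1, 13, 15, and so on) need no special handling. merge(dat, dat.med, by = "image.group") gives a similar result, though it may reorder the rows.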