[R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Johnny Tkach
Hi there,

I hope you have time to read this question and offer a suggestion or two.

My basic question is this:  

I have data in sets of three.  I would like to combine the data from each set, 
perform a function (probably just taking the median and MAD), then re-assign 
these values to each of the original sets of data.

As a bit of background, I have performed a microscopy screen and analyzed the 
images using software called CellProfiler.  I have three 'control' images that 
I would like to combine and then compare parameters from each of my 'treatment' 
images to the combined control data.

However, there are a couple of 'wrinkles' to my problem. Not all the data are 
actually in sets of three: if there are no objects in a particular field, 
CellProfiler does not output anything for that image.  For example, I could 
have the following data set:

ImageNumber Measurement
1   5
2   7
3   8
4   3
6   9

So in this example, I would like to combine images 1 to 3 in one set and 4 to 6 
in another set (image 5 was empty).

Here is another 'wrinkle':
For each image, there are multiple measurements, one for each object in the 
field of view I have measured.  So my data actually look something like 
this:

ImageNumber Measurement
1   4
1   5
1   5
2   6
2   7
2   8
3   8
3   9
4   1
4   1
4   6
4   3
6   10
6   9
6   5
6   8
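
A minimal sketch of the grouping described above, using only the toy table from this message. The column name image.set and the integer-division rule are assumptions for illustration, not something from the actual CellProfiler output. Because the set is derived from the image number itself, the missing image 5 does not shift image 6 into the wrong set.

```r
# Toy data copied from the example table (image 5 produced no objects).
dat <- data.frame(
  ImageNumber = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 6, 6, 6, 6),
  Measurement = c(4, 5, 5, 6, 7, 8, 8, 9, 1, 1, 6, 3, 10, 9, 5, 8)
)

# Images 1-3 -> set 1, images 4-6 -> set 2, and so on.
# 'image.set' is a hypothetical column name.
dat$image.set <- (dat$ImageNumber - 1) %/% 3 + 1

# Pool all measurements within a set, then summarise.
set.median <- tapply(dat$Measurement, dat$image.set, median)
set.mad    <- tapply(dat$Measurement, dat$image.set, mad)

print(set.median)
print(set.mad)
```

Note that mad() scales by a constant of 1.4826 by default; pass constant = 1 if the raw median absolute deviation is wanted.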

I have attached a .csv containing a portion of the actual data (in this 
example, images 1 through 3 are not present).

Right now, I can split the data.frame based on the ImageNumber column and 
perform functions, but I can't come up with a way of combining the data in a 
way that might account for the absence of some of the images.

Thanks for reading and any help and suggestions are much appreciated.   


JT





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Johnny Tkach
Hi all,


Since I could not attach a file to my original e-mail request, for those who 
want to look at an example of a data file I am working with, please use this 
link:

http://dl.dropbox.com/u/4637975/exampledata.csv

Thanks again,

Johnny.



Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Ista Zahn
Hi Johnny,

If I understand correctly, I think you can use cut() to create a grouping
variable, and then calculate your summaries based on that. Something like

dat <- read.csv("~/Downloads/exampledata.csv")

dat$image.group <- cut(dat$a.ImageNumber,
                       breaks = seq(0, max(dat$a.ImageNumber), by = 3))
library(plyr)
ddply(dat, .(image.group), transform, measure.median = median(Measurement))

dat.med <- ddply(dat, .(image.group), summarize,
  a.AreaShape_Area.median = median(a.AreaShape_Area),
  a.Intensity_IntegratedIntensity_OrigRFP.median =
    median(a.Intensity_IntegratedIntensity_OrigRFP),
  a.Intensity_IntegratedIntensity_OrigGFP.median =
    median(a.Intensity_IntegratedIntensity_OrigGFP),
  b.Intensity_MeanIntensity_OrigGFP.median =
    median(b.Intensity_MeanIntensity_OrigGFP),
  EstCytoIntensity.median = median(EstCytoIntensity),
  TotalIntensity.median = median(TotalIntensity),
  NucToCytoRatio.median = median(NucToCytoRatio)
  )
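
One thing that may be worth checking with the cut() call: the breaks stop at the last multiple of 3 that does not exceed the largest image number, so if that maximum is not itself a multiple of 3, the trailing images fall outside every interval and get NA. Whether this matters for the actual file is not known here. The sketch below uses made-up image numbers, and the names img, upper, grp.orig and grp.padded are hypothetical.

```r
# Hypothetical image numbers; the largest (10) is not a multiple of 3.
img <- c(4, 6, 7, 8, 10)

# Breaks as in the code above: 0, 3, 6, 9. Image 10 lies beyond the last break.
grp.orig <- cut(img, breaks = seq(0, max(img), by = 3))

# Round the upper limit up to the next multiple of 3 so every image is covered.
upper <- 3 * ceiling(max(img) / 3)
grp.padded <- cut(img, breaks = seq(0, upper, by = 3))

print(grp.orig)
print(grp.padded)
```

With the padded breaks, image 10 lands in the interval (9,12] rather than being dropped.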

Best,
Ista

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Johnny Tkach
Hi Ista,

Thanks for the help.  The 'cut' function seems to do the trick.

I'm not sure why you suggested this line of code:
 ddply(dat, .(image.group), transform, measure.median = median(Measurement))

I think I might have confused the issue by putting a 'Measurement' column in my 
example in the body of the e-mail, while there is no such column in the actual 
data.

The second ddply call on the cut data seems to do the trick for taking the 
median of the relevant data. However, I still have one more question.  
Would it be possible to assign the median data back to each original 
a.ImageNumber?  In this situation, the same data would be associated 
with images 1 through 3, another set would be associated with images 4 
through 6, and so on.

For example (again, I use 'Measurement' just as a generic column):

ImageNumber Measurement
1   1
1   2
1   3
2   2
2   2
3   4
3   3
3   3
3   4

where the median of all the 'Measurement' data is 3 and the output would be:

ImageNumber Measurement
1   3
1   3
1   3
2   3
2   3
3   3
3   3
3   3
3   3

or

ImageNumber Measurement
1   3
2   3
3   3
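
A minimal base-R sketch of both output shapes, using the generic 'Measurement' example from this message. It relies on ave(), which returns one value per input row, so the pooled median is repeated for every observation in the set. The columns image.set and set.median are hypothetical names, and ave() is swapped in here for illustration rather than taken from the earlier replies.

```r
# Generic example from this message: images 1-3 form a single set.
dat <- data.frame(
  ImageNumber = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Measurement = c(1, 2, 3, 2, 2, 4, 3, 3, 4)
)

# Hypothetical grouping column: images 1-3 -> set 1, images 4-6 -> set 2, ...
dat$image.set <- (dat$ImageNumber - 1) %/% 3 + 1

# First shape: the set median repeated on every original row.
dat$set.median <- ave(dat$Measurement, dat$image.set, FUN = median)

# Second shape: one row per image, carrying its set's median.
per.image <- unique(dat[c("ImageNumber", "set.median")])

print(dat)
print(per.image)
```

The FUN argument of ave() has to be named, because unnamed arguments after the first are treated as grouping variables.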

I really appreciate your help with this.

JT

Johnny Tkach, PhD
Donnelly CCBR, Rm. 1230 
Department of Biochemistry
University of Toronto
160 College Street
M5S 3E1

phone - 416 946 5774
fax - 416 978 8548
johnny.tk...@utoronto.ca

Beauty's just another word I'm never certain how to spell






Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Joshua Wiley
Hi Johnny,

Something like this

rbind(NA, dat.med)[as.numeric(dat$image.group), ]

should do the trick (with the data you provided and Ista's code).  The
key is that dat.med has a different row for each level of the factor
image.group (and in the same order).  The idea is to convert the
factor created by cut that shows which row belongs to which group into
numbers (2, 2, 2, 3, 3, 3, etc.), and use that to select the
appropriate rows from dat.med.

Since level 1 had no data (so no row in dat.med), I just added an NA
row in using rbind().  Supposing levels 1, 13, and 15 were missing,
you would need to insert rows in the appropriate positions for my code
to work.  This is because if you have a factor with 3 levels (like you
do), when converted to numbers they will be 1, 2, and 3.  Even if
there is no actual data for level 1, the numeric conversions of levels
2 and 3 will still be 2 and 3.  So, you need to make sure that row 2
in dat.med, matches level 2 in image.group, and so on for every other
level.
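
If padding rows by hand becomes awkward, one possible alternative is to look rows up by group label instead of by position, so empty levels anywhere in the factor need no placeholder. The sketch below is self-contained and uses made-up data in which the first and third levels are empty. It uses aggregate() from base R in place of ddply() so that it runs without plyr, and the column names are hypothetical.

```r
# Made-up data: no images in 1-3 or 7-9, so two factor levels are empty.
dat <- data.frame(
  ImageNumber = c(4, 4, 5, 6, 10, 11, 12),
  Measurement = c(2, 4, 6, 8, 1, 3, 5)
)
dat$image.group <- cut(dat$ImageNumber, breaks = seq(0, 12, by = 3))

# One row per group that has data; empty groups get no row.
dat.med <- aggregate(Measurement ~ image.group, data = dat, FUN = median)
names(dat.med)[2] <- "Measurement.median"

# Find each observation's group label in the summary table, by name.
idx <- match(as.character(dat$image.group),
             as.character(dat.med$image.group))
dat$Measurement.median <- dat.med$Measurement.median[idx]

print(dat)
```

merge(dat, dat.med, by = "image.group") should give a similar result, although it may reorder the rows.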

HTH,

Josh


