On Jul 27, 2011, at 7:02 PM, a217 wrote:

Hello,

I have an input file:
http://r.789695.n4.nabble.com/file/n3700031/testOut.txt

where column 1 is chromosome, column 2 is start of region, column 3 is end of region, columns 4 and 5 are base position, column 6 is total reads, column 7
is methylation data, and column 8 is the strand.
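A minimal sketch of reading such a file in R, assuming it is whitespace-delimited with no header line. The column names (chr, start, end, pos1, pos2, reads, meth, strand) are hypothetical labels for the eight columns described above, and the toy rows are made up; with the real data, replace `text = toy` with `file = "testOut.txt"`.

```r
# Hypothetical names for the eight input columns described above.
cols <- c("chr", "start", "end", "pos1", "pos2", "reads", "meth", "strand")

# Made-up stand-in for testOut.txt (NOT the real data).
toy <- "chr1 100 159 101 101 3 0.10 +
chr1 100 159 120 120 0 0.20 -
chr1 200 260 210 210 5 0.90 +"

# Assumes whitespace-separated fields and no header line.
d <- read.table(text = toy, header = FALSE, col.names = cols,
                stringsAsFactors = FALSE)
str(d)
```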


I would like a summary output file such as:
http://r.789695.n4.nabble.com/file/n3700031/out.summary.txt

where column 1 is chromosome, column 2 is start of region, column 3 is end of region, column 4 is total reads in general, column 5 is total reads >=1, column 6 is the proportion (col5/col4) expressed as a percentage, and at the end I'd like to list 6
more columns holding the results of the summary() function in R.

The summary() function will be used to analyze all of the methylation data
(col7 from input) for each region (bounded by col2 and col3).

For example, for chr1 100 159, summary() gives:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0400  0.0425  0.0450  0.0450  0.0475  0.0500

which is simply summary() applied to the methylation values that fall
within the region chr1 100 159.
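To mirror the chr1 100 159 example, here is a small sketch that subsets one region and passes only its methylation values to summary(). The column names and values are hypothetical toy data, not the contents of testOut.txt.

```r
# Hypothetical toy data using assumed column names (not the real file).
d <- data.frame(
  chr   = rep("chr1", 5),
  start = c(100, 100, 100, 100, 200),
  end   = c(159, 159, 159, 159, 260),
  reads = c(3, 0, 7, 1, 5),
  meth  = c(0.10, 0.20, 0.30, 0.40, 0.90),
  stringsAsFactors = FALSE
)

# Logical index of the rows belonging to region chr1 100 159.
in_region <- d$chr == "chr1" & d$start == 100 & d$end == 159

# summary() sees only that region's methylation values.
s <- summary(d$meth[in_region])
print(s)
```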

I know how to compute each of the required quantities line by line, but the
hard part for me is taking input data with multiple positions in each region
and collapsing all of the summary results into one line identified by the
region.
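One way to get one line per region in base R, sketched on hypothetical toy data with assumed column names: split() the rows by region, collapse each piece to a one-row data frame, and rbind() the pieces. This assumes output column 4 means the number of positions in the region, column 5 the number of positions with at least one read, and the percentage their ratio (column 5 / column 4); adjust if the intended definitions differ.

```r
# Hypothetical toy data using assumed column names (not the real file).
d <- data.frame(
  chr   = rep("chr1", 5),
  start = c(100, 100, 100, 100, 200),
  end   = c(159, 159, 159, 159, 260),
  reads = c(3, 0, 7, 1, 5),
  meth  = c(0.10, 0.20, 0.30, 0.40, 0.90),
  stringsAsFactors = FALSE
)

# One key per region; levels kept in order of first appearance.
key <- paste(d$chr, d$start, d$end)
by_region <- split(d, factor(key, levels = unique(key)))

# Collapse one region's rows into a single output row.
one_line <- function(g) {
  s <- as.numeric(summary(g$meth))  # Min, 1st Qu, Median, Mean, 3rd Qu, Max
  data.frame(
    chr = g$chr[1], start = g$start[1], end = g$end[1],
    n_total = nrow(g),              # positions in the region (assumed col 4)
    n_ge1 = sum(g$reads >= 1),      # positions with >= 1 read (assumed col 5)
    prop = mean(g$reads >= 1),      # n_ge1 / n_total
    min = s[1], q1 = s[2], median = s[3],
    mean = s[4], q3 = s[5], max = s[6],
    stringsAsFactors = FALSE
  )
}

# Stack the one-row data frames into the final table.
out <- do.call(rbind, lapply(by_region, one_line))
rownames(out) <- NULL
print(out)
```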

If any of you have suggestions, I would appreciate it.

So essentially you want to drop columns 4:5 and column 8, calculate the proportion of counts >= 1, and get summary statistics within each start-of-region category. Is that correct?

This is probably a job for aggregate, or for ddply in plyr if I felt comfortable with it, which in general I don't. plyr's documentation in the help pages is not great IMO, but there are those who love it. And I admit the melt function is a major contributor to human happiness. Why don't you read up on aggregate, which is a base function (in the R sense, not the biological sense)? I will see what I can come up with in the meantime.
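Along the lines of the aggregate suggestion, a sketch using base R's formula interface to aggregate(), again on hypothetical toy data with assumed column names. Because FUN returns several values per region, aggregate() stores them as a single matrix column; do.call(data.frame, ...) flattens that into ordinary columns, and merge() joins the count and summary tables on the region keys.

```r
# Hypothetical toy data using assumed column names (not the real file).
d <- data.frame(
  chr   = rep("chr1", 5),
  start = c(100, 100, 100, 100, 200),
  end   = c(159, 159, 159, 159, 260),
  reads = c(3, 0, 7, 1, 5),
  meth  = c(0.10, 0.20, 0.30, 0.40, 0.90),
  stringsAsFactors = FALSE
)

# Per-region counts: positions, positions with >= 1 read, and their ratio.
counts <- aggregate(reads ~ chr + start + end, data = d,
                    FUN = function(x) c(n_total = length(x),
                                        n_ge1 = sum(x >= 1),
                                        prop = mean(x >= 1)))

# Per-region summary() of the methylation values.
stats <- aggregate(meth ~ chr + start + end, data = d, FUN = summary)

# aggregate() returns multi-value FUN results as one matrix column;
# do.call(data.frame, ...) spreads it into ordinary columns
# (e.g. reads.n_total, meth.Median).
counts <- do.call(data.frame, counts)
stats  <- do.call(data.frame, stats)

# Join on the region keys to get one line per region.
out <- merge(counts, stats, by = c("chr", "start", "end"))
print(out)
```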

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
