Hi Chris-
You can do this more easily using a sample sheet via dba() or, as you suggest,
add them in one at a time using dba.peakset.
Each unique peakset (line in the samplesheet/call to dba.peakset) is a unique
combination of metadata (including the tissue, factor, condition, treatment,
and replicate), peakset (intervals + scores), aligned reads (per ChIP), and an
Input. So what you are calling unique biological samples would each have two
entries. If your peak data are in separate files for each ChIP, the calls to
dba.peakset would look like:
DBA = dba.peakset(DBA, sampID="samp1_O-Glc", peaks="samp1HOMERpeaks.txt",
                  factor="O-GlcNAc", condition="male", treatment="nostress",
                  replicate=1, bamReads="samp1_O-GLcNAc.bam",
                  controlReads="samp1_Input.bam")
DBA = dba.peakset(DBA, sampID="samp1_H3K4me3", peaks="samp1K$windows.txt",
                  factor="H3K4me3", condition="male", treatment="nostress",
                  replicate=1, bamReads="samp1_O-H3K4me3.bam",
                  controlReads="samp1_Input.bam")
In this case, the "samp1HOMERpeaks.txt" file contains the O-GlcNAc peaks called
by HOMER for this sample, in a four-column format (chromosome, start, end,
peakscore), and the "samp1K$windows.txt" file contains the H3K4me3 scores for
the sample.
Again, all this is easier to keep track of in a sample sheet.
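As a rough sketch, the two entries above could be written as a sample sheet like
the one below. The column names are the ones I understand dba() to look for in a
sample sheet; the ControlID value is made up for illustration, and I have left
out the Tissue column since it is not needed here. The call to dba() is guarded
so that it only runs if DiffBind is installed and the peak files actually exist.

```r
# One row per ChIP library (not per animal); both rows share the same Input.
samples <- data.frame(
  SampleID   = c("samp1_O-Glc", "samp1_H3K4me3"),
  Factor     = c("O-GlcNAc", "H3K4me3"),
  Condition  = c("male", "male"),
  Treatment  = c("nostress", "nostress"),
  Replicate  = c(1, 1),
  bamReads   = c("samp1_O-GLcNAc.bam", "samp1_O-H3K4me3.bam"),
  ControlID  = c("samp1_Input", "samp1_Input"),          # hypothetical label
  bamControl = c("samp1_Input.bam", "samp1_Input.bam"),
  Peaks      = c("samp1HOMERpeaks.txt", "samp1K$windows.txt"),
  PeakCaller = c("raw", "raw"),   # assumes text files with the score in column 4
  stringsAsFactors = FALSE
)

# Build the DBA object from the sheet only when the inputs are really there.
if (requireNamespace("DiffBind", quietly = TRUE) &&
    all(file.exists(samples$Peaks))) {
  DBA <- DiffBind::dba(sampleSheet = samples)
}
```

The same table saved as a .csv file and passed by file name should behave the
same way; adding the other animals is then just a matter of adding rows.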
If you've already combined all the peak scores into one big dataframe with
columns for each library, you can pass in the ones you want using
the "peaks" parameter. Instead of setting peaks to a file containing the peaks
for the sample, pass in the dataframe with the first three columns set to
chromosome, start, and end, and the fourth column containing the score for that
specific library. You are probably better off not doing this, and loading all
the peak files separately, as DiffBind will create the combined table for you;
by supplying each individual peak file you can more easily look at how the
peaksets overlap.
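If you do want to go the dataframe route, a minimal sketch of pulling one
library out of the combined table might look like this. The table contents and
the column names (samp1_OGlcNAc, samp1_H3K4me3) are made up for illustration,
and library_peaks() and add_library() are hypothetical helpers, not part of
DiffBind. add_library() is only defined here, not called, so the block runs
without DiffBind installed.

```r
# Toy combined table: one row per genomic interval, one score column per library.
# All values are invented purely to show the shape of the data.
combined <- data.frame(
  chromosome    = c("chr1", "chr1", "chr2"),
  start         = c(1, 5001, 1),
  end           = c(5000, 10000, 5000),
  samp1_OGlcNAc = c(12.5, 0, 3.1),
  samp1_H3K4me3 = c(0, 40.2, 7.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the four-column (chr, start, end, score)
# dataframe for a single library.
library_peaks <- function(combined, lib) {
  stopifnot(lib %in% names(combined))
  out <- combined[, c("chromosome", "start", "end", lib)]
  names(out)[4] <- "score"
  out
}

oglc <- library_peaks(combined, "samp1_OGlcNAc")

# Hypothetical wrapper: pass the per-library dataframe as "peaks".
# Metadata and read files go through "..."; check ?dba.peakset for the
# exact argument names in your version.
add_library <- function(DBA, combined, lib, ...) {
  DiffBind::dba.peakset(DBA, peaks = library_peaks(combined, lib),
                        sampID = lib, ...)
}
```

You would call add_library() once per library, passing the factor, condition,
treatment, replicate, and read files the same way as in the dba.peakset calls
above.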
Cheers-
Rory
On 08/04/2013 20:32, "Christopher Howerton" <[email protected]> wrote:
Hi Rory,
First, thank you for putting together this R package for those of us whose
skills lie elsewhere! I have worked through your vignette, and it appears that
your package will do exactly what I require. My problem can be framed as having
difficulties loading data into the appropriate format. Let me give you a few
specifics:
Experimental design: a 2(sex) X 2(stress/nostress)
ChIP marks: H3K4me3 & O-GlcNAc (a PTM of interest to our lab)
libraries/biological sample: H3K4me3, O-GlcNAc and Input
Data I have available to me: Our core has aligned the reads, and done a peak
calling for me; Homer (I believe) for the H3K4me3, and a custom bin based
approach (5kb sized bins) for the O-Glcnac. I also have all the upstream files
available to me as well.
The format for the data is: colnames = chromosome, start, end, libraries; so
there is an entry per genomic locus per library, even though the number might
be zero.
So, my question is, what is the appropriate way to read this in/analyze?
Reading through the package documentation, I may want to make separate R
dataframes per library, and then read them into one dba object using the
dba.peakset function with peak.format = "raw". If this is the case, how do I
specify that there are 3 libraries from the same biological replicate (i.e.
H3K4me3, O-GlcNAc and Input)?
I'm sure this is pretty straightforward, so I apologize for bugging you, but I
just wanted to make sure I started on the right foot.
Best,
Chris
--
Christopher Howerton, PhD
Postdoctoral Researcher
(215) 898-1368
University of Pennsylvania
201E Vet
3800 Spruce Street
Philadelphia, PA 19104-6046
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel