Hello Michael,

A high-level vignette covering the BioC sequence infrastructure would be great.

Also, I can be more specific about a class problem I am facing. It concerns a 
development package that I am privileged to be allowed to test: chipseq.

I am trying to follow a typical workflow guide as shown here:

http://www.bioconductor.org/workshops/2009/SeattleJan09/ChIP-seq/ChipSeqWorkflow.pdf

As you can see, the data that the package uses is not raw data but data that 
has been read in and labelled somehow beforehand. The document shows

load("../data/alignedLocs.rda")

That is not the scenario a user will find. A user will have one or several 
s_X_export.txt files.

So, my attempt to read in my data in the simplest case is this:

> library(chipseq)
> library(lattice)
> setwd('/scratch1/igregore/ChIPseq/runs/09-04-10/GERALD_14-04-2009_niddk/')
> pattern <- "s_1_export.txt"
> alignedLocs <- as(readAligned(".",
+                               pattern,
+                               "SolexaExport",
+                               filter = alignDataFilter(expression(filtering == "Y"))),
+                   "GenomeData")
> class(alignedLocs)
[1] "GenomeData"
attr(,"package")
[1] "BSgenome"

The guide says that alignedLocs should be a GenomeDataList class object but it 
shows up as class GenomeData. The guide also shows

> alignedLocs
  A GenomeDataList instance of length 3

but when I try it as is I get:

>  alignedLocs
   A GenomeData instance of length 51154

To try to figure this out by myself I went to

http://www.bioconductor.org/docs/

and searched everywhere for the string GenomeDataList. I got zero hits, so I 
do not know where to start.

As you can see, the problem I face is not actually the chipseq package itself 
but how to prepare the data to make it analysable by chipseq.

Can you shed some light on this?

Thank you!

Ivan




________________________________
From: Michael Lawrence <[email protected]>

Cc: [email protected]
Sent: Monday, 20 April, 2009 15:00:01
Subject: Re: [Bioc-sig-seq] A myriad of classes







Hello fellow listers,

Is there a document summarizing the myriad of data containing classes?


No, not yet. We're working on a vignette for the IRanges package (we'll have 
something in about a week), which will need to be complemented by additional 
vignettes in Biostrings and BSgenome. There is probably also a need for a 
high-level vignette explaining the sequence infrastructure in BioC.



I am trying to find a map to help me understand the differences between, 
say, GenomicData, GenomeData, GenomeDataList, etc.

I need to be able to inter-convert data, merge different sources of data, and 
also subset data.


Can you be more specific? GenomicData is no longer a class, but there is a 
GenomicData function, which is a genome-oriented constructor for RangedData in 
the rtracklayer package. GenomeData (from the BSgenome package) is for storing 
arbitrary data objects on a per chromosome level. RangedData (in IRanges) is 
similar, except the data need to fit into a rectangular data.frame-like 
structure.
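
To make that distinction concrete, here is a minimal sketch in plain base R. 
It deliberately does not use the real GenomeData or RangedData classes; the 
object names and all values are made up for illustration, and the actual 
classes may behave differently in detail.

```r
# Illustration only: base-R stand-ins, not the real GenomeData or
# RangedData classes. All names and values here are hypothetical.

# GenomeData-like: one arbitrary object per chromosome. Here each element
# is a list of strand-specific start positions, but it could be anything.
per_chrom <- list(
  chr1 = list(`+` = c(100L, 250L, 900L), `-` = c(400L, 1200L)),
  chr2 = list(`+` = c(50L),              `-` = c(75L, 300L, 310L))
)

# RangedData-like: every record has to fit the same set of columns.
rect <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(100L, 400L, 50L),
  end   = c(135L, 435L, 85L),
  score = c(1.5, 0.7, 2.1),
  stringsAsFactors = FALSE
)

# Per-chromosome work is possible in both cases.
n_plus        <- sapply(per_chrom, function(x) length(x[["+"]]))
rect_by_chrom <- split(rect, rect$chrom)
```

The first form accepts whatever object you put under each chromosome name, 
while the second only accepts data that can be laid out as rows and columns.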

Hope this helps and sorry for the confusion,
Michael



For a single class, I think that finding the information is easy. For many 
classes, it gets challenging.

A possible solution is reading the whole BioC documentation, but then BioC 
raises a productivity issue for users who are not developers.

Can anybody advise?

Thank you,

Ivan




_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


      