Hello Michael,

A high-level vignette covering the BioC infrastructure would be great.
Also, I can be more specific about a class problem I am facing. It concerns a development package that I am privileged to be allowed to test: chipseq. I am trying to follow a typical workflow guide as shown here:

http://www.bioconductor.org/workshops/2009/SeattleJan09/ChIP-seq/ChipSeqWorkflow.pdf

As you can see, the data that the package uses is not raw data but data that has been read in and labelled somehow beforehand. The document shows

load("../data/alignedLocs.rda")

That is not the scenario a user will find. A user will have one or several s_X_export.txt files. So, my attempt to read in my data in the simplest case is this:

> library(chipseq)
> library(lattice)
> setwd('/scratch1/igregore/ChIPseq/runs/09-04-10/GERALD_14-04-2009_niddk/')
> pattern <- "s_1_export.txt"
> alignedLocs <- as(readAligned(".",
+     pattern,
+     "SolexaExport",
+     filter = alignDataFilter(expression(filtering == "Y"))),
+     "GenomeData")
> class(alignedLocs)
[1] "GenomeData"
attr(,"package")
[1] "BSgenome"

The guide says that alignedLocs should be an object of class GenomeDataList, but it shows up as class GenomeData. The guide also shows

> alignedLocs
A GenomeDataList instance of length 3

but when I try it as is I get:

> alignedLocs
A GenomeData instance of length 51154

To try to figure this out by myself I went to http://www.bioconductor.org/docs/ and searched everywhere for the string GenomeDataList. I got zero hits, which means that I do not know where to start.

As you can see, the problem I face is not actually the chipseq package itself but how to prepare the data to make it analysable by chipseq. Can you shed some light on this?

Thank you!
Ivan

________________________________
From: Michael Lawrence <[email protected]>
Cc: [email protected]
Sent: Monday, 20 April, 2009 15:00:01
Subject: Re: [Bioc-sig-seq] A myriad of classes

> Hello fellow listers,
> Is there a document summarizing the myriad of data containing classes?

No, not yet.
We're working on a vignette for the IRanges package (we'll have something in about a week), which will need to be complemented by additional vignettes in Biostrings and BSgenome. There is probably also a need for a high-level vignette explaining the sequence infrastructure in BioC.

> I am trying to find a map to help me understand what is the difference
> between, say, GenomicData, GenomeData, GenomeDataList, etc. I need to be
> able to inter-convert data, merge different sources of data, and also
> subset data.

Can you be more specific? GenomicData is no longer a class, but there is a GenomicData function, which is a genome-oriented constructor for RangedData in the rtracklayer package. GenomeData (from the BSgenome package) is for storing arbitrary data objects on a per-chromosome level. RangedData (in IRanges) is similar, except the data need to fit into a rectangular data.frame-like structure.

Hope this helps and sorry for the confusion,
Michael

> For a single class, I think that finding the information is easy. For many
> classes, it gets challenging. A possible solution is reading the whole BioC
> documentation, but then BioC raises a productivity issue for users that are
> not developers. Can anybody advise?
>
> Thank you,
> Ivan

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
