On Tue, 2011-09-20 at 06:32 -0700, Martin Morgan wrote: > On 09/20/2011 05:56 AM, Rene Paradis wrote: > > On Mon, 2011-09-19 at 12:10 -0700, Martin Morgan wrote: > >> On 09/19/2011 11:55 AM, Michael Lawrence wrote: > >>> > >>> > >>> On Mon, Sep 19, 2011 at 11:31 AM, Martin Morgan<mtmor...@fhcrc.org > >>> <mailto:mtmor...@fhcrc.org>> wrote: > >>> > >>> On 09/19/2011 11:26 AM, Rene Paradis wrote: > >>> > >>> Thanks Martin and Michael for your constructive advices, > >>> > >>> I used the ScanBamParam object to successfully load a part of > >>> the Chr1 > >>> from a Bam file via ScanBam. Honestly I do not know what are the > >>> differences between readGappedAlignments, readBamGappedAlignment > >>> and > >>> ScanBam. The last two of them can take a ScanBamParam object. > >>> > >>> > >>> scanBam returns a list-of-lists, it's the most flexible but least > >>> 'user-friendly'. > >>> > >>> readGappedAlignments is meant to be a 'front end' to read > >>> GappedAlignments from several different sources, and > >>> readBamGappedAlignments is meant to be one of those sources; usually > >>> the 'user' would readGappedAlignments. > >>> > >>> > >>> But I wished I could select the seqname in GRanges to retrieve > >>> all the > >>> chr1 (as an example) data from the Bam file. It seems I must > >>> select a > >>> range. So I put a value that goes beyond the range of the chr1 > >>> because I > >>> do not know that range, and I got an<<INTEGER () can only be > >>> applied to > >>> a 'integer', not a special>>. > >>> > >>> > >>> Couldn't Rsamtools give something more informative? > >> > >> The info in the original post isn't enough to understand how the error > >> occurs. > >> > > To reproduce the original error: > > choose a 30GB Bam file. and run: > > dataIP<- read.table(fileName,header = TRUE, colClasses - > > c("factor","integer","factor")) > > > > after a while, it will trow the error. > > > > > > the seqinfo above did not work for me, probably I do not have the latest > > devel version of Seqinfo. > > > >> fl > > [1] "/opt/R/lib/Rsamtools/extdata/ex1.bam" > >> seqinfo(open(BamFile(fl))) > > Error in function (classes, fdef, mtable) : > > unable to find an inherited method for function "seqinfo", for > > signature "BamFile" > >> > > > > > > otherwise, we have splitted the Big bam into chromosomes with samtools. > > We were then able to load these smaller files. > > Hi Rene -- > > Here's a BAM file, for example > > library(Rsamtools) > fl <- system.file("extdata", "ex1.bam", package="Rsamtools") > > Get the information about each sequence length from the BAM file, and > make it a GRanges object, 1 ranges per sequence > > t <- scanBamHeader(fl)[[1]][["targets"]] > which <- GRanges(names(t), IRanges(1, unname(t))) > > Then iterate over the object, either in a 'for' loop > > for (i in seq_along(which)) { > ## read one chromosome > param <- ScanBamParam(which=which[i], what=character()) > aln <- readGappedAlignments(fl, param=param) > ## do more work... > } > > or lapply > > lapply(seq_along(which), function(i, fl, which) { > param <- ScanBamParam(which=which[i], what=character()) > aln <- readGappedAlignments(fl, param=param) > ## do more work > table(width(aln)) > }, fl, which) > > Hope that helps, > > Martin > Thanks Martin, your advices were really useful.
Here is what I did: t <- scanBamHeader("example.bam")[[1]][["targets"]] which <- GRanges(names(t)[22], IRanges(1, unname(t)[22])) param <- ScanBamParam(which=which, what=character()) aln <- readGappedAlignments("example.bam", param=param) Error in validObject(.Object) : invalid class "GappedAlignments" objects: 'ranges' contains values outside of sequence bounds This bam file comes from an Illumina sequencing machine and it is human genome. I think now this is not a problem regarding a large BAM file, but a problem in the bam file itself. Rene > > > > > > > > >>> > >>> There must be something I missed that > >>> could help me doing that. > >>> > >>> > >>> see ?scanBamHeader, e.g., > >>> > >>> > fl<- system.file("extdata", "ex1.bam", package="Rsamtools") > >>> > scanBamHeader(fl)[[1]]$targets > >>> seq1 seq2 > >>> 1575 1584 > >>> > >>> Would be nice to have a method for getting a Seqinfo out of a BAM > >>> header. Then one can just coerce that to a GRanges. rtracklayer does the > >>> equivalent for BigWig. > >> > >> In devel, > >> > >> > seqinfo(open(BamFile(fl))) > >> Seqinfo of length 2 > >> seqnames seqlengths isCircular > >> seq1 1575 NA > >> seq2 1584 NA > >> > >> I think this needs to be updated to deal with recent changes to Seqinfo > >> to store the reference genome (which is sometimes also present in the > >> BAM file). > >> > > I think this update would be useful for us. Thank you for your support > > Martin and Michael. > > > > Rene > > > >> Maritn > >>> > >>> Michael > >>> > >>> Martin > >>> > >>> > >>> > >>> ultimately, I want to launch a PICS analysis that requires a > >>> segReadsList object. > >>> > >>> Overall I definitely progressed by your help, thank you. > >>> > >>> Rene > >>> > >>> > >>> > >>> > >>> On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote: > >>> > >>> On 09/16/2011 02:11 PM, Michael Lawrence wrote: > >>> > >>> It sounds like you're trying to use BED as an > >>> alternative to BAM? Probably > >>> not a good idea, especially at this scale. Why are you > >>> aiming for a > >>> GenomeData? A GappedAlignments might be more > >>> appropriate. See > >>> GenomicRanges::__readGappedAlignments() for bringing a > >>> BAM into a > >>> GappedAlignments. > >>> > >>> > >>> Hi Rene > >>> > >>> the 'which' argument to readGappedAlignments (it'll become > >>> 'param' with > >>> the next release, and be a ScanBamParam object) allows you > >>> to select > >>> regions to process, e.g., chromosome-at-a-time, to help with > >>> file size. > >>> > >>> Martin > >>> > >>> > >>> This page might help: > >>> > >>> http://bioconductor.org/help/__workflows/high-throughput-__sequencing/#sequencing-__resources > >>> > >>> <http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources> > >>> > >>> But it could really be improved. > >>> > >>> Michael > >>> > >>> On Fri, Sep 16, 2011 at 1:44 PM, Rene > >>> Paradis<rene.paradis@genome.__ulaval.ca > >>> <mailto:rene.para...@genome.ulaval.ca> > >>> > >>> wrote: > >>> > >>> > >>> Hello, > >>> > >>> I am experiencing a problem regarding the load in > >>> memory of bed files of > >>> 30 GB. my function read.table unleash the error : > >>> Error in unique(x) : > >>> length xxxxxx is too large for hashing. > >>> > >>> this is generated by the function MKsetup of the > >>> unique.c file. Even by > >>> increasing by 10 000x the value, the error persists. > >>> I believe the > >>> function pushes more data in ram, but I am not sure > >>> this is the good way > >>> to focus on. > >>> > >>> Ultimately, I would like to produce a GenomeData > >>> object from either a > >>> BAM file or a bed file. > >>> > >>> has someone ever worked with very very big BAM files > >>> (about 30 GB) > >>> > >>> thanks > >>> > >>> Rene paradis > >>> > >>> _________________________________________________ > >>> Bioc-sig-sequencing mailing list > >>> Bioc-sig-sequencing@r-project.__org > >>> <mailto:Bioc-sig-sequencing@r-project.org> > >>> > >>> https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing > >>> > >>> <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing> > >>> > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _________________________________________________ > >>> Bioc-sig-sequencing mailing list > >>> Bioc-sig-sequencing@r-project.__org > >>> <mailto:Bioc-sig-sequencing@r-project.org> > >>> > >>> https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing > >>> > >>> <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> -- > >>> Computational Biology > >>> Fred Hutchinson Cancer Research Center > >>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > >>> > >>> Location: M1-B861 > >>> Telephone: 206 667-2793<tel:206%20667-2793> > >>> > >>> _________________________________________________ > >>> Bioc-sig-sequencing mailing list > >>> Bioc-sig-sequencing@r-project.__org > >>> <mailto:Bioc-sig-sequencing@r-project.org> > >>> https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing > >>> <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing> > >>> > >>> > >> > >> > > > > > > _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing