On 09/20/2011 05:56 AM, Rene Paradis wrote:
On Mon, 2011-09-19 at 12:10 -0700, Martin Morgan wrote:
On 09/19/2011 11:55 AM, Michael Lawrence wrote:
On Mon, Sep 19, 2011 at 11:31 AM, Martin Morgan<mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org>> wrote:
On 09/19/2011 11:26 AM, Rene Paradis wrote:
Thanks Martin and Michael for your constructive advices,
I used the ScanBamParam object to successfully load a part of
the Chr1
from a Bam file via ScanBam. Honestly I do not know what are the
differences between readGappedAlignments, readBamGappedAlignment and
ScanBam. The last two of them can take a ScanBamParam object.
scanBam returns a list-of-lists, it's the most flexible but least
'user-friendly'.
readGappedAlignments is meant to be a 'front end' to read
GappedAlignments from several different sources, and
readBamGappedAlignments is meant to be one of those sources; usually
the 'user' would readGappedAlignments.
But I wished I could select the seqname in GRanges to retrieve
all the
chr1 (as an example) data from the Bam file. It seems I must
select a
range. So I put a value that goes beyond the range of the chr1
because I
do not know that range, and I got an<<INTEGER () can only be
applied to
a 'integer', not a special>>.
Couldn't Rsamtools give something more informative?
The info in the original post isn't enough to understand how the error
occurs.
To reproduce the original error:
choose a 30GB Bam file. and run:
dataIP<- read.table(fileName,header = TRUE, colClasses -
c("factor","integer","factor"))
after a while, it will trow the error.
the seqinfo above did not work for me, probably I do not have the latest
devel version of Seqinfo.
fl
[1] "/opt/R/lib/Rsamtools/extdata/ex1.bam"
seqinfo(open(BamFile(fl)))
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "seqinfo", for
signature "BamFile"
otherwise, we have splitted the Big bam into chromosomes with samtools.
We were then able to load these smaller files.
Hi Rene --
Here's a BAM file, for example
library(Rsamtools)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
Get the information about each sequence length from the BAM file, and
make it a GRanges object, 1 ranges per sequence
t <- scanBamHeader(fl)[[1]][["targets"]]
which <- GRanges(names(t), IRanges(1, unname(t)))
Then iterate over the object, either in a 'for' loop
for (i in seq_along(which)) {
## read one chromosome
param <- ScanBamParam(which=which[i], what=character())
aln <- readGappedAlignments(fl, param=param)
## do more work...
}
or lapply
lapply(seq_along(which), function(i, fl, which) {
param <- ScanBamParam(which=which[i], what=character())
aln <- readGappedAlignments(fl, param=param)
## do more work
table(width(aln))
}, fl, which)
Hope that helps,
Martin
There must be something I missed that
could help me doing that.
see ?scanBamHeader, e.g.,
> fl<- system.file("extdata", "ex1.bam", package="Rsamtools")
> scanBamHeader(fl)[[1]]$targets
seq1 seq2
1575 1584
Would be nice to have a method for getting a Seqinfo out of a BAM
header. Then one can just coerce that to a GRanges. rtracklayer does the
equivalent for BigWig.
In devel,
> seqinfo(open(BamFile(fl)))
Seqinfo of length 2
seqnames seqlengths isCircular
seq1 1575 NA
seq2 1584 NA
I think this needs to be updated to deal with recent changes to Seqinfo
to store the reference genome (which is sometimes also present in the
BAM file).
I think this update would be useful for us. Thank you for your support
Martin and Michael.
Rene
Maritn
Michael
Martin
ultimately, I want to launch a PICS analysis that requires a
segReadsList object.
Overall I definitely progressed by your help, thank you.
Rene
On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote:
On 09/16/2011 02:11 PM, Michael Lawrence wrote:
It sounds like you're trying to use BED as an
alternative to BAM? Probably
not a good idea, especially at this scale. Why are you
aiming for a
GenomeData? A GappedAlignments might be more
appropriate. See
GenomicRanges::__readGappedAlignments() for bringing a
BAM into a
GappedAlignments.
Hi Rene
the 'which' argument to readGappedAlignments (it'll become
'param' with
the next release, and be a ScanBamParam object) allows you
to select
regions to process, e.g., chromosome-at-a-time, to help with
file size.
Martin
This page might help:
http://bioconductor.org/help/__workflows/high-throughput-__sequencing/#sequencing-__resources
<http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources>
But it could really be improved.
Michael
On Fri, Sep 16, 2011 at 1:44 PM, Rene
Paradis<rene.paradis@genome.__ulaval.ca
<mailto:rene.para...@genome.ulaval.ca>
wrote:
Hello,
I am experiencing a problem regarding the load in
memory of bed files of
30 GB. my function read.table unleash the error :
Error in unique(x) :
length xxxxxx is too large for hashing.
this is generated by the function MKsetup of the
unique.c file. Even by
increasing by 10 000x the value, the error persists.
I believe the
function pushes more data in ram, but I am not sure
this is the good way
to focus on.
Ultimately, I would like to produce a GenomeData
object from either a
BAM file or a bed file.
has someone ever worked with very very big BAM files
(about 30 GB)
thanks
Rene paradis
_________________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.__org
<mailto:Bioc-sig-sequencing@r-project.org>
https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing
<https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.__org
<mailto:Bioc-sig-sequencing@r-project.org>
https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing
<https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793<tel:206%20667-2793>
_________________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.__org
<mailto:Bioc-sig-sequencing@r-project.org>
https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing
<https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing