On 09/19/2011 11:55 AM, Michael Lawrence wrote:


On Mon, Sep 19, 2011 at 11:31 AM, Martin Morgan <mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org>> wrote:

    On 09/19/2011 11:26 AM, Rene Paradis wrote:

        Thanks Martin and Michael for your constructive advices,

        I used the ScanBamParam object to successfully load a part of
        the Chr1
        from a Bam file via ScanBam. Honestly I do not know what are the
        differences between readGappedAlignments, readBamGappedAlignment and
        ScanBam. The last two of them can take a  ScanBamParam object.


    scanBam returns a list-of-lists, it's the most flexible but least
    'user-friendly'.

    readGappedAlignments is meant to be a 'front end' to read
    GappedAlignments from several different sources, and
    readBamGappedAlignments is meant to be one of those sources; usually
    the 'user' would readGappedAlignments.


        But I wished I could select the seqname in GRanges to retrieve
        all the
        chr1 (as an example) data from the Bam file. It seems I must
        select a
        range. So I put a value that goes beyond the range of the chr1
        because I
        do not know that range, and I got an<<INTEGER () can only be
        applied to
        a 'integer', not a special>>.


Couldn't Rsamtools give something more informative?

The info in the original post isn't enough to understand how the error occurs.


        There must be something I missed that
        could help me doing that.


    see ?scanBamHeader, e.g.,

     >  fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
     > scanBamHeader(fl)[[1]]$targets
    seq1 seq2
    1575 1584

Would be nice to have a method for getting a Seqinfo out of a BAM
header. Then one can just coerce that to a GRanges. rtracklayer does the
equivalent for BigWig.

In devel,

> seqinfo(open(BamFile(fl)))
Seqinfo of length 2
seqnames seqlengths isCircular
seq1           1575         NA
seq2           1584         NA

I think this needs to be updated to deal with recent changes to Seqinfo to store the reference genome (which is sometimes also present in the BAM file).

Maritn

Michael

    Martin



        ultimately, I want to launch a PICS analysis that requires a
        segReadsList object.

        Overall I definitely progressed by your help, thank you.

        Rene




        On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote:

            On 09/16/2011 02:11 PM, Michael Lawrence wrote:

                It sounds like you're trying to use BED as an
                alternative to BAM? Probably
                not a good idea, especially at this scale. Why are you
                aiming for a
                GenomeData? A GappedAlignments might be more
                appropriate. See
                GenomicRanges::__readGappedAlignments() for bringing a
                BAM into a
                GappedAlignments.


            Hi Rene

            the 'which' argument to readGappedAlignments (it'll become
            'param' with
            the next release, and be a ScanBamParam object) allows you
            to select
            regions to process, e.g., chromosome-at-a-time, to help with
            file size.

            Martin


                This page might help:
                
http://bioconductor.org/help/__workflows/high-throughput-__sequencing/#sequencing-__resources
                
<http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources>

                But it could really be improved.

                Michael

                On Fri, Sep 16, 2011 at 1:44 PM, Rene
                Paradis<rene.paradis@genome.__ulaval.ca
                <mailto:rene.para...@genome.ulaval.ca>

                    wrote:


                    Hello,

                    I am experiencing a problem regarding the load in
                    memory of bed files of
                    30 GB. my function read.table unleash the error :
                    Error in unique(x) :
                    length xxxxxx is too large for hashing.

                    this is generated by the function MKsetup of the
                    unique.c file. Even by
                    increasing by 10 000x the value, the error persists.
                    I believe the
                    function pushes more data in ram, but I am not sure
                    this is the good way
                    to focus on.

                    Ultimately, I would like to produce a GenomeData
                    object from either a
                    BAM file or a bed file.

                    has someone ever worked with very very big BAM files
                    (about 30 GB)

                    thanks

                    Rene paradis

                    _________________________________________________
                    Bioc-sig-sequencing mailing list
                    Bioc-sig-sequencing@r-project.__org
                    <mailto:Bioc-sig-sequencing@r-project.org>
                    https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing
                    <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing>


                        [[alternative HTML version deleted]]

                _________________________________________________
                Bioc-sig-sequencing mailing list
                Bioc-sig-sequencing@r-project.__org
                <mailto:Bioc-sig-sequencing@r-project.org>
                https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing
                <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing>







    --
    Computational Biology
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

    Location: M1-B861
    Telephone: 206 667-2793 <tel:206%20667-2793>

    _________________________________________________
    Bioc-sig-sequencing mailing list
    Bioc-sig-sequencing@r-project.__org
    <mailto:Bioc-sig-sequencing@r-project.org>
    https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing
    <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing>




--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to