Re: [Biojava-l] Fasta parsing question

Toorn, H.W.P. van den (Henk) Wed, 17 Jun 2015 02:26:36 -0700

Hi Andreas, thanks very much. I've put together some (working) code to illustrate how I think this should work. The artificial sample FASTA file contains only one sequence:



---------------
>test test
PEPTIDEK

---------------

If you use a larger FASTA file, the file is first parsed correctly, but when it finishes, the loop just continues. I'm aware I'm probably doing something wrong in my code, but it's just not clear to me how to do it correctly, and that's basically my question.


The code below loops forever; the output repeats this:

--------------

11:18:56 [main] WARN org.biojava.nbio.core.sequence.io.FastaReader - Can't parse sequence 12. Got sequence of length 0!
11:18:56 [main] WARN org.biojava.nbio.core.sequence.io.FastaReader - header: test test

test test
---------------

package nl.hecklab.bioinformatics.fastafilereaderexample;

import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompoundSet;
import org.biojava.nbio.core.sequence.io.FastaReader;
import org.biojava.nbio.core.sequence.io.GenericFastaHeaderParser;
import org.biojava.nbio.core.sequence.io.ProteinSequenceCreator;

/**
 *
 * @author toorn101
 */
public class App {

    public App() {
        try {
            InputStream inStream = this.getClass().getResourceAsStream("/test.fasta");
            FastaReader<ProteinSequence, AminoAcidCompound> fastaReader = new FastaReader<>(
                    inStream,
                    new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                    new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));

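            LinkedHashMap<String, ProteinSequence> b;
            // With BioJava 4.0.0 this loop never ends: process(10) keeps returning a
            // non-null map holding the last header with a zero-length sequence.
            while ((b = fastaReader.process(10)) != null) {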
                for (String seq : b.keySet()) {
                    System.out.println(seq);
                }
            }
        } catch (IOException ex) {
            Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static void main(String[] args) {
        new App();
    }

}
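Until the end-of-file behaviour of process() is settled, a defensive version of the loop above can stop on any of the plausible end signals: null, an empty map, or a chunk containing only zero-length sequences (the symptom in the log output above). This is a minimal sketch under assumptions: ChunkSource is a hypothetical interface that only mirrors the shape of FastaReader.process(int), and sequences are plain Strings so it runs without BioJava. With real ProteinSequence values the zero-length test would be on the sequence length instead.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Defensive drain loop; ChunkSource is a hypothetical stand-in, not a BioJava type. */
public class DefensiveChunkLoop {

    /** Returns up to max records (header -> sequence) per call. */
    interface ChunkSource {
        LinkedHashMap<String, String> process(int max) throws IOException;
    }

    /**
     * Reads chunks until the source signals end of input and returns the number
     * of non-empty records seen. Accepted end signals: a null return, an empty
     * map, or a chunk whose records all have zero-length sequences.
     */
    static int drain(ChunkSource source, int chunkSize) throws IOException {
        int total = 0;
        LinkedHashMap<String, String> chunk;
        while ((chunk = source.process(chunkSize)) != null && !chunk.isEmpty()) {
            int nonEmpty = 0;
            for (Map.Entry<String, String> record : chunk.entrySet()) {
                if (!record.getValue().isEmpty()) {
                    nonEmpty++; // handle record.getKey() / record.getValue() here
                }
            }
            if (nonEmpty == 0) {
                break; // only zero-length records: treat as end of input
            }
            total += nonEmpty;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Fake source: one real chunk, then zero-length records forever.
        int[] calls = {0};
        ChunkSource source = max -> {
            LinkedHashMap<String, String> chunk = new LinkedHashMap<>();
            if (calls[0]++ == 0) {
                chunk.put("test test", "PEPTIDEK");
            } else {
                chunk.put("test test", "");
            }
            return chunk;
        };
        System.out.println(drain(source, 10)); // prints 1
    }
}
```

The zero-length check is a heuristic: it would also stop early on an input whose remaining records are genuinely empty, so it is a workaround rather than a substitute for a documented end-of-input signal.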


On 6/17/2015 7:04 AM, Andreas Prlic wrote:

Hi Henk,

Do you want to share some code snippets so we can help you debug?

Thanks,

Andreas

On Mon, Jun 15, 2015 at 1:58 AM, Toorn, H.W.P. van den (Henk) <[email protected]> wrote:


    Dear List,

    I've just started using BioJava 4.0.0 in my projects, and wanted
    to ask a question about parsing large FASTA files. There is an
    option to read a FASTA file in parts:

    FastaReader.process(number)

    The problem I have is that it's not documented what happens once
    the whole file has been read. I was expecting null, an empty map,
    or even an exception, but none of these happened and the parser
    kept on producing (empty) sequences.
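The end-of-input contract Henk expected can be sketched in plain Java. This is a hypothetical reader (ChunkedFastaSketch is an illustrative name, not BioJava's FastaReader): process(max) returns at most max records and returns null once the input is exhausted, so a while ((chunk = reader.process(10)) != null) loop like the one in this thread terminates. Treating max < 1 as "no limit" is an assumption of the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;

/** Hypothetical chunked FASTA reader (not BioJava) that returns null at end of input. */
public class ChunkedFastaSketch {
    private final BufferedReader in;
    private String pendingHeader; // header read ahead when the previous chunk filled up
    private boolean eof;

    public ChunkedFastaSketch(BufferedReader in) {
        this.in = in;
    }

    /** Returns up to max records (max < 1 means no limit), or null at end of input. */
    public LinkedHashMap<String, String> process(int max) throws IOException {
        if (eof) {
            return null; // nothing left to read
        }
        LinkedHashMap<String, String> chunk = new LinkedHashMap<>();
        StringBuilder seq = new StringBuilder();
        String header = pendingHeader;
        pendingHeader = null;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith(";")) {
                continue; // skip blank lines and comment lines
            }
            if (line.startsWith(">")) {
                String next = line.substring(1).trim();
                if (header != null) {
                    chunk.put(header, seq.toString()); // close the current record
                    if (max > 0 && chunk.size() >= max) {
                        pendingHeader = next; // start of the next chunk
                        return chunk;
                    }
                }
                header = next;
                seq.setLength(0);
            } else {
                seq.append(line); // sequence lines may be wrapped
            }
        }
        eof = true;
        if (header != null) {
            chunk.put(header, seq.toString()); // last record in the input
        }
        return chunk.isEmpty() ? null : chunk;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">test test\nPEPTIDEK\n";
        ChunkedFastaSketch reader = new ChunkedFastaSketch(new BufferedReader(new StringReader(fasta)));
        LinkedHashMap<String, String> chunk;
        while ((chunk = reader.process(10)) != null) {
            chunk.forEach((h, s) -> System.out.println(h + " " + s)); // prints "test test PEPTIDEK" once
        }
    }
}
```

Returning null (rather than an empty map) keeps the caller's loop condition to a single check; an empty map would work equally well if that were the documented convention.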

    Could anyone enlighten me? I'm probably missing the point here.
    Maybe there is a better way to do this (if I remember correctly,
    there used to be a SequenceIterator, but I can't find it in
    version 4.0).
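An iterator-style reader avoids the question of what an "empty" result means, because hasNext() answers it directly. The sketch below is a hypothetical pure-Java class (FastaRecordIterator is an illustrative name; it is neither BioJava's historical SequenceIterator nor part of BioJava 4) that returns header/sequence pairs one at a time, reading one header ahead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical iterator-style FASTA reader (not a BioJava class). */
public class FastaRecordIterator implements Iterator<Map.Entry<String, String>> {
    private final BufferedReader in;
    private String nextHeader; // header of the next record, or null when exhausted

    public FastaRecordIterator(BufferedReader in) {
        this.in = in;
        // Skip anything before the first '>' header line.
        String line;
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) {
                nextHeader = line.substring(1).trim();
                break;
            }
        }
    }

    // Reads and trims one line; wraps IOException because Iterator methods cannot throw it.
    private String readLine() {
        try {
            String line = in.readLine();
            return line == null ? null : line.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextHeader != null;
    }

    @Override
    public Map.Entry<String, String> next() {
        if (nextHeader == null) {
            throw new NoSuchElementException();
        }
        String header = nextHeader;
        nextHeader = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) {
                nextHeader = line.substring(1).trim(); // read-ahead: start of the following record
                break;
            }
            if (!line.isEmpty() && !line.startsWith(";")) {
                seq.append(line); // skip blank and comment lines
            }
        }
        return new AbstractMap.SimpleImmutableEntry<>(header, seq.toString());
    }

    public static void main(String[] args) {
        String fasta = ">test test\nPEPTIDEK\n>second\nMK\n";
        FastaRecordIterator records = new FastaRecordIterator(new BufferedReader(new StringReader(fasta)));
        while (records.hasNext()) {
            Map.Entry<String, String> record = records.next();
            System.out.println(record.getKey() + " -> " + record.getValue());
        }
    }
}
```

Memory use stays at one record regardless of file size, which is the main reason to prefer this style over reading the whole file into a map.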



    Regards, Henk

    My setup: Windows 7 64-bit, Java 1.8.0_45 64-bit, BioJava 4.0.0
    via Maven.

--


    _______________________________________________
    Biojava-l mailing list  - [email protected]
    <mailto:[email protected]>
    http://mailman.open-bio.org/mailman/listinfo/biojava-l




--
-----------------------------------------------------------------------
Dr. Andreas Prlic
RCSB PDB Protein Data Bank
Technical & Scientific Team Lead
University of California, San Diego

Editor Software Section
PLOS Computational Biology

BioJava Project Lead
-----------------------------------------------------------------------
