Hi Andreas, thanks very much. I've put together some (working) code to illustrate how I think this should work. The artificial sample FASTA file contains only one sequence:


---------------
>test test
PEPTIDEK

---------------
With a larger FASTA file, all sequences are parsed correctly at first, but once the end of the file is reached the loop simply continues. I'm probably doing something wrong in my code, but it's not clear to me how to do it correctly, and that's basically my question.

The code below loops forever; the output keeps repeating this:

--------------
11:18:56 [main] WARN org.biojava.nbio.core.sequence.io.FastaReader - Can't parse sequence 12. Got sequence of length 0!
11:18:56 [main] WARN org.biojava.nbio.core.sequence.io.FastaReader - header: test test
test test
---------------

package nl.hecklab.bioinformatics.fastafilereaderexample;

import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompoundSet;
import org.biojava.nbio.core.sequence.io.FastaReader;
import org.biojava.nbio.core.sequence.io.GenericFastaHeaderParser;
import org.biojava.nbio.core.sequence.io.ProteinSequenceCreator;

/**
 *
 * @author toorn101
 */
public class App {

    public App() {
        try {
            InputStream inStream = this.getClass().getResourceAsStream("/test.fasta");
            FastaReader<ProteinSequence, AminoAcidCompound> fastaReader = new FastaReader<>(
                    inStream,
                    new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                    new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
            LinkedHashMap<String, ProteinSequence> b;
            while ((b = fastaReader.process(10)) != null) {
                for (String seq : b.keySet()) {
                    System.out.println(seq);
                }
            }
        } catch (IOException ex) {
            Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static void main(String[] args) {
        new App();
    }

}
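
For comparison, here is a minimal sketch of the end-of-input contract that the while loop above relies on: a batch reader that returns null once the input is exhausted. It uses only the Java standard library and is not BioJava code. The class ChunkedFastaReader and its method nextBatch are hypothetical names made up for this illustration, and it assumes headers are unique and sequences can be kept as plain strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-alone batch reader; NOT part of BioJava. */
class ChunkedFastaReader {

    private final BufferedReader reader;
    private String pendingHeader;  // header already read, record not yet emitted
    private boolean exhausted;     // set once the end of the input has been seen

    ChunkedFastaReader(Reader source) {
        this.reader = new BufferedReader(source);
    }

    /**
     * Reads up to max records (header -> sequence, headers assumed unique).
     * Returns null once the input is exhausted, so that a
     * while ((batch = nextBatch(n)) != null) loop terminates.
     */
    Map<String, String> nextBatch(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("max must be positive");
        }
        if (exhausted) {
            return null;
        }
        Map<String, String> batch = new LinkedHashMap<>();
        StringBuilder sequence = new StringBuilder();
        String line;
        while ((line = readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                String previous = pendingHeader;
                pendingHeader = line.substring(1);
                if (previous != null) {
                    batch.put(previous, sequence.toString());
                    sequence.setLength(0);
                    if (batch.size() >= max) {
                        return batch; // pendingHeader is kept for the next call
                    }
                }
            } else if (pendingHeader != null) {
                sequence.append(line);
            }
        }
        // End of input: emit the last record once, then report null from now on.
        exhausted = true;
        if (pendingHeader != null) {
            batch.put(pendingHeader, sequence.toString());
            pendingHeader = null;
        }
        return batch.isEmpty() ? null : batch;
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Same content as the sample file above.
        ChunkedFastaReader fasta =
                new ChunkedFastaReader(new StringReader(">test test\nPEPTIDEK\n\n"));
        Map<String, String> batch;
        while ((batch = fasta.nextBatch(10)) != null) {
            for (String header : batch.keySet()) {
                System.out.println(header);
            }
        }
    }
}
```

With the one-sequence sample file this prints "test test" once and then stops, which is the behaviour I was expecting from the loop in App.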


On 6/17/2015 7:04 AM, Andreas Prlic wrote:
Hi Henk,

Do you want to share some code-snippets so we can help you debug?

Thanks,

Andreas



On Mon, Jun 15, 2015 at 1:58 AM, Toorn, H.W.P. van den (Henk) <[email protected]> wrote:

    Dear List,

    I've just started using BioJava 4.0.0 in my projects, and wanted
    to ask a question about parsing large Fasta files. There is the
    option to read parts of the fasta file.

    FastaReader.process(number)

    The problem I have is that it's not documented what happens if the
    file is read in its entirety. I was expecting a null or an empty
    map, or even some exception, but none happened and the parser kept
    on producing (empty) sequences.

    Could anyone enlighten me? I'm probably missing the point here.
    Maybe there is a better way to do this (there used to be the
    SequenceIterator if I remember correctly, but I can't find that in
    version 4.0).



    Regards, Henk

    My setup: windows 7 64-bit, java 1.8.0_45 64 bit, BioJava 4.0.0
    via Maven.
--

    _______________________________________________
    Biojava-l mailing list  -  [email protected]
    http://mailman.open-bio.org/mailman/listinfo/biojava-l




--
-----------------------------------------------------------------------
Dr. Andreas Prlic
RCSB PDB Protein Data Bank
Technical & Scientific Team Lead
University of California, San Diego

Editor Software Section
PLOS Computational Biology

BioJava Project Lead
-----------------------------------------------------------------------
