If your problem is parsing and loading all the sequences into memory before processing them, I created a method, public LinkedHashMap<String,S> process(int max), in the FastaReader class in BioJava 3.0.6. It parses at most max sequences per call and reads the next sequences on a subsequent call. You can use it. If you need a similar one in BioJava 1, I can make it for you.
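For illustration, the batching idea can be sketched in plain Java. This is only a sketch under stated assumptions, not the BioJava code: the class BatchedFastaReader and its method nextBatch are hypothetical names, sequences are kept as plain Strings rather than BioJava sequence objects, and returning null at end of input is a choice made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;

/** Hypothetical helper (not part of BioJava): reads FASTA records in batches. */
class BatchedFastaReader {
    private final BufferedReader in;
    // Header of the next unread record (without '>'), already consumed from the stream.
    private String pendingHeader;

    BatchedFastaReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    /**
     * Reads up to max records and returns them as id -> sequence, in file order.
     * Returns null once the input is exhausted (an assumption of this sketch).
     */
    LinkedHashMap<String, String> nextBatch(int max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        LinkedHashMap<String, String> batch = new LinkedHashMap<>();
        try {
            String line;
            if (pendingHeader == null) {
                // First call (or exhausted input): skip ahead to the first header line.
                while ((line = in.readLine()) != null) {
                    if (line.startsWith(">")) {
                        pendingHeader = line.substring(1).trim();
                        break;
                    }
                }
            }
            while (pendingHeader != null && batch.size() < max) {
                StringBuilder seq = new StringBuilder();
                String nextHeader = null;
                // Collect sequence lines until the next header or end of input.
                while ((line = in.readLine()) != null) {
                    if (line.startsWith(">")) {
                        nextHeader = line.substring(1).trim();
                        break;
                    }
                    seq.append(line.trim());
                }
                batch.put(pendingHeader, seq.toString());
                pendingHeader = nextHeader; // remembered for the next call
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batch.isEmpty() ? null : batch;
    }
}
```

A caller would loop with something like while ((batch = reader.nextBatch(1000)) != null) { ... }, process each batch, and let it go out of scope, so only one batch of sequences is held in memory at a time.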
Otherwise, you will need to modify your algorithm to work on smaller clusters, depending on the task you are doing.

Amr

-----Original Message-----
From: biojava-l-boun...@lists.open-bio.org [mailto:biojava-l-boun...@lists.open-bio.org] On Behalf Of Khalil El Mazouari
Sent: Thursday, August 01, 2013 1:17 AM
To: Biojava-l@lists.open-bio.org
Subject: [Biojava-l] Large RichSequence collection

Hi,

I have to process a large dataset of DNA sequences (>= 120,000 sequences). The sequences are first annotated, clustered ... I end up with a huge collection of SimpleRichSequence objects consuming a lot of RAM.

Any suggestions on how to deal effectively with a large collection of RichSequence objects are welcome.

Thanks in advance.

khalil

_______________________________________________
Biojava-l mailing list - Biojava-l@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l