Re: [EMBOSS] diffseq memory problem?

Peter Rice Tue, 08 Feb 2011 02:50:36 -0800

Dear Caroline,

On 08/02/2011 10:01, Barretto, Caroline, LAUSANNE, BioInformatics wrote:

Dear EMBOSS developers,


I have been using diffseq to compare two strains of the same bacterial
species using "10" as wordsize without any problem.

However, when I try to reduce this number to "4", after several hours of
calculation the server collapses, all RAM and SWAP are used.

Is there any option to avoid that, or do you know if someone is working
on that problem?

Depending on the input size and the number of simple repeats, a low word size could easily generate too many matches for large sequence lengths.
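
As a rough illustration of the scale involved, the sketch below estimates how many exact word matches would arise purely by chance between two random sequences with uniform base composition. This is a back-of-envelope model, not a description of how diffseq counts or stores matches; the function name and the 2 Mb example lengths are assumptions chosen for illustration, and real genomes with simple repeats will usually produce more matches than this.

```python
def expected_chance_matches(len_a: int, len_b: int, wordsize: int,
                            alphabet: int = 4) -> float:
    """Expected number of exact word matches between two random sequences.

    Assumes every residue is drawn uniformly and independently from an
    alphabet of the given size (4 for nucleotides). Hypothetical helper.
    """
    # Number of word start positions in each sequence.
    positions_a = max(len_a - wordsize + 1, 0)
    positions_b = max(len_b - wordsize + 1, 0)
    # Any given pair of positions matches with probability alphabet**-wordsize.
    return positions_a * positions_b / alphabet ** wordsize


if __name__ == "__main__":
    # Assumed example: two sequences of about 2 Mb each.
    for w in (10, 8, 6, 4):
        estimate = expected_chance_matches(2_000_000, 2_000_000, w)
        print(f"wordsize {w:2d}: about {estimate:.3g} chance matches")
```

Under these assumptions the estimate grows from the order of 10^6 at word size 10 to the order of 10^10 at word size 4, since each step down of one in word size multiplies the expected count by about four.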


We would recommend reducing the word size more slowly (maybe 10, 8, 6).

As a guideline, finding more matches than there are non-overlapping words in the sequence is unlikely to be useful and is a reasonable point to stop reducing the word size.
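
The guideline above can be sketched as a simple check. The helper names are hypothetical, the match count is whatever number you obtain from a run at a given word size, and using the shorter of the two sequences as "the sequence" is an assumption made here for the illustration.

```python
def non_overlapping_words(seq_len: int, wordsize: int) -> int:
    """Number of non-overlapping words of the given size that fit in a sequence."""
    if wordsize <= 0:
        raise ValueError("wordsize must be positive")
    return seq_len // wordsize


def wordsize_too_small(match_count: int, len_a: int, len_b: int,
                       wordsize: int) -> bool:
    """True if a run found more matches than there are non-overlapping words.

    Assumption: the limit is taken from the shorter of the two sequences.
    """
    limit = non_overlapping_words(min(len_a, len_b), wordsize)
    return match_count > limit


if __name__ == "__main__":
    # Assumed example: two 2 Mb sequences and a made-up match count.
    print(non_overlapping_words(2_000_000, 6))
    print(wordsize_too_small(400_000, 2_000_000, 2_000_000, 6))
```

If the check returns True for a given word size, the previous, larger word size is a sensible place to stop.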

Meanwhile, we will take a look at diffseq in case there is some way to improve its performance, or to warn at an early stage if the word size appears small for the input sequence lengths and may generate too many matches.


Hope this helps

Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
