Dear Caroline,

On 08/02/2011 10:01, Barretto, Caroline, LAUSANNE, BioInformatics wrote:
Dear EMBOSS developers,

I have been using diffseq to compare two strains of the same bacterial
species using "10" as wordsize without any problem.

However, when I try to reduce this number to "4", after several hours of
calculation the server collapses, all RAM and SWAP are used.

Is there any option to avoid that, or do you know if someone is working
on that problem?

Depending on the input size and the number of simple repeats, a small word size can easily generate far too many matches for long sequences.

We would recommend reducing the word size more slowly (maybe 10, 8, 6).

As a guideline, finding more matches than there are non-overlapping words in the sequence is unlikely to be useful and is a reasonable point to stop reducing the word size.
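
To get a rough feel for the scale involved, here is a minimal Python sketch. It is not how diffseq works internally; it only estimates how many word hits would occur by chance between two sequences, assuming uniformly random DNA (4 letters) and no repeats. The function names and the 2,000,000 bp length are hypothetical, chosen purely for illustration.

```python
def expected_random_word_hits(len_a, len_b, wordsize, alphabet_size=4):
    """Chance word hits between two random sequences (assumption: uniform bases)."""
    # Number of word start positions in each sequence.
    starts_a = max(len_a - wordsize + 1, 0)
    starts_b = max(len_b - wordsize + 1, 0)
    # Each pair of positions matches with probability 1 / alphabet_size**wordsize.
    return starts_a * starts_b / alphabet_size ** wordsize


def nonoverlapping_words(length, wordsize):
    """Number of non-overlapping words that fit in a sequence of this length."""
    return length // wordsize


if __name__ == "__main__":
    genome_length = 2_000_000  # hypothetical length, not taken from the original question
    for wordsize in (10, 8, 6, 4):
        hits = expected_random_word_hits(genome_length, genome_length, wordsize)
        words = nonoverlapping_words(genome_length, wordsize)
        print(f"wordsize {wordsize:2d}: ~{hits:.2e} chance hits, "
              f"{words} non-overlapping words")
```

Under this simple model, each reduction of the word size by 2 multiplies the chance-hit estimate by roughly 4^2 = 16, so going straight from 10 to 4 raises it by a factor of about 4^6 = 4096. Real genomes with simple repeats will behave worse than this estimate.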

Meanwhile, we will take a look at diffseq in case there is some way to improve its performance, or to warn at an early stage if the word size appears small for the input sequence lengths and may generate too many matches.

Hope this helps

Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
