On 13/10/2011 08:45, Jon Ison wrote:
Hi chaps (Aengus !)
If I understood Aengus' message, what's needed is something that simply
combines overlapping hits (for a given pattern) into one or more
non-overlapping "regions of hits", and reports those regions, e.g.
Start  End  Strand  Pattern_name  Mismatch  Sequence
54     65   +       pattern1      5         GCCAAATAAGGG
104    115  +       pattern1      5         CCTAAATAAGGG
179    188  +       pattern1      2         CCTTGCTTGG
190    200  +       pattern1      6         CCGATTAGAGC
Mismatch in this case is reporting the sum of mismatches from before. A
column for the number of (sub)matches would also be needed. Is that
right, Aengus?
I'm not sure that adding the mismatches is sound. I'd assume just a best
hit from the overlapping matches.
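
To make the two options concrete, here is a minimal Python sketch of the
merging step. It is not EMBOSS code: the names Hit, Region and
merge_overlapping are hypothetical, the two overlapping input hits at
54-61 and 58-65 are invented for illustration, and coordinates are
assumed to be 1-based and inclusive. It keeps the hit count, the summed
mismatches and the best (lowest) mismatch side by side so either could be
reported. The merged sequence is left out because it would have to be
re-read from the input sequence.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Hit:
    start: int      # assumed 1-based, inclusive
    end: int        # assumed 1-based, inclusive
    strand: str
    pattern: str
    mismatch: int


@dataclass
class Region:
    start: int
    end: int
    strand: str
    pattern: str
    n_hits: int         # number of (sub)matches merged into this region
    mismatch_sum: int   # summed mismatches of the merged hits
    mismatch_best: int  # lowest mismatch count among the merged hits


def merge_overlapping(hits):
    """Merge overlapping hits of the same pattern and strand into regions."""
    ordered = sorted(hits, key=lambda h: (h.pattern, h.strand, h.start, h.end))
    regions = []
    for (pattern, strand), group in groupby(
            ordered, key=lambda h: (h.pattern, h.strand)):
        current = None
        for h in group:
            # Hits overlap when the next one starts at or before the
            # current region's end (inclusive coordinates).
            if current is not None and h.start <= current.end:
                current.end = max(current.end, h.end)
                current.n_hits += 1
                current.mismatch_sum += h.mismatch
                current.mismatch_best = min(current.mismatch_best, h.mismatch)
            else:
                current = Region(h.start, h.end, strand, pattern,
                                 1, h.mismatch, h.mismatch)
                regions.append(current)
    return regions


# Invented input: the first two hits overlap, the rest stand alone.
hits = [
    Hit(54, 61, "+", "pattern1", 2),
    Hit(58, 65, "+", "pattern1", 3),
    Hit(104, 115, "+", "pattern1", 5),
    Hit(179, 188, "+", "pattern1", 2),
    Hit(190, 200, "+", "pattern1", 6),
]
regions = merge_overlapping(hits)
for r in regions:
    print(r.start, r.end, r.strand, r.pattern,
          r.n_hits, r.mismatch_sum, r.mismatch_best)
```

Hits that only sit next to each other, such as 179-188 and 190-200, stay
separate here because they share no position.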
The above might give a useful result depending on the input pattern. It
would, I think, be easy enough to implement.
This is a report output, so post-processing could be done by trimming
the results before output using an associated qualifier.
I'm still not sure how useful it would be. We need more feedback from
other users on this one, please!
Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss