So Peter is right about what I want returned: the best match. But of course he 
has pointed out the problem of having two best matches for the same region (in 
this example 104-113 and 105-114).  However, it is still the case that the 
"real" result is 4 hits rather than 9.

I don't know whether my example is a special case, so it would be good, as 
Peter suggests, to hear from anyone else who has used fuzznuc in a similar way.  
Though surely if you allow any mismatches at all in your pattern search, then 
you automatically get this scenario of multiple results being returned for the 
same location?


Cheers
Aengus







On 13/10/11 09:44, Peter Rice wrote:
On 13/10/2011 08:45, Jon Ison wrote:
Hi chaps (Aengus !)

If I understood Aengus' message, what's needed is something that simply 
combines overlapping hits (for a given pattern) into one or more 
non-overlapping "regions of hits", and reports those regions, e.g.

     Start     End  Strand Pattern_name Mismatch Sequence
        54      65       + pattern1            5 GCCAAATAAGGG
       104     115       + pattern1            5 CCTAAATAAGGG
       179     188       + pattern1            2 CCTTGCTTGG
       190     200       + pattern1            6 CCGATTAGAGC

Mismatch in this case reports the sum of the mismatches of the overlapping hits 
reported before.  A column for the number of (sub)matches would also be needed.  
Is that right, Aengus?

I'm not sure that adding the mismatches is sound. I'd assume just a best
hit from the overlapping matches.

The above might give a useful result depending on the input pattern.  It would, 
I think, be easy enough to implement.

This is a report output, so post-processing could be done by trimming
the results before output using an associated qualifier.
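
As a rough illustration of the post-processing discussed above, here is a 
minimal Python sketch that groups overlapping hits into regions and keeps 
the lowest-mismatch hit in each one, along with a count of the hits merged. 
It is not fuzznuc code and does not correspond to any existing EMBOSS 
qualifier: the Hit type, the best_per_region function, and the sample hits 
are all hypothetical, and the mismatch values are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one reported hit (1-based, inclusive)."""
    start: int
    end: int
    mismatch: int


def best_per_region(hits):
    """Group overlapping hits; keep the lowest-mismatch hit per group.

    Returns a list of (region_start, region_end, n_hits, best_hit).
    On a tie in mismatch count, the leftmost hit is kept.
    """
    regions = []
    for hit in sorted(hits):  # sorts by start, then end, then mismatch
        if regions and hit.start <= regions[-1][1]:
            # Hit overlaps the current region: extend it and update the best.
            start, end, count, best = regions[-1]
            if hit.mismatch < best.mismatch:
                best = hit
            regions[-1] = (start, max(end, hit.end), count + 1, best)
        else:
            # No overlap: start a new region.
            regions.append((hit.start, hit.end, 1, hit))
    return regions


# Made-up hits for illustration only; not real fuzznuc output.
hits = [Hit(104, 113, 2), Hit(105, 114, 2), Hit(106, 115, 1), Hit(179, 188, 2)]
for start, end, count, best in best_per_region(hits):
    print(start, end, count, best)
```

Note that when two overlapping hits have the same mismatch count (such as 
104-113 and 105-114 in Aengus' example), this sketch arbitrarily keeps the 
leftmost one. A real implementation would need an explicit rule for ties.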

Still not sure how useful it would be; we need more feedback from other
users on this one, please!

Peter Rice
EMBOSS Team

_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss


--
-----------------------------------------------------------------------
Aengus Stewart                                 Tel: +44 (0)20 7269 3679
Head of Bioinformatics and BioStatistics
CRUK London Research Institute
Lincoln's Inn Fields, Holborn, London, WC2A 3LY, UK
-----------------------------------------------------------------------
