Hi chaps (Aengus!),

If I understood Aengus' message, what's needed is something that simply combines overlapping hits (for a given pattern) into one or more non-overlapping "regions of hits" and reports those regions, e.g.

Start  End  Strand  Pattern_name  Mismatch  Sequence
   54   65  +       pattern1             5  GCCAAATAAGGG
  104  115  +       pattern1             5  CCTAAATAAGGG
  179  188  +       pattern1             2  CCTTGCTTGG
  190  200  +       pattern1             6  CCGATTAGAGC

Here, Mismatch reports the sum of the mismatches of the original hits that were merged into each region. A column for the number of (sub)matches would also be needed.

Is that right, Aengus?

The above might give a useful result, depending on the input pattern. I think it would be easy enough to implement.

Cheers
Jon

> On 12/10/2011 16:50, Aengus Stewart wrote:
>> Hi Folks,
>>
>> I couldnt see a command line option to do what I wanted ie return
>> non-overlapping hits.
>>
>> This is best explained with some sample output.
>>
>> #=======================================
>> #
>> # Sequence: chr1_174353258_174354335 from: 1 to: 200
>> # HitCount: 9
>> #
>> # Pattern_name Mismatch Pattern
>> # pattern1 3 CC[AT](6)GG
>>
>> As you can see this is actually only 4 hits rather than the 9 reported.
>
> Hmmm ... with that kind of pattern and 3 mismatches there are pretty
> sure to be overlapping matches.
>
> Trouble is, which matches would you want to keep? Your second match, for
> example, has 2 hits with 1 mismatch at 104..115 and 105..116
>
> It should be possible to come up with patterns where the choice of 'best
> hit' complicates which hits are considered to overlap.
>
> Probably writing a script is your best bet as you can then control which
> hits are picked.
>
> We could try to write an application to remove overlapping features ...
> if someone can define how to select them. In this case, the mismatch
> number will be stored as a tag (feature qualifier) in the feature table
> and could be included in the selection criteria.
>
> Hope this helps ...
> and maybe sparks some ideas
>
> Peter Rice
> EMBOSS Team
> _______________________________________________
> EMBOSS mailing list
> EMBOSS@lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
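
For what it's worth, the "combine overlapping hits into regions" idea from the top of this message could be sketched roughly as below. This is only an illustration in Python, not EMBOSS code: the `Hit` and `Region` classes, the `merge_hits` function, and the coordinates in the demo are all hypothetical. Grouping by strand as well as by pattern is an assumption, as is treating two hits as overlapping only when they share at least one base (so 179..188 and 190..200 stay separate). The Sequence column is left out because rebuilding it would need the underlying sequence.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """One reported pattern hit (hypothetical structure, 1-based inclusive)."""
    start: int
    end: int
    strand: str
    pattern: str
    mismatch: int


@dataclass
class Region:
    """A non-overlapping region built from one or more overlapping hits."""
    start: int
    end: int
    strand: str
    pattern: str
    mismatch: int  # sum of mismatches over the merged hits
    n_hits: int    # number of (sub)matches merged into this region


def merge_hits(hits: Iterable[Hit]) -> List[Region]:
    """Merge overlapping hits of the same pattern and strand into regions."""
    regions: List[Region] = []
    # Sorting groups hits by pattern and strand, then orders them by position,
    # so each hit only needs to be compared with the most recent region.
    ordered = sorted(hits, key=lambda h: (h.pattern, h.strand, h.start, h.end))
    for h in ordered:
        last = regions[-1] if regions else None
        overlaps = (
            last is not None
            and last.pattern == h.pattern
            and last.strand == h.strand
            and h.start <= last.end  # shares at least one base
        )
        if overlaps:
            last.end = max(last.end, h.end)
            last.mismatch += h.mismatch
            last.n_hits += 1
        else:
            regions.append(
                Region(h.start, h.end, h.strand, h.pattern, h.mismatch, 1)
            )
    return regions


if __name__ == "__main__":
    # Made-up hits purely for demonstration.
    demo = [
        Hit(10, 21, "+", "pattern1", 1),
        Hit(11, 22, "+", "pattern1", 2),
        Hit(30, 41, "+", "pattern1", 0),
    ]
    for r in merge_hits(demo):
        print(r.start, r.end, r.strand, r.pattern, r.mismatch, r.n_hits)
```

Summing mismatches and counting merged hits sidesteps Peter's question of which single hit to keep, at the cost of no longer reporting any individual "best" hit.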