Re: [EMBOSS] fuzznuc pattern expansion
Dear all,

Is it, or would it be, possible to see the (numeric) positions of the mismatches in the fuzznuc output file? For example, the example output file shows mismatches, but not where they are located:
http://emboss.sourceforge.net/apps/release/6.4/emboss/apps/fuzznuc.html#output.4

# pat2 1 cg(2)c(3)taaccctagc(3)ta
605 624 + pat2: cg(2)c(3)taaccctagc(3)ta 1 cggccctaaccctaacccta

Clearly, we can find the positions of the mismatches by matching the supplied pattern against the reported match, but that would not be the preferred way.

Kind regards,
Bernd

On Wed, Nov 2, 2011 at 6:37 PM, Peter Rice wrote:
> Dear Bernd,
>
> On 02/11/2011 15:12, Bernd Web wrote:
>
>> Thanks! It would indeed be great to have the option to search on the
>> ambiguity codes directly. Probably, I'd prefer the escape option, but
>> you mean to implement both escaping and expansion to subsets?
>
> Yes, we will implement both. Escaping is needed to find any ambiguity codes
> in a sequence. Expansion allows S to find G, C and S.
>
>> It might be good to report the pattern that was used in the matching.
>> Would the (very high) speed of fuzznuc be affected by always exploding
>> them to the subsets? For example, "N" would become "ACTGUMRWSYKVHDB".
>
> N is not a problem - it matches anything. The 2-letter ambiguity codes only
> expand to one extra letter, and 3-letter codes (B, D, H, V) are only very
> rarely used.
>
> regards,
>
> Peter Rice
> EMBOSS Team
>
> ___
> EMBOSS mailing list
> EMBOSS@lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
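The workaround mentioned above (comparing the supplied pattern with the reported match) could be sketched roughly as follows. This is only an illustration in Python, not anything fuzznuc provides: the function names `expand_counts` and `mismatch_positions` are made up here, and the sketch assumes a pattern made of literal bases with fixed repeat counts such as `g(2)`, with no ambiguity codes, brackets or ranges.

```python
import re


def expand_counts(pattern: str) -> str:
    """Expand fixed repeat counts, e.g. 'g(2)c(3)' -> 'ggccc' (hypothetical helper)."""
    return re.sub(
        r"([A-Za-z])\((\d+)\)",
        lambda m: m.group(1) * int(m.group(2)),
        pattern,
    )


def mismatch_positions(pattern: str, match: str, start: int = 1) -> list:
    """Return sequence positions where the match differs from the pattern.

    'start' is the reported start position of the hit, so the returned
    positions are in the coordinates of the searched sequence.
    """
    expanded = expand_counts(pattern).lower()
    match = match.lower()
    if len(expanded) != len(match):
        raise ValueError("pattern and match have different lengths")
    return [start + i for i, (p, s) in enumerate(zip(expanded, match)) if p != s]


# Values taken from the example output quoted in the message.
positions = mismatch_positions(
    "cg(2)c(3)taaccctagc(3)ta", "cggccctaaccctaacccta", start=605
)
print(positions)  # [619]
```

For the quoted hit, the single mismatch is the 15th base of the match (pattern `g`, sequence `a`), which corresponds to position 619 in the sequence.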
Re: [EMBOSS] fuzznuc pattern expansion
Dear Bernd,

On 02/11/2011 15:12, Bernd Web wrote:
> Thanks! It would indeed be great to have the option to search on the
> ambiguity codes directly. Probably, I'd prefer the escape option, but
> you mean to implement both escaping and expansion to subsets?

Yes, we will implement both. Escaping is needed to find any ambiguity codes in a sequence. Expansion allows S to find G, C and S.

> It might be good to report the pattern that was used in the matching.
> Would the (very high) speed of fuzznuc be affected by always exploding
> them to the subsets? For example, "N" would become "ACTGUMRWSYKVHDB".

N is not a problem - it matches anything. The 2-letter ambiguity codes only expand to one extra letter, and 3-letter codes (B, D, H, V) are only very rarely used.

regards,

Peter Rice
EMBOSS Team
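To illustrate how small these expansions are, the subsets can be derived from the standard IUPAC nucleotide code table: a code expands to every code whose base set is contained in its own. The following Python sketch is an illustration only, not EMBOSS code; the names `IUPAC` and `subset_codes` are made up for the example, and U is left out for simplicity.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}


def subset_codes(code: str) -> list:
    """Return all codes whose base set is contained in that of 'code'."""
    target = set(IUPAC[code])
    return sorted(c for c, bases in IUPAC.items() if set(bases) <= target)


for code in "SBDHVN":
    members = subset_codes(code)
    print(code, "".join(members), len(members))
```

Under this rule each 2-letter code yields three characters (its two bases plus the code itself), each 3-letter code yields seven, and N yields all fifteen codes in the table.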
Re: [EMBOSS] fuzznuc pattern expansion
Dear Peter,

Thanks! It would indeed be great to have the option to search on the ambiguity codes directly. Probably, I'd prefer the escape option, but you mean to implement both escaping and expansion to subsets? This actually might be good in case a user does not know the contents of the DNA file (i.e. which ambiguity codes are present).

It might be good to report the pattern that was used in the matching. Would the (very high) speed of fuzznuc be affected by always exploding them to the subsets? For example, "N" would become "ACTGUMRWSYKVHDB". This could mean that searches with highly degenerate patterns would include a lot of ambiguity codes.

Kind regards,
Bernd

On Sat, Oct 29, 2011 at 7:06 PM, Peter Rice wrote:
> On 28/10/2011 18:03, Bernd Web wrote:
>>
>> Hi
>>
>> Using fuzznuc I get illegal pattern warnings. I realize what is going on:
>>
>> "You can use ambiguity codes for nucleic acid searches but not within
>> [] or {} as they expand to bracketed counterparts. For example, "s" is
>> expanded to "[GC]" therefore [S] would be expanded to [[GC]] which is
>> illegal."
>>
>> However, what I cannot find is how to suppress this expansion. Is this
>> possible? We actually need to have these ambiguity codes remain as they
>> are within [] as the input sequences can contain R, Y, B, N themselves,
>> for example. Thus, [GCS] is a pattern we actually want to be able to use.
>
> That looks a reasonable suggestion.
>
> We can replace S with [GCS] directly. For the wider ambiguity codes, we can
> replace them with the subsets:
>
> B [TGCBSYK]
> D [TGADWRK]
> H [TCAHWYM]
> V [GCAVSRM]
>
> We can also allow 'C\S' to explicitly match CS in the input sequence by
> escaping the S to skip the automatic expansion.
>
> These changes can be added to the next release.
>
> Thanks for the idea.
>
> Peter Rice
> EMBOSS Team
Re: [EMBOSS] fuzznuc pattern expansion
On 28/10/2011 18:03, Bernd Web wrote:
> Hi
>
> Using fuzznuc I get illegal pattern warnings. I realize what is going on:
>
> "You can use ambiguity codes for nucleic acid searches but not within
> [] or {} as they expand to bracketed counterparts. For example, "s" is
> expanded to "[GC]" therefore [S] would be expanded to [[GC]] which is
> illegal."
>
> However, what I cannot find is how to suppress this expansion. Is this
> possible? We actually need to have these ambiguity codes remain as they
> are within [] as the input sequences can contain R, Y, B, N themselves,
> for example. Thus, [GCS] is a pattern we actually want to be able to use.

That looks a reasonable suggestion.

We can replace S with [GCS] directly. For the wider ambiguity codes, we can replace them with the subsets:

B [TGCBSYK]
D [TGADWRK]
H [TCAHWYM]
V [GCAVSRM]

We can also allow 'C\S' to explicitly match CS in the input sequence by escaping the S to skip the automatic expansion.

These changes can be added to the next release.

Thanks for the idea.

Peter Rice
EMBOSS Team
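The behaviour proposed above might look roughly like the sketch below, which turns a flat pattern into a Python regular expression. It is an assumption-laden illustration, not the EMBOSS implementation: `SUBSETS` contains only the five expansions listed in this message, `to_regex` is a made-up name, and brackets, braces and repeat counts are not handled.

```python
import re

# Expansions as proposed in the message above (S, B, D, H, V only).
SUBSETS = {
    "S": "GCS",
    "B": "TGCBSYK",
    "D": "TGADWRK",
    "H": "TCAHWYM",
    "V": "GCAVSRM",
}


def to_regex(pattern: str) -> str:
    """Expand ambiguity codes to subsets; a backslash skips the expansion."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped code: match the literal character in the sequence.
            out.append(re.escape(pattern[i + 1].upper()))
            i += 2
            continue
        up = ch.upper()
        if up in SUBSETS:
            out.append("[" + SUBSETS[up] + "]")
        else:
            out.append(re.escape(up))
        i += 1
    return "".join(out)


print(to_regex("CS"))     # C[GCS]
print(to_regex("C\\S"))   # CS
print(bool(re.search(to_regex("CS"), "ACGT")))    # True: S matches G
print(bool(re.search(to_regex("C\\S"), "ACGT")))  # False: a literal S is required
```

With the expansion, `CS` finds CG, CC and CS in the input, while the escaped form `C\S` only finds a literal CS.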