Re: [EMBOSS] fuzznuc pattern expansion

2011-11-07 Thread Bernd Web
Dear all,

Is it possible, or would it be possible, to see the (numeric) positions of
the mismatches in the fuzznuc output file?
E.g. the example output file shows mismatches, but not where they are located:
http://emboss.sourceforge.net/apps/release/6.4/emboss/apps/fuzznuc.html#output.4
# pat2 1 cg(2)c(3)taaccctagc(3)ta
  605 624   + pat2: cg(2)c(3)taaccctagc(3)ta 1
cggccctaaccctaacccta

Clearly, we can find the position of a mismatch by comparing the
supplied pattern with the reported match, but that would not be the
preferred way.
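
A minimal sketch of that manual comparison in Python, assuming the pattern
contains only literal residues with optional fixed repeat counts such as
c(3). The helper names expand_pattern and mismatch_positions are hypothetical
and not part of EMBOSS; ambiguity codes, [] and {} classes and
variable-length ranges are not handled.

```python
import re


def expand_pattern(pattern):
    """Expand fixed repeats, e.g. 'g(2)' -> 'gg' (literal residues only)."""
    tokens = re.findall(r"([A-Za-z])(?:\((\d+)\))?", pattern)
    return "".join(base * int(count or 1) for base, count in tokens)


def mismatch_positions(pattern, match, start=1):
    """Return sequence coordinates where the match differs from the pattern.

    'start' is the reported start coordinate of the hit (1-based).
    """
    expanded = expand_pattern(pattern).lower()
    match = match.lower()
    if len(expanded) != len(match):
        raise ValueError("pattern and match differ in length")
    return [start + i for i, (p, m) in enumerate(zip(expanded, match)) if p != m]


if __name__ == "__main__":
    hits = mismatch_positions(
        "cg(2)c(3)taaccctagc(3)ta", "cggccctaaccctaacccta", start=605
    )
    print(hits)
```

For the example hit at 605-624 this reports a single mismatch at sequence
position 619, where the pattern has g and the matched sequence has a.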


Kind regards,
Bernd

On Wed, Nov 2, 2011 at 6:37 PM, Peter Rice wrote:
> Dear Bernd,
>
> On 02/11/2011 15:12, Bernd Web wrote:
>
>> Thanks! It would indeed be great to have the option to search on the
>> ambiguity codes directly. Probably, I'd prefer the escape option, but
>> you mean to implement both escaping and expansion to subsets?
>
> Yes, we will implement both. Escaping is needed to find any ambiguity codes
> in a sequence. Expansion allows S to find G, C and S.
>
>> It might be good to report the pattern that was used in the matching.
>> Would the (very high) speed of fuzznuc be affected by always exploding
>> the codes to the subsets? For example, "N" would become "ACTGUMRWSYKVHDB".
>
> N is not a problem - it matches anything. The 2-letter ambiguity codes only
> expand to one extra letter, and 3-letter codes (B, D, H, V) are only very
> rarely used.
>
> regards,
>
> Peter Rice
> EMBOSS Team
>
>
___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss


Re: [EMBOSS] fuzznuc pattern expansion

2011-11-02 Thread Peter Rice

Dear Bernd,

On 02/11/2011 15:12, Bernd Web wrote:


> Thanks! It would indeed be great to have the option to search on the
> ambiguity codes directly. Probably, I'd prefer the escape option, but
> you mean to implement both escaping and expansion to subsets?


Yes, we will implement both. Escaping is needed to find any ambiguity 
codes in a sequence. Expansion allows S to find G, C and S.



> It might be good to report the pattern that was used in the matching.
> Would the (very high) speed of fuzznuc be affected by always exploding
> the codes to the subsets? For example, "N" would become "ACTGUMRWSYKVHDB".


N is not a problem - it matches anything. The 2-letter ambiguity codes 
only expand to one extra letter, and 3-letter codes (B, D, H, V) are 
only very rarely used.
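
The sizes of these expansions can be checked with a short Python sketch. It
assumes the standard IUPAC nucleotide code definitions, leaves out U
(treating it as equivalent to T), and uses a hypothetical helper
subset_codes. It is an illustration only, not the fuzznuc implementation.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
# U is omitted here (assumption: it is treated like T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def subset_codes(code):
    """Return every code whose base set is contained in that of 'code'."""
    target = set(IUPAC[code])
    return sorted(c for c, bases in IUPAC.items() if set(bases) <= target)


if __name__ == "__main__":
    for code in "SBDHVN":
        members = subset_codes(code)
        print(code, "".join(members), len(members))
```

Under these assumptions S expands to three codes (G, C and S), each of
B, D, H and V to seven, and N to all fifteen.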


regards,

Peter Rice
EMBOSS Team



Re: [EMBOSS] fuzznuc pattern expansion

2011-11-02 Thread Bernd Web
Dear Peter,

Thanks! It would indeed be great to have the option to search on the
ambiguity codes directly. Probably, I'd prefer the escape option, but
you mean to implement both escaping and expansion to subsets?
This might actually be good in case a user does not know the contents
of the DNA file (i.e. which ambiguity codes are present).

It might be good to report the pattern that was used in the matching.
Would the (very high) speed of fuzznuc be affected by always exploding
the codes to the subsets? For example, "N" would become "ACTGUMRWSYKVHDB".
This could mean searches of patterns with high degeneracy would
include a lot of ambiguity codes.


Kind regards,
Bernd

On Sat, Oct 29, 2011 at 7:06 PM, Peter Rice wrote:
> On 28/10/2011 18:03, Bernd Web wrote:
>>
>> Hi
>>
>> Using fuzznuc I get illegal pattern warnings. I realize what is going on:
>>
>> "You can use ambiguity codes for nucleic acid searches but not within
>> [] or {} as they expand to bracketed counterparts. For example, "s" is
>> expanded to "[GC]" therefore [S] would be expanded to [[GC]] which is
>> illegal."
>>
>> However, what I cannot find is how to suppress this expansion. Is this
>> possible? We actually need to have these ambiguity codes remain as they are
>> within [] as the input sequences can contain R, Y, B, N themselves for
>> example. Thus, [GCS] is a pattern we actually want to be able to use.
>
> That looks a reasonable suggestion.
>
> We can replace S with [GCS] directly. For the wider ambiguity codes, we can
> replace them with the subsets:
>
> B [TGCBSYK]
> D [TGADWRK]
> H [TCAHWYM]
> V [GCAVSRM]
>
> We can also allow 'C\S' to explicitly match CS in the input sequence by
> escaping the S to skip the automatic expansion.
>
> These changes can be added to the next release.
>
> Thanks for the idea.
>
> Peter Rice
> EMBOSS Team
>
>


Re: [EMBOSS] fuzznuc pattern expansion

2011-10-29 Thread Peter Rice

On 28/10/2011 18:03, Bernd Web wrote:

> Hi
>
> Using fuzznuc I get illegal pattern warnings. I realize what is going on:
>
> "You can use ambiguity codes for nucleic acid searches but not within
> [] or {} as they expand to bracketed counterparts. For example, "s" is
> expanded to "[GC]" therefore [S] would be expanded to [[GC]] which is
> illegal."
>
> However, what I cannot find is how to suppress this expansion. Is this
> possible? We actually need to have these ambiguity codes remain as they are
> within [] as the input sequences can contain R, Y, B, N themselves for
> example. Thus, [GCS] is a pattern we actually want to be able to use.


That looks a reasonable suggestion.

We can replace S with [GCS] directly. For the wider ambiguity codes, we 
can replace them with the subsets:


B [TGCBSYK]
D [TGADWRK]
H [TCAHWYM]
V [GCAVSRM]

We can also allow 'C\S' to explicitly match CS in the input sequence by 
escaping the S to skip the automatic expansion.
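
The proposed behaviour could look roughly like the following Python sketch.
It is only an illustration of the idea, not EMBOSS code: the function name
expand_ambiguity is hypothetical, the table covers only the codes listed
above (S, B, D, H, V), and patterns that already contain [] or {} are not
handled.

```python
# Subsets taken from the list above; other ambiguity codes are left
# unexpanded in this sketch.
SUBSETS = {
    "S": "GCS",
    "B": "TGCBSYK",
    "D": "TGADWRK",
    "H": "TCAHWYM",
    "V": "GCAVSRM",
}


def expand_ambiguity(pattern):
    """Expand ambiguity codes to bracketed subsets; '\\X' keeps X literal."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: copy it through without expansion.
            out.append(pattern[i + 1])
            i += 2
            continue
        subset = SUBSETS.get(ch.upper())
        out.append(f"[{subset}]" if subset else ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_ambiguity("CS"))     # S is expanded to its subset
    print(expand_ambiguity("C\\S"))   # escaped S stays literal
```

With this sketch the pattern CS becomes C[GCS], while C\S stays as CS and
so matches only a literal S in the input sequence.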


These changes can be added to the next release.

Thanks for the idea.

Peter Rice
EMBOSS Team
