Sending back to the list, since others may be confused also.

On Aug 31, 2011, at 11:48 AM, wang peter wrote:
DEAR HARRIS:
I am Shan. Thank you very much for your kind help,
but I am still confused about how trimLRPatterns works.
For example, if I set
> subject = "TTTACGT"
> Lpattern = "TTTAACGT"
the result is :
> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=1,with.Lindels=TRUE)
[1] ""

But if I set
> subject = "TTTACGT"
> Lpattern = "AAATTTAACGT"
the result is :
> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=1,with.Lindels=TRUE)
[1] "TTTACGT"
How can this be explained?

The problem is that max.Lmismatch is a vector specifying the mismatch tolerances for the successive match tests of the Lpattern suffixes at the beginning of the subject. The vector is expected to be of length nchar(Lpattern), with element max.Lmismatch[i] controlling the test for the suffix of length i. If a shorter vector is supplied, as you did here (you gave a vector of length 1), the function expands it to a vector of length nchar(Lpattern) by filling with -1's at the *low end*. A negative tolerance prevents trimming with the suffix of that length.

Your 1 becomes the last element of this vector in both cases above, so only the full Lpattern is tested. This 1 is sufficient for "TTTAACGT" to match "TTTACGT" in the context of with.Lindels=TRUE (one deletion of an A), but it is not enough for "AAATTTAACGT" to match the same subject. You would need 4 edits (deletions of A) for that:

> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=3, with.Lindels=TRUE)
[1] "TTTACGT"

> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=4, with.Lindels=TRUE)
[1] ""
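
To make the padding rule concrete, here is a minimal base-R sketch of it. The helper pad_max_mismatch() is a hypothetical name of mine, not a Biostrings function; it only mimics the rule described above (fill with -1 at the low end) and refuses vectors longer than the pattern rather than guessing how they are handled.

```r
# Hypothetical helper (NOT part of Biostrings): mimics the padding rule
# described above for a max.Lmismatch shorter than nchar(Lpattern).
pad_max_mismatch <- function(max.mismatch, pattern_length) {
  # This sketch only covers vectors no longer than the pattern.
  stopifnot(length(max.mismatch) <= pattern_length)
  n_missing <- pattern_length - length(max.mismatch)
  # Fill with -1 at the low end, i.e. for the shortest suffixes.
  c(rep.int(-1L, n_missing), as.integer(max.mismatch))
}

# A single 1 lands on the last element, i.e. on the full pattern.
padded_short <- pad_max_mismatch(1, nchar("TTTAACGT"))
# rep(1, 4) covers the 4 longest suffixes of the 11-letter pattern.
padded_long <- pad_max_mismatch(rep(1, 4), nchar("AAATTTAACGT"))

print(padded_short)
print(padded_long)
```

Element i of the result is the tolerance for the suffix of length i, so the values you supply always apply to the longest suffixes.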

On the other hand, you can trim the entire subject a different way, allowing for only 1 edit, by employing the 4th longest suffix of Lpattern, namely "TTTAACGT". The commands below show that 1 edit is not enough to trim the whole subject using the *3rd longest* Lpattern suffix, namely "ATTTAACGT" (for which you would need 2 edits!):

> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=rep(1,3), with.Lindels=TRUE)
[1] "TTTACGT"

> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=rep(1,4), with.Lindels=TRUE)
[1] ""

# allows for 2 edits, for the 3 longest pattern suffixes:
> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=rep(2,3), with.Lindels=TRUE)
[1] ""

# shows exactly where the 2 is needed (for the 3rd longest suffix):
> trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=c(2,0,0), with.Lindels=TRUE)
[1] ""
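
The edit counts quoted above can be checked without Biostrings, as a rough sketch. It assumes that comparing each Lpattern suffix with the whole subject via utils::adist() (plain Levenshtein distance) is a fair stand-in for what trimLRPatterns tests when with.Lindels=TRUE. The real function matches suffixes against the start of the subject, so this only applies to the all-or-nothing cases in this thread. would_trim_all() is a hypothetical helper, not a Biostrings function.

```r
Lpattern <- "AAATTTAACGT"
subject <- "TTTACGT"
n <- nchar(Lpattern)

# suffixes[i] is the suffix of Lpattern of length i.
suffixes <- substring(Lpattern, n - seq_len(n) + 1L, n)

# Levenshtein distance from each suffix to the whole subject
# (an assumption of this sketch, see the note above).
edits <- as.integer(utils::adist(suffixes, subject))

# Hypothetical helper: pad the tolerances with -1 at the low end, then
# ask whether any suffix is within its tolerance.
would_trim_all <- function(max.mismatch) {
  stopifnot(length(max.mismatch) <= n)
  padded <- c(rep(-1, n - length(max.mismatch)), max.mismatch)
  any(edits <= padded)
}

# Edits needed by the 4 longest suffixes (lengths 8, 9, 10, 11).
print(data.frame(suffix = suffixes[8:11], edits = edits[8:11]))

print(would_trim_all(rep(1, 3)))   # 1 edit for the 3 longest suffixes
print(would_trim_all(rep(1, 4)))   # 1 edit for the 4 longest suffixes
print(would_trim_all(c(2, 0, 0)))  # 2 edits for the 3rd longest only
```

In this sketch the suffix of length 8 needs 1 edit and the suffix of length 9 needs 2, which is why rep(1,3) leaves the subject alone while rep(1,4) and c(2,0,0) trim it completely.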

To see the R code for trimLRPatterns, do

> showMethods("trimLRPatterns", includeDefs=TRUE)

and

> Biostrings:::.XStringSet.trimLRPatterns

and (for Lpattern)

> Biostrings:::.computeTrimStart

Also see ?which.isMatchingStartingAt

And do you know how to read the C source code of trimLRPatterns?

Start with the function XString_match_pattern_at() in

        Biostrings/src/lowlevel_matching.c

This is called by .matchPatternAt() in R/lowlevel-matching.R.

Thank you very much,
Shan Gao

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
