Re: [Bioc-sig-seq] a wired problem

Harris A. Jaffee Wed, 28 Sep 2011 14:46:58 -0700

I forgot to say that in this code,

result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
with.indels=TRUE, algorithm="indels")
reads <- reads[result]



you do NOT want to select from 'reads' using 'result' as is, since
its values are all 0 or 1:

> str(result)
 int [1:25000] 0 0 0 0 0 0 0 0 0 0 ...

> table(result)
result
    0     1
24959    41

I suppose you are getting 41 instances of your first read, namely

  40-letter "DNAString" instance
seq: GTTTTCTCATTNGAAATTTTGTTACCGCAAAANNNCCACC

when instead, you want something like

        reads[result == 1]

On Sep 28, 2011, at 5:27 PM, Harris A. Jaffee wrote:

I don't understand where your 'subject1' sequence:

subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"

came from.

Modulo that, everything looks fine.

In my hands, your vcountPattern call finds 41 40-letter elements
of 'seqs', all equal to GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA,
that match your 41-letter pattern, subject to your

        max.mismatch=1, with.indels=TRUE'

They are all equal to your pattern with its first letter deleted.

I haven't looked closely, but I don't see a reason to doubt this
result.

The following calls would seem to be the sanity checks you want.
They at least confirm the 41 fuzzy matches.

> countPattern(PCR2rc, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA",
        max.mismatch=1, with.indels=TRUE)
[1] 1

> matchPattern(PCR2rc, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA",
        max.mismatch=1, with.indels=TRUE)

Views on a 40-letter BString subject
subject: GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA
views:
    start end width
[1]     1  40    40 [GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA]

On Sep 28, 2011, at 4:14 PM, wang peter wrote:

dear all:
     Using vcountPattern, i found some matched sequences.
but those are not similar to the pattern.
     see such coding

rm(list=ls())
reads <- readFastq(fastqfile);#downloaded from
http://biocluster.ucr.edu/~tbackman/query.fastq
seqs <- sread(reads);
PCR2rc<-DNAString("AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA")
result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
with.indels=TRUE, algorithm="indels")
reads <- reads[result]
seqs <- sread(reads)
sum(result)
     then using countPattern, i found they are really not match

subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"

result <- countPattern(PCR2rc, subject1, max.mismatch=1,min.mismatch=0,

with.indels=TRUE)
[1] 0

shan gao

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] a wired problem

Reply via email to