As I understand the question, you have first removed a 5' barcode ("GCATT"), and you're wondering if you should have actually left it there to facilitate trimming by your 3' adaptor (which might not occur at the 3' end). I don't
believe so.

Now, if your barcode-trimmed read ("ATCGAGAT...") actually contained your 3' adaptor ("AATCGAGAT...") or at least a prefix of it, things would be easier.
Then we could try to proceed somewhat normally as follows:

subject
ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA AAAAAAATATT


Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

  NNN = paste(rep("N", nchar(subject)-nchar(Rpattern)), collapse="")
  new_pattern = paste(Rpattern, NNN, sep="")
trimLRPatterns(Rpattern=new_pattern, subject=subject, Rfixed="subject")

But since this subject only contains the substring of your 3' adaptor starting at letter 2, this trimLRPatterns call won't accomplish anything. So, it seems
that we have to take one of these two approaches:

        1) a 'new_pattern' like above but with at least 1 more N on the right,
        making it 1 letter longer than the subject, "on the left", if you will
or
        2) we have to allow for some edits (with indels)

Possibly, 1) can be made to work, but I get an error:

> new2
[1] "AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGNNNNN NNNNNNNNNNN"
> nchar(new2)
[1] 82
> nchar(subject)
[1] 81
> trimLRPatterns(Rpattern=new2, subject=subject, Rfixed="subject", max.Rmismatch=1) Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord, : solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (0) is < 1

If this is just a bug that can be fixed, then we might be done.

For 2), I would like to do this:

> trimLRPatterns(Rpattern=new_pattern, subject=subject, Rfixed="subject",
        with.Rindels=TRUE, max.Rmismatch=2)
  0-letter "DNAString" instance
seq:

I think this is the result you want.

Note that max.Rmismatch >= 2 is needed because of this:

> neditEndingAt(new_pattern, subject, ending.at=nchar(subject), with.indels=TRUE,
        fixed="subject")
[1] 2

However (!), what I've just done in the last 2 calls is NOT publicly available.
You would get:

        when 'with.indels' is TRUE, only 'fixed=TRUE' is supported

The ability to find adaptors which do not "flank" where they normally should is in the works, by me. Hopefully in finite time, but even so I fear it will
be a little short of your question.

On Sep 26, 2011, at 9:37 AM, wang peter wrote:
dear all:
 i found a problem trimLRPatterns cannot allow this situation
that is the Rpattern has a left shift from subject sequence, see below

subject
ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA AAAAAAATATT

Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

so i change the code to keep the 5' barcode GCATT, so the pattern can be
recognized


subject
GCATTATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG AAAAAAAAAAAATATT

Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

after remove 3' adaptor, i can remove the 5' barcode

any ideas?

shan gao

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to