As I understand the question, you have first removed a 5' barcode
("GCATT"),
and you're wondering if you should have actually left it there to
facilitate
trimming by your 3' adaptor (which might not occur at the 3' end). I
don't
believe so.
Now, if your barcode-trimmed read ("ATCGAGAT...") actually contained
your 3'
adaptor ("AATCGAGAT...") or at least a prefix of it, things would be
easier.
Then we could try to proceed somewhat normally as follows:
subject
ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
AAAAAAATATT
Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
NNN = paste(rep("N", nchar(subject)-nchar(Rpattern)), collapse="")
new_pattern = paste(Rpattern, NNN, sep="")
trimLRPatterns(Rpattern=new_pattern, subject=subject,
Rfixed="subject")
But since this subject only contains the substring of your 3' adaptor
starting
at letter 2, this trimLRPatterns call won't accomplish anything. So,
it seems
that we have to take one of these two approaches:
1) a 'new_pattern' like above but with at least 1 more N on the right,
making it 1 letter longer than the subject, "on the left", if you will
or
2) we have to allow for some edits (with indels)
Possibly, 1) can be made to work, but I get an error:
> new2
[1]
"AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGNNNNN
NNNNNNNNNNN"
> nchar(new2)
[1] 82
> nchar(subject)
[1] 81
> trimLRPatterns(Rpattern=new2, subject=subject, Rfixed="subject",
max.Rmismatch=1)
Error in .Call2("solve_user_SEW", refwidths, start, end, width,
translate.negative.coord, :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied
start (0) is < 1
If this is just a bug that can be fixed, then we might be done.
For 2), I would like to do this:
> trimLRPatterns(Rpattern=new_pattern, subject=subject,
Rfixed="subject",
with.Rindels=TRUE, max.Rmismatch=2)
0-letter "DNAString" instance
seq:
I think this is the result you want.
Note that max.Rmismatch >= 2 is needed because of this:
> neditEndingAt(new_pattern, subject, ending.at=nchar(subject),
with.indels=TRUE,
fixed="subject")
[1] 2
However (!), what I've just done in the last 2 calls is NOT publicly
available.
You would get:
when 'with.indels' is TRUE, only 'fixed=TRUE' is supported
The ability to find adaptors which do not "flank" where they normally
should
is in the works, by me. Hopefully in finite time, but even so I fear
it will
be a little short of your question.
On Sep 26, 2011, at 9:37 AM, wang peter wrote:
dear all:
i found a problem trimLRPatterns cannot allow this situation
that is the Rpattern has a left shift from subject sequence, see below
subject
ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
AAAAAAATATT
Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
so i change the code to keep the 5' barcode GCATT, so the pattern
can be
recognized
subject
GCATTATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
AAAAAAAAAAAATATT
Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
after remove 3' adaptor, i can remove the 5' barcode
any ideas?
shan gao
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing