Thanks Nils.
I agree with what Nils wrote, with two additions:
1. There is no error correction on the tags; a single-base read error
is treated as a wholly different barcode.
2. There is an option to use two tags, one from read1 and one from read2,
but as Nils said, given the current lack of standards, I would encourage
concatenating similar UMIs rather than proliferating tags.
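
To make both points concrete, here is a minimal Python sketch. It is not
Picard code: the function names (split_umi, group_by_position_and_umi) and
the record layout are made up for illustration, and the grouping key
(chromosome, position, strand) is only a simplified stand-in for the other
duplicate criteria. It trims a hexamer from each end of a single-ended read,
concatenates the two into one value suitable for a single tag such as RX,
and then groups reads by exact string match on that value.

```python
from collections import defaultdict

UMI_LEN = 6  # hexamer length, as in Richard's description


def split_umi(seq, qual, umi_len=UMI_LEN):
    """Trim umi_len bases from each end of a read.

    Returns (insert_seq, insert_qual, umi), where umi is the leading and
    trailing hexamers concatenated into a single string.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= 2 * umi_len:
        raise ValueError("read is too short to carry two UMIs and an insert")
    umi = seq[:umi_len] + seq[-umi_len:]
    return seq[umi_len:-umi_len], qual[umi_len:-umi_len], umi


def group_by_position_and_umi(records):
    """Group read names by (chrom, pos, strand, umi) using exact matching.

    records is an iterable of (name, chrom, pos, strand, umi) tuples.
    Because the UMI is compared as a plain string, one mismatching base
    puts a read into a different group (no error correction).
    """
    groups = defaultdict(list)
    for name, chrom, pos, strand, umi in records:
        groups[(chrom, pos, strand, umi)].append(name)
    return dict(groups)


if __name__ == "__main__":
    insert, qual, umi = split_umi("AAAAAACCCCGGGGTTTTTT", "I" * 20)
    print(insert, umi)

    records = [
        ("r1", "chr1", 100, "+", "AAAAAATTTTTT"),
        ("r2", "chr1", 100, "+", "AAAAAATTTTTT"),
        ("r3", "chr1", 100, "+", "AAAAAATTTTTA"),  # one base differs
    ]
    for key, names in group_by_position_and_umi(records).items():
        print(key, names)
```

In this toy input r1 and r2 land in the same group, while r3 ends up alone
because its concatenated UMI differs by a single base.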
Yossi.
On Wed, Jun 29, 2016 at 10:51 PM, Nils Homer <[email protected]> wrote:
> Hey Richard,
>
> there is an on-going discussion about what tag(s) to use for molecular
> identifiers. You can see the proposal here:
> https://github.com/samtools/hts-specs/pull/119. I have CC'ed the original
> submitter (yfarjoun).
>
> In short, the recommendation is to use the *BC* tag for the sample barcode
> and *RX* for the molecular barcode. You can then specify
> "BARCODE_TAG=RX" in *MarkDuplicates* to mark two reads as duplicates only
> if the values in their *RX* tags are the same, in addition to the other
> criteria. Furthermore, I would recommend concatenating the two hexamer
> sequences, as long as they were attached at the same time. The reason I
> say this is because there are a number of technologies that also use
> multiple molecular barcodes that are integrated at various points in the
> sample and library preparation process. For example, a single-cell barcode
> versus a unique molecule barcode. In this case, there is no convention
> defined yet, but likely you will want them stored in different tags so you
> can treat them differently, if that is warranted.
>
> Sincerely,
>
> Nils Homer
>
> On Tue, Jun 28, 2016 at 5:38 PM, Richard Corbett <[email protected]>
> wrote:
>
>> Hi all,
>>
>> We are working on an application in our wet lab where we introduce random
>> hexamers into our adapters to allow us to differentiate between PCR
>> duplicates and fragments that were present in multiple copies of starting
>> material. The workflow for processing the sequence is that we trim the
>> first and last six bases off of our single-ended reads and use those bases
>> as our "FP" or "fragment provenance". When we want to mark the duplicates
>> we only want to mark reads as duplicates if they align to the same position
>> AND they have the same FP. Reads aligning to the same position with
>> different FPs should not be marked as duplicates.
>>
>> If I'm not mistaken, I now see in the Picard docs that I can supply a
>> BARCODE_TAG to MarkDuplicates to mark the duplicates as I describe above.
>> My question to the group is if it is more appropriate to use the BC tag, or
>> to use some custom tags to hold the hexamer sequences associated with each
>> read. I expect that the BC tag should be reserved for the classical
>> sample multiplexing application (many libraries in a pool).
>>
>> If you are still reading I have another question for you:
>> -Since our reads are single-ended and we have two separate hexamer
>> sequences (one from each end of the read)...would you recommend
>> concatenating them into one sequence (perhaps with a delimiter) and keeping
>> them in one tag, or separating them into two tags? Would there be any
>> issues in Picard if we have single ended sequencing but supply two
>> different barcode-like tags?
>>
>> thanks,
>> Richard
>>
>>
>> --
>> The contents of this electronic mail transmission are intended to be
>> CONFIDENTIAL and for the sole use of the designated recipient. If this
>> message has been misdirected, please contact the sender as soon as possible.
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
>> Francisco, CA to explore cutting-edge tech and listen to tech luminaries
>> present their vision of the future. This family event has something for
>> everyone, including kids. Get more information and register today.
>> http://sdm.link/attshape
>> _______________________________________________
>> Samtools-help mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>
>>
>