On 14 Mar 2019, at 15:14, Aengus Stewart wrote:
> I completely understand that and I can fix it. However, I am just pointing
> out the current default we are getting from the Illumina sequencers.
>
> So either
> - Illumina needs to conform to the current SAM format in bcl2fastq
> - The SAM format needs to be updated :-)
> - Everyone who uses the -C option needs to reformat all of their FASTQ files
>   if the files are dual index

The FASTQ format is not SAM. What you’re really seeing is the lack of standards
and conventions for representing metadata on FASTQ @ lines.

BWA’s -C option follows the convention that the text after the read name is a
set of SAM tagged fields, which is nicely general purpose and not a bad idea if
you want to pass arbitrary SAM tagged fields through the aligner. OTOH Illumina
has its own conventions for what goes on the @ line:

>>> @M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT

It would be handy if BWA also had an option to interpret Illumina’s
1:N:0:AGGCAGAA+CTCTCTAT metadata and re-encode it into appropriate SAM flags
and tagged fields. It doesn’t, and in the meantime everyone gets to write
scripts to do that reformatting.
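
For what it's worth, here is a minimal sketch of such a reformatting script in Python. It is an illustration, not anything BWA or Illumina ships. The function names and the regular expression are mine. It assumes the comment has the four-field read:filtered:control:index layout shown above, and it maps the index to a BC:Z: tag with '+' changed to '-', the separator the SAM tags specification recommends for multiple barcodes. The flag helper is only there to show the mapping for paired-end data; -C itself can only carry tagged fields, not flags.

```python
import re

# Assumed Illumina comment layout: <read>:<is_filtered>:<control_number>:<index>
ILLUMINA_COMMENT = re.compile(r"^([12]):([YN]):(\d+):(\S*)$")
# Index sequences such as AGGCAGAA or AGGCAGAA+CTCTCTAT (not a bare sample number)
BARCODE = re.compile(r"^[ACGTN]+(\+[ACGTN]+)*$")


def illumina_to_sam_comment(header):
    """Rewrite one FASTQ @ line so that its comment is SAM tagged fields."""
    name, _, comment = header.rstrip("\n").partition(" ")
    m = ILLUMINA_COMMENT.match(comment)
    if m is None:
        # Unrecognised comment: drop it rather than let -C copy it into the SAM.
        return name
    index = m.group(4)
    tags = []
    if BARCODE.match(index):
        # Dual indexes are joined with '-' in the BC tag.
        tags.append("BC:Z:" + index.replace("+", "-"))
    return "\t".join([name] + tags)


def illumina_flag_bits(comment):
    """Return the SAM flag bits implied by the comment (paired-end data assumed)."""
    m = ILLUMINA_COMMENT.match(comment)
    if m is None:
        return 0
    read, filtered = m.group(1), m.group(2)
    bits = 0x40 if read == "1" else 0x80  # first / last segment in the template
    if filtered == "Y":
        bits |= 0x200  # read failed the vendor quality filter
    return bits
```

Applying illumina_to_sam_comment to every fourth line of the FASTQ file, starting from the first, gives headers that -C can append as they are.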
John