Hi,

I'm currently working on a variety of SNP-calling pipelines to update reference genomes, and have run into a serious yet seemingly simple problem going from the SAM output of BWA-MEM to a sorted BAM for use with GATK. I'm using BWA v0.7.12-r044 and samtools v1.2-99-ge2bb18f, with the following bwa mem call:

    bwa mem -t 30 -R '@RG\tID:1\tSM:1\tPL:ILLUMINA\tLB:1' [ref.fa] [Illumina_SE_reads.fastq.gz] > [SAM file]

Running samtools view -h on the resulting SAM file shows a valid @RG header, and every alignment record carries the corresponding RG:Z:1 tag. Both the @RG header and the per-alignment RG tags survive SAM->BAM conversion (samtools view -b). Sorting, however, drops the @RG header whether the output is BAM or SAM, while the per-alignment RG tags remain. GATK then rejects the file as corrupt, because the records refer to a read group that the header no longer declares. The @PG header is lost during sorting as well.

Is there a bug in the merging procedure that drops the @RG and @PG headers? The bug does not occur with samtools sort in samtools 0.1.18, so I would guess that it resides in htslib.

Thanks,
Patrick Reilly
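
For anyone who wants to check their own files for the same inconsistency, here is a minimal sketch of the check, assuming the input is the text produced by samtools view -h (header lines followed by tab-separated alignment records). The function name find_orphan_read_groups and the toy records are hypothetical, made up for illustration; this is not how GATK or samtools validate files. It only compares the read-group IDs declared in @RG header lines with the RG:Z: tags used in the records.

```python
def find_orphan_read_groups(sam_lines):
    """Return RG IDs used in alignment records but not declared in any @RG header line."""
    declared = set()  # IDs taken from "@RG ... ID:<id>" header lines
    used = set()      # IDs taken from "RG:Z:<id>" tags on alignment records
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: only @RG lines declare read groups.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment record: optional tags follow the 11 mandatory columns.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                used.add(tag[5:])
    return used - declared


# Toy data (hypothetical): one record tagged with read group "1".
record = "read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:1"
header_with_rg = [
    "@SQ\tSN:chr1\tLN:1000",
    "@RG\tID:1\tSM:1\tPL:ILLUMINA\tLB:1",
]
header_without_rg = ["@SQ\tSN:chr1\tLN:1000"]

print(find_orphan_read_groups(header_with_rg + [record]))
print(find_orphan_read_groups(header_without_rg + [record]))
```

With the @RG line present, the result is an empty set. With it removed, as happens after sorting in the case described above, the result contains the ID "1". To run the check on a real file, the lines from samtools view -h can be passed in, for example by iterating over sys.stdin.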
------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
https://lists.sourceforge.net/lists/listinfo/samtools-help
