Hi Rob,

I tried that and it worked. Thanks.

I always thought "*" meant the sequence was not stored, rather than that the
sequence is zero length. But as it works, I'll stick with it.
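
For anyone stuck on a samtools version that rejects the blank field, here is a minimal sketch of a pre-filter that rewrites an empty SEQ as "*" before the file is parsed. It is not part of Novoalign or samtools. The function name fix_empty_seq is hypothetical, and forcing QUAL to "*" at the same time is an assumption based on the SAM rule that QUAL cannot be stored when SEQ is "*".

```python
import sys

SEQ, QUAL = 9, 10  # 0-based column indexes of SEQ and QUAL in a SAM record


def fix_empty_seq(line):
    """Return the SAM line with an empty SEQ (and its QUAL) replaced by '*'."""
    if line.startswith("@"):
        return line  # header lines pass through untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) > QUAL and fields[SEQ] == "":
        fields[SEQ] = "*"
        fields[QUAL] = "*"  # assumption: no qualities are kept for a '*' SEQ
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Filter SAM text from stdin to stdout, one record per line.
    for line in sys.stdin:
        sys.stdout.write(fix_empty_seq(line))
```

Saved under a name of your choice, it would be run as a plain stdin-to-stdout filter ahead of whatever tool reads the SAM. Records that already have a SEQ are passed through unchanged.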

Cheers, Colin

On 16 January 2017 at 17:54, Robert Davies <[email protected]> wrote:

> On Mon, 16 Jan 2017, Colin Hercus wrote:
>
>> Sometimes we get reads to align that have been trimmed to zero length and
>> I'm wondering how these should be represented in SAM format.
>>
>> Here's a pair, as reported by Novoalign, that had been trimmed by cutadapt;
>> one read of the pair is zero length:
>>
>> READID    77    *    0    0    *    *    0    0        *    PG:Z:novoalign
>> READID    141    *    0    0    *    *    0    0    GTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAGGGG    EEDDB:=<;A9/=C=@A;:<,1:<[email protected]<./;;;AC.;;5@::    PG:Z:novoalign
>>
>> The first read of the pair has a zero length SEQ field.
>>
>> This pair fails with a parse error in Samtools Version: 1.2 (using htslib
>> 1.2.1) but is accepted by Samtools Version: 0.1.19-44428cd.
>>
>> What is a valid SAM record for a zero length read?
>>
>
> The sequence should be '*' rather than blank.  In fact, the latest version
> of samtools seems to correct your record to this instead of complaining
> about it:
>
> cat > /tmp/test.sam
> READID  77      *       0       0       *       *       0       0               *       PG:Z:novoalign
> READID  141     *       0       0       *       *       0       0       GTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAGGGG     EEDDB:=<;A9/=C=@A;:<,1:<[email protected]<./;;;AC.;;5@::     PG:Z:novoalign
>
> samtools view /tmp/test.sam
> READID  77      *       0       0       *       *       0       0       *       *       PG:Z:novoalign
> READID  141     *       0       0       *       *       0       0       GTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAGGGG     EEDDB:=<;A9/=C=@A;:<,1:<[email protected]<./;;;AC.;;5@::     PG:Z:novoalign
>
>
> Rob Davies              [email protected]
> The Sanger Institute    http://www.sanger.ac.uk/
> Hinxton, Cambs.,        Tel. +44 (1223) 834244
> CB10 1SA, U.K.          Fax. +44 (1223) 494919
>
>
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
