Hi Andrew,

Thank you for the reply. This certainly helps explain everything. I had
thought the samtools fastq command could directly convert a
coordinate-sorted BAM to FASTQ, as it was referenced for this particular
functionality in one of the forums (I can't remember which).

The forum also highlighted an older custom-made program that implemented
this functionality by reading a coordinate-sorted BAM, keeping a buffer of
reads, and only printing out a read once its mate (the read with the same
name) had also been seen.
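For illustration, the buffering idea described above can be sketched as a `samtools view` | awk pipeline. This is a minimal sketch, not the forum program and not a samtools feature. It assumes paired-end reads whose mates each appear exactly once after secondary/supplementary alignments are dropped (`-F 0x900`), read names that may carry a `/1` or `/2` suffix as in the BAM below, and the standard SAM FLAG bits (0x10 reverse strand, 0x40 first mate). The function name `pair_sam_to_fastq` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pair mates from a coordinate-sorted BAM without a full name sort.
# Reads SAM text on stdin; writes mate 1 to $1 and mate 2 to $2 so that
# record N in both files belongs to the same read pair.
# pair_sam_to_fastq is a hypothetical helper, not part of samtools.

pair_sam_to_fastq() {
  local out1="$1" out2="$2"
  awk -F'\t' -v out1="$out1" -v out2="$out2" '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }

    # Reverse-complement a sequence; anything other than A/C/G/T becomes N.
    function revcomp(s,    i, b, r) {
      r = ""
      for (i = length(s); i > 0; i--) {
        b = toupper(substr(s, i, 1))
        r = r ((b in comp) ? comp[b] : "N")
      }
      return r
    }

    # Reverse a string (base qualities are reversed, not complemented).
    function revstr(s,    i, r) {
      r = ""
      for (i = length(s); i > 0; i--) r = r substr(s, i, 1)
      return r
    }

    /^@/ { next }                          # skip SAM header lines if present

    {
      name = $1; flag = $2 + 0; seq = $10; qual = $11
      sub(/\/[12]$/, "", name)              # strip /1 or /2 so both mates share one key
      if (int(flag / 16) % 2 == 1) {        # FLAG 0x10: stored reverse-complemented
        seq = revcomp(seq); qual = revstr(qual)
      }
      mate = (int(flag / 64) % 2 == 1) ? 1 : 2   # FLAG 0x40: first mate
      rec = "@" name "/" mate "\n" seq "\n+\n" qual

      if (name in buf) {
        # Mate was buffered earlier: write both records now and free the slot.
        if (mate == 1) { print rec > out1; print buf[name] > out2 }
        else           { print buf[name] > out1; print rec > out2 }
        delete buf[name]
      } else {
        buf[name] = rec                     # hold this read until its mate arrives
      }
    }

    END {
      n = 0
      for (k in buf) n++
      if (n > 0) printf("%d reads left unpaired (mate not found)\n", n) > "/dev/stderr"
    }
  '
}

# Example (assumes samtools is installed; pipefail is a bash option):
#   set -o pipefail
#   samtools view -F 0x900 alignments.bam | pair_sam_to_fastq R1.fq R2.fq
```

Memory use grows with the number of reads still waiting for their mate, so pairs whose mates map far apart (or to another chromosome) stay buffered longer, and reads whose mate never appears in the file (such as the flag 89 and 137 records in the BAM below) are only reported as a count at the end.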

It would be great if this functionality could be implemented in samtools,
since sorting by read name is simply too slow for large BAMs.


Thank you.


Regards,

Kar-Tong



On Mon, Mar 21, 2016 at 11:45 AM, Andrew Bjonnes <[email protected]> wrote:

> Hi Kar-Tong,
>
> As I understand it, the fastq command will output reads in the same order
> they are encountered in the BAM file, so if the input BAM file is not
> sorted by read name, the output fastq files will not be sorted either. This
> matches the example input and outputs in your email.
>
>
> Andrew
>
> On Sun, Mar 20, 2016 at 11:12 PM Kar Tong Tan <[email protected]> wrote:
>
>> I like the new samtools fastq command, which allows me to convert a
>> coordinate-sorted BAM file into FASTQ without having to sort the
>> file by read name (which can take forever, especially for a really big
>> BAM file).
>>
>> However, I recently noticed what seems like a bug while converting a
>> BAM file.
>>
>> Using the following command (samtools v1.3), I converted my BAM file to
>> FASTQ files:
>>
>> $ samtools fastq -1 1.fq -2 2.fq ./alignments.bam
>>
>> However, when I look at the 1.fq and 2.fq files, I notice that the
>> reads in the two files are not in matching read-name order.
>>
>> $ head 1.fq
>> @UNC13-SN749_82:3:1102:14504:162540/1
>> GTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTT
>> +
>> BBCFFFFDHHHHDHGHIJFGIHIJ?GHIJJFHHIJJDGHHJJDGEHIJ.B
>> @UNC13-SN749_82:3:2105:9477:158884/1
>> GCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGC
>> +
>> ?B@FFFFFHHHHHJJJJJJJJJGJJJJJIIIIJJJIJJJJJJJJIGIJJJ
>> @UNC13-SN749_82:3:1207:3243:175188/1
>> TATTAAGTTACATGCAGACAACAGGGGCCAGAAGATGAACAATGGCCCAT
>>
>> $ head 2.fq
>> @UNC13-SN749_82:3:1102:14504:162540/2
>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
>> +
>> CCCFFFFFHHHHHJJJJJJJJJJJIIIJJJIJJJJJJJJJJJJJJIJJIG
>> @UNC13-SN749_82:3:1207:3243:175188/2
>> ATTTTCTTTGACCTCTTCCTTCTGTTCATGTGTATTTGCTGTCTCTTAGC
>> +
>> <@@FFFDFHFAHHIJIJG4FFIHIIIIHGIEHH>HHGHICHHIGEHHIII
>> @UNC13-SN749_82:3:2105:9477:158884/2
>> CTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTG
>>
>>
>>
>> If I look at the BAM file, this is what I see:
>>
>> UNC13-SN749_82:3:1102:14504:162540/2    163     chr1    10019   69
>>  50M     =       10068   99
>>  TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
>>  CCCFFFFFHHHHHJJJJJJJJJJJIIIJJJIJJJJJJJJJJJJJJIJJIG
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:1  HI:i:1  NM:i:0
>> UNC13-SN749_82:3:1102:14504:162540/1    83      chr1    10068   69
>>  50M     =       10019   -99
>>  AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAAC
>>  B.JIHEGDJJHHGDJJIHHFJJIHG?JIHIGFJIHGHDHHHHDFFFFCBB
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:1  HI:i:1  NM:i:0
>> UNC13-SN749_82:3:1207:3243:175188/2     163     chr1    11886   56
>>  50M     =       12105   269
>>  ATTTTCTTTGACCTCTTCCTTCTGTTCATGTGTATTTGCTGTCTCTTAGC
>>  <@@FFFDFHFAHHIJIJG4FFIHIIIIHGIEHH>HHGHICHHIGEHHIII
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:7  HI:i:1  NM:i:2
>> UNC13-SN749_82:3:2105:9477:158884/2     163     chr1    11900   69
>>  50M     =       12040   190
>>  CTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTG
>>  CCCFFFFFHHHHHJJJIIHJIJJJJJJJJJJJJJJJJJJIJJIIIJJJIJ
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:6  HI:i:1  NM:i:0
>> UNC13-SN749_82:3:2105:9477:158884/1     83      chr1    12040   69
>>  50M     =       11900   -190
>>  GCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGC
>>  JJJIGIJJJJJJJJIJJJIIIIJJJJJGJJJJJJJJJHHHHHFFFFF@B?
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:6  HI:i:1  NM:i:0
>> UNC13-SN749_82:3:1201:10653:108594/2    137     chr1    12085   59
>>  50M     *       0       0
>>  GGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATATTCTGGCCCCTGT
>>  CCCFFFFFHHHHHIJIEFHGIGGJJJJJJJJJJJJJJIJIJJIIJJJIJJ
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:8  HI:i:1  NM:i:1
>> UNC13-SN749_82:3:1207:3243:175188/1     83      chr1    12105   69
>>  50M     =       11886   -269
>>  ATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATA
>>  IIIIIIIHDGIJIIHCIIIJJJIGIIJJJIIJIIJIGHGHFHDFFFFCC@
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:7  HI:i:1  NM:i:0
>> UNC13-SN749_82:3:1108:9942:173119/1     89      chr1    12110   39
>>  50M     *       0       0
>>  CCATTGTTCATATTCTGGCCCCTGTTGTCTGCATGTAACCTAATACCACG
>>  EGIGJJIGJJJJJJJIJHHIGJJJJJJIJIJIJJIJIHGHGHDDBFD@BB
>>  RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_      IH:i:8  HI:i:1  NM:i:3
>>
>>
>>
>> Does anyone know what might be causing this error? By the way, this is an
>> RNA-seq BAM file.
>>
>>
>>
>> Thank you.
>>
>>
>> Kar-Tong
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Transform Data into Opportunity.
>> Accelerate data analysis in your applications with
>> Intel Data Analytics Acceleration Library.
>> Click to learn more.
>> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
>> _______________________________________________
>> Samtools-help mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>
>