Hi Andrew,
Thank you for the reply. This certainly helps explain everything. I had
thought the samtools fastq command was able to directly convert a
coordinate sorted bam to fastq, as it was referenced in one of the forums
(can't remember which) for this particular functionality.
The forum also highlighted an older custom-made program where someone
implemented this functionality by reading a coordinate sorted bam, keeping
a buffer of reads, and only printing out a pair of reads once their read
names match.
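To make the buffering idea concrete, here is a rough Python sketch of how I
understand it. This is only my own illustration, not the forum program and
not how samtools works internally. The names pair_reads and fastq_record are
made up. I am assuming the input is SAM text lines (for example piped from
samtools view -h), that a trailing /1 or /2 on the read name should be
stripped as in my file, and that secondary and supplementary alignments can
be skipped.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Complement table used to restore reads that were stored reverse-complemented.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

Read = Tuple[str, str, str]        # (name, sequence, qualities)
Buffered = Tuple[bool, str, str]   # (is_read1, sequence, qualities)


def pair_reads(sam_lines: Iterable[str],
               pending: Optional[Dict[str, Buffered]] = None
               ) -> Iterator[Tuple[Read, Read]]:
    """Yield (read1, read2) as soon as both mates of a read name were seen.

    Reads whose mate never shows up are left behind in `pending`.
    """
    if pending is None:
        pending = {}
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:                  # secondary or supplementary alignment
            continue
        # Assumption: mates share a name once a trailing /1 or /2 is removed.
        name = re.sub(r"/[12]$", "", fields[0])
        seq, qual = fields[9], fields[10]
        if flag & 0x10:                   # aligned to the reverse strand
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        is_read1 = bool(flag & 0x40)
        mate = pending.pop(name, None)
        if mate is None:                  # first mate seen: buffer it
            pending[name] = (is_read1, seq, qual)
            continue
        this = (name, seq, qual)
        other = (name, mate[1], mate[2])
        yield (this, other) if is_read1 else (other, this)


def fastq_record(read: Read, mate_number: int) -> str:
    """Format one read as a four-line FASTQ record."""
    name, seq, qual = read
    return f"@{name}/{mate_number}\n{seq}\n+\n{qual}\n"
```

Each yielded pair could then be written with fastq_record(read1, 1) to one
file and fastq_record(read2, 2) to the other, so both files stay in the same
order. The obvious cost is memory: the buffer holds every read whose mate has
not been seen yet, so pairs that map far apart or to different chromosomes
stay in it for a long time, and reads with an unmapped or missing mate never
leave it.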
It would be great if this functionality could be implemented in samtools,
since read name sorting is simply too slow for large bams.
Thank you.
Regards,
Kar-Tong
On Mon, Mar 21, 2016 at 11:45 AM, Andrew Bjonnes <[email protected]> wrote:
> Hi Kar-Tong,
>
> As I understand it, the fastq command will output reads in the same order
> they are encountered in the BAM file, so if the input BAM file is not
> sorted by read name, the output fastq files will not be sorted either. This
> matches the example input and outputs in your email.
>
>
> Andrew
>
> On Sun, Mar 20, 2016 at 11:12 PM Kar Tong Tan <[email protected]> wrote:
>
>> I like the new Samtools fastq function which allows me to convert a
>> coordinate sorted bam file into a fastq file without having to sort the
>> file by read name (which can take forever especially for a really big bam
>> file).
>>
>> However, I have noticed what seems like a bug while trying to convert a
>> bam file recently.
>>
>> Using the following command (Samtools is samtools v1.3), I converted my
>> bam file to fastq files:
>>
>> $ Samtools fastq -1 1.fq -2 2.fq ./alignments.bam
>>
>> However, if I take a look at the 1.fq and 2.fq files, I notice that the
>> reads in the fastq files are not sorted properly according to readnames.
>>
>> $ head 1.fq
>> @UNC13-SN749_82:3:1102:14504:162540/1
>> GTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTT
>> +
>> BBCFFFFDHHHHDHGHIJFGIHIJ?GHIJJFHHIJJDGHHJJDGEHIJ.B
>> @UNC13-SN749_82:3:2105:9477:158884/1
>> GCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGC
>> +
>> ?B@FFFFFHHHHHJJJJJJJJJGJJJJJIIIIJJJIJJJJJJJJIGIJJJ
>> @UNC13-SN749_82:3:1207:3243:175188/1
>> TATTAAGTTACATGCAGACAACAGGGGCCAGAAGATGAACAATGGCCCAT
>>
>> $ head 2.fq
>> @UNC13-SN749_82:3:1102:14504:162540/2
>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
>> +
>> CCCFFFFFHHHHHJJJJJJJJJJJIIIJJJIJJJJJJJJJJJJJJIJJIG
>> @UNC13-SN749_82:3:1207:3243:175188/2
>> ATTTTCTTTGACCTCTTCCTTCTGTTCATGTGTATTTGCTGTCTCTTAGC
>> +
>> <@@FFFDFHFAHHIJIJG4FFIHIIIIHGIEHH>HHGHICHHIGEHHIII
>> @UNC13-SN749_82:3:2105:9477:158884/2
>> CTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTG
>>
>>
>>
>> If I look at the bamfile, this is what I see:
>>
>> UNC13-SN749_82:3:1102:14504:162540/2 163 chr1 10019 69
>> 50M = 10068 99
>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
>> CCCFFFFFHHHHHJJJJJJJJJJJIIIJJJIJJJJJJJJJJJJJJIJJIG
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:1 HI:i:1 NM:i:0
>> UNC13-SN749_82:3:1102:14504:162540/1 83 chr1 10068 69
>> 50M = 10019 -99
>> AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAAC
>> B.JIHEGDJJHHGDJJIHHFJJIHG?JIHIGFJIHGHDHHHHDFFFFCBB
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:1 HI:i:1 NM:i:0
>> UNC13-SN749_82:3:1207:3243:175188/2 163 chr1 11886 56
>> 50M = 12105 269
>> ATTTTCTTTGACCTCTTCCTTCTGTTCATGTGTATTTGCTGTCTCTTAGC
>> <@@FFFDFHFAHHIJIJG4FFIHIIIIHGIEHH>HHGHICHHIGEHHIII
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:7 HI:i:1 NM:i:2
>> UNC13-SN749_82:3:2105:9477:158884/2 163 chr1 11900 69
>> 50M = 12040 190
>> CTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTG
>> CCCFFFFFHHHHHJJJIIHJIJJJJJJJJJJJJJJJJJJIJJIIIJJJIJ
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:6 HI:i:1 NM:i:0
>> UNC13-SN749_82:3:2105:9477:158884/1 83 chr1 12040 69
>> 50M = 11900 -190
>> GCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGC
>> JJJIGIJJJJJJJJIJJJIIIIJJJJJGJJJJJJJJJHHHHHFFFFF@B?
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:6 HI:i:1 NM:i:0
>> UNC13-SN749_82:3:1201:10653:108594/2 137 chr1 12085 59
>> 50M * 0 0
>> GGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATATTCTGGCCCCTGT
>> CCCFFFFFHHHHHIJIEFHGIGGJJJJJJJJJJJJJJIJIJJIIJJJIJJ
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:8 HI:i:1 NM:i:1
>> UNC13-SN749_82:3:1207:3243:175188/1 83 chr1 12105 69
>> 50M = 11886 -269
>> ATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATA
>> IIIIIIIHDGIJIIHCIIIJJJIGIIJJJIIJIIJIGHGHFHDFFFFCC@
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:7 HI:i:1 NM:i:0
>> UNC13-SN749_82:3:1108:9942:173119/1 89 chr1 12110 39
>> 50M * 0 0
>> CCATTGTTCATATTCTGGCCCCTGTTGTCTGCATGTAACCTAATACCACG
>> EGIGJJIGJJJJJJJIJHHIGJJJJJJIJIJIJJIJIHGHGHDDBFD@BB
>> RG:Z:110714_UNC13-SN749_0082_AD0DGMABXX_3_ IH:i:8 HI:i:1 NM:i:3
>>
>>
>>
>> Does anyone know what might be causing this error? By the way, this is an
>> RNA-seq bam file.
>>
>>
>>
>> Thank you.
>>
>>
>> Kar-Tong
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Transform Data into Opportunity.
>> Accelerate data analysis in your applications with
>> Intel Data Analytics Acceleration Library.
>> Click to learn more.
>> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
>> _______________________________________________
>> Samtools-help mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>
>