On 17/08/2015 15:31, James Bonfield wrote:
> On Mon, Aug 17, 2015 at 12:45:50PM +0100, James Bonfield wrote:
>> Have you tested that they work?  I don't seem able to get this working
>> right now, but maybe I'm doing something wrong (or perhaps they have
>> corner cases which fail).
> Here is an example built on the latest cramtools-3.0.jar:
>
> # .BAI index
> $ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar index 
> --bam-style-index -I /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram
> $ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
> /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram -R 
> /nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
>  --skip-md5-check 1:100000000-100000000
> [no output]
>
> # .CRAI index
> $ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar index -I 
> /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram
> $ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
> /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram -R 
> /nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
>  --skip-md5-check 1:100000000-100000000
> HS25_09827:2:2214:6841:99219#49 99      1       99999945        60      100M  
>   =       100000329484     
> CTGGGAGCCCCGCAGAACATGGTGTTTTATTCTGACACTATATATCTAGCACTTGCACTGTAAAAATGGAAGTAATTCCCATTAGGACCAGCAAAACCTG
>      
> AAEFFGFGGFHGGJ>HGGGHGG@HHJHIGHHHHELGGGHGGHGHFGHIFFIIHHLIFGFGIJGHHGGGGEGIGJGJFIIHJGIHEGHFFHGGGHEFHHIE
>      X0:i:1  X1:i:0  BC:Z:NGTCTATC   RG:Z:1#49       XG:i:0  AM:i:37 SM:i:37 
> XM:i:0   XO:i:0  QT:Z:!1=DFFFB   XT:A:U
> ...
> [4 lines of output]
>
>
> # Samtools on the BAM equivalent
> $ samtools view /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.bam 
> 1:100000000-100000000
> [Also 4 lines of output]
>
> This is just a random local human test file, but I haven't explored further.
> It also fails on whole chromosomes though:
>
> $ samtools view /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.bam Y:|wc -l
> 417134
>
> # With .CRAI
> $ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
> /nfjkb/scratch/data/9827_2#49.java3.cram -R 
> /nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
>  --skip-md5-check Y:|wc -l
> 417134
>
> # With .BAI
> $ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
> /nfjkb/scratch/data/9827_2#49.java3.cram -R 
> /nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
>  --skip-md5-check Y:|wc -l
> 7134
Looks like a bug in the code that skips to the first container in query. 
It should check all coordinates returned by index instead of just 
skipping the last one.
Do you have the numbers from Picard's ViewSam? Right now cramtools has 
it's own methods for random access but I'm thinking to switch it to 
htsjdk completely.
>
>
> The second issue is that I don't understand how .BAI works when we get
> multiple references packed into a single slice within cram.  My
> understanding is that BAI on CRAM is implemented by encoding the slice
> coordinates into BAI instead of read coordinates, due to the
> read virtual-offset not being workable within CRAM.
>
> Does it produce multiple BAI fake reads, one for each reference used
> within the cram slice?
It does not do anything meaningful for multiref at this point. It could 
produce fake reads as you say, although at a cost of uncompressing records.

Vadim
>
> James
>


-- 
Vadim Zalunin
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Tel: + 44 (0) 1223 494 614
Fax: + 44 (0) 1223 494 468


------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to