On Mon, Aug 17, 2015 at 12:45:50PM +0100, James Bonfield wrote:
> Have you tested that they work?  I don't seem able to get this working
> right now, but maybe I'm doing something wrong (or perhaps they have
> corner cases which fail).

Here is an example built on the latest cramtools-3.0.jar:

# .BAI index
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar index 
--bam-style-index -I /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
/nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram -R 
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa 
--skip-md5-check 1:100000000-100000000
[no output]

# .CRAI index
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar index -I 
/nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
/nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram -R 
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa 
--skip-md5-check 1:100000000-100000000
HS25_09827:2:2214:6841:99219#49 99      1       99999945        60      100M    
=       100000329484     
CTGGGAGCCCCGCAGAACATGGTGTTTTATTCTGACACTATATATCTAGCACTTGCACTGTAAAAATGGAAGTAATTCCCATTAGGACCAGCAAAACCTG
     
AAEFFGFGGFHGGJ>HGGGHGG@HHJHIGHHHHELGGGHGGHGHFGHIFFIIHHLIFGFGIJGHHGGGGEGIGJGJFIIHJGIHEGHFFHGGGHEFHHIE
     X0:i:1  X1:i:0  BC:Z:NGTCTATC   RG:Z:1#49       XG:i:0  AM:i:37 SM:i:37 
XM:i:0   XO:i:0  QT:Z:!1=DFFFB   XT:A:U
...
[4 lines of output]


# Samtools on the BAM equivalent
$ samtools view /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.bam 
1:100000000-100000000
[Also 4 lines of output]

This is just a random local human test file, but I haven't explored further.
It also fails on whole chromosomes though:

$ samtools view /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.bam Y:|wc -l
417134

# With .CRAI
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
/nfjkb/scratch/data/9827_2#49.java3.cram -R 
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa 
--skip-md5-check Y:|wc -l
417134

# With .BAI
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I 
/nfjkb/scratch/data/9827_2#49.java3.cram -R 
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa 
--skip-md5-check Y:|wc -l
7134


The second issue is that I don't understand how .BAI works when we get
multiple references packed into a single slice within cram.  My
understanding is that BAI on CRAM is implemented by encoding the slice
coordinates into BAI instead of read coordinates, due to the
read virtual-offset not being workable within CRAM. 

Does it produce multiple BAI fake reads, one for each reference used
within the cram slice?

James

-- 
James Bonfield ([email protected]) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to