On Mon, Aug 17, 2015 at 12:45:50PM +0100, James Bonfield wrote:
> Have you tested that they work? I don't seem able to get this working
> right now, but maybe I'm doing something wrong (or perhaps they have
> corner cases which fail).
Here is an example built on the latest cramtools-3.0.jar:
# .BAI index
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar index
--bam-style-index -I /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I
/nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram -R
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
--skip-md5-check 1:100000000-100000000
[no output]
# .CRAI index
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar index -I
/nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I
/nfs/users/nfs_j/jkb/scratch/data/9827_2#49.java3.cram -R
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
--skip-md5-check 1:100000000-100000000
HS25_09827:2:2214:6841:99219#49 99 1 99999945 60 100M
= 100000329484
CTGGGAGCCCCGCAGAACATGGTGTTTTATTCTGACACTATATATCTAGCACTTGCACTGTAAAAATGGAAGTAATTCCCATTAGGACCAGCAAAACCTG
AAEFFGFGGFHGGJ>HGGGHGG@HHJHIGHHHHELGGGHGGHGHFGHIFFIIHHLIFGFGIJGHHGGGGEGIGJGJFIIHJGIHEGHFFHGGGHEFHHIE
X0:i:1 X1:i:0 BC:Z:NGTCTATC RG:Z:1#49 XG:i:0 AM:i:37 SM:i:37
XM:i:0 XO:i:0 QT:Z:!1=DFFFB XT:A:U
...
[4 lines of output]
# Samtools on the BAM equivalent
$ samtools view /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.bam
1:100000000-100000000
[Also 4 lines of output]
This is just a random local human test file, but I haven't explored further.
It also fails on whole chromosomes though:
$ samtools view /nfs/users/nfs_j/jkb/scratch/data/9827_2#49.bam Y:|wc -l
417134
# With .CRAI
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I
/nfjkb/scratch/data/9827_2#49.java3.cram -R
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
--skip-md5-check Y:|wc -l
417134
# With .BAI
$ java -Xmx2g -jar ~/work/cram/cramtools/cramtools-3.0.jar bam -I
/nfjkb/scratch/data/9827_2#49.java3.cram -R
/nfs/srpipe_references/references/Human/1000Genomes_hs37d5/all/fasta/hs37d5.fa
--skip-md5-check Y:|wc -l
7134
The second issue is that I don't understand how .BAI works when we get
multiple references packed into a single slice within cram. My
understanding is that BAI on CRAM is implemented by encoding the slice
coordinates into BAI instead of read coordinates, due to the
read virtual-offset not being workable within CRAM.
Does it produce multiple BAI fake reads, one for each reference used
within the cram slice?
James
--
James Bonfield ([email protected]) | Hora aderat briligi. Nunc et Slythia Tova
| Plurima gyrabant gymbolitare vabo;
A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help