Hi Irina,

Thank you for reducing this to a small test case.  Can you tell us more about 
how human.fa.fai was generated?  Does human.fa have any sequence names like 
"chrgi|251831106|ref|NC_012920.1" ?  Our code is expecting short sequence 
identifiers that we can match to our assembly's sequences, like "chr1" or "1" 
or perhaps "chr21_gl000210_random" (21 characters).  We don't know what to do 
with a sequence identifier like "chrgi|....", even if it appears only in the 
BAM header.  Even if you shorten the sequence name to NC_012920.1, we still 
can't display alignments to that sequence because it is not in our database.
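
If you want to check the index yourself, the sequence names are in the first 
column of human.fa.fai, so a short filter can flag any that contain "|" or are 
unusually long.  This is only a sketch: the function name suspicious_seq_names 
is hypothetical, and its default limit of 21 characters is an assumption taken 
from the chr21_gl000210_random example above, not a documented limit of our 
code.

```shell
# Print sequence names from a .fai index that contain "|" or are longer
# than a limit.  A .fai file is tab-separated, with the name in column 1.
# The default limit of 21 is an assumption (length of chr21_gl000210_random),
# not a documented UCSC maximum.
# Usage: suspicious_seq_names [limit] [file.fai]   (reads stdin if no file)
suspicious_seq_names() {
  max_len="${1:-21}"
  if [ "$#" -gt 0 ]; then
    shift
  fi
  awk -F '\t' -v max="$max_len" \
    'index($1, "|") > 0 || length($1) > max + 0 { print $1 }' "$@"
}
```

For example, "suspicious_seq_names 21 human.fa.fai" should print each flagged 
name on its own line, and nothing at all if every name looks fine.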

Here is a workaround to try (note > instead of >>, because test.bam must be 
overwritten rather than appended to): first use -t to add the header as before, 
then convert back to SAM temporarily so that grep can remove the chrgi 
sequences from both the header and the alignments:

  samtools view -bt human.fa.fai test.sam | samtools view -h - |
    grep -vFw chrgi | samtools view -Sb - > test.bam
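
The grep step above drops every line in which "chrgi" appears as a whole word.  
If you would rather match only the fields that name a reference sequence, an 
awk filter could take its place.  This is a sketch, and drop_refs_with_prefix 
is a hypothetical helper, not part of samtools: it drops @SQ header lines whose 
SN: tag starts with the given prefix, plus any alignment whose RNAME (column 3) 
or RNEXT (column 7) starts with it, and passes everything else through.

```shell
# Filter SAM text (header included) read from stdin, removing every
# reference sequence whose name starts with the prefix given as $1.
# Hypothetical helper for illustration; not a samtools command.
drop_refs_with_prefix() {
  if [ -z "$1" ]; then
    echo "usage: drop_refs_with_prefix PREFIX < in.sam > out.sam" >&2
    return 1
  fi
  awk -F '\t' -v p="$1" '
    # @SQ header lines: drop the line if its SN: field starts with the prefix.
    /^@SQ/ {
      for (i = 2; i <= NF; i++)
        if (index($i, "SN:" p) == 1) next
      print
      next
    }
    # Other header lines (@HD, @RG, @PG, @CO) pass through unchanged.
    /^@/ { print; next }
    # Alignment lines: drop if RNAME ($3) or RNEXT ($7) starts with the prefix.
    index($3, p) == 1 || index($7, p) == 1 { next }
    { print }
  '
}
```

It would slot into the same pipeline in place of grep:

  samtools view -bt human.fa.fai test.sam | samtools view -h - |
    drop_refs_with_prefix chrgi | samtools view -Sb - > test.bam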

NC_012920 in particular is the revised Cambridge Reference Sequence (rCRS) of 
the mitochondrion.  Unfortunately, UCSC's hg19 uses a different individual's 
mitochondrial sequence for "chrM", which has a small indel, so the coordinates 
don't map perfectly from NC_012920 to chrM.  We will use NC_012920 for chrM in 
the next human genome assembly.  If you need to view NC_012920 mappings in 
hg19, I can send a coordinate-mapping file for use with our liftOver tool, 
which can translate NC_012920 coordinates into hg19 chrM coordinates.

Hope that helps, and if you have more questions please send them to us at 
[email protected] .

Angie

----- Original Message -----
> From: "I Pulyakhina" <[email protected]>
> To: [email protected]
> Sent: Wednesday, February 1, 2012 10:59:24 AM
> Subject: [Genome] buffer overflow in UCSC
> Dear UCSC developers,
> 
> I have the following problem. I am trying to upload a BAM file to UCSC
> (http://genome.ucsc.edu/), just a one-read file. I have this SAM file:
> ========================
> DD7DT8Q1:4:1106:19806:39174#AAGGAT 147 chrX 33055128 40
> 47M19797N4M1D10M39S = 33055121 -124
> AGATGTGACAGAGAGTCCTGGAAGGTTTTGATTGCATTTTCTGAGAAGGAGATGTGACAGAGAGTCCTGGAAGGTTTTGATTGCATTTTCTGAGGTGTAC
> cgfeggggffgfgggggaggggggggggggggggggggggggggggggggggggggggggggggbggggggggggggggggggggggggggggggggggg
> MD:Z:45GT4^T0C9 NH:i:1 NM:i:3 SM:i:40 XQ:i:40 X2:i:0 XS:A:-
> ========================
> 
> I do the following:
> ========================
> > samtools view -bt human.fa.fai test.sam >> test.bam
> > samtools sort test.bam test.sort
> (even though it's senseless in my case of one read, I still do this)
> > samtools index test.sort.bam
> ========================
> 
> Then I try to upload it to UCSC like this:
> track type=bam name="My_BAM" bigDataUrl=http://barmsijs.lumc.nl/test.sort.bam
> 
> And I get the following error:
> ========================
> Error : buffer overflow, size 32, format: %s, buffer:
> 'chrgi|251831106|ref|NC_012920.1'
> ========================
> 
> Can anyone help me with this? I tried to find the solutions on the
> mailing list archives but found nothing suitable. I'd really
> appreciate your help.
> 
> Thanks in advance!
> 
> Cheers,
> Irina Pulyakhina,
> 1st year PhD student,
> The Department of Human Genetics,
> Leiden University Medical Center,
> Leiden, the Netherlands
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome