Hi Irina,

Thank you for reducing this to a small test case. Can you tell us more about how human.fa.fai was generated? Does human.fa have any sequence names like "chrgi|251831106|ref|NC_012920.1"? Our code is expecting short sequence identifiers that we can match to our assembly's sequences, like "chr1" or "1" or perhaps "chr21_gl000210_random" (21 characters). We don't know what to do with a sequence identifier like "chrgi|....", even if it appears only in the BAM header. Even if you shorten the sequence name to NC_012920.1, we still can't display alignments to that sequence because it is not in our database.
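One quick way to see which names in human.fa.fai are too long is to scan the first column of the index, which holds the sequence name. The following is only a sketch: the helper name list_long_names is made up for illustration, and the default limit of 31 characters is an assumption inferred from the "size 32" in the error message, not a documented limit.

```shell
# Hypothetical helper: print the length and name of every sequence in a
# FASTA index (.fai) whose name is longer than a given limit.
# Assumes a tab-separated .fai with the sequence name in column 1.
# The default limit (31) is a guess based on the "size 32" in the error
# message; pass a second argument to use a different limit.
list_long_names() {
    fai="$1"
    limit="${2:-31}"
    awk -F '\t' -v max="$limit" \
        'length($1) > max { printf "%d\t%s\n", length($1), $1 }' "$fai"
}
```

For example, "list_long_names human.fa.fai" would list any names over 31 characters, and "list_long_names human.fa.fai 21" would list everything longer than "chr21_gl000210_random".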
Here is a workaround to try (note > instead of >>, because test.bam must be overwritten rather than appended to). First use -t to add the header as before, then translate back into SAM temporarily so that grep can remove the chrgi sequences from both the header and the mappings:

    samtools view -bt human.fa.fai test.sam | samtools view -h - | grep -vFw chrgi | samtools view -Sb - > test.bam

NC_012920 in particular is the revised Cambridge Reference Sequence (rCRS) of the mitochondrion. Unfortunately, UCSC's hg19 uses a different individual's mitochondrial sequence for "chrM", which has a small indel, so the coordinates don't map perfectly from NC_012920 to chrM. We will use NC_012920 for chrM in the next human genome assembly. If you need to view NC_012920 mappings in hg19, I can send a coordinate-mapping file for use with our liftOver tool that can translate NC_012920 coords into hg19 chrM coords.

Hope that helps, and if you have more questions please send them to us at [email protected].

Angie

----- Original Message -----
> From: "I Pulyakhina" <[email protected]>
> To: [email protected]
> Sent: Wednesday, February 1, 2012 10:59:24 AM
> Subject: [Genome] buffer overflow in UCSC
>
> Dear UCSC developers,
>
> I have the following problem. I try to upload a bam file in UCSC
> (http://genome.ucsc.edu/), just a one-read file.
> I have this sam-file:
> ========================
> DD7DT8Q1:4:1106:19806:39174#AAGGAT 147 chrX 33055128 40
> 47M19797N4M1D10M39S = 33055121 -124
> AGATGTGACAGAGAGTCCTGGAAGGTTTTGATTGCATTTTCTGAGAAGGAGATGTGACAGAGAGTCCTGGAAGGTTTTGATTGCATTTTCTGAGGTGTAC
> cgfeggggffgfgggggaggggggggggggggggggggggggggggggggggggggggggggggbggggggggggggggggggggggggggggggggggg
> MD:Z:45GT4^T0C9 NH:i:1 NM:i:3 SM:i:40 XQ:i:40 X2:i:0 XS:A:-
> ========================
>
> I do the following:
> ========================
> > samtools view -bt human.fa.fai test.sam >> test.bam
> > samtools sort test.bam test.sort (even though it's senseless in my
> > case of one read I still do this)
> > samtools index test.sort.bam
> ========================
>
> Then I try to upload it like this to the UCSC:
> track type=bam name="My_BAM"
> bigDataUrl=http://barmsijs.lumc.nl/test.sort.bam
>
> And I get the following error:
> ========================
> Error : buffer overflow, size 32, format: %s, buffer:
> 'chrgi|251831106|ref|NC_012920.1'
> ========================
>
> Can anyone help me with this? I tried to find the solutions on the
> mailing list archives but found nothing suitable. I'd really
> appreciate your help.
>
> Thanks in advance!
>
> Cheers,
> Irina Pulyakhina,
> 1st year PhD student,
> The Department of Human Genetics,
> Leiden University Medical Center,
> Leiden, the Netherlands
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
