Hello, Aritra. As one of my colleagues wisely pointed out to me, there are some instances other than with haplotype chromosomes where you might encounter duplicate IDs. In a case where RefSeq mRNAs align well to multiple loci, both alignments are kept and you will see the same RefSeq ID in two different locations in the genome. It is rare, but it happens. Take NR_026818 as an example. If you perform a search on NR_026818 in hg19, you will find it at both chr1:34,611-36,081 and chr19:77,220-77,690.
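To check whether a downloaded sequence file is affected by the duplicate IDs described above, a small script along the following lines may help. This is only a minimal sketch: it assumes the file is FASTA and that the sequence ID is the first whitespace-delimited token after ">" on each header line. The function names `read_fasta_ids` and `duplicate_ids` are made up for illustration and are not part of any UCSC tool.

```python
import gzip
import sys
from collections import Counter


def read_fasta_ids(path):
    """Yield the first whitespace-delimited token of each FASTA header line.

    Assumption: that token is the sequence ID. Files ending in .gz are
    opened with gzip; everything else is read as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    yield tokens[0]


def duplicate_ids(ids):
    """Return a dict mapping each ID seen more than once to its count."""
    counts = Counter(ids)
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Usage: python find_duplicate_ids.py sequences.fa
    if len(sys.argv) != 2:
        sys.exit("usage: find_duplicate_ids.py <fasta file>")
    for seq_id, n in sorted(duplicate_ids(read_fasta_ids(sys.argv[1])).items()):
        print(f"{seq_id}\t{n}")
```

If the script prints nothing, no ID occurs more than once in the file. Any ID it does print is worth looking up in the browser to see whether it maps to a haplotype chromosome or to multiple loci.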
If you want to avoid a situation such as this, you can download the RefSeq mRNA sequences from the Table Browser, which will eliminate any duplicates, or you can download the original GenBank file from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/refMrna.fa.gz. The only drawback to the GenBank file is that it also contains mRNAs that did not align to the genome.

To obtain the RefSeq mRNA sequences from the Table Browser, perform the following steps:

1. From http://genome.ucsc.edu, select "Tables" from the blue navigation bar at the top of the screen.
2. Select the following options:
   Clade: Mammal
   Genome: Human
   Assembly: Feb. 2009 (GRCh37/hg19)
   Group: Genes and Gene Prediction Tracks
   Track: RefSeq Genes
   Table: refGene
   Region: genome
   Output format: sequence
   Output file: specify a filename
3. Click the "get output" button
4. Select "mRNA" and click the "submit" button

Please contact us again at [email protected] if you have any further questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steve Heitner
Sent: Tuesday, June 12, 2012 10:10 AM
To: 'Aritra Deb'; [email protected]
Subject: Re: [Genome] Whether the refseq accession no. not unique

Hello, Aritra. You are correct that RefSeq IDs are unique and that you should generally not have more than one result per RefSeq ID. In this particular case, however, the gene referenced in your previously-answered mailing list question (NR_001298) occurs on chromosome 6. A quick search of NR_001298 in hg19 reveals two search results: one on chr6 and one on chr6_cox_hap2, which is a haplotype of chromosome 6 (please see the previously-answered mailing list question at https://lists.soe.ucsc.edu/pipermail/genome/2006-December/012416.html). In hg19, there are haplotype chromosomes associated with chr4, chr6 and chr17 (http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&chromInfoPage=).
Any genes that occur in these haplotype chromosomes will likely show up twice in any search results.

Please contact us again at [email protected] if you have any further questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Aritra Deb
Sent: Tuesday, June 12, 2012 7:40 AM
To: [email protected]
Subject: [Genome] Whether the refseq accession no. not unique

Hello,

While working with BLAT I got an error, "blat: fuzzyFind.c:1438: ffFind: Assertion `hayStart <= hayEnd' failed", and I found an answer on this page: "http://www.mail-archive.com/[email protected]/msg01650.html". Now please tell me why the same RefSeq accession number is assigned to different sequences in the RefSeq sequence files downloaded from the UCSC Table Browser. Are those accession numbers not unique? Please reply soon.

Thanks.
Aritra

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
