Hi Hiram,

Thanks for your answer, but something is not very clear to me:
I do not understand this:

"In a database table ensGene, this gene is specified as coordinates:
14747-15888 in UCSC chrM 0-based coordinates
14746-15887 in chrM_rCRS 0-based coordinates"

If the chrM_rCRS 14746 is already 0-based, why does UCSC add +1 to it? I think this should not happen. chrM_rCRS coordinates are 1-based and thus start from 14747. Also, I do not understand how this can be changed to "14748" as below. I believe this is not correct. Could you please clarify?

Thank you,
Gulsah

________________________________
From: Hiram Clawson <[email protected]>
To: Laura Smith <[email protected]>
Cc: [email protected]
Sent: Friday, March 23, 2012 10:52 AM
Subject: Re: [Genome] ensemble version

Good Morning Laura:

In a database table ensGene, this gene is specified as coordinates:
14747-15888 in UCSC chrM 0-based coordinates
14746-15887 in chrM_rCRS 0-based coordinates

When viewing in the genome-test browser, this gene is seen at:
chrM:14748-15888
which is a 1-based coordinate
(If it were chrM_rCRS coordinates, it would be chrM_rCRS:14747-15887)

UniProt correctly displays the seventh codon ATT == "I"
The UCSC genome browser correctly displays this as "I"
Ensembl incorrectly displays this as "T"
The download protein from the UCSC genome browser serves up the Ensembl protein with the incorrect "T"

The display of these proteins on the external genome.ucsc.edu browser is currently 1 position out of place. The display has been corrected on genome-test.

--Hiram

----- Original Message -----
From: "Laura Smith" <[email protected]>
To: "Hiram Clawson" <[email protected]>
Cc: [email protected]
Sent: Friday, March 23, 2012 10:06:24 AM
Subject: Re: [Genome] ensemble version

Hi Hiram,

Thank you very much for your answer. I would like to clarify one more thing: I just looked at the ENSEMBL website for the specific transcript you gave as an example below. When I did a search on the UCSC genome browser for "ENST00000361789", the coordinate 14746 becomes 14747, which is +1.

1.
Is this because the coordinate 14746 is in BED format start coordinate which is 0 based whereas 14747 is in genomic coordinate format which is 1-based?

2. Please see attached figure. The protein sequence shown for this transcript seems to be wrong. It is not the same as the sequence given below:
http://www.uniprot.org/uniprot/P00156

Could you please clarify if the protein sequence displayed for ENST00000361789 is wrong on the UCSC browser website?

thank you,
Laura

From: Hiram Clawson <[email protected]>
To: Laura Smith <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Thursday, March 22, 2012 3:10 PM
Subject: Re: [Genome] ensemble version

This also means that if you obtained DNA sequence, then what you have matches nothing at all since the coordinates are chrM_rCRS but the DNA is UCSC chrM.

--Hiram

Hiram Clawson wrote:
> Good Afternoon Laura:
>
> I have confirmed that the tables we have hosted on hg19 since at least
> November 2010 (ens v60) have been identical for the chrM predictions.
> They do appear to be predictions for chrM_rCRS instead of the UCSC chrM.
> We have mistakenly shown them in their chrM_rCRS locations on the UCSC
> chrM sequence. When you say "transcripts" are you talking about the gene
> prediction locations, for example:
> ENST00000361789 chrM + 14746 15887 14746 15887 1 14746, 15887, 0
> ENSG00000198727 cmpl incmpl 0,
> Or are you referring to the protein sequence:
> ENST00000361789
> MTPMRKTNPLMKLINHSFIDLPTPSNISAWWNFGSLLGACLILQITTGLFLAMHYSPDASTAFSSIAHITRDVNYGWIIRYLH
> ANGASMFFICLFLHIGRGLYYGSFLYSETWNIGIILLLATMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTDLVQWIWGGYSVDSPTLTRFFTFH
> FILPFIIAALATLHLLFLHETGSNNPLGITSHSDKITFHPYYTIKDALGLLLFLLSLMTLTLFSPDLLGDPDNYTLANPLNTPPHIKPEWYFLFAYTIL
> RSVPNKLGGVLALLLSILILAMIPILHMSKQQSMMFRPLSQSLYWLLAADLLILTWIGGQPVSYPFTIIGQVASVLYFTTILILMPTISLIENKMLKWA
>
> Both of these have been chrM_rCRS since 2010.
>
> --Hiram
>
> Laura Smith wrote:
>> Hi Vanessa,
>>
>> Thank you for your reply.
>> However, this is not the answer to what I was asking for.
>> Let me make the question short and more clear:
>>
>> Question:
>> I downloaded ENSEMBL transcripts from the UCSC website using the "tables" tab
>> on 06/2011 (version 62 of ENSEMBL at that time). Would these transcripts I
>> downloaded from UCSC already contain the correct coordinates for the rCRS
>> chrM?
>>
>> I just need a "yes" or "no" answer.
>>
>> Let me give you some information that may be useful for you to be able to
>> answer this question more clearly:
>>
>> Facts:
>>
>> 1. The human MT genome has been replaced by the revised reference sequence
>> (rCRS) NC_012920 (AC_000021) in Ensembl 57 (March 2010). See the news at the
>> bottom of the page in the link below:
>> http://mar2010.archive.ensembl.org/Homo_sapiens/Info/WhatsNew
>>
>> 2. So, any version of Ensembl after Ensembl version 57 would include the new
>> rCRS chrM transcript coordinates.
>>
>> 3. UCSC genome browser has NOT converted to rCRS chrM sequence and is still
>> using the old sequence for chrM.
>>
>> 4. UCSC genome browser currently provides "tables" tab for users to download
>> ENSEMBL sequences.
>>
>> 5. It is not clear whether users who download the ENSEMBL transcripts from
>> UCSC genome browser get the new rCRS chrM coordinates or the old
>> chrM coordinates for these ENSEMBL transcripts.
>> This is the issue.
>>
>> thanks,
>> Laura
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
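
The "+1" discussed in this thread can be illustrated with a small sketch. It assumes the convention described in the messages: table rows store a 0-based start with an end that needs no adjustment (a half-open interval), while browser positions are 1-based and inclusive. The function names `table_to_browser` and `browser_to_table` are hypothetical helpers written for this illustration and are not part of any UCSC tool.

```python
def table_to_browser(chrom, chrom_start, chrom_end):
    """Format a 0-based, half-open table interval as a 1-based browser position.

    Only the start shifts by one; the end is numerically unchanged.
    """
    if chrom_start < 0 or chrom_end < chrom_start:
        raise ValueError("expected 0 <= chrom_start <= chrom_end")
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def browser_to_table(position):
    """Parse a 1-based 'chrom:start-end' position into (chrom, start, end), 0-based half-open."""
    chrom, span = position.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start) - 1, int(end)


# Values quoted in the thread for ENST00000361789:
print(table_to_browser("chrM", 14747, 15888))       # UCSC chrM table row
print(table_to_browser("chrM_rCRS", 14746, 15887))  # chrM_rCRS table row
```

Under that assumption, the table start 14747 on UCSC chrM is shown as 14748 in the browser, and the chrM_rCRS start 14746 would be shown as 14747, which matches the positions given in the reply above. The separate one-base difference between the chrM and chrM_rCRS rows comes from the two sequences themselves and is not modelled here.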
