Hello,

Yes, you are extracting the data from the Table Browser correctly. From examining the data at ENSEMBL, it appears that the transcript's data source, UniProtKB/Swiss-Prot P27144 (KAD4_HUMAN), was very recently updated (last modified February 9, 2010).

The new version of the transcript has a much longer 3' UTR. When I take the revised sequence and run a simple web BLAT, it aligns easily with 100% identity, covering the same 3' UTR region as the currently existing transcript plus the extra data.

Comparing to other datasets, RefSeq does not have this new variant. The human mRNA track has a single read that represents a portion of the extended UTR. Examining EST data, spliced ESTs do not confirm the region, but unspliced ESTs do, with significant, overlapping tiling. However, unspliced ESTs cannot be assigned a strand without a splice site, so it should be kept in mind that there may be another gene present on the minus strand, maybe even a pseudogene that lacks introns; the tiling being so complete is a bit suspicious. Sequence data from other species (other RefSeq, mRNA, EST) suggest that there is some type of transcription in this region, and it is often connected to the positive strand with splice sites. None of these are intron-free, unlike the extended region reported in the ENSEMBL transcript.

From examination of the Conservation data, the genomic sequence is syntenically conserved from chimp to mouse, and in most mammals evolutionarily in between.

The UCSC Genes track was last revised on 2009-10-08, so the extended ENSEMBL transcript was not considered.

The extended 3' UTR does seem very likely to be a transcribed region of the genome, perhaps the 3' UTR of this gene, perhaps extended through 2-3 exons. A solid, contiguous block of this length is possible, but does not quite fit with the other data. We are looking at sequence evidence only, though; there may be more evidence based on laboratory results that is not apparent from this analysis. A review of the other evidence at ENSEMBL, and keeping an eye on other datasets (in particular RefSeq) as this data is reviewed by other teams, may help to determine or confirm what exactly this region represents.
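If you would like to confirm for yourself that the UCSC record is fully contained in the longer ENSEMBL record, a minimal Python sketch along these lines may help. It is only an illustration, not a UCSC or ENSEMBL tool: the function names (read_fasta, range_length, compare) are made up for this example, and it assumes that the range= coordinates in the Table Browser header are 1-based and inclusive, as they appear to be in the header you posted.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def range_length(header):
    """Length implied by a 'range=chrN:start-end' field (assumed 1-based,
    inclusive). Returns None if the header has no such field."""
    match = re.search(r"range=[^:\s]+:(\d+)-(\d+)", header)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return end - start + 1


def compare(shorter, longer):
    """Report whether `shorter` occurs inside `longer`, and at what offset."""
    offset = longer.find(shorter)
    return {
        "contained": offset >= 0,
        "offset": offset,  # -1 when the shorter sequence is not found
        "extra_bases": len(longer) - len(shorter),
    }


if __name__ == "__main__":
    # Toy records standing in for the two downloads.
    toy = ">ucsc range=chr1:101-104\nAC\nGT\n>ensembl\nACGTTTGA\n"
    (h1, s1), (h2, s2) = read_fasta(toy)
    print(range_length(h1), len(s1), compare(s1, s2))
```

With the two real downloads, an offset of 0 would mean the UCSC sequence is the leading part of the ENSEMBL 3' UTR, and extra_bases would be the size of the newly added extension. For the header range=chr1:65691861-65693173, range_length gives 1313, which can be checked against the length of the UCSC record.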
In summary, the new data may be a legitimate extended 3' UTR, perhaps multi-exon, or it may represent confusion with a non-coding, unspliced, transcribed gene/pseudogene on the minus strand. If you are able to correlate expression information with the region (using ENCODE or other microarray data), that may also provide some clues. I will leave that part of the analysis for you to explore.

Hopefully this helps a bit,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/16/10 2:17 PM, Manisha Brahmachary wrote:
> Hello,
>
> I have a query regarding downloading 3'UTR for ensembl genes for Homo sapien.
>
> I am trying to download 3'UTR for all genes of ensembl (hg19) for Human
>
> From the UCSC table I do the following:
>
> Clade: mammal: Genome: human assembly: GRCh37
> Group: Genes and gene Predictions tracks track: ensemble genes
> Table: ensGene
> Region:genome
> Output format: sequence
>
> From Ensembl Genes genomic Sequence browser
> Sequence Retreival Region Options
> I choose: 3'UTR exons
> One FASTA record per gene
>
> When I download the sequence and compare one FASTA sequence for gene
> ENST00000327299 with the 3'UTR sequence of the same gene downloaded from
> ensembl, I see the lengths are different. The UCSC sequence appears to a
> subset of the ensemble downloaded 3'UTR sequence. (See below the two
> sequences)
>
> QUESTION: 1. Am I doing the steps right to download the entire 3'UTR
> sequence from UCSC table or am I just downloading a part of the 3'UTR region?
> > > > > > > > See below: > > > > FROM UCSC: > >> hg19_ensGene_ENST00000327299 range=chr1:65691861-65693173 5'pad=0 3'pad=0 > strand=+ repeatMasking=none > > CCCTGCCCAATGGAAGAACCAGGAAGATGTGGTCATTCATTCAATAGTGT > > GTGTAGTATTGGTGCTGTGTCCAAATTAGAAGCTAGCTGAGGTAGCTTGC > > AGCATCTTTTCTAGTTGAAATGGTGAACTGATAGGAAAACAAATGAGTAG > > AAAGAGTTCATGAAGAGGCCCTCCTCTGCCTTTCAAAAGGCTGGTCACCT > > ACACATGTTTAAGGTGTCTCTGCACATGTCTCAAGCCCATCACAAGAAAG > > CAAGTACAGTGTGGATTTCAAATGGTGTGTAACTTCAGCTCCAGCTGGTT > > TTTGACAGCTGTTGCTGTGGTAATATTTTTGACATGTGATGGTGATAGTC > > TCTGGTTCTCCCCATCCCCACAAAGGCTGTTGAACCACAGCACCAGGAAG > > CCTGAGAATGAATCCTGAGGGCTCTAGCCCAGGCTTTGTCCCAGGCTTTC > > TGGTGTGTGCCCTCCTGGTAACAGTGAAATTGAAGCTACTTACTCATAGT > > GGTTGTTTCTCTGGTCTTGAGTGACTGTGTCCACAGTTCATTTTTTTCCG > > GTAGGAATAACTCCTTTTCTACATCCACGCTCCATAGAGTCTCTCCTTTT > > CAGACATCCTGGGATGAAAGAATTTGGCTTTTTTTTTTCTTTTTTTTTTT > > GGACATCTGTTTTCACTCTTAGGCTTTTAAACAATAGTTATTGCTTTTAT > > CCCTCTCAGATTCTAATAACTGAGAGCGATGGGGCTATATTGAATCTCTG > > TATGCACTGAGAACTGAGCTATGAAGAGGATCTTATTAAACTGCTGGTCT > > GACTTTATGGATTGACACTGTTCCTTTCTTTTATTGTGAAAAAAAAAAAA > > AACCCTGAAAGTCTTGGGAACCCCCTAAAGTCTTTTGGGAATCCTCAAAA > > AGCATGGGAAGTTAAGTATTTAGCTACATAAATGTTGTAAGATCATATCT > > TATGTATAGAAGTAATAAGACCATTTGGAATTACTGGACTAATTGAATAG > > TTAAGGTTTCTATTCGGGACAATAAAATGTATTTTGAAAGTGCTGCTAAC > > TATTGATGCTGACAGTGTTTCACTCCTATGAGTGACCCAAACATATTATA > > AATATGTGGTAAAGGGAATGGAGCCTGTGGGGTTGAGCAGAATGTTGTAC > > TAGCTGTGCCTGGACTGAGTATAACAGCTTTATGATTATGAGAAAACAAA > > TTCTTTATTTTTTTTTTCTGTTCCAAAGATTCATCCTATGGGGTGGCCAT > > AAAGTCTAGAATTAGATACTAATATTTTGTCATTCATTATAACATATCAA > > TAAACCATTTGTT > > > > FROM ENSEMBL > > > >> ENSG00000162433|ENST00000327299 > > CCCTGCCCAATGGAAGAACCAGGAAGATGTGGTCATTCATTCAATAGTGTGTGTAGTATT > > GGTGCTGTGTCCAAATTAGAAGCTAGCTGAGGTAGCTTGCAGCATCTTTTCTAGTTGAAA > > TGGTGAACTGATAGGAAAACAAATGAGTAGAAAGAGTTCATGAAGAGGCCCTCCTCTGCC > > TTTCAAAAGGCTGGTCACCTACACATGTTTAAGGTGTCTCTGCACATGTCTCAAGCCCAT > > CACAAGAAAGCAAGTACAGTGTGGATTTCAAATGGTGTGTAACTTCAGCTCCAGCTGGTT > > 
TTTGACAGCTGTTGCTGTGGTAATATTTTTGACATGTGATGGTGATAGTCTCTGGTTCTC > > CCCATCCCCACAAAGGCTGTTGAACCACAGCACCAGGAAGCCTGAGAATGAATCCTGAGG > > GCTCTAGCCCAGGCTTTGTCCCAGGCTTTCTGGTGTGTGCCCTCCTGGTAACAGTGAAAT > > TGAAGCTACTTACTCATAGTGGTTGTTTCTCTGGTCTTGAGTGACTGTGTCCACAGTTCA > > TTTTTTTCCGGTAGGAATAACTCCTTTTCTACATCCACGCTCCATAGAGTCTCTCCTTTT > > CAGACATCCTGGGATGAAAGAATTTGGCTTTTTTTTTTCTTTTTTTTTTTGGACATCTGT > > TTTCACTCTTAGGCTTTTAAACAATAGTTATTGCTTTTATCCCTCTCAGATTCTAATAAC > > TGAGAGCGATGGGGCTATATTGAATCTCTGTATGCACTGAGAACTGAGCTATGAAGAGGA > > TCTTATTAAACTGCTGGTCTGACTTTATGGATTGACACTGTTCCTTTCTTTTATTGTGAA > > AAAAAAAAAAAACCCTGAAAGTCTTGGGAACCCCCTAAAGTCTTTTGGGAATCCTCAAAA > > AGCATGGGAAGTTAAGTATTTAGCTACATAAATGTTGTAAGATCATATCTTATGTATAGA > > AGTAATAAGACCATTTGGAATTACTGGACTAATTGAATAGTTAAGGTTTCTATTCGGGAC > > AATAAAATGTATTTTGAAAGTGCTGCTAACTATTGATGCTGACAGTGTTTCACTCCTATG > > AGTGACCCAAACATATTATAAATATGTGGTAAAGGGAATGGAGCCTGTGGGGTTGAGCAG > > AATGTTGTACTAGCTGTGCCTGGACTGAGTATAACAGCTTTATGATTATGAGAAAACAAA > > TTCTTTATTTTTTTTTTCTGTTCCAAAGATTCATCCTATGGGGTGGCCATAAAGTCTAGA > > ATTAGATACTAATATTTTGTCATTCATTATAACATATCAATAAACCATTTGTTAAAAGAT > > TTGCCTGGTTTCCAGACTTGGTGGCCACCTTGAATAATTCTTGCTGTCTTCTGGGAAGGA > > TGATGAAATTTATTCCTGCTGCCTTAAAAATATGTATCCCTTCTTCACCCATCATGACTG > > TCCCCAGTGAGTGTCCTTTACTATTCTTGGGAGTGACTCCTGTCTAACTTTTCATACTGG > > CGAGAAGAAAAGAAGCCTATTTTAACACTTTAGTGGTGTTGAAACACATTACTTACTTTC > > TGAAGATGTCCCAGTGAATCCTCTGTCAATTCACTGCCATATGTAATCTATATGATAAGG > > AATGCATCTTCCTTCTAAGTACTGCCCAAACTCTTGCCAGCTCCTCTCCCATTGTCCCTT > > CATGTGAATATTTCTTGGCTACCTTAGTGGAAATATAGATCAGTTTTCTCCCCATCCATC > > CTCTCAAACATAATGAGATTGTTTACTTTTTAGATTTATGCAGTGAAAATGCCCAGTCAG > > GTCTGAATCGTCAGTGCATTATATTGACTCTGAGCACTTTAGAATTTAGAGTTGCAATTG > > AATGCCAGCTGTGGAGATGGGGTGCATATCAGATATATAAATAAAGCTCAGGTTTGCTAG > > GGAACCAGGTATAGAGAAAAATAAGTCTGATATGAGGAAAATTGCACAATTTAGAGTAGT > > TATGCCGTAGAGAAAATTTCCACAAACTAGGAAATGTAGAGAGTTATTCTATAGAATACT > > CAAAAGAGGAAAGTATGTGATTTTTGGAAACAGGAAAATCTTCAAACTTCTTTCTTCACT > > 
TCCCTTTGTGTTTAGCTGACCCTCCAATGTGATCATTGCCTTTGGAGTTTGGGAGAGGTA > > CGGGAAGTGGCCTGATCCCTGCTTCCATACTTCACTCCTCCATCCATCCTTCCCTCCCTC > > TTCCCCTCCAGCTAAATGGACAATTCTAGCCAACATTGAGTCACTCAATAAGTCTCAACA > > GTGGGTGTGTTTGCTGAGATTGTCCAGCGGTTGAGCAGTTTGGTCTCACCTCCCTCGCTA > > GTTGAGACCAAAAAGAGACAAATAACTTTTTCATGGTCTTTGAAACATAATGCTTATTTC > > GTGGTCAATGGCTTTAAAAAAATCTGTTTCTTGTTTTCTTCAACAAACTCACTAGTTTTC > > CCTTAAATGATATTGTAAAAATTAAAGTAATCTTGAAAATGTTTTGACAAAAGTAAAATT > > AAAGGGACATCTTTTCTTGTTTTGTTTTTTTTTTTTCTATTGCCACACATGACCGTTCCT > > TCACCTTTAAGCAAAGAGAGTGGTTCAGATGGTTTCTAAGATGCCAACCTGACCTCGCAT > > TCTGTCATTCTACCCAGCTCTTAATTCAATTTGCTTCCATTATCCTAACAGGCTTCTTTC > > TTACTTAGAACTTGGAAAGGCTGCTGTATTTAATACCCTCCAACACTAACGCAGACTTAA > > GATAGGTACTGTTTATTGAAAACCTACTGAGTGAAATGTGCGGTTTTAGGACCTTCATAA > > ACATCTCATTTAATCTTTCTAGCATCCTGTGAAACAGCCATGATTTCACGTTGATAAACA > > AAGAAGACAGGGGTCCCAGGGATGTGAAGCATCTTGCCCAGGCTTCTGCTGCTGGTGACC > > AGTGTAGCCAGGACTCCAGCCCAGGTTTTCCTGACTCAGAAGACTGAGCTTTTTCCTGGA > > TGTTATTAATAGCTAATTGTGTCCAAGCAACCAAGGGCCTTGAGTCTGCTTGGTTCTGCT > > TATGGCCTCACATCAAGAAATGGAGCTAGTCCATGTCTGTAGTCCCAATGCTTTGGGAAG > > CCATGATGGGAAGGTTGTCGGAGGCCAAAAGTTCAAGACCAGGCTGGGCAATATCACAAG > > ACTCCATCTCTACGGAAAAGTAAAAAATTAGCCAGTCATGGTGGTGTACACTTATGGTCC > > TAGTTACTCAGGAGACTTAGGCAGGAGGATTGCTTGATCCTAGGAATTCGAGGCTGCAGT > > GAGCTATGATTGCACCTCTGCACCCAAGCCTGGGCGACACAGCGAGACCCTCTCTCTTAA > > AAAAAAAAAATAGCAGAGCTCACCAAAGTGATGTTCACCTTTTTATGACATTCCTTTTTC > > TTAGCTTAAGAAAAGAAAGCTGCTAGATGAGAGTCTTAGTTTTCCTGCATAAGACCTCCT > > TTATGAATAGAATAAAAGACTGTCAAAGTAGGCTGGGCTTGGGCCCAGGCTAATCTATGA > > AGGAAGCAAGCTCGTGTTCCTTACCTATCCTTTTGGTGTCCATTGGATTGTGCCCCGAAG > > TGGCCTTTACCCTTGAGCCGTCCCCAGCCATGGTGCTCACACATAGGCTTTTGAGCTCCT > > TGGAGCTATCCAGATCCTGCTCACTTTTCCTTCCTGAGATCAGAACAAATCACCCCCTTA > > CTCCCACTCCAAACAAGGCCTTGATGATAAACTAATCCTTCCTAAAATGCTGGTAGGTAA > > ACAAGCAATGATGAAGCATTGAACACAGGTTAACTCCTGACTTTTGTACCATTGTCTATT > > CCATTACACATTAACATGACTCTGAATGCCAGATCCAAACCTTTGCCCACCATCTGCTTG > > 
TCGTGCAACAGTTGAGGCAGTAACCAGGGGAGATTCACTTCCTGTCTTGTCCTTCCCCAG > > GGATCACCCCCCTGCTGCCCTCTAGCAGCCAAACTCAGATGAGTTCCATTGTTACCCTAG > > GTGTGCCCATCTCTTTGGTAGGGAAGGAGAAAGGTAAGAATAGCCATCAGTGAGGAAGGA > > TTCTTGGAGCGAGGAGCCACTGTGGTTTTTCCTGCTATTTAAGATGTTGAGACCGGATAA > > CTTTAGAAAGATACCTGCACAAACCCATAAATAGTGCTTTTATAAAGTTTAGTTCACCGG > > AACCTGAGTTCAGTATTTGACATTAGCTTTTTGTCCAAAGAGTTGAAGCCTGCTGGAGGT > > CTTTGCTCAAATAATAAATACCACATATTTCCAAGTGTGTTCAGGTATAGGCACTAGGTA > > CTGTCTGTTTACTTCATGTTAGGCACATTACATGCATTGGCTAATCAAATCCTCATCAAT > > TACATATGTAATAATCTAAACTTGCCTCCTTGTATTATAAATGGAAATAATCCTGTTTAT > > TTAAACGGGTTTTCATGTACCTGTAGGGATTAGGAAACTCAAATGGCCTTTTTAATACCT > > TTCCCTAGTTTGAGCTCCCTGTTCTCTTTAACAGATAAAACAACATATTTGCTTCAGCCT > > GGAATCTGTTTTTGGTGCTTTGGTGCAGAGACAGGAAATGGGCACTCAGAGTCACACTGG > > TAGTTGCACACTGTATCTACAGAGGGCGTGTCTCATCTGTACTCTGCTGGGTTACAGGAT > > TTCAGTAGGTATTTGTGTCCACCTGAGAATTCTGTTTATTACCTTTCATTTGACAGTGTC > > TTTCCTTTCTGCAGTTGATTTTGCTAGAGAGGCAATTCATAAGGTGAGGTCCTGTTCATA > > GTATGACTTGCTTTCTCAATATCTCCTTCAATTTTTAGTAACTCTTGGTCTATTTGGTGT > > CTTTAAAAAAAATAACCTAGTAATAAAGACTTCTTTTAATGTGGAAATGTGGTCTGGTAG > > TAAGTTATTTCTTTCCACATGTAACTGACCCAATCTGGTTTCCAAATGAGAAGTGTGCAG > > GCCCCAGAGGTTGAGAAGCCATATTTCAACTGTGAAAAAAATCTGCTTCCTGCATCTGTT > > GAAATATAGTTGTTCATACTTGCCATCCCTTATCTTTCTTGTAACAATTTGCACAGTTCT > > TGCCAGAATAAATGCCATTATCTGTATGTTTCAGGGAGTTCCCCAATTTGATCATTTTTG > > TGTGTGTGTGGTGTGTGTGTGAGAGAGAGAGATACTGCAGTAAAACATTTCTAAAGGATG > > AAAGCTCTTGTATGGCATAGATATGAATTCCTTCCTCTGGTAATAATTAGGTTATTCCCA > > GAAGCACAGTGTCATTCTTTAAATAAAAGCTTTCCTGTTTAAAGCTTTTCAAAGGAGCAG > > ACCACCTTGAAGATTCCCCCTAGGGTTGATATGTGTCTAATTCATTTTATAAAAATTATT > > CTTGTCTTCATTTTAAAGCTTTGGCTATATAGTCAGAAATGTCCTAAATAACAAACTATT > > TTGTATTTAATTTAGGGAAGACTAAAGGGAAGAAAAATGAAAACTCAGTCTTTATGTAAG > > CTCCAAGGATATTAGGGCTTAAAGGGCTTTTCTAGTTTTATGAGAATTTGTACTACTGAT > > TTTTATATATTCCTGTTTTTGAGATGAACAGATCTCTGGGGAAATTGTTGAGTTACAATG > > GCATTTCACTGTGATCCCTCTCAAGCTCAGATCAGTTCTATAACCCAATGACAACCTGTC > > 
TCTTTGGTTTACTGTCCTGTGAAATGTCAGCTCAAGTTTCCCAGAAGTCGTGTGTTTATG > > ATGAGTCAGAGTGCTTTTCCTCGGTGGGACAGTTGCTGGCCCTCTTAATTTTGGTGTATG > > TGCTTCCAAGTATCTAAACCTCCAGTCTGATCTGTATATGCTATCCTAACTGTTAATTGT > > ATTATTGATTATGTTGATTATCTTGCTTGAAGGTTCATACTTTTCAATTTGATAGAAATA > > AAGTTTTTTTCTGCTTATA > > > > > > > > > > _______________________________________________ > Genome maillist - [email protected] > https://lists.soe.ucsc.edu/mailman/listinfo/genome
