Hello,

 

I have a question about downloading 3'UTR sequences for Ensembl genes in Homo sapiens.

 

I am trying to download the 3'UTRs of all human Ensembl genes (hg19).

 

From the UCSC Table Browser, I do the following:

 

Clade: Mammal, Genome: Human, Assembly: GRCh37

Group: Genes and Gene Prediction Tracks, Track: Ensembl Genes

Table: ensGene

Region: genome

Output format: sequence

 

From the Ensembl Genes Genomic Sequence page:

Sequence Retrieval Region Options:

I choose: 3'UTR exons

One FASTA record per gene
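
For reference, here is a minimal sketch of how 3'UTR exon blocks could be derived from one row of a genePred-style table such as ensGene. It assumes the usual genePred fields (strand, cdsStart, cdsEnd, exonStarts, exonEnds) with 0-based, half-open coordinates. The function name and the toy transcript are invented for illustration, and this is not necessarily how the Table Browser computes the region internally.

```python
def three_prime_utr_blocks(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Return 3'UTR blocks as (start, end) pairs, 0-based and half-open."""
    if cds_start == cds_end:
        return []  # no annotated CDS, so no UTR is defined
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        if strand == "+":
            # On the plus strand the 3'UTR is the exonic sequence after cdsEnd.
            if end > cds_end:
                blocks.append((max(start, cds_end), end))
        else:
            # On the minus strand the 3'UTR lies before cdsStart in genomic order.
            if start < cds_start:
                blocks.append((start, min(end, cds_start)))
    return blocks


# Toy transcript with three exons (coordinates invented for illustration).
exon_starts = [100, 300, 500]
exon_ends = [200, 400, 700]

plus_blocks = three_prime_utr_blocks("+", 150, 350, exon_starts, exon_ends)
minus_blocks = three_prime_utr_blocks("-", 350, 600, exon_starts, exon_ends)

print(plus_blocks)   # [(350, 400), (500, 700)]
print(minus_blocks)  # [(100, 200), (300, 350)]
print(sum(end - start for start, end in plus_blocks))  # 250
```

Summing the block lengths gives the 3'UTR length expected for a transcript, which can then be compared with the length of the downloaded FASTA record.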

 

When I download the sequences and compare the FASTA record for transcript
ENST00000327299 with the 3'UTR sequence of the same transcript downloaded from
Ensembl, the lengths differ. The UCSC sequence appears to be a subset of the
3'UTR sequence downloaded from Ensembl (both sequences are shown below).
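
A comparison of this kind could be scripted roughly as follows. This is only a sketch: the helper names and the toy sequences are invented for illustration, and real input would be the two downloaded FASTA files read in as text. Note that the parser expects each header on a single line.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def describe_relationship(a, b):
    """Describe how sequence a relates to sequence b."""
    if a == b:
        return "identical"
    if b.startswith(a):
        return "a is a prefix of b"
    if a in b:
        return "a is an internal substring of b"
    return "a is not contained in b"


# Toy records, invented for illustration only.
ucsc = read_fasta(">ucsc_example\nACGTAC\n\nGT\n")
ensembl = read_fasta(">ensembl_example\nACGTACGTTTGA\n")

ucsc_seq = ucsc["ucsc_example"]
ensembl_seq = ensembl["ensembl_example"]

print(len(ucsc_seq), len(ensembl_seq))
print(describe_relationship(ucsc_seq, ensembl_seq))
```

If the shorter sequence turns out to be a prefix of the longer one, the two sources agree on where the 3'UTR starts and differ only in where it ends.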

 

QUESTION: 1. Am I following the right steps to download the entire 3'UTR
sequence from the UCSC Table Browser, or am I downloading only part of the
3'UTR region?

 

 

 

See below:

 

FROM UCSC:

>hg19_ensGene_ENST00000327299 range=chr1:65691861-65693173 5'pad=0 3'pad=0 strand=+ repeatMasking=none

CCCTGCCCAATGGAAGAACCAGGAAGATGTGGTCATTCATTCAATAGTGT

GTGTAGTATTGGTGCTGTGTCCAAATTAGAAGCTAGCTGAGGTAGCTTGC

AGCATCTTTTCTAGTTGAAATGGTGAACTGATAGGAAAACAAATGAGTAG

AAAGAGTTCATGAAGAGGCCCTCCTCTGCCTTTCAAAAGGCTGGTCACCT

ACACATGTTTAAGGTGTCTCTGCACATGTCTCAAGCCCATCACAAGAAAG

CAAGTACAGTGTGGATTTCAAATGGTGTGTAACTTCAGCTCCAGCTGGTT

TTTGACAGCTGTTGCTGTGGTAATATTTTTGACATGTGATGGTGATAGTC

TCTGGTTCTCCCCATCCCCACAAAGGCTGTTGAACCACAGCACCAGGAAG

CCTGAGAATGAATCCTGAGGGCTCTAGCCCAGGCTTTGTCCCAGGCTTTC

TGGTGTGTGCCCTCCTGGTAACAGTGAAATTGAAGCTACTTACTCATAGT

GGTTGTTTCTCTGGTCTTGAGTGACTGTGTCCACAGTTCATTTTTTTCCG

GTAGGAATAACTCCTTTTCTACATCCACGCTCCATAGAGTCTCTCCTTTT

CAGACATCCTGGGATGAAAGAATTTGGCTTTTTTTTTTCTTTTTTTTTTT

GGACATCTGTTTTCACTCTTAGGCTTTTAAACAATAGTTATTGCTTTTAT

CCCTCTCAGATTCTAATAACTGAGAGCGATGGGGCTATATTGAATCTCTG

TATGCACTGAGAACTGAGCTATGAAGAGGATCTTATTAAACTGCTGGTCT

GACTTTATGGATTGACACTGTTCCTTTCTTTTATTGTGAAAAAAAAAAAA

AACCCTGAAAGTCTTGGGAACCCCCTAAAGTCTTTTGGGAATCCTCAAAA

AGCATGGGAAGTTAAGTATTTAGCTACATAAATGTTGTAAGATCATATCT

TATGTATAGAAGTAATAAGACCATTTGGAATTACTGGACTAATTGAATAG

TTAAGGTTTCTATTCGGGACAATAAAATGTATTTTGAAAGTGCTGCTAAC

TATTGATGCTGACAGTGTTTCACTCCTATGAGTGACCCAAACATATTATA

AATATGTGGTAAAGGGAATGGAGCCTGTGGGGTTGAGCAGAATGTTGTAC

TAGCTGTGCCTGGACTGAGTATAACAGCTTTATGATTATGAGAAAACAAA

TTCTTTATTTTTTTTTTCTGTTCCAAAGATTCATCCTATGGGGTGGCCAT

AAAGTCTAGAATTAGATACTAATATTTTGTCATTCATTATAACATATCAA

TAAACCATTTGTT
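
As a sanity check, the span given by the range= field in the UCSC header can be compared with the length of the sequence. The sketch below assumes the range is 1-based and inclusive, and the function name is my own. For a 3'UTR split across several exons the genomic span would also cover introns, so the check is only meaningful for a single-block UTR like this one.

```python
import re


def range_length(header):
    """Return the span in bases of the range= field in a UCSC-style FASTA header."""
    match = re.search(r"range=(\w+):(\d+)-(\d+)", header)
    if match is None:
        raise ValueError("no range= field found in header")
    start, end = int(match.group(2)), int(match.group(3))
    return end - start + 1  # assumes 1-based, inclusive coordinates


header = ("hg19_ensGene_ENST00000327299 range=chr1:65691861-65693173 "
          "5'pad=0 3'pad=0 strand=+ repeatMasking=none")
print(range_length(header))  # 1313
```

The result, 1313 bases, appears to match the UCSC sequence above (26 lines of 50 bases plus a final line of 13), so the record seems to be complete for the range UCSC reports.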

 

FROM ENSEMBL:

 

>ENSG00000162433|ENST00000327299

CCCTGCCCAATGGAAGAACCAGGAAGATGTGGTCATTCATTCAATAGTGTGTGTAGTATT

GGTGCTGTGTCCAAATTAGAAGCTAGCTGAGGTAGCTTGCAGCATCTTTTCTAGTTGAAA

TGGTGAACTGATAGGAAAACAAATGAGTAGAAAGAGTTCATGAAGAGGCCCTCCTCTGCC

TTTCAAAAGGCTGGTCACCTACACATGTTTAAGGTGTCTCTGCACATGTCTCAAGCCCAT

CACAAGAAAGCAAGTACAGTGTGGATTTCAAATGGTGTGTAACTTCAGCTCCAGCTGGTT

TTTGACAGCTGTTGCTGTGGTAATATTTTTGACATGTGATGGTGATAGTCTCTGGTTCTC

CCCATCCCCACAAAGGCTGTTGAACCACAGCACCAGGAAGCCTGAGAATGAATCCTGAGG

GCTCTAGCCCAGGCTTTGTCCCAGGCTTTCTGGTGTGTGCCCTCCTGGTAACAGTGAAAT

TGAAGCTACTTACTCATAGTGGTTGTTTCTCTGGTCTTGAGTGACTGTGTCCACAGTTCA

TTTTTTTCCGGTAGGAATAACTCCTTTTCTACATCCACGCTCCATAGAGTCTCTCCTTTT

CAGACATCCTGGGATGAAAGAATTTGGCTTTTTTTTTTCTTTTTTTTTTTGGACATCTGT

TTTCACTCTTAGGCTTTTAAACAATAGTTATTGCTTTTATCCCTCTCAGATTCTAATAAC

TGAGAGCGATGGGGCTATATTGAATCTCTGTATGCACTGAGAACTGAGCTATGAAGAGGA

TCTTATTAAACTGCTGGTCTGACTTTATGGATTGACACTGTTCCTTTCTTTTATTGTGAA

AAAAAAAAAAAACCCTGAAAGTCTTGGGAACCCCCTAAAGTCTTTTGGGAATCCTCAAAA

AGCATGGGAAGTTAAGTATTTAGCTACATAAATGTTGTAAGATCATATCTTATGTATAGA

AGTAATAAGACCATTTGGAATTACTGGACTAATTGAATAGTTAAGGTTTCTATTCGGGAC

AATAAAATGTATTTTGAAAGTGCTGCTAACTATTGATGCTGACAGTGTTTCACTCCTATG

AGTGACCCAAACATATTATAAATATGTGGTAAAGGGAATGGAGCCTGTGGGGTTGAGCAG

AATGTTGTACTAGCTGTGCCTGGACTGAGTATAACAGCTTTATGATTATGAGAAAACAAA

TTCTTTATTTTTTTTTTCTGTTCCAAAGATTCATCCTATGGGGTGGCCATAAAGTCTAGA

ATTAGATACTAATATTTTGTCATTCATTATAACATATCAATAAACCATTTGTTAAAAGAT

TTGCCTGGTTTCCAGACTTGGTGGCCACCTTGAATAATTCTTGCTGTCTTCTGGGAAGGA

TGATGAAATTTATTCCTGCTGCCTTAAAAATATGTATCCCTTCTTCACCCATCATGACTG

TCCCCAGTGAGTGTCCTTTACTATTCTTGGGAGTGACTCCTGTCTAACTTTTCATACTGG

CGAGAAGAAAAGAAGCCTATTTTAACACTTTAGTGGTGTTGAAACACATTACTTACTTTC

TGAAGATGTCCCAGTGAATCCTCTGTCAATTCACTGCCATATGTAATCTATATGATAAGG

AATGCATCTTCCTTCTAAGTACTGCCCAAACTCTTGCCAGCTCCTCTCCCATTGTCCCTT

CATGTGAATATTTCTTGGCTACCTTAGTGGAAATATAGATCAGTTTTCTCCCCATCCATC

CTCTCAAACATAATGAGATTGTTTACTTTTTAGATTTATGCAGTGAAAATGCCCAGTCAG

GTCTGAATCGTCAGTGCATTATATTGACTCTGAGCACTTTAGAATTTAGAGTTGCAATTG

AATGCCAGCTGTGGAGATGGGGTGCATATCAGATATATAAATAAAGCTCAGGTTTGCTAG

GGAACCAGGTATAGAGAAAAATAAGTCTGATATGAGGAAAATTGCACAATTTAGAGTAGT

TATGCCGTAGAGAAAATTTCCACAAACTAGGAAATGTAGAGAGTTATTCTATAGAATACT

CAAAAGAGGAAAGTATGTGATTTTTGGAAACAGGAAAATCTTCAAACTTCTTTCTTCACT

TCCCTTTGTGTTTAGCTGACCCTCCAATGTGATCATTGCCTTTGGAGTTTGGGAGAGGTA

CGGGAAGTGGCCTGATCCCTGCTTCCATACTTCACTCCTCCATCCATCCTTCCCTCCCTC

TTCCCCTCCAGCTAAATGGACAATTCTAGCCAACATTGAGTCACTCAATAAGTCTCAACA

GTGGGTGTGTTTGCTGAGATTGTCCAGCGGTTGAGCAGTTTGGTCTCACCTCCCTCGCTA

GTTGAGACCAAAAAGAGACAAATAACTTTTTCATGGTCTTTGAAACATAATGCTTATTTC

GTGGTCAATGGCTTTAAAAAAATCTGTTTCTTGTTTTCTTCAACAAACTCACTAGTTTTC

CCTTAAATGATATTGTAAAAATTAAAGTAATCTTGAAAATGTTTTGACAAAAGTAAAATT

AAAGGGACATCTTTTCTTGTTTTGTTTTTTTTTTTTCTATTGCCACACATGACCGTTCCT

TCACCTTTAAGCAAAGAGAGTGGTTCAGATGGTTTCTAAGATGCCAACCTGACCTCGCAT

TCTGTCATTCTACCCAGCTCTTAATTCAATTTGCTTCCATTATCCTAACAGGCTTCTTTC

TTACTTAGAACTTGGAAAGGCTGCTGTATTTAATACCCTCCAACACTAACGCAGACTTAA

GATAGGTACTGTTTATTGAAAACCTACTGAGTGAAATGTGCGGTTTTAGGACCTTCATAA

ACATCTCATTTAATCTTTCTAGCATCCTGTGAAACAGCCATGATTTCACGTTGATAAACA

AAGAAGACAGGGGTCCCAGGGATGTGAAGCATCTTGCCCAGGCTTCTGCTGCTGGTGACC

AGTGTAGCCAGGACTCCAGCCCAGGTTTTCCTGACTCAGAAGACTGAGCTTTTTCCTGGA

TGTTATTAATAGCTAATTGTGTCCAAGCAACCAAGGGCCTTGAGTCTGCTTGGTTCTGCT

TATGGCCTCACATCAAGAAATGGAGCTAGTCCATGTCTGTAGTCCCAATGCTTTGGGAAG

CCATGATGGGAAGGTTGTCGGAGGCCAAAAGTTCAAGACCAGGCTGGGCAATATCACAAG

ACTCCATCTCTACGGAAAAGTAAAAAATTAGCCAGTCATGGTGGTGTACACTTATGGTCC

TAGTTACTCAGGAGACTTAGGCAGGAGGATTGCTTGATCCTAGGAATTCGAGGCTGCAGT

GAGCTATGATTGCACCTCTGCACCCAAGCCTGGGCGACACAGCGAGACCCTCTCTCTTAA

AAAAAAAAAATAGCAGAGCTCACCAAAGTGATGTTCACCTTTTTATGACATTCCTTTTTC

TTAGCTTAAGAAAAGAAAGCTGCTAGATGAGAGTCTTAGTTTTCCTGCATAAGACCTCCT

TTATGAATAGAATAAAAGACTGTCAAAGTAGGCTGGGCTTGGGCCCAGGCTAATCTATGA

AGGAAGCAAGCTCGTGTTCCTTACCTATCCTTTTGGTGTCCATTGGATTGTGCCCCGAAG

TGGCCTTTACCCTTGAGCCGTCCCCAGCCATGGTGCTCACACATAGGCTTTTGAGCTCCT

TGGAGCTATCCAGATCCTGCTCACTTTTCCTTCCTGAGATCAGAACAAATCACCCCCTTA

CTCCCACTCCAAACAAGGCCTTGATGATAAACTAATCCTTCCTAAAATGCTGGTAGGTAA

ACAAGCAATGATGAAGCATTGAACACAGGTTAACTCCTGACTTTTGTACCATTGTCTATT

CCATTACACATTAACATGACTCTGAATGCCAGATCCAAACCTTTGCCCACCATCTGCTTG

TCGTGCAACAGTTGAGGCAGTAACCAGGGGAGATTCACTTCCTGTCTTGTCCTTCCCCAG

GGATCACCCCCCTGCTGCCCTCTAGCAGCCAAACTCAGATGAGTTCCATTGTTACCCTAG

GTGTGCCCATCTCTTTGGTAGGGAAGGAGAAAGGTAAGAATAGCCATCAGTGAGGAAGGA

TTCTTGGAGCGAGGAGCCACTGTGGTTTTTCCTGCTATTTAAGATGTTGAGACCGGATAA

CTTTAGAAAGATACCTGCACAAACCCATAAATAGTGCTTTTATAAAGTTTAGTTCACCGG

AACCTGAGTTCAGTATTTGACATTAGCTTTTTGTCCAAAGAGTTGAAGCCTGCTGGAGGT

CTTTGCTCAAATAATAAATACCACATATTTCCAAGTGTGTTCAGGTATAGGCACTAGGTA

CTGTCTGTTTACTTCATGTTAGGCACATTACATGCATTGGCTAATCAAATCCTCATCAAT

TACATATGTAATAATCTAAACTTGCCTCCTTGTATTATAAATGGAAATAATCCTGTTTAT

TTAAACGGGTTTTCATGTACCTGTAGGGATTAGGAAACTCAAATGGCCTTTTTAATACCT

TTCCCTAGTTTGAGCTCCCTGTTCTCTTTAACAGATAAAACAACATATTTGCTTCAGCCT

GGAATCTGTTTTTGGTGCTTTGGTGCAGAGACAGGAAATGGGCACTCAGAGTCACACTGG

TAGTTGCACACTGTATCTACAGAGGGCGTGTCTCATCTGTACTCTGCTGGGTTACAGGAT

TTCAGTAGGTATTTGTGTCCACCTGAGAATTCTGTTTATTACCTTTCATTTGACAGTGTC

TTTCCTTTCTGCAGTTGATTTTGCTAGAGAGGCAATTCATAAGGTGAGGTCCTGTTCATA

GTATGACTTGCTTTCTCAATATCTCCTTCAATTTTTAGTAACTCTTGGTCTATTTGGTGT

CTTTAAAAAAAATAACCTAGTAATAAAGACTTCTTTTAATGTGGAAATGTGGTCTGGTAG

TAAGTTATTTCTTTCCACATGTAACTGACCCAATCTGGTTTCCAAATGAGAAGTGTGCAG

GCCCCAGAGGTTGAGAAGCCATATTTCAACTGTGAAAAAAATCTGCTTCCTGCATCTGTT

GAAATATAGTTGTTCATACTTGCCATCCCTTATCTTTCTTGTAACAATTTGCACAGTTCT

TGCCAGAATAAATGCCATTATCTGTATGTTTCAGGGAGTTCCCCAATTTGATCATTTTTG

TGTGTGTGTGGTGTGTGTGTGAGAGAGAGAGATACTGCAGTAAAACATTTCTAAAGGATG

AAAGCTCTTGTATGGCATAGATATGAATTCCTTCCTCTGGTAATAATTAGGTTATTCCCA

GAAGCACAGTGTCATTCTTTAAATAAAAGCTTTCCTGTTTAAAGCTTTTCAAAGGAGCAG

ACCACCTTGAAGATTCCCCCTAGGGTTGATATGTGTCTAATTCATTTTATAAAAATTATT

CTTGTCTTCATTTTAAAGCTTTGGCTATATAGTCAGAAATGTCCTAAATAACAAACTATT

TTGTATTTAATTTAGGGAAGACTAAAGGGAAGAAAAATGAAAACTCAGTCTTTATGTAAG

CTCCAAGGATATTAGGGCTTAAAGGGCTTTTCTAGTTTTATGAGAATTTGTACTACTGAT

TTTTATATATTCCTGTTTTTGAGATGAACAGATCTCTGGGGAAATTGTTGAGTTACAATG

GCATTTCACTGTGATCCCTCTCAAGCTCAGATCAGTTCTATAACCCAATGACAACCTGTC

TCTTTGGTTTACTGTCCTGTGAAATGTCAGCTCAAGTTTCCCAGAAGTCGTGTGTTTATG

ATGAGTCAGAGTGCTTTTCCTCGGTGGGACAGTTGCTGGCCCTCTTAATTTTGGTGTATG

TGCTTCCAAGTATCTAAACCTCCAGTCTGATCTGTATATGCTATCCTAACTGTTAATTGT

ATTATTGATTATGTTGATTATCTTGCTTGAAGGTTCATACTTTTCAATTTGATAGAAATA

AAGTTTTTTTCTGCTTATA

 

 

 

 
