Hello UCSC group,
I would like to get the coding sequences of genes from RefSeq mRNA IDs (for
example, NM_003820) in hg18. I have a large list of such IDs.
To do this, I am taking the exonStarts, exonEnds, cdsStart and cdsEnd values
from the refFlat table in hg18.
For NM_003820, the record looks like this:
geneName: TNFRSF14
name: NM_003820
chrom: chr1
strand: -
txStart: 2479150
txEnd: 2486613
cdsStart: 2479705
cdsEnd: 2486314
exonCount: 8
exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,
To get the DNA sequence corresponding to the coding regions, I am taking the
chr1.fa.gz file from the hg18 chromosomes directory and extracting the
sequence for the exons, clipped to cdsStart and cdsEnd:
2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314
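For reference, the extraction described above can be sketched in Python as
follows. This is only a minimal sketch, not a tested pipeline. The function
names (parse_coords, cds_blocks, cds_sequence) are hypothetical, and the
sketch assumes that refFlat start coordinates are 0-based with exclusive
ends, and that a gene on the "-" strand has to be reverse-complemented to
match the mRNA. If either assumption does not hold, the output will differ.

```python
def parse_coords(field):
    """Turn a refFlat list such as '2479150,2480082,' into a list of ints."""
    return [int(x) for x in field.split(",") if x]


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end) and keep the non-empty pieces."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s = max(start, cds_start)
        e = min(end, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequence(chrom_seq, blocks, strand):
    """Join the blocks from one chromosome string.

    Assumption: chrom_seq is the whole chromosome as a single string with the
    FASTA header and newlines removed, and blocks are 0-based, end-exclusive.
    """
    seq = "".join(chrom_seq[s:e] for s, e in blocks)
    return reverse_complement(seq) if strand == "-" else seq


# Values copied from the NM_003820 record above.
exon_starts = parse_coords(
    "2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,")
exon_ends = parse_coords(
    "2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,")
blocks = cds_blocks(exon_starts, exon_ends, 2479705, 2486314)
```

With this record, cds_blocks returns the eight regions listed above. Under
the end-exclusive assumption they add up to 852 bases, which is a multiple
of three.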
The resulting sequence does not match the sequence I get from the web when I
cross-check it. Could you please tell me whether I can extract the sequence
this way, or whether you already have the sequences for these genes stored
separately in your database?
Thanks for your help.
Lipika
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome