Hello UCSC group,

I would like to get the coding sequences of genes from RefSeq mRNA IDs
(for example, NM_003820) in the hg18 assembly, for a large list of such IDs.

I am getting the exonStarts, exonEnds, cdsStart, and cdsEnd values from the
refFlat table under hg18.

So for NM_003820, the record looks like this:

geneName: TNFRSF14
      name: NM_003820
     chrom: chr1
    strand: -
   txStart: 2479150
     txEnd: 2486613
  cdsStart: 2479705
    cdsEnd: 2486314
 exonCount: 8
exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
  exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,
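
For reference, here is a minimal Python sketch of how one such record could
be read from a tab-separated dump of refFlat. It assumes the columns come in
the order shown above (geneName, name, chrom, strand, txStart, txEnd,
cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The names RefFlatRecord
and parse_refflat_line are made up for illustration and are not part of any
UCSC tool.

```python
from typing import List, NamedTuple


class RefFlatRecord(NamedTuple):
    """One row of refFlat (hypothetical container, for illustration only)."""
    gene_name: str
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_refflat_line(line: str) -> RefFlatRecord:
    """Parse one tab-separated refFlat line into a RefFlatRecord."""
    fields = line.rstrip("\n").split("\t")
    exon_count = int(fields[8])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    exon_starts = [int(x) for x in fields[9].split(",") if x]
    exon_ends = [int(x) for x in fields[10].split(",") if x]
    if len(exon_starts) != exon_count or len(exon_ends) != exon_count:
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return RefFlatRecord(
        gene_name=fields[0],
        name=fields[1],
        chrom=fields[2],
        strand=fields[3],
        tx_start=int(fields[4]),
        tx_end=int(fields[5]),
        cds_start=int(fields[6]),
        cds_end=int(fields[7]),
        exon_starts=exon_starts,
        exon_ends=exon_ends,
    )
```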

To get the DNA sequence corresponding to the coding regions, I take the
chr1.fa.gz file from the hg18 chromosomes download directory and extract
the sequence for these regions:

2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314
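
The extraction described above can be sketched in Python as follows. This
is only a sketch under stated assumptions, not a confirmed recipe: it
assumes that refFlat starts are zero-based and ends are exclusive (so they
map directly onto Python slices), that the chromosome has been loaded as
one string with the FASTA header and line breaks removed, and that a gene
on the "-" strand has to be reverse-complemented after the blocks are
joined. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cds_blocks(exon_starts: Sequence[int], exon_ends: Sequence[int],
               cds_start: int, cds_end: int) -> List[Tuple[int, int]]:
    """Clip each exon to [cds_start, cds_end) and keep the non-empty parts."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:
            blocks.append((lo, hi))
    return blocks


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving upper/lower case."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequence(chrom_seq: str, blocks: Sequence[Tuple[int, int]],
                 strand: str) -> str:
    """Join the blocks in genomic order, then flip for minus-strand genes.

    Assumes zero-based, end-exclusive coordinates (an assumption about the
    table, not something verified here).
    """
    seq = "".join(chrom_seq[lo:hi] for lo, hi in blocks)
    return reverse_complement(seq) if strand == "-" else seq
```

For NM_003820, cds_blocks() applied to the record above gives the same
eight regions listed here; cds_sequence() would then be called with the
chr1 string and strand "-".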

The resulting sequence does not match the sequence I get from the web when
I cross-check it. Could you please tell me whether I can extract the
sequence in this way, or whether you already have the sequences for these
genes stored separately in your database?

Thanks for your help.

Lipika
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
