Well, here's your first question: how sure are you that you are assembling the 
sequence correctly?

Unless you've modified the .fa file, you have to:
1: Skip the first line (variable length, depending on the .fa file).
2: Divide the offset to the start of the exon by 50 (the number of characters 
in a line), rounding down, to get the number of lines to skip.
3: Multiply the number of lines by 51 (the number of characters in a line, plus 
the newline character) and skip that many characters.
4: Skip forward (offset modulo 50) more characters.
5: Read to the end of the line.
6: Skip the line ending character.
7: Go back to 5 until you've read the entire amount.
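
For what it's worth, here is a minimal Python sketch of those steps. It assumes 
the file has been uncompressed, holds a single sequence, uses one-byte "\n" 
line endings, and has 50 bases on every line but the last. The name read_region 
and its parameters are made up for illustration.

```python
def read_region(path, start, length, line_width=50):
    """Return `length` bases beginning at 0-based offset `start`.

    Assumes a single-record FASTA file with fixed-width lines and
    one-byte newline characters.
    """
    with open(path, "rb") as fa:
        header = fa.readline()                    # step 1: skip the header line
        lines_to_skip = start // line_width       # step 2: whole lines before start
        offset = (len(header)
                  + lines_to_skip * (line_width + 1)  # step 3: bases plus newline
                  + start % line_width)               # step 4: position within line
        fa.seek(offset)

        pieces = []
        remaining = length
        while remaining > 0:                      # steps 5-7: read line by line
            line = fa.readline()
            if not line:                          # ran off the end of the file
                break
            chunk = line.rstrip(b"\r\n")[:remaining]
            pieces.append(chunk)
            remaining -= len(chunk)
        return b"".join(pieces).decode("ascii")
```

Comparing the output of something like this against a plain "read the whole 
chromosome into one string and slice it" version is a quick way to catch 
off-by-one mistakes.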

There are all sorts of chances for screwups and off-by-one errors when coding 
that up.  So, what kind of differences are you getting between your sequences 
and the ones you get from the web?
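
Two differences worth ruling out, given the record quoted below: UCSC table 
coordinates are 0-based with an exclusive end, and the gene is on the "-" 
strand, so the mRNA is the reverse complement of the genomic sequence. The 
following Python sketch clips the exons to the coding region and handles the 
strand. It assumes the chromosome has already been read into one string with 
the header and newlines removed. All function names are made up for 
illustration, and I can't say whether either point explains your mismatch.

```python
def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end) and drop non-coding exons."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        start, end = max(start, cds_start), min(end, cds_end)
        if start < end:
            blocks.append((start, end))
    return blocks


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def coding_sequence(chrom_seq, blocks, strand):
    """Join the blocks in genomic order; flip the result on the minus strand."""
    # 0-based, half-open coordinates map directly onto Python slices.
    seq = "".join(chrom_seq[start:end] for start, end in blocks)
    return reverse_complement(seq) if strand == "-" else seq


# The NM_003820 record from the quoted message.
exon_starts = [2479150, 2480082, 2481163, 2482264,
               2483000, 2484510, 2485144, 2486245]
exon_ends = [2479831, 2480114, 2481306, 2482355,
             2483156, 2484636, 2485253, 2486613]
blocks = cds_blocks(exon_starts, exon_ends, 2479705, 2486314)
```

For this record, blocks comes out as the same eight ranges listed in the 
quoted message, so the ranges themselves look right. What is left to check is 
how they are turned into file offsets and whether the joined result is 
reverse-complemented.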

Greg (No longer a UCSC person, never been on the browser team, speaking only 
for myself)

----- Original Message -----

Hello UCSC group,

I would like to get the coding sequences of genes from RefSeq mRNA IDs (like
NM_003820) in the hg18 version, for a big list of such IDs.

So I am getting the exonStarts, exonEnds, cdsStart and cdsEnd values from the
refFlat table under hg18.

So for NM_003820, the record looks like this:

geneName: TNFRSF14
      name: NM_003820
     chrom: chr1
    strand: -
   txStart: 2479150
     txEnd: 2486613
  cdsStart: 2479705
    cdsEnd: 2486314
 exonCount: 8
exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
  exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,

To get the dna sequence corresponding to the coding regions, I am extracting
sequences from chr1.fa.gz file under chromosomes in hg18 version and then
extracting the dna sequence corresponding to the region:

2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314

The resulting sequence does not match when I cross-check it against the
sequence from the web. Can you please tell me whether I can extract the
sequence in this way, or whether you already have the sequences corresponding
to genes stored separately in your database?

Thanks for your help.

Lipika


------------------------------

Message: 5
Date: Wed, 8 Sep 2010 11:18:12 +0800 (HKT)
From: ??? <[email protected]>
Subject: [Genome] help
To: [email protected]
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=utf-8

dear:
When was the UCSC Genome Browser project launched?
thanks!



------------------------------

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome


End of Genome Digest, Vol 92, Issue 12
**************************************