Hello Lipika,

An explanation of the coordinate system used by UCSC may help. We use a 0-based start position (the end position is 1-based, so intervals are half-open). This can get tricky, especially when converting to the (-) strand, since we also store all coordinates smallest->largest along the chromosome.
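As a rough illustration of the coordinate handling described above (a sketch, not a UCSC utility), the Python below clips the exons of a refFlat-style record to the coding region and builds the coding sequence. The function names are hypothetical. It assumes the table coordinates are 0-based with an exclusive end, that `chrom_seq` is the chromosome loaded as one plain string (no FASTA header or line breaks), and that a (-) strand transcript is obtained by reverse-complementing the blocks joined in chromosome order.

```python
# Sketch only: helper names are hypothetical, not part of any UCSC tool.
# Assumes 0-based, half-open [start, end) coordinates, stored
# smallest->largest along the chromosome regardless of strand.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def coding_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end) and drop non-coding exons."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


def coding_sequence(chrom_seq, blocks, strand):
    """Join the blocks in chromosome order; reverse-complement for (-)."""
    # Python slices are also 0-based and half-open, so no +1/-1 is needed.
    seq = "".join(chrom_seq[s:e] for s, e in blocks)
    return reverse_complement(seq) if strand == "-" else seq


def to_one_based(block):
    """Convert a 0-based half-open block to 1-based inclusive positions."""
    s, e = block
    return (s + 1, e)


# Record for NM_003820 as quoted in the original message (hg18 refFlat).
exon_starts = [2479150, 2480082, 2481163, 2482264,
               2483000, 2484510, 2485144, 2486245]
exon_ends = [2479831, 2480114, 2481306, 2482355,
             2483156, 2484636, 2485253, 2486613]
blocks = coding_blocks(exon_starts, exon_ends, 2479705, 2486314)
cds_length = sum(e - s for s, e in blocks)
print(blocks)
print(cds_length)
```

Under these assumptions the eight blocks total 852 bases, a multiple of three, which is what one would expect for a complete coding sequence. Reading the same numbers as 1-based inclusive ranges would give 860 bases, so an off-by-one at each block start is one plausible reason an extracted sequence fails to match.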
Help is located in this wiki:
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

All database tables/files will be formatted this way unless specifically noted in the data format FAQ:
http://genome.ucsc.edu/FAQ/FAQformat.html

There are utilities readily available that work with our coordinate system. Some function stand-alone and others require a database. The public MySQL database can be used when a database is required, if you do not run your own mirror.

A list of utilities is here:
http://hgwdev.cse.ucsc.edu/~larrym/utilities.html

Many can be downloaded pre-compiled from here (for certain platforms):
http://hgdownload.cse.ucsc.edu/admin/exe/

Otherwise, obtain the source and compile locally:
http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads

Public MySQL access instructions:
http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29

Please feel free to contact the mailing list support team again if you would like more assistance.

Warm regards,
Jen
UCSC Genome Browser Support

On 9/8/10 11:35 AM, Lipika Ray wrote:
> Hello UCSC group,
>
> I like to get the coding sequence of gene from refseq mrna ids (like,
> NM_003820) from hg18 version - big list of such ids.
>
> So I am getting information of exonstarts , exonends, cdsStart, cdsend from
> refFlat table under hg18.
>
> So for NM_003820, the record looks like this:
>
> geneName: TNFRSF14
> name: NM_003820
> chrom: chr1
> strand: -
> txStart: 2479150
> txEnd: 2486613
> cdsStart: 2479705
> cdsEnd: 2486314
> exonCount: 8
> exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
> exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,
>
> To get the dna sequence corresponding to the coding regions, I am extracting
> sequences from chr1.fa.gz file under chromosomes in hg18 version and then
> extracting the dna sequence corresponding to the region:
>
> 2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
> 2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314
>
> The corresponding sequence is not matching if I cross check with the
> sequence from web. Can you please guide me whether I can extract sequence in
> this way, or you already have sequences corresponding to genes stored
> separately in your database.
>
> Thanks for your help.
>
> Lipika
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
