The number of pre-built utilities we have available is much smaller than the number in our entire source code tree. We only build the ones for which there is high demand, and 32-bit versions are basically ignored. Most people are using 64-bit systems now, both because they are common and because large genomic databases often need much more than the 4GB of RAM that a 32-bit process can address.
Even for 64-bit Linux, our pre-compiled utilities will only work on a fraction of the systems out there. People can just get the source and compile it if they want access to everything.

-Galt

On 09/08/10 20:23, Lipika Ray wrote:
> Hello Galt,
>
> Many many thanks for your kind reply - that helped me a lot to
> understand where I was making a mistake. I was following every step, but
> got confused by the 0-based counting (what to pass to the substr function
> in Perl). Now it is straight - thanks.
>
> Another thing: I was not aware of those utilities. We have Linux 64-bit
> and 32-bit machines, and it seems not all programs are available for all
> platforms. Of the ones you mentioned, twoBitToFa and faRc, I got
> twoBitToFa for the 64-bit machine, not the other one. For 32-bit it seems
> nothing is available except liftOver. Am I missing something again, or
> is it true that not all utilities are available for all platforms?
>
> Thanks again for your detailed help - now I am getting correct sequences.
> Thanks,
>
> Lipika
>
> On Wed, Sep 8, 2010 at 5:03 PM, Galt Barber <[email protected]> wrote:
>
> Hi, Lipika!
>
> I was able to do something very similar to the process you
> describe and it worked. Here are my results:
>
> I made a file called ranges which has each exon:
>
> [hgwdev:~/lipika> cat ranges
> chr1:2479150-2479831
> chr1:2480082-2480114
> chr1:2481163-2481306
> chr1:2482264-2482355
> chr1:2483000-2483156
> chr1:2484510-2484636
> chr1:2485144-2485253
> chr1:2486245-2486613
>
> twoBitToFa /gbdb/hg18/hg18.2bit -seqList=ranges out.fa
>
> This extracts the pieces as separate sequences.
>
> Then I merge the several exon pieces into a single new fasta record,
> creating a new header and stripping out the original multiple headers.
> echo ">NM_003820_from_gp" > test.fa
> cat out.fa | grep -v '>' >> test.fa
>
> Then I reverse-complement the results since it was reported on the
> negative strand:
> faRc test.fa testRC.fa
>
> >RC_NM_003820_from_gp
> ccttcataccggcccttcccctcggctttgcctggacagctcctgcctcc
> cgcagggcccacctgtgtcccccagcgccgctccacccagcaggcctgag
> cccctctctgctgccagacaccccctgctgcccactctcctgctgctcgg
> gttctgaggcacagcttgtcacaccgaggcggattctctttctctttctc
> tttctcttctggcccacagccgcagcaatggcgctgagttcctctgctgg
> agttcatcctgctagctgggttcccgagctgccggtctgagcctgaggca
> tggagcctcctggagactgggggcctcctccctggagatccacccccaaa
> accgacgtcttgaggctggtgctgtatctcaccttcctgggagccccctg
> ctacgccccagctctgccgtcctgcaaggaggacgagtacccagtgggct
> ccgagtgctgccccaagtgcagtccaggttatcgtgtgaaggaggcctgc
> ggggagctgacgggcacagtgtgtgaaccctgccctccaggcacctacat
> tgcccacctcaatggcctaagcaagtgtctgcagtgccaaatgtgtgacc
> cagccatgggcctgcgcgcgagccggaactgctccaggacagagaacgcc
> gtgtgtggctgcagcccaggccacttctgcatcgtccaggacggggacca
> ctgcgccgcgtgccgcgcttacgccacctccagcccgggccagagggtgc
> agaagggaggcaccgagagtcaggacaccctgtgtcagaactgccccccg
> gggaccttctctcccaatgggaccctggaggaatgtcagcaccagaccaa
> gtgcagctggctggtgacgaaggccggagctgggaccagcagctcccact
> gggtatggtggtttctctcagggagcctcgtcatcgtcattgtttgctcc
> acagttggcctaatcatatgtgtgaaaagaagaaagccaaggggtgatgt
> agtcaaggtgatcgtctccgtccagcggaaaagacaggaggcagaaggtg
> aggccacagtcattgaggccctgcaggcccctccggacgtcaccacggtg
> gccgtggaggagacaataccctcattcacggggaggagcccaaaccactg
> acccacagactctgcaccccgacgccagagatacctggagcgacggctgc
> tgaaagaggctgtccacctggcggaaccaccggagcccggaggcttgggg
> gctccgccctgggctggcttccgtctcctccagtggagggagaggtgggg
> cccctgctggggtagagctggggacgccacgtgccattcccatgggccag
> tgagggcctggggcctctgttctgctgtggcctgagctccccagagtcct
> gaggaggagcgccagttgcccctcgctcacagaccacacacccagccctc
> ctgggccagcccagagggcccttcagaccccagctgtctgcgcgtctgac
> tcttgtggcctcagcaggacaggccccgggcactgcctcacagccaaggc
> tggactgggttggctgcagtgtggtgtttagtggataccacatcggaagt
> gattttctaaattggatttgaattcggctcctgttttctatttgtcatga
> aacagtgtatttggggagatgctgtgggaggatgtaaatatcttgtttct
> cctcaa
>
> Here is the browser output for hg18 RefSeq NM_003820:
> cDNA NM_003820
>
> CCTTCATACC GGCCCTTCCC CTCGGCTTTG CCTGGACAGC TCCTGCCTCC 50
> CGCAGGGCCC ACCTGTGTCC CCCAGCGCCG CTCCACCCAG CAGGCCTGAG 100
> CCCCTCTCTG CTGCCAGACA CCCCCTGCTG CCCACTCTCC TGCTGCTCGG 150
> GTTCTGAGGC ACAGCTTGTC ACACCGAGGC GGATTCTCTT TCTCTTTCTC 200
> TTTCTCTTCT GGCCCACAGC CGCAGCAATG GCGCTGAGTT CCTCTGCTGG 250
> AGTTCATCCT GCTAGCTGGG TTCCCGAGCT GCCGGTCTGA GCCTGAGGCA 300
> TGGAGCCTCC TGGAGACTGG GGGCCTCCTC CCTGGAGATC CACCCCCAAA 350
> ACCGACGTCT TGAGGCTGGT GCTGTATCTC ACCTTCCTGG GAGCCCCCTG 400
> CTACGCCCCA GCTCTGCCGT CCTGCAAGGA GGACGAGTAC CCAGTGGGCT 450
> CCGAGTGCTG CCCCAAGTGC AGTCCAGGTT ATCGTGTGAA GGAGGCCTGC 500
> GGGGAGCTGA CGGGCACAGT GTGTGAACCC TGCCCTCCAG GCACCTACAT 550
> TGCCCACCTC AATGGCCTAA GCAAGTGTCT GCAGTGCCAA ATGTGTGACC 600
> CAGCCATGGG CCTGCGCGCG AGCCGGAACT GCTCCAGGAC AGAGAACGCC 650
> GTGTGTGGCT GCAGCCCAGG CCACTTCTGC ATCGTCCAGG ACGGGGACCA 700
> CTGCGCCGCG TGCCGCGCTT ACGCCACCTC CAGCCCGGGC CAGAGGGTGC 750
> AGAAGGGAGG CACCGAGAGT CAGGACACCC TGTGTCAGAA CTGCCCCCCG 800
> GGGACCTTCT CTCCCAATGG GACCCTGGAG GAATGTCAGC ACCAGACCAA 850
> GTGCAGCTGG CTGGTGACGA AGGCCGGAGC TGGGACCAGC AGCTCCCACT 900
> GGGTATGGTG GTTTCTCTCA GGGAGCCTCG TCATCGTCAT TGTTTGCTCC 950
> ACAGTTGGCC TAATCATATG TGTGAAAAGA AGAAAGCCAA GGGGTGATGT 1000
> AGTCAAGGTG ATCGTCTCCG TCCAGCGGAA AAGACAGGAG GCAGAAGGTG 1050
> AGGCCACAGT CATTGAGGCC CTGCAGGCCC CTCCGGACGT CACCACGGTG 1100
> GCCGTGGAGG AGACAATACC CTCATTCACG GGGAGGAGCC CAAACCACTG 1150
> ACCCACAGAC TCTGCACCCC GACGCCAGAG ATACCTGGAG CGACGGCTGC 1200
> TGAAAGAGGC TGTCCACCTG GCGGAACCAC CGGAGCCCGG AGGCTTGGGG 1250
> GCTCCGCCCT GGGCTGGCTT CCGTCTCCTC CAGTGGAGGG AGAGGTGGGG 1300
> CCCCTGCTGG GGTAGAGCTG GGGACGCCAC GTGCCATTCC CATGGGCCAG 1350
> TGAGGGCCTG GGGCCTCTGT TCTGCTGTGG CCTGAGCTCC CCAGAGTCCT 1400
> GAGGAGGAGC GCCAGTTGCC CCTCGCTCAC AGACCACACA CCCAGCCCTC 1450
> CTGGGCCAGC CCAGAGGGCC CTTCAGACCC CAGCTGTCTG CGCGTCTGAC 1500
> TCTTGTGGCC TCAGCAGGAC AGGCCCCGGG CACTGCCTCA CAGCCAAGGC 1550
> TGGACTGGGT TGGCTGCAGT GTGGTGTTTA GTGGATACCA CATCGGAAGT 1600
> GATTTTCTAA ATTGGATTTG AATTCGGCTC CTGTTTTCTA TTTGTCATGA 1650
> AACAGTGTAT TTGGGGAGAT GCTGTGGGAG GATGTAAATA TCTTGTTTCT 1700
> CCTCAAaaaa aaaaaaaaaa aaaaaaaaaa
>
> As you can see, they are a very good match.
>
> -Galt
>
> On 09/08/10 13:23, Jennifer Jackson wrote:
>
> Hello Lipika,
>
> Perhaps a closer look at the coordinate system used by UCSC will
> help. We use a 0-based start position and a 1-based end position.
> This can get tricky, especially when converting to the (-) strand,
> since we also store all coordinates smallest->largest along the
> chromosome.
>
> Help is located in this wiki:
> http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
>
> All database tables/files will be formatted this way unless
> specifically noted in the data format FAQ:
> http://genome.ucsc.edu/FAQ/FAQformat.html
>
> There are utilities readily available that work with our
> coordinate system. Some work stand-alone and others require
> a database. The public MySQL database can be used when a
> database is required, if you do not run your own mirror.
>
> A list of utilities is here:
> http://hgwdev.cse.ucsc.edu/~larrym/utilities.html
>
> Many can be downloaded pre-compiled from here (for certain
> platforms):
> http://hgdownload.cse.ucsc.edu/admin/exe/
>
> Otherwise, obtain the source and compile locally:
> http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads
>
> Public MySQL access instructions:
> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
>
> Please feel free to contact the mailing list support team again
> if you would like more assistance.
>
> Warm regards,
>
> Jen
> UCSC Genome Browser Support
>
> On 9/8/10 11:35 AM, Lipika Ray wrote:
>
> Hello UCSC group,
>
> I would like to get the coding sequences of genes from RefSeq mRNA
> ids (like NM_003820) in the hg18 assembly, for a big list of such ids.
>
> So I am getting the exonStarts, exonEnds, cdsStart and cdsEnd
> information from the refFlat table for hg18.
>
> So for NM_003820, the record looks like this:
>
> geneName: TNFRSF14
> name: NM_003820
> chrom: chr1
> strand: -
> txStart: 2479150
> txEnd: 2486613
> cdsStart: 2479705
> cdsEnd: 2486314
> exonCount: 8
> exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
> exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,
>
> To get the DNA sequence corresponding to the coding regions, I am
> extracting sequences from the chr1.fa.gz file under chromosomes in the
> hg18 version and then extracting the DNA sequence corresponding to the
> regions:
>
> 2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
> 2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314
>
> The resulting sequence does not match when I cross-check it against
> the sequence from the web. Can you please tell me whether I can
> extract sequences this way, or whether you already store the sequences
> corresponding to genes separately in your database?
>
> Thanks for your help.
>
> Lipika
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
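The coordinate step in Lipika's original question, clipping each refFlat exon to the cdsStart..cdsEnd span, can be sketched in Python. This is a minimal sketch rather than a UCSC utility: it assumes the 0-based start and exclusive end that Jen describes, and parse_coord_list and cds_intervals are hypothetical names. It prints the pieces in the chrom:start-end form of Galt's ranges file. Feeding them to twoBitToFa -seqList assumes that option reads the numbers with the same convention, which Galt's test (raw exonStarts/exonEnds in, browser-matching cDNA out) is consistent with.

```python
"""Clip refFlat exons to the CDS (sketch; coordinates assumed 0-based, half-open)."""


def parse_coord_list(field: str) -> list[int]:
    """Parse a comma-separated refFlat field such as '2479150,2480082,' (trailing comma allowed)."""
    return [int(x) for x in field.split(",") if x]


def cds_intervals(exon_starts: list[int], exon_ends: list[int],
                  cds_start: int, cds_end: int) -> list[tuple[int, int]]:
    """Return the (start, end) part of each exon that overlaps [cds_start, cds_end).

    Exons are stored smallest->largest along the chromosome whatever the
    strand, so the output stays in chromosome order. Exons lying entirely
    in a UTR have no overlap and are skipped.
    """
    pieces = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:  # non-empty overlap with the CDS
            pieces.append((s, e))
    return pieces


if __name__ == "__main__":
    # The NM_003820 refFlat record quoted above.
    starts = parse_coord_list("2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,")
    ends = parse_coord_list("2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,")
    pieces = cds_intervals(starts, ends, 2479705, 2486314)
    for s, e in pieces:
        print(f"chr1:{s}-{e}")  # same form as Galt's ranges file
    # With half-open intervals a piece's length is simply end - start.
    print("CDS length:", sum(e - s for s, e in pieces))
```

For the NM_003820 record this reproduces the eight ranges Lipika listed. Because the intervals are half-open, each length is just end - start, and they add up to 852 bases, a multiple of 3 as expected for a complete CDS.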
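The 0-based counting that tripped up Lipika's substr step needs no +1 or -1 adjustment when start and end come straight from the table: Perl's substr($seq, $start, $end - $start) and Python's seq[start:end] select the same bases. The sketch below does in Python roughly what twoBitToFa, the echo/grep merge and faRc do in Galt's example, starting from a one-record chromosome FASTA such as chr1.fa.gz. The function names are hypothetical, the whole chromosome is held in memory, and it is an illustration rather than a replacement for the UCSC tools.

```python
"""Extract a spliced sequence from a chromosome FASTA (sketch)."""
import gzip


def read_single_fasta(path: str) -> str:
    """Return the sequence of a one-record FASTA file, gzip-compressed or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


# Complement table that keeps case (UCSC chromosome files use lowercase
# for soft-masked repeats).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, preserving upper/lower case."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_sequence(chrom_seq: str, intervals, strand: str) -> str:
    """Concatenate chrom_seq[start:end] for each 0-based, half-open interval.

    Intervals must be in chromosome order, as stored in refFlat. For a '-'
    strand gene the joined sequence is reverse-complemented at the end,
    which also puts the exons into transcript order.
    """
    joined = "".join(chrom_seq[start:end] for start, end in intervals)
    return reverse_complement(joined) if strand == "-" else joined


if __name__ == "__main__":
    toy = "AACCGGTTac"
    # Two toy "exons", [0, 2) and [4, 6), in 0-based, half-open coordinates.
    print(spliced_sequence(toy, [(0, 2), (4, 6)], "+"))  # AAGG
    print(spliced_sequence(toy, [(0, 2), (4, 6)], "-"))  # CCTT
```

On real data, something like spliced_sequence(read_single_fasta("chr1.fa.gz"), intervals, "-") would give the reverse-complemented sequence, where intervals are the (start, end) pairs from refFlat: the full exons for the transcript, or the CDS-clipped pieces for the coding sequence.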
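Galt compared the genome-derived sequence with the browser's cDNA display by eye. A small check like the following, a sketch with hypothetical function names, could automate that. It strips the position numbers and spacing from the display, compares case-insensitively, and reports any mismatching positions plus whatever trails the genome-derived sequence. For NM_003820 the trailing part is the poly-A tail, which is in the mRNA but not in the genome. It expects only the sequence rows of the display, not the "cDNA NM_003820" header line.

```python
"""Compare a genome-derived transcript with the browser's numbered cDNA display (sketch)."""
import re


def normalize_display(text: str) -> str:
    """Keep only the letters of rows like 'CCTTCATACC GGCCCTTCCC ... 50'; drop numbers and spaces."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def compare(genomic: str, mrna: str) -> tuple[list[int], str]:
    """Return (1-based mismatch positions over the shared length, trailing mRNA bases).

    If the mRNA is shorter than the genomic sequence, the extra genomic
    bases are not reported.
    """
    g, m = genomic.upper(), mrna.upper()
    mismatches = [i + 1 for i, (a, b) in enumerate(zip(g, m)) if a != b]
    tail = m[len(g):]
    return mismatches, tail


if __name__ == "__main__":
    # A shortened toy version of the display, built from its first bases.
    display = "CCTTCATACC GGCCCTTCCC 20\nCTCGGaaaa"
    genomic = "ccttcataccggcccttcccctcgg"
    mismatches, tail = compare(genomic, normalize_display(display))
    print(mismatches, tail)  # [] AAAA
```

An empty mismatch list and a tail made only of A's is what "a very good match" looks like in this check.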
