Hey Ivan,

The pre-calculated CDS FASTA files, as well as those calculated on the fly by the Table Browser, use the reading frames defined by the gene model in the reference species, in this case hg19.
The only gene models on hg19 that we pre-calculate these FASTA files for are UCSC genes, and RefSeq genes.

I hope this answers your questions. Feel free to contact the list for any future questions you have.

Brian

On Tue, Aug 31, 2010 at 8:19 PM, Ivan Adzhubey <[email protected]> wrote:
> Hi,
>
> One more question regarding FASTA alignments.
>
> How codon translation was done for the downloadable multiz46way FASTA files?
> There are several options listed at cons46way track description page here:
>
> http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way
>
> ---Quote---
> # Use default species reading frames for translation: the annotations from the
> genome displayed in the Default species for translation; pull-down menu are
> used to translate all the aligned species present in the alignment.
> # Use reading frames for species if available, otherwise no translation: codon
> translation is performed only for those species where the region is annotated
> as protein coding.
> # Use reading frames for species if available, otherwise use default species:
> codon translation is done on those species that are annotated as being protein
> coding over the aligned region using species-specific annotation; the
> remaining
> species are translated using the default species annotation.
> ---End quote---
>
> Which one of these options was used to generate each of the multiz46way FASTA
> sets available?
>
> knownGene.exonAA.fa.gz
> knownCanonical.exonAA.fa.gz
> refGene.exonAA.fa.gz
>
> The above track description page also mentions muliz alignments for Ensembl
> Genes and XenoRef Genes. These are not available as pre-built FASTA files, is
> this correct?
>
> Thanks,
> Ivan
>
> On Friday, August 13, 2010 12:40:48 pm Mary Goldman wrote:
>> Hi Ivan,
>>
>> Information on how the FASTA files for the conservation track are
>> formatted can be found here:
>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA, under
>> "Explanation of CDS FASTA header format".
>>
>> I hope this information is helpful. Please feel free to contact the
>> mail list again if you require further assistance.
>>
>> Best,
>> Mary
>> ------------------
>> Mary Goldman
>> UCSC Bioinformatics Group
>>
>> On 8/12/10 3:31 PM, Ivan Adzhubey wrote:
>> > Hello,
>> >
>> > Could someone point me to a description of defline format for the
>> > multiz46way
>> >
>> > FASTA files? For instance:
>> >> uc010nxq.1_hg19_1_3 38 0 2 chr1:12190-12227+
>> >
>> > ATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCA
>> >
>> >> uc010nxq.1_panTro2_1_3 38 0 2 chr15:100048575-100048624-
>> >
>> > ATGAGTGAGAGGATCAACTTCTCTGACAGCCTAGGCCA
>> >
>> >> uc010nxq.1_gorGor1_1_3 38 0 2
>> >
>> > What is the meaning of the numbers following (obviously) the knownGene
>> > and assembly version id's?
>> >
>> > Thanks,
>> > Ivan
>
>
> --
> Ivan Adzhubey, Ph.D.
> Instructor
> Division of Genetics, Dept of Medicine
> Brigham & Women's Hospital, Harvard Medical School
> HMS New Research Building, Room 0464C
> 77 Avenue Louis Pasteur
> Boston, MA 02115
> tel.: (617) 525-4728
> fax: (617) 525-4705
> web: http://genetics.bwh.harvard.edu/genetics/members/Ivan_Adzhubey.html
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
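
For anyone reading this thread in the archive: the deflines quoted above can be split into fields with a few lines of Python. This is only a rough sketch. The field names (exon number, exon count, length, in-frame, out-frame, position) are my assumption about the layout and are not stated anywhere in this thread; please check them against "Explanation of CDS FASTA header format" on the help page Mary linked before relying on them. The names `Defline` and `parse_defline` are made up for illustration.

```python
from typing import NamedTuple, Optional


class Defline(NamedTuple):
    # Field names are assumptions; verify against the UCSC help page.
    gene: str
    assembly: str
    exon_number: int
    exon_count: int
    length: int
    in_frame: int
    out_frame: int
    position: Optional[str]  # absent when the species has no aligned position


def parse_defline(line: str) -> Defline:
    """Split one FASTA header such as
    '>uc010nxq.1_hg19_1_3 38 0 2 chr1:12190-12227+' into its fields."""
    fields = line.strip().lstrip(">").split()
    # Split from the right so underscores inside the gene id
    # (e.g. RefSeq accessions like NM_...) are left intact.
    gene, assembly, exon_number, exon_count = fields[0].rsplit("_", 3)
    position = fields[4] if len(fields) > 4 else None
    return Defline(
        gene,
        assembly,
        int(exon_number),
        int(exon_count),
        int(fields[1]),
        int(fields[2]),
        int(fields[3]),
        position,
    )
```

Headers without a trailing position, like the gorGor1 line in the example above, come back with `position` set to `None`.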
