Hi,

One more question regarding FASTA alignments.

How was codon translation done for the downloadable multiz46way FASTA files?
There are several options listed on the cons46way track description page here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way

---Quote---
# Use default species reading frames for translation: the annotations from
  the genome displayed in the "Default species for translation" pull-down
  menu are used to translate all the aligned species present in the
  alignment.
# Use reading frames for species if available, otherwise no translation:
  codon translation is performed only for those species where the region is
  annotated as protein coding.
# Use reading frames for species if available, otherwise use default
  species: codon translation is done on those species that are annotated as
  being protein coding over the aligned region using species-specific
  annotation; the remaining species are translated using the default species
  annotation.
---End quote---

Which one of these options was used to generate each of the multiz46way FASTA
sets available?

knownGene.exonAA.fa.gz
knownCanonical.exonAA.fa.gz
refGene.exonAA.fa.gz

The above track description page also mentions multiz alignments for Ensembl
Genes and XenoRef Genes. These are not available as pre-built FASTA files, is
this correct?

Thanks,
Ivan

On Friday, August 13, 2010 12:40:48 pm Mary Goldman wrote:
> Hi Ivan,
>
> Information on how the FASTA files for the conservation track are
> formatted can be found here:
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA, under
> "Explanation of CDS FASTA header format".
>
> I hope this information is helpful. Please feel free to contact the
> mail list again if you require further assistance.
>
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 8/12/10 3:31 PM, Ivan Adzhubey wrote:
> > Hello,
> >
> > Could someone point me to a description of defline format for the
> > multiz46way FASTA files?
> > For instance:
> >
> > >uc010nxq.1_hg19_1_3 38 0 2 chr1:12190-12227+
> > ATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCA
> > >uc010nxq.1_panTro2_1_3 38 0 2 chr15:100048575-100048624-
> > ATGAGTGAGAGGATCAACTTCTCTGACAGCCTAGGCCA
> > >uc010nxq.1_gorGor1_1_3 38 0 2
> >
> > What is the meaning of the numbers following (obviously) the knownGene
> > and assembly version id's?
> >
> > Thanks,
> > Ivan

--
Ivan Adzhubey, Ph.D.
Instructor
Division of Genetics, Dept of Medicine
Brigham & Women's Hospital, Harvard Medical School
HMS New Research Building, Room 0464C
77 Avenue Louis Pasteur
Boston, MA 02115
tel.: (617) 525-4728
fax: (617) 525-4705
web: http://genetics.bwh.harvard.edu/genetics/members/Ivan_Adzhubey.html
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
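
For reference, the deflines quoted above can be split into fields with a few lines of Python. This is only a minimal sketch: the field names used here (exon number, exon count, length, inFrame, outFrame, position) are an assumed reading of the "Explanation of CDS FASTA header format" section that Mary points to, not something confirmed in this thread, and parse_defline and Defline are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Defline(NamedTuple):
    # Field names are assumptions based on the Table Browser help page.
    name: str                # e.g. "uc010nxq.1" (gene/transcript id)
    db: str                  # e.g. "hg19" (assembly)
    exon_number: int         # assumed: index of this exon
    exon_count: int          # assumed: total exons in the gene
    length: int              # assumed: number of residues in the record
    in_frame: int            # assumed: frame at the start of the exon
    out_frame: int           # assumed: frame at the end of the exon
    position: Optional[str]  # e.g. "chr1:12190-12227+", may be absent


def parse_defline(line: str) -> Defline:
    """Split one multiz exon FASTA defline into its whitespace-separated fields."""
    fields = line.strip().lstrip(">").split()
    ident, length, in_frame, out_frame = fields[:4]
    # Species with no aligned sequence (gorGor1 in the example) carry no position.
    position = fields[4] if len(fields) > 4 else None
    # Split from the right so ids that contain "_" (e.g. RefSeq NM_...) stay whole.
    name, db, exon_number, exon_count = ident.rsplit("_", 3)
    return Defline(name, db, int(exon_number), int(exon_count),
                   int(length), int(in_frame), int(out_frame), position)


if __name__ == "__main__":
    for header in (">uc010nxq.1_hg19_1_3 38 0 2 chr1:12190-12227+",
                   ">uc010nxq.1_gorGor1_1_3 38 0 2"):
        print(parse_defline(header))
```

Under that reading, the hg19 record above would be exon 1 of 3, 38 bases long, which matches both the sequence length and the span chr1:12190-12227.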
