Re: [Genome] multiz46way FASTA format description?

Ivan Adzhubey Tue, 31 Aug 2010 20:21:21 -0700

Hi,

One more question regarding FASTA alignments.


How was codon translation done for the downloadable multiz46way FASTA files?
There are several options listed on the cons46way track description page here:

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way

---Quote---
# Use default species reading frames for translation: the annotations from the 
genome displayed in the Default species for translation; pull-down menu are 
used to translate all the aligned species present in the alignment.
# Use reading frames for species if available, otherwise no translation: codon 
translation is performed only for those species where the region is annotated 
as protein coding.
# Use reading frames for species if available, otherwise use default species: 
codon translation is done on those species that are annotated as being protein 
coding over the aligned region using species-specific annotation; the remaining 
species are translated using the default species annotation.
---End quote---

Which one of these options was used to generate each of the multiz46way FASTA 
sets available?

knownGene.exonAA.fa.gz
knownCanonical.exonAA.fa.gz
refGene.exonAA.fa.gz

The track description page above also mentions multiz alignments for Ensembl
Genes and XenoRef Genes. These are not available as pre-built FASTA files; is
that correct?

Thanks,
Ivan

On Friday, August 13, 2010 12:40:48 pm Mary Goldman wrote:
> Hi Ivan,
> 
> Information on how the FASTA files for the conservation track are
> formatted can be found here:
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA, under
> "Explanation of CDS FASTA header format".
> 
> I hope this information is helpful.  Please feel free to contact the
> mail list again if you require further assistance.
> 
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
> 
> On 8/12/10 3:31 PM, Ivan Adzhubey wrote:
> > Hello,
> > 
> > Could someone point me to a description of defline format for the
> > multiz46way FASTA files? For instance:
> > 
> > >uc010nxq.1_hg19_1_3 38 0 2 chr1:12190-12227+
> > ATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCA
> > >uc010nxq.1_panTro2_1_3 38 0 2 chr15:100048575-100048624-
> > ATGAGTGAGAGGATCAACTTCTCTGACAGCCTAGGCCA
> > >uc010nxq.1_gorGor1_1_3 38 0 2
> > 
> > What is the meaning of the numbers following (obviously) the knownGene
> > and assembly version IDs?
> > 
> > Thanks,
> > Ivan
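
For reference, the deflines quoted above can be split into fields with a
short script. This is only a sketch under an assumption: it reads the fields
as name_db_exonNumber_exonCount, sequence length, inFrame, outFrame, and
genomic position, which is my understanding of the "Explanation of CDS FASTA
header format" section mentioned above and should be checked against that
page. The names Defline and parse_defline are made up for illustration. The
hg19 example is at least consistent with this reading: the sequence has 38
bases, and 38 mod 3 = 2 matches the fourth field.

```python
from typing import NamedTuple, Optional


class Defline(NamedTuple):
    """One parsed header line (field meanings are assumed, see above)."""
    gene: str
    db: str
    exon_number: int
    exon_count: int
    length: int
    in_frame: int
    out_frame: int
    position: Optional[str]  # absent for species with no aligned sequence


def parse_defline(line: str) -> Defline:
    """Split a multiz exon FASTA defline into its fields."""
    fields = line.strip().lstrip(">").split()
    # Split from the right so gene names containing "_" (e.g. RefSeq
    # accessions) stay intact.
    gene, db, exon_number, exon_count = fields[0].rsplit("_", 3)
    position = fields[4] if len(fields) > 4 else None
    return Defline(
        gene, db, int(exon_number), int(exon_count),
        int(fields[1]), int(fields[2]), int(fields[3]), position,
    )


if __name__ == "__main__":
    print(parse_defline(">uc010nxq.1_hg19_1_3 38 0 2 chr1:12190-12227+"))
    print(parse_defline(">uc010nxq.1_gorGor1_1_3 38 0 2"))
```

The gorGor1 line has no position field, so the sketch returns None for it
rather than failing.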


-- 
Ivan Adzhubey, Ph.D.
Instructor
Division of Genetics, Dept of Medicine
Brigham & Women's Hospital, Harvard Medical School
HMS New Research Building, Room 0464C
77 Avenue Louis Pasteur
Boston, MA 02115
tel.: (617) 525-4728
fax:  (617) 525-4705
web: http://genetics.bwh.harvard.edu/genetics/members/Ivan_Adzhubey.html
