Hi Nimrod,

You could create your own multiple sequence alignments, or you could 
just use the existing alignments and pull out only the species (and 
regions) you are interested in.

If you want to create your own alignments, this page should be helpful:
http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto

There are a couple of tools that could help you extract what you want 
from existing alignments.  The first is the "CDS FASTA alignment from 
multiple alignment" output option in the Table Browser 
(http://genome.ucsc.edu/cgi-bin/hgTables).  Select the RefSeq Genes 
track in hg19, and the CDS FASTA output option will become visible. 
After hitting "get output" you will see a page where you can select the 
organisms you want to include in your output.  See the user's guide for 
more info on this option: 
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA
One caveat: because the alignment was built from more species than you 
will select for output, some columns will show only a "-" in every 
selected sequence.
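
If those gap-only columns get in the way, they are easy to strip after download. Here is a minimal Python sketch, assuming you have already read the aligned sequences for one gene into a dictionary keyed by species name. The function name and the example sequences are made up for illustration and are not part of any UCSC tool.

```python
def drop_gap_only_columns(aligned):
    """Remove alignment columns in which every sequence has a gap ('-').

    `aligned` maps a species name to its aligned sequence; all sequences
    must have the same length.
    """
    names = list(aligned)
    rows = [aligned[name] for name in names]
    if not rows:
        return {}
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column if at least one sequence has a base there.
    keep = [i for i in range(length) if any(row[i] != "-" for row in rows)]
    return {name: "".join(row[i] for i in keep)
            for name, row in zip(names, rows)}


# Hypothetical example: columns 4 and 5 are gaps in every selected species.
example = {
    "hg19":    "ATG--A",
    "panTro2": "AT---A",
    "rheMac2": "ATG--C",
}
cleaned = drop_gap_only_columns(example)
```

Columns where only some species have a gap are kept, so the remaining sequences stay aligned to each other.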

Another option is to use Galaxy (http://main.g2.bx.psu.edu/), which is 
run by our collaborators at Penn State and works in conjunction with the 
Genome Browser.  I have not personally used the tools there, but there 
are several that look like they might be useful to you -- see "Filter 
MAF blocks by Species," "Extract MAF blocks given a set of genomic 
intervals," and "Stitch Gene blocks given a set of coding exon 
intervals" on the left-hand side of the page under the "Fetch 
Alignments" header.  If you have questions about using Galaxy, their 
helpdesk address is [email protected].
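
If you would rather work from downloaded MAF files than through Galaxy, filtering blocks by species can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how the Galaxy tool works. It assumes the usual UCSC convention that the source field of each "s" line looks like "database.chrom" (for example hg19.chr1); the function name and the sample lines are hypothetical.

```python
def filter_maf_species(lines, species):
    """Yield MAF lines, dropping rows that belong to unwanted species.

    Header lines, 'a' (block) lines and blank lines pass through unchanged.
    Rows of type s, i, e and q are kept only if the assembly name before
    the first '.' in their source field is in `species`.
    """
    wanted = set(species)
    for line in lines:
        fields = line.split()
        if fields and fields[0] in {"s", "i", "e", "q"}:
            if len(fields) < 2:
                continue  # malformed row: no source field
            assembly = fields[1].split(".", 1)[0]
            if assembly not in wanted:
                continue
        yield line


# Hypothetical MAF fragment used only to show the call.
maf_lines = [
    "##maf version=1",
    "a score=100.0",
    "s hg19.chr1 10 4 + 1000 ACGT",
    "s panTro2.chr1 20 4 + 900 ACGT",
    "s mm9.chr4 30 4 - 800 AC-T",
    "i mm9.chr4 N 0 C 0",
    "",
]
kept = list(filter_maf_species(maf_lines, ["hg19", "panTro2"]))
```

Note that this keeps every block, including any that end up with a single species, and it does not remove columns that become gap-only after filtering.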

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 05/27/11 09:49, nimrod rubinstein wrote:
> Hi,
> 
> I think my question is pretty trivial and has probably been raised many
> times before, nevertheless I couldn't find a direct answer for it in the
> archives.
> 
> Anyway, I'm interested in building
> Human-Chimp-Orangutan-Rhesus multiple sequence alignments for every human
> refseq gene.
> The way I thought of accomplishing this is to:
> 1. Derive the coding sequence coordinates from the hg19 refGene file for
> every human refseq gene.
> 2. Get the sequences of human and each of the other organisms that map to
> these coordinates from the syntenicNet pairwise alignment files
> (e.g., chr1.hg19.panTro2.synNet.axt.gz).
> 3. Combine these pairwise sequence files to multiple sequence files and run
> my own multiple sequence alignment program.
> 
> Does this make sense or is there any other better established way to do
> that?
> 
> Thanks a lot,
> Nimrod Rubinstein
> NESCent fellow
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
