On Mon, Jun 20, 2011 at 6:51 PM, nimrod rubinstein <[email protected]> wrote:
> Hi Brooke,
>
> Thanks a lot for the answer, which as usual was clear and thorough.
>
> I hope you don't mind me asking one further question.
> In addition to the protein coding sequences I'm also interested in creating
> alignments of upstream (i.e., promoter), intron, and UTR sequences. Aside
> from using Galaxy, should the way I suggested in my previous email (using
> the gene coordinates to cut sequences from the pairwise alignments) work
> for that purpose?
>
> Sorry I didn't fully explain this in my previous email.
>
> Thanks a lot,
> Nimrod
>
>
> On Tue, May 31, 2011 at 6:40 PM, Brooke Rhead <[email protected]> wrote:
>
>> Hi Nimrod,
>>
>> You could create your own multiple sequence alignments, or you could just
>> use the existing alignments and pull out only the species (and regions) you
>> are interested in.
>>
>> If you want to create your own alignments, this page should be helpful:
>> http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
>>
>> There are a couple of tools that could help you extract what you want from
>> existing alignments. The first is the "CDS FASTA alignment from multiple
>> alignment" output option in the Table Browser
>> (http://genome.ucsc.edu/cgi-bin/hgTables).
>> Select the RefSeq Genes track in hg19, and the CDS FASTA output option will
>> become visible. After hitting "get output" you will see a page where you can
>> select the organisms you want to include in your output. See the user's
>> guide for more info on this option:
>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA
>> One caveat to be aware of is that, since not all species will be selected
>> for output, there will be some columns in which all of the alignments will
>> show only a "-".
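A note on the all-gap columns mentioned in the caveat: if they get in the way
downstream, they can be stripped after export. This is only a minimal sketch,
not part of the Table Browser. It assumes the selected species have already
been read into a dict of equal-length aligned strings; the function name and
the example sequences are made up for illustration.

```python
def drop_all_gap_columns(alignment):
    """Return a copy of the alignment without columns that are '-' in every row.

    `alignment` maps a species/assembly name to its aligned sequence
    (hypothetical input layout, all sequences the same length).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    kept = [col for col in zip(*seqs) if any(base != "-" for base in col)]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}


# Made-up example: columns 4 and 5 are gaps in all three rows.
subset = {"hg19": "ATG--C", "panTro2": "AT---C", "rheMac2": "ATG--A"}
cleaned = drop_all_gap_columns(subset)
```

Columns where only some of the rows have a gap are left alone, so the
remaining sequences stay aligned to each other.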
>>
>> Another option is to use Galaxy (http://main.g2.bx.psu.edu/), which is
>> run by our collaborators at Penn State and works in conjunction with the
>> Genome Browser. I have not personally used the tools there, but there are
>> several that look like they might be useful to you -- see "Filter MAF blocks
>> by Species," "Extract MAF blocks given a set of genomic intervals," and
>> "Stitch Gene blocks given a set of coding exon intervals" on the left-hand
>> side of the page under the "Fetch Alignments" header. If you have questions
>> about using Galaxy, their helpdesk address is [email protected].
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> On 05/27/11 09:49, nimrod rubinstein wrote:
>>
>>> Hi,
>>>
>>> I think my question is pretty trivial and has probably been raised many
>>> times before, nevertheless I couldn't find a direct answer for it in the
>>> archives.
>>>
>>> Anyway, I'm interested in building
>>> Human-Chimp-Orangutan-Rhesus multiple sequence alignments for every human
>>> refseq gene.
>>> The way I thought of accomplishing this is to:
>>> 1. Derive the coding sequence coordinates from the hg19 refGene file for
>>> every human refseq gene.
>>> 2. Get the sequences of human and each of the other organisms that map to
>>> these coordinates from the syntenicNet pairwise alignment files
>>> (e.g., chr1.hg19.panTro2.synNet.axt.gz).
>>> 3. Combine these pairwise sequence files to multiple sequence files and
>>> run my own multiple sequence alignment program.
>>>
>>> Does this make sense or is there any other better established way to do
>>> that?
>>>
>>> Thanks a lot,
>>> Nimrod Rubinstein
>>> NESCent fellow
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
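
On cutting upstream, intron, and UTR sequences by gene coordinates: the
intervals themselves can be derived from a refGene row before touching any
alignment file. The sketch below is one possible way to do that, not an
established tool. It assumes the tab-separated refGene.txt layout with a
leading bin column (bin, name, chrom, strand, txStart, txEnd, cdsStart,
cdsEnd, exonCount, exonStarts, exonEnds, ...) and 0-based, half-open
coordinates. The function name, the 1000 bp upstream length, and the example
row are made up for illustration.

```python
def region_intervals(refgene_line, upstream_len=1000):
    """Split one refGene row into CDS, UTR, intron and upstream intervals.

    Intervals are (start, end) tuples, 0-based half-open, on the + strand
    coordinate system of the chromosome. `upstream_len` is an arbitrary
    illustrative default, not a UCSC convention.
    """
    f = refgene_line.rstrip("\n").split("\t")
    name, chrom, strand = f[1], f[2], f[3]
    tx_start, tx_end, cds_start, cds_end = (int(x) for x in f[4:8])
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    exons = list(zip(starts, ends))

    def clip(lo, hi):
        # Parts of each exon that fall inside [lo, hi).
        out = []
        for s, e in exons:
            s2, e2 = max(s, lo), min(e, hi)
            if s2 < e2:
                out.append((s2, e2))
        return out

    left_utr = clip(tx_start, cds_start)
    right_utr = clip(cds_end, tx_end)
    # Introns are the gaps between consecutive exons.
    introns = [(e, s) for (_, e), (s, _) in zip(exons, exons[1:])]

    if strand == "+":
        upstream = (max(0, tx_start - upstream_len), tx_start)
        utr5, utr3 = left_utr, right_utr
    else:
        # On the - strand the promoter lies past txEnd; this sketch does not
        # check the interval against the chromosome length.
        upstream = (tx_end, tx_end + upstream_len)
        utr5, utr3 = right_utr, left_utr

    return {"name": name, "chrom": chrom, "strand": strand,
            "cds": clip(cds_start, cds_end), "utr5": utr5, "utr3": utr3,
            "introns": introns, "upstream": upstream}


# Made-up row (NM_TEST is not a real accession).
row = ("0\tNM_TEST\tchr1\t+\t100\t1000\t150\t900\t3\t"
       "100,400,800,\t200,500,1000,\t0\tGENE\tcmpl\tcmpl\t0,1,2,")
regions = region_intervals(row)
```

The resulting intervals could then be handed to whatever does the cutting,
whether that is a script over the pairwise files or the Galaxy "Extract MAF
blocks given a set of genomic intervals" tool. Keep in mind that non-coding
regions are often less completely covered by the alignments than coding
exons, so some intervals may come back partial or empty.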
