Hi Deli,

You can use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) 
to retrieve only regions of the multiple alignment that you are 
interested in.  Select the canFam2 assembly and then:

group: Comparative Genomics
track: Conservation
table: multiz4way

If you have 1,000 regions or fewer, at this point you can hit the 
"define regions" button and enter the regions you want to retrieve.

Next, leave "output format: MAF - multiple alignment format" selected, 
and hit "get output."  You should see portions of the alignment file 
that correspond to your regions.
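
If you would like to check the retrieved MAF blocks in a script rather than by eye, here is a minimal sketch in Python. It is not a UCSC tool. The function names parse_maf and column_at are made up for illustration, and it assumes the first "s" line of each block is the dog reference on the + strand. MAF start coordinates are 0-based.

```python
def parse_maf(text):
    """Return a list of alignment blocks; each block is a list of dicts,
    one per 's' (sequence) line, in file order."""
    blocks = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line ends the current block.
            if current:
                blocks.append(current)
            current = None
            continue
        if fields[0] == "a":
            # An 'a' line starts a new block.
            if current:
                blocks.append(current)
            current = []
        elif fields[0] == "s" and current is not None and len(fields) >= 7:
            current.append({
                "src": fields[1],          # e.g. database.chromosome
                "start": int(fields[2]),   # 0-based start in the source
                "size": int(fields[3]),    # number of non-gap bases
                "strand": fields[4],
                "src_size": int(fields[5]),
                "text": fields[6],         # aligned text, '-' marks a gap
            })
    if current:
        blocks.append(current)
    return blocks


def column_at(block, ref_pos):
    """Return {src: aligned character} for the alignment column holding the
    0-based reference position ref_pos, or None if the block does not cover
    it. Assumes block[0] is the reference on the + strand."""
    ref = block[0]
    if not (ref["start"] <= ref_pos < ref["start"] + ref["size"]):
        return None
    offset = ref_pos - ref["start"]
    seen = -1
    for col, base in enumerate(ref["text"]):
        if base != "-":
            seen += 1
            if seen == offset:
                return {row["src"]: row["text"][col] for row in block}
    return None
```

A column in which the dog row has a base and every other row has "-" is the pattern described in the chr38 example. Remember to subtract 1 from a browser position before passing it as ref_pos.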

If you have more than 1,000 regions, you can create a custom track of 
your regions (in BED format) from this page: 
http://genome.ucsc.edu/cgi-bin/hgCustom, then choose the multiz4way 
table in the Table Browser and create an intersection with your new 
custom track.  On the intersection page, you would choose 
"Base-pair-wise intersection (AND) of Conservation and User Track."
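
One thing to watch when building the BED file: BED starts are 0-based and ends are exclusive, while browser positions such as chr38:8,242,507-8,242,507 are 1-based and inclusive. The following is a rough sketch of the conversion in Python; the function names are hypothetical and the optional name column is only there for convenience.

```python
import re

# Matches browser-style positions such as "chr38:8,242,507-8,242,507".
POSITION_RE = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def position_to_bed(position, name=""):
    """Convert a 1-based, inclusive browser position to a BED line
    (0-based start, exclusive end)."""
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError("unrecognized position: %r" % position)
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError("bad coordinates in: %r" % position)
    # Only the start shifts by one; the end stays the same number.
    fields = [chrom, str(start - 1), str(end)]
    if name:
        fields.append(name)
    return "\t".join(fields)


def positions_to_bed(positions):
    """Convert an iterable of browser positions to BED text, one per line."""
    return "\n".join(position_to_bed(p) for p in positions)


if __name__ == "__main__":
    print(position_to_bed("chr38:8,242,507-8,242,507", "indel1"))
```

The printed line can be pasted into the custom track page as is.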

I have one bit of additional input for you from one of our engineers 
about treating every difference from human/mouse/rat as an error in the 
canFam2 reference assembly:

In your chr38 indel example, indeed the reference's additional "T" seems 
like it would cause a frameshift and early stop, assuming that dog has a 
gene where other species' genes have aligned.  However, it seems 
unlikely that absolutely all reference variants that differ from 
human/mouse/rat are reference errors; especially when considering SNPs, 
couldn't some of those differences be caused by true variation?

If you have further questions, please feel free to contact us again at 
[email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 07/24/11 12:47, Deli Liu wrote:
> Hi,
> 
> I mapped next-generation sequencing data of a dog genome to the dog
> reference genome (canFam2) and searched for SNPs and indels. But I found
> that some SNPs and indels are due to mistakes in the dog reference genome,
> where it differs from the human or mouse genome. For example:
> 
> I found an indel in my input dog genome by comparing it to the canFam2
> reference genome: the reference is T at this position, but the input has
> a one-base-pair deletion here:
> 
> chr38:8,242,507-8,242,507 T -
> 
> However, the conservation track in the browser shows that human, mouse
> and rat have no T at this position. So there may be a mistake at this
> position in the canFam2 reference genome.
> 
> Now I want to check all my SNP and indel results against the canFam2
> reference genome and find out which are real changes and which are due
> to canFam2 reference mistakes, by comparing dog to the human and mouse
> genomes. I downloaded the multiple alignment files for dog canFam2, but
> it is very hard to output all the possible mistakes related to my SNP and
> indel regions. So is it possible to get the conservation for some
> specific regions (my SNPs and indels)?
> 
> Thanks a lot.
> 
> 
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
