Is there a better option, then?  Something curated?

Michael

-----Original Message-----
From: Brooke Rhead [mailto:[email protected]] 
Sent: Friday, October 07, 2011 5:52 PM
To: Rusch, Michael
Cc: '[email protected]'
Subject: Re: [Genome] genes with disparate loci in refFlat

Hi Michael,

The RefSeq Genes track is made by aligning RefSeq sequences to the 
genome using BLAT.  You can click on the blue "RefSeq Genes" link on the 
main Genome Browser page to read the track description.  In part, it says:

"RefSeq RNAs were aligned against the human genome using blat; those 
with an alignment of less than 15% were discarded. When a single RNA 
aligned in multiple places, the alignment having the highest base 
identity was identified. Only alignments having a base identity level 
within 0.1% of the best and at least 96% base identity with the genomic 
sequence were kept."
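
As a rough illustration of the filtering rule quoted above, here is a minimal Python sketch. It is not the code used to build the track. The function name, the dictionary keys, and the reading of "an alignment of less than 15%" as the percentage of the RNA covered by the alignment are all assumptions made for this example, as is treating "within 0.1%" as 0.1 percentage points.

```python
def keep_alignments(alignments, min_coverage=15.0, min_identity=96.0, near_best=0.1):
    """Filter the alignments of ONE RNA, following the quoted description.

    `alignments` is a list of dicts with hypothetical keys:
      'coverage' - percent of the RNA that aligned (assumed meaning of "15%")
      'identity' - percent base identity with the genomic sequence
    """
    # Step 1: discard alignments below the coverage threshold.
    candidates = [a for a in alignments if a["coverage"] >= min_coverage]
    if not candidates:
        return []
    # Step 2: find the highest base identity among what is left.
    best = max(a["identity"] for a in candidates)
    # Step 3: keep alignments close to the best AND above the identity floor.
    return [
        a for a in candidates
        if best - a["identity"] <= near_best and a["identity"] >= min_identity
    ]
```

Under these assumptions, an RNA that matches two loci almost equally well keeps both alignments, which is how one accession can end up with more than one record.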

So, it is expected that some sequences will align very well in multiple 
locations.  One explanation for what you are seeing is duplication 
events in the genome.  You might try turning on the "Segmental Dups" 
track (in the Variation and Repeats track group).  Both of your example 
regions show activity in that track.
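
If it helps to list the affected genes, the following is a minimal sketch that groups refFlat records by geneName and reports those found on more than one chromosome/strand combination. It assumes tab-separated lines whose first four columns are geneName, name, chrom and strand; the function name is made up for this example. It will not flag copies that sit on the same chromosome and strand.

```python
from collections import defaultdict

def genes_with_disparate_loci(lines):
    """Return {geneName: sorted list of (chrom, strand)} for genes whose
    records fall on more than one chromosome/strand combination."""
    placements = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        gene_name, _name, chrom, strand = fields[:4]
        placements[gene_name].add((chrom, strand))
    return {
        gene: sorted(locs)
        for gene, locs in placements.items()
        if len(locs) > 1
    }
```

To run it on a local copy of the table, pass an open file handle, for example `genes_with_disparate_loci(open("refFlat.txt"))`, where the file name stands for wherever you saved the table.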

If you have further questions, please contact us again at 
[email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 10/6/11 7:27 AM, Rusch, Michael wrote:
> I've found some things in refFlat that I don't understand. Perhaps
> somebody can help shed some light on this.
>
> Intuitively it seemed to me that in most circumstances, all of the
> records with the same geneName should be in about the same place, and
> certainly in the same orientation on the same chromosome. However, I
> have found several situations where this is not the case. Some of these
> make sense to me, for example, genes in the PARs have records on both
> chrX and chrY. Also, there are several that have some records on the
> "hap" sequences. These I can understand. Others truly puzzle me. Maybe
> somebody can help me interpret.
>
> First example is MAGEA2. This gene has two locations on chrX:
> MAGEA2  chrX    -      151918388       151922364       3
> MAGEA2  chrX    +       151883119       151887095       3
>
> I don't understand how the same gene could be in two different places?
>
> In some cases they are even on different chromosomes.
>
> In many cases, there seem to be duplicates with different
> geneName/names. For example:
>
> MIR4509-1       NR_039732       chr15   -       22675147        22675241
> MIR4509-2       NR_039733       chr15   -       22675147        22675241
> MIR4509-3       NR_039734       chr15   -       22675147        22675241
> MIR4509-1       NR_039732       chr15   +       28671636        28671730
> MIR4509-2       NR_039733       chr15   +       28671636        28671730
> MIR4509-3       NR_039734       chr15   +       28671636        28671730
> MIR4509-1       NR_039732       chr15   -       28735897        28735991
> MIR4509-2       NR_039733       chr15   -       28735897        28735991
> MIR4509-3       NR_039734       chr15   -       28735897        28735991
>
> In this case, there are three geneName/name combinations, and three
> loci, and each geneName/name has a record in each locus.
>
> There are hundreds of these that I've found.
>
> I get the impression that I'm not using this data correctly, and
> perhaps there would be a better table to be using for the purpose of
> locating genes and annotated transcripts on the genome. Can anybody
> explain this to me?
>
> Michael
>
> ________________________________
> Email Disclaimer: www.stjude.org/emaildisclaimer
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome


