Hi Sumanth,

A region annotated as a simple repeat does not have to match the repeat 
unit exactly. See the RepeatMasker track description page for the 
processing methods.
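
To put a number on "non-exact", here is a minimal Python sketch (not 
part of RepeatMasker or any UCSC tool) that compares the two 50-base 
rows of the alignment quoted further down in this thread. The function 
name compare_aligned is made up for illustration, and the sketch assumes 
an ungapped alignment, which is how the quoted rows appear.

```python
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T)
# substitutions; every other substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def compare_aligned(query, subject):
    """Count matches, transitions and transversions column by column.

    Hypothetical helper: assumes both rows have equal length and no gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    matches = transitions = transversions = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == s:
            matches += 1
        elif (q, s) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return matches, transitions, transversions


# Query row (UnnamedSeq 1-50) copied from the quoted alignment.
query = "TTATCGTCATCGTCATCATCACCATCACCATTGTTTTTGCCATTGTTACC"
# Consensus row ((CAT)n positions 3-52): the unit TCA repeated, cut to 50.
subject = ("TCA" * 17)[:50]

matches, transitions, transversions = compare_aligned(query, subject)
identity = matches / len(query)

print(f"matches={matches} transitions={transitions} "
      f"transversions={transversions} identity={identity:.0%}")
```

For the rows as posted, this counts 34 matching columns out of 50 
(68% identity), with 15 transitions and 1 transversion, which agrees 
with the i/v marks in the alignment and with the roughly 70% figure in 
the question.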

As for whether your data is a repeat or not - that is something you will 
need to decide.

Your sequence may represent a novel splice variant, or it may be just 
what it looks like - a simple repeat in an intron. Sometimes it takes a 
person examining the data as a whole to make the call.
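
One small check that can feed into that examination is to see exactly 
how the 15-base read sits on the intron in each orientation. The 
following is a minimal Python sketch under stated assumptions: the read 
and the start of the intron are copied from the message quoted below, 
and the helper names reverse_complement and suffix_prefix_overlap are 
hypothetical, not functions from any existing package.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def suffix_prefix_overlap(query, target):
    """Length of the longest suffix of query that is a prefix of target."""
    for k in range(min(len(query), len(target)), 0, -1):
        if query[-k:] == target[:k]:
            return k
    return 0


# 15-base read and the first 50 bases of the intron, as posted in the thread.
read = "GACGATGACGATAAA"
intron_start = "ttatcgtcatcgtcatcatcaccatcaccattgtttttgccattgttacc".upper()

read_rc = reverse_complement(read)

# Exact-match checks in both orientations.
forward_hit = read in intron_start
reverse_hit = read_rc in intron_start

# How much of the reverse-complemented read overlaps the intron start.
overlap = suffix_prefix_overlap(read_rc, intron_start)

print("reverse complement:", read_rc)
print("forward exact match:", forward_hit)
print("reverse exact match:", reverse_hit)
print("bases overlapping intron start:", overlap)
```

With the sequences exactly as posted, the reverse complement is 
TTTATCGTCATCGTC and its last 14 bases match the first 14 bases of the 
intron, so the read appears to extend one base past the intron 
boundary. Whether that holds for the real data depends on the posted 
sequences being complete, so treat it as a prompt to look at the 
flanking exon sequence rather than as a conclusion.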

First, check out the gene/transcript you are using and go out to the 
data source to see if the component sequences are available (or 
additional info related to that transcript sequence's creation).

Next, bring up other tracks. For example, turn on the Conservation 
track, all species to full, and see what other genomes have in this 
area. Is it conserved? Then examine any conserved genomes to see what 
the gene/transcript annotation is for those regions. You could also try 
looking at the Mouse ESTs (spliced and unspliced) and Mouse mRNA/Other 
mRNA tracks. Also try examining some of the gene prediction tracks that 
are based on algorithms (not sequence data).

You will likely find that the information starts to cluster around a 
specific interpretation of the region. This can help you to make a 
decision with reasonable confidence.

Best wishes for your project,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 6/3/10 9:40 AM, Polikepahad, Sumanth wrote:
> Hi Jennifer
>
> Thanks a lot for the reply. As you said, I took the entire 197 bp intron
> sequence and ran it through the RepeatMasker website. From 1 to 52 it is
> shown as a simple repeat, and from then on it is a retrotransposon.
>
> The 15 nt sequence maps to the first 15 nt of the intron sequence, but
> there are about 4 mismatches in it. Now my question is: should I really
> consider it a simple repeat, given that only about 70% of it aligns to
> the repeat?
>
> Here is the ~197 bp intron sequence
>
> ttatcgtcatcgtcatcatcaccatcaccattgtttttgccattgttaccgtcaccttgag
> aacagagctttactctgtatcccaggctagcttcgaattgtttgtgtagc
> tgaggttaccctcaagctcatggcagtcctgtgcctgagccttctaagtg
> ctgggcttacaagtgtgagccaccttgtccagc
>
>
> Here is the alignment on repeatmasker for the first 52 nt.
>
> UnnamedSeq      1 TTATCGTCATCGTCATCATCACCATCACCATTGTTTTTGCCATTGTTACC 50
>                    i   i     i         i     i   ii iv iii   ii i i
> (CAT)n#Simple   3 TCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATC 52
>
> My 15 bp sequence is GACGATGACGATAAA
>
> Is repeat masking by UCSC very stringent and sensitive?
>
> Thanks.
> ________________________________________
> From: Jennifer Jackson [[email protected]]
> Sent: Wednesday, June 02, 2010 5:25 PM
> To: Polikepahad, Sumanth
> Cc: [email protected]
> Subject: Re: [Genome] Repeat Masking
>
> Hello Sumanth,
>
> The 15bp sequences are probably simply too short to capture a repeat
> match. If you added in some flanking sequence (using the genome
> alignment), the match would probably be found, the same as the genome
> track.
>
> Perhaps noting what type of repeat this is (on the Repeat track's item
> detail page) and learning about its characteristics would help. Perhaps
> there are variable regions - which would definitely interfere with short
> match alignments.
>
> It sounds like the sequence region is mapping uniquely. Perhaps align to
> genome first, then filter out matches to genome regions annotated as
> repeats (or repeats plus some other suspicious factors, like a reverse
> orientation intron mapping).
>
> Hopefully this helps,
> Jen
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
> On 6/2/10 10:28 AM, Polikepahad, Sumanth wrote:
>> Hi,
>>
>> I have been trying to map deep-sequencing data to the mouse genome. There is
>> a ~15 nt sequence (about 15000 copies) that maps in reverse orientation to
>> the intron region of a gene. I downloaded the intron database from Tables in
>> the UCSC browser without the repeat masking option. But when I check the
>> repeat masking option, the regions these sequences map to are masked,
>> suggesting that they might be repeat sequences. However, when I map them to
>> the mouse repeat database obtained from www.girinst.org, and also at
>> www.repeatmasker.org, that particular sequence is not shown as a repeat. It
>> is, however, shown as a repeat in the Hydra genome. Can someone suggest what
>> I am missing here? Why does the UCSC browser consider that sequence a repeat
>> when the others do not?
>>
>> thanks in advance.
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
