Hi Marco,

The track description page for the RepeatMasker track has some 
information; click the track title to open it. There is also some 
information in the README file in the downloads directory: 
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ 
(see especially chromOut.tar.gz)

Here is how to set up the Table Browser so that it returns only the 
elements with at least 90% identity to the consensus sequence.

(For some background, a definition of RepeatMasker output columns can be 
found here: http://repeatmasker.org/webrepeatmaskerhelp.html )

The 2nd, 3rd and 4th columns of the .out files are the useful ones 
(example values on the left):

   15.6    = % substitutions in matching region compared to the consensus
   6.2     = % of bases opposite a gap in the query sequence (deleted bp)
   0.0     = % of bases opposite a gap in the repeat consensus (inserted bp)

In our database table, those are multiplied by 10 in order to get 
integer parts-per-thousand, and called milliDiv (substitutions), 
milliDel and milliIns.

The simplest % identity measurement uses milliDiv only (% identity = 
100 - milliDiv/10) -- if you wish, you can factor in milliDel and 
milliIns too.
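
As a rough sketch of that arithmetic in Python (this is not part of our 
tools; the function name is made up, and simply adding milliDel and 
milliIns to milliDiv is only one possible way to count indels, so treat 
that choice as an assumption):

```python
def percent_identity(milli_div, milli_del=0, milli_ins=0):
    """Approximate % identity from parts-per-thousand values.

    Assumption: indels are counted by simply adding milliDel and
    milliIns to milliDiv; other conventions are possible.
    """
    total_milli = milli_div + milli_del + milli_ins
    return (1000 - total_milli) / 10.0


# Example .out values from above: 15.6, 6.2 and 0.0 (percent),
# which would be stored as 156, 62 and 0 in the table.
subs_only = percent_identity(156)
with_indels = percent_identity(156, 62, 0)
print(subs_only, with_indels)
```

For the example values, substitutions alone give about 84.4% identity, 
and counting the indels as well lowers it to about 78.2%.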

So, to get % identity >= 90% in the Table Browser, create a filter with 
milliDiv <= 100 (milliDiv is divergence in parts per thousand, so 10% 
divergence is 100).
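
If you would rather download the whole table and filter it yourself, 
the same cutoff can be applied offline. The Python sketch below assumes 
a tab-separated export with a header line that includes a milliDiv 
column; the rows, the reduced column set and the function name are made 
up for illustration. At least 90% identity corresponds to at most 10% 
divergence, so it keeps rows with milliDiv <= 100:

```python
import csv
import io

MAX_MILLI_DIV = 100  # 10% divergence, i.e. at least 90% identity

# Made-up rows for illustration only; a real export has more columns.
SAMPLE = (
    "#genoName\tgenoStart\tgenoEnd\trepName\tmilliDiv\n"
    "chr1\t1000\t1300\trepA\t20\n"
    "chr1\t5000\t5250\trepA\t100\n"
    "chr2\t700\t990\trepA\t156\n"
)


def filter_by_divergence(text, max_milli_div=MAX_MILLI_DIV):
    """Return the rows whose milliDiv is at or below the cutoff."""
    # Drop the leading '#' so the first header field is a plain name.
    reader = csv.DictReader(io.StringIO(text.lstrip("#")), delimiter="\t")
    return [row for row in reader if int(row["milliDiv"]) <= max_milli_div]


kept = filter_by_divergence(SAMPLE)
for row in kept:
    print(row["genoName"], row["genoStart"], row["genoEnd"], row["milliDiv"])
```

Changing MAX_MILLI_DIV to 20 would give the 98% identity cutoff your 
colleagues used.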

Please let us know if you have any additional questions: [email protected]

-
Greg Roe
UCSC Genome Bioinformatics Group


On 6/13/11 9:32 AM, Marco Santagostino wrote:
> Dear Sirs,
>
> where can I find the parameters used to generate the RepeatMasker track?
> The problem is as follows: I need to take from the horse genome a
> certain repetitive element, and I'm supposed to classify all the hits
> found according to their identity (with respect to the consensus
> sequence). Some colleagues of mine already took all the sequences with at
> least 98% of identity by BLAST search, so, now I'm supposed to find
> those which have a lower identity, but I can't find out how to set up
> the Table Browser so that it finds the elements with the identity that I
> chose. How do I set up the Table Browser so, for example, it recovers
> only the elements with at least 90% of identity with the consensus sequence?
>
> Thank you,
>
> Marco Santagostino
>
>
>
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
