Hi Marco,

There is some information on the track info page for the RepeatMasker track; click the track title to open it. There is also some information in the README file in the downloads directory: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ (see especially chromOut.tar.gz).

To set up the Table Browser so that it returns only the elements with at least 90% identity to the consensus sequence:

(For some background, a definition of the RepeatMasker output columns can be found here: http://repeatmasker.org/webrepeatmaskerhelp.html )

The 2nd, 3rd and 4th columns of the .out files are the useful ones. For example:

  15.6 = % substitutions in matching region compared to the consensus
   6.2 = % of bases opposite a gap in the query sequence (deleted bp)
   0.0 = % of bases opposite a gap in the repeat consensus (inserted bp)

In our database table, those values are multiplied by 10 to give integer parts-per-thousand, and are called milliDiv (substitutions), milliDel and milliIns. The simplest % identity measurement uses milliDiv only; if you wish, you can factor in milliDel and milliIns too.

Because milliDiv measures divergence from the consensus rather than identity, % identity >= 90% corresponds to at most 10% substitutions. So, in the Table Browser, create a filter with milliDiv <= 100 (since it is parts per thousand).

Please let us know if you have any additional questions: [email protected]

- Greg Roe
UCSC Genome Bioinformatics Group

On 6/13/11 9:32 AM, Marco Santagostino wrote:
> Dear Sirs,
>
> where can I find the parameters used to generate the RepeatMasker track?
> The problem is as follows: I need to take from the horse genome a
> certain repetitive element, and I'm supposed to classify all the hits
> found according to their identity (with respect to the consensus
> sequence). Some colleagues of mine already took all the sequences with at
> least 98% of identity by BLAST search, so, now I'm supposed to find
> those which have a lower identity, but I can't find out how to set up
> the Table Browser so that it finds the elements with the identity that I
> chose. How do I set up the table browser so, for example, it recovers
> only the elements with at least 90% of identity with the consensus sequence?
>
> Thank you,
>
> Marco Santagostino
>
>
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
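
As a worked illustration of the parts-per-thousand arithmetic in the reply, here is a minimal Python sketch. milliDiv counts substitutions per 1000 bases, so a minimum % identity translates into a maximum milliDiv (90% identity gives milliDiv <= 100). The function names and the example rows are hypothetical, and adding milliDel and milliIns to milliDiv is only one assumed way of "factoring in" the gaps; it is not an official UCSC formula.

```python
def max_milli_div(min_identity_pct):
    """Largest milliDiv (substitutions per 1000 bases) that still
    satisfies the requested minimum % identity."""
    return round((100.0 - min_identity_pct) * 10)


def passes(row, min_identity_pct, include_indels=False):
    """Return True if a table row meets the identity cutoff.

    Assumption: when include_indels is True, deletions and insertions
    are simply added to the substitutions.
    """
    divergence = row["milliDiv"]
    if include_indels:
        divergence += row["milliDel"] + row["milliIns"]
    return divergence <= max_milli_div(min_identity_pct)


# Hypothetical rows. The first uses the example values from the .out
# columns above (15.6, 6.2, 0.0), multiplied by 10.
rows = [
    {"repName": "exampleA", "milliDiv": 156, "milliDel": 62, "milliIns": 0},
    {"repName": "exampleB", "milliDiv": 80, "milliDel": 30, "milliIns": 0},
    {"repName": "exampleC", "milliDiv": 40, "milliDel": 10, "milliIns": 5},
]

kept = [r["repName"] for r in rows if passes(r, 90)]
kept_with_indels = [r["repName"] for r in rows if passes(r, 90, include_indels=True)]

print(max_milli_div(90))   # threshold to enter in the milliDiv filter
print(kept)
print(kept_with_indels)
```

The same helper gives the cutoff for any other identity level, for example max_milli_div(98) for the 98% set already collected by BLAST.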
