Thanks, Jennifer. I tried things like:

liftOver ./test.bed ./mm9ToHg18.over.chain ./test_out.bed ./unMapped -minMatch=0.05 -multiple -minChainT=50 -minChainQ=50 -minSizeQ=50

I understand that by making liftOver consider only long alignment chains (increasing -minChainT/-minChainQ) it is possible to filter out spurious multiple hits. I see that minMatch is somehow related to the minimum conservation that liftOver accepts (the minimum fraction of a region's bases that must remap). But can liftOver "map" regions that have no alignment coverage, or that fall in gaps, by looking for the nearest match?

Best regards,
Avi

--- On Tue, 6/2/09, Jennifer Jackson <[email protected]> wrote:

> From: Jennifer Jackson <[email protected]>
> Subject: Re: [Genome] (no subject)
> To: "Fungazid" <[email protected]>
> Cc: [email protected]
> Date: Tuesday, June 2, 2009, 9:30 PM
>
> Hello Avi,
>
> Some options to try are to:
>
> 1) Lower the minMatch threshold. This will extend the alignments around higher-scoring regions. Then you can compare the high-scoring location (from a lift with a higher minMatch) to identify the flanking sequence.
>
> 2) Allow multiple = Y (-multiple on the command line). This will also help to include/extend result data. Perhaps some regions are hitting without specificity (this likelihood will increase with a lower minMatch). This may not matter to you, since you will be able to find the regions you are interested in based on the higher-scoring region locations.
>
> 3) Keep the parameters strict, locate the regions of high identity, then use your own tools to build coordinates for surrounding regions on the target genome, and extract sequence from the genomic files in Downloads or use the "get DNA" tool in the Assembly browser. This will only identify regions that are contiguous with the original high-scoring lift region, but it sounds as if that is what you are trying to do.
>
> 4) These processes can be cyclic. Run strict, filter, run permissive, filter, etc.
> The failure reasons will alert you to why the regions do not map and can point you to the specific parameters to make less stringent to achieve a lift. It is also possible that some regions will need manual judgment; try using the chain/net tracks in the browser to visualize the data. The liftOver files are based on this same source data.
>
> Permissive parameters can produce a lot of output. Consider adding other filters if they seem appropriate: minChainT/Q and minSizeT/Q can help reduce the noise from very small fragments mapping.
>
> LiftOver runs pretty quickly, so the best way to determine the best set of parameters is usually a test/analyze methodology. Each experiment can be different, and some regions of the genome can differ from others depending on the presence of repetitive elements, the type of gene(s), or the finished state of the assembly. Repeating subunits in a single gene, or several genes in close genomic proximity that were all derived from a common ancestor gene, can cause complications that sometimes only a person can resolve.
>
> Good luck, and we hope that some of these suggestions are helpful,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Fungazid wrote:
> > Hello UCSC team,
> >
> > Thanks for all the help.
> > The standalone tools I needed are now giving me reasonable outputs, but I'm not sure about the exact parameters to use for my specific needs in the liftOver tool:
> >
> > In many introns the level of conservation between distantly related species is very low, with only some islands of conserved regions (not only exons). Sometimes I wish to map coordinates between species in regions that are 200-500 bp from the nearest conserved region. In such cases I do not need to find an exact coordinate match; I want the nearest match.
> > Maybe you can suggest how to use liftOver in such a case.
> >
> > Avi
> >
> > _______________________________________________
> > Genome maillist - [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
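Jennifer's option 3 (lift only the conserved islands with strict parameters, then build coordinates for the surrounding region yourself) is the closest fit to the 200-500 bp case. The Python sketch below shows one possible way to do that second step. Everything in it is hypothetical: the Interval/Anchor types and the gap_to, nearest_anchor, and extrapolate helpers are illustrations, not part of liftOver. It assumes you have already paired each source island with its strict liftOver result and its orientation, and that the flank between island and query is collinear and roughly the same length in both species. That last assumption is often false in introns of distantly related species, so treat the output as an approximate window to pad and check against the chain/net tracks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """A BED-style interval: 0-based start, half-open end."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Anchor:
    """Hypothetical pairing of a conserved source island with its strict liftOver result."""
    src: Interval
    tgt: Interval
    strand: str  # "+" if the lifted copy keeps the source orientation, "-" if reversed


def gap_to(query: Interval, region: Interval) -> int:
    """Bases separating two intervals on the same chromosome (0 if they overlap or touch)."""
    return max(region.start - query.end, query.start - region.end, 0)


def nearest_anchor(query: Interval, anchors: List[Anchor]) -> Anchor:
    """Pick the anchor whose source interval is closest to the query."""
    candidates = [a for a in anchors if a.src.chrom == query.chrom]
    if not candidates:
        raise ValueError(f"no anchor on {query.chrom}")
    return min(candidates, key=lambda a: gap_to(query, a.src))


def extrapolate(query: Interval, anchor: Anchor) -> Interval:
    """Project a query onto the target genome by its offset from the nearest anchor edge.

    ASSUMPTION: the flank between anchor and query is collinear and has the same
    length in both species. Results are not clamped to chromosome bounds.
    """
    src, tgt = anchor.src, anchor.tgt
    # Measure from the anchor's higher-coordinate edge if the query lies past it,
    # otherwise from its lower-coordinate edge.
    downstream = query.start >= src.end
    if anchor.strand == "+":
        shift = (tgt.end - src.end) if downstream else (tgt.start - src.start)
        return Interval(tgt.chrom, query.start + shift, query.end + shift)
    # Reversed anchor: coordinates are mirrored about the matching anchor edge.
    pivot = (tgt.start + src.end) if downstream else (tgt.end + src.start)
    return Interval(tgt.chrom, pivot - query.end, pivot - query.start)


if __name__ == "__main__":
    island = Anchor(Interval("chr1", 1000, 1100), Interval("chr7", 5000, 5110), "+")
    query = Interval("chr1", 1300, 1400)  # 200 bp past the island's end
    print(extrapolate(query, nearest_anchor(query, [island])))
    # Interval(chrom='chr7', start=5310, end=5410)
```

Measuring from the nearest anchor edge, rather than always from the anchor start, keeps a difference in the island's own length between the two species from shifting the estimate.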
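The "run strict, filter, run permissive, filter" cycle from option 4 can also be scripted. This is a sketch under stated assumptions: the three-step SCHEDULE and the output file names are hypothetical, only flags that appear in this thread are used, and the parser assumes that liftOver's unMapped file puts a "#"-prefixed reason line (such as "#Deleted in new" or "#Partially deleted in new") before each failed record. Only the records that failed one pass are re-lifted at the next, more permissive one; the mapped output of each pass is kept in its own file for you to filter.

```python
import subprocess
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical stringency ladder: strict first, then progressively more permissive.
SCHEDULE = [
    {"min_match": 0.95},
    {"min_match": 0.5, "multiple": True, "min_chain": 50, "min_size_q": 50},
    {"min_match": 0.1, "multiple": True, "min_chain": 50, "min_size_q": 50},
]


def liftover_args(in_bed: str, chain: str, out_bed: str, unmapped: str,
                  min_match: float, multiple: bool = False,
                  min_chain: int = 0, min_size_q: int = 0) -> List[str]:
    """Build a liftOver command line from the options discussed in this thread."""
    args = ["liftOver", f"-minMatch={min_match}"]
    if multiple:
        # liftOver documents -minChainT/-minChainQ/-minSizeQ as applying in -multiple mode.
        args += ["-multiple", f"-minChainT={min_chain}",
                 f"-minChainQ={min_chain}", f"-minSizeQ={min_size_q}"]
    return args + [in_bed, chain, out_bed, unmapped]


def unmapped_records(text: str) -> List[Tuple[Optional[str], str]]:
    """Pair each record in an unMapped file with the '#' reason line before it."""
    records: List[Tuple[Optional[str], str]] = []
    reason: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("#"):
            reason = line[1:].strip()
        elif line.strip():
            records.append((reason, line))
    return records


def run_cycle(in_bed: str, chain: str, prefix: str) -> None:
    """Lift with each SCHEDULE step, re-lifting only what the previous step failed on."""
    current = in_bed
    for i, params in enumerate(SCHEDULE):
        out_bed, unmapped = f"{prefix}.pass{i}.bed", f"{prefix}.pass{i}.unMapped"
        subprocess.run(liftover_args(current, chain, out_bed, unmapped, **params), check=True)
        with open(unmapped) as fh:
            records = unmapped_records(fh.read())
        reasons = Counter(reason for reason, _ in records)
        print(f"pass {i} {params}: {len(records)} unmapped {dict(reasons)}")
        if not records:
            break
        # Failed records become the input of the next, more permissive pass.
        current = f"{prefix}.pass{i}.retry.bed"
        with open(current, "w") as fh:
            fh.writelines(line + "\n" for _, line in records)
```

For example, run_cycle("test.bed", "mm9ToHg18.over.chain", "test") would write test.pass0.bed, test.pass0.unMapped, and so on, printing the failure-reason counts after each pass so you can see which parameter to relax next.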
