Re: [Genome] Self chain database block information

Jennifer Jackson Wed, 13 Jan 2010 17:57:00 -0800

Hello,

I wish there were an easy answer to give you, but there isn't one. The same 
explanation as before applies to this example: the methods used to create the 
data introduce some variation, so the two directions are not guaranteed to 
match block for block.


How to merge this type of variation (if you require a completely symmetrical 
self-chain result) will be part of a rule set that you will need to develop 
and apply. 

For this example, presumably one of these is "more correct", but which one to 
pick is something you will need to determine. The best advice is to look at 
the actual data alignments for examples like this (in the browser, where you 
can use the data from other annotation tracks in the region for context and 
supporting evidence), develop some preliminary merge rules, test them on the 
dataset, and examine the results again. This can be simple or complex, 
depending on your tolerance for under- or over-merging. 
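
As a rough illustration of that kind of inspection, the Python sketch below 
compares the block lines quoted further down in this thread. It rests on a few 
assumptions: the block lists are only the partial excerpt from the question 
(the elided rows are not included, so offsets are relative to the first row 
shown), both chains are on the + strand, and the helper names flip and spans 
are made up for this example and are not part of any UCSC utility.

```python
# Block rows (size, dt, dq) copied from the excerpt quoted in this thread.
# The excerpt is partial, so offsets below are relative to its first row.
chr13_vs_chr21 = [(1421, 0, 2), (2903, 0, 2), (258, 0, 4),
                  (793, 0, 168), (747, 2, 0), (1666, 1, 0)]
chr21_vs_chr13 = [(1421, 2, 0), (2903, 2, 0), (258, 4, 0), (793, 168, 0),
                  (227, 1, 1), (519, 0, 2), (1666, 0, 1)]


def flip(blocks):
    """Swap the target-gap and query-gap columns (assumes +/+ strands)."""
    return [(size, dq, dt) for size, dt, dq in blocks]


def spans(blocks):
    """Return (target_offset, query_offset, size) for each ungapped block."""
    out, t, q = [], 0, 0
    for size, dt, dq in blocks:
        out.append((t, q, size))
        t += size + dt
        q += size + dq
    return out


flipped = spans(flip(chr13_vs_chr21))
partner = spans(chr21_vs_chr13)

# Ungapped blocks present in one direction but not the other.
only_in_flipped = [s for s in flipped if s not in partner]
only_in_partner = [s for s in partner if s not in flipped]

print("only in flipped chr13 chain:", only_in_flipped)
print("only in chr21 chain:        ", only_in_partner)
```

For the rows shown, the two directions differ in one place only: the 747-base 
block in one direction appears as 227 + 519 in the other, with a single 
position left unaligned on both sides (227 + 1 + 519 = 747), after which the 
offsets agree again. Whether a merge rule should treat such a split as 
equivalent is a judgment call.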

The tools I pointed you to before in the kent source tree can help with the 
application of the rule set you come up with. Ideally, you will not have to 
reinvent the actual merge utility on your own, but be able to use our utilities 
with your own rules (perhaps run on an expanded file that includes the original 
and all flipped chains). But this too needs to be evaluated by you to determine 
if the tools are actually able to do the merge as you want it performed.
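
To make the idea of an expanded file concrete, here is a minimal Python sketch 
of what swapping one chain record involves. The chainSwap utility in the kent 
source tree is the existing tool for this step; the sketch only illustrates 
the simple case where both strands are +. The function name swap_chain and the 
example record are hypothetical, and minus-strand chains are deliberately 
rejected rather than handled.

```python
def swap_chain(text):
    """Swap target and query of one chain record (+/+ strands only)."""
    lines = text.strip().splitlines()
    fields = lines[0].split()
    # Header: chain score tName tSize tStrand tStart tEnd
    #         qName qSize qStrand qStart qEnd id
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError("not a chain header line")
    score, target, query, chain_id = fields[1], fields[2:7], fields[7:12], fields[12]
    if target[2] != "+" or query[2] != "+":
        raise ValueError("this sketch only handles +/+ chains")
    out = [" ".join(["chain", score] + query + target + [chain_id])]
    for line in lines[1:]:
        cols = line.split()
        if not cols:
            continue  # skip blank lines inside the record
        if len(cols) == 3:
            size, dt, dq = cols
            out.append("\t".join([size, dq, dt]))  # swap the two gap columns
        else:
            out.append(cols[0])  # final block row carries only a size
    return "\n".join(out) + "\n"


# Made-up record for illustration; not taken from any real chain file.
example = "chain 1000 chrA 500 + 10 40 chrB 600 + 100 132 1\n10\t0\t2\n20\n"
swapped = swap_chain(example)

# An expanded file would hold the original followed by its swapped copy.
expanded = example + "\n" + swapped
print(expanded)
```

The swapped record keeps the original chain id, so ids would need to be 
renumbered if they must stay unique in the expanded file.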

Thanks,
Jennifer

------------------------------------------------ 
Jennifer Jackson 
UCSC Genome Bioinformatics Group 

----- "CHEN Jieqi Pauline" <[email protected]> wrote:

> From: "CHEN Jieqi Pauline" <[email protected]>
> To: "Jennifer Jackson" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, January 13, 2010 5:17:07 PM GMT -08:00 US/Canada Pacific
> Subject: RE: Self chain database block information
>
> Hi Jennifer,
> 
> Thank you for your quick response.
> 
> I have another query. Here are 2 symmetrical self-chaining regions and
> their respective block information:
> 
> chain 36972004 chr13 114142980 + 17918000 18348155 chr21 46944323 + 13369074 14275446 80
> .
> .
> 1421    0       2
> 2903    0       2
> 258     0       4
> 793     0       168
> 747     2       0
> 1666    1       0
> .
> .
> 
> chain 36971088 chr21 46944323 + 13369074 14275446 chr13 114142980 + 17918000 18348155 81
> .
> .
> 1421    2       0
> 2903    2       0
> 258     4       0
> 793     168     0
> 227     1       1
> 519     0       2
> 1666    0       1
> .
> .
> 
> Being symmetrical in this case, I was expecting a block and gap
> information that was identical. There is however now an additional gap
> for the second entry. If I were to flip the original first entry to
> generate a symmetrical entry, it would not match to the actual
> self-chain partner. Could you please explain why?
> 
> Thank you for the trouble.
> 
> Regards,
> Pauline
> 
> -----Original Message-----
> From: Jennifer Jackson [mailto:[email protected]]
> Sent: Thursday, January 14, 2010 6:45 AM
> To: CHEN Jieqi Pauline
> Cc: [email protected]
> Subject: Re: Self chain database block information
> 
> Hello Pauline,
> 
> The expectation is that the file is primarily symmetrical, but the
> reality of the processing details can result in some differences. Your
> example is one of these.
> 
> Here is an explanation from one of our Scientific developers that
> touches on some of the technical details of the processing and how
> they can influence the final result:
> 
> ----- start quote -----
> We don't actively suppress symmetrical mappings.  (Although for
> self-chain, we try to filter out trivial mappings of a region to
> itself.)
> 
> It's important to keep in mind that genome-scale alignment programs
> use heuristics -- otherwise they would take an impossibly long time to
> run.  They trade off completeness for fast runtime, and the best
> methods are fast but find *most* of the correct alignments.  They are
> supposed to be *mostly* symmetrical but many factors can affect the
> output: the way sequence is sliced up for cluster runs, how repeats
> are masked, whether a match will be seeded one way or the other
> (blastz uses a fancy 12-of-19 bitmask seed), and probably lots of
> other subtle things about the algorithm and parameters.
> -----    end quote -----
> 
> It would be possible for you to create a perfectly symmetrical file on
> your own. The most basic approach would be to flip query and target
> for all lines. This would produce duplications (which may or may not
> be a concern for you), or you could use the output as a starting place
> for making your own decisions about what rules to use to merge
> similar mappings that are not exactly identical.
> 
> Use these coordinate rules to do the conversion:
> http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
> 
> Also see the utilities in the kent source tree that manipulate chain
> files (named like *[Cc]hain*). Specifically, the utility chainSwap
> would be helpful.
> http://genomewiki.cse.ucsc.edu/index.php/Kent_source_utilities
> http://genome.ucsc.edu/FAQ/FAQdownloads#download27
> http://genomewiki.cse.ucsc.edu/index.php/The_source_tree
> 
> We hope that this helps to explain the data and provide you with some
> options to alter the data to meet your own analysis needs and/or input
> assumptions,
> Jennifer
> 
> ------------------------------------------------
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
> 
> ----- "CHEN Jieqi Pauline" <[email protected]> wrote:
> 
> > From: "CHEN Jieqi Pauline" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Cc: "[email protected]" <[email protected]>
> > Sent: Wednesday, January 13, 2010 1:22:29 AM GMT -08:00 US/Canada Pacific
> > Subject: Self chain database block information
> >
> > Hi,
> >
> > I'm working on a project using the self chain file that can be
> > downloaded from UCSC. I found an entry as follows:
> >
> > chain 10000 chr20 62435964 + 34101021 34101274 chr7 158821424 - 135607531 135607769 8297793
> >
> > I was expecting to find an entry in the opposite direction, since the
> > regions are homologous. That is, the target being the region on
> > chromosome 7 and the query on chromosome 20. However, there is no such
> > partner for it. Could you please explain to me why?
> >
> > I was expecting to find a partner, for example:
> >
> > chain 82378286 chr18 76117153 + 14115136 15365059 chr21 46944323 - 32615359 33653002 20
> > chain 82114239 chr21 46944323 + 13291321 14328964 chr18 76117153 - 60752094 62002017 21
> >
> > Thank you for the trouble.
> >
> > Regards,
> > Pauline
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
