Sorry, in the previous message, 5 bonds out of 40 should be 10 bonds out of
60. The point is still valid, however.

Regards,

Leonid Chepelev

On Mon, Mar 8, 2010 at 1:29 PM, Leonid Chepelev
<leonid.chepe...@gmail.com>wrote:

> Thank you for answering, Gilleain.
>
> Hello, The term 'subgraph isomorphism' is (I think) usually used to mean
>> finding a subgraph of a graph G that is isomorphic to another graph H. What
>> you describe sounds like the more difficult problem of finding all common
>> subgraphs of G and H. So, I am not so sure that the javadocs are unclear in
>> this case.
>
>
> Hm, I couldn't really find an authoritative definition - and I am certain a
> number of people could argue about this, but from the literature I've seen,
> I always thought that two graphs may have a large number of isomorphisms,
> and it is the largest isomorphism that is selected by procedures like
> maximal common substructure searching, e.g. via the Ullman algorithm. When
> one starts with an atom/bond pair in G1 and G2 and attempts to iteratively
> extend the initial matched fragment, it would seem a trivial task to output
> at every step of the iterative extension the subgraph, or a part of it, that
> does not yet violate edge/node matching criteria set forth in the subgraph
> isomorphism searching algorithm. As far as I understood, these subgraphs are
> collected anyway for the purposes of comparison, unless superseded by an
> extension, and only the result of the longest extension-matching procedure
> is selected as the MCS. So, I erred and believed that the
> getSubgraphAtomsMaps would give me precisely that, a collection of all the
> fragments that match and their variations.
>
> There is the possibility that H may be found multiple times in G (a
>> polymer, for example, will have multiple copies of the monomer in it), which
>> is sumably why the return value of these methods are lists.
>
>
> Again, I thought that was for the MCS, but please, feel free to let me know
> if I was wrong. I just wanted to make certain that CDK does not currently do
> this and that I am not re-inventing the wheel here. And again, I suppose I
> just misinterpreted the javadoc, sorry.
>
>  As for how this could be done - well the simplest (and most expensive) way
>> would be to enumerate every (connected?) subgraph of H, testing each one to
>> see if it is also a subgraph of G. Gilleain Torrance
>
>
> That's one method that I have tried, however, in many cases, the cost of
> enumerating all the possible connected subgraphs of H is astronomical. For
> instance, if we cut five bonds of a molecule that contains 40 bonds, we will
> have about 7.5e10 fragments to enumerate. If we engage in growing fragments
> with all possible branches, that also often leads to combinatorial
> explosions. I have created an 'approximate solution', which is more
> efficient than what I am outlining here, but it's still an approximation.
> So, for all practical purposes, that is not a feasible approach. Well, it
> would have been easier to do it as I mentioned, that all fragments beyond a
> certain size would be reported in the generic collection at every iteration
> of the subgraph isomorphism detection tool. This has already been done and
> published numerous times, without the release of the source code. Oh well,
> if no one currently does this, I guess I'll go ahead and reinvent the wheel
> yet again.
>
> Any other opinions, anyone? I know you are there!
>
> Regards,
>
> Leonid Chepelev
>
>
------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user

Reply via email to