In theory, if someone were so inclined, they could write something to properly
handle Fe-Cp sandwiches on top of the CDK. You'd need some standardisation
for cases where the Fe is drawn with 10 bonds or as counter ions... and
having a zero order bond would be useful. If you could do all that, then the
more complex interactions can be handled on top with an Sgroup-like
approach. In fact, right now canonicalization kind of already works:
C(C(=O)O)CCCCC(C(O)=O)(CC)CCCC1CCCC1.*[Fe]*.C1CCCC1
> |m:22:17.18.19.20.21,24:25.26.27.28.29|
> C1CCCC1.C(CCC1CCCC1)C(CC)(C(O)=O)CCCCCC(O)=O.*[Fe]*
> |m:27:0.1.2.3.4,29:8.9.10.11.12|
> C(C(C(=O)O)(CCCC1CCCC1)CCCCCC(O)=O)C.C1CCCC1.[Fe](*)*
> |m:28:8.9.10.11.12,29:22.23.24.25.26|
> C1CC(CC1)CCCC(CCCCCC(=O)O)(CC)C(O)=O.[Fe](*)*.C1CCCC1
> |m:23:0.1.2.3.4,24:25.26.27.28.29|
> O=C(O)C(CCCCCC(=O)O)(CCCC1CCCC1)CC.C1CCCC1.*[Fe]*
> |m:27:22.23.24.25.26,29:15.16.17.18.19|
> OC(C(CCCCCC(=O)O)(CC)CCCC1CCCC1)=O.C1CCCC1.*[Fe]*
> |m:27:22.23.24.25.26,29:16.17.18.19.20|
> C(C(CC)(CCCC1CCCC1)CCCCCC(O)=O)(O)=O.*[Fe]*.C1CCCC1
> |m:22:7.8.9.10.11,24:25.26.27.28.29|
> C(CCCCC(CCCC1CCCC1)(CC)C(O)=O)C(=O)O.[Fe](*)*.C1CCCC1
> |m:23:25.26.27.28.29,24:9.10.11.12.13|
> C(CCC(O)=O)CCC(CCCC1CCCC1)(CC)C(O)=O.*[Fe]*.C1CCCC1
> |m:22:12.13.14.15.16,24:25.26.27.28.29|
> C1CCCC1.C(=O)(O)C(CC)(CCCC1CCCC1)CCCCCC(=O)O.*[Fe]*
> |m:27:0.1.2.3.4,29:14.15.16.17.18|
These all canonicalize to the following. The numbers on the end vary because
of the symmetry of the '*' atoms, though it's quite easy to split ties on that.
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:20.21.22.23.24,2:25.26.27.28.29|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:25.26.27.28.29,2:20.21.22.23.24|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:25.26.27.28.29,2:20.21.22.23.24|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:25.26.27.28.29,2:20.21.22.23.24|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:25.26.27.28.29,2:20.21.22.23.24|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:25.26.27.28.29,2:20.21.22.23.24|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:20.21.22.23.24,2:25.26.27.28.29|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:25.26.27.28.29,2:20.21.22.23.24|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:20.21.22.23.24,2:25.26.27.28.29|
*[Fe]*.O=C(O)CCCCCC(C(=O)O)(CC)CCCC1CCCC1.C1CCCC1
> |m:0:20.21.22.23.24,2:25.26.27.28.29|
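
To illustrate the tie splitting, here is a minimal sketch in plain Python. It
is not CDK code: the function name split_multicenter_ties is made up for this
example, and it assumes every '*' atom named in the layer is symmetry-equivalent
(true for '*[Fe]*', not in general). It sorts the endpoint groups and hands
them to the '*' atoms in ascending order, so both variants above end up with
the same suffix.

```python
def split_multicenter_ties(cx_layer: str) -> str:
    """Reassign endpoint groups to the '*' atoms in sorted order.

    Hypothetical helper, not part of the CDK. Assumes all '*' atoms in
    the layer are symmetry-equivalent, as in '*[Fe]*'.
    """
    body = cx_layer.strip().strip("|")
    if not body.startswith("m:"):
        raise ValueError("expected a single 'm:' multicenter layer")

    entries = []
    for entry in body.split(","):
        fields = entry.split(":")
        # The first entry looks like 'm:<star>:<endpoints>', later ones
        # like '<star>:<endpoints>'; the last two fields are what we need.
        star, endpoints = fields[-2], fields[-1]
        group = sorted(int(i) for i in endpoints.split("."))
        entries.append((int(star), group))

    # Break the tie: lowest '*' index gets the lowest endpoint group.
    stars = sorted(star for star, _ in entries)
    groups = sorted(group for _, group in entries)

    parts = [
        f"{star}:{'.'.join(str(i) for i in group)}"
        for star, group in zip(stars, groups)
    ]
    return "|m:" + ",".join(parts) + "|"


if __name__ == "__main__":
    print(split_multicenter_ties("|m:0:25.26.27.28.29,2:20.21.22.23.24|"))
    print(split_multicenter_ties("|m:0:20.21.22.23.24,2:25.26.27.28.29|"))
```

Both calls print the same layer. If the '*' atoms were not equivalent (say,
attached to different metals), this reassignment would be wrong and the
canonical atom ranks would have to decide instead.
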
John
On 27 April 2017 at 23:34, gilleain torrance <gilleain.torra...@gmail.com>
wrote:
> Hi,
>
> Absolutely agree that we should optimise for the 99% of cases, but would
> be good to (try and) support the more complex ones.
>
> With regards to the implementation, Egon says:
>
>>
>> the core interface could restrict to a two entity interaction (atom-atom
>> bond) and an extending interface could generalize that (or the other way
>> around)
>
>
> so I think of things like Cp-Fe sandwiches or B-H-B bonds as hypergraphs.
> So as John points out, many standard algorithms most likely won't work on
> these data structures - as far as I know (which is hardly anything! when it
> comes to hypergraphs).
>
> Therefore, if we want - say - canonicalisation to work on these examples
> then I guess we need to have a valence-unrealistic version (B-H-B) and a
> 'real' version ({B, B, H}). If we model only the realistic one then we have
> to choose a simple graph to represent it for the algorithm. That would
> require special casing - for example, going from the hyper-edge {B, B, H}
> -> B-H-B and not B-B-H or even a triangle!
>
> Sorry, I should draw this as a diagram. However, one possible
> implementation could be to have the underlying simple graph, with
> hyperedges as a separate set on top?
>
> gilleain
>
> On Thu, Apr 27, 2017 at 10:18 AM, Egon Willighagen <
> egon.willigha...@gmail.com> wrote:
>
>>
>>
>> On Thu, Apr 27, 2017 at 11:04 AM, John Mayfield <
>> john.wilkinson...@gmail.com> wrote:
>>
>>> It's a chicken and egg problem... people don't use it, and even can't
>>>> use it, because there are no good tools yet; not making good tools also
>>>> ensures no one uses it.
>>>
>>>
>>> I somewhat agree but MDL/ChemDraw support it and it's still not used.
>>>
>>
>> Yes, you need a full stack... a solution from start to end.
>>
>>
>>> So, rather than trunking the CDK, I would suggest, let's make work of
>>>> convincing the cheminformatics community of these advanced features, just
>>>> like others are doing to the PTM-protein sequence mashups... let's be a
>>>> leader, rather than a follower.
>>>
>>>
>>> At the moment it makes the general case (with 99.999999% of uses) much
>>> slower for a case that isn't used (yet).
>>>
>>
>> Yes, a stack is needed here too... this is where an object design
>> actually should excel... the core interface could restrict to a two entity
>> interaction (atom-atom bond) and an extending interface could generalize
>> that (or the other way around)... that's why I started the interfaces...
>>
>>
>>> I think it should be possible to do the exotic but not at the cost of
>>> the regular. As a library I would push for getting the basics of the model
>>> right before moving on to extra bits, I quote Jurassic Park in jest :p:
>>>
>>> *Your scientists were so preoccupied with whether or not they could,
>>>> they didn’t stop to think if they should*
>>>
>>>
>> Well, the reaction mechanism and organometallics cheminformatics are two
>> examples why you should: the current methods are not precise enough.
>>
>> Now, you could argue, because there are no downstream solutions, why care
>> about that. But following this thinking we should stop the CDK right now;
>> we have OpenBabel for file format conversion and that's all the world will
>> ever need. (And, yes, this statement is strongly backed up with citation
>> statistics...)
>>
>> I am not so worried about our current feeling of what we should
>> (serendipity cannot be predicted; by definition), but when and where should
>> we do it. Sadly, the latter is nowadays 99.99999% determined by funding,
>> not so much longer term innovation.
>>
>> I have been tracking how people are using the CDK, tho I am about two
>> years behind with this... but this is at the level of packages, and not
>> down to the class or even method level, as your question would demand...
>> what would be really helpful is a Maven extension that would tell me the
>> following:
>>
>> - given some Java code using the CDK, tell me which packages, classes and
>> methods are used and how often
>>
>> Do you know something like that? If so, then we can run such an analysis
>> on the code bases using the CDK (Bioclipse, Scaffold Hunter, PaDEL, ...,
>> ...)...
>>
>> (Yes, I am aware that even this does not answer that data-dependent
>> question how often a more-than-two atom bond is used... I doubt any general
>> purpose tool does that... :( )
>>
>> OK, (in addition to the above point), maybe we should rephrase the
>> question: how many databases that CDK users will want to be able to use,
>> have more-than-two atom bonds?
>>
>> Egon
>>
>> --
>> E.L. Willighagen
>> Department of Bioinformatics - BiGCaT
>> Maastricht University (http://www.bigcat.unimaas.nl/)
>> Homepage: http://egonw.github.com/
>> LinkedIn: http://se.linkedin.com/in/egonw
>> Blog: http://chem-bla-ics.blogspot.com/
>> PubList: http://www.citeulike.org/user/egonw/tag/papers
>> ORCID: 0000-0001-7542-0286
>> ImpactStory: https://impactstory.org/u/egonwillighagen
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>
>>
>