Hi,

Absolutely agree that we should optimise for the 99% of cases, but it would
be good to (try and) support the more complex ones.

With regards to the implementation, Egon says:

>
>  the core interface could restrict to a two entity interaction (atom-atom
> bond) and an extending interface could generalize that (or the other way
> around)


So I think of things like Cp-Fe sandwiches or B-H-B bonds as hypergraphs.
As John points out, many standard algorithms most likely won't work on
these data structures, as far as I know (which is hardly anything when it
comes to hypergraphs!).

Therefore, if we want - say - canonicalisation to work on these examples
then I guess we need to have a valence-unrealistic version (B-H-B) and a
'real' version ({B, B, H}). If we model only the realistic one then we have
to choose a simple graph to represent it for the algorithm. That would
require special casing - for example, going from the hyper-edge {B, B, H}
-> B-H-B and not B-B-H or even a triangle!
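
To make the special casing concrete, here is a rough Java sketch of
expanding a hyper-edge into two-atom edges around one bridging atom. The
class name, the method name and the "first hydrogen is the bridge" rule
are all my own assumptions for illustration; none of this is CDK API, and
a rule for Cp-Fe would presumably pick the metal instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the CDK. */
final class HyperEdgeExpansion {
    /**
     * Expands a hyper-edge into two-atom edges that all share one bridge atom.
     * symbols[i] is the element symbol of atom i; members holds the atom
     * indices that belong to the hyper-edge.
     * Assumed rule: the first hydrogen is the bridge, else the first member.
     */
    static List<int[]> toStar(String[] symbols, int[] members) {
        int bridge = members[0];
        for (int m : members) {
            if (symbols[m].equals("H")) {
                bridge = m;
                break;
            }
        }
        // One simple edge from the bridge to every other member.
        List<int[]> edges = new ArrayList<>();
        for (int m : members) {
            if (m != bridge) {
                edges.add(new int[] {bridge, m});
            }
        }
        return edges;
    }
}
```

With atoms {B, B, H} at indices 0, 1, 2 this gives the edges H-B and H-B
(that is, B-H-B), and never B-B-H or the triangle.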

Sorry, I should draw this as a diagram. However, one possible
implementation could be to have the underlying simple graph, with
hyperedges as a separate set on top?
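
In lieu of the diagram, a minimal Java sketch of that layering, under my
own assumptions: the class and method names are hypothetical (not CDK
API), atoms are plain indices, and the hyperedges are just sets of atom
indices stored next to an ordinary adjacency list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: a simple graph with hyperedges as a separate set on top. */
final class LayeredMolGraph {
    private final List<String> symbols = new ArrayList<>();
    private final List<List<Integer>> adjacency = new ArrayList<>(); // simple graph
    private final List<Set<Integer>> hyperEdges = new ArrayList<>(); // overlay

    int addAtom(String symbol) {
        symbols.add(symbol);
        adjacency.add(new ArrayList<>());
        return symbols.size() - 1;
    }

    /** Adds an ordinary two-atom bond to the simple graph. */
    void addBond(int u, int v) {
        adjacency.get(u).add(v);
        adjacency.get(v).add(u);
    }

    /** Records a multi-centre bond without touching the simple graph. */
    void addHyperEdge(int... atoms) {
        Set<Integer> members = new TreeSet<>();
        for (int a : atoms) {
            members.add(a);
        }
        hyperEdges.add(members);
    }

    /** What a standard graph algorithm would see: simple edges only. */
    List<Integer> neighbors(int atom) {
        return adjacency.get(atom);
    }

    /** What a hypergraph-aware caller can ask for in addition. */
    List<Set<Integer>> hyperEdgesContaining(int atom) {
        List<Set<Integer>> hits = new ArrayList<>();
        for (Set<Integer> h : hyperEdges) {
            if (h.contains(atom)) {
                hits.add(h);
            }
        }
        return hits;
    }
}
```

For B-H-B you would add the bonds B-H and H-B to the simple graph and the
hyperedge {B, H, B} on top, so existing algorithms keep working on
neighbors() while the 'real' bonding is still recorded.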

gilleain

On Thu, Apr 27, 2017 at 10:18 AM, Egon Willighagen <
egon.willigha...@gmail.com> wrote:

>
>
> On Thu, Apr 27, 2017 at 11:04 AM, John Mayfield <
> john.wilkinson...@gmail.com> wrote:
>
>> It's a chicken and egg problem... people don't use it, and even can't use
>>> it, because there are no good tools yet; not making good tools also ensures
>>> no one uses it.
>>
>>
>> I somewhat agree but MDL/ChemDraw support it and it's still not used.
>>
>
> Yes, you need a full stack... a solution from start to end.
>
>
>> So, rather than trunking the CDK, I would suggest, let's make work of
>>> convincing the cheminformatics community of these advanced features, just
>>> like others are doing to the PTM-protein sequence mashups... let's be a
>>> leader, rather than a follower.
>>
>>
>> At the moment it makes the general case (with 99.999999% of uses) much
>> slower for a case that isn't used (yet).
>>
>
> Yes, a stack is needed here too... this is where an object design actually
> should excel... the core interface could restrict to a two entity
> interaction (atom-atom bond) and an extending interface could generalize
> that (or the other way around)... that's why I started the interfaces...
>
>
>> I think it should be possible to do the exotic but not at the cost of the
>> regular. As a library I would push for getting the basics of the model
>> right before moving on to extra bits, I quote Jurassic park in jest :p:
>>
>> *Your scientists were so preoccupied with whether or not they could, they
>>> didn’t stop to think if they should*
>>
>>
> Well, reaction mechanisms and organometallics cheminformatics are two
> examples of why you should: the current methods are not precise enough.
>
> Now, you could argue, because there are no downstream solutions, why care
> about that. But following this thinking we should stop the CDK right now;
> we have OpenBabel for file format conversion and that's all the world will
> ever need. (And, yes, this statement is strongly backed up with citation
> statistics...)
>
> I am not so much worried about our current feeling of what we should do
> (serendipity cannot be predicted, by definition) as about when and where
> we should do it. Sadly, the latter is nowadays 99.99999% determined by
> funding, not so much by longer-term innovation.
>
> I have been tracking how people are using the CDK, tho I am about two
> years behind with this... but this is at the level of packages, and not
> down to the class or even method level, as your question would demand...
> what would be really helpful is a Maven extension that would tell me the
> following:
>
> - given some Java code using the CDK, tell me which packages, classes and
> methods are used and how often
>
> Do you know something like that? If so, then we can run such an analysis
> on the code bases using the CDK (Bioclipse, Scaffold Hunter, PaDEL, ...,
> ...)...
>
> (Yes, I am aware that even this does not answer the data-dependent
> question of how often a more-than-two atom bond is used... I doubt any
> general purpose tool does that... :( )
>
> OK, (in addition to the above point), maybe we should rephrase the
> question: how many databases that CDK users will want to be able to use,
> have more-than-two atom bonds?
>
> Egon
>
> --
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> ORCID: 0000-0001-7542-0286
> ImpactStory: https://impactstory.org/u/egonwillighagen
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
>