On Thu, Apr 27, 2017 at 11:04 AM, John Mayfield <john.wilkinson...@gmail.com> wrote:

>> It's a chicken and egg problem... people don't use it, and even can't use
>> it, because there are no good tools yet; not making good tools also ensures
>> no one uses it.
>
>
> I somewhat agree but MDL/ChemDraw support it and it's still not used.
>

Yes, you need a full stack... a solution from start to end.


>> So, rather than trunking the CDK, I would suggest, let's make work of
>> convincing the cheminformatics community of these advanced features, just
>> like others are doing to the PTM-protein sequence mashups... let's be a
>> leader, rather than a follower.
>
>
> At the moment it makes the general case (with 99.999999% of uses) much
> slower for a case that isn't used (yet).
>

Yes, a stack is needed here too... this is where an object design actually
should excel... the core interface could be restricted to a two-entity
interaction (an atom-atom bond) and an extending interface could generalize
that (or the other way around)... that's why I started the interfaces...
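
To make the split concrete, here is a minimal sketch of the "other way around" variant: a general bond interface with a two-atom specialization for the common case. All names (MultiAtomBond, TwoAtomBond, SimpleBond, ManyAtomBond) are hypothetical and atoms are plain String labels; this is not the CDK's actual bond API, only an illustration of the layering.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout; this is NOT the CDK's actual bond API.
// Atoms are plain String labels to keep the sketch self-contained.

/** General contract: a bond that connects two or more atoms. */
interface MultiAtomBond {
    List<String> atoms();
}

/** Restricted contract for the common case: exactly two atoms. */
interface TwoAtomBond extends MultiAtomBond {
    String begin();
    String end();

    // The general view is derived from the two direct accessors.
    @Override
    default List<String> atoms() {
        return Arrays.asList(begin(), end());
    }
}

/** Regular bond: two fields, no list allocated until atoms() is called. */
final class SimpleBond implements TwoAtomBond {
    private final String begin;
    private final String end;

    SimpleBond(String begin, String end) {
        this.begin = begin;
        this.end = end;
    }

    @Override public String begin() { return begin; }
    @Override public String end() { return end; }
}

/** Exotic bond with more than two atoms; only this class stores a list. */
final class ManyAtomBond implements MultiAtomBond {
    private final List<String> atoms;

    ManyAtomBond(String... atoms) {
        if (atoms.length < 2) {
            throw new IllegalArgumentException("a bond needs at least two atoms");
        }
        this.atoms = Collections.unmodifiableList(Arrays.asList(atoms.clone()));
    }

    @Override public List<String> atoms() { return atoms; }
}
```

Code that only handles regular bonds can program against TwoAtomBond and use begin() and end() directly, while code that needs the exotic cases accepts MultiAtomBond and iterates over atoms(). Whether this removes the slowdown for the general case would still have to be measured.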


> I think it should be possible to do the exotic but not at the cost of the
> regular. As a library I would push for getting the basics of the model
> right before moving on to extra bits, I quote Jurassic Park in jest :p:
>
>> *Your scientists were so preoccupied with whether or not they could, they
>> didn’t stop to think if they should*
>
>
Well, reaction mechanisms and organometallic cheminformatics are two
examples of why you should: the current methods are not precise enough.

Now, you could argue: because there are no downstream solutions, why care
about that? But following this thinking we should stop the CDK right now;
we have OpenBabel for file format conversion and that's all the world will
ever need. (And, yes, this statement is strongly backed up by citation
statistics...)

I am not so worried about our current feeling of what we should do
(serendipity cannot be predicted, by definition), but about when and where
we should do it. Sadly, the latter is nowadays 99.99999% determined by
funding, not so much by longer-term innovation.

I have been tracking how people are using the CDK, though I am about two
years behind with this... but this is at the level of packages, and not down
to the class or even method level, as your question would demand... what
would be really helpful is a Maven extension that would tell me the following:

- given some Java code using the CDK, tell me which packages, classes and
methods are used and how often
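
As a rough illustration of the kind of count I mean, here is a minimal sketch. It is not a Maven extension and not an existing tool: CdkUsageCounter is a hypothetical helper that scans Java source text for single-class imports under org.openscience.cdk and counts how often each imported simple name appears elsewhere in that source. It assumes plain imports only (no wildcard or static imports), also counts mentions in comments and strings, and does not reach the method level, which would need a real parser or bytecode analysis.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: crude textual count of imported CDK class names. */
final class CdkUsageCounter {

    // Matches e.g. "import org.openscience.cdk.some.pkg.SomeClass;"
    // group(1) = package name, group(2) = simple class name.
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+(org\\.openscience\\.cdk(?:\\.\\w+)*)\\.(\\w+)\\s*;",
        Pattern.MULTILINE);

    /** Maps each imported fully qualified CDK class to its usage count. */
    static Map<String, Integer> count(String source) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher imports = IMPORT.matcher(source);
        while (imports.find()) {
            String simpleName = imports.group(2);
            String qualifiedName = imports.group(1) + "." + simpleName;

            // Count whole-word occurrences of the simple name in the source.
            Matcher uses = Pattern
                .compile("\\b" + Pattern.quote(simpleName) + "\\b")
                .matcher(source);
            int occurrences = 0;
            while (uses.find()) {
                occurrences++;
            }
            // Subtract the occurrence inside the import statement itself.
            counts.put(qualifiedName, occurrences - 1);
        }
        return counts;
    }
}
```

Calling CdkUsageCounter.count(...) on the text of each source file and summing the maps would give per-class totals; grouping the keys by package prefix would give the package-level view.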

Do you know of something like that? If so, then we can run such an analysis
on the code bases using the CDK (Bioclipse, Scaffold Hunter, PaDEL, ...)...

(Yes, I am aware that even this does not answer the data-dependent
question of how often a more-than-two-atom bond is used... I doubt any
general purpose tool does that... :( )

OK (in addition to the above point), maybe we should rephrase the
question: how many of the databases that CDK users will want to use
contain more-than-two-atom bonds?

Egon

-- 
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen