There was a presentation at a recent Cambridge Cheminformatics meeting on using Elastic for similarity searches, and I think the presenter was also considering extending it to substructure matching. But no code is available as far as I can tell.
https://github.com/MysterionRise

On Thu, Jan 21, 2021 at 10:44 AM Joos Kiener <joos.kie...@gmail.com> wrote:

> Hi Naomi,
>
> I once played around a bit with this idea using the Lucene-based RDKit
> example as guidance. However, what that code does inside Lucene, and hence
> my "adaptation" inside Elasticsearch, is only the fingerprint screening part.
> For the actual subgraph match the data then has to be sent to the
> caller/client; it doesn't run inside Elasticsearch, which means one must
> manipulate the Elasticsearch results (hit count, paging, ...) before
> finally returning them to the end user application. Simply put, a very
> hacky and not very usable solution.
>
> Even ignoring that part, it wasn't very fast either. That could be due to
> many things, like only having 1 machine for ES (my machine, no cluster) and
> not being an expert in ES anyway (suboptimal config?). Or maybe the dataset
> was too small to actually benefit. Same data, same query is much faster in
> PostgreSQL + RDKit + full-text index and easier to use. (Yes, PostgreSQL
> supports full-text search similar to Elastic. If one doesn't need very
> advanced features or have a lot of data, it is for sure worth a look.)
>
> Any "real solution" must also do the subgraph matching inside Elastic
> itself, which means writing a plugin / extension for Elasticsearch. This was
> simply too involved for me to even try. (If that is of interest, you should
> probably also look at the very recent licensing changes to Elasticsearch.)
>
> The presentation Joshua mentioned is actually only about similarity search,
> which naturally is easier to implement and fast.
>
> Having said that, there is a commercial solution available from
> PerkinElmer in their Signals Data Factory offering. Of course this has
> nothing to do with RDKit, but it does hint that it's possible to do this if
> you have the time, budget and skills/knowledge.
>
> Another commercial "fast substructure search" option would be NextMove's
> Arthor, but that has nothing to do with Elasticsearch. The question is
> whether you want Elasticsearch for the speed or for the combination with
> text search. I would probably avoid it if the text search part is not
> important.
>
> Just using RDKit default functionality is actually pretty fast (see
> Greg's blog), though it does run in memory. Nowadays a machine with lots of
> RAM doesn't cost all that much, so I could see that scaling to 10-20 million
> structures easily.
>
> Hope that helps you a bit to come to a conclusion on what to do.
>
> Best Regards,
>
> Joos
>
>
> ---------- Forwarded message ----------
>> From: Naomi Jacobs <na...@benchling.com>
>> To: rdkit-discuss@lists.sourceforge.net
>> Cc: Alan Pierce <a...@benchling.com>, Larry Taylor <la...@benchling.com>
>> Bcc:
>> Date: Wed, 20 Jan 2021 22:27:32 -0800
>> Subject: [Rdkit-discuss] RDKit ElasticSearch Plugin
>> Hi all,
>>
>> We're looking for information about whether anyone has built an
>> ElasticSearch plugin using RDKit to support chemical search. I didn't see
>> anything open-source online, but was thinking some folks may have heard
>> about internal efforts and would be willing to share any code and/or chat
>> about it. Thanks!
>>
>> Cheers,
>> Naomi
>>
>> --
>> *Naomi Jacobs*
>> Software Engineer | benchling.com
>> (415) 590-2798
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Greg Landrum <greg.land...@gmail.com>
>> To: Naomi Jacobs <na...@benchling.com>
>> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>, Larry Taylor <
>> la...@benchling.com>
>> Bcc:
>> Date: Thu, 21 Jan 2021 08:54:08 +0100
>> Subject: Re: [Rdkit-discuss] RDKit ElasticSearch Plugin
>> Hi Naomi,
>>
>> I'm not personally aware of any ElasticSearch work, but there is a
>> prototype for a Lucene plugin which could, I believe, be used as the basis
>> for an ElasticSearch plugin:
>> https://github.com/rdkit/org.rdkit.lucene
>>
>> It's (obviously) been a while since anyone did anything with that code
>> and it may no longer work, but the more recent (and still functional)
>> RDKit-neo4j integration (https://github.com/rdkit/neo4j-rdkit) can
>> provide some patterns for how the RDKit Java integration can be used in
>> this type of context.
>>
>> I hope this helps, and would be interested to hear if you end up doing
>> anything with the RDKit and ElasticSearch.
>> -greg
>>
>>
>> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

--
Rajarshi Guha | http://blog.rguha.net | @rguha <https://twitter.com/rguha>
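
For readers following the thread, here is a rough sketch of the two-stage pattern Joos describes: a fingerprint screen that could run inside the index, followed by an exact match on the client, after which hit count and paging have to be recomputed. It is plain Python with no Elasticsearch or RDKit calls. The names `Record`, `screen`, `search` and `toy_verify`, the 3-bit fingerprints and the hard-coded set of true matches are all made up for illustration; in a real setup the verify step would presumably be an RDKit substructure match.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    doc_id: str
    fingerprint: int  # bit vector packed into an int (hypothetical encoding)


def screen(query_fp: int, records: Iterable[Record]) -> List[Record]:
    """Stage 1 (what the index can do): keep records whose fingerprint
    has every bit of the query set. This can return false positives."""
    return [r for r in records if (r.fingerprint & query_fp) == query_fp]


def search(
    query_fp: int,
    records: Iterable[Record],
    verify: Callable[[Record], bool],
    page_size: int,
    page: int = 0,
) -> Tuple[int, int, List[str]]:
    """Stage 2 (client side): run the expensive exact check on the
    candidates, then recompute the hit count and the page from the
    verified hits rather than from the raw index response."""
    candidates = screen(query_fp, records)
    hits = [r.doc_id for r in candidates if verify(r)]
    start = page * page_size
    return len(candidates), len(hits), hits[start:start + page_size]


# Toy data: m2 passes the bit screen but is not a true match,
# m4 is rejected by the screen already.
RECORDS = [
    Record("m1", 0b111),
    Record("m2", 0b011),
    Record("m3", 0b011),
    Record("m4", 0b100),
]

# Stand-in for the real subgraph matcher (an assumption for this sketch):
# the set of documents that truly contain the query substructure.
TRUE_MATCHES = {"m1", "m3"}


def toy_verify(record: Record) -> bool:
    return record.doc_id in TRUE_MATCHES


if __name__ == "__main__":
    n_candidates, n_hits, page_ids = search(
        0b011, RECORDS, toy_verify, page_size=1, page=1
    )
    print(n_candidates, n_hits, page_ids)
```

The gap between the candidate count and the verified hit count is the reason the index's own totals and paging cannot be passed through to the end user unchanged.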