There was a presentation at a recent Cambridge Cheminformatics meeting on
using Elastic for similarity searches, and I think the presenter was also
considering extending it to substructure matching. But no code is
available, as far as I can tell.

https://github.com/MysterionRise

On Thu, Jan 21, 2021 at 10:44 AM Joos Kiener <joos.kie...@gmail.com> wrote:

> Hi Naomi,
>
> I once played around a bit with this idea using the Lucene-based RDKit
> example as guidance. However, what that code does inside Lucene, and hence
> what my "adaptation" does inside Elasticsearch, is only the fingerprint
> screening part. For the actual subgraph match, the candidates have to be
> sent to the caller/client. That step doesn't run inside Elasticsearch, so
> one must manipulate the Elasticsearch results (hit count, paging, ...)
> before finally returning them to the end-user application. Simply put, it
> is a very hacky and not very usable solution.
>
> Even ignoring that part, it wasn't very fast either. That could be due to
> many things, like having only one machine for ES (my machine, no cluster)
> and not being an expert in ES anyway (suboptimal config?). Or maybe the
> dataset was too small to actually benefit. The same query on the same data
> is much faster in PostgreSQL + RDKit + full-text index, and easier to use.
> (Yes, PostgreSQL supports full-text search similar to Elasticsearch. If one
> doesn't need very advanced features and doesn't have a lot of data, it is
> for sure worth a look.)
>
> Any "real solution" must also do the subgraph matching inside Elasticsearch
> itself, which means writing a plugin/extension for Elasticsearch. This was
> simply too involved for me to even try. (If that is of interest, you should
> probably also look at the very recent licensing changes to Elasticsearch.)
>
> The presentation Joshua mentioned is actually only about similarity search,
> which naturally is easier to implement and to make fast.
>
> Having said that, there is a commercial solution available from
> PerkinElmer in their Signals Data Factory offering. Of course this has
> nothing to do with RDKit, but it does hint that it's possible to do this if
> you have the time, budget and skills/knowledge.
>
> Another commercial "fast substructure search" option would be NextMove's
> Arthor, but that has nothing to do with Elasticsearch. The question is
> whether you want Elasticsearch for the speed or for the combination with
> text search. I would probably avoid it if the text search part is not
> important.
>
> Just using RDKit's default functionality is actually pretty fast (see
> Greg's blog), though it does run in memory. Nowadays a machine with lots of
> RAM doesn't cost all that much, so I could see that scaling to 10-20
> million structures easily.
>
> Hope that helps you a bit in coming to a conclusion on what to do.
>
> Best Regards,
>
> Joos
>
>
> ---------- Forwarded message ----------
>> From: Naomi Jacobs <na...@benchling.com>
>> To: rdkit-discuss@lists.sourceforge.net
>> Cc: Alan Pierce <a...@benchling.com>, Larry Taylor <la...@benchling.com>
>> Bcc:
>> Date: Wed, 20 Jan 2021 22:27:32 -0800
>> Subject: [Rdkit-discuss] RDKit ElasticSearch Plugin
>> Hi all,
>>
>> We're looking for information about whether anyone has built an
>> ElasticSearch plugin using RDKit to support chemical search. I didn't see
>> anything open-source online, but was thinking some folks may have heard
>> about internal efforts and would be willing to share any code and/or chat
>> about it. Thanks!
>>
>> Cheers,
>> Naomi
>>
>> --
>> *Naomi Jacobs*
>> Software Engineer | benchling.com
>> (415) 590-2798
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Greg Landrum <greg.land...@gmail.com>
>> To: Naomi Jacobs <na...@benchling.com>
>> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>, Larry Taylor <
>> la...@benchling.com>
>> Bcc:
>> Date: Thu, 21 Jan 2021 08:54:08 +0100
>> Subject: Re: [Rdkit-discuss] RDKit ElasticSearch Plugin
>> Hi Naomi,
>>
>> I'm not personally aware of any ElasticSearch work, but there is a
>> prototype for a Lucene plugin which could, I believe, be used as the basis
>> for an ElasticSearch plugin:
>> https://github.com/rdkit/org.rdkit.lucene
>>
>> It's (obviously) been a while since anyone did anything with that code
>> and it may no longer work, but the more recent (and still functional)
>> RDKit-neo4j integration (https://github.com/rdkit/neo4j-rdkit) can
>> provide some patterns for how the RDKit Java integration can be used in
>> this type of context.
>>
>> I hope this helps, and would be interested to hear if you end up doing
>> anything with the RDKit and ElasticSearch.
>> -greg
>>
>>
>> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
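
To make concrete the point quoted above (only the fingerprint screen runs
in the index, so hit count and paging must be fixed up on the client), here
is a minimal, library-free sketch. Everything in it is an assumption made
for illustration: fingerprints are modelled as plain Python ints, and the
names screen, search_page and verify are hypothetical, with verify standing
in for a real subgraph match such as one done with RDKit on the client.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical document shape: (document id, fingerprint bits packed in an int).
Candidate = Tuple[str, int]


def screen(query_fp: int, docs: Iterable[Candidate]) -> Iterator[str]:
    """Fingerprint screen: keep docs whose fingerprint has every query bit set.

    This is the only step the index can do. It may return false positives,
    but it never drops a true match.
    """
    for doc_id, fp in docs:
        if (query_fp & fp) == query_fp:
            yield doc_id


def search_page(
    query_fp: int,
    docs: Iterable[Candidate],
    verify: Callable[[str], bool],
    page: int,
    page_size: int,
) -> Tuple[int, List[str]]:
    """Verify screened candidates client-side, then page over confirmed hits.

    The hit count and page boundaries are computed after verification,
    because the counts from the screen include false positives.
    """
    confirmed = [doc_id for doc_id in screen(query_fp, docs) if verify(doc_id)]
    start = page * page_size
    return len(confirmed), confirmed[start:start + page_size]


# Toy data: mol-b passes the screen but is not a real match (false positive).
docs = [("mol-a", 0b1110), ("mol-b", 0b0110), ("mol-c", 0b1011), ("mol-d", 0b0111)]
true_hits = {"mol-a", "mol-d"}  # stand-in for the real subgraph match

total, first_page = search_page(0b0110, docs, true_hits.__contains__, page=0, page_size=1)
print(total, first_page)  # 2 ['mol-a']
```

The screen alone reports three candidates here while only two are real
hits, which is why totals and pages taken straight from the index would be
wrong. This sketch verifies every candidate before paging. A real client
would more likely verify lazily and stop once the requested page is full,
at the cost of not knowing the exact total.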


-- 
Rajarshi Guha | http://blog.rguha.net | @rguha <https://twitter.com/rguha>
