Dear Greg and Brian,
Many thanks for your responses. I had also been thinking along the lines of
your streaming approach! I think the RAM of most machines could cope with
lists of 100K mols, so we could put the threshold higher than 1000. Actually,
I was thinking of monitoring the available RAM and only processing the matrix
and clearing the list when less than 20% of RAM is left. That way, the
better-equipped machines could skip the clearing step and save time. What do
you think?
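
A minimal sketch of the RAM-threshold idea above, written generically so it
does not depend on RDKit. The names stream_in_batches, available_ram_fraction,
max_batch, min_free and check_every are hypothetical, and the default memory
probe assumes the third-party psutil package is installed (it is not part of
RDKit). The buffer is flushed either when it reaches max_batch items or when
the probe reports less than min_free (20%) of RAM available.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def available_ram_fraction() -> float:
    """Return the fraction of physical RAM that is currently available."""
    import psutil  # assumption: psutil is installed (third-party package)

    vm = psutil.virtual_memory()
    return vm.available / vm.total


def stream_in_batches(
    items: Iterable[T],
    process_batch: Callable[[List[T]], R],
    max_batch: int = 100_000,
    min_free: float = 0.20,
    free_fraction: Callable[[], float] = available_ram_fraction,
    check_every: int = 1000,
) -> Iterator[R]:
    """Buffer items; flush when the buffer is full or free RAM runs low."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        # Probing memory for every item would be wasteful, so the probe is
        # only called once every `check_every` buffered items.
        if len(batch) >= max_batch or (
            len(batch) % check_every == 0 and free_fraction() < min_free
        ):
            yield process_batch(batch)
            batch = []  # drop our reference so the buffered items can be freed
    if batch:
        # Process whatever is left once the input is exhausted.
        yield process_batch(batch)
```

With RDKit, items could be the molecules coming out of a
ForwardSDMolSupplier (skipping None entries), and process_batch could compute
the HasSubstructMatch rows against the list of queries. On a machine with
plenty of RAM the low-memory branch would simply never trigger, so only the
max_batch limit applies.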


Best,

Alexis





On 9 June 2017 at 14:40, Brian Kelley <fustiga...@gmail.com> wrote:

> While not multithreaded (yet), this is the use case for the filter catalog:
>
> http://rdkit.blogspot.com/2016/04/changes-in-201603-release-filtercatalog.html?m=1
>
> Look for the SmartsMatcher class in the blog.
>
> It is a good idea to make this multithreaded as well, I'll add this as a
> possible enhancement.
>
> ----
> Brian Kelley
>
> On Jun 9, 2017, at 7:04 AM, Greg Landrum <greg.land...@gmail.com> wrote:
>
> Hi Alexis,
>
> I would approach this by loading the 1000 queries into a list of molecules
> and then "stream" the others past that (so that you never attempt to load
> the full 500K set at once).
>
> Here's a quick sketch of one way to do this:
>
> In [4]: queries = [x for x in Chem.ForwardSDMolSupplier('mols.1000.sdf') if x is not None]
>
> In [5]: matches = []
>
> In [6]: for m in Chem.ForwardSDMolSupplier('./znp.50k.sdf'):
>    ...:     if m is None:
>    ...:         continue
>    ...:     matches.append([m.HasSubstructMatch(q) for q in queries])
>    ...:
>
>
>
> Brian has some thoughts on making this particular use case easier/faster
> (in particular by adding multi-threading support), so maybe there will be
> something in the next release there.
>
> I hope this helps,
> -greg
>
>
> On Sun, Jun 4, 2017 at 10:25 PM, Alexis Parenty <
> alexis.parenty.h...@gmail.com> wrote:
>
>> Dear RDKit community,
>>
>> I need to screen for substructure relationships between two sets of
>> structures (1 000 x 500 000). I thought I should build two lists of mol
>> objects from SMILES, but I keep getting a memory error when the second
>> list reaches 300 000 mols. All my RAM (12G) gets consumed, along with all
>> my virtual memory.
>>
>> Do I really have to compromise on speed and make mol objects on the fly
>> from two lists of SMILES? Is there another memory-efficient way to store
>> mol objects?
>>
>> Best,
>>
>> Alexis
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>