Hi,

I'm doing a lot of Tanimoto similarity calculations on large datasets using
BulkTanimotoSimilarity.  It is an obvious candidate for parallelisation, so
I am using concurrent.futures to do so.  If I use ProcessPoolExecutor, I
get good speed-up but each process needs a copy of the fingerprint set and
for the sizes I'm dealing with that uses too much memory.  With
ThreadPoolExecutor I only need one copy of the fingerprints, but the GIL
means only one thread runs at a time, so there's no gain.  Would it be
possible to amend the C++ BulkTanimotoSimilarity to free the GIL whilst
it's doing the calculation, and recapture it afterwards?  I understand
things like numpy do this for some of their functions.  I'm happy to
attempt it myself if someone who knows about these things can confirm that
it is feasible and would help, and can give me a few pointers.
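For illustration, here is a minimal sketch of the thread-pool layout described above: every worker thread shares the single fingerprint list, and each task runs one query against all of it. It assumes fingerprints stored as Python ints used as bitsets, and `bulk_tanimoto` is a hypothetical pure-Python stand-in, so as written it still holds the GIL and will not run faster than a serial loop. The speed-up would only appear if the bulk call released the GIL, as proposed; in RDKit you would swap in `DataStructs.BulkTanimotoSimilarity(query, fps)`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints held as int bitsets: |A&B| / |A|B|."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    # Convention assumed here: two empty fingerprints score 0.0.
    return common / union if union else 0.0


def bulk_tanimoto(query: int, fps: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a bulk call: one query against every fingerprint."""
    return [tanimoto(query, fp) for fp in fps]


def parallel_bulk(queries: Sequence[int], fps: Sequence[int],
                  max_workers: int = 4) -> List[List[float]]:
    """Run one bulk call per query on a thread pool.

    All threads read the same `fps` object, so only one copy lives in memory.
    Results come back in the same order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda q: bulk_tanimoto(q, fps), queries))


if __name__ == "__main__":
    fps = [0b1011, 0b0110, 0b1111]
    queries = [0b1011, 0b0001]
    print(parallel_bulk(queries, fps, max_workers=2))
```

Threads avoid the per-process copy of `fps` that ProcessPoolExecutor needs. The trade-off is that concurrency depends entirely on the compiled similarity routine dropping the GIL while it loops over the fingerprints.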

Thanks,
Dave


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
