Hi,

I'm doing a lot of Tanimoto similarity calculations on large datasets using BulkTanimotoSimilarity. It is an obvious candidate for parallelisation, so I am using concurrent.futures to do so. If I use ProcessPoolExecutor, I get a good speed-up, but each process needs its own copy of the fingerprint set, and for the sizes I'm dealing with that uses too much memory. With ThreadPoolExecutor I only need one copy of the fingerprints, but the GIL means only one thread runs at a time, so there's no gain.

Would it be possible to amend the C++ BulkTanimotoSimilarity to release the GIL whilst it's doing the calculation and reacquire it afterwards? I understand that libraries such as NumPy do this for some of their functions. I'm happy to attempt it myself if someone who knows about these things can confirm that it could be done and that it would help, and can provide a few pointers.
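For illustration, the threaded pattern described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual code: `bulk_tanimoto` is a hypothetical pure-Python stand-in for the bulk similarity call, fingerprints are modelled as plain Python ints used as bit sets, and `all_vs_all` and `max_workers=4` are names and values chosen only for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def bulk_tanimoto(query, fps):
    """Hypothetical stand-in for a bulk Tanimoto call.

    Fingerprints are modelled as Python ints used as bit sets.
    Returns one similarity per entry in `fps`.
    """
    q_bits = bin(query).count("1")
    sims = []
    for fp in fps:
        common = bin(query & fp).count("1")
        union = q_bits + bin(fp).count("1") - common
        # Define the similarity of two empty fingerprints as 0.0.
        sims.append(common / union if union else 0.0)
    return sims


def all_vs_all(fps, max_workers=4):
    """Compute one row of similarities per fingerprint using threads.

    Every worker reads the same `fps` list, so only one copy of the
    fingerprint set is held in memory.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so rows[i] belongs to fps[i].
        return list(pool.map(lambda q: bulk_tanimoto(q, fps), fps))


fps = [0b1011, 0b0011, 0b1100, 0b0000]
rows = all_vs_all(fps)
```

With RDKit, the call inside the worker would be `DataStructs.BulkTanimotoSimilarity(fps[i], fps)` instead of the stand-in. The memory use is as wanted, but as long as the call holds the GIL for its whole duration, the worker threads run one after another.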
Thanks,
Dave

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss