Re: [Rdkit-discuss] Dividing inputstream over threads

Dmitri Maziuk via Rdkit-discuss Sun, 20 Jan 2019 12:17:48 -0800

On Sun, 20 Jan 2019 12:03:50 +0100
Shojiro Shibayama <notify.p...@gmail.com> wrote:


> ... I guess SQLalchemy
> in python might be good, but I'm not sure. Hope that you'll find out
> a good library of SQL OR mapper for python.

SQLAlchemy creates a fairly specific ecosystem that you have to buy
into for it to make sense. When you don't have objects, only a table
of properties, an OR mapper is just bloat.

With parallel processing, your bottleneck is going to be database
inserts. One option is to write out CSV file(s) from each thread/job,
concatenate them on the final node, and then bulk-import the result
into the database: CSV (or similar format) bulk import is typically
orders of magnitude faster than inserting rows one SQL statement at a
time.
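
To make that concrete, here is a rough sketch of the write/concatenate/
import flow. Everything named in it is made up for illustration: the
"props" table, its two columns, and the helper functions are
assumptions, not anything from RDKit or from the original question.
The stdlib sqlite3 module stands in for the real database, and its
executemany() in a single transaction is only a stand-in for a true
bulk loader; on a real server you would more likely use the database's
own facility (e.g. COPY in PostgreSQL or LOAD DATA INFILE in MySQL).

```python
import csv
import shutil
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_chunk(job_id, rows, out_dir):
    """Worker: write this job's rows to its own CSV file (no shared state)."""
    path = Path(out_dir) / f"part_{job_id:04d}.csv"
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return path


def concatenate(parts, merged_path):
    """Final step: append the per-job files into one CSV (no header rows)."""
    with open(merged_path, "w", newline="") as out:
        for part in sorted(parts):
            with open(part, newline="") as fh:
                shutil.copyfileobj(fh, out)
    return merged_path


def bulk_import(csv_path, conn):
    """Load the merged CSV in one transaction (hypothetical 'props' table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS props (name TEXT, value REAL)")
    with open(csv_path, newline="") as fh:
        with conn:  # one commit for the whole file, not one per row
            conn.executemany("INSERT INTO props VALUES (?, ?)", csv.reader(fh))


def run_pipeline(chunks, workdir):
    """Write one CSV per chunk in parallel, merge them, then import once."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        parts = list(
            pool.map(lambda item: write_chunk(item[0], item[1], workdir),
                     enumerate(chunks))
        )
    merged = concatenate(parts, Path(workdir) / "merged.csv")
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    bulk_import(merged, conn)
    return conn


if __name__ == "__main__":
    demo_chunks = [[("mol1", 1.0), ("mol2", 2.0)], [("mol3", 3.5)]]
    with tempfile.TemporaryDirectory() as tmp:
        db = run_pipeline(demo_chunks, tmp)
        print(db.execute("SELECT COUNT(*) FROM props").fetchone()[0])
```

The point of the layout is that the workers never touch the database:
each one owns its output file, so there is no lock contention until
the single import at the end.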

-- 
Dmitri Maziuk <dmaz...@bmrb.wisc.edu>


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
