Dear all, Has anyone experience with the postgrsql.conf and tuning its parameters for efficient sub structure search in collections of about 5-10 million compounds?
We have about 250GB of RAM and 40 CPUs at hand. Concurrency will be around 10 simultaneous connections/queries so any memory parameter that is additive regarding number of parallel queries should be 1/10 of its max. It`s mainly a question on how to share the available RAM among the various parameters. I found the recommendations of Greg to increase "shared_bufferes" and "work_mem" as well as turning off "synchronous_commit" and "full_page_writes" but I`d really want to make best use of what I have available. I played around a little with those values but I was hoping for someone who already has an optimized setup in terms of ratios between the different memory parameters. I tested the same queries as shown in the documentation unfortunately we have a different collection (about 4x times as large) and I find the following results (repeated queries give almost identical results in terms of execution time) >From the documentation: select count(*) from mols where m@>'c1cccc2c1nncc2'; count ------- 1916 (1 row) Time: 531.994 ms select count(*) from mols where m@>'c1ccnc2c1nccn2'; count ------- 1020 (1 row) Time: 419.250 ms select count(*) from mols where m@>'c1cccc2c1ncs2' ; select count(*) from mols where m@>'c1cccc2c1CNCCN2'; count ------- 3014 (1 row) Time: 5717.531 ms Just to see if possible: select count(*) from mols where m@>'c1ccccc1'; count --------- 3466765 (1 row) Time: 219125.331 ms select count(*) from mols where m@>'C'; count --------- 4999564 (1 row) Time: 185236.779 ms You can see, the last query returns the full collection in 3 minutes while benzene returns ¾ and takes about 4 minutes. Thank you very much, Alex Best regards / Mit freundlichen Grüßen / Sincères salutations Dr. Alexander Garvin Klenner-Bajaja Administrator Requirements Engineering-Solution Design | Dir. 2.8.3.3 European Patent Office Patentlaan 3-9 | 2288 EE Rijswijk | The Netherlands Tel. +31(0)70340-1991 [email protected]<mailto:[email protected]> www.epo.org<http://www.epo.org/> Please consider the environment before printing this email.
------------------------------------------------------------------------------
_______________________________________________ Rdkit-discuss mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

