Dear all,

Has anyone experience with the postgrsql.conf and tuning its parameters for 
efficient sub structure search in collections of about 5-10 million compounds?

We have about 250GB of RAM and 40 CPUs at hand. Concurrency will be around 10 
simultaneous connections/queries so any memory parameter that is additive 
regarding number of parallel queries should be 1/10 of its max.  It`s mainly a 
question on how to share the available RAM among the various parameters.

I found the recommendations of Greg to increase "shared_bufferes" and 
"work_mem" as well as turning off "synchronous_commit" and "full_page_writes"  
but I`d really want to make best use of what I have available.

I played around a little with those values but I was hoping for someone who 
already has an optimized setup in terms of ratios between the different memory 
parameters.

I tested the same queries as shown in the documentation unfortunately we have a 
different collection (about 4x times as large) and I find the following results 
(repeated queries give almost identical results in terms of execution time)

>From the documentation:
select count(*) from mols where m@>'c1cccc2c1nncc2';
count
-------
  1916
(1 row)
Time: 531.994 ms

select count(*) from mols where m@>'c1ccnc2c1nccn2';
count
-------
  1020
(1 row)
Time: 419.250 ms

select count(*) from mols where m@>'c1cccc2c1ncs2' ;


select count(*) from mols where m@>'c1cccc2c1CNCCN2';
count
-------
  3014
(1 row)
Time: 5717.531 ms


Just to see if possible:
select count(*) from mols where m@>'c1ccccc1';
count
---------
3466765
(1 row)
Time: 219125.331 ms

select count(*) from mols where m@>'C';
count
---------
4999564
(1 row)

Time: 185236.779 ms


You can see, the last query returns the full collection in 3 minutes while 
benzene returns ¾ and takes about 4 minutes.

Thank you very much,

Alex


Best regards / Mit freundlichen Grüßen / Sincères salutations

Dr. Alexander Garvin Klenner-Bajaja
Administrator Requirements Engineering-Solution Design | Dir. 2.8.3.3
European Patent Office
Patentlaan 3-9 | 2288 EE Rijswijk | The Netherlands
Tel. +31(0)70340-1991
aklen...@epo.org<mailto:aklen...@epo.org>
www.epo.org<http://www.epo.org/>

Please consider the environment before printing this email.

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to