Hi JP, On Wed, May 4, 2011 at 10:49 AM, JP <jeanpaul.ebe...@inhibox.com> wrote: > Hi there Adrian, > > Why should splitting the table vertically make a difference? > Am I not correct in thinking that would then require a join, which is > expensive (especially on 8M rows) ?
You would actually only be doing the join on the rows that match your SSS/similarity query. As long as you have some kind of primary key and build indices on that primary key the performance should be fine. This is the layout I almost always use: molecules in one table, bit vect fingerprints in a second, count-based fingerprints in a third. > My FS is local, and my indices are quite large > > db=# \di+ idx_ligand_rdkitmol; > List of relations > Schema | Name | Type | Owner | Table | Size | Description > --------+---------------------+-------+-------+--------+---------+------------- > public | idx_ligand_rdkitmol | index | jpebe | ligand | 5030 MB | > (1 row) > > db=# \di+ idx_ligand_morganbv; > List of relations > Schema | Name | Type | Owner | Table | Size | Description > --------+---------------------+-------+-------+--------+---------+------------- > public | idx_ligand_morganbv | index | jpebe | ligand | 1645 MB | > (1 row) yeah, those are huge. > But Greg is right (!) running the query a second time resulted in much > faster performance (11285.886ms as opposed to the original > 193973.253ms) > Of course if you change the smiles string, than nothing is cached and > it takes ages again... I actually wasn't proposing re-running the query to get the cached results, that's cheating. :-) The idea is that once you have the index in memory all queries will go faster. This is what the emolecules example on the wiki shows. It sounds like you've reached a database size where doing some real performance tuning on the database machine is going to be required. I don't have much experience with this, but there does seem to be a fair amount of information out there on the web. This page, in particular, looks helpful in explaining what the configuration parameters are and providing suggestions for tuning them: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server shared_buffers and effective_cache_size look particularly relevant. This kind of performance tuning information would be really useful to collect, so if you don't end up getting totally frustrated, please do share your findings. Best, -greg ------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss