Dear Christian,

2010/4/4 Christian de Bouillé <[email protected]>:
> Dear Greg
>
> Thank you for your email
>
> With now 6 descriptors for each structure
> over 11 million chemicals, my goal is to help the
> visitor to get a set of molecules with a good diversity.
>
> For instance if he finds 1000 structures from a query with a substructure
> the visitor can reduce to 96 molecules with a good diversity
> and he can buy the microplate.
>
> My idea is to use cluster over the descriptors
> but it is not evident with the scripts I wrote below
>
> Your "Overview PDF" uses Diversity Picking
> Could you advise how to use it ? with descriptors or fingerprints
> how to display the clusters? how to cut the cluster to find
> 96 molecules.

"Diverse" is not particularly well defined. If you are trying to pick
a set that looks chemically diverse to a chemist, I would suggest
using diversity picking based on a fingerprint like the Morgan
fingerprint or MACCS fingerprint provided by the RDKit. If you want a
set that is diverse in your descriptor space (but maybe doesn't look
as diverse to a chemist) then using your descriptors makes sense.

The RDKit has a couple of different ways of doing diversity picking;
they are available in the rdkit.SimDivFilters module. The one I would
recommend using is the MaxMinPicker:
http://www.rdkit.org/Python_Docs/rdkit.SimDivFilters.rdSimDivPickers.MaxMinPicker-class.html

Unfortunately there's not much sample code available other than the
testing code in $RDBASE/Code/SimDivPickers/Wrap:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/SimDivPickers/Wrap/testPickers.py?revision=997&view=markup

> Is Python/Django quicker to query than jsp for 11 million chemicals
>
> Could you display the chemicals with Python as well as Jchem ?

I think the 2D drawings from the RDKit are not bad; the ones from
JChem/Marvin are often better though.

> Voilà, some advice of yours would help me to start the work
>
> Best Regards
> Christian

-greg
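
For illustration, the max-min idea behind this kind of diversity picking can be sketched in plain Python. This is not the RDKit's MaxMinPicker implementation, only a minimal sketch of the same greedy strategy applied to descriptor vectors. The names `euclidean` and `maxmin_pick` are hypothetical, the random 6-value vectors stand in for real descriptors, and Euclidean distance is an assumption; for fingerprints one would use a fingerprint distance instead.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_pick(vectors, n_pick, first=0, dist=euclidean):
    """Greedy max-min selection (hypothetical helper, not an RDKit API).

    Starting from item `first`, repeatedly add the item whose distance
    to its nearest already-picked item is largest. Returns the picked
    indices in the order they were chosen.
    """
    n = len(vectors)
    if not 0 < n_pick <= n:
        raise ValueError("n_pick must be between 1 and len(vectors)")
    picked = [first]
    # min_dist[i]: distance from item i to its closest picked item;
    # picked items are marked with -1 so they are never chosen again.
    min_dist = [dist(v, vectors[first]) for v in vectors]
    min_dist[first] = -1.0
    while len(picked) < n_pick:
        nxt = max(range(n), key=min_dist.__getitem__)
        picked.append(nxt)
        min_dist[nxt] = -1.0
        # Only the newly picked item can shrink the nearest-pick distance.
        for i in range(n):
            if min_dist[i] >= 0:
                d = dist(vectors[i], vectors[nxt])
                if d < min_dist[i]:
                    min_dist[i] = d
    return picked


# Made-up data: 1000 query hits, 6 descriptors each, reduced to one plate.
rng = random.Random(0)
hits = [[rng.random() for _ in range(6)] for _ in range(1000)]
plate = maxmin_pick(hits, 96)
```

The sketch needs about n x n_pick distance evaluations, so reducing 1000 hits to 96 is cheap. Descriptors that live on very different scales should be normalized first, otherwise the largest-valued descriptor dominates the distance.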

