Dear James,

On Thu, Sep 16, 2010 at 8:01 PM, James Davidson <j.david...@vernalis.com> wrote:
>
> I have attached the python-script that I have at the moment (a) in case it
> is of some use to anybody else, (b) in the hope that I can improve my python
> and rdkit abilities through any suggested alterations (I'm sure there are
> many!), and (c) to form the basis of a couple of questions. At the moment,
> the script is just running through each compound; checking if the molecule
> is valid; and if so, noting how many components, and whether any of the
> atoms are outside of the desired list. These two results are then written
> out to a new SDF. I am then using this to make sure my data-set contains
> only compounds that I would say are 'reasonable' to build a melting-point
> model with. Now for the questions:
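The per-compound checks described above (is the molecule valid, how many components does it have, does it contain atoms outside an allowed list, write both results to a new SDF) can be sketched with RDKit roughly as follows. This is a minimal sketch, not the attached script: the allowed-element set, the helper names `annotate` and `filter_sdf`, and the SD property names `NumComponents` and `HasDisallowedAtom` are all hypothetical choices for illustration. Atoms are compared by atomic number, which sidesteps the aliphatic/aromatic issue raised in question 2.

```python
from rdkit import Chem

# Hypothetical allowed set: H, C, N, O, F, S, Cl, Br, I by atomic number.
ALLOWED_ATOMIC_NUMS = {1, 6, 7, 8, 9, 16, 17, 35, 53}


def annotate(mol):
    """Return (number of disconnected components, has a disallowed atom)."""
    # GetMolFrags returns one tuple of atom indices per disconnected fragment.
    n_components = len(Chem.GetMolFrags(mol))
    # GetAtomicNum() ignores aromaticity, so aromatic 'c' and 'n' count as C and N.
    has_disallowed = any(
        atom.GetAtomicNum() not in ALLOWED_ATOMIC_NUMS for atom in mol.GetAtoms()
    )
    return n_components, has_disallowed


def filter_sdf(in_path, out_path):
    """Annotate every parseable record of in_path and write it to out_path."""
    supplier = Chem.SDMolSupplier(in_path)
    writer = Chem.SDWriter(out_path)
    for mol in supplier:
        if mol is None:  # RDKit could not parse or sanitize this record
            continue
        n_components, has_disallowed = annotate(mol)
        # Stored as SD data fields on the written records (hypothetical names).
        mol.SetProp("NumComponents", str(n_components))
        mol.SetProp("HasDisallowedAtom", str(has_disallowed))
        writer.write(mol)
    writer.close()
```

For example, `annotate(Chem.MolFromSmiles('CC(=O)[O-].[Na+]'))` reports two components and a disallowed atom (sodium), which is the kind of record a salt-stripping step would need to handle before model building.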
Thanks for sending along the script. I haven't been through it yet, but I will try to find some time later for that.

> 1. In RDKit, has the 'cleaning / washing / salt-stripping' of molecules
> already been formalised based on a set of rules, etc?

Not that I'm aware of on the open-source side of things. I believe, though, that all of the functionality required to do this is present in the RDKit.

> 2. When identifying compounds that contain a non-allowed atom-type, why do
> I find the SMARTS def [!H;!C;!N;!O;!F;!S;!Cl;!Br;!I] gives unexpected
> results, but [!#1;!#6;!#7;!#8;!#9;!#16;!#17;!#35;!#53] works as I would
> expect?

This is a fairly common SMARTS "gotcha": in SMARTS the query "[C]" means "aliphatic C" (aromatic carbon is written "c"), so "[!C]" matches every aromatic carbon. This leads to the following behavior:

[3]>>> Chem.MolFromSmiles('c1ccccc1').GetSubstructMatches(Chem.MolFromSmarts('[!C]'))
Out[3] ((0,), (1,), (2,), (3,), (4,), (5,))

If you want to be sure that your SMARTS will capture both aliphatic and aromatic atoms, you need to provide the atomic numbers, as in your second query:

[4]>>> Chem.MolFromSmiles('c1ccccc1').GetSubstructMatches(Chem.MolFromSmarts('[!#6]'))
Out[4] ()

Best Regards,
-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
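To make the aliphatic/aromatic distinction from question 2 concrete, the sketch below runs a symbol-based and an atomic-number-based "disallowed atom" query side by side. The helper `disallowed_atoms` and the constant names are hypothetical. The `!H` term is deliberately left out of the symbol-based query: inside a bracket atom, `H` is generally read as a hydrogen-count primitive rather than as the element hydrogen (worth confirming against the SMARTS documentation), which is a second reason the atomic-number form is the safer choice.

```python
from rdkit import Chem

# Symbol-based query: upper-case symbols such as C and N match only ALIPHATIC atoms.
BY_SYMBOL = '[!C;!N;!O;!F;!S;!Cl;!Br;!I]'
# Atomic-number query: #6, #7, ... match the element whether aromatic or not.
BY_NUMBER = '[!#1;!#6;!#7;!#8;!#9;!#16;!#17;!#35;!#53]'


def disallowed_atoms(smiles, smarts):
    """Return the sorted indices of atoms in `smiles` matched by a one-atom SMARTS."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    # Each match of a single-atom query is a 1-tuple of atom indices.
    return sorted(idx for (idx,) in mol.GetSubstructMatches(query))


# Pyridine contains only allowed elements, but all of its atoms are aromatic.
print(disallowed_atoms('c1ccncc1', BY_SYMBOL))  # [0, 1, 2, 3, 4, 5]
print(disallowed_atoms('c1ccncc1', BY_NUMBER))  # []
# Trimethylphenylsilane: only the silicon (atom 1) is genuinely disallowed.
print(disallowed_atoms('C[Si](C)(C)c1ccccc1', BY_NUMBER))  # [1]
```

With the symbol-based query, the aromatic ring atoms are flagged alongside the real offender, which matches the "unexpected results" described in the question.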