Re: [Rdkit-discuss] Cleaning SD files

2010-09-17 Thread markus kossner
Paul Emsley wrote:
> On 17/09/10 07:57, markus kossner wrote:
>   
>> some time ago I implemented a filter function during a PDB mining
>> campaign. The idea was to exclude compounds too far away from drug-like
>> chemical matter.
>>
>> 
>
> As a matter of interest, how did you convert from a pdb file that might 
> contain a ligand to an RDKit mol?  Look for residues with HETATMs? In 
> making the RDKit mol, how did you know the bonds and the bond orders 
> (look them up in the Chemical Component Library, perhaps)? (non-trivial 
> AFAICS).
>
> Paul.
Ehm,
I think Greg already gave the right answer a few days ago when he
wrote that he would not even dare to try this.
Neither did I ...
I used the entries in the sc-PDB database. There you can download the
protein in PDB format and the corresponding ligand as mol2.
This makes things a lot easier with small-molecule PDB ligands ...
You can then handle the protein with Biopython and do the small-molecule
stuff using RDKit.
Markus
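
For readers wondering about the first half of Paul's question (finding
ligand candidates via HETATM records), here is a minimal sketch using
only the Python standard library. It assumes standard fixed-width PDB
columns; the function name hetero_residues and the solvent list are
made up for illustration. It only tells you which hetero residues are
present. It does not recover bonds or bond orders, which is the
non-trivial part and the reason for taking the ligand from a mol2 file
instead.

```python
from collections import defaultdict

# Residue names treated as solvent here; an assumption, extend as needed.
SOLVENT_NAMES = {"HOH", "WAT", "DOD"}


def hetero_residues(pdb_text):
    """Group HETATM records by (residue name, chain ID, residue number).

    Returns a dict mapping each residue key to the list of element symbols
    (or atom names when the element columns are empty) of its atoms.
    """
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()   # columns 18-20
        if res_name in SOLVENT_NAMES:
            continue
        chain_id = line[21]              # column 22
        res_seq = int(line[22:26])       # columns 23-26
        element = line[76:78].strip()    # columns 77-78, may be blank
        atom_name = line[12:16].strip()  # columns 13-16
        residues[(res_name, chain_id, res_seq)].append(element or atom_name)
    return dict(residues)
```

Each key in the returned dict is one candidate ligand (or ion, cofactor,
modified residue), so some further filtering by size or residue name is
usually still needed.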




--
Start uncovering the many advantages of virtual appliances
and start using them to simplify application deployment and
accelerate your shift to cloud computing.
http://p.sf.net/sfu/novell-sfdev2dev
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Cleaning SD files

2010-09-17 Thread Paul Emsley
On 17/09/10 07:57, markus kossner wrote:
>
> some time ago I implemented a filter function during a PDB mining
> campaign. The idea was to exclude compounds too far away from drug-like
> chemical matter.
>

As a matter of interest, how did you convert from a pdb file that might 
contain a ligand to an RDKit mol?  Look for residues with HETATMs? In 
making the RDKit mol, how did you know the bonds and the bond orders 
(look them up in the Chemical Component Library, perhaps)? (non-trivial 
AFAICS).

Paul.




Re: [Rdkit-discuss] Cleaning SD files

2010-09-16 Thread markus kossner




Hi James,
some time ago I implemented a filter function during a PDB mining
campaign. The idea was to exclude compounds too far away from drug-like
chemical matter.
The function I wrote is not the prettiest one, but it is somewhat
complementary to your substructure matching. It uses typical
descriptors you might want to have in a filter routine, so maybe it
can give some inspiration. I added the import statement and some print
statements to make cut-and-paste experiments easier:

Snip
from rdkit.Chem import Descriptors, Crippen, Lipinski
# For more descriptors try Descriptors.descList
# (AvailDescriptors.descList in older releases)
def DLfilter(mol):
    # Return True if mol passes every check, False otherwise
    try:
        mw = Descriptors.MolWt(mol)
        if mw <= 100 or mw >= 800:
            print('Failed in filter function MolWt:', mw)
            return False
        donors = Lipinski.NumHDonors(mol)
        if donors >= 5:
            print('Failed in filter function NumHDonors:', donors)
            return False
        acceptors = Lipinski.NumHAcceptors(mol)
        if acceptors >= 10:
            print('Failed in filter function NumHAcceptors:', acceptors)
            return False
        logp = Crippen.MolLogP(mol)
        if logp <= -4 or logp >= 6:
            print('Failed in filter function LogP:', logp)
            return False
        rotors = Lipinski.NumRotatableBonds(mol)
        if rotors >= 15:
            print('Failed in filter function NumRotatableBonds:', rotors)
            return False
        return True
    except Exception:
        # Any descriptor failure (e.g. mol is None) counts as a rejection
        print("D'oh")
        return False
Snip
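
A possible variation, sketched here without RDKit so it can be tried
anywhere: keep the thresholds in one table and report every failed
check instead of stopping at the first one. The names LIMITS and
failed_checks are hypothetical; the numbers are the same cut-offs as in
the function above, and the descriptor values are assumed to have been
computed beforehand (for example with the same Descriptors, Lipinski
and Crippen calls).

```python
# Failure bounds taken from the filter above: a value <= low or >= high
# fails. None means that side is unbounded.
LIMITS = {
    "MolWt": (100, 800),
    "NumHDonors": (None, 5),
    "NumHAcceptors": (None, 10),
    "MolLogP": (-4, 6),
    "NumRotatableBonds": (None, 15),
}


def failed_checks(values, limits=LIMITS):
    """Return the names of descriptors whose values fall outside the limits.

    values: dict mapping descriptor name -> precomputed number.
    An empty list means the compound passes.
    """
    failed = []
    for name, (low, high) in limits.items():
        value = values[name]
        too_low = low is not None and value <= low
        too_high = high is not None and value >= high
        if too_low or too_high:
            failed.append(name)
    return failed
```

Keeping the limits in a dict makes it easy to swap in a different
threshold set without touching the loop.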

Kind regards,
Markus






--

Re: [Rdkit-discuss] Cleaning SD files

2010-09-16 Thread James Davidson
Hi Greg,

Thanks for the reply.

> > 1.  In RDKit, has the 'cleaning / washing / salt-stripping' of 
> > molecules already been formalised based on a set of rules, etc?
> 
> Not that I'm aware of on the open-source side of things. All 
> of the functionality required to do this is, I believe, 
> present in the RDKit though.

Great - I certainly found all the functionality I was looking for, but just 
wanted to make sure I wasn't missing any short-cuts!

> > 2.  When identifying compounds that contain a non-allowed atom-type,
> > why do I find the SMARTS def [!H;!C;!N;!O;!F;!S;!Cl;!Br;!I] gives
> > unexpected results, but [!#1;!#6;!#7;!#8;!#9;!#16;!#17;!#35;!#53]
> > works as I would expect?
> 
> This is a fairly common SMARTS "gotcha": in SMARTS the query "[C]"
> means "aliphatic C". This leads to the following behavior:
> [3]>>> Chem.MolFromSmiles('c1ccccc1').GetSubstructMatches(Chem.MolFromSmarts('[!C]'))
> Out[3] ((0,), (1,), (2,), (3,), (4,), (5,))
> If you want to be sure that your SMARTS will capture aliphatic or
> aromatic atoms, you need to provide the atomic numbers, as in your
> second query:
> [4]>>> Chem.MolFromSmiles('c1ccccc1').GetSubstructMatches(Chem.MolFromSmarts('[!#6]'))
> Out[4] ()

Wow - I really was having some sort of mental block yesterday! (goes-off to 
look for some sort of embarrassed + dunce smiley...)

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the "Company address and 
registration details" link at the bottom of the page..
__



Re: [Rdkit-discuss] Cleaning SD files

2010-09-16 Thread Greg Landrum
Dear James,

On Thu, Sep 16, 2010 at 8:01 PM, James Davidson wrote:
>
> I have attached the python-script that I have at the moment (a) in case it
> is of some use to anybody else, (b) in the hope that I can improve my python
> and rdkit abilities through any suggested alterations (I'm sure there are
> many!), and (c) to form the basis of a couple of questions.  At the moment,
> the script is just running through each compound; checking if the molecule
> is valid; and if so, noting how many components, and whether any of the
> atoms are outside of the desired list.  These two results are then written
> out to a new SDF.  I am then using this to make sure my data-set contains
> only compounds that I would say are 'reasonable' to build a melting-point
> model with.  Now for the questions:

Thanks for sending along the script. I haven't been through it yet but
I will try and find some time later for that.

> 1.  In RDKit, has the 'cleaning / washing / salt-stripping' of molecules
> already been formalised based on a set of rules, etc?

Not that I'm aware of on the open-source side of things. All of the
functionality required to do this is, I believe, present in the RDKit
though.

> 2.  When identifying compounds that contain a non-allowed atom-type, why do
> I find the SMARTS def [!H;!C;!N;!O;!F;!S;!Cl;!Br;!I] gives unexpected
> results, but [!#1;!#6;!#7;!#8;!#9;!#16;!#17;!#35;!#53] works as I would
> expect?

This is a fairly common SMARTS "gotcha": in SMARTS the query "[C]"
means "aliphatic C". This leads to the following behavior:
[3]>>> Chem.MolFromSmiles('c1ccccc1').GetSubstructMatches(Chem.MolFromSmarts('[!C]'))
Out[3] ((0,), (1,), (2,), (3,), (4,), (5,))
If you want to be sure that your SMARTS will capture aliphatic or
aromatic atoms, you need to provide the atomic numbers, as in your
second query:
[4]>>> Chem.MolFromSmiles('c1ccccc1').GetSubstructMatches(Chem.MolFromSmarts('[!#6]'))
Out[4] ()
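
To avoid typing the atomic numbers by hand, the query can be generated
from a list of element symbols. The following is only a sketch: the
helper name disallowed_atom_smarts and the small lookup table are
made up for illustration and cover just the elements in James's list.

```python
# Atomic numbers for the elements in the allowed list from the question.
ATOMIC_NUMBERS = {
    "H": 1, "C": 6, "N": 7, "O": 8, "F": 9,
    "S": 16, "Cl": 17, "Br": 35, "I": 53,
}


def disallowed_atom_smarts(allowed_symbols):
    """Build a SMARTS atom query matching any atom NOT in allowed_symbols.

    Uses atomic-number primitives (#n), which match both aliphatic and
    aromatic atoms of that element.
    """
    terms = ["!#%d" % ATOMIC_NUMBERS[symbol] for symbol in allowed_symbols]
    return "[" + ";".join(terms) + "]"
```

The returned string can then be handed to Chem.MolFromSmarts in the
same way as the '[!#6]' query above.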

Best Regards,
-greg



[Rdkit-discuss] Cleaning SD files

2010-09-16 Thread James Davidson
Dear All,
 
Today I have spent some time processing a freely-available SDF that
contains many compounds and melting-points / ranges (
http://www.mdpi.org/molmall/mdpi1-51sd.zip).  The reason for doing this
is that I wanted to implement a melting-point predictor following the
work of Andreas Bender (J. Chem. Inf. Model. 2005, 45, 581-590) and more
recently Reifeng Liu at AZ (J. Chem. Inf. Model. 2008, 48, 981-987).
 
I have attached the python-script that I have at the moment (a) in case
it is of some use to anybody else, (b) in the hope that I can improve my
python and rdkit abilities through any suggested alterations (I'm sure
there are many!), and (c) to form the basis of a couple of questions.
At the moment, the script is just running through each compound;
checking if the molecule is valid; and if so, noting how many
components, and whether any of the atoms are outside of the desired
list.  These two results are then written out to a new SDF.  I am then
using this to make sure my data-set contains only compounds that I would
say are 'reasonable' to build a melting-point model with.  Now for the
questions:
 
1.  In RDKit, has the 'cleaning / washing / salt-stripping' of molecules
already been formalised based on a set of rules, etc?
2.  When identifying compounds that contain a non-allowed atom-type, why
do I find the SMARTS def [!H;!C;!N;!O;!F;!S;!Cl;!Br;!I] gives unexpected
results, but [!#1;!#6;!#7;!#8;!#9;!#16;!#17;!#35;!#53] works as I would
expect?
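
For illustration, the two per-compound checks described above
(component count and atoms outside the desired list) can be sketched
with the standard library alone. This is not the attached script and
not RDKit's approach: it reads the fixed-width V2000 atom and bond
blocks directly, so it assumes well-formed V2000 records and sees only
the explicitly listed atoms. The names inspect_molblock,
split_sd_records and ALLOWED are hypothetical.

```python
ALLOWED = {"H", "C", "N", "O", "F", "S", "Cl", "Br", "I"}


def split_sd_records(sd_text):
    """Split SD file text into records, using the '$$$$' delimiter lines."""
    records, current = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def inspect_molblock(molblock, allowed=ALLOWED):
    """Return (number of components, set of disallowed element symbols).

    Assumes a V2000 molblock: three header lines, a counts line, then the
    atom block and the bond block in fixed-width columns.
    """
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])

    # Element symbol sits in columns 32-34 of each atom line.
    symbols = [line[31:34].strip() for line in lines[4:4 + n_atoms]]

    # Union-find over atom indices: bonded atoms end up with the same root.
    parent = list(range(n_atoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        a = int(line[0:3]) - 1  # bond block atom numbers are 1-based
        b = int(line[3:6]) - 1
        parent[find(a)] = find(b)

    n_components = len({find(i) for i in range(n_atoms)})
    disallowed = {s for s in symbols if s not in allowed}
    return n_components, disallowed
```

A record with more than one component is a candidate salt or mixture,
and a non-empty set of disallowed symbols flags the compound for
exclusion.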
 
Kind regards
 
James


inorg_or_mix.py
Description: inorg_or_mix.py