date:20191216

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand (Dimitri Maziuk)

2019-12-16 Thread Chris Swain via Rdkit-discuss

I had to do something similar on 40,000 PDB files ending up using PDB_tools

https://pypi.org/project/pdb-tools/

Cheers,

Chris




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Paolo Tosco


Hi IIllimar,

The RDKit PDB reader only recognize standard amino acids and, after the 
PR I did on Saturday https://github.com/rdkit/rdkit/pull/2850 will be 
merged, nucleic acid bases.


Anything else will not have the correct hybridization/bond orders 
perceived, as those are not encoded in the PDB format and the PDB reader 
does not have any functionality to do that.


The 1ARJ case is peculiar, as it has an ARG residue which would be 
recognized if it were in the ATOM records, but not in the HETATM 
section, for which no attempt to perceive the correct hybridization/bond 
is made.


My suggestion, if you are using standard PDB files, is to download the 
SDF file:


https://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=A2F&structIdList=3GOT&instanceType=all&excludeUnobserved=false&includeHydrogens=false

and construct your RDKit molecule from that.

You should be able to automate that without too much effort either 
constructing URLs using the template above or using the PDB REST API.


Cheers,
p.

On 16/12/2019 18:24, Illimar Hugo Rekand wrote:

Thanks, Paolo, for a good and clear example.


I adapted your code into my workflow to calculate some Lipinski-properties of 
RNA pdb-structures, and ran into some issues. I'm not sure if I should make a 
new thread or throw this onto this one I already created?


I used the following code under


from rdkit import Chem
from rdkit.Chem import rdmolops, Lipinski
from urllib.request import urlopen
import gzip
import pprint
pp = pprint.PrettyPrinter(indent=4)


Lipinski_dic = {'FractionCSP3':Lipinski.FractionCSP3,
 'HeavyAtomCount':Lipinski.HeavyAtomCount,
 'NHOHCount': Lipinski.NHOHCount,
 "NOCount":Lipinski.NOCount,
 "NumAliphaticCarbocycles": Lipinski.NumAliphaticCarbocycles,
 "NumAliphaticHeterocycles" : Lipinski.NumAliphaticHeterocycles,
 'NumAliphaticRings' :  Lipinski.NumAliphaticRings,
 'NumAromaticCarbocycles' : Lipinski.NumAromaticCarbocycles,
 'NumAromaticHeterocycles' : Lipinski.NumAromaticHeterocycles,
 'NumAromaticRings' : Lipinski.NumAromaticRings,
 'NumHAcceptors' : Lipinski.NumHAcceptors,
 'NumHDonors' : Lipinski.NumHDonors,
 'NumHeteroatoms' : Lipinski.NumHeteroatoms,
 'NumRotatableBonds' : Lipinski.NumRotatableBonds,
 'NumSaturatedCarbocycles' : Lipinski.NumSaturatedCarbocycles,
 'NumSaturatedHeterocycles' : Lipinski.NumSaturatedHeterocycles,
 'NumSaturatedRings' : Lipinski.NumSaturatedRings,
 'RingCount' : Lipinski.RingCount
 }

url =  "https://files.rcsb.org/download/1arj.pdb.gz";
pdb_data = gzip.decompress(urlopen(url).read())
mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data))
bonds_to_cleave = {(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in 
mol.GetBonds() if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom() ^ 
b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()}
[mol.RemoveBond(*b) for b in bonds_to_cleave]
hetatm_frags = [f for f in rdmolops.GetMolFrags(mol, asMols=True, 
sanitizeFrags=True) if f.GetNumAtoms() and 
f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()]
for hetatm in hetatm_frags:
 res_name = hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName()
 calculated_props = {}
 for prop in Lipinski_dic:
 function = Lipinski_dic[prop]
 x = function(hetatm)
 calculated_props[prop] = x
 pp.pprint(calculated_props)


and as you can see the properties of the ligand doesn't match up with what is 
expected (The number of SP3-atoms doesn't match up). When parsing through the 
structure 3got, it fails to recognize the aromatic rings of the ligand A2F. I'm 
assuming this is caused by RDKit not assigning bond orders correctly when 
reading in RNA and DNA pdb files (something which I have reported in earlier on 
this mailing list)?


Running hetatm.UpdatePropertyCache(strict=True) does not remedy this problem. 
Is there a clever way I can fix this quickly without waiting for this to be 
fixed in a future version?


Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen



From: Illimar Hugo Rekand
Sent: Monday, December 16, 2019 5:55:56 PM
To: Paolo Tosco
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand


Hey, Paolo,


thanks for a good and clear example!


all the best,


Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen



From: Paolo Tosco 
Sent: Monday, December 16, 2019 5:52:18 PM
To: Illimar Hugo Rekand; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Hi Illimar,

this gist:

https://gist.github.

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Dimitri Maziuk via Rdkit-discuss

On 12/16/19 10:35 AM, Illimar Hugo Rekand wrote:
> Fair point.
> 
> But when working in the 100s and 1000s range of PDB-files it would be nice to 
> have some fewer steps when designing a pipeline.

But what's the selection criteria? NMR structures are usually deposited
with 20 models, do you want the ligand from every one? Only from the
representative one? There's at least one PDB ID (forget which) with 3
stable conformers, i.e. model 1 is not the representative structure.

Structures annotated by PDB will have HETATM instead of ATOM for
non-standards and ligands, but if your files haven't been processed by
them, all bets are off.

And so on
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

signature.asc
Description: OpenPGP digital signature
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Illimar Hugo Rekand

Thanks, Paolo, for a good and clear example.


I adapted your code into my workflow to calculate some Lipinski-properties of 
RNA pdb-structures, and ran into some issues. I'm not sure if I should make a 
new thread or throw this onto this one I already created?


I used the following code under


from rdkit import Chem
from rdkit.Chem import rdmolops, Lipinski
from urllib.request import urlopen
import gzip
import pprint
pp = pprint.PrettyPrinter(indent=4)


Lipinski_dic = {'FractionCSP3':Lipinski.FractionCSP3,
'HeavyAtomCount':Lipinski.HeavyAtomCount,
'NHOHCount': Lipinski.NHOHCount,
"NOCount":Lipinski.NOCount,
"NumAliphaticCarbocycles": Lipinski.NumAliphaticCarbocycles,
"NumAliphaticHeterocycles" : Lipinski.NumAliphaticHeterocycles,
'NumAliphaticRings' :  Lipinski.NumAliphaticRings,
'NumAromaticCarbocycles' : Lipinski.NumAromaticCarbocycles,
'NumAromaticHeterocycles' : Lipinski.NumAromaticHeterocycles,
'NumAromaticRings' : Lipinski.NumAromaticRings,
'NumHAcceptors' : Lipinski.NumHAcceptors,
'NumHDonors' : Lipinski.NumHDonors,
'NumHeteroatoms' : Lipinski.NumHeteroatoms,
'NumRotatableBonds' : Lipinski.NumRotatableBonds,
'NumSaturatedCarbocycles' : Lipinski.NumSaturatedCarbocycles,
'NumSaturatedHeterocycles' : Lipinski.NumSaturatedHeterocycles,
'NumSaturatedRings' : Lipinski.NumSaturatedRings,
'RingCount' : Lipinski.RingCount
}

url =  "https://files.rcsb.org/download/1arj.pdb.gz";
pdb_data = gzip.decompress(urlopen(url).read())
mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data))
bonds_to_cleave = {(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in 
mol.GetBonds() if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom() ^ 
b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()}
[mol.RemoveBond(*b) for b in bonds_to_cleave]
hetatm_frags = [f for f in rdmolops.GetMolFrags(mol, asMols=True, 
sanitizeFrags=True) if f.GetNumAtoms() and 
f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()]
for hetatm in hetatm_frags:
res_name = hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName()
calculated_props = {}
for prop in Lipinski_dic:
function = Lipinski_dic[prop]
x = function(hetatm)
calculated_props[prop] = x
pp.pprint(calculated_props)


and as you can see the properties of the ligand doesn't match up with what is 
expected (The number of SP3-atoms doesn't match up). When parsing through the 
structure 3got, it fails to recognize the aromatic rings of the ligand A2F. I'm 
assuming this is caused by RDKit not assigning bond orders correctly when 
reading in RNA and DNA pdb files (something which I have reported in earlier on 
this mailing list)?


Running hetatm.UpdatePropertyCache(strict=True) does not remedy this problem. 
Is there a clever way I can fix this quickly without waiting for this to be 
fixed in a future version?


Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen



From: Illimar Hugo Rekand
Sent: Monday, December 16, 2019 5:55:56 PM
To: Paolo Tosco
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand


Hey, Paolo,


thanks for a good and clear example!


all the best,


Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen



From: Paolo Tosco 
Sent: Monday, December 16, 2019 5:52:18 PM
To: Illimar Hugo Rekand; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Hi Illimar,

this gist:

https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22

shows a way to achieve this.

Cheers,
p.

On 16/12/2019 16:07, Illimar Hugo Rekand wrote:
> Hello, everyone
>
>
> Is there a simple way to create a mol object from just the HETATM/ligand 
> lines from a pdb-file?
>
> Would it be viable to create a function where you could create a mol object 
> from specific lines within a pdb-file?
>
>
> Illimar Rekand
> Ph.D. candidate,
> Brenk-lab, Haug-lab
> Department of Biomedicine
> Department of Chemistry
> University of Bergen
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Paolo Tosco


Hi Illimar,

this gist:

https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22

shows a way to achieve this.

Cheers,
p.

On 16/12/2019 16:07, Illimar Hugo Rekand wrote:

Hello, everyone


Is there a simple way to create a mol object from just the HETATM/ligand lines 
from a pdb-file?

Would it be viable to create a function where you could create a mol object 
from specific lines within a pdb-file?


Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Illimar Hugo Rekand

Fair point.

But when working in the 100s and 1000s range of PDB-files it would be nice to 
have some fewer steps when designing a pipeline.

Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen

From: Dimitri Maziuk via Rdkit-discuss 
Sent: Monday, December 16, 2019 5:24:49 PM
To: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

On 12/16/2019 10:07 AM, Illimar Hugo Rekand wrote:

> Would it be viable to create a function where you could create a mol object 
> from specific lines within a pdb-file?

PDB file is simple text. There's any number of utilities to extract the
lines you want, incl. a plain text editor, why spend time on reinventing
the wheel?

Dima

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Dimitri Maziuk via Rdkit-discuss


On 12/16/2019 10:07 AM, Illimar Hugo Rekand wrote:


Would it be viable to create a function where you could create a mol object 
from specific lines within a pdb-file?


PDB file is simple text. There's any number of utilities to extract the 
lines you want, incl. a plain text editor, why spend time on reinventing 
the wheel?


Dima


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Observations about RDKit performance: PatternFingerprinter, Windows, Linux and Virtual machines

2019-12-16 Thread Greg Landrum

Hi Thomas,

First it is important to compare equivalent major versions to each other.
Particularly in this case. On my linux box generating the pattern
fingerprints takes 24.2 seconds with v2019.03.x and 15.9 seconds with
v2019.09.x (that's due to the improvements in the substructure matcher that
the blog post you link to discusses).

Comparing the same versions to each other:

Performance on windows vs linux
Windows performance with the RDKit has always lagged behind linux
performance. There's something in the code (or in the way we use the
compiler) that leads to big differences on some benchmarks. The most
straightforward way I can demonstrate this is with results from my windows
10 laptop.
Here's the output when running the fingerprint_screenout.py benchmark using
the windows build:
| 2019.09.1 | 13.6 | 0.3 | 38.1 | 0.8 | 25.5 | 25.9 | 84.1 |
and here's the output from a linux build running on the Windows Linux
Subsystem:
| 2019.09.2 | 10.7 | 0.2 | 19.3 | 0.4 | 19.4 | 19.2 | 53.2 |
You can see the differences are not small.
I haven't invested massive time into it, but I haven't been able to figure
out what causes this.

Performance on (linux) VMs
I can't think of any particular reason why there should be huge differences
and it's really difficult to compare apples to apples here.
Since I have the numbers, here's one comparison

Here's a run on my linux workstation:
| 2019.09.2 | 7.6 | 0.3 | 15.9 | 0.4 | 21.4 | 20.4 | 55.7 |
and here's the same thing on an AWS t3.xlarge instance:
| 2019.09.2 | 9.6 | 0.2 | 20.3 | 0.4 | 38.4 | 38.2 | 94.7 |
The VM is significantly slower, but t3.xlarge an instance type that's
intended to be used for compute intensive jobs (I don't have on of those
active and configured at the moment).

Does that help at all?
-greg


On Mon, Dec 16, 2019 at 8:27 AM Thomas Strunz  wrote:

> Hi All,
>
> I was looking at a blog post from greg:
>
>
> https://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>
> about fingerprint screenout. The part that got me confused was the timings
> in his blog post because run times in my case where a lot slower.
>
> Gregs numbers:
>
> [07:21:19] INFO: mols from smiles
> [07:21:27] INFO: Results1:  7.77 seconds, 5 mols
> [07:21:27] INFO: queries from smiles
> [07:21:27] INFO: Results2:  0.16 seconds*[07:21:27] INFO: generating pattern 
> fingerprints for mols
> [07:21:43] INFO: Results3:  16.11 seconds*
> [07:21:43] INFO: generating pattern fingerprints for queries
> [07:21:43] INFO: Results4:  0.34 seconds
> [07:21:43] INFO: testing frags queries
> [07:22:03] INFO: Results5:  19.90 seconds. 6753 tested (0.0003 of total), 
> 3989 found,  0.59 accuracy. 0 errors.
> [07:22:03] INFO: testing leads queries
> [07:22:23] INFO: Results6:  19.77 seconds. 1586 tested (0.0001 of total), 
> 1067 found,  0.67 accuracy. 0 errors.
> [07:22:23] INFO: testing pieces queries
> [07:23:19] INFO: Results7:  55.37 seconds. 202 tested (0.0810 of total), 
> 1925628 found,  0.58 accuracy. 0 errors.
>
> | 2019.09.1dev1 | 7.8 | 0.2 | 16.1 | 0.3 | 19.9 | 19.8 | 55.4 |
>
>
>
>
> *Machine 1:*
> Virtual machine, Windows Server 2012 R2 with an intel xeon (4 virtual
> cores)
>
> Since the test is single-threaded it makes a bit of sense that it isn't
> fast here but it's not just a bit slower, but a lot slower, depending on
> test almost 3xtimes slower
>
> [09:03:19] INFO: mols from smiles
> [09:03:38] INFO: Results1:  19.44 seconds, 5 mols
> [09:03:38] INFO: queries from smiles
> [09:03:38] INFO: Results2:  0.36 seconds
>
> *[09:03:38] INFO: generating pattern fingerprints for mols *
> *[09:04:54] INFO: Results3:  75.99 seconds*
> [09:04:54] INFO: generating pattern fingerprints for queries
> [09:04:56] INFO: Results4:  1.55 seconds
> [09:04:56] INFO: testing frags queries
> [09:05:34] INFO: Results5:  37.59 seconds. 6753 tested (0.0003 of total),
> 3989 f
> ound,  0.59 accuracy. 0 errors.
> [09:05:34] INFO: testing leads queries
> [09:06:11] INFO: Results6:  37.34 seconds. 1586 tested (0.0001 of total),
> 1067 f
> ound,  0.67 accuracy. 0 errors.
> [09:06:11] INFO: testing pieces queries
> [09:08:39] INFO: Results7:  147.79 seconds. 202 tested (0.0810 of
> total), 19
> 25628 found,  0.58 accuracy. 0 errors.
> | 2019.03.3 | 19.4 | 0.4 | 76.0 | 1.5 | 37.6 | 37.3 | 147.8 |
>
> I thought maybe another issue with windows being slow so I tested on a
> linux VM on my laptop
>
> *Machine 2:*
> Virtual machine, Lubuntu 16.04 on a laptop i7-8850H 6-core
>
> [09:23:31] INFO: mols from smiles
> [09:23:54] INFO: Results1:  23.71 seconds, 5 mols
> [09:23:54] INFO: queries from smiles
> [09:23:55] INFO: Results2:  0.48 seconds
>
> *[09:23:55] INFO: generating pattern fingerprints for mols *
> *[09:24:53] INFO: Results3:  58.31 seconds*
> [09:24:53] INFO: generating pattern fingerprints for queries
> [09:24:54] INFO: Results4:  1.19 seconds
> [09:24:54] INFO: testing frags queries
> [09:25:41] INFO: Results5:  46.22 seconds. 6753 tested (0.0003 of total),
> 3989

[Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Illimar Hugo Rekand

Hello, everyone


Is there a simple way to create a mol object from just the HETATM/ligand lines 
from a pdb-file?

Would it be viable to create a function where you could create a mol object 
from specific lines within a pdb-file?


Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand (Dimitri Maziuk)

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Re: [Rdkit-discuss] Observations about RDKit performance: PatternFingerprinter, Windows, Linux and Virtual machines

[Rdkit-discuss] Constructing a mol object from a PDB ligand

9 matches

Site Navigation

Mail list logo

Footer information