Re: [Rdkit-discuss] Chem.PandasTools

2014-05-09 Thread Paul . Czodrowski
Dear Grégori, 

when storing the image into a new data frame:

MMP_reaction = Chem.rdChemReactions.ReactionFromSmarts([*:1][H][*:1]C)
newnew_df = pd.DataFrame(columns=['fig'],index=[1] )
newnew_df['fig'].ix[1] = Draw.ReactionToImage(MMP_reaction)


apparently, the image can be stored in a data frame, but in the ipython 
notebook it is displayed as PIL image



Cheers  Thanks so far (in particular for the impressive speed in 
response!),
Paul

 Hi Paul,
 
 You first have to read the MMP into a reaction object 
 (Chem.ReactionFromSmarts).
 
 Greg
 
 On Friday, May 9, 2014, paul.czodrow...@merckgroup.com wrote:
 Dear Gregori  Samo,
 
 thanks for your hints.
 
 I just tried running
 
 Draw.ReactionToImage([*:1][H][*:1]C)
 
 =
 
 AttributeError: 'str' object has no attribute 'GetNumReactantTemplates'
 
 
 
 BTW, how would I finally add a picture to a Pandas data frame?
 
 
 Cheers,
 Paul


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.

Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit cartridge similarity search speeds(?)

2014-05-09 Thread Greg Landrum
Hi James,

[fair warning before I start: we quickly hit the limits of my postgresql
expertise here]


On Thu, May 8, 2014 at 2:35 PM, James Davidson j.david...@vernalis.comwrote:

  Dear All,



 I have recently been spending a bit more time with the RDKit cartridge,
 and have what is probably a very naïve question…

 Having built some RDKit fingerprints for ChEMBL_18, I see the following
 behaviour (for clarification – ‘ecfp4_bv’ is the column in my rdk.fps table
 that has been generated using morganbv_fp(mol, 2)):





 chembl_18=# \timing on

 Timing is on.



 chembl_18=# set rdkit.tanimoto_threshold=0.5;

 SET

 Time: 0.167 ms



 chembl_18=# select chembl_id from rdk.fps where ecfp4_bv %
 morganbv_fp('c1nnccc1'::mol,2);

   chembl_id

 -

 CHEMBL15719

 (1 row)



 Time: 2033.348 ms



 chembl_18=# select chembl_id from rdk.fps where tanimoto_sml(ecfp4_bv,
 morganbv_fp('c1nnccc1'::mol, 2))  0.5;

   chembl_id

 -

 CHEMBL15719

 (1 row)



 Time: 6843.605 ms





 I can see that the query plans are different in the two cases, but I don’t
 fully understand why – see below:



 *QUERY 1 (with explain analyze)*

 chembl_18=# explain analyze select chembl_id from rdk.fps where ecfp4_bv %
 morganbv_fp('c1nnccc1'::mol,2);


 QUERY PLAN


 

 Bitmap Heap Scan on fps  (cost=106.91..5298.31 rows=1352 width=13) (actual
 time=1774.986..1774.987 rows=1 loops=1)

Recheck Cond: (ecfp4_bv %
 '\x0100084200048204'::bfp)

-  Bitmap Index Scan on fps_ecfp4bv_idx  (cost=0.00..106.57 rows=1352
 width=0) (actual time=1774.969..1774.969 rows=1 loops=1)

  Index Cond: (ecfp4_bv %
 '\x0100084200048204'::bfp)

 Total runtime: 1775.035 ms

 (5 rows)



 Time: 1776.133 ms





 *QUERY 2 (with explain analyze)*

 chembl_18=# explain analyze select chembl_id from rdk.fps where
 tanimoto_sml(ecfp4_bv, morganbv_fp('c1nnccc1'::mol, 2))  0.5;


 QUERY PLAN


 ---

 Seq Scan on fps  (cost=0.00..388808.17 rows=450793 width=13) (actual
 time=1278.115..6953.977 rows=1 loops=1)

Filter: (tanimoto_sml(ecfp4_bv,
 '\x0100084200048204'::bfp)
  0.5::double precision)

Rows Removed by Filter: 1352377

 Total runtime: 6954.010 ms

 (4 rows)



 Time: 6955.103 ms


What these are telling you is that the second query is not using the index:
it's a sequential scan, so it has to test all rows of the database. This
happens because the index is defined for the operator %, but not for the
function tanimoto_sml(). There may be an approach to get the index set up
using that function, but there we reach the limits of my expertise.


  It seems conceptually ‘easier’ to add the similarity value as part of
 the query, rather than setting it as a variable ahead of the query; but
 clearly I should be doing it the latter way for performance reasons.  So
 even if I don’t fully understand why at the moment, am I correct in
 thinking that queries of this sort should always be run with the similarity
 operators (%, #)?  And if so, is the rdkit.tanimoto_threshold variable set
 at the level of the session, the user, or the database?


It's set at the session level.

When doing similarity searches, I find it generally helpful to also include
the % operator in an order by clause so that the results come back in
sorted order.
So instead of this;
chembl_17=# select molregno from rdk.fps where mfp2 %
morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1');
 molregno
--
   412312
   412302
   412310
   441378
   470082
   773946
   775269
   911501
  1015485
  1034321
  1040255
  1040496
  1042958
  1043871
  1044892
  1045663
  1047691
  1049393
(18 rows)

Time: 1042.310 ms

I do this:
chembl_17=# select molregno from rdk.fps where mfp2 %
morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1')
 order by
morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1') % mfp2;
 molregno
--
   412312
   470082
  1040255
   773946
  1044892
  1049393
  1040496
   441378
  1047691
  1042958
   412302
  1043871
   412310
  1045663
   911501
   775269
  1015485
  1034321
(18 rows)

Time: 1032.266 ms

Notice that this doesn't make things any slower.

It's nice to see the actual similarity values:

chembl_17=# select
molregno,tanimoto_sml(morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1'),mfp2)
 from rdk.fps
where 

Re: [Rdkit-discuss] Chem.PandasTools

2014-05-09 Thread Samo Turk
Hi,

You can create new object that stores MMP and has default pandas and
ipython representation as base64 encoded png. This usually works for me,
but I'm not sure why in this case it works only for ipython representation
and not for pandas.. The code:

# codecell

import pandas as pd
import rdkit.Chem as Chem
from rdkit.Chem import PandasTools
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

# codecell

from base64 import b64encode
from StringIO import StringIO

class Reaction():
def __init__(self, reaction=None):
self.reaction = reaction

def _repr_html_(self):
sio = StringIO()
Draw.ReactionToImage(self.reaction).save(sio,format='PNG')
s = b64encode(sio.getvalue())
return 'img src=data:image/png;base64,%s/' %s
def __str__(self):
sio = StringIO()
Draw.ReactionToImage(self.reaction).save(sio,format='PNG')
s = b64encode(sio.getvalue())
return 'img src=data:image/png;base64,%s/' %s

# codecell

MMP_reaction = Chem.rdChemReactions.ReactionFromSmarts([*:1][H][*:1]C)

# codecell

mmp = Reaction(MMP_reaction)

# codecell

mmp

# codecell

newnew_df = pd.DataFrame(columns=['fig'],index=[1] )
newnew_df['fig'].ix[1] = mmp

# codecell

newnew_df

# codecell


Regards,
Samo


On Fri, May 9, 2014 at 8:19 AM, paul.czodrow...@merckgroup.com wrote:

 Dear Grégori,

 when storing the image into a new data frame:
 
 MMP_reaction = Chem.rdChemReactions.ReactionFromSmarts([*:1][H][*:1]C)
 newnew_df = pd.DataFrame(columns=['fig'],index=[1] )
 newnew_df['fig'].ix[1] = Draw.ReactionToImage(MMP_reaction)
 

 apparently, the image can be stored in a data frame, but in the ipython
 notebook it is displayed as PIL image



 Cheers  Thanks so far (in particular for the impressive speed in
 response!),
 Paul

  Hi Paul,
 
  You first have to read the MMP into a reaction object
  (Chem.ReactionFromSmarts).
 
  Greg
 
  On Friday, May 9, 2014, paul.czodrow...@merckgroup.com wrote:
  Dear Gregori  Samo,
 
  thanks for your hints.
 
  I just tried running
 
  Draw.ReactionToImage([*:1][H][*:1]C)
 
  =
 
  AttributeError: 'str' object has no attribute 'GetNumReactantTemplates'
 
 
 
  BTW, how would I finally add a picture to a Pandas data frame?
 
 
  Cheers,
  Paul


 This message and any attachment are confidential and may be privileged or
 otherwise protected from disclosure. If you are not the intended recipient,
 you must not copy this message or attachment or disclose the contents to
 any other person. If you have received this transmission in error, please
 notify the sender immediately and delete the message and any attachment
 from your system. Merck KGaA, Darmstadt, Germany and any of its
 subsidiaries do not accept liability for any omissions or errors in this
 message which may arise as a result of E-Mail-transmission or for damages
 resulting from any unauthorized changes of the content of this message and
 any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
 subsidiaries do not guarantee that this message is free of viruses and does
 not accept liability for any damages caused by any virus transmitted
 therewith.

 Click http://www.merckgroup.com/disclaimer to access the German, French,
 Spanish and Portuguese versions of this disclaimer.


 --
 Is your legacy SCM system holding you back? Join Perforce May 7 to find
 out:
 #149; 3 signs your SCM is hindering your productivity
 #149; Requirements for releasing software faster
 #149; Expert tips and advice for migrating your SCM now
 http://p.sf.net/sfu/perforce
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit cartridge similarity search speeds(?)

2014-05-09 Thread James Davidson
Hi Greg,

 What these are telling you is that the second query is not using the index:
 it's a sequential scan, so it has to test all rows of the database. This
 happens because the index is defined for the operator %, but not for the
 function tanimoto_sml(). There may be an approach to get the index set up
 using that function, but there we reach the limits of my expertise.

Well, I will stick to the recommended operator use then!


 One final advanced topic: if you are planning on making regular use of the
 similarity features in the cartridge and are running on a linux system or
 Mac I would recommend recompiling the cartridge with some optimizations for
 tanimoto similarity. To do this, you need to edit the cartridge Makefile
 from:
 PG_CPPFLAGS = -I${BOOSTHOME} -I${RDKIT}/Code -DRDKITVER='007200'
 ${INCHIFLAGS} #-DUSE_BUILTIN_POPCOUNT -msse4.2
 
 to:
 PG_CPPFLAGS = -I${BOOSTHOME} -I${RDKIT}/Code -DRDKITVER='007200'
 ${INCHIFLAGS} -DUSE_BUILTIN_POPCOUNT -msse4.2
 
 (I just removed a comment character here). This speeds the Tanimoto
 calculation up a fair bit (it's still not nearly as fast as Andrew's
 chemfp, but it's better than the default behavior).

I'm on linux (Ubuntu), and have just re-built with the above recommendation.
I'll see what the speeds look like afterwards (out of interest, I presume the 
timings in your examples were with this optimisation in place?).

Does this also affect dice?

And final question - after rebuilding the cartridge, does the extension need to 
be dropped and then re-created in all databases; does postgreSQL server need 
restarting; or neither?


 Hope this helps,
 -greg

It does - thanks!

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the Company address and 
registration details link at the bottom of the page..
__
--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit cartridge similarity search speeds(?)

2014-05-09 Thread Greg Landrum
James,


On Fri, May 9, 2014 at 10:25 AM, James Davidson j.david...@vernalis.comwrote:


  One final advanced topic: if you are planning on making regular use of
 the
  similarity features in the cartridge and are running on a linux system or
  Mac I would recommend recompiling the cartridge with some optimizations
 for
  tanimoto similarity. To do this, you need to edit the cartridge Makefile
  from:
  PG_CPPFLAGS = -I${BOOSTHOME} -I${RDKIT}/Code -DRDKITVER='007200'
  ${INCHIFLAGS} #-DUSE_BUILTIN_POPCOUNT -msse4.2
 
  to:
  PG_CPPFLAGS = -I${BOOSTHOME} -I${RDKIT}/Code -DRDKITVER='007200'
  ${INCHIFLAGS} -DUSE_BUILTIN_POPCOUNT -msse4.2
 
  (I just removed a comment character here). This speeds the Tanimoto
  calculation up a fair bit (it's still not nearly as fast as Andrew's
  chemfp, but it's better than the default behavior).

 I'm on linux (Ubuntu), and have just re-built with the above
 recommendation.
 I'll see what the speeds look like afterwards (out of interest, I presume
 the timings in your examples were with this optimisation in place?).


I would have thought so, but it turns out that I was using a build on my
Mac without the optimization. It's about 20% faster for those sample
queries when I use the rebuilt version.


 Does this also affect dice?


It should, but, stupidly, it looks like it doesn't. I'll fix that.


 And final question - after rebuilding the cartridge, does the extension
 need to be dropped and then re-created in all databases; does postgreSQL
 server need restarting; or neither?


You do not need to drop the extension. This is just a change to the shared
library and doesn't affect the API. You just need to do a make install to
copy in the new shared lib and then restart the server.

-greg
--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] autodock vina pdbqt file to mol2

2014-05-09 Thread Christos Kannas
Hi Jan,

AutoDock has a set of tools (MGLTools) that have tools to convert pdb to
pdbqt and vice-versa.
If I recall it can also convert pdbqt to mol2 also. See this discussion
http://autodock.1369657.n2.nabble.com/ADL-pdbqt-to-mol2-td6755769.html

Best,

Christos

Christos Kannas

Researcher
Ph.D Student

Mob (UK): +44 (0) 7447700937
Mob (Cyprus): +357 99530608

[image: View Christos Kannas's profile on
LinkedIn]http://cy.linkedin.com/in/christoskannas


On 9 May 2014 20:17, Jan Domanski jan...@gmail.com wrote:

 Hi guys,

 I'm really stuck here: I have some output from autodock vina in a rather
 obscure pdbqt format. It's a little bit like pdb but not quite. I'm trying
 to get back a mol2 file.

 The autodock pdbqt file has only the polar hydrogens in it – part of the
 trick is to re-add the hydrogens.

 Example autodock vina output is attached (it's a conformer of the ACE
 native ligand DUDE).

 First of all, I convert that to a PDB file by doing a simple sed,
 sed -e '/ROOT/d' -e '/BRANCH/d'
 Then I reorder the atoms to match those of the original
 crystal_ligand.mol2 (because autodock re-orders the atoms duh).

 Finally, I save a mol2 file out (attached) ordered as the original
 crystal_ligand and with polar hydrogens (for each pose of a conformer).

 Let's go to rdkit and try to add hydrogens:

 mol = Chem.MolFromMol2File(output, removeHs=False)
 mol2 = AllChem.AddHs(mol, addCoords=True)
 print mol.GetNumAtoms(), mol2.GetNumAtoms()
 44 44

 So, only the implicit hydorgens are present. Calling AddHs doesn't raise
 an error and it doesn't really change the number of hydrogens...

 Now this may not be the best way of doing things: what I care for is to
 get a mol2 from autodock vina that I can compare to the original mol2 from
 DUD (same atom order, same number of atoms). Maybe there are other ways to
 achieve this: one idea would be to inject the docked pose coordinates into
 the original mol2 atoms (heavy and polar hydrogens) and somehow adjust
 the non-polar hydrogens.

 Thanks,

 - Jan



 --
 Is your legacy SCM system holding you back? Join Perforce May 7 to find
 out:
 #149; 3 signs your SCM is hindering your productivity
 #149; Requirements for releasing software faster
 #149; Expert tips and advice for migrating your SCM now
 http://p.sf.net/sfu/perforce
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] autodock vina pdbqt file to mol2

2014-05-09 Thread Jan Domanski
Thanks for the quick reply Christos!

I found the pdbqt_to_pdb script that you mentioned but a google search for
a pdbqt to mol2 yield nothing (other than this thread). the pdbqt_to_pdb
converter is very crude: it retains only the best pose from _out.pdbqt and
it basically just strips the BRANCH and ROOT tags deposited by autodock
(which I was doing anyway with the sed).

The main problems remaining are atom order (I can fix that) and missing
hydrogens (can't fix that). There is a mode where I can prevent the
prepare_ligand4.py from removing the hydrogens - but the output poses then
have really weird geometry.

But let's refocus a little bit: this is not an autodock vina question
(although many folks here are knowledgeable enough to help me). This is a
question on a mol2 file to which it should be possible to add Hs with rdkit
and it's somehow not happening (at least not in my hands). My mol2 could be
somehow malformatted.





On 9 May 2014 20:57, Christos Kannas chriskan...@gmail.com wrote:

 Hi Jan,

 AutoDock has a set of tools (MGLTools) that have tools to convert pdb to
 pdbqt and vice-versa.
 If I recall it can also convert pdbqt to mol2 also. See this discussion
 http://autodock.1369657.n2.nabble.com/ADL-pdbqt-to-mol2-td6755769.html

 Best,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608

 [image: View Christos Kannas's profile on 
 LinkedIn]http://cy.linkedin.com/in/christoskannas


 On 9 May 2014 20:17, Jan Domanski jan...@gmail.com wrote:

 Hi guys,

 I'm really stuck here: I have some output from autodock vina in a rather
 obscure pdbqt format. It's a little bit like pdb but not quite. I'm trying
 to get back a mol2 file.

 The autodock pdbqt file has only the polar hydrogens in it - part of the
 trick is to re-add the hydrogens.

 Example autodock vina output is attached (it's a conformer of the ACE
 native ligand DUDE).

 First of all, I convert that to a PDB file by doing a simple sed,
 sed -e '/ROOT/d' -e '/BRANCH/d'
 Then I reorder the atoms to match those of the original
 crystal_ligand.mol2 (because autodock re-orders the atoms duh).

 Finally, I save a mol2 file out (attached) ordered as the original
 crystal_ligand and with polar hydrogens (for each pose of a conformer).

 Let's go to rdkit and try to add hydrogens:

 mol = Chem.MolFromMol2File(output, removeHs=False)
 mol2 = AllChem.AddHs(mol, addCoords=True)
 print mol.GetNumAtoms(), mol2.GetNumAtoms()
 44 44

 So, only the implicit hydorgens are present. Calling AddHs doesn't raise
 an error and it doesn't really change the number of hydrogens...

 Now this may not be the best way of doing things: what I care for is to
 get a mol2 from autodock vina that I can compare to the original mol2 from
 DUD (same atom order, same number of atoms). Maybe there are other ways to
 achieve this: one idea would be to inject the docked pose coordinates into
 the original mol2 atoms (heavy and polar hydrogens) and somehow adjust
 the non-polar hydrogens.

 Thanks,

 - Jan



 --
 Is your legacy SCM system holding you back? Join Perforce May 7 to find
 out:
 #149; 3 signs your SCM is hindering your productivity
 #149; Requirements for releasing software faster
 #149; Expert tips and advice for migrating your SCM now
 http://p.sf.net/sfu/perforce
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] autodock vina pdbqt file to mol2

2014-05-09 Thread Greg Landrum
Jan,

On Fri, May 9, 2014 at 9:17 PM, Jan Domanski jan...@gmail.com wrote:


 Finally, I save a mol2 file out (attached) ordered as the original
 crystal_ligand and with polar hydrogens (for each pose of a conformer).

 Let's go to rdkit and try to add hydrogens:

 mol = Chem.MolFromMol2File(output, removeHs=False)
 mol2 = AllChem.AddHs(mol, addCoords=True)
 print mol.GetNumAtoms(), mol2.GetNumAtoms()
 44 44

 So, only the implicit hydorgens are present. Calling AddHs doesn't raise
 an error and it doesn't really change the number of hydrogens...


The mol2 parser assumes that all Hs are present, so it sets a flag on every
atom saying that it has no implicit Hs.
You can manually undo this:

In [23]: m =
Chem.MolFromMol2File('Downloads/conformer_132_out_1.mol2',sanitize=False)

In [24]: for atom in m.GetAtoms(): atom.SetNoImplicit(False)

In [25]: m.UpdatePropertyCache()

In [26]: mh = Chem.AddHs(m,addCoords=True)

In [27]: print mh.GetNumAtoms()
69

I skipped sanitization entirely when I read the molecule in. This is to
prevent the assignment of radicals.

Note that the mol2 parser is really only even remotely tested with output
from Corina; if you end up with alternate atom types, you should anticipate
further problems.

-greg
--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] autodock vina pdbqt file to mol2

2014-05-09 Thread TJ O'Donnell
Babel can read and write both pdbqt and mol2 files. I'm not sure how the
atom ordering might be accomplished though.

TJ
On May 9, 2014 2:43 PM, Jan Domanski jan...@gmail.com wrote:

 Thanks for the quick reply Christos!

 I found the pdbqt_to_pdb script that you mentioned but a google search for
 a pdbqt to mol2 yield nothing (other than this thread). the pdbqt_to_pdb
 converter is very crude: it retains only the best pose from _out.pdbqt and
 it basically just strips the BRANCH and ROOT tags deposited by autodock
 (which I was doing anyway with the sed).

 The main problems remaining are atom order (I can fix that) and missing
 hydrogens (can't fix that). There is a mode where I can prevent the
 prepare_ligand4.py from removing the hydrogens – but the output poses then
 have really weird geometry.

 But let's refocus a little bit: this is not an autodock vina question
 (although many folks here are knowledgeable enough to help me). This is a
 question on a mol2 file to which it should be possible to add Hs with rdkit
 and it's somehow not happening (at least not in my hands). My mol2 could be
 somehow malformatted.





 On 9 May 2014 20:57, Christos Kannas chriskan...@gmail.com wrote:

 Hi Jan,

 AutoDock has a set of tools (MGLTools) that have tools to convert pdb to
 pdbqt and vice-versa.
 If I recall it can also convert pdbqt to mol2 also. See this discussion
 http://autodock.1369657.n2.nabble.com/ADL-pdbqt-to-mol2-td6755769.html

 Best,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608

 [image: View Christos Kannas's profile on 
 LinkedIn]http://cy.linkedin.com/in/christoskannas


 On 9 May 2014 20:17, Jan Domanski jan...@gmail.com wrote:

  Hi guys,

 I'm really stuck here: I have some output from autodock vina in a rather
 obscure pdbqt format. It's a little bit like pdb but not quite. I'm trying
 to get back a mol2 file.

 The autodock pdbqt file has only the polar hydrogens in it – part of the
 trick is to re-add the hydrogens.

 Example autodock vina output is attached (it's a conformer of the ACE
 native ligand DUDE).

 First of all, I convert that to a PDB file by doing a simple sed,
 sed -e '/ROOT/d' -e '/BRANCH/d'
 Then I reorder the atoms to match those of the original
 crystal_ligand.mol2 (because autodock re-orders the atoms duh).

 Finally, I save a mol2 file out (attached) ordered as the original
 crystal_ligand and with polar hydrogens (for each pose of a conformer).

 Let's go to rdkit and try to add hydrogens:

 mol = Chem.MolFromMol2File(output, removeHs=False)
 mol2 = AllChem.AddHs(mol, addCoords=True)
 print mol.GetNumAtoms(), mol2.GetNumAtoms()
 44 44

 So, only the implicit hydorgens are present. Calling AddHs doesn't raise
 an error and it doesn't really change the number of hydrogens...

 Now this may not be the best way of doing things: what I care for is to
 get a mol2 from autodock vina that I can compare to the original mol2 from
 DUD (same atom order, same number of atoms). Maybe there are other ways to
 achieve this: one idea would be to inject the docked pose coordinates into
 the original mol2 atoms (heavy and polar hydrogens) and somehow adjust
 the non-polar hydrogens.

 Thanks,

 - Jan



 --
 Is your legacy SCM system holding you back? Join Perforce May 7 to find
 out:
 #149; 3 signs your SCM is hindering your productivity
 #149; Requirements for releasing software faster
 #149; Expert tips and advice for migrating your SCM now
 http://p.sf.net/sfu/perforce
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





 --
 Is your legacy SCM system holding you back? Join Perforce May 7 to find
 out:
 #149; 3 signs your SCM is hindering your productivity
 #149; Requirements for releasing software faster
 #149; Expert tips and advice for migrating your SCM now
 http://p.sf.net/sfu/perforce
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
#149; 3 signs your SCM is hindering your productivity
#149; Requirements for releasing software faster
#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss