Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread ro...@nextmovesoftware.com

Hi James and Greg,

On Oct 25, 2013, at 4:03 AM, Greg Landrum wrote:
 1.   Do I remember correctly that there was a proposal (from Roger) to
 add some auto bond-type perception to the PDB parser for ligands (or is
 that just wishful thinking!)?

 Roger will have to confirm this, but I believe he said something along
 the lines of that way lies madness.

My first comment is that a computational chemistry toolkit's function
for assigning bond orders, formal charges and protonation states from 3D
coordinates is (or should be) a sanitize-like step that is independent
of its PDB file reader.  For one thing, this functionality is required
for reading XYZ format files, Schrodinger Maestro files, and quantum
mechanics file formats, such as Gaussian and MOPAC.  For another thing,
many applications that read PDB files don't require bond orders, e.g.
GRASP surfaces and many docking functions/forcefield calculations, so
handling bond order perception independently of PDB reading has some
merit.

All I'll say at this stage is that correctly perceiving bonds, formal
charges and protonation state (they're all interdependent) is probably
more complicated than most folks think.  Indeed, many of the
crystallographers at the RDKit meeting claimed it was impossible.  The
bondage algorithm used in OpenEye's OEChem is several thousand lines of
C++, and was still improving (on things like iron-sulfur clusters and
oxime vs. nitroso perception) up to the point I left Santa Fe in 2010.
The state of the art from a decade ago is described at:
http://www.daylight.com/meetings/mug01/Sayle/m4xbondage.html
and was used at the time to produce a searchable database of PDB
ligands:
http://www.metaphorics.com/products/luna.html

 3.   Is there some explanation for what the ‘flavor’ option does for
 reading/writing PDB?

 I'm not sure about the reader. Roger, can you answer that?

 This is what's in the C++ for the PDBWriter:
 // PDBWriter support multiple flavors of PDB output
 // flavor  1 : Write MODEL/ENDMDL lines around each record
 // flavor  2 : Don't write any CONECT records
 // flavor  4 : Write CONECT records in both directions
 // flavor  8 : Don't use multiple CONECTs to encode bond order
 // flavor  16 : Write MASTER record
 // flavor  32 : Write TER record

 This is now in the docs for both the Python and C++ code.

The use of an integer file format flavor argument allows the caller to
customize the behavior of the readers and writers.  The semantics are
that zero (all bits clear) is a reasonable default, and that new
features may be added without changing the API/ABI.  Most of the bits
above (for the writer) control strict compliance with the PDB format
specification.  For example, a flavor of 12 (4 + 8) will write CONECT
records the way the RCSB expects them, which both throws away bond
orders and increases the size of the PDB file.

For the reader, the flavor argument controls whether alternate locations
are read (for use by PDB power users), or whether a sensible subset of
atoms is used for the RDKit::ROMol.
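
As a rough illustration of how such a bit-flag flavor composes, here is
a small Python sketch.  The bit values and their meanings are copied
from the PDBWriter comment quoted above; the class, the member names and
the describe_flavor() helper are hypothetical labels invented for this
sketch and are not part of RDKit.

```python
from enum import IntFlag


class PDBWriterFlavor(IntFlag):
    """Writer flavor bits as listed in the PDBWriter comment quoted above.

    The member names are hypothetical labels chosen for this sketch.
    """
    MODEL_RECORDS = 1          # write MODEL/ENDMDL lines around each record
    NO_CONECT = 2              # don't write any CONECT records
    CONECT_BOTH_WAYS = 4       # write CONECT records in both directions
    NO_BOND_ORDER_CONECT = 8   # don't use multiple CONECTs to encode bond order
    MASTER_RECORD = 16         # write MASTER record
    TER_RECORD = 32            # write TER record


def describe_flavor(flavor):
    """Return the names of the bits that are set in an integer flavor."""
    return [bit.name for bit in PDBWriterFlavor if flavor & bit]


# The flavor of 12 mentioned above is the combination 4 + 8.
strict = PDBWriterFlavor.CONECT_BOTH_WAYS | PDBWriterFlavor.NO_BOND_ORDER_CONECT
print(int(strict))
print(describe_flavor(12))
print(describe_flavor(0))   # the default: no bits set
```

The resulting integer is what gets passed to the writer, for example as
the flavor argument of Chem.MolToPDBBlock().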

 5.   It seems to me that GetResidueNumber() and GetSerialNumber() may
 have got mixed-up at some point(?).  At least, when I call
 GetSerialNumber() I see what appears to be the residue number; and when
 I call GetResidueNumber() I get “0”!

 This was another dumb bug from me. It's fixed.

Greg is being modest.  At the time of the RDKit meeting, the MonomerInfo
data structure had just a SerialNumber field, which was used for storing
residue numbers.  One of my suggestions back to Greg was that although
everything worked, this nomenclature might be confusing to folks using
the API, so I suggested renaming the field for the Q3 beta.  The better
solution was to support fields for both ResidueNumber and SerialNumber,
but following that change I failed to send the patch to make the
reader/writer use the correct (changed) residueNumber field, and
record/honour the serial number field.

My apologies.  I share some of the blame for this one.
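
For anyone unsure about the difference between the two fields, here is a
toolkit-independent sketch that reads both from the fixed-width columns
of a PDB coordinate line.  The AtomRecord class, the two helper
functions and the sample atoms are made up for illustration; only the
column positions come from the PDB format.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record_type: str      # "ATOM" or "HETATM"
    serial_number: int    # columns 7-11: unique per atom
    atom_name: str        # columns 13-16
    residue_name: str     # columns 18-20
    chain_id: str         # column 22
    residue_number: int   # columns 23-26: shared by all atoms of a residue


def parse_atom_line(line):
    """Parse the leading fixed-width fields of an ATOM/HETATM line."""
    return AtomRecord(
        record_type=line[0:6].strip(),
        serial_number=int(line[6:11]),
        atom_name=line[12:16].strip(),
        residue_name=line[17:20].strip(),
        chain_id=line[21],
        residue_number=int(line[22:26]),
    )


def make_line(record, serial, name, res_name, chain, res_num):
    """Build the leading fields of a coordinate line (demo input only)."""
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}".format(
        record, serial, name, res_name, chain, res_num)


# Two atoms of the same made-up residue: the serial numbers differ,
# the residue number does not.
lines = [
    make_line("ATOM", 17, " N", "ALA", "A", 3),
    make_line("ATOM", 18, " CA", "ALA", "A", 3),
]
records = [parse_atom_line(line) for line in lines]
for rec in records:
    print(rec.serial_number, rec.atom_name, rec.residue_name, rec.residue_number)
```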

 6.   I also seem to be seeing all of the bonds (for all residues) being
 written out in CONECT records – such that they all appear as single
 bonds in eg PyMOL – is this expected behaviour at the moment?

 Another one for Roger.

I believe this should work fine.  RDKit's PDB file writer by default
encodes the bond orders, which should be interpreted by PyMOL.  In the
words of the late great Warren:
http://www.phenix-online.org/pipermail/phenixbb/2008-April/012188.html

We need to check where the bond orders are getting lost.  If you read
the PDB file back with RDKit's PDB file reader and write out the SMILES,
does it have double bonds?
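
A second, toolkit-independent way to look for the lost bond orders is to
inspect the CONECT records themselves.  The sketch below assumes the
convention discussed in this thread, where listing the same partner atom
more than once encodes the bond order; conect_bond_orders() is a
hypothetical helper written for this message, not an RDKit function.

```python
from collections import Counter


def conect_bond_orders(pdb_text):
    """Map (serial_a, serial_b) with a < b to the largest repeat count seen."""
    orders = {}
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Serial numbers sit in 5-character fields starting at column 7.
        fields = [line[i:i + 5] for i in range(6, len(line), 5)]
        serials = [int(f) for f in fields if f.strip()]
        if len(serials) < 2:
            continue
        source, partners = serials[0], Counter(serials[1:])
        for partner, count in partners.items():
            key = (min(source, partner), max(source, partner))
            orders[key] = max(orders.get(key, 0), count)
    return orders


def conect_line(*serials):
    """Build a CONECT line from atom serial numbers (demo input only)."""
    return "CONECT" + "".join("{:>5}".format(n) for n in serials)


sample = "\n".join([
    conect_line(11, 12, 14),
    conect_line(12, 13, 13, 17),   # atom 13 listed twice: a double bond
])
for bond, order in sorted(conect_bond_orders(sample).items()):
    print(bond, order)
```

If every pair comes back with an order of 1, the bond orders were lost
before or during writing rather than in the viewer.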


I hope this helps.

Many thanks again to Greg for all the code polishing described above.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science  
Park, Cambridge CB4 0EY



Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread sereina riniker
Hi James,

Regarding the AssignBondOrdersFromTemplate() method:
As far as I understand, the PDB reader assigns bond orders to the amino
acids in a protein, but if a ligand is present it sets all of the
ligand's bonds to SINGLE, as automatic bond-type perception is not
trivial (see Roger's comments). However, one usually knows which ligand
was crystallized (i.e. the SMILES is available), so the
AssignBondOrdersFromTemplate() method can be used to set the bond orders
based on the known ligand structure. This is the idea of the method.

Now, to your real-world application. I'm sorry, but I don't think I
understand it completely. Do you want to set only the bond orders of a
specific substructure? Or would you like to give the function a set of
ligands and a set of templates, and have it figure out which template
belongs to which ligand and set the bond orders accordingly?

Best,
Sereina



2013/10/24 Greg Landrum greg.land...@gmail.com

 James,

 On Thu, Oct 24, 2013 at 7:27 PM, James Davidson
 j.david...@vernalis.com wrote:

  Hi Greg (et al.),

 Thanks for the beta!  I have been going through some of the
 recently-added functionality, and had a couple of questions regarding the
 PDB reading / writing.


 Thanks for the bug reports!

 1.   Do I remember correctly that there was a proposal (from
 Roger) to add some auto bond-type perception to the PDB parser for ligands
 (or is that just wishful thinking!)?

 Roger will have to confirm this, but I believe he said something along the
 lines of that way lies madness.

 2.   If not, I notice that there is an
 AssignBondOrdersFromTemplate() method – but the example in the doc-string
 only shows (I think) the case where the input PDB is just a single small
 molecule – so the matching is pretty easy!  I think a more real-world case
 is when one wants to set the bond orders for multiple ligands (HETATM
 residues) based on substructure matches – which will then return an atom
 index selection that can be used as a start point.  Is there any way to
 have the AssignBondOrdersFromTemplate() convenience function optionally
 accept a list of atom indexes to specify a substructure?

 Sereina? Is that doable?

 

 3.   Is there some explanation for what the ‘flavor’ option does
 for reading/writing PDB?

 I'm not sure about the reader. Roger, can you answer that?

 This is what's in the C++ for the PDBWriter:
 // PDBWriter support multiple flavors of PDB output
 // flavor  1 : Write MODEL/ENDMDL lines around each record
 // flavor  2 : Don't write any CONECT records
 // flavor  4 : Write CONECT records in both directions
 // flavor  8 : Don't use multiple CONECTs to encode bond order
 // flavor  16 : Write MASTER record
 // flavor  32 : Write TER record

 This is now in the docs for both the Python and C++ code.

 

 4.   Having read in a PDB file I see the correct atoms flagged
 as HETATM (from GetIsHeteroAtom()).  But when I call Chem.MolToPDBBlock()
 these atoms get written as ATOM records…  Also, a Chem.MolToPDBFile()
 method would be nice for completeness / symmetry : )

 The HETATM thing was the result of a dumb copy and paste error from me.
 It's fixed.

 Re: Chem.MolToPDBFile(): that's missing because there's no corresponding
 Chem.MolToMolFile(). This is an odd oversight, which I've now fixed.

 

 5.   It seems to me that GetResidueNumber() and
 GetSerialNumber() may have got mixed-up at some point(?).  At least, when I
 call GetSerialNumber() I see what appears to be the residue number; and
 when I call GetResidueNumber() I get “0”!

 This was another dumb bug from me. It's fixed.

 

 6.   I also seem to be seeing all of the bonds (for all
 residues) being written out in CONECT records – such that they all appear
 as single bonds in eg PyMOL – is this expected behaviour at the moment?

 Another one for Roger.

 -greg



 --
 October Webinars: Code for Performance
 Free Intel webinars can help you accelerate application performance.
 Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
 from
 the latest Intel processors and coprocessors. See abstracts and register 
 http://pubads.g.doubleclick.net/gampad/clk?id=60135991iu=/4140/ostg.clktrk
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



[Rdkit-discuss] Friday pandas q

2013-10-25 Thread George Papadatos
Question to rdkit pandas users (pandaskitters?):

I managed to have the mol_send(m) object in a pandas frame:
[image: Inline images 1]
if I do this: data['mol'].map(str).map(Chem.Mol)
I get the mol in base64 PNG:

[image: Inline images 2]

How do I display the column as rendered images (and keep them internally as
a Series of rdmols) ?

PandasTools.ChangeMoleculeRendering seems relevant but I can't get it to
display the mols

Cheers,

George


Re: [Rdkit-discuss] Friday pandas q

2013-10-25 Thread George Papadatos
It worked! Many thanks!
g


On 25 October 2013 16:18, Greg Landrum greg.land...@gmail.com wrote:

 Hi George,

 Nikolas is really the expert here, but this just worked for me:

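 # substructure search in the RDKit cartridge; mol_send() returns the
 # binary form of each matching molecule
 curs.execute('select molregno,mol_send(m) from rdk.mols where m@>%s',
              ('c12c1nncc2',))
 d = curs.fetchall()

 df2 = pd.DataFrame(d,columns=('molregno','pkl'))
 # rebuild RDKit molecules from the binary strings
 df2['romol']=df2.apply(lambda x:Chem.Mol(str(x['pkl'])),axis=1)

 # switch off the HTML escaping in pandas so the molecules render as images
 PandasTools.RenderImagesInAllDataFrames()
 del df2['pkl']
 df2.head(2)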

 -greg








Re: [Rdkit-discuss] Friday pandas q

2013-10-25 Thread Nikolas Fechner

 
  
Hi George,

Glad that Greg already helped you get it working. Just to add some information: the methods RenderImagesInAllDataFrames and ChangeMoleculeRendering are related in their effects but do slightly different things.

RenderImagesInAllDataFrames patches the core DataFrame class of pandas to disable the HTML escaping, which is enabled by default in pandas, so this method affects ALL dataframes. Seeing HTML code in a dataframe is an indicator that this pandas patching was not done. The method is called inside AddMoleculeColumnToFrame, but if you get your molecule objects into your dataframe in another way you have to call it explicitly. I think this is exactly what was going on in your case.

ChangeMoleculeRendering mainly patches the string representation of molecule objects. But you can pass the dataframe object to this method as well to patch its HTML escaping, which affects only this single dataframe instance and, in contrast to RenderImagesInAllDataFrames, not all future ones.

Just as a side note: the reason that AddMoleculeColumnToFrame patches the general pandas behaviour is that many dataframe methods (like head, tail, ...) return new dataframe instances that would not inherit the HTML escaping setting if this was only patched for the single dataframe object.
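
To make the escaping point concrete, here is a small standard-library illustration of what HTML escaping does to an image tag. It uses neither pandas nor RDKit, and the tag is a shortened, made-up stand-in for the real molecule depiction.

```python
import html

# A shortened, made-up stand-in for the <img> tag holding a molecule depiction.
cell = '<img src="data:image/png;base64,iVBORw0KGgo=" alt="Mol"/>'

# With escaping on, the markup is turned into literal text, so the
# notebook shows the tag (and the base64 string) instead of the picture.
escaped = html.escape(cell)
print(escaped)

# With escaping off, the original markup reaches the browser unchanged.
print(html.unescape(escaped) == cell)
```

In pandas itself the same switch is the escape argument of DataFrame.to_html(), which defaults to True.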
   
  
   
   
  
Cheers,
Niko
  
   
   
  


[Rdkit-discuss] RDkit, OS X 10.9 and clang++

2013-10-25 Thread William G. Scott
Dear RDkit community:

I’ve been maintaining a fink package for RDkit (primarily as a dependency for 
coot).  

cf:  http://tinyurl.com/rdkitfink

It compiles on OS X 10.6, 10.7 and 10.8, but not on 10.9.  (This includes the 
2013_9 pre-release, FWIW.)

With 10.9, the migration to clang++ is upon us, and I’m stuck.

Has anyone succeeded in getting rdkit compiled on 10.9, and if so, how?  

Also, if anyone has feedback or recommendations  for how to improve the fink 
rdkit package, please let me know.

Thanks in advance.

Bill Scott





William G. Scott
Professor
Department of Chemistry and Biochemistry
and The Center for the Molecular Biology of RNA
228 Sinsheimer Laboratories
University of California at Santa Cruz
Santa Cruz, California 95064
USA
 


Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread James Davidson
Hi Sereina,

Sereina wrote:
 Regarding the AssignBondOrdersFromTemplate() method:
 As far as I understood, the PDB reader assigns bond orders to the amino acids 
 in a protein, but if a ligand is present it puts all bonds of it to SINGLE 
 bonds as auto bond-type perception is not trivial (see Roger's comments).
 However, usually one knows which ligand was crystallized (i.e. the SMILES is 
 available), so the AssignBondOrdersFromTemplate() method can be used to set 
 the bond orders based on the known ligand structure.
 This is the idea of the method. Now, to your real-world application. I'm 
 sorry but I don't think I understand it completely. Do you want to set only 
 the bond orders of a specific substructure?
 Or would you like to give the function a set of ligands and a set of 
 templates and it figures out which template belongs to which ligand and sets 
 the bonds orders accordingly? 

This is very likely to be me being stupid - so please bear with me!
If I read in a complex (pdb), and already have my reference ligand (lig), then 
AllChem.AssignBondOrdersFromTemplate(lig, pdb) fails because the reference 
ligand has not been matched to the ligand in the pdb 'complex' (dot-separated 
list of molecules).
The doc-string states that the method works on two molecules - but I want to 
work on a reference molecule (lig) and a *substructure* of the macromolecule 
(pdb).  How should I be getting the bound ligand out as a molecule object to 
then use the AssignBondOrdersFromTemplate() method?  Am I missing some new 
PDB-related methods, or have I forgotten some fundamental RDKit methods for 
dealing with multi-component molecules?

I guess a sensible process would be:
1. Identify any HETATM residues
2. For each residue (or at least those that have bonds!) extract or copy the 
mol (unless it can be addressed 'in place'?)
3. Use AssignBondOrdersFromTemplate() - relying on lookup by e.g. residue name, 
etc.
4. Insert the molecule back into the complex (or update the info if it has been 
modified 'in place')

Is this how the method is intended to be used with complexes (and if so, do you 
have an example for steps 2 and 4)?
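
Step 1 of the process above can at least be sketched without any 
toolkit, by grouping the HETATM lines of the PDB text by residue.  The 
helper names and the sample records below are made up for illustration; 
steps 2 to 4 need toolkit support and are not covered here.

```python
def hetatm_residues(pdb_text):
    """Group HETATM atom serial numbers by (residue name, chain, residue number)."""
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        key = (line[17:20].strip(), line[21], int(line[22:26]))
        residues.setdefault(key, []).append(int(line[6:11]))
    return residues


def make_line(record, serial, name, res_name, chain, res_num):
    """Build the leading fields of a PDB coordinate line (demo input only)."""
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}".format(
        record, serial, name, res_name, chain, res_num)


# Made-up records: one protein atom, a two-atom ligand and a water.
demo = "\n".join([
    make_line("ATOM", 1, " CA", "GLY", "A", 1),
    make_line("HETATM", 2, " C1", "LIG", "A", 301),
    make_line("HETATM", 3, " O1", "LIG", "A", 301),
    make_line("HETATM", 4, " O", "HOH", "A", 401),
])
for key, serials in hetatm_residues(demo).items():
    print(key, serials)
```

Residues with a single atom (waters, ions) have no bonds and could be 
skipped before any template matching.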

Thanks

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the Company address and 
registration details link at the bottom of the page..
__



Re: [Rdkit-discuss] RDkit, OS X 10.9 and clang++

2013-10-25 Thread Igor Filippov
There is no g++ for OSX 10.9 at all?
Would one of these work by any chance?
http://sourceforge.net/projects/hpc/files/hpc/gcc/

Igor





Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread Paul Emsley
On 25/10/13 08:09, James Davidson wrote:
 Hi Roger,

 Thanks for the response

 The use of an integer file format flavor argument allows the caller to
 customize the behavior of the readers and writers.  The semantics is that a
 reasonable default is zero (for all bits), but that new features may be added
 without changing the API/ABI.
 Most of the bits above (for the writer) control strict compliance with the
 PDB format specification.  For example, a flavor of 12 will write bond orders
 the way the RCSB expects them both throwing away bond orders and increasing
 the size of the PDB file.

 As a test, I am using 2VCI, and am retrieving the PDB data from the RCSB
 using the following

 import requests
 url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
 response = requests.get(url)
 pdb_block = response.content
 response.close()


 pdb_block shows CONECT records only for the HETATM records.
 If I now read into RDKit, using the defaults, and write back out using the 
 defaults, I see CONECT records for every atom (ie protein as well).  And I 
 can't see any double-bonds rendered in PyMOL:

 from rdkit import Chem
 from rdkit.Chem import AllChem
 pdb = Chem.MolFromPDBBlock(pdb_block)
 pdb_block_out = Chem.MolToPDBBlock(pdb)

 First 10 CONECT records of output:
 CONECT    1    2
 CONECT    2    3    5
 CONECT    3    4    4   10
 CONECT    5    6
 CONECT    6    7
 CONECT    7    8    8    9
 CONECT   10   11
 CONECT   11   12   14
 CONECT   12   13   13   17
 CONECT   14   15   16


 If I use Chem.MolToPDBBlock(pdb, flavor=12) I do, indeed see the ligand 
 CONECT records in what looks like the original format (albeit now numbered 
 differently), and I still see CONECT records for the protein - but this PDB 
 *will* render double bonds in PyMOL.

 First 10 CONECT records of output:
 CONECT    3    4    4
 CONECT    7    8    8
 CONECT   12   13   13
 CONECT   19   20   20
 CONECT   23   24   24
 CONECT   28   29   29
 CONECT   35   36   36
 CONECT   38   39   39
 CONECT   40   42   42
 CONECT   41   43   43


If I may be so bold, I believe an important part of the puzzle is 
missing.  The residue-name/3-letter-code/comp-id in the PDB file is a 
pointer to an entry in the mmCIF-formatted chemical component dictionary 
that describes the compound, for all compounds for all entries released 
by the PDB.

http://deposit.pdb.org/cc_dict_tut.html

If this is an internal PDB file there will very likely be a similar 
mmCIF file used for crystallographic refinement.

Only when these options fail would I consider turning to bond-order 
perception and CONECT records.

Paul.




Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread sereina riniker
Hi James,

Okay, now it's clear. I somehow (wrongly) thought the PDB reader would give
you the protein and the ligand as two molecules and then it wouldn't have
been a problem... I will discuss with Greg on how to best do this and get
back to you.

Best,
Sereina





Re: [Rdkit-discuss] RDkit, OS X 10.9 and clang++

2013-10-25 Thread greg landrum
Not an easy one. Since I don't have a Mac with 10.9 installed, I can't try it 
out either.

I have been able to build the rdkit with clang++ in the past without problems 
and could certainly give that a try on a Linux box. Which version of clang are 
you using?

What are you seeing for error messages when you try a build?

-greg




Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread Andrew Dalke
On Oct 25, 2013, at 10:11 AM, Roger Sayle wrote:
 The use of an integer file format flavor argument allows the caller  
 to customize the behavior of the readers and writers.  The semantics
 is that a reasonable default is zero (for all bits), but that new
 features may be added without changing the API/ABI.

For some background, this is the API style used by OpenEye's
high-level readers and writers. There's more explanation at:

http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/molreadwrite.html#flavored-input-and-output

It solves a difficult problem, which is that there is no
such thing as the PDB format. (For that matter, there are
also variations of the MDL format, if only because the
output writer could use V3000 format for all cases, vs. V3000
only when V2000 can't support the structure.)

RDKit also supports different input and output flavors, though
it uses parameter attributes, like sanitize=False or
removeHs=False for reading an SD file.

OEChem's interface is more generic, in that the single 'flavor'
parameter exists for the high-level readers, which is easier
to pass around in a C++ toolkit.

(OTOH, this is less important for Python code. In chemfp, I
just pass around a Python dictionary of kwargs and apply
it like: SDMolSupplier(filename, **kwargs). )
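
A minimal sketch of that kwargs pass-through, under stated assumptions: the
intermediate layer never names the reader options, it only forwards them.
fake_sd_supplier and open_structures are names invented for this example so
that it runs without RDKit; with RDKit installed one would pass
Chem.SDMolSupplier as the reader instead.

```python
def fake_sd_supplier(filename, sanitize=True, removeHs=True):
    """Hypothetical stand-in for a reader such as RDKit's SDMolSupplier."""
    return {"filename": filename, "sanitize": sanitize, "removeHs": removeHs}


def open_structures(filename, reader=fake_sd_supplier, reader_kwargs=None):
    """Intermediate layer: forwards reader options without inspecting them."""
    kwargs = dict(reader_kwargs or {})
    return reader(filename, **kwargs)


# The caller decides the reader options once and passes them as a dictionary.
options = {"sanitize": False, "removeHs": False}
result = open_structures("input.sdf", reader_kwargs=options)
print(result)
```

A new reader option then only has to be known at the two ends, the caller
and the reader, and not in any layer in between.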


However, these integer flags are tricky to use in practice.

For example, if you see flavor=49, what does it mean? Few
people will be able to look at that number and know it's:

  bit  1 = Write MODEL/ENDMDL lines around each record
  bit 16 = Write MASTER record
  bit 32 = Write TER record

For OEChem support, I ended up writing my own conversion
routines between the integer and a string notation. After
all, I would rather people do:

  rdkit2fps input.pdb --flavor 'MASTER|MODEL|TER'

than have to do bitwise or-ing themselves for:

  rdkit2fps input.pdb --flavor 49
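
A minimal sketch of such a conversion between the integer and a string
notation. The names MODEL, MASTER and TER and the helper functions are
assumptions made for this example; only the bit values 1, 16 and 32 come
from the list above.

```python
# Hypothetical names for the three writer bits listed above.
FLAVOR_BITS = {"MODEL": 1, "MASTER": 16, "TER": 32}


def parse_flavor(text):
    """Convert 'MASTER|MODEL|TER' (or a plain integer string) to an int."""
    text = text.strip()
    if text.isdigit():
        return int(text)
    value = 0
    for name in text.split("|"):
        name = name.strip().upper()
        if name not in FLAVOR_BITS:
            raise ValueError("unknown flavor name: %r" % name)
        value |= FLAVOR_BITS[name]
    return value


def format_flavor(value):
    """Convert an integer flavor back to a '|'-separated list of names."""
    names = [name for name, bit in FLAVOR_BITS.items() if value & bit]
    # Any bits without a known name are kept as a trailing number.
    unknown = value & ~sum(FLAVOR_BITS.values())
    if unknown:
        names.append(str(unknown))
    return "|".join(names) if names else "0"


print(parse_flavor("MASTER|MODEL|TER"))
print(format_flavor(49))
```

On a command line the string form needs quoting, since an unquoted '|' is
a shell pipe.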


Bitflags also don't mix well with non-binary states.
Consider an SD file writer which supports a three-state option:
 - only V2000 output (ignore or generate corrupt records otherwise?)
 - V3000 output if required, otherwise V2000
 - always V3000

It's of course possible to encode this using 2 bits, but it
loses some of its elegance.
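
To make the 2-bit encoding concrete, here is a sketch under an assumed
layout: the two lowest bits of the flavor hold the V2000/V3000 policy. The
constant and function names are hypothetical. Note that the fourth bit
pattern has no meaning and has to be rejected, which is part of the lost
elegance.

```python
# Hypothetical layout: bits 0-1 of the flavor hold the V2000/V3000 policy.
MDL_POLICY_MASK = 0b11
MDL_V2000_ONLY = 0
MDL_V3000_IF_REQUIRED = 1
MDL_V3000_ALWAYS = 2

POLICY_NAMES = {
    MDL_V2000_ONLY: "V2000 only",
    MDL_V3000_IF_REQUIRED: "V3000 if required",
    MDL_V3000_ALWAYS: "always V3000",
}


def get_mdl_policy(flavor):
    """Extract the three-state policy from the two policy bits."""
    policy = flavor & MDL_POLICY_MASK
    if policy not in POLICY_NAMES:
        # The pattern 0b11 is representable but has no meaning.
        raise ValueError("invalid V2000/V3000 policy bits: %d" % policy)
    return POLICY_NAMES[policy]


def set_mdl_policy(flavor, policy):
    """Return a copy of flavor with the policy bits replaced."""
    if policy not in POLICY_NAMES:
        raise ValueError("invalid V2000/V3000 policy: %r" % (policy,))
    return (flavor & ~MDL_POLICY_MASK) | policy


flavor = set_mdl_policy(4, MDL_V3000_ALWAYS)  # bit value 4 is some other flag
print(flavor, get_mdl_policy(flavor))
```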

Think, though, of RDKit's SMILES file reader. It supports a
'delimiter' option, in order to support space, tab, comma,
and I presume other delimiters as well. It also supports
the ability to say that the SMILES come from something other
than the first column, and the name from something other than
the second.

These are even harder to encode in a single flavor.

BTW, OEChem doesn't support a delimiter option. Their 'SMILES
file' comes from the Daylight practice of

  SMILES + whitespace + rest_of_line_as_title

vs. the RDKit practice of assuming the file is a set of
delimited columns, with a possible header.
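
The two conventions can be sketched in a few lines of plain Python. This is
an illustration only, not the parsing code of either toolkit; the function
and parameter names are made up for the example.

```python
def parse_daylight_line(line):
    """Daylight practice: SMILES + whitespace + rest of line as title."""
    parts = line.rstrip("\n").split(None, 1)
    if not parts:
        raise ValueError("blank line")
    smiles = parts[0]
    title = parts[1] if len(parts) > 1 else ""
    return smiles, title


def parse_delimited_line(line, delimiter="\t", smiles_column=0, name_column=1):
    """Delimited columns; the caller says which column holds what."""
    fields = line.rstrip("\n").split(delimiter)
    return fields[smiles_column], fields[name_column]


# The title may contain spaces and commas in the Daylight convention.
print(parse_daylight_line("c1ccccc1O phenol, 99% pure\n"))

# A comma-delimited file with the name first and the SMILES second.
print(parse_delimited_line("phenol,c1ccccc1O,94.11\n", ",",
                           smiles_column=1, name_column=0))
```

The second function needs three independent settings, one of them a
string, which is what makes this reader hard to fit into one integer.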


Roger said above that a reasonable default is zero (for all
bits), but that new features may be added without changing
the API/ABI.

Most file formats work nicely with binary flags, as OEChem's
practice well shows. Some do not, as RDKit's SMILES file
format suggests.

There are other possible APIs which can handle the requirement of
supporting new features without changing the API/ABI.

RDKit's current method, that of passing additional arguments
to the function or constructor, is not scalable. I may have
multiple layers before I get to the actual reader or writer,
and I don't want to update the intermediate APIs every time
something changes.

I think it's very interesting that OEChem's new InChI
support (added only recently, so Roger might not know about
it), takes an InChIOptions object.

http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/OEChemClasses/OEInChIOptions.html

OEInChIOptions(unsigned int flavor = OEOFlavor::INCHI::Default)

with methods like:
  .GetChiral()
  .GetFixedHLayer()
 ...
  .SetChiral()
  .SetFixedHLayer()
 ...

I don't know why they switched to this style for this case.
I wonder if part of it was to insulate themselves from any
odd specifications InChI might add in the future.

I prefer this style - an instance which contains the different
parameters - though I haven't used it in earnest.
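
A sketch of what such an options instance might look like in Python.
ReaderOptions and the two functions are hypothetical names for this
example and are not part of RDKit or OEChem.

```python
from dataclasses import dataclass


@dataclass
class ReaderOptions:
    """Hypothetical bundle of reader settings; not a real toolkit class."""
    sanitize: bool = True
    remove_hs: bool = True
    flavor: int = 0


def low_level_reader(filename, options):
    """Stand-in for the actual reader; it just reports what it was given."""
    return "%s sanitize=%s remove_hs=%s flavor=%d" % (
        filename, options.sanitize, options.remove_hs, options.flavor)


def middle_layer(filename, options=None):
    """Intermediate layer: passes one object along, whatever it contains."""
    if options is None:
        options = ReaderOptions()
    return low_level_reader(filename, options)


print(middle_layer("input.pdb", ReaderOptions(flavor=1)))
```

Adding a field to ReaderOptions later does not change the signature of
middle_layer.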

This style too has difficulties, especially in C++. Ideally
you want to support programs which work with both, say, version
2013 (without a given feature and associated method) and version
2014 (with). You can't do that in a language like C++ which
requires all methods to be resolved in order for the program
to run.

The XMLReader API supports a 'getFeature(name)' and associated
'getProperty()'/'setProperty()', which might provide the right
generic API.
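
Python's standard xml.sax module follows the SAX XMLReader interface, which
is assumed here to be the API meant above. A short sketch of the
name-based access; the "not-a-real-feature" URI is made up for the example.

```python
import xml.sax
from xml.sax.handler import feature_namespaces

parser = xml.sax.make_parser()

# Features are addressed by name, so a new feature needs no new method.
parser.setFeature(feature_namespaces, True)
namespaces_enabled = parser.getFeature(feature_namespaces)

# An unknown feature name is reported at run time, where the caller can
# handle it, instead of as a method that fails to resolve.
try:
    parser.getFeature("http://example.org/features/not-a-real-feature")
    unknown_supported = True
except xml.sax.SAXNotRecognizedException:
    unknown_supported = False

print(namespaces_enabled, unknown_supported)
```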

That said, you should read my email as commentary, and not
as a statement for or against the current code. While I don't
like it that much, bit flags without doubt do work for this
task. And because of C++ overloading, there's also a migration
path to support an options class API like I promoted just now.


Andrew

Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread ro...@nextmovesoftware.com

Hi James,

There's something very strange going on here with PyMOL.

On Oct 25, 2013, at 1:09 PM, James Davidson wrote:
 I can't see any double-bonds rendered in PyMOL:
 CONECT    3    4    4   10

Here atom 3 has two bonds to atom 4.  Why isn't it displayed double?

 This PDB *will* render double bonds in PyMOL.
 CONECT    3    4    4

As expected.

 (and, again, I also see double bonds in PyMOL).
 CONECT    3    2    4   10

No explicit double bond.  Where is the double bond coming from?


I'd expect two of the above cases to show double bonds, and one to only
have single bonds.  What is confusing is that which is which doesn't make
any sense.
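
For reference, a minimal sketch of reading bond orders from a CONECT
record. It assumes standard fixed-width records (the atom serial numbers in
5-character columns after the record name) and the convention used above
that a partner listed twice means a double bond. The function name is
invented for this example.

```python
from collections import Counter


def conect_bond_orders(line):
    """Return (source serial, {partner serial: times listed})."""
    if not line.startswith("CONECT"):
        raise ValueError("not a CONECT record")
    line = line.rstrip()
    # Serial numbers occupy consecutive 5-character fields after column 6.
    fields = [line[i:i + 5] for i in range(6, len(line), 5)]
    serials = [int(field) for field in fields if field.strip()]
    if not serials:
        raise ValueError("CONECT record without atom serial numbers")
    source, partners = serials[0], serials[1:]
    return source, dict(Counter(partners))


# Atom 3 lists atom 4 twice and atom 10 once.
print(conect_bond_orders("CONECT    3    4    4   10"))
```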


 Can you (or Greg) post a list of what the current input flavors do?

Currently the reader only has a single flavor...
flavor 1 : Read alternate locations, XPLOR/NMR pseudo atoms, and PDB
dummy residues.

By default the PDB file reader only returns atoms with alternate location
fields of space, 'A' or '1'.  It also ignores atoms with co-ordinates
9999.000, 9999.000, 9999.000 that appear in XPLOR output for leaving group
atoms in covalently bonded ligands.  Likewise, it ignores atoms with atomic
symbol Q, which are typically dummy atoms used as refinement constraints
in NMR refinement.

If the flavor parameter has the value 1, all these pseudo-atoms are read
into the RDKit::ROMol, but clearly their semantics isn't understood by the
rest of the toolkit.  Valences will be incorrect, and a protein with
multiple alternate sidechain conformations for some residues will likely
fail sanitization.
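
As an illustration of the alternate location rule only, here is a sketch
in plain Python.  It is not the RDKit parser code, it leaves out the
pseudo-atom and dummy residue checks, and the function name is made up.
It assumes the standard PDB layout, where the alternate location
indicator is column 17 of an ATOM or HETATM record.

```python
DEFAULT_ALTLOCS = (" ", "A", "1")


def keep_atom_record(line, flavor=0):
    """Return True if an ATOM/HETATM line passes the alternate location rule."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return False
    if flavor & 1:
        return True  # flavor 1: keep every alternate location
    # Column 17 (index 16) holds the alternate location indicator.
    altloc = line[16] if len(line) > 16 else " "
    return altloc in DEFAULT_ALTLOCS


records = [
    "ATOM      1  N   ALA A   1",  # no alternate location
    "ATOM      2  CA AALA A   1",  # alternate location A
    "ATOM      3  CA BALA A   1",  # alternate location B
]
for record in records:
    print(record[16], keep_atom_record(record), keep_atom_record(record, flavor=1))
```

With the default flavor the 'B' copy of the CA atom is dropped; with
flavor 1 both copies are kept, which is how the duplicate atoms and bad
valences mentioned above can arise.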



I hope this helps.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science  
Park, Cambridge CB4 0EY




Re: [Rdkit-discuss] RDkit, OS X 10.9 and clang++

2013-10-25 Thread William G. Scott
Hi Greg:

I’ve just placed two log files on http://fennario.ucsc.edu/~wgscott/temp/rdkit/

One was generated using the /usr/bin/g++ compiler on 10.9, i.e.,

zsh-% /usr/bin/g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix

which seems to be the same as

zsh-% /usr/bin/clang++ --version
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix


The other one, which gets a bit further, was generated with fink’s g++ version 
4.8 compiler in /sw/bin/g++, i.e.,

zsh-% /sw/bin/g++-4  --version
g++-4 (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.


However, as a positive control, I just installed the most recently available 
compilers for 10.8, which still work fine to compile rdkit:

fennario-% /usr/bin/g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.76) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin12.5.0
Thread model: posix

It looks almost the same as the one for 10.9, so I am even more stumped than 
before.

Bill


On Oct 25, 2013, at 11:23 AM, greg landrum greg.land...@gmail.com wrote:

 Not an easy one. Since I don't have a Mac with 10.9 installed, I can't try it 
 out either.
 
 I have been able to build the rdkit with clang++ in the past without problems 
 and could certainly give that a try on a Linux box. Which version of clang 
 are you using?
 
 What are you seeing for error messages when you try a build?
 
 -greg
 
 On 25 Oct 2013, at 18:43, William G. Scott wgsc...@ucsc.edu wrote:
 
 Dear RDkit community:
 
 I’ve been maintaining a fink package for RDkit (primarily as a dependency 
 for coot).  
 
 cf:  http://tinyurl.com/rdkitfink
 
 It compiles on OSX 10.6, 10.7 and 10.8, but not 10.9.  (This includes 
 the 2013_9 pre-release, FWIW.)
 
 With 10.9, the migration to clang++ is upon us, and I’m stuck.
 
 Has anyone succeeded in getting rdkit compiled on 10.9, and if so, how?  
 
 Also, if anyone has feedback or recommendations  for how to improve the fink 
 rdkit package, please let me know.
 
 Thanks in advance.
 
 Bill Scott
 
 
 
 
 
 William G. Scott
 Professor
 Department of Chemistry and Biochemistry
 and The Center for the Molecular Biology of RNA
 228 Sinsheimer Laboratories
 University of California at Santa Cruz
 Santa Cruz, California 95064
 USA
 




Re: [Rdkit-discuss] RDkit, OS X 10.9 and clang++

2013-10-25 Thread Greg Landrum
On Fri, Oct 25, 2013 at 11:06 PM, William G. Scott wgsc...@ucsc.edu wrote:

 Hi Greg:

 I’ve just placed two log files on
 http://fennario.ucsc.edu/~wgscott/temp/rdkit/


The error messages look really familiar, but I unfortunately can't find a
reference for them. I will keep dredging around though.

One was generated using the /usr/bin/g++ compiler on 10.9, i.e.,

 zsh-% /usr/bin/g++ --version
 Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
 --with-gxx-include-dir=/usr/include/c++/4.2.1
 Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
 Target: x86_64-apple-darwin13.0.0
 Thread model: posix

 which seems to be the same as

 zsh-% /usr/bin/clang++ --version
 Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
 Target: x86_64-apple-darwin13.0.0
 Thread model: posix


 The other one, which gets a bit further, was generated with fink’s g++
 version 4.8 compiler in /sw/bin/g++, i.e.,

 zsh-% /sw/bin/g++-4  --version
 g++-4 (GCC) 4.8.2
 Copyright (C) 2013 Free Software Foundation, Inc.


 However, as a positive control, I just installed the most recently
 available compilers for 10.8, which still works fine to compile rdkit

 fennario-% /usr/bin/g++ --version
 Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
 --with-gxx-include-dir=/usr/include/c++/4.2.1
 Apple LLVM version 5.0 (clang-500.2.76) (based on LLVM 3.3svn)
 Target: x86_64-apple-darwin12.5.0
 Thread model: posix

 It looks almost the same as the one for 10.9, so I am even more stumped
 than before.


Any chance you could try on the 10.9 system with either a different version
of boost (1.51?) or the new RDKit beta?
We've got some evidence that it's not the compiler (I also did a clang 3.3
build on my Linux box without problems), so trying an alternate boost
version seems like the next logical step.

Sorry for the inconvenience,
-greg