Re: [Rdkit-discuss] adding fragment to existing molecule

2017-08-07 Thread Greg Landrum
The other answers on this thread have been right on point with the
exception of neglecting to explicitly encourage you to call
Chem.SanitizeMol() on your joined molecule before you do anything else with
it. In your case you'd call:
Chem.SanitizeMol(back)
This will lead to the error that Nik explains...

Christos (in a different reply) provides the two possible strategies I'd
recommend for combining fragments:
1) using dummy atoms to mark the attachment points on the fragments,
combining the fragments, forming a bond between the atoms that are bonded to
the dummy atoms, and then removing the dummy atoms
2) using a chemical reaction. I'd probably also use dummy atoms here to
mark attachment points.
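
A minimal sketch of strategy 1, assuming each fragment carries exactly one
dummy atom written as [*] in its SMILES. The helper name join_at_dummies and
the two toy fragments are made up for illustration; this ignores 3D
coordinates entirely and only shows the graph editing:

```python
from rdkit import Chem

def join_at_dummies(frag1, frag2):
    """Join two fragments that each carry exactly one dummy atom ([*])."""
    rw = Chem.RWMol(Chem.CombineMols(frag1, frag2))
    dummies = [a.GetIdx() for a in rw.GetAtoms() if a.GetAtomicNum() == 0]
    if len(dummies) != 2:
        raise ValueError("expected exactly one dummy atom per fragment")
    # The real attachment atoms are the (single) neighbors of the dummies.
    anchors = [rw.GetAtomWithIdx(d).GetNeighbors()[0].GetIdx() for d in dummies]
    rw.AddBond(anchors[0], anchors[1], Chem.BondType.SINGLE)
    # Remove the higher index first so the lower index stays valid.
    for d in sorted(dummies, reverse=True):
        rw.RemoveAtom(d)
    mol = rw.GetMol()
    Chem.SanitizeMol(mol)  # recompute valences, implicit Hs, ring info
    return mol

joined = join_at_dummies(Chem.MolFromSmiles("C[*]"), Chem.MolFromSmiles("O[*]"))
print(Chem.MolToSmiles(joined))
```

The fragments here have no explicit Hs, so sanitization can work out the
hydrogen counts after the new bond is formed.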

Note that this whole process will give you a single molecule with the two
fragments combined and bonded to each other with a bond having the
appropriate length (you set that explicitly), but it won't orient the
fragments automatically. When you actually connect the fragments, you'd
probably like the two bonds from the atoms being connected to their dummy
atoms to be antiparallel to each other. Once you've done the geometry to
figure out the appropriate rotations, the RDKit has the code required to do
the rotations. Note that you should do this on one of the fragments before
combining them.

(very) pseudo code for what I think you want to do:

   1. Add Hs to fragments that have attachment points.
   2. Generate conformations for each fragment
   3. Figure out the connection vectors for each fragment. This is the
   vector from the atom to be connected to its dummy atom. Let's call these
   CV1 (for fragment 1) and CV2 (for fragment 2)
   4. Translate fragment 2 to the origin
   5. Rotate fragment 2 so that CV2 is anti-parallel to CV1
   6. Translate fragment 2 so that its atom to be connected is at the same
   position as the dummy atom from fragment 1 (note: you could also set the
   target bond length now, skipping step 11 below, but that's a bit of
   additional prep work)
   7. Combine fragment 1 and fragment 2 to form mol 1
   8. Remove the attachment points in mol 1
   9. Form a bond between the two atoms that should be connected in mol 1
   10. Sanitize mol 1
   11. Set the bond length

If you use the reaction based approach, you'd skip steps 7-9 because mol 1
would come from the reaction.
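
The geometry in steps 3-6 can be sketched in plain Python, independent of the
RDKit. This is only an illustration under my own assumptions: coordinates are
plain [x, y, z] lists, "translate to the origin" is read as putting fragment
2's connecting atom at the origin, and the function names (rotation_onto,
place_fragment2) are invented for this example:

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def add(a, b):
    return [a[i] + b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def rotation_onto(u, v):
    """3x3 matrix that rotates the direction of u onto the direction of v."""
    u, v = unit(u), unit(v)
    axis = cross(u, v)
    s = math.sqrt(dot(axis, axis))  # sine of the rotation angle
    c = dot(u, v)                   # cosine of the rotation angle
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if s < 1e-8:
        if c > 0:
            return eye  # already aligned
        # Opposite directions: rotate 180 degrees about an axis perpendicular to u.
        helper = [1.0, 0.0, 0.0] if abs(u[0]) < 0.9 else [0.0, 1.0, 0.0]
        k = unit(cross(u, helper))
        return [[2.0 * k[i] * k[j] - eye[i][j] for j in range(3)] for i in range(3)]
    k = [x / s for x in axis]
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    # Rodrigues' formula: R = I + sin(t) K + (1 - cos(t)) K^2
    return [[eye[i][j] + s * K[i][j] + (1.0 - c) * K2[i][j] for j in range(3)]
            for i in range(3)]

def place_fragment2(coords2, atom2, dummy2, atom1_pos, dummy1_pos):
    """Return new coordinates for fragment 2 (steps 3-6 of the outline)."""
    cv1 = sub(dummy1_pos, atom1_pos)             # step 3: connection vector 1
    cv2 = sub(coords2[dummy2], coords2[atom2])   # step 3: connection vector 2
    rot = rotation_onto(cv2, [-x for x in cv1])  # make CV2 anti-parallel to CV1
    origin = coords2[atom2]
    placed = []
    for p in coords2:
        q = sub(p, origin)                       # step 4: connecting atom to origin
        q = [dot(rot[i], q) for i in range(3)]   # step 5: rotate
        placed.append(add(q, dummy1_pos))        # step 6: move onto fragment 1's dummy
    return placed

# Toy data: fragment 1 has its atom at the origin and its dummy along +x;
# fragment 2 has its atom at (2, 2, 2) and its dummy one unit along +y.
frag2 = [[2.0, 2.0, 2.0], [2.0, 3.0, 2.0]]
new_coords = place_fragment2(frag2, 0, 1, [0.0, 0.0, 0.0], [1.5, 0.0, 0.0])
print(new_coords)
```

With an RDKit conformer the positions would come from
conf.GetAtomPosition(i) and go back with conf.SetAtomPosition(i, ...); that
glue is left out here to keep the sketch about the math only.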

I hope this helps; it's not a trivial task, so feel free to keep asking
followup questions if there's something you don't understand.

-greg




Re: [Rdkit-discuss] problem with AssignAtomChiralTagsFromStructure

2017-08-07 Thread Greg Landrum
I don't think I understand the question. The Conformer object is normally
attached to/associated with a mol object.
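
To illustrate that relationship, here is a small sketch (not a definitive
recipe) using the SMILES from the original question. It embeds one conformer,
gets back to the molecule from the conformer, and then assigns chiral tags
from that conformer's coordinates. The random seed is an arbitrary choice of
mine to make the embedding repeatable:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

smiles = "CCOP(C)(=O)SCC[NH+](C(C)C)C(C)C"
mh = Chem.AddHs(Chem.MolFromSmiles(smiles))

# EmbedMolecule stores the conformer on the molecule and returns its id.
cid = AllChem.EmbedMolecule(mh, randomSeed=42)
conf = mh.GetConformer(cid)

# The conformer knows which molecule it belongs to.
owner = conf.GetOwningMol()
print(owner.GetNumAtoms(), conf.GetNumAtoms())

# Chiral tags are set on the molecule, using the coordinates of one conformer.
Chem.AssignAtomChiralTagsFromStructure(mh, confId=cid)
centers = Chem.FindMolChiralCenters(mh, includeUnassigned=True)
print(centers)
```

Since the SMILES does not specify the configuration at phosphorus, whichever
label shows up there just reflects the geometry the embedding happened to
produce.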

-greg


On Sun, Aug 6, 2017 at 11:41 AM, Per Jr. Greisen  wrote:

> Thanks - that worked perfectly. If I were to do this with conformers, is
> there an easy way to convert between a conformer object and a molecule object?
>
> On Sun, Aug 6, 2017 at 4:11 AM, Ling Chan  wrote:
>
>> Hello Per,
>>
>> Apparently the default for FindMolChiralCenters is that only explicitly
>> specified chiral centers are output. Try the following.
>>
>> Chem.FindMolChiralCenters(mh, includeUnassigned=True)
>>
>> As for AssignAtomChiralTagsFromStructure, I think it detects chirality
>> from 3D structures. You need atomic coordinates for it to work.
>>
>> Ling
>>
>>
>> On Sat, Aug 5, 2017 at 11:35 AM, Per Jr. Greisen 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am having trouble assigning chirality correctly with RDKit. I have
>>> the following molecule, which has isomers around a phosphorus atom:
>>>
>>> from rdkit import Chem
>>>
>>> smiles = 'CCOP(C)(=O)SCC[NH+](C(C)C)C(C)C'
>>> m = Chem.MolFromSmiles(smiles)
>>> mh = Chem.AddHs(m)
>>>
>>> m_s = Chem.MolToSmiles(Chem.MolFromSmiles(smiles),isomericSmiles=True)
>>>
>>> Chem.FindMolChiralCenters(mh)
>>>
>>> # returns none
>>> None
>>>
>>> #
>>> Chem.AssignAtomChiralTagsFromStructure(mh)
>>> None
>>>
>>>
>>> How can I resolve this? I am just starting to use RDKit and it seems
>>> very promising.
>>>
>>> Thanks
>>>
>>>
>>> --
>>> With kind regards
>>>
>>> Per
>>>
>>> 
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>
>
> --
> With kind regards
>
> Per
>
> 


Re: [Rdkit-discuss] adding fragment to existing molecule

2017-08-07 Thread Christos Kannas
Hi Per,

I can think of 2 approaches to solve this.

The first is to have fragments of molecules that have an explicit connection
point, i.e. OH[*] and C[*], and use RDKit's functionality for combining
fragments.
The second is to define a reaction for this using SMIRKS or Reaction
SMILES, i.e. [OH-].C>>COH, and use RDKit's reaction functionality
to perform the reaction on your molecules.
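
As a rough sketch of the second approach, assuming the fragments mark their
attachment points with dummy atoms (the reaction SMARTS and the toy fragments
below are my own choice, not taken from the question):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Each reactant template is "any atom bonded to a dummy atom"; the product
# keeps the two mapped atoms, bonds them, and drops both dummies.
rxn = AllChem.ReactionFromSmarts("[*:1]-[#0].[#0]-[*:2]>>[*:1]-[*:2]")

frag1 = Chem.MolFromSmiles("C[*]")
frag2 = Chem.MolFromSmiles("O[*]")

products = rxn.RunReactants((frag1, frag2))
product = products[0][0]    # first product of the first match
Chem.SanitizeMol(product)   # reaction products come back unsanitized
print(Chem.MolToSmiles(product))
```

RunReactants returns one tuple of products per way the templates match, so
with more than one dummy atom per fragment you would get several results to
choose from.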

Hope this was a bit helpful.

Regards,

Christos

Christos Kannas

Chem[o]informatics Researcher & Software Developer



Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Bennion, Brian
Hello Peter,
Great, that just made me realize that I was not using the most recent RDKit
version in my conda environment.
I reread the 2D SDF file with the latest RDKit version, and now only 31
molecules are tossed out by the SDMolSupplier in RDKit.  51 compounds had
errors when reading in the SMILES strings.
Brian




Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Peter S. Shenkin
That molecule's SMILES is correctly rendered by RDKit, or at least by the
version of RDKit behind Slack:

[image: Inline image 1]


-P.



Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Bennion, Brian
The carbocations are in small heterocyclic molecules. See CHEMBL3815233.

Brian





[Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Chris Swain
I've not tried to read in ChEMBL but I have tried to process other large 
datasets e.g. ZINC. My impression was that problems arose with small 
heterocyclic systems, particularly if fused or containing multiple different 
heteroatoms. I did wonder if the different aromaticity models might be the 
issue.

Chris


Re: [Rdkit-discuss] FW: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Maciek Wójcikowski
Hi Brian, Konrad,

Just a side note: it's not a crash. Python/Boost is just complaining that
the first argument is in fact None when it should be an RDKit Mol instance.
Instead of filtering out every SMILES containing a lowercase "s", you should
check whether mol is None in your for loop and skip those that are.
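
Something along these lines, as a minimal sketch (the function name and the
three test SMILES are just for illustration):

```python
from rdkit import Chem

def canonicalize_all(smiles_list):
    """Canonicalize what can be parsed; collect the rest instead of failing."""
    good, failed = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:   # parsing or sanitization failed
            failed.append(smi)
            continue
        good.append(Chem.MolToSmiles(mol))
    return good, failed

good, failed = canonicalize_all(["OCC", "c1cccc1", "CN(C)(C)(C)C"])
print(good)
print(failed)
```

The second input cannot be kekulized and the third has a five-valent neutral
nitrogen, so both end up in the failed list while the RDKit logs an error for
each; the loop itself keeps going.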


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl



[Rdkit-discuss] FW: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Bennion, Brian


From: Bennion, Brian
Sent: Monday, August 07, 2017 11:39
To: 'Konrad Koehler' 
Subject: RE: [Rdkit-discuss] using rdkit to read in chembl23 1.7 million 
compounds

Hello Konrad,
Thank you for your response.
For the handful of compounds I looked at:
multi-ring compounds with ring closures labeled %11 up to %14 and coordinated
to zinc had issues
aromatic carbocations [c+] had issues

As a side note, I attempted reading in the 2D SDF file that ChEMBL supplies.  I
was able to reduce the failed molecules to 253.
There were still many warnings about ambiguous stereochemistry and
strange tags like STY at the end of the molecules.

Brian

From: Konrad Koehler [mailto:konrad.koeh...@me.com]
Sent: Monday, August 07, 2017 11:25
To: Bennion, Brian <benni...@llnl.gov>
Subject: Re: [Rdkit-discuss] using rdkit to read in chembl23 1.7 million 
compounds

Hi Brian,

Similar problems here in trying to read, fragment, and canonicalize the ZINC
“In Stock” database of roughly one million compounds. Most of the problematic
structures contained aromatic sulfur atoms.  (Thiophene itself is no problem.
Most of the crashes were from more complex heteroaromatic systems containing
sulfur.) Filtering the input file to remove SMILES strings with a lowercase “s”
allowed me to process the rest of the file without RDKit crashing.

Cheers,

Konrad

crash dump:

Can't kekulize mol.
    child_node = AllChem.CanonSmiles(child_node)
  File "/Users/konradkoehler/anaconda/lib/python2.7/site-packages/rdkit/Chem/__init__.py", line 43, in CanonSmiles
    return MolToSmiles(m, useChiral)
Boost.Python.ArgumentError: Python argument types in
    rdkit.Chem.rdmolfiles.MolToSmiles(NoneType, int)
did not match C++ signature:
    MolToSmiles(RDKit::ROMol mol, bool isomericSmiles=False, bool kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True, bool allBondsExplicit=False, bool allHsExplicit=False)




Re: [Rdkit-discuss] using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Greg Landrum
Hi Brian,

It's not that surprising. The RDKit is stricter about allowing unreasonable
chemistry than the tool the ChEMBL group uses to produce SMILES or mol
blocks.
There are always some molecules that the RDKit just won't process.

If you are concerned and see any in that group of failures that you think
should have been processed, please let me know.
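
If it helps with triaging the failures, here is a sketch of one way to see why
a given SMILES is rejected. It assumes a reasonably recent RDKit release that
provides Chem.DetectChemistryProblems (newer than the versions discussed in
this thread); the helper name and example SMILES are only for illustration:

```python
from rdkit import Chem

def describe_problems(smiles):
    """Return the types of chemistry problems found, without sanitizing."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:   # the SMILES itself could not be parsed
        return ["unparseable SMILES"]
    return [p.GetType() for p in Chem.DetectChemistryProblems(mol)]

for smi in ["CCO", "CN(C)(C)C", "c1cncc1"]:
    print(smi, describe_problems(smi))
```

Grouping the failed records by problem type makes it easier to spot the ones
that look like they should have been accepted.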

-greg




[Rdkit-discuss] using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Bennion, Brian
Hello,

This might be a nitpicky question.  I am attempting to read in the SMILES
strings for the 1.7 million non-biological compounds in the latest ChEMBL 23
release.  As it turns out, 382 compounds fail to be read by RDKit.
The errors are either kekulization failures or valence errors.

Has anyone attempted this task before?
Brian



Re: [Rdkit-discuss] adding fragment to existing molecule

2017-08-07 Thread Per Jr. Greisen
Hi Nikolaus and Ling,

Thanks for your help. (The atom number shouldn't be 43, but it still gives
the error; I will clarify.) Yes, Nikolaus, you are right, it is a
sanitization issue. In this case I am trying to use RDKit as a molecular
editor to build a model molecule (a transition state model, to be exact). I
would normally do this by calling some other script, but it would be very
nice to do all of it within the framework of RDKit. Can this be done? Thanks




-- 
With kind regards

Per


Re: [Rdkit-discuss] adding fragment to existing molecule

2017-08-07 Thread Stiefl, Nikolaus
Hi Per
Just by looking at your code I would assume you have a sanitization issue. You 
create your pentane molecule and then add H’s. This will saturate each single 
carbon. When you then add a bond between the two fragments your atom 3 will 
have a valence of 5 and this causes issues.
Maybe do the fragment combination first and then add the H’s? Or do an explicit 
handling of the correct carbon you link to upfront.
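
A small sketch of what I mean, using methane and hydroxide as stand-ins.
The atom indices, the helper name and the choice to neutralize the oxygen
are simplifications of mine, not your actual system:

```python
from rdkit import Chem

methane = Chem.AddHs(Chem.MolFromSmiles("C"))        # C0 plus H1..H4
hydroxide = Chem.AddHs(Chem.MolFromSmiles("[OH-]"))  # O0 plus H1
o_idx = methane.GetNumAtoms()                        # oxygen index after CombineMols

def bond_and_sanitize(drop_one_h):
    rw = Chem.RWMol(Chem.CombineMols(methane, hydroxide))
    rw.GetAtomWithIdx(o_idx).SetFormalCharge(0)  # simplification: neutral product
    rw.AddBond(0, o_idx, Chem.BondType.SINGLE)
    if drop_one_h:
        rw.RemoveAtom(1)  # remove one H from the carbon to free up a valence
    try:
        Chem.SanitizeMol(rw)
    except Exception as err:  # the exact exception class may differ between releases
        return None, str(err)
    return Chem.MolToSmiles(Chem.RemoveHs(rw)), None

print(bond_and_sanitize(drop_one_h=False))  # carbon ends up with five bonds
print(bond_and_sanitize(drop_one_h=True))
```

With all four Hs still on the carbon the new bond makes it five-valent and
sanitization fails; removing one H first leaves a molecule that sanitizes.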
Hope this helps
Nik


From: "Per Jr. Greisen" 
Date: Sunday 6 August 2017 at 19:55
To: RDKit 
Subject: [Rdkit-discuss] adding fragment to existing molecule

Hi all,

I am trying to add a fragment to an existing molecule using RDKit. I start by
generating the molecules I would like to combine:

from rdkit import Chem
from rdkit.Chem import AllChem

oh = '[OH-]'
ohh = Chem.MolFromSmiles(oh)
oh = Chem.AddHs(ohh)
oh.SetProp("_Name","OH-")
AllChem.EmbedMolecule(oh, AllChem.ETKDG())

smiles_ = 'C'
m = Chem.MolFromSmiles(smiles_)
m_h = Chem.AddHs(m)
m_h.SetProp("_Name","XP")
AllChem.EmbedMolecule(m_h, AllChem.ETKDG())

I combine them which works fine:

combo = Chem.CombineMols(m_h,oh)

and I can add the bond between the desired atoms:


edcombo = Chem.EditableMol(combo)

edcombo.AddBond(3,1,order=Chem.rdchem.BondType.SINGLE)
back = edcombo.GetMol()

The problem arises when I want to edit the geometry between the two:

from rdkit.Chem import rdMolTransforms as rdmt
conf = back.GetConformer(0)

rdmt.SetBondLength(conf,3,43,10)

writer3 = Chem.SDWriter('out_long.sdf')
writer3.write(back,confId=0)




RuntimeError  Traceback (most recent call last)
 in ()
  2 conf = back.GetConformer(0)
  3
> 4 rdmt.SetBondLength(conf,3,43,10)
  5
  6 writer3 = Chem.SDWriter('out_long.sdf')

RuntimeError: Pre-condition Violation
RingInfo not initialized
Violation occurred on line 66 in file Code/GraphMol/RingInfo.cpp
Failed Expression: df_init
RDKIT: 2017.03.3
BOOST: 1_56

So I am not sure how to fix this - thanks in advance


--
With kind regards

Per