Re: [Rdkit-discuss] Molecule losing properties

2016-01-21 Thread Brian Kelley
Joos,

  I'm glad you found the issue.  Perhaps GetMolFrags should retain or have an 
option to retain public properties such as sd data.


Brian Kelley

> On Jan 21, 2016, at 8:14 AM, Joos Kiener  wrote:
> 
> Hi Brian,
> 
> thanks for your reply. I now figured out the issue. The SDF I load has a few 
> multi-component entries and I wanted to just look at the first component to 
> avoid any issues with such molecules.
> 
> Hence I had the following step:
> 
> mols = [Chem.GetMolFrags(x, asMols=True)[0] for x in mols]
> 
> This drops the properties of every molecule that was multi-component, 
> but not of the others.
> 
> I fixed it by reassigning the properties. If anyone knows a nicer way to do 
> this, that would also be good:
> 
> for idx in range(0,len(mols)):
>     mol = mols[idx]
>     fragments = Chem.GetMolFrags(mol, asMols=True)
>     if len(fragments) > 1:
>         first_frag = fragments[0]
>         # copy the SD properties from the parent onto the first fragment
>         for prop in mol.GetPropNames():
>             first_frag.SetProp(prop, mol.GetProp(prop))
>         mols[idx]=first_frag
> 
> 
> Best Regards,
> 
> Joos
> 
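The workaround above can be wrapped in a small helper. This is only a sketch: the function name first_fragment_with_props, the SMILES and the property value are made up for illustration, and it assumes every property is stored as a string (as SD data fields are) and that the molecule has at least one atom.

```python
from rdkit import Chem

def first_fragment_with_props(mol):
    """Return the first fragment of mol, carrying over mol's properties.

    Hypothetical helper, not part of RDKit. Assumes string-valued
    properties and a molecule with at least one atom.
    """
    fragments = Chem.GetMolFrags(mol, asMols=True)
    first = fragments[0]
    for name in mol.GetPropNames():
        first.SetProp(name, mol.GetProp(name))
    return first

# Illustrative multi-component entry: ethanol plus a sodium counter-ion.
mol = Chem.MolFromSmiles("CCO.[Na+]")
mol.SetProp("Activ", "2.5")

frag = first_fragment_with_props(mol)
print(Chem.MolToSmiles(frag), frag.GetProp("Activ"))
```

Applied to a whole list this would read mols = [first_fragment_with_props(m) for m in mols], which treats single- and multi-component entries the same way.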

Re: [Rdkit-discuss] Molecule losing properties

2016-01-21 Thread Brian Kelley
Joos,

  In your second loop, could you "print repr(prop)" as opposed to "print prop"?  
It could be that the name actually has a space in it, which the SD format 
supports and which can drive one to distraction.
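To illustrate the point about hidden spaces, here is a minimal sketch in plain Python 3. The dictionary stands in for the SD data fields of one molecule; the field names, the values and the helper get_prop_loose are invented for the example and are not RDKit API.

```python
# Stand-in for the SD data fields of one molecule. The trailing space in
# "Activ " is deliberate and mimics a field name as read from an SD file.
props = {"Activ ": "2.5", "Name": "cmpd-1"}

for name in props:
    # repr() prints the quotes, so stray whitespace becomes visible
    print(repr(name))

def get_prop_loose(props, wanted):
    """Look up a property, ignoring whitespace around the stored names."""
    for name, value in props.items():
        if name.strip() == wanted.strip():
            return value
    raise KeyError(wanted)

value = get_prop_loose(props, "Activ")
print(value)
```

A plain props["Activ"] lookup would raise KeyError here even though printing the names looks fine, which is the symptom this suggestion is meant to rule out.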


Brian Kelley

> On Jan 21, 2016, at 2:11 AM, Joos Kiener  wrote:
> 
> Hi all,
> 
> I have a strange issue. I'm trying to display pairs of molecules (each pair 
> meets a certain similarity threshold) and show a property for both molecules. 
> This is in an IPython Notebook.
> 
> The weird thing is the first molecule of the pair loses all properties:
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]] # und = list of molecules loaded from sd-file
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)
> ---
> KeyError  Traceback (most recent call last)
>  in ()
>       7     toShow.append(mol1)
>       8     toShow.append(mol2)
> ----> 9     lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>      10     lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>      11 Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)
> 
> KeyError: 'Activ'
> 
> 
> If I change the code (remove the label) and print all properties of mol1, 
> they are displayed correctly.
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]]
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     for prop in mol1.GetPropNames():
>         print prop + ": "  + mol1.GetProp(prop)
>     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2)
> 
> This shows all the properties of mol1 plus draws the grid. No error.
> 
> However, directly accessing the property by name fails with a KeyError:
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]]
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     print mol1.GetProp('Activ')
>     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2)
> ---
> KeyError  Traceback (most recent call last)
>  in ()
>       7     toShow.append(mol1)
>       8     toShow.append(mol2)
> ----> 9     print mol1.GetProp('Activ')
>      10     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>      11     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> 
> KeyError: 'Activ'
> 
> This all works fine for mol2:
> 
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]]
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     print mol2.GetProp('Activ')
>     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2)
> 2.5 
> 7.7 
> 10.93 
> 2.0434 
> 190.0 
> 25.0 
> ...
> What is going on here??? How can I resolve this?
> Best Regards,
> 
> Joos
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Molecular Fragments Invariant Violation: Problem solved

2016-01-21 Thread Konrad Koehler
Hi,

Problem solved (the sample script was missing a few lines of code). Sorry about 
that.  A fully functional script is below.

Best regards,

Konrad

= begin active_fragments.py =

from rdkit import Chem
from rdkit.ML.InfoTheory import InfoBitRanker
from rdkit.Chem import FragmentCatalog
from rdkit import RDConfig
import os

suppl = Chem.SDMolSupplier('bzr.sdf')
sdms = [x for x in suppl]
acts = [float(x.GetProp('ACTIVITY')) for x in sdms]

fName=os.path.join(RDConfig.RDDataDir,'FunctionalGroups.txt')
fparams = FragmentCatalog.FragCatParams(1,6,fName)
# fparams.GetNumFuncGroups()

fcat = FragmentCatalog.FragCatalog(fparams)
fcgen = FragmentCatalog.FragCatGenerator()
fpgen = FragmentCatalog.FragFPGenerator()

for m in sdms: nAdded=fcgen.AddFragsFromMol(m,fcat)

fps = [fpgen.GetFPForMol(x,fcat) for x in sdms]
ranker = InfoBitRanker(len(fps[0]),2)

for i,fp in enumerate(fps):
    act = int(acts[i]>7)
    ranker.AccumulateVotes(fp,act)

top5 = ranker.GetTopN(5)
for id,gain,n0,n1 in top5:
    print(int(id),'%.3f '%gain,int(n0),int(n1))

= end active_fragments.py =




Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens

2016-01-21 Thread Greg Landrum
Joos,

In that workbook, the code that generates 3D conformations (in block [6])
adds Hs to the 2D molecule before doing the conformation generation. This
is essential to generate realistic conformations. The Hs are left on
through the UFF minimization of the structures (also in block [6]), but are
then removed before any fingerprints are generated.

If I'm reading the notebook properly (it's been a while since I generated
it), fingerprints are always generated for molecules without Hs.
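The sequence described above can be sketched roughly as follows. This is not the notebook's code: the molecule, the random seed and the use of rdFingerprintGenerator (which needs a reasonably recent RDKit) are assumptions made for the example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdFingerprintGenerator

# Illustrative 2D input; any molecule read without Hs would do.
mol2d = Chem.MolFromSmiles("CCO")

# Add Hs before conformation generation so the geometry is realistic.
mol_h = Chem.AddHs(mol2d)
status = AllChem.EmbedMolecule(mol_h, randomSeed=42)
if status == 0:
    # Keep the Hs on through the UFF minimization.
    AllChem.UFFOptimizeMolecule(mol_h)

# Remove the Hs again before generating any fingerprint.
mol3d = Chem.RemoveHs(mol_h)

generator = rdFingerprintGenerator.GetAtomPairGenerator()
fp = generator.GetSparseCountFingerprint(mol3d)
print(mol_h.GetNumAtoms(), mol3d.GetNumAtoms(), len(fp.GetNonzeroElements()))
```

The point of the sketch is only the ordering: Hs go on for embedding and minimization and come off before the fingerprint step, so every fingerprint is computed on a molecule without explicit Hs.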

-greg


On Thu, Jan 21, 2016 at 2:14 AM, Joos Kiener  wrote:

> Hi Greg,
>
> thanks for your prompt reply.
>
> What added to my confusion was the comparison of AtomPair fingerprints in
> 2D and 3D, e.g.:
>
>
> http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Atom%20Pair%20Fingerprints.ipynb
>
> So if I understand you correctly here you need the Hs in 2D because you
> have them present in 3D?
> And if you use AtomPair FP in 2D only, you do not need hydrogens?
>
> Best Regards,
>
> Joos
>
> 2016-01-20 14:19 GMT+01:00 Greg Landrum :
>
>> Hi Joos,
>>
>> As long as you are sure to be consistent, it is certainly ok to generate
>> fingerprints for molecules with Hs still attached, but it's very easy to
>> make a mistake.
>>
>> The default behavior of the RDKit is to remove Hs. This is what I would
>> recommend before doing things like generating fingerprints or descriptors.
>>
>>
>> -greg
>>
>>
>> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener 
>> wrote:
>>
>>> Hi all,
>>>
>>> I've been looking at different fingerprints within the RDKit when I
>>> realized that, for many of them, it matters whether hydrogens are
>>> explicitly present or not. This was probably obvious and clear to many of
>>> you, but I wasn't aware of it.
>>>
>>> To visualize what I mean please see below notebook:
>>>
>>>
>>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>>
>>> Now my questions are:
>>>
>>> Should I always add hydrogens before generating fingerprints or should I
>>> remove them?
>>>
>>> How is this handled in KNIME nodes? Do I need to perform the appropriate
>>> action (add/remove H) before generating the fingerprint? Or is this already
>>> done correctly inside the node?
>>>
>>> Thank you for your help.
>>>
>>> Best Regards,
>>>
>>> Joos


[Rdkit-discuss] Substructure subtraction in RDKit

2016-01-21 Thread James Wallace
Hi,
I'm using the KNIME implementation to write my own nodes, and I'm 
running into an issue. I'm trying to subtract the MCS of two molecules 
from the larger molecule, leaving a list of fragments. I'm aware of 
substructure matching, but I'm not sure how to subtract the matching 
atoms from a molecule graph within RDKit. As I say, I'm working with the 
Java version, but any pointers towards the functions needed would be 
useful. At the moment I've got (in pseudo code)

 RWMol mol1a = RWMol.MolFromSmiles(reactant_string, 0, true);
 RWMol mol2a = RWMol.MolFromSmiles(product_string, 0, true);

 frag_bonds = mol2a.GetSubstructMatches(mol1a);

But I'm unsure as to what to do with the array of matches to achieve 
what I want. Can I strip out the dummy atoms automatically, or is this 
something that is best achieved by processing the SMILES string?



Re: [Rdkit-discuss] Substructure subtraction in RDKit

2016-01-21 Thread Brian Kelley
Without a concrete example, this solution may not be appropriate, but I
believe the function you want is "ReplaceCore".

ReplaceCore(...)

ReplaceCore( (Mol)mol, (Mol)coreQuery [, (bool)replaceDummies=True [,
(bool)labelByIndex=False [, (bool)requireDummyMatch=False]]]) -> Mol :

Removes the core of a molecule and labels the sidechains with dummy
atoms.


I just have python available currently so this may not be appropriate, but
here goes:

>>> from rdkit import Chem
>>> from rdkit.Chem import MCS
>>> m1 = Chem.MolFromSmiles("Cc1ccccc1N")
>>> m2 = Chem.MolFromSmiles("c1ccccc1")
>>> mcs = MCS.FindMCS([m1, m2])
>>> frag = Chem.ReplaceCore(m1, Chem.MolFromSmarts(mcs.smarts))
>>> print "SideChains:", Chem.MolToSmiles(frag)
SideChains: [*]C.[*]N

I hope this helps (at least the steps).
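For reference, the same steps can be sketched with the rdFMCS module, which may be the only MCS implementation available in newer RDKit installs. The two SMILES (o-toluidine and benzene) and the variable names are assumptions chosen for illustration, and the exact dummy-atom labels in the printed SMILES can vary between RDKit versions.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Illustrative inputs: the larger molecule and the one sharing its core.
m1 = Chem.MolFromSmiles("Cc1ccccc1N")
m2 = Chem.MolFromSmiles("c1ccccc1")

# Find the maximum common substructure and turn it into a query molecule.
mcs = rdFMCS.FindMCS([m1, m2])
core = Chem.MolFromSmarts(mcs.smartsString)

# Remove the core; the side chains are returned with dummy atoms attached.
# ReplaceCore returns None if the core does not match.
side_chains = Chem.ReplaceCore(m1, core)

# Split the result into one molecule per side chain.
frags = Chem.GetMolFrags(side_chains, asMols=True)
for frag in frags:
    print(Chem.MolToSmiles(frag))
```

Splitting with GetMolFrags gives the "list of fragments" asked for in the original question, one molecule per side chain.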

Now if you are just trying to extract side chains from the results of
reactions, we have recently added helper functions to solve that (They
should be exposed in the next release).


ReduceProductToSideChains(...)

ReduceProductToSideChains( (Mol)product [, (bool)addDummyAtoms=True])
-> Mol :

reduce the product of a reaction to the side chains added by the
reaction.

 The output is a molecule with attached wildcards indicating where the
product was attached.  The isotope of the dummy atom is the reaction map
number of the product's atom (if available).

If this would be useful, let us know, I would be happy to have a tester
prior to release.

Brian Kelley



Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-21 Thread Dimitri Maziuk
On 01/20/2016 08:30 PM, Peter S. Shenkin wrote:

> ... the problem that I thought we were trying to
> address is rather the lack of extensibility, the lack of lower-case, the
> fact that different users (even for deposited structures, IIRC) and
> different software products overload the available fields differently (like
> putting partial charge in the Temperature Factor field) and have violated
> the standard by doing necessary but formally disallowed things ...

PDB has a format, with API and everything, that takes care of all of
that. It's called mmCIF. After 25 years (or however long it's been
around) nobody uses it outside of PDB.

I've seen this discussion countless times. It always does this exact
circle. Everybody wants to *have* a better format. Nobody wants to *use*
it because it's "too complex" and "too difficult".

In the meantime we are left trying to guess whether a given "CA" stands
for C-alpha or calcium.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


