Re: [Rdkit-discuss] Molecule losing properties
Joos,

I'm glad you found the issue. Perhaps GetMolFrags should retain, or have an
option to retain, public properties such as SD data.

Brian Kelley

> On Jan 21, 2016, at 8:14 AM, Joos Kiener wrote:
>
> Hi Brian,
>
> thanks for your reply. I have now figured out the issue. The SDF I load has
> a few multi-component entries, and I wanted to look at just the first
> component to avoid any issues with such molecules.
>
> Hence I had the following step:
>
>     mols = [Chem.GetMolFrags(x, asMols=True)[0] for x in mols]
>
> This breaks the properties for all molecules that were multi-component,
> but not for the other ones.
>
> I fixed it by reassigning the properties. If anyone knows a nicer way to do
> this, that would also be good:
>
>     for idx in range(0, len(mols)):
>         mol = mols[idx]
>         fragments = Chem.GetMolFrags(mol, asMols=True)
>         if len(fragments) > 1:
>             first_frag = fragments[0]
>             # copy the properties of the parent molecule onto the fragment
>             for prop in mol.GetPropNames():
>                 first_frag.SetProp(prop, mol.GetProp(prop))
>             mols[idx] = first_frag
>
> Best Regards,
>
> Joos
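The workaround described in this message could be wrapped in a small helper. This is only a sketch: the name `first_fragment_keeping_props` and its optional `split` argument are made up for illustration (the argument is there so the copying logic can be tried without RDKit), and the default path assumes `Chem.GetMolFrags(mol, asMols=True)` returns fragments without the parent's properties, as reported in the thread.

```python
def first_fragment_keeping_props(mol, split=None):
    """Return the first fragment of `mol` with the parent's properties copied over."""
    if split is None:
        # Imported lazily so the copying logic can be tried without RDKit installed.
        from rdkit import Chem

        def split(m):
            return Chem.GetMolFrags(m, asMols=True)

    fragments = split(mol)
    if len(fragments) <= 1:
        # Single-component entry: keep the original object and its properties.
        return mol
    first = fragments[0]
    # Same property copy as in the loop from the thread.
    for name in mol.GetPropNames():
        first.SetProp(name, mol.GetProp(name))
    return first
```

With RDKit installed, `mols = [first_fragment_keeping_props(m) for m in mols]` would take the place of the list comprehension that dropped the properties.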
Re: [Rdkit-discuss] Molecule losing properties
Joos,

In your second loop, could you "print repr(prop)" as opposed to "print prop"?
It could be that the name actually has a space in it, which the SD format
supports and which can drive one to distraction.

Brian Kelley

> On Jan 21, 2016, at 2:11 AM, Joos Kiener wrote:
>
> Hi all,
>
> I have a strange issue. I'm trying to display pairs of molecules (the pair
> has a certain similarity threshold) and show a property for both molecules.
> This is in IPython Notebook.
>
> The weird thing is the first molecule of the pair loses all properties:
>
>     toShow=[]
>     lbls=[]
>     for idx in pairs:
>         did=dindices[idx]
>         mol1=und[did[0]]  # und = list of molecules loaded from sd-file
>         mol2=und[did[1]]
>         toShow.append(mol1)
>         toShow.append(mol2)
>         lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>         lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>     Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)
>
>     ---
>     KeyError                 Traceback (most recent call last)
>      in ()
>           7     toShow.append(mol1)
>           8     toShow.append(mol2)
>     ----> 9     lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>          10     lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>          11 Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)
>
>     KeyError: 'Activ'
>
> If I change the code (remove the label) and print all properties of mol1,
> they are displayed correctly.
>
>     toShow=[]
>     lbls=[]
>     for idx in pairs:
>         did=dindices[idx]
>         mol1=und[did[0]]
>         mol2=und[did[1]]
>         toShow.append(mol1)
>         toShow.append(mol2)
>         for prop in mol1.GetPropNames():
>             print prop + ": " + mol1.GetProp(prop)
>         #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>         #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>     Draw.MolsToGridImage(toShow,molsPerRow=2)
>
> This shows all the properties of mol1 and draws the grid. No error.
>
> However, directly accessing the property by name fails with a KeyError:
>
>     toShow=[]
>     lbls=[]
>     for idx in pairs:
>         did=dindices[idx]
>         mol1=und[did[0]]
>         mol2=und[did[1]]
>         toShow.append(mol1)
>         toShow.append(mol2)
>         print mol1.GetProp('Activ')
>         #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>         #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>     Draw.MolsToGridImage(toShow,molsPerRow=2)
>
>     ---
>     KeyError                 Traceback (most recent call last)
>      in ()
>           7     toShow.append(mol1)
>           8     toShow.append(mol2)
>     ----> 9     print mol1.GetProp('Activ')
>          10     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>          11     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>
>     KeyError: 'Activ'
>
> This all works fine for mol2:
>
>     toShow=[]
>     lbls=[]
>     for idx in pairs:
>         did=dindices[idx]
>         mol1=und[did[0]]
>         mol2=und[did[1]]
>         toShow.append(mol1)
>         toShow.append(mol2)
>         print mol2.GetProp('Activ')
>         #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>         #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>     Draw.MolsToGridImage(toShow,molsPerRow=2)
>
>     2.5
>     7.7
>     10.93
>     2.0434
>     190.0
>     25.0
>     ...
>
> What is going on here??? How can I resolve this?
>
> Best Regards,
>
> Joos

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
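The repr() suggestion from this message can be tried without RDKit. The sketch below uses a plain dict as a stand-in for a molecule's SD properties; the key with a trailing space and the helper `find_prop` are hypothetical and only illustrate the failure mode Brian describes.

```python
# Stand-in for SD data: the first key carries a trailing space.
props = {"Activ ": "2.5", "Name": "mol-1"}

for name in props:
    # Plain printing hides the stray space; repr() puts quotes around the name.
    print(name + ": " + props[name], "| repr:", repr(name))


def find_prop(props, wanted):
    """Return the stored key that matches `wanted` once surrounding whitespace is ignored."""
    for key in props:
        if key.strip() == wanted.strip():
            return key
    return None


print("Activ" in props)                  # False: the exact lookup misses the key
print(repr(find_prop(props, "Activ")))   # shows the key as it is actually stored
```

In this thread the cause turned out to be different (see the GetMolFrags message), but the same check is a cheap first step whenever a property prints correctly and then fails on lookup by name.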
[Rdkit-discuss] Molecular Fragments Invariant Violation: Problem solved
Hi,

Problem solved (the sample script was missing a few lines of code). Sorry
about that. A fully functional script is below.

Best regards,
Konrad

= begin active_fragments.py =
from rdkit import Chem
from rdkit.ML.InfoTheory import InfoBitRanker
from rdkit.Chem import FragmentCatalog
from rdkit import RDConfig
import os

# load the molecules and their activities from the SD file
suppl = Chem.SDMolSupplier('bzr.sdf')
sdms = [x for x in suppl]
acts = [float(x.GetProp('ACTIVITY')) for x in sdms]

# build a fragment catalog (fragments of 1 to 6 bonds) from the functional group definitions
fName = os.path.join(RDConfig.RDDataDir, 'FunctionalGroups.txt')
fparams = FragmentCatalog.FragCatParams(1, 6, fName)
# fparams.GetNumFuncGroups()
fcat = FragmentCatalog.FragCatalog(fparams)
fcgen = FragmentCatalog.FragCatGenerator()
fpgen = FragmentCatalog.FragFPGenerator()
for m in sdms:
    nAdded = fcgen.AddFragsFromMol(m, fcat)

# fragment fingerprints for every molecule
fps = [fpgen.GetFPForMol(x, fcat) for x in sdms]

# rank the fingerprint bits by information gain for two classes (activity > 7 or not)
ranker = InfoBitRanker(len(fps[0]), 2)
for i, fp in enumerate(fps):
    act = int(acts[i] > 7)
    ranker.AccumulateVotes(fp, act)

top5 = ranker.GetTopN(5)
for bit_id, gain, n0, n1 in top5:
    print(int(bit_id), '%.3f ' % gain, int(n0), int(n1))
= end active_fragments.py =
Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens
Joos,

In that workbook, the code that generates 3D conformations (in block [6])
adds Hs to the 2D molecule before doing the conformation generation. This is
essential to generate realistic conformations. The Hs are left on through the
UFF minimization of the structures (also in block [6]), but are then removed
before any fingerprints are generated. If I'm reading the notebook properly
(it's been a while since I generated it), fingerprints are always generated
for molecules without Hs.

-greg

On Thu, Jan 21, 2016 at 2:14 AM, Joos Kiener wrote:

> Hi Greg,
>
> thanks for your prompt reply.
>
> What added to my confusion was the comparison of AtomPair fingerprints in
> 2D and 3D, e.g.:
>
> http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Atom%20Pair%20Fingerprints.ipynb
>
> So if I understand you correctly, here you need the Hs in 2D because you
> have them present in 3D?
> And if you use AtomPair FP in 2D only, you do not need hydrogens?
>
> Best Regards,
>
> Joos
>
> 2016-01-20 14:19 GMT+01:00 Greg Landrum:
>
>> Hi Joos,
>>
>> As long as you are sure to be consistent, it is certainly ok to generate
>> fingerprints for molecules with Hs still attached, but it's very easy to
>> make a mistake.
>>
>> The default behavior of the RDKit is to remove Hs. This is what I would
>> recommend before doing things like generating fingerprints or descriptors.
>>
>> -greg
>>
>> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener wrote:
>>
>>> Hi all,
>>>
>>> I've been looking at different fingerprints within the RDKit when I
>>> realized that it matters for many of them whether hydrogens are
>>> explicitly present or not. This was probably obvious and clear to many of
>>> you, but I wasn't aware of it.
>>>
>>> To visualize what I mean, please see the notebook below:
>>>
>>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>>
>>> Now my questions are:
>>>
>>> Should I always add hydrogens before generating fingerprints, or should I
>>> remove them?
>>>
>>> How is this handled in the KNIME nodes? Do I need to perform the
>>> corresponding action (add/remove H) before generating the fingerprint? Or
>>> is this already done correctly inside the node?
>>>
>>> Thank you for your help.
>>>
>>> Best Regards,
>>>
>>> Joos
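Why consistency matters can be illustrated with a toy model. The sketch below is not RDKit's AtomPair implementation: `toy_atom_pairs`, `strip_hs` and `tanimoto` are hypothetical helpers that describe a molecule as a set of (element, element, bond distance) features, which is enough to show that the same molecule compared with and without explicit Hs no longer looks identical. In RDKit itself the corresponding steps would be `Chem.AddHs` and `Chem.RemoveHs`.

```python
from collections import deque


def distances_from(start, neighbours):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in neighbours[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def toy_atom_pairs(atoms, bonds):
    """Set of (element, element, distance) features for a molecular graph (toy model)."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    features = set()
    for i in range(len(atoms)):
        for j, d in distances_from(i, neighbours).items():
            if j > i:
                e1, e2 = sorted((atoms[i], atoms[j]))
                features.add((e1, e2, d))
    return features


def strip_hs(atoms, bonds):
    """Drop hydrogen atoms and renumber the remaining atoms and bonds."""
    keep = [i for i, element in enumerate(atoms) if element != "H"]
    new_index = {old: new for new, old in enumerate(keep)}
    kept_bonds = [(new_index[a], new_index[b]) for a, b in bonds
                  if a in new_index and b in new_index]
    return [atoms[i] for i in keep], kept_bonds


def tanimoto(x, y):
    """Tanimoto similarity of two feature sets."""
    return len(x & y) / len(x | y)


# Ethanol as heavy atoms only (C-C-O) and with all six hydrogens explicit.
heavy = (["C", "C", "O"], [(0, 1), (1, 2)])
with_hs = (["C", "C", "O", "H", "H", "H", "H", "H", "H"],
           [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)])

# Mixed treatment: the same molecule no longer matches itself.
print(tanimoto(toy_atom_pairs(*heavy), toy_atom_pairs(*with_hs)))
# Consistent treatment: strip the Hs first and the fingerprints agree.
print(tanimoto(toy_atom_pairs(*heavy), toy_atom_pairs(*strip_hs(*with_hs))))
```

The absolute numbers mean nothing outside this toy model; the point is only that mixing H-explicit and H-suppressed inputs changes the similarity, which is why removing Hs before fingerprinting, as recommended above, is the safer default.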
[Rdkit-discuss] Substructure subtraction in RDKit
Hi,

I'm using the KNIME implementation to write my own nodes, and I'm running
into an issue. I'm trying to subtract the MCS between two molecules from the
larger molecule, to leave a list of fragments. I'm aware of the substructure
matching, but I'm not sure how to subtract the matching atoms from a molecule
graph within RDKit. As I say, I'm working with the Java version, but any
pointers towards the functions needed would be useful. At the moment I've got
(in pseudo code):

    RWMol mol1a = RWMol.MolFromSmiles(reactant_string, 0, true);
    RWMol mol2a = RWMol.MolFromSmiles(product_string, 0, true);

    frag_bonds = mol2a.GetSubstructMatches(mol1a);

But I'm unsure what to do with the array of matches to achieve what I want.
Can I strip out the dummy atoms automatically, or is this something that is
best achieved by processing the SMILES string?
Re: [Rdkit-discuss] Substructure subtraction in RDKit
Without a concrete example, this solution may not be appropriate, but I
believe the function you want is "ReplaceCore".

    ReplaceCore(...)
        ReplaceCore( (Mol)mol, (Mol)coreQuery [, (bool)replaceDummies=True
                     [, (bool)labelByIndex=False
                     [, (bool)requireDummyMatch=False]]]) -> Mol :
            Removes the core of a molecule and labels the sidechains with
            dummy atoms.

I only have Python available at the moment, so this may not translate
directly, but here goes:

    >>> from rdkit import Chem
    >>> from rdkit.Chem import MCS
    >>> m1 = Chem.MolFromSmiles("Cc1ccccc1N")
    >>> m2 = Chem.MolFromSmiles("c1ccccc1")
    >>> mcs = MCS.FindMCS([m1, m2])
    >>> frag = Chem.ReplaceCore(m1, Chem.MolFromSmarts(mcs.smarts))
    >>> print "SideChains:", Chem.MolToSmiles(frag)
    SideChains: [*]C.[*]N

I hope this helps (at least the steps).

Now, if you are just trying to extract side chains from the results of
reactions, we have recently added helper functions to solve that (they should
be exposed in the next release).

    ReduceProductToSideChains(...)
        ReduceProductToSideChains( (Mol)product [, (bool)addDummyAtoms=True]) -> Mol :
            reduce the product of a reaction to the side chains added by the
            reaction. The output is a molecule with attached wildcards
            indicating where the product was attached. The isotope of the
            dummy atom is the reaction map number of the product's atom (if
            available).

If this would be useful, let us know; I would be happy to have a tester prior
to release.

Brian Kelley
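The underlying operation asked about in this thread, deleting the matched atoms from the molecular graph and collecting what is left, can be sketched in plain Python. This is a toy illustration on atom indices and bond pairs, not RDKit code: `fragments_after_removal` is a hypothetical helper, and it assumes the matched atoms are available as a collection of atom indices, as a substructure match would supply them.

```python
def fragments_after_removal(n_atoms, bonds, matched):
    """Connected components (lists of atom indices) left after deleting `matched` atoms."""
    matched = set(matched)
    # Adjacency for the surviving atoms only.
    neighbours = {i: set() for i in range(n_atoms) if i not in matched}
    for a, b in bonds:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    fragments = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            a = stack.pop()
            component.append(a)
            for b in neighbours[a]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        fragments.append(sorted(component))
    return fragments


# A six-membered ring (atoms 1-6) with a methyl carbon (0) on atom 1 and a nitrogen (7) on atom 6.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (6, 7)]
ring = (1, 2, 3, 4, 5, 6)
print(fragments_after_removal(8, bonds, ring))
```

Unlike ReplaceCore, this sketch does not add dummy atoms to mark the attachment points; it only shows which atoms end up in which fragment.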
Re: [Rdkit-discuss] The Chlorine molfile question
On 01/20/2016 08:30 PM, Peter S. Shenkin wrote:

> ... the problem that I thought we were trying to
> address is rather the lack of extensibility, the lack of lower-case, the
> fact that different users (even for deposited structures, IIRC) and
> different software products overload the available fields differently (like
> putting partial charge in the Temperature Factor field) and have violated
> the standard by doing necessary but formally disallowed things ...

PDB has a format, with API and everything, that takes care of all of that.
It's called mmCIF. After 25 years (or however long it's been around) nobody
uses it outside of PDB.

I've seen this discussion countless times. It always does this exact circle.
Everybody wants to *have* a better format. Nobody wants to *use* it because
it's "too complex" and "too difficult". In the meantime we are left trying to
guess whether a given "CA" stands for C-alpha or calcium.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
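The "CA" guess mentioned above can at least be made systematic when a file follows the fixed-column PDB layout. The sketch below is an assumption-laden illustration, not anyone's production parser: `classify_ca` and `demo_record` are hypothetical helpers, and they rely on the convention that the element symbol sits in columns 77-78 and that the atom name of a one-letter element starts in column 14 while that of a two-letter element starts in column 13. Files that ignore these columns, which is exactly the complaint in this thread, will defeat it.

```python
def classify_ca(line):
    """Guess whether an ATOM/HETATM record named CA is an alpha carbon or a calcium."""
    name_field = line[12:16]              # columns 13-16: atom name
    if name_field.strip() != "CA":
        return None
    element = line[76:78].strip().upper() # columns 77-78: element symbol, if present
    if element == "C":
        return "alpha carbon"
    if element == "CA":
        return "calcium"
    # No usable element column: fall back to the alignment of the atom name.
    if name_field.startswith("CA"):
        return "calcium"
    if name_field.startswith(" CA"):
        return "alpha carbon"
    return "ambiguous"


def demo_record(name_field, element):
    """Build a minimal demo record with the name in columns 13-16 and the element in 77-78."""
    head = "HETATM" + "1".rjust(5) + " " + name_field
    return head.ljust(76) + element.rjust(2)


print(classify_ca(demo_record(" CA ", "C")))   # element column decides
print(classify_ca(demo_record("CA  ", "")))    # decided by name alignment only
```

The fallback on name alignment is the fragile part: it is only as reliable as the program that wrote the file.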