[Rdkit-discuss] Molecular Fragments Invariant Violation
Hi,

First of all, thanks to the RDKit developers for making this incredibly powerful package available.

I am trying to get an example script taken from the RDKit documentation to work, and it is generating an "Invariant Violation" error. The example script and the exact error message it generates are listed below. Any ideas on what is causing this error?

Best regards,
Konrad

Environment: RDKit version 2014.09.2, Python 2.7.9, running on Mac OS X, installed with "brew install --HEAD rdkit".

Example script taken from: "identify fragments that distinguish actives from inactive", Getting Started with the RDKit in Python, Release 2015.09.1, page 54.

begin active_fragments.py ===
import os
from rdkit import Chem
from rdkit.ML.InfoTheory import InfoBitRanker
from rdkit.Chem import FragmentCatalog
from rdkit import RDConfig
fName=os.path.join(RDConfig.RDDataDir,'FunctionalGroups.txt')
fparams = FragmentCatalog.FragCatParams(1,6,fName)
# fparams.GetNumFuncGroups()
fcat = FragmentCatalog.FragCatalog(fparams)
fpgen = FragmentCatalog.FragFPGenerator()
suppl = Chem.SDMolSupplier('bzr.sdf')
sdms = [x for x in suppl]
acts = [float(x.GetProp('ACTIVITY')) for x in sdms]
fps = [fpgen.GetFPForMol(x,fcat) for x in sdms]
ranker = InfoBitRanker(len(fps[0]),2)
for i,fp in enumerate(fps):
    act = int(acts[i]>7)
    ranker.AccumulateVotes(fp,act)
top5 = ranker.GetTopN(5)
for id,gain,n0,n1 in top5:
    print(int(id),'%.3f '%gain,int(n0),int(n1))
end active_fragments.py ===

Invariant Violation
catalog does not contain any entries of the order specified
Violation occurred on line 424 in file /tmp/rdkit-UIZPcR/Code/Catalogs/Catalog.h
Failed Expression: elem!=d_orderMap.end()

Traceback (most recent call last):
  File "active_fragments.py", line 18, in <module>
    fps = [fpgen.GetFPForMol(x,fcat) for x in sdms]
RuntimeError: Invariant Violation

-- Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions
and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
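A plausible reading of the message "catalog does not contain any entries of the order specified" is that `fcat` is still empty when `GetFPForMol` runs. As far as I recall, the Getting Started guide fills the catalog first, calling `FragmentCatalog.FragCatGenerator().AddFragsFromMol(m, fcat)` for each molecule (and `fcat.GetNumEntries()` shows the result), before it generates any fingerprints. Treat that as an assumption to check against the guide. Once fingerprints exist, the script ranks bits with `InfoBitRanker`, which by default scores bits with an entropy-based information gain. The plain-Python sketch below shows that textbook quantity for one bit against a binary label such as `int(acts[i] > 7)`. It is not RDKit's implementation, and `entropy`, `info_gain`, `top_n` and the toy data are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(bits, labels):
    """Information gain of one binary fingerprint bit w.r.t. 0/1 class labels."""
    n = len(labels)
    gain = entropy([labels.count(0), labels.count(1)])
    for value in (0, 1):
        # Labels of the molecules where this bit has the given value.
        subset = [y for b, y in zip(bits, labels) if b == value]
        if subset:
            gain -= len(subset) / n * entropy([subset.count(0), subset.count(1)])
    return gain


def top_n(fps, labels, n):
    """Hypothetical GetTopN-style ranking: the n best (bit_id, gain) pairs."""
    nbits = len(fps[0])
    scored = [(i, info_gain([fp[i] for fp in fps], labels)) for i in range(nbits)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:n]


# Toy data: 4 molecules with 3-bit fingerprints; label 1 means "active".
fps = [[0, 1, 1], [0, 0, 1], [1, 1, 1], [1, 0, 1]]
labels = [0, 0, 1, 1]
for bit_id, gain in top_n(fps, labels, 2):
    print(bit_id, "%.3f" % gain)  # prints "0 1.000", then "1 0.000"
```

Bit 0 separates actives from inactives perfectly and gets the maximum gain of 1 bit; a bit set in every molecule (bit 2) carries no information at all.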
Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens
Hi Greg,

thanks for your prompt reply. What added to my confusion was the comparison of AtomPair fingerprints in 2D and 3D, e.g.:

http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Atom%20Pair%20Fingerprints.ipynb

So if I understand you correctly, here you need the Hs in 2D because you have them present in 3D? And if you use AtomPair FP in 2D only, you do not need hydrogens?

Best Regards,
Joos

2016-01-20 14:19 GMT+01:00 Greg Landrum:
> Hi Joos,
>
> As long as you are sure to be consistent, it is certainly ok to generate fingerprints for molecules with Hs still attached, but it's very easy to make a mistake.
>
> The default behavior of the RDKit is to remove Hs. This is what I would recommend before doing things like generating fingerprints or descriptors.
>
> -greg
>
> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener wrote:
>> Hi all,
>>
>> I've been looking at different Fingerprints within the RDKit when I realized that it matters for many of them whether Hydrogens are explicitly present or not. This probably was obvious and clear for many of you, but I wasn't aware of that.
>>
>> To visualize what I mean, please see the notebook below:
>>
>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>
>> Now my questions are:
>>
>> Should I always add hydrogens before generating fingerprints or should I remove them?
>>
>> How is this handled in KNIME nodes? Do I need to perform the corresponding action (add/remove H) before generating the fingerprint? Or is this already done correctly inside the node?
>>
>> Thank you for your help.
>>
>> Best Regards,
>>
>> Joos
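To make the effect concrete without RDKit, here is a toy sketch in plain Python. It uses (element, element, bond distance) triples as a crude, hypothetical stand-in for atom-pair features and compares methanol with ethanol, once as heavy atoms only and once with explicit hydrogens. RDKit's real atom-pair fingerprint uses richer atom invariants and counts, so only the qualitative point carries over: adding Hs changes the feature sets and therefore the similarity. In RDKit itself, the consistent choice Greg recommends would mean keeping the default H removal when reading, or calling `Chem.RemoveHs(mol)`, before fingerprinting.

```python
from collections import deque


def bfs_distances(adj, start):
    """Topological (bond-count) distances from one atom to all others."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def toy_atom_pairs(elements, bonds):
    """Set of (element, element, distance) features; a toy, not RDKit's atom pairs."""
    adj = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    features = set()
    for i in range(len(elements)):
        dist = bfs_distances(adj, i)
        for j in range(i + 1, len(elements)):
            e1, e2 = sorted((elements[i], elements[j]))
            features.add((e1, e2, dist[j]))
    return features


def tanimoto(a, b):
    return len(a & b) / len(a | b)


# Heavy atoms only.
methanol = toy_atom_pairs(["C", "O"], [(0, 1)])
ethanol = toy_atom_pairs(["C", "C", "O"], [(0, 1), (1, 2)])

# The same molecules with explicit hydrogens attached.
methanol_h = toy_atom_pairs(
    ["C", "O", "H", "H", "H", "H"],
    [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)],
)
ethanol_h = toy_atom_pairs(
    ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)],
)

print("without Hs: %.3f" % tanimoto(methanol, ethanol))    # 1/3  -> 0.333
print("with Hs:    %.3f" % tanimoto(methanol_h, ethanol_h))  # 7/12 -> 0.583
```

The same pair of molecules looks noticeably more similar once the shared C-H and O-H environments are counted, which is why mixing molecules with and without explicit Hs in one comparison is easy to get wrong.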
[Rdkit-discuss] Molecule losing properties
Hi all,

I have a strange issue. I'm trying to display pairs of molecules (the pair has a certain similarity threshold) and show a property for both molecules. This is in IPython Notebook. The weird thing is that the first molecule of the pair loses all its properties:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]] # und = list of molecules loaded from sd-file
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in <module>()
      7     toShow.append(mol1)
      8     toShow.append(mol2)
----> 9     lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
     10     lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
     11 Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)

KeyError: 'Activ'

If I change the code (remove the label) and print all properties of mol1, they are displayed correctly:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]]
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    for prop in mol1.GetPropNames():
        print prop + ": " + mol1.GetProp(prop)
    #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2)

This shows all the properties of mol1 and draws the grid. No error.
However, directly accessing the property by name fails with a KeyError:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]]
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    print mol1.GetProp('Activ')
    #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in <module>()
      7     toShow.append(mol1)
      8     toShow.append(mol2)
----> 9     print mol1.GetProp('Activ')
     10     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
     11     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))

KeyError: 'Activ'

This all works fine for mol2:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]]
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    print mol2.GetProp('Activ')
    #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2)

2.5
7.7
10.93
2.0434
190.0
25.0
...

What is going on here??? How can I resolve this?

Best Regards,
Joos
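A `KeyError` from `GetProp` means that the particular molecule has no property with that exact name. Since the loop that printed every property of `mol1` ran without error, one plausible explanation (an assumption, not a diagnosis) is that only some records in `und` lack `Activ`, or carry it under a slightly different name. Note also that `GetProp` returns the value as a string, so `'Active: %.2f' % mol1.GetProp('Activ')` would raise a `TypeError` even for molecules that do have the property. The hypothetical helpers below use the `Mol` methods `HasProp` and `GetProp` to find the offending records and to build legends defensively.

```python
def missing_prop_indices(mols, prop="Activ"):
    """Indices of molecules that are None or do not carry `prop`."""
    return [i for i, m in enumerate(mols) if m is None or not m.HasProp(prop)]


def activity_label(mol, prop="Activ"):
    """Build a grid-image legend from an SD-file property without raising.

    GetProp() returns a string, so it is converted to float before %.2f.
    Molecules without the property (or None, which SDMolSupplier yields for
    records it cannot parse) get a placeholder instead of a KeyError.
    """
    if mol is None or not mol.HasProp(prop):
        return "Active: n/a"
    value = mol.GetProp(prop)
    try:
        return "Active: %.2f" % float(value)
    except ValueError:
        # Non-numeric value: show it verbatim.
        return "Active: %s" % value
```

In the notebook, `print(missing_prop_indices(und))` would list the records that lack `Activ`, and `lbls.append(activity_label(mol1))` could replace the direct `GetProp` call in the labelling loop.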
Re: [Rdkit-discuss] The Chlorine molfile question
On Wed, Jan 20, 2016 at 7:42 PM, Dimitri Maziuk wrote:

> On 01/20/2016 04:57 PM, Peter S. Shenkin wrote:
> > On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk wrote:
> >> JSON encodes a single string. That is a problem for sending larger files over the net, say, an NMR structure of a larger molecule with 100 models in the file.
> >
> > That's not a problem, conceptually, because you can have an array of structures.
>
> No, my point was that streaming isn't a part of JSON specification and common implementations do not offer it.
>
> https://en.wikipedia.org/wiki/JSON_Streaming
>
> You can cut one model out of a PDB file (or one structure out of an SDF) and the result is a valid file.

If each array element were complete, the same would be true here. A PDB-aware JSON API could wrap a streaming unpacker around a batch implementation of choice.

> In ASN.1 the length of the value is at the front.

I believe that depends on the encoding, and in any case, streaming ASN.1 decoders are available. But none are freeware, as far as I know.

> have a file full of "disjoint" single structures, possibly with some kind of metadata header. (I haven't touched ASN.1 since school, so don't quote me on this.)

Yes, I think that's right, though I've not used ASN.1 for a long time either.

> Oh wait, that sounds exactly like PDB with its REMARKs and MODELs.

No, it doesn't, because the problem that I thought we were trying to address is rather the lack of extensibility, the lack of lower-case, the fact that different users (even for deposited structures, IIRC) and different software products overload the available fields differently (like putting partial charge in the Temperature Factor field) and have violated the standard by doing necessary but formally disallowed things such as using multiple CONECT fields to indicate multiple bonds.

Having said all this, it would suffice to write APIs that allow specification of a dialect (CHARMM, PDB_STD, etc.) and have a convention for returning all the contents in arrays, dictionaries, what have you, where the keys reflect the semantics of the dialect (like "partial_charge" or "T_factor"), and where the unused keys would return NULL.

So then, a separate question is whether there also needs to be a serialized format for the resulting object that associated APIs can also read and write.

'Nuff said. (By me, at least, since I'm not volunteering to do it. :-) )

-P.
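Newline-delimited JSON ("JSON Lines"), one of the conventions listed on the JSON streaming page Dimitri links, is one way to get the property Peter describes: every line is a complete JSON document, so a reader can handle model N before model N+1 arrives, and any single line cut out of the file is itself valid. The sketch below uses only Python's standard `json` module; the record keys (`model`, `atoms`, `xyz`) are hypothetical and imply no agreed schema.

```python
import io
import json


def write_structures(structures, stream):
    """Write one structure per line; each line is a self-contained JSON document."""
    for s in structures:
        stream.write(json.dumps(s) + "\n")


def iter_structures(stream):
    """Yield structures one at a time without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-model NMR-style ensemble.
models = [
    {"model": 1, "atoms": [{"name": "CA", "xyz": [0.0, 0.0, 0.0]}]},
    {"model": 2, "atoms": [{"name": "CA", "xyz": [0.1, 0.0, 0.0]}]},
]
buf = io.StringIO()
write_structures(models, buf)
buf.seek(0)
for s in iter_structures(buf):
    print(s["model"], len(s["atoms"]))
```

The trade-off is the one raised in the thread: file-level metadata has to go either into every record or into a separate first line that readers agree to treat as a header.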
Re: [Rdkit-discuss] The Chlorine molfile question
On 01/20/2016 04:57 PM, Peter S. Shenkin wrote:
> On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk wrote:
>> JSON encodes a single string. That is a problem for sending larger files over the net, say, an NMR structure of a larger molecule with 100 models in the file.
>
> That's not a problem, conceptually, because you can have an array of structures.

No, my point was that streaming isn't a part of JSON specification and common implementations do not offer it.

https://en.wikipedia.org/wiki/JSON_Streaming

You can cut one model out of a PDB file (or one structure out of an SDF) and the result is a valid file.

In ASN.1 the length of the value is at the front. If you define your array as a sequence, a single structure pulled out of the middle should be OK, but the entire sequence is invalid until you read it to the end. I think in practice you wouldn't define your array as a sequence and instead have a file full of "disjoint" single structures, possibly with some kind of metadata header. (I haven't touched ASN.1 since school, so don't quote me on this.)

Oh wait, that sounds exactly like PDB with its REMARKs and MODELs.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
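Putting the length of each value at the front is what makes such a stream readable one record at a time: the reader always knows how many bytes belong to the next record. The sketch below is a simplified length-prefixed framing (also among the conventions on the JSON streaming page), not ASN.1 BER/DER; the 4-byte big-endian prefix, the JSON payload and the function names are assumptions chosen for brevity.

```python
import io
import json
import struct


def write_record(stream, obj):
    """Write one JSON record prefixed by its byte length (4-byte big-endian)."""
    payload = json.dumps(obj).encode("utf-8")
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield records one at a time; a short read means the stream was truncated."""
    while True:
        header = stream.read(4)
        if not header:
            return
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield json.loads(payload.decode("utf-8"))


buf = io.BytesIO()
for model in range(1, 4):
    write_record(buf, {"model": model})
buf.seek(0)
print([r["model"] for r in read_records(buf)])  # [1, 2, 3]
```

Unlike a single JSON array, each framed record can be decoded as soon as its bytes arrive, and a truncated transfer is detected at the record that was cut short.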
Re: [Rdkit-discuss] The Chlorine molfile question
On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk wrote:
> On 01/20/2016 03:55 PM, Peter S. Shenkin wrote:
> > On Wed, Jan 20, 2016 at 3:06 PM, Dimitri Maziuk wrote:
> >> As much as PDB wants the old busted PDB format gone, they are not offering a usable alternative that I know of.
> >
> > Such as: a JSON file ...
>
> JSON encodes a single string. That is a problem for sending larger files over the net, say, an NMR structure of a larger molecule with 100 models in the file.

That's not a problem, conceptually, because you can have an array of structures. Any hierarchical format has a verbosity problem due to duplication of metadata, but for use as an interchange format, I don't see this as a big problem.

Actually, though, you could imagine having a JSON file encoding only the conventions to be used when reading or writing PDB files. So it would encode the dialect. Then APIs could be provided to read the dialect JSON file and then read or write the PDB file using the dialect, similar, conceptually, to the way Python handles different dialects of .csv files.

> CSV is a good format for tabular data, and you can send rows incrementally, but a typical application requires some small amount of metadata as well. For example, the full sequence -- that does not fit into a single-table format like CSV.

Right, but each structure in the JSON array can have its own metadata blocks for data that are not 1:1 with atoms.

ASN.1 provides a nice alternative, but it has issues, too. (Mainly, the fact that it's fallen into obscurity.)

-P.
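The analogy refers to Python's standard `csv` module, where a named dialect registered with `csv.register_dialect` tells one and the same reading code how a particular file is laid out. The dialect names and the sample text below are hypothetical; the point is that only the dialect argument differs between the two reads.

```python
import csv
import io

# Two hypothetical dialects describing the same tabular content.
csv.register_dialect("comma", delimiter=",", quoting=csv.QUOTE_MINIMAL)
csv.register_dialect("pipe", delimiter="|", quoting=csv.QUOTE_NONE)

comma_text = "name,x,y,z\nCA,0.0,1.0,2.0\n"
pipe_text = "name|x|y|z\nCA|0.0|1.0|2.0\n"


def read_rows(text, dialect):
    """Same reading code for every file; the named dialect does the adapting."""
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


print(read_rows(pipe_text, "pipe")[0]["z"])  # 2.0
```

A PDB "dialect" file, as proposed above, would play the same role for field usage (which column holds a partial charge, whether repeated CONECTs mean bond order) rather than for delimiters.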
Re: [Rdkit-discuss] The Chlorine molfile question
On 01/20/2016 03:55 PM, Peter S. Shenkin wrote:
> On Wed, Jan 20, 2016 at 3:06 PM, Dimitri Maziuk wrote:
>
>> As much as PDB wants the old busted PDB format gone, they are not offering a usable alternative that I know of.
>
> Such as: a JSON file ...

JSON encodes a single string. That is a problem for sending larger files over the net, say, an NMR structure of a larger molecule with 100 models in the file.

CSV is a good format for tabular data, and you can send rows incrementally, but a typical application requires some small amount of metadata as well. For example, the full sequence -- that does not fit into a single-table format like CSV.

And so on.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Atom Symbol Case in MolFile?
I think John has provided the solid argument for me to fix the reader so that it accepts this construct by default. I certainly won't write "CL".

John, out of curiosity: how many of those applications would write "CL" back out again to a molfile?

-greg

On Wed, Jan 20, 2016 at 10:43 AM, John M wrote:
> Correct message thread this time:
>
>> The joys of the molfile - was curious whether it was accepted/correctly interpreted:
>>
>> ISIS Draw 2.5   Yes (arguably the arbitrator of the format)
>> ChemDraw 15     Yes
>> ChemDoodle      No (accepted but only as a text label 'CL', no conversion)
>> MarvinSketch    Yes
>> CDK             Yes
>> OEChem          Yes
>> Open Babel      Yes
>> Indigo          Yes
>
> J
>
> On 20 January 2016 at 05:26, Greg Landrum wrote:
>> Paul,
>>
>> On Tue, Jan 19, 2016 at 7:59 PM, Paul Emsley wrote:
>>> Thanks for that.
>>>
>>> Why do I ask? Because the sdf files [1] distributed by the wwPDB, such as this one:
>>>
>>> http://www.rcsb.org/pdb/files/ligand/CQ8_ideal.sdf
>>>
>>> from this page:
>>>
>>> http://www.rcsb.org/pdb/ligand/ligandsummary.do?hetId=CQ8
>>>
>>> are upper-cased. I didn't know whether that was right or not (and, as you imply, RDKit will not parse it). I'll get in touch with them and see if they can get it changed.
>>
>> It's an important data source, so it would be great if they were supplying data that's correctly formatted (assuming, of course, that my reading of that "spec" is correct). In the meantime, it would be pretty easy to modify the RDKit to handle these cases correctly when the "strictParsing" option is set to false. I'll add a github issue for this and get it in there.
>>
>>> [1] I thought that they were molfiles when I wrote the mail - and I suppose the same thinking applies.
>>
>> Yeah, the format of the CTAB piece is identical for mol files and SDFs.
>>
>> -greg
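Until a reader accepts the upper-case form, one hedged workaround is to fix the case before parsing, turning "CL" into "Cl" in the atom block. The sketch below assumes a V2000 CTAB (counts line on line 4 with the atom count in columns 1-3, atom symbol in columns 32-34 of each atom line) and assumes that any all-uppercase two-letter alphabetic symbol is an element written in the wrong case, which would also rewrite any other two-letter code. The function names and the test molecule are hypothetical.

```python
def normalize_v2000_symbols(molblock):
    """Rewrite all-uppercase two-letter atom symbols ("CL" -> "Cl") in a V2000 atom block."""
    lines = molblock.split("\n")
    natoms = int(lines[3][0:3])  # counts line: number of atoms in columns 1-3
    for i in range(4, 4 + natoms):
        line = lines[i]
        symbol = line[31:34].strip()  # atom symbol in columns 32-34
        if len(symbol) == 2 and symbol.isalpha() and symbol.isupper():
            lines[i] = line[:31] + symbol.capitalize().ljust(3) + line[34:]
    return "\n".join(lines)


def atom_line(x, symbol):
    """Build a minimal V2000 atom line (x, y, z, symbol, zeroed fields)."""
    return "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0" % (x, 0.0, 0.0, symbol)


molblock = "\n".join([
    "chloromethane",
    "  hypothetical",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    atom_line(0.0, "C"),
    atom_line(1.76, "CL"),
    "  1  2  1  0  0  0  0",
    "M  END",
])
print(normalize_v2000_symbols(molblock).split("\n")[5][31:34].strip())  # Cl
```

Because the symbol is rewritten in place within its fixed three-character field, all other columns keep their positions and the result can be handed to any molfile reader.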
Re: [Rdkit-discuss] The Chlorine molfile question
On Wed, Jan 20, 2016 at 3:06 PM, Dimitri Maziuk wrote:
> As much as PDB wants the old busted PDB format gone, they are not offering a usable alternative that I know of.

Such as: a JSON file with predefined keys for all the bona-fide fields originally defined by the PDB and additional predefined fields for data that people often overload the predefined fields with.

Then a set of dialects for common sets of field combinations used by different software products. Included in each dialect would be keywords for commonly used extensions; for instance, whether the current dialect expects to see multiple CONECT records to indicate multiple bonds. If that keyword was not specified, multiple CONECTs would be viewed as an exception.

Then, I suppose, a set of APIs to read the file, across some set of languages.

-P.
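The proposal can be sketched in a few lines of Python. Everything here is hypothetical: the dialect names, the JSON keys (`partial_charge`, `b_factor`, `conect_multiplicity_is_bond_order`) and the helper functions are illustrations, not an existing standard or library. The ATOM/HETATM slices follow the fixed-column PDB layout (atom name in columns 13-16, coordinates in 31-54, occupancy 55-60, temperature factor 61-66); CONECT records are split on whitespace for brevity, which a real reader should not rely on.

```python
import json

# Hypothetical dialect descriptions, stored as JSON.
DIALECTS = json.loads("""
{
  "PDB_STD": {"partial_charge": null, "b_factor": "tempFactor",
              "conect_multiplicity_is_bond_order": false},
  "CHARGE_IN_BFACTOR": {"partial_charge": "tempFactor", "b_factor": null,
                        "conect_multiplicity_is_bond_order": true}
}
""")

# 0-based slices for fixed ATOM/HETATM columns.
COLUMNS = {"name": (12, 16), "x": (30, 38), "y": (38, 46), "z": (46, 54),
           "occupancy": (54, 60), "tempFactor": (60, 66)}


def pdb_atom_line(serial, name, xyz, occupancy, temp_factor, element):
    """Build a HETATM line (simplified: ignores atom-name alignment rules)."""
    return ("%-6s%5d %-4s%1s%3s %1s%4d%1s   " % ("HETATM", serial, name, "", "LIG", "A", 1, "")
            + "%8.3f%8.3f%8.3f%6.2f%6.2f" % (*xyz, occupancy, temp_factor)
            + " " * 10 + "%2s" % element)


def read_atom(line, dialect):
    """Map raw columns to semantic keys; keys the dialect does not use are None."""
    raw = {key: line[a:b].strip() for key, (a, b) in COLUMNS.items()}
    atom = {"serial": int(line[6:11]), "name": raw["name"],
            "xyz": tuple(float(raw[k]) for k in ("x", "y", "z")),
            "occupancy": float(raw["occupancy"])}
    for semantic in ("partial_charge", "b_factor"):
        field = dialect[semantic]
        atom[semantic] = float(raw[field]) if field else None
    return atom


def read_bonds(conect_lines, dialect):
    """Bond orders from CONECT records; repeats are an error unless the dialect allows them."""
    refs = {}
    for line in conect_lines:
        fields = line.split()  # sketch only: not strict fixed-column parsing
        src = int(fields[1])
        for dst in map(int, fields[2:]):
            refs[(src, dst)] = refs.get((src, dst), 0) + 1
    bonds = {}
    for (src, dst), n in refs.items():
        if n > 1 and not dialect["conect_multiplicity_is_bond_order"]:
            raise ValueError("repeated CONECT %d-%d not allowed in this dialect" % (src, dst))
        key = (min(src, dst), max(src, dst))
        bonds[key] = max(bonds.get(key, 0), n)
    return bonds


line = pdb_atom_line(1, "C1", (0.0, 1.0, 2.0), 1.00, -0.25, "C")
print(read_atom(line, DIALECTS["PDB_STD"]))
print(read_atom(line, DIALECTS["CHARGE_IN_BFACTOR"])["partial_charge"])  # -0.25
print(read_bonds(["CONECT    1    2    2", "CONECT    2    1    1"],
                 DIALECTS["CHARGE_IN_BFACTOR"]))  # {(1, 2): 2}
```

Returning `None` for keys a dialect does not use corresponds to the "unused keys would return NULL" convention suggested later in the thread, and calling `read_bonds` on the same CONECT lines with `PDB_STD` raises `ValueError`, i.e. the repeated CONECTs are treated as an exception.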
Re: [Rdkit-discuss] The Chlorine molfile question
On 01/20/2016 10:06 AM, Peter Shenkin wrote:
> ... the terrible old PDB file format ...
> As for those who would write that format, fight it! :-)
>
> The above, in my view, represents the voice of reason, and is therefore unlikely to be generally adopted

The long story is that most applications actually using the data need only the table of coordinates, and that's pretty much what the PDB file is. PDB's replacement, mmCIF, includes everything and the kitchen sink wrapped in a subset of STAR-98 syntax. All of that is excess baggage nobody wants.

As much as PDB wants the old busted PDB format gone, they are not offering a usable alternative that I know of.

That's exactly what we've been doing at BMRB, too, and then complaining about the low rate of adoption of NMR-STAR by the NMR community.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] The Chlorine molfile question
It seems to me that what we are talking about now has (or should have!) more to do with the interpretation of the terrible old PDB file format than about any software convention. It seems to me that software that must read this format should turn the contents into something generally chemically acceptable (that is, "Cl", not "CL", in this case) rather than foolishly propagating the error, or accepting it in other contexts.

As for those who would write that format, fight it! :-)

The above, in my view, represents the voice of reason, and is therefore unlikely to be generally adopted.

-P.

On Wed, Jan 20, 2016 at 10:42 AM, John M wrote:
> Whoops wrong thread this was in regard to the Chlorine molfile question.
Re: [Rdkit-discuss] Atom Symbol Case in MolFile?
Correct message thread this time:

> The joys of the molfile - was curious whether it was accepted/correctly interpreted:
>
> ISIS Draw 2.5   Yes (arguably the arbitrator of the format)
> ChemDraw 15     Yes
> ChemDoodle      No (accepted but only as a text label 'CL', no conversion)
> MarvinSketch    Yes
> CDK             Yes
> OEChem          Yes
> Open Babel      Yes
> Indigo          Yes

J
Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens
Whoops, wrong thread; this was in regard to the Chlorine molfile question.

Regards,
John W May
john.wilkinson...@gmail.com
Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens
The joys of the molfile - was curious whether it was accepted/correctly interpreted:

ISIS Draw 2.5    Yes (arguably the arbitrator of the format)
ChemDraw 15      Yes
ChemDoodle       No (accepted but only as a text label 'CL' no conversion)
MarvinSketch     Yes
CDK              Yes
OEChem           Yes
Open Babel       Yes
Indigo           Yes

J

Regards, John W May john.wilkinson...@gmail.com On 20 January 2016 at 13:19, Greg Landrum wrote: > Hi Joos, > > As long as you are sure to be consistent, it is certainly ok to generate > fingerprints for molecules with Hs still attached, but it's very easy to > make a mistake. > > The default behavior of the RDKit is to remove Hs. This is what I would > recommend before doing things like generating fingerprints or descriptors. > > > -greg
Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens
Hi Joos, As long as you are sure to be consistent, it is certainly ok to generate fingerprints for molecules with Hs still attached, but it's very easy to make a mistake. The default behavior of the RDKit is to remove Hs. This is what I would recommend before doing things like generating fingerprints or descriptors. -greg On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener wrote: > Hi all, > > I've been looking at different Fingerprints within the RDKit when I > realized, that it matters for many of them whether Hydrogens are > explicitly present or not. This probably was obvious and clear for many of > you but I wasn't aware of that. > > To visualize what I mean please see below notebook: > > > http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb > > Now my questions are: > > Should I always add hydrogens before generating fingerprints or should I > remove them? > > How is this handled in KNIME nodes? Do I need to perform the according > action (add/remove H) before generating the fingerprint? Or is this done > correctly already internally of the node? > > Thank you for your help. > > Best Regards, > > Joos
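A minimal sketch of Greg's recommendation, assuming a recent RDKit release that provides `rdFingerprintGenerator`: strip explicit Hs with `Chem.RemoveHs` before fingerprinting, so a molecule gives the same bits whether or not its hydrogens were explicit. The helper name `consistent_fp`, the ethanol test molecule, and the Morgan radius/size are illustrative choices, not anything stated in the thread.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan generator; radius and bit-vector size are illustrative choices
MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def consistent_fp(mol):
    """Hypothetical helper: drop explicit Hs, then fingerprint the heavy-atom graph."""
    heavy = Chem.RemoveHs(mol)  # returns a new molecule; the input is left unchanged
    return MORGAN.GetFingerprint(heavy)


mol = Chem.MolFromSmiles("CCO")  # ethanol, hydrogens implicit
mol_h = Chem.AddHs(mol)          # same molecule, hydrogens as explicit atoms

fp_a = consistent_fp(mol)
fp_b = consistent_fp(mol_h)
print(mol.GetNumAtoms(), mol_h.GetNumAtoms())      # 3 9
print(DataStructs.TanimotoSimilarity(fp_a, fp_b))  # 1.0
```

RDKit's readers already behave this way by default (for example, `Chem.SDMolSupplier` takes `removeHs=True`), so the explicit `RemoveHs` call mostly matters when hydrogens were added on purpose, e.g. with `Chem.AddHs` before generating 3D coordinates.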
[Rdkit-discuss] Fingerprints and explicit Hydrogens
Hi all, I've been looking at different fingerprints within the RDKit when I realized that, for many of them, it matters whether hydrogens are explicitly present or not. This was probably obvious to many of you, but I wasn't aware of it. To illustrate what I mean, please see the notebook below: http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb Now my questions are: Should I always add hydrogens before generating fingerprints, or should I remove them? How is this handled in KNIME nodes? Do I need to perform the corresponding action (add/remove H) before generating the fingerprint, or is this already done correctly inside the node? Thank you for your help. Best Regards, Joos
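A minimal sketch of the effect Joos describes, assuming a recent RDKit release that provides `rdFingerprintGenerator`: the same molecule is fingerprinted once with implicit and once with explicit hydrogens, and the two bit vectors are compared. Ethanol and the generator settings are illustrative choices; this is not the code from the linked notebook.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles("CCO")  # ethanol, hydrogens implicit
mol_h = Chem.AddHs(mol)          # same molecule, hydrogens as explicit atoms

# Two common fingerprint types; parameters are illustrative
generators = {
    "Morgan r=2": rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048),
    "RDKit path-based": rdFingerprintGenerator.GetRDKitFPGenerator(fpSize=2048),
}

sims = {}
for name, gen in generators.items():
    # Explicit H atoms are graph nodes, so they enter atom environments and paths
    sims[name] = DataStructs.TanimotoSimilarity(
        gen.GetFingerprint(mol), gen.GetFingerprint(mol_h)
    )
    print(f"{name}: Tanimoto(implicit Hs vs explicit Hs) = {sims[name]:.3f}")
```

A value below 1.0 means the two representations of the same molecule are not treated as identical, which is why the hydrogen handling has to be the same for every molecule being compared.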