[Rdkit-discuss] Molecular Fragments Invariant Violation

2016-01-20 Thread Konrad Koehler
Hi,

First of all, thanks to the rdkit developers for making available this 
incredibly powerful package.

I am trying to get an example script taken from the rdkit documentation to 
work, and it is generating an "Invariant Violation" error. The example script
and the exact error message it generates are listed below. Any ideas on what is
causing this error?

Best regards,

Konrad

Environment: Rdkit version 2014.09.2, Python 2.7.9, running on Mac OS X, 
installed with “brew install --HEAD rdkit”.

Example script taken from: "identify fragments that distinguish actives from
inactive", Getting Started with the RDKit in Python, Release 2015.09.1, page 54.

=== begin active_fragments.py ===

import os
from rdkit import Chem
from rdkit.ML.InfoTheory import InfoBitRanker
from rdkit.Chem import FragmentCatalog
from rdkit import RDConfig

fName=os.path.join(RDConfig.RDDataDir,'FunctionalGroups.txt')
fparams = FragmentCatalog.FragCatParams(1,6,fName)
# fparams.GetNumFuncGroups()

fcat = FragmentCatalog.FragCatalog(fparams)
fpgen = FragmentCatalog.FragFPGenerator()

suppl = Chem.SDMolSupplier('bzr.sdf')
sdms = [x for x in suppl]
acts = [float(x.GetProp('ACTIVITY')) for x in sdms]

fps = [fpgen.GetFPForMol(x,fcat) for x in sdms]
ranker = InfoBitRanker(len(fps[0]),2)

for i,fp in enumerate(fps):
    act = int(acts[i]>7)
    ranker.AccumulateVotes(fp,act)

top5 = ranker.GetTopN(5)
for id,gain,n0,n1 in top5:
    print(int(id),'%.3f '%gain,int(n0),int(n1))

=== end active_fragments.py ===


Invariant Violation
 catalog does not contain any entries of the order specified
Violation occurred on line 424 in file /tmp/rdkit-UIZPcR/Code/Catalogs/Catalog.h
Failed Expression: elem!=d_orderMap.end()


Traceback (most recent call last):
  File "active_fragments.py", line 18, in <module>
    fps = [fpgen.GetFPForMol(x,fcat) for x in sdms]
RuntimeError: Invariant Violation
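The excerpt does not include a reply to this question. Judging from the assertion text ("catalog does not contain any entries of the order specified"), a plausible cause is that `fcat` is still empty when `GetFPForMol` is called: the script creates the catalog but never adds any fragments to it. As far as I recall, the fragment-catalog section of the Getting Started guide fills the catalog with `FragmentCatalog.FragCatGenerator().AddFragsFromMol()` before generating fingerprints. A minimal sketch of that step, assuming three stand-in SMILES in place of the `bzr.sdf` molecules; the `InfoBitRanker` part of the script is left out because it should not need to change:

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import FragmentCatalog

# Same catalog parameters as in the script: fragments of 1 to 6 bonds,
# annotated with the functional groups listed in FunctionalGroups.txt.
fName = os.path.join(RDConfig.RDDataDir, 'FunctionalGroups.txt')
fparams = FragmentCatalog.FragCatParams(1, 6, fName)
fcat = FragmentCatalog.FragCatalog(fparams)

# Stand-in molecules (assumption); the script would use the bzr.sdf records.
sdms = [Chem.MolFromSmiles(s) for s in ('OCC=CC(=O)O', 'OC(=O)c1ccccc1', 'NCCO')]

# The step the script skips: a new FragCatalog is empty, so fill it with
# the fragments found in the molecules before asking for fingerprints.
fcgen = FragmentCatalog.FragCatGenerator()
for m in sdms:
    fcgen.AddFragsFromMol(m, fcat)
print('catalog entries:', fcat.GetNumEntries())

# Fingerprint generation now has catalog entries to look up.
fpgen = FragmentCatalog.FragFPGenerator()
fps = [fpgen.GetFPForMol(m, fcat) for m in sdms]
print('fingerprint length:', len(fps[0]))
```

Since the fingerprint length now comes from a populated catalog, `InfoBitRanker(len(fps[0]), 2)` in the original script also gets a meaningful bit count.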

--
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens

2016-01-20 Thread Joos Kiener
Hi Greg,

thanks for your prompt reply.

What added to my confusion was the comparison of AtomPair fingerprints in 2D
and 3D, e.g.:

http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Atom%20Pair%20Fingerprints.ipynb

So if I understand you correctly, here you need the Hs in 2D because you
have them present in 3D?
And if you use AtomPair FPs in 2D only, you do not need hydrogens?

Best Regards,

Joos

2016-01-20 14:19 GMT+01:00 Greg Landrum :

> Hi Joos,
>
> As long as you are sure to be consistent, it is certainly ok to generate
> fingerprints for molecules with Hs still attached, but it's very easy to
> make a mistake.
>
> The default behavior of the RDKit is to remove Hs. This is what I would
> recommend before doing things like generating fingerprints or descriptors.
>
>
> -greg
>
>
> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener 
> wrote:
>
>> Hi all,
>>
>> I've been looking at different Fingerprints within the RDKit when I
>> realized that it matters for many of them whether Hydrogens are
>> explicitly present or not. This probably was obvious and clear for many of
>> you but I wasn't aware of that.
>>
>> To visualize what I mean, please see the notebook below:
>>
>>
>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>
>> Now my questions are:
>>
>> Should I always add hydrogens before generating fingerprints or should I
>> remove them?
>>
>> How is this handled in KNIME nodes? Do I need to perform the corresponding
>> action (add/remove H) before generating the fingerprint? Or is this
>> already handled internally by the node?
>>
>> Thank you for your help.
>>
>> Best Regards,
>>
>> Joos
>>
>>
>>
>>
>
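Greg's point about consistency can be illustrated with a short sketch. The molecule (benzyl alcohol), the Morgan radius of 2 and the 2048-bit length are arbitrary choices for illustration; the point is only that the same molecule gives different fingerprints depending on whether its Hs are explicit graph atoms, and the same fingerprint again once `Chem.RemoveHs` brings it back to the default implicit-H form.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def morgan_fp(mol):
    # Radius 2 and 2048 bits are arbitrary illustration settings.
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)


mol = Chem.MolFromSmiles('OCc1ccccc1')  # Hs implicit after SMILES parsing
mol_h = Chem.AddHs(mol)                  # Hs become explicit graph atoms
mol_back = Chem.RemoveHs(mol_h)          # back to the default implicit-H form

fp = morgan_fp(mol)
fp_h = morgan_fp(mol_h)
fp_back = morgan_fp(mol_back)

# Same molecule, different H handling: the fingerprints disagree.
print(DataStructs.TanimotoSimilarity(fp, fp_h))
# Consistent H handling: the fingerprints agree again.
print(DataStructs.TanimotoSimilarity(fp, fp_back))
```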


[Rdkit-discuss] Molecule losing properties

2016-01-20 Thread Joos Kiener
Hi all,

I have a strange issue. I'm trying to display pairs of molecules (the pair
has a certain similarity threshold) and show a property for both molecules.
This is in the IPython Notebook.

The weird thing is the first molecule of the pair loses all properties:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]] # und = list of molecules loaded from sd-file
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-...> in <module>()
      7     toShow.append(mol1)
      8     toShow.append(mol2)
----> 9     lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
     10     lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
     11 Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)

KeyError: 'Activ'

If I change the code (remove the labels) and print all properties of
mol1, they are displayed correctly.

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]]
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    for prop in mol1.GetPropNames():
        print prop + ": " + mol1.GetProp(prop)
    #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2)

This shows all the properties of mol1 plus draws the grid. No error.

However directly accessing the property by name fails with key error:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]]
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    print mol1.GetProp('Activ')
    #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-...> in <module>()
      7     toShow.append(mol1)
      8     toShow.append(mol2)
----> 9     print mol1.GetProp('Activ')
     10     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
     11     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))

KeyError: 'Activ'


This all works fine for mol2:

toShow=[]
lbls=[]
for idx in pairs:
    did=dindices[idx]
    mol1=und[did[0]]
    mol2=und[did[1]]
    toShow.append(mol1)
    toShow.append(mol2)
    print mol2.GetProp('Activ')
    #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
    #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
Draw.MolsToGridImage(toShow,molsPerRow=2)

2.5
7.7
10.93
2.0434
190.0
25.0
...

What is going on here??? How can I resolve this?

Best Regards,

Joos
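The excerpt ends without an answer, and the sketch below does not explain why `mol1` appears to lose its properties. It addresses a second problem in the labeling code: `GetProp()` returns SD-file data fields as strings, so `'Active: %.2f' % mol1.GetProp('Activ')` would raise a `TypeError` even once the `KeyError` is resolved. A hypothetical helper, `activity_label`, that checks `HasProp()` first and converts the value with `float()`:

```python
from rdkit import Chem


def activity_label(mol, prop='Activ'):
    """Build a grid-image legend from an SD-file property (hypothetical helper)."""
    if mol.HasProp(prop):
        # SD-file data fields come back from GetProp() as strings,
        # so convert before applying a numeric format such as '%.2f'.
        return 'Active: %.2f' % float(mol.GetProp(prop))
    # Make a missing property visible instead of raising KeyError mid-loop.
    return 'Active: n/a'


m1 = Chem.MolFromSmiles('CCO')
m1.SetProp('Activ', '7.7')      # stored as a string, as an SD reader would
m2 = Chem.MolFromSmiles('CCN')  # no 'Activ' property set
print(activity_label(m1))  # Active: 7.70
print(activity_label(m2))  # Active: n/a
```

In the loop above, `lbls.append(activity_label(mol1))` would then replace the direct `GetProp` calls.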


Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Peter S. Shenkin
On Wed, Jan 20, 2016 at 7:42 PM, Dimitri Maziuk 
wrote:

> On 01/20/2016 04:57 PM, Peter S. Shenkin wrote:
> > On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk 
> > wrote:
>
> >> JSON encodes a single string. That is a problem for sending larger files
> >> over the net, say, an NMR structure of a larger molecule with 100 models
> >> in the file.
> >>
> >
> > That's not a problem, conceptually, because you can have an array of
> > structures.
>
> No, my point was that streaming isn't a part of JSON specification and
> common implementations do not offer it.
>
> https://en.wikipedia.org/wiki/JSON_Streaming

> You can cut one model out of a PDB file (or one structure out of an
> SDF) and the result is a valid file.
>

If each array element were complete, the same would be true here. A
pdb-aware JSON API could wrap a streaming unpacker around a batch
implementation of choice.
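One way to read the "streaming unpacker" suggestion is the line-delimited convention covered on the JSON Streaming page linked above: one complete JSON document per line, so any single line cut out of the file is itself valid JSON, much like one MODEL cut from a PDB file. The model layout below is a made-up illustration, not a proposed schema; only the standard-library `json` module is used.

```python
import io
import json


def write_models(models, stream):
    """Write each model as one complete JSON document on its own line."""
    for model in models:
        # json.dumps without indent emits no raw newlines, so one model = one line.
        stream.write(json.dumps(model) + '\n')


def read_models(stream):
    """Yield models one at a time instead of parsing the whole file at once."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up multi-model layout, for illustration only.
models = [{'model': i, 'atoms': [{'name': 'CA', 'xyz': [float(i), 0.0, 0.0]}]}
          for i in range(1, 4)]

buf = io.StringIO()
write_models(models, buf)
buf.seek(0)
for model in read_models(buf):
    print(model['model'], len(model['atoms']))
```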


> In ASN.1 the length of the value is at the front.


I believe that depends on the encoding, and in any case, streaming asn.1
decoders are available. But none are freeware, as far as I know.

> [...] instead have a file full of "disjoint" single structures, possibly with
> some kind of metadata header. (I haven't touched ASN.1 since school, so
> don't quote me on this.)
>

Yes, I think that's right, though I've not used ASN.1 for a long time
either.

> Oh wait, that sounds exactly like PDB with its REMARKs and MODELs.
>

No, it doesn't, because the problem that I thought we were trying to
address is rather the lack of extensibility, the lack of lower-case, the
fact that different users (even for deposited structures, IIRC) and
different software products overload the available fields differently (like
putting partial charge in the Temperature Factor field) and have violated
the standard by doing necessary but formally disallowed things such as
using multiple CONECT fields to indicate multiple bonds.

Having said all this, it would suffice to write APIs that allow
specification of a dialect (CHARMM, PDB_STD, etc.) and have a convention
for returning all the contents in arrays, dictionaries, what have you,
where the keys reflect the semantics of the dialect (like "partial_charge"
or "T_factor"), and where the unused keys would return NULL.
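The dialect idea can be sketched as a lookup table saying which semantic key a fixed-column PDB field carries. Everything here is hypothetical: the dialect names (`PDB_STD` from the message, `CHARGE_IN_BFACTOR` invented) and the key set. Only the column positions follow the PDB ATOM record layout (x/y/z in columns 31-54, tempFactor in 61-66). Keys a dialect does not fill stay `None`, the Python counterpart of the NULL mentioned above.

```python
# Hypothetical dialect table: which semantic key the tempFactor columns
# (61-66) carry. Names are illustrative, not an existing standard.
DIALECTS = {
    'PDB_STD': {'cols_61_66': 'T_factor'},
    'CHARGE_IN_BFACTOR': {'cols_61_66': 'partial_charge'},
}
KEYS = ('serial', 'name', 'x', 'y', 'z', 'T_factor', 'partial_charge')


def parse_atom(line, dialect):
    """Parse one fixed-column ATOM/HETATM record under a named dialect."""
    rec = dict.fromkeys(KEYS)  # keys the dialect does not fill stay None
    rec['serial'] = int(line[6:11])
    rec['name'] = line[12:16].strip()
    rec['x'], rec['y'], rec['z'] = (float(line[i:i + 8]) for i in (30, 38, 46))
    value = line[60:66].strip()
    if value:
        rec[DIALECTS[dialect]['cols_61_66']] = float(value)
    return rec


# Build a record with the standard PDB column widths.
line = ('%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s'
        % ('ATOM', 1, ' CA', '', 'ALA', 'A', 1, '', 11.104, 6.134, -6.504, 1.0, 0.35, 'C'))
print(parse_atom(line, 'PDB_STD')['T_factor'])                  # 0.35
print(parse_atom(line, 'CHARGE_IN_BFACTOR')['partial_charge'])  # 0.35
```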

So then, a separate question is whether there also needs to be a serialized
format for the resulting object that associated APIs can also read and
write.

'Nuff said. (By me, at least, since I'm not volunteering to do it. :-) )

-P.


Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Dimitri Maziuk
On 01/20/2016 04:57 PM, Peter S. Shenkin wrote:
> On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk 
> wrote:

>> JSON encodes a single string. That is a problem for sending larger files
>> over the net, say, an NMR structure of a larger molecule with 100 models
>> in the file.
>>
> 
> That's not a problem, conceptually, because you can have an array of
> structures.

No, my point was that streaming isn't a part of JSON specification and
common implementations do not offer it.

https://en.wikipedia.org/wiki/JSON_Streaming

You can cut one model out of a PDB file (or one structure out of an
SDF) and the result is a valid file.

In ASN.1 the length of the value is at the front. If you define your
array as sequence, a single structure pulled out of the middle should be
OK, but the entire sequence is invalid until you read it to the end. I
think in practice you wouldn't define your array as a sequence and
instead have a file full of "disjoint" single structures, possibly with
some kind of metadata header. (I haven't touched ASN.1 since school, so
don't quote me on this.)
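The point about ASN.1 lengths can be illustrated with BER/DER definite-length encoding, where every element starts with its tag and length. The sketch below is a simplification (single-byte tags only, no indefinite lengths): it writes each structure as its own top-level OCTET STRING, the "disjoint" layout described above, and walks them one at a time. Wrapping everything in a SEQUENCE instead puts the total length in the outer header, so the writer needs it up front and a reader has no valid outer element until it reaches the end.

```python
OCTET_STRING = 0x04
SEQUENCE = 0x30


def encode_length(n):
    """BER/DER definite-form length octets."""
    if n < 0x80:
        return bytes([n])  # short form: the length itself, in one byte
    body = n.to_bytes((n.bit_length() + 7) // 8, 'big')
    return bytes([0x80 | len(body)]) + body  # long form: byte count, then length


def encode_tlv(tag, value):
    """Tag, then length, then value: the length is always at the front."""
    return bytes([tag]) + encode_length(len(value)) + value


def iter_tlvs(data):
    """Walk concatenated top-level elements one at a time (single-byte tags only)."""
    pos = 0
    while pos < len(data):
        tag, first = data[pos], data[pos + 1]
        pos += 2
        if first < 0x80:
            length = first
        else:
            count = first & 0x7F
            if count == 0:
                raise ValueError('indefinite-length encoding not handled here')
            length = int.from_bytes(data[pos:pos + count], 'big')
            pos += count
        yield tag, data[pos:pos + length]
        pos += length


# "Disjoint" layout: each structure is its own top-level element.
records = [b'model 1', b'x' * 300, b'model 3']
stream = b''.join(encode_tlv(OCTET_STRING, r) for r in records)
for tag, value in iter_tlvs(stream):
    print(hex(tag), len(value))

# Wrapped in a SEQUENCE, the outer header needs the total length up front.
wrapped = encode_tlv(SEQUENCE, stream)
```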

Oh wait, that sounds exactly like PDB with its REMARKs and MODELs.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Peter S. Shenkin
On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk 
wrote:

> On 01/20/2016 03:55 PM, Peter S. Shenkin wrote:
> > On Wed, Jan 20, 2016 at 3:06 PM, Dimitri Maziuk 
> > wrote:
> >> As much as PDB wants the old busted PDB format gone, they
> >> are not offering a usable alternative that I know of.
> > Such as: a JSON file ...
>
> JSON encodes a single string. That is a problem for sending larger files
> over the net, say, an NMR structure of a larger molecule with 100 models
> in the file.
>

That's not a problem, conceptually, because you can have an array of
structures. Any hierarchical format has a verbosity problem due to
duplication of metadata, but for use as an interchange format, I don't see
this as a big problem.

Actually, though, you could imagine having a JSON file encoding only the
conventions to be used when reading or writing pdb files. So it would
encode the dialect. Then APIs could be provided to read the dialect JSON
file and then read or write the PDB file using the dialect  -- similar,
conceptually, to the way Python handles different dialects of .csv file.
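Python's `csv` module, the analogy used here, keeps formatting conventions separate from the reading code: a dialect is registered once under a name and then selected when reading. The sketch below loads those conventions from a small JSON string, mirroring the "JSON file encoding only the conventions" idea; the dialect name and its parameters are made up for illustration.

```python
import csv
import io
import json

# Hypothetical dialect description kept apart from the data, as JSON.
dialect_json = '{"delimiter": ";", "skipinitialspace": true}'
csv.register_dialect('semicolon_data', **json.loads(dialect_json))

# The reading code is the same for every dialect; only the name changes.
data = 'name; charge\nCA; 0.35\nCB; -0.12\n'
rows = list(csv.reader(io.StringIO(data), dialect='semicolon_data'))
print(rows)  # [['name', 'charge'], ['CA', '0.35'], ['CB', '-0.12']]
```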

> CSV is a good format for tabular data, and you can send rows
> incrementally, but a typical application requires some small amount of
> metadata as well. For example, the full sequence -- that does not fit
> into a single-table format like CSV.


Right, but each structure in the JSON array can have its own metadata
blocks for data that are not 1:1 with atoms.

ASN.1 provides a nice alternative but it has issues, too. (Mainly, the fact
that it's fallen into obscurity.)

-P.


Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Dimitri Maziuk
On 01/20/2016 03:55 PM, Peter S. Shenkin wrote:
> On Wed, Jan 20, 2016 at 3:06 PM, Dimitri Maziuk 
> wrote:
> 
>> As much as PDB wants the old busted PDB format gone, they
>> are not offering a usable alternative that I know of.
> 
> 
> Such as: a JSON file ...

JSON encodes a single string. That is a problem for sending larger files
over the net, say, an NMR structure of a larger molecule with 100 models
in the file.

CSV is a good format for tabular data, and you can send rows
incrementally, but a typical application requires some small amount of
metadata as well. For example, the full sequence -- that does not fit
into a single-table format like CSV.

And so on.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Atom Symbol Case in MolFile?

2016-01-20 Thread Greg Landrum
I think John has provided the solid argument for me to fix the reader so
that it accepts this construct by default. I certainly won't write "CL".

John, out of curiosity: how many of those applications would write "CL"
back out again to a molfile?

-greg


On Wed, Jan 20, 2016 at 10:43 AM, John M 
wrote:

> Correct message thread this time:
>
> The joys of the molfile - was curious whether it was accepted/correctly
>> interpreted:
>
>
>>
>> ISIS Draw 2.5 Yes (arguably the arbitrator of the format)
>> ChemDraw 15 Yes
>> ChemDoodle No (accepted but only as a text label 'CL' no conversion)
>> MarvinSketch Yes
>> CDK Yes
>> OEChem Yes
>> Open Babel Yes
>> Indigo Yes
>
>
> J
>
>
> On 20 January 2016 at 05:26, Greg Landrum  wrote:
>
>> Paul,
>>
>> On Tue, Jan 19, 2016 at 7:59 PM, Paul Emsley 
>> wrote:
>>
>>>
>>> Thanks for that.
>>>
>>> Why do I ask?  Because the sdf files [1] distributed by the wwPDB, such
>>> as this one:
>>>
>>> http://www.rcsb.org/pdb/files/ligand/CQ8_ideal.sdf
>>>
>>> from this page:
>>>
>>> http://www.rcsb.org/pdb/ligand/ligandsummary.do?hetId=CQ8
>>>
>>> are upper-cased.  I didn't know whether that was right or not (and, as
>>> you imply, RDKit will not parse it).  I'll get in touch with them and see
>>> if they can get it changed.
>>>
>>
>> It's an important data source, so it would be great if they were
>> supplying data that's correctly formatted (assuming, of course, that my
>> reading of that "spec" is correct). In the meantime, it would be pretty
>> easy to modify the RDKit to handle these cases correctly when the
>> "strictParsing" option is set to false. I'll add a github issue for this
>> and get it in there.
>>
>>
>>> [1] I thought that they were molfiles when I wrote the mail - and I
>>> suppose the same thinking applies.
>>>
>>
>> Yeah, the format of the CTAB piece is identical for mol files and SDFs.
>>
>> -greg
>>
>>
>>
>>
>>
>
>
>
>


Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Peter S. Shenkin
On Wed, Jan 20, 2016 at 3:06 PM, Dimitri Maziuk 
wrote:

> As much as PDB wants the old busted PDB format gone, they
> are not offering a usable alternative that I know of.


Such as: a JSON file with predefined keys for all the bona fide fields
originally defined by the PDB and additional predefined fields for data
that people often overload the predefined fields with.

Then a set of dialects for common sets of field combinations used by
different software products. Included in each dialect would be keywords for
commonly used extensions; for instance, whether the current dialect expects
to see multiple CONECT records to indicate multiple bonds. If that keyword
was not specified, multiple CONECTs would be viewed as an exception.

Then, I suppose, a set of APIs to read the file, across some set of
languages.
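A sketch of how one dialect keyword might gate the multiple-CONECT convention described above. The keyword name `multiple_conect_means_bond_order` is hypothetical. Standard CONECT records put bonded-atom serial numbers in four 5-column fields (columns 12-31), and a bond is normally listed from both of its atoms, so the sketch counts repeats per listing atom rather than per unordered pair.

```python
from collections import Counter

# Hypothetical dialect keyword (illustrative name, not from any standard).
STRICT = {'multiple_conect_means_bond_order': False}
LENIENT = {'multiple_conect_means_bond_order': True}


def parse_conect(lines, dialect):
    """Return {(i, j): bond_order} with i < j from PDB CONECT records."""
    listed = Counter()
    for line in lines:
        if not line.startswith('CONECT'):
            continue
        origin = int(line[6:11])
        # Bonded-atom serials: four 5-column fields, columns 12-31.
        for start in (11, 16, 21, 26):
            field = line[start:start + 5].strip()
            if field:
                listed[(origin, int(field))] += 1
    bonds = {}
    for (i, j), n in listed.items():
        if n > 1 and not dialect['multiple_conect_means_bond_order']:
            raise ValueError('repeated CONECT %d-%d not allowed in this dialect' % (i, j))
        key = (min(i, j), max(i, j))
        # Each bond is normally listed from both ends; keep the larger count.
        bonds[key] = max(bonds.get(key, 0), n)
    return bonds


# Atom 1 lists atom 2 twice (a double bond under the lenient convention).
conect = ['CONECT    1    2    2    3', 'CONECT    2    1    1', 'CONECT    3    1']
print(parse_conect(conect, LENIENT))  # {(1, 2): 2, (1, 3): 1}
```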

-P.


Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Dimitri Maziuk
On 01/20/2016 10:06 AM, Peter Shenkin wrote:

... the terrible old PDB file format ...

> As for those who would write that format, fight it! :-)
> 
> The above, in my view, represents the voice of reason, and is therefore
> unlikely to be generally adopted.

The long story is that most applications actually using the data need
only the table of coordinates and that's pretty much what PDB file is.
PDB's replacement: mmCIF includes everything and the kitchen sink
wrapped in a subset of STAR-98 syntax. All of that is excess baggage
nobody wants. As much as PDB wants the old busted PDB format gone, they
are not offering a usable alternative that I know of.

That's exactly what we've been doing at BMRB, too, and then complaining
about low rate of adoption of NMR-STAR by the NMR community.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Peter Shenkin
It seems to me that what we are talking about now has (or should have!)
more to do with the interpretation of the terrible old PDB file format than
about any software convention.

It seems to me that software that must read this format should turn the
contents into something generally chemically acceptable (that is, "Cl", not
"CL", in this case) rather than foolishly propagating the error, or
accepting it in other contexts.
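Along the lines suggested here, a reader (or a pre-processing step in front of one) can turn "CL" into "Cl" before anything else sees it. The helper below is a hypothetical workaround, not an RDKit function. It assumes the V2000 fixed-column layout (atom count in the first three columns of the counts line, element symbol in columns 32-34 of each atom line), does not handle V3000 blocks, and would also rewrite non-element two-letter symbols, which a real reader would need to special-case.

```python
def normalize_v2000_symbols(molblock):
    """Rewrite two-letter atom symbols such as 'CL' as 'Cl' in a V2000 atom block.

    Hypothetical pre-processing helper, not an RDKit function.
    """
    lines = molblock.split('\n')
    natoms = int(lines[3][0:3])       # counts line: atom count in columns 1-3
    for k in range(4, 4 + natoms):    # atom block starts right after the counts line
        line = lines[k]
        symbol = line[31:34].strip()  # element symbol in columns 32-34
        if len(symbol) == 2 and symbol.isalpha():
            fixed = symbol[0].upper() + symbol[1].lower()
            lines[k] = line[:31] + fixed.ljust(3) + line[34:]
    return '\n'.join(lines)


block = '\n'.join([
    'example',
    '  hand-written',
    '',
    '  2  1  0  0  0  0  0  0  0  0999 V2000',
    '    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0',
    '    1.7600    0.0000    0.0000 CL  0  0  0  0  0  0  0  0  0  0  0  0',
    '  1  2  1  0',
    'M  END',
])
print(normalize_v2000_symbols(block).split('\n')[5])
```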

As for those who would write that format, fight it! :-)

The above, in my view, represents the voice of reason, and is therefore
unlikely to be generally adopted.

-P.

On Wed, Jan 20, 2016 at 10:42 AM, John M 
wrote:

> Whoops wrong thread this was in regard to the Chlorine molfile question.
>
> Regards,
> John W May
> john.wilkinson...@gmail.com
>
> On 20 January 2016 at 15:40, John M  wrote:
>
>> The joys of the molfile - was curious whether it was accepted/correctly
>> interpreted:
>>
>> ISIS Draw 2.5 Yes (arguably the arbitrator of the format)
>> ChemDraw 15 Yes
>> ChemDoodle No (accepted but only as a text label 'CL' no conversion)
>> MarvinSketch Yes
>> CDK Yes
>> OEChem Yes
>> Open Babel Yes
>> Indigo Yes
>>
>> J
>>
>> Regards,
>> John W May
>> john.wilkinson...@gmail.com
>>
>> On 20 January 2016 at 13:19, Greg Landrum  wrote:
>>
>>> Hi Joos,
>>>
>>> As long as you are sure to be consistent, it is certainly ok to generate
>>> fingerprints for molecules with Hs still attached, but it's very easy to
>>> make a mistake.
>>>
>>> The default behavior of the RDKit is to remove Hs. This is what I would
>>> recommend before doing things like generating fingerprints or descriptors.
>>>
>>>
>>> -greg
>>>
>>>
>>> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been looking at different Fingerprints within the RDKit when I
>>>> realized that it matters for many of them whether Hydrogens are
>>>> explicitly present or not. This probably was obvious and clear for many of
>>>> you but I wasn't aware of that.
>>>>
>>>> To visualize what I mean, please see the notebook below:
>>>>
>>>>
>>>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>>>
>>>> Now my questions are:
>>>>
>>>> Should I always add hydrogens before generating fingerprints or should
>>>> I remove them?
>>>>
>>>> How is this handled in KNIME nodes? Do I need to perform the corresponding
>>>> action (add/remove H) before generating the fingerprint? Or is this
>>>> already handled internally by the node?
>>>>
>>>> Thank you for your help.
>>>>
>>>> Best Regards,
>>>>
>>>> Joos



>>>
>>>
>>>
>>>
>>
>
>
>
>

Re: [Rdkit-discuss] Atom Symbol Case in MolFile?

2016-01-20 Thread John M
Correct message thread this time:

The joys of the molfile - was curious whether it was accepted/correctly
> interpreted:


>
> ISIS Draw 2.5 Yes (arguably the arbitrator of the format)
> ChemDraw 15 Yes
> ChemDoodle No (accepted but only as a text label 'CL' no conversion)
> MarvinSketch Yes
> CDK Yes
> OEChem Yes
> Open Babel Yes
> Indigo Yes


J


On 20 January 2016 at 05:26, Greg Landrum  wrote:

> Paul,
>
> On Tue, Jan 19, 2016 at 7:59 PM, Paul Emsley 
> wrote:
>
>>
>> Thanks for that.
>>
>> Why do I ask?  Because the sdf files [1] distributed by the wwPDB, such
>> as this one:
>>
>> http://www.rcsb.org/pdb/files/ligand/CQ8_ideal.sdf
>>
>> from this page:
>>
>> http://www.rcsb.org/pdb/ligand/ligandsummary.do?hetId=CQ8
>>
>> are upper-cased.  I didn't know whether that was right or not (and, as
>> you imply, RDKit will not parse it).  I'll get in touch with them and see
>> if they can get it changed.
>>
>
> It's an important data source, so it would be great if they were supplying
> data that's correctly formatted (assuming, of course, that my reading of
> that "spec" is correct). In the meantime, it would be pretty easy to modify
> the RDKit to handle these cases correctly when the "strictParsing" option
> is set to false. I'll add a github issue for this and get it in there.
>
>
>> [1] I thought that they were molfiles when I wrote the mail - and I
>> suppose the same thinking applies.
>>
>
> Yeah, the format of the CTAB piece is identical for mol files and SDFs.
>
> -greg
>
>
>
>
>


Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens

2016-01-20 Thread John M
Whoops wrong thread this was in regard to the Chlorine molfile question.

Regards,
John W May
john.wilkinson...@gmail.com

On 20 January 2016 at 15:40, John M  wrote:

> The joys of the molfile - was curious whether it was accepted/correctly
> interpreted:
>
> ISIS Draw 2.5 Yes (arguably the arbitrator of the format)
> ChemDraw 15 Yes
> ChemDoodle No (accepted but only as a text label 'CL' no conversion)
> MarvinSketch Yes
> CDK Yes
> OEChem Yes
> Open Babel Yes
> Indigo Yes
>
> J
>
> Regards,
> John W May
> john.wilkinson...@gmail.com
>
> On 20 January 2016 at 13:19, Greg Landrum  wrote:
>
>> Hi Joos,
>>
>> As long as you are sure to be consistent, it is certainly ok to generate
>> fingerprints for molecules with Hs still attached, but it's very easy to
>> make a mistake.
>>
>> The default behavior of the RDKit is to remove Hs. This is what I would
>> recommend before doing things like generating fingerprints or descriptors.
>>
>>
>> -greg
>>
>>
>> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener 
>> wrote:
>>
>>> Hi all,
>>>
>>> I've been looking at different Fingerprints within the RDKit when I
>>> realized that it matters for many of them whether Hydrogens are
>>> explicitly present or not. This probably was obvious and clear for many of
>>> you but I wasn't aware of that.
>>>
>>> To visualize what I mean, please see the notebook below:
>>>
>>>
>>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>>
>>> Now my questions are:
>>>
>>> Should I always add hydrogens before generating fingerprints or should I
>>> remove them?
>>>
>>> How is this handled in KNIME nodes? Do I need to perform the corresponding
>>> action (add/remove H) before generating the fingerprint? Or is this
>>> already handled internally by the node?
>>>
>>> Thank you for your help.
>>>
>>> Best Regards,
>>>
>>> Joos
>>>
>>>
>>>
>>>


Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens

2016-01-20 Thread John M
The joys of the molfile - was curious whether it was accepted/correctly
interpreted:

ISIS Draw 2.5 Yes (arguably the arbiter of the format)
ChemDraw 15 Yes
ChemDoodle No (accepted, but only as a text label 'CL'; no conversion)
MarvinSketch Yes
CDK Yes
OEChem Yes
Open Babel Yes
Indigo Yes

J

Regards,
John W May
john.wilkinson...@gmail.com

On 20 January 2016 at 13:19, Greg Landrum  wrote:

> Hi Joos,
>
> As long as you are sure to be consistent, it is certainly ok to generate
> fingerprints for molecules with Hs still attached, but it's very easy to
> make a mistake.
>
> The default behavior of the RDKit is to remove Hs. This is what I would
> recommend before doing things like generating fingerprints or descriptors.
>
>
> -greg
>
>
> On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener 
> wrote:
>
>> Hi all,
>>
>> I've been looking at different fingerprints within the RDKit when I
>> realized that, for many of them, it matters whether hydrogens are
>> explicitly present or not. This was probably obvious to many of
>> you, but I wasn't aware of it.
>>
>> To visualize what I mean, please see the notebook below:
>>
>>
>> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>>
>> Now my questions are:
>>
>> Should I always add hydrogens before generating fingerprints or should I
>> remove them?
>>
>> How is this handled in KNIME nodes? Do I need to perform the corresponding
>> action (add/remove Hs) before generating the fingerprint, or is this
>> already handled correctly inside the node?
>>
>> Thank you for your help.
>>
>> Best Regards,
>>
>> Joos


Re: [Rdkit-discuss] Fingerprints and explicit Hydrogens

2016-01-20 Thread Greg Landrum
Hi Joos,

As long as you are sure to be consistent, it is certainly ok to generate
fingerprints for molecules with Hs still attached, but it's very easy to
make a mistake.

The default behavior of the RDKit is to remove Hs. This is what I would
recommend before doing things like generating fingerprints or descriptors.
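A minimal sketch of the consistent approach described above: remove hydrogens before fingerprinting, so that inputs with and without explicit Hs produce the same result. It assumes RDKit's Morgan fingerprint via `AllChem.GetMorganFingerprintAsBitVect` (newer releases also offer `rdFingerprintGenerator`); `fp_without_hs` is a hypothetical helper name, and ethanol, radius 2 and 2048 bits are arbitrary illustrative choices:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def fp_without_hs(mol, radius=2, n_bits=2048):
    """Hypothetical helper: strip explicit hydrogens first, so the result
    does not depend on whether the input happened to carry explicit Hs."""
    heavy = Chem.RemoveHs(mol)
    return AllChem.GetMorganFingerprintAsBitVect(heavy, radius, nBits=n_bits)


# The same molecule (ethanol, an arbitrary example), once as parsed from
# SMILES and once with explicit hydrogens added as real atoms
mol = Chem.MolFromSmiles("CCO")
mol_h = Chem.AddHs(mol)

fp_a = fp_without_hs(mol)
fp_b = fp_without_hs(mol_h)

# Both inputs go through the same normalization, so the bit vectors match
print(DataStructs.TanimotoSimilarity(fp_a, fp_b))
```

In a larger workflow, one reasonable design choice is to call `Chem.RemoveHs` once, where molecules enter the pipeline, rather than relying on each fingerprint or descriptor step to handle hydrogens itself.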


-greg


On Wed, Jan 20, 2016 at 7:06 AM, Joos Kiener  wrote:

> Hi all,
>
> I've been looking at different fingerprints within the RDKit when I
> realized that, for many of them, it matters whether hydrogens are
> explicitly present or not. This was probably obvious to many of
> you, but I wasn't aware of it.
>
> To visualize what I mean, please see the notebook below:
>
>
> http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
>
> Now my questions are:
>
> Should I always add hydrogens before generating fingerprints or should I
> remove them?
>
> How is this handled in KNIME nodes? Do I need to perform the corresponding
> action (add/remove Hs) before generating the fingerprint, or is this
> already handled correctly inside the node?
>
> Thank you for your help.
>
> Best Regards,
>
> Joos


[Rdkit-discuss] Fingerprints and explicit Hydrogens

2016-01-20 Thread Joos Kiener
Hi all,

I've been looking at different fingerprints within the RDKit when I
realized that, for many of them, it matters whether hydrogens are
explicitly present or not. This was probably obvious to many of
you, but I wasn't aware of it.

To visualize what I mean, please see the notebook below:

http://nbviewer.jupyter.org/github/kienerj/notebooks/blob/master/Fingerprint%20Similarity%20-%20with%20and%20without%20hydrogens.ipynb
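A minimal sketch of the effect described above, assuming RDKit's Morgan fingerprint (`AllChem.GetMorganFingerprintAsBitVect`) as a representative example; ethanol, radius 2 and 2048 bits are arbitrary choices, not values taken from the notebook:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Ethanol, chosen only as a small example molecule
mol = Chem.MolFromSmiles("CCO")
# AddHs turns the implicit hydrogens into real atoms in the molecular graph
mol_h = Chem.AddHs(mol)


def morgan_fp(m):
    # Radius 2 and 2048 bits are common settings, not values from the thread
    return AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048)


fp = morgan_fp(mol)
fp_h = morgan_fp(mol_h)

# The explicit H atoms contribute their own atom environments and change
# the neighborhoods of the heavy atoms, so the bit vectors differ
sim = DataStructs.TanimotoSimilarity(fp, fp_h)
print(mol.GetNumAtoms(), mol_h.GetNumAtoms())
print(f"Tanimoto(without Hs, with explicit Hs) = {sim:.3f}")
```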

Now my questions are:

Should I always add hydrogens before generating fingerprints or should I
remove them?

How is this handled in KNIME nodes? Do I need to perform the corresponding
action (add/remove Hs) before generating the fingerprint, or is this
already handled correctly inside the node?

Thank you for your help.

Best Regards,

Joos