Re: [Rdkit-discuss] Possible rotatable bonds replacement

2014-01-30 Thread Paul . Czodrowski
> I could add the new descriptor as Toby provided it. People are then
> free to pick between NumRotatableBonds() and NumStrictRotatableBonds().
> This has the advantage of maintaining strict backwards
> compatibility, but I could imagine it being confusing/irritating to
> people using the code to have to choose between them (or, worse, using
> both).
>
> Another option is to just replace the current NumRotatableBonds()
> SMARTS with the new one.
> This loses backwards compatibility, but replaces NumRotatableBonds()
> with something more correct.
>
> Finally, I could take a hybrid approach: replace the default
> NumRotatableBonds() with the new one, but add an extra argument that
> allows the old one to be used.
>
> I'm leaning towards the second option. I'd normally go with the
> third, but I almost view this as a bug fix for the rotatable bonds
> definition.
>
> Comments? suggestions? Other options?

I like your idea of the hybrid approach, which would preserve backwards
compatibility.


paul



This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.

Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

--
WatchGuard Dimension instantly turns raw network data into actionable 
security intelligence. It gives you real-time visual feedback on key
security issues and trends.  Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Possible rotatable bonds replacement

2014-01-30 Thread Greg Landrum
Dear all,

A question for the community:

Toby Wright submitted a pull request this week that introduces a new,
stricter, rotatable bond definition:
https://github.com/rdkit/rdkit/pull/211/files
The new SMARTS, re-formatted to be somewhat more readable, is:
[!$(*#*)&\
 !D1&\
 !$(C(F)(F)F)&\
 !$(C(Cl)(Cl)Cl)&\
 !$(C(Br)(Br)Br)&\
 !$(C([CH3])([CH3])[CH3])&\
 !$([CD3](=[N,O,S])-!@[#7,O,S!D1])&\
 !$([#7,O,S!D1]-!@[CD3]=[N,O,S])&\
 !$([CD3](=[N+])-!@[#7!D1])&\
 !$([#7!D1]-!@[CD3]=[N+])]\
-!@\
[!$(*#*)&\
 !D1&\
 !$(C(F)(F)F)&\
 !$(C(Cl)(Cl)Cl)&\
 !$(C(Br)(Br)Br)&\
 !$(C([CH3])([CH3])[CH3])]

Toby was quite careful and added a new descriptor -
NumStrictRotatableBonds() - that uses this SMARTS.
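
The readable form above uses backslash line continuations and indentation, which a SMARTS parser won't accept as-is. Below is a minimal Python sketch that collapses it back into one string, assuming every backslash is layout (true here: the pattern has no directional bonds) and that no whitespace is significant. The optional counting helper assumes RDKit is installed and that the descriptor is simply the number of unique matches of the two-atom pattern. `compact_smarts` and `count_strict_rotatable_bonds` are hypothetical names, not the code in the pull request.

```python
# Readable form of the proposed strict rotatable-bond SMARTS, copied from above.
READABLE_STRICT_SMARTS = r"""
[!$(*#*)&\
 !D1&\
 !$(C(F)(F)F)&\
 !$(C(Cl)(Cl)Cl)&\
 !$(C(Br)(Br)Br)&\
 !$(C([CH3])([CH3])[CH3])&\
 !$([CD3](=[N,O,S])-!@[#7,O,S!D1])&\
 !$([#7,O,S!D1]-!@[CD3]=[N,O,S])&\
 !$([CD3](=[N+])-!@[#7!D1])&\
 !$([#7!D1]-!@[CD3]=[N+])]\
-!@\
[!$(*#*)&\
 !D1&\
 !$(C(F)(F)F)&\
 !$(C(Cl)(Cl)Cl)&\
 !$(C(Br)(Br)Br)&\
 !$(C([CH3])([CH3])[CH3])]
"""


def compact_smarts(readable: str) -> str:
    """Join a line-continued, indented SMARTS into a single string.

    Assumes each backslash-newline is a layout continuation and that
    whitespace carries no meaning in the pattern.
    """
    without_continuations = readable.replace("\\\n", "")
    return "".join(without_continuations.split())


STRICT_SMARTS = compact_smarts(READABLE_STRICT_SMARTS)


def count_strict_rotatable_bonds(smiles: str) -> int:
    """Count matches of the strict pattern (needs RDKit; imported lazily)."""
    from rdkit import Chem

    pattern = Chem.MolFromSmarts(STRICT_SMARTS)
    mol = Chem.MolFromSmiles(smiles)
    if pattern is None or mol is None:
        raise ValueError("could not parse the SMARTS or the SMILES")
    # GetSubstructMatches uniquifies matches by atom set by default, so a bond
    # is counted once even if both atom orderings satisfy the pattern.
    return len(mol.GetSubstructMatches(pattern))
```

With RDKit available, `count_strict_rotatable_bonds("CCCC")` would be expected to count only the internal C-C bond of butane, since the `!D1` terms exclude bonds to terminal atoms.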

I see a few options to deal with this:

I could add the new descriptor as Toby provided it. People are then free to
pick between NumRotatableBonds() and NumStrictRotatableBonds(). This has
the advantage of maintaining strict backwards compatibility, but I could
imagine it being confusing/irritating to people using the code to have to
choose between them (or, worse, using both).

Another option is to just replace the current NumRotatableBonds() SMARTS
with the new one.
This loses backwards compatibility, but replaces NumRotatableBonds() with
something more correct.

Finally, I could take a hybrid approach: replace the default
NumRotatableBonds() with the new one, but add an extra argument that allows
the old one to be used.
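
As a rough illustration of the hybrid option, here is a Python sketch of a `strict` keyword that defaults to the new definition but can fall back to the old one. The function names are hypothetical, and `LEGACY_SMARTS` is assumed to be the pattern RDKit's `Lipinski` module used at the time; check it against your own RDKit version before relying on it.

```python
# Assumed legacy rotatable-bond pattern (verify against your RDKit version).
LEGACY_SMARTS = "[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]"

# Strict pattern from the pull request, joined onto one line.
STRICT_SMARTS = (
    "[!$(*#*)&!D1&!$(C(F)(F)F)&!$(C(Cl)(Cl)Cl)&!$(C(Br)(Br)Br)&"
    "!$(C([CH3])([CH3])[CH3])&"
    "!$([CD3](=[N,O,S])-!@[#7,O,S!D1])&!$([#7,O,S!D1]-!@[CD3]=[N,O,S])&"
    "!$([CD3](=[N+])-!@[#7!D1])&!$([#7!D1]-!@[CD3]=[N+])]"
    "-!@"
    "[!$(*#*)&!D1&!$(C(F)(F)F)&!$(C(Cl)(Cl)Cl)&!$(C(Br)(Br)Br)&"
    "!$(C([CH3])([CH3])[CH3])]"
)


def rotatable_bond_smarts(strict: bool = True) -> str:
    """New definition by default; strict=False restores the old behaviour."""
    return STRICT_SMARTS if strict else LEGACY_SMARTS


def num_rotatable_bonds(mol, strict: bool = True) -> int:
    """Hypothetical hybrid descriptor; `mol` is an RDKit Mol (RDKit imported lazily)."""
    from rdkit import Chem

    pattern = Chem.MolFromSmarts(rotatable_bond_smarts(strict))
    return len(mol.GetSubstructMatches(pattern))
```

Defaulting `strict=True` is what distinguishes this from the first option: existing callers silently get the new counts, while anyone who needs the old numbers for reproducibility passes `strict=False`.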

I'm leaning towards the second option. I'd normally go with the third, but
I almost view this as a bug fix for the rotatable bonds definition.

Comments? suggestions? Other options?
-greg


Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread John May
Hi George,

Don’t quote me on this, but I’m guessing the reason Indigo does better is 
probably because they pass the input coordinates and wedge/hatch labels to the 
InChI API, whereas RDKit sets the winding. That is, in Indigo’s case InChI 
will look at the depiction and interpret where the stereo centres are, whilst 
RDKit tells it explicitly. Basically RDKit is actually round tripping through 
its object model, whilst Indigo isn’t.

Anyway, a little on InChI and stereochemistry…

TL;DR: 200/90,000 (0.2%) ain’t bad.

When stereochemistry is validated (for 
https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL287254), the tetrahedral 
centre on the ring will be removed, as it does not have a configuration. In that 
depiction it doesn’t matter whether the bond is up or down, because the centre 
is dependent on the configuration of the other stereo centre in the ring. Since 
the other stereo centre doesn’t have a configuration, it doesn’t matter what 
configuration this one has. Formally, a stereo centre is not a stereo centre if 
there is a permutation that inverts only its configuration. Clearly, each of 
these centres is only a true stereo centre when both have a configuration.

For a more concise example, consider these structures:

InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8?
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3

Load the first one into your favourite structure diagram editor and invert the 
wedge/hatch bond. Generating a new InChI will give the same InChI string. 
Hmm... but didn’t we invert the stereo centre? If we got the same InChI, it must 
not matter what configuration it has. But the InChI does encode it, as seen above. 
Clearly we do want the two stereoisomers below to be different, but I’m not sure 
how useful it is that the two above are not the same.

InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8+
InChI=1S/C8H16/c1-7-3-5-8(2)6-4-7/h7-8H,3-6H2,1-2H3/t7-,8-

Anyway, this isn’t really a problem with RDKit; check out the OEChem release 
notes: 
http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-csharp/releasenotes/version1_9_2.html

They did have a page showing exactly what they disagree with, but that seems to 
have gone missing… thankfully PubChem also does it :-)…

CID 2375263:
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3
(notice the 1S: this is a standard InChI).

If you feed InChI the depiction, it will add a stereo configuration for the 
double bond, so you’ll get one of the following (depending on the depiction):

InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3/b22-9+
InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3/b22-9-
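
When chasing roundtrip discrepancies like these, it helps to see which layer differs rather than just that the strings differ. A minimal stdlib-only Python sketch, assuming well-formed InChI strings and treating each `/`-separated field as a layer without interpreting it (a simplification); `split_inchi` and `layer_diff` are hypothetical helpers, not part of any InChI or RDKit API.

```python
from __future__ import annotations

import difflib


def split_inchi(inchi: str) -> list[str]:
    """Split an InChI into '/'-separated fields: version, formula, then prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI: {inchi!r}")
    return inchi[len(prefix):].split("/")


def layer_diff(a: str, b: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (tag, fields_in_a, fields_in_b) for every non-matching run of fields."""
    fields_a, fields_b = split_inchi(a), split_inchi(b)
    # SequenceMatcher aligns the field lists, so an inserted layer shows up as
    # an 'insert' instead of shifting every later field out of place.
    matcher = difflib.SequenceMatcher(a=fields_a, b=fields_b, autojunk=False)
    return [
        (tag, fields_a[i1:i2], fields_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    cid = ("InChI=1S/C17H11Cl3FN3/c1-10-14(9-22-12-4-7-16(21)15(19)8-12)"
           "17(20)24(23-10)13-5-2-11(18)3-6-13/h2-9H,1H3")
    print(layer_diff(cid, cid + "/b22-9+"))  # prints: [('insert', [], ['b22-9+'])]
```

On the PubChem example only the double-bond stereo (`b`) layer is reported; the two 1,4-dimethylcyclohexane InChIs would show up as a `replace` of the `t` layer.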

Testing last August, I found ~250,000 (0.5%) differences in PubChem-Compound. 
The InChI is great, but it’s not perfect, and there will always be differences 
based on what toolkits agree on. There is of course an argument that the InChI 
is something they can agree on… InChI version 2 is where things get really 
fun.

J

On 30 Jan 2014, at 19:55, George Papadatos  wrote:

> I agree; that's why I tried to minimise 'doctoring' as much as I could in 
> this case. 
> George

Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread Dimitri Maziuk
On 01/30/2014 01:55 PM, George Papadatos wrote:
> I agree; that's why I tried to minimise 'doctoring' as much as I could in
> this case.

What I meant by "doctoring" is selecting only the inputs that are known to
produce discrepancies.

One of the features that makes inchi useless is that the formula is the only
layer that's required. Everything else is optional. Let's say you read
in InChI=1/C2H6O and print out InChI=1/C2H6O/c1-2-3: the output now asserts
a C-C-O connectivity (ethanol) that the input never specified.
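
This can be checked mechanically: before counting a pair as a roundtrip discrepancy, one could flag inputs that carry nothing beyond the formula. A minimal stdlib Python sketch; `layer_prefixes` and `is_formula_only` are hypothetical helpers, and the parsing assumes a well-formed InChI whose version and formula are the first two `/`-separated fields.

```python
from __future__ import annotations


def layer_prefixes(inchi: str) -> list[str]:
    """Return the one-letter prefixes of the layers after the formula, e.g. ['c', 'h']."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI: {inchi!r}")
    fields = inchi[len(prefix):].split("/")
    # fields[0] is the version ('1' or '1S'), fields[1] the formula.
    return [field[0] for field in fields[2:] if field]


def is_formula_only(inchi: str) -> bool:
    """True when nothing follows the formula, so the structure itself is unspecified."""
    return not layer_prefixes(inchi)
```

Checking for an empty layer list, rather than for a missing `c` layer, avoids flagging single-heavy-atom molecules such as methane, whose InChI has an `h` layer but no connections.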

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread George Papadatos
I agree; that's why I tried to minimise 'doctoring' as much as I could in
this case.
George


On 30 January 2014 19:46, Dimitri Maziuk  wrote:

> On 01/30/2014 01:07 PM, George Papadatos wrote:
> > OK just to add some fuel to this fire: A colleague of mine and I looked
> at
> > the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and
> > rdkit nodes.
>
> > Rdkit had 10 times more discrepancies
>
> If it's any consolation, OpenBabel stereo perception does not do CIP
> ordering, so any input that didn't have correct stereochemistry, or had it
> removed during whatever processing you did, will give an output InChI with a
> wrong stereo layer. I expect with properly doctored input you'll get
> 100% discrepancies there.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread Dimitri Maziuk
On 01/30/2014 01:07 PM, George Papadatos wrote:
> OK just to add some fuel to this fire: A colleague of mine and I looked at
> the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and
> rdkit nodes.

> Rdkit had 10 times more discrepancies

If it's any consolation, OpenBabel stereo perception does not do CIP
ordering, so any input that didn't have correct stereochemistry, or had it
removed during whatever processing you did, will give an output InChI with a
wrong stereo layer. I expect with properly doctored input you'll get
100% discrepancies there.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread George Papadatos
Hi Igor,
Thanks for the quick reply.
I just did that in my workflow. The number of discrepancies increased from 200
to 950 :(
George


On 30 January 2014 19:19, Igor Filippov  wrote:

> George,
>
> Have you added coordinates to the mols converted from InChI?
> It made a huge difference for the examples I've tried.
>
> Igor

Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread Igor Filippov
George,

Have you added coordinates to the mols converted from InChI?
It made a huge difference for the examples I've tried.

Igor


On Thu, Jan 30, 2014 at 2:07 PM, George Papadatos wrote:

> OK just to add some fuel to this fire: A colleague of mine and I looked at
> the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and
> rdkit nodes. We used ~90,000 inchis from chembl_17, converted them to mols
> (sanitise + remove Hs), removed the ones that fail to convert, and then we
> converted back to inchis (standard ones, no extra parameters). We assessed
> the discrepancies between indigo and rdkit inchis compared to the original
> input inchis that are stored in chembl.
> Rdkit had 10 times more discrepancies with 200 failures as opposed to 21
> from indigo. This rate (~0.2%) was also confirmed using ~1 million inchis.
>
> I had a closer look at a couple of cases here:
> http://nbviewer.ipython.org/gist/madgpap/8715974
>
> It seems that there is more than one reason for the failure. I totally
> understand Greg's caution about the inchi2mol conversion, but given the
> difference between rdkit and indigo, there might be room for improvement. Any
> insights would be very much appreciated.
>
> Btw, the KNIME workflow and full list of fails are available to you.
>
> Cheers,
>
> George

Re: [Rdkit-discuss] InChI roundtrip

2014-01-30 Thread George Papadatos
OK just to add some fuel to this fire: A colleague of mine and I looked at
the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and
rdkit nodes. We used ~90,000 inchis from chembl_17, converted them to mols
(sanitise + remove Hs), removed the ones that fail to convert, and then we
converted back to inchis (standard ones, no extra parameters). We assessed
the discrepancies between indigo and rdkit inchis compared to the original
input inchis that are stored in chembl.
Rdkit had 10 times more discrepancies with 200 failures as opposed to 21
from indigo. This rate (~0.2%) was also confirmed using ~1 million inchis.
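
The comparison described above boils down to a simple loop. Here is a minimal Python sketch with the two converters passed in as functions, so the bookkeeping can be tested without a toolkit; `roundtrip_discrepancies` is a hypothetical name, not a KNIME node or an RDKit function.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def roundtrip_discrepancies(
    inchis: Iterable[str],
    to_mol: Callable[[str], Optional[object]],
    to_inchi: Callable[[object], str],
) -> tuple[int, list[tuple[str, str]]]:
    """Convert each InChI to a molecule and back; collect pairs that don't match.

    Inputs for which to_mol returns None are skipped and not counted, mirroring
    "removed the ones that fail to convert".
    Returns (number converted, [(original, regenerated), ...]).
    """
    converted = 0
    mismatches: list[tuple[str, str]] = []
    for original in inchis:
        mol = to_mol(original)
        if mol is None:
            continue
        converted += 1
        regenerated = to_inchi(mol)
        if regenerated != original:
            mismatches.append((original, regenerated))
    return converted, mismatches
```

With an RDKit build that includes InChI support, `Chem.MolFromInchi` (which sanitizes and removes Hs by default, and returns `None` when conversion fails) and `Chem.MolToInchi` could be passed as `to_mol` and `to_inchi`; the discrepancy rate is then `len(mismatches) / converted`.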

I had a closer look at a couple of cases here:
http://nbviewer.ipython.org/gist/madgpap/8715974

It seems that there is more than one reason for the failure. I totally
understand Greg's caution about the inchi2mol conversion, but given the
difference between rdkit and indigo, there might be room for improvement. Any
insights would be very much appreciated.

Btw, the KNIME workflow and full list of fails are available to you.

Cheers,

George



On 30 January 2014 04:11, Greg Landrum  wrote:

> Yeah, I have been tempted several times to remove the InChI->RDKit
> functionality entirely
>
>
>
> On Thu, Jan 30, 2014 at 5:05 AM, Igor Filippov 
> wrote:
>
>> Thank you, Greg!
>> Very nice explanation and I think this issue has confused people before
>> me as well. I am going to have to keep reminding myself about it as the
>> subject comes up every now and then.
>>
>> Igor
>> On Jan 29, 2014 10:59 PM, "Greg Landrum"  wrote:
>>
>>> Hi Igor,
>>>
>>> On Wed, Jan 29, 2014 at 2:04 PM, Igor Filippov <
>>> igor.v.filip...@gmail.com> wrote:
>>>
 Greg et al,

 Here is a little script that demonstrates a problem with fingerprints
 after the roundtrip through InChI.
 My input mol file is also attached.
 As you can see the similarity between "before" and "after" is not 1 in
 45 out of 100 cases.
 In one case it is as low as 0.29. Could someone take a look and tell me
 what I'm doing wrong?

>>>
>>> Ah! Now I see what you're doing and understand the problem.
>>>
>>> It's really important when using InChI to remember that InChI is
>>> designed to be an identifier, not an interchange format. The InChI
>>> algorithm modifies the molecule as part of its canonicalization step. This
>>> modification includes standardizing tautomers.
>>>
>>> Here's an example of the type of substructure modification that happens
>>> in your molecules:
>>> input smiles c1c1C(=O)Nc1c1 on being converted to InChI and back
>>> yields: OC(=Nc1c1)c1c1
>>>
>>> Basically: If you think you know what your molecules are, you probably
>>> should be building them from SMILES or CTAB, not InChI.
>>>
>>> Apologies that I didn't think of this before; I was just focusing on the
>>> stereochemistry.
>>>
>>> -greg
>>>
>>
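
For the fingerprint comparison in Igor's quoted script, the "similarity" is presumably a Tanimoto coefficient (an assumption; the script itself isn't shown). A minimal stdlib Python sketch on sets of on-bit indices shows why a tautomer change that flips even a few bits pulls the value below 1: the score is the shared bits divided by all bits set in either fingerprint. `tanimoto` is a hypothetical helper.

```python
from __future__ import annotations


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| on sets of on-bit indices.

    Two empty fingerprints are scored 0.0 here; that is a convention chosen
    for this sketch.
    """
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)
```

In RDKit, `DataStructs.TanimotoSimilarity` computes the same quantity directly on bit vectors.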


[Rdkit-discuss] standardiser tool, again

2014-01-30 Thread francis
I should have mentioned that the HTML version of the documentation is 
also available at...

http://wwwdev.ebi.ac.uk/chembl/extra/francis/standardiser/


-------- Original Message --------
Subject: standardiser tool
Date: 2014-01-29 13:05
From: Francis Atkinson
To: rdkit-discuss@lists.sourceforge.net

Hello,

 Those who attended the RDKit UGM at the EBI may remember that I 
gave a talk on a structure standardisation tool I'd been working on. A 
few people expressed an interest, so I said I'd see about open-sourcing 
it. Well, I've done some work to tidy it up a bit, and it's now 
available from GitHub at 'https://github.com/flatkinson/standardiser'.

Hopefully it's clear enough how it works, but I'd be happy to answer 
any questions. Any feedback on the tool itself or the documentation 
provided would also be much appreciated.

 Thanks,

 Francis
