Hi Dave,
Hi all,

wow, great to see that the rdkit mailing list has such a persistent memory ;)

No, I haven't worked on a re-implementation due to time restrictions back then. 
I ended up using non-standard InChIs (with options '-KET' and '-15T' switched 
on) to account for at least some common cases of tautomerism. This solution was 
an acceptable compromise because I only needed to check for identity at the end 
and didn't need to convert the molecules back from the generated InChIs. In 
addition, the standardization script had to run over the complete set of 
PubChem molecules, so I needed a fast and robust solution within a short 
time.
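
For readers who want to try the same compromise from RDKit, here is a minimal sketch of such an identity check. It assumes an RDKit build with InChI support. The helper names `tautomer_inchi` and `same_compound` are hypothetical, and passing the two options as the single string "-KET -15T" is an assumption about how they were supplied, not a record of the original script.

```python
from rdkit import Chem

# Options quoted in the message above. They switch on InChI's keto-enol (KET)
# and 1,5-tautomerism (15T) handling, which makes the result a non-standard InChI.
# Assumption: both options are handed to RDKit as one space-separated string.
TAUTOMER_OPTIONS = "-KET -15T"


def tautomer_inchi(smiles):
    """Return a non-standard InChI for a SMILES string (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToInchi(mol, options=TAUTOMER_OPTIONS)


def same_compound(smiles_a, smiles_b):
    """Identity check only: compare the generated InChI strings."""
    return tautomer_inchi(smiles_a) == tautomer_inchi(smiles_b)


if __name__ == "__main__":
    # Two SMILES spellings of ethanol should compare as identical.
    print(same_compound("CCO", "OCC"))
```

Because the strings are only compared and never converted back to molecules, any information lost in the non-standard InChI does not matter for this use.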

Nevertheless, I'm still interested in the topic and think it would be a great 
feature to have in rdkit. As I don't need it right now at work I would also 
have to do it in my free time, but if I find some time I will check out Matt's 
code (thanks for sharing this!) and see if I can contribute something.

I will let you know if I make any progress.

Cheers,
Markus


On 04/03/2014 05:34 PM, Dave Wood wrote:
Hi All,

Thanks for the replies.

Matthew - I ran into the same problem as you with the reaction breaking ring bonds. My 
"hack" solution was to check that the Murcko Generic Frameworks matched. This 
almost seems to work... almost. :)

I shall take a look at your implementation.

Cheers,
Dave


On Thu, Apr 3, 2014 at 4:26 PM, Matthew Swain wrote:
Hi,

I made a start on some tautomer enumeration and canonicalization code using 
RDKit in python a few months ago but got distracted before it was fully 
complete.

Here's what I have: https://gist.github.com/mcs07/9956421

It's a bit of a mess, but seems to work and might be useful if someone wants to 
build on it.

One thing to note: I started by implementing the transforms using the RDKit 
reaction tools, but ran into a problem with them incorrectly breaking open rings 
in certain cases 
(http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg03791.html), 
so I switched to just matching SMARTS and then adjusting the relevant bond 
orders and charges as needed. This seems to work alright in my limited testing.

Also, I just used the first canonical SMILES alphabetically to break a tie for 
the canonical tautomer, whereas I believe the original implementation uses 
CACTVS hashcodes.
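
The tie-break described above can be sketched in a few lines of plain Python. This is an illustration only, not the code from the gist: the function name `pick_canonical`, the `(smiles, score)` input format, and the example scores are all hypothetical. It assumes each enumerated tautomer already has a canonical SMILES and a numeric score where higher is better.

```python
def pick_canonical(scored_smiles):
    """Pick the best-scoring tautomer; ties go to the alphabetically first SMILES.

    scored_smiles: iterable of (canonical_smiles, score) pairs (hypothetical format).
    """
    scored_smiles = list(scored_smiles)
    if not scored_smiles:
        raise ValueError("no tautomers given")
    best = max(score for _, score in scored_smiles)
    # Plain string comparison (by code point) decides between equal scores.
    return min(smi for smi, score in scored_smiles if score == best)


if __name__ == "__main__":
    # Made-up scores for three tautomers of acetylacetone.
    candidates = [
        ("CC(=O)CC(C)=O", 2),
        ("CC(=O)C=C(C)O", 2),
        ("C=C(O)CC(C)=O", 1),
    ]
    print(pick_canonical(candidates))  # -> CC(=O)C=C(C)O
```

The lower-scoring third entry sorts first alphabetically but is not chosen, since the string comparison is only used among the top-scoring candidates.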

Matt

On 3 Apr 2014, at 15:55, Markus Sitzmann wrote:

Hello,

I was about to ask the same (I am one of the authors of the mentioned
paper) - I had seen this post (gosh, a year ago) but had no time back
then to answer (job search and a move from the US to Europe).

I was digging into this a bit last week; however, I cannot say much
yet - very initial work. If something comes out of it, I would
contribute it to RDKit. Well, if somebody has already done it, I am
happy, too. Or we join forces (however, for me it is only some private
hacking with not so much time).

Markus

On Wed, Apr 2, 2014 at 11:32 AM, Dave W wrote:
Hi Markus and all,

Did you or anyone else end up coding this up? I am looking into doing it
myself, but if it's already been done...

Many thanks,
Dave



------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




--
Dr David Wood
Molplex Pharmaceuticals
The Biohub at Alderley Park
Macclesfield
Cheshire
SK10 4TG
01625 238702
[email protected]<mailto:[email protected]>

The information contained in this transmission may contain privileged and 
confidential information, including patient information protected by federal 
and state privacy laws. It is intended only for the use of the person(s) named 
above. If you are not the intended recipient, you are hereby notified that any 
review, dissemination, distribution, or duplication of this communication is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender by reply email and destroy all copies of the original message.

