Hi Dave,
Hi all,
wow, great to see that the rdkit mailing list has such a persistent memory ;)
No, I haven't worked on a re-implementation due to time restrictions back then.
I ended up using non-standard InChIs (with options '-KET' and '-15T' switched
on) to account for at least some common cases of tautomerism. This solution was
an acceptable compromise because I only needed to check for identity at the end
and didn't need to convert the molecules back from the generated InChIs. In
addition, the standardization script had to run over the complete set of
PubChem molecules, so I needed a fast and robust solution within a short
time.
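A minimal sketch of this kind of InChI-based identity check is shown below. It assumes an RDKit build with InChI support and passes the two options with the '-' prefix exactly as written above. The helper names `nonstandard_inchi` and `same_by_nonstandard_inchi` are hypothetical. Because the identifiers are only compared and never converted back into molecules, it does not matter that the non-standard InChI discards some structural detail.

```python
from rdkit import Chem

# Non-standard InChI options from the message above: -KET (keto-enol
# tautomerism) and -15T (1,5-tautomerism). Using them yields non-standard InChIs.
TAUTOMER_OPTIONS = "-KET -15T"

def nonstandard_inchi(smiles):
    """Return a non-standard InChI for a SMILES string (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToInchi(mol, options=TAUTOMER_OPTIONS)

def same_by_nonstandard_inchi(smiles_a, smiles_b):
    """Treat two structures as identical if their non-standard InChIs match."""
    return nonstandard_inchi(smiles_a) == nonstandard_inchi(smiles_b)
```

For example, `same_by_nonstandard_inchi("CCO", "OCC")` compares two spellings of ethanol and returns True, while ethanol versus methanol (`"CO"`) returns False.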
Nevertheless, I'm still interested in the topic and think it would be a great
feature to have in rdkit. As I don't need it right now at work I would also
have to do it in my free time, but if I find some time I will check out Matt's
code (thanks for sharing this!) and see if I can contribute something.
I will let you know if I make any progress.
Cheers,
Markus
On 04/03/2014 05:34 PM, Dave Wood wrote:
Hi All,
Thanks for the replies.
Matthew - I ran into the same problem as you with the reaction breaking ring bonds. My
"hack" solution was to check that the Murcko generic frameworks matched. This
almost seems to work... almost. :)
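A check of this kind can be sketched with RDKit's `MurckoScaffold` module: reduce the input and each transform product to a generic framework (all atoms carbon, all bonds single) and reject products whose framework changed, which catches cases where a ring was broken open. The function names here are hypothetical, and this is only an illustration of the idea, not Dave's actual code.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def generic_framework(mol):
    """Canonical SMILES of the generic Murcko framework of a molecule."""
    scaffold = MurckoScaffold.GetScaffoldForMol(mol)        # strip side chains
    generic = MurckoScaffold.MakeScaffoldGeneric(scaffold)  # all C, all single bonds
    return Chem.MolToSmiles(generic)

def ring_system_preserved(original, product):
    """Return False if a transform product no longer has the input's framework."""
    return generic_framework(original) == generic_framework(product)
```

For example, the two annular tautomers of 4-methylimidazole (`Cc1c[nH]cn1` and `Cc1cnc[nH]1`) share the same generic framework, whereas an open-chain product such as hexane compared with benzene does not.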
I shall take a look at your implementation.
Cheers,
Dave
On Thu, Apr 3, 2014 at 4:26 PM, Matthew Swain
<[email protected]> wrote:
Hi,
I made a start on some tautomer enumeration and canonicalization code using
RDKit in python a few months ago but got distracted before it was fully
complete.
Here's what I have: https://gist.github.com/mcs07/9956421
It's a bit of a mess, but seems to work and might be useful if someone wants to
build on it.
One thing to note: I started by implementing the transforms using the RDKit
reaction tools, but ran into problems with them incorrectly breaking open rings in
certain cases
(http://www.mail-archive.com/rdkit-discuss%40lists.sourceforge.net/msg03791.html).
So I switched to just matching SMARTS and then adjusting the relevant bond
orders and charges as needed. This seems to work alright in my limited testing.
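To illustrate the "match SMARTS, then edit bond orders" approach, here is a minimal sketch for a single keto-to-enol shift. The SMARTS pattern and the `enolize` name are assumptions made for this example, not taken from the gist. Because the code only changes the types of existing bonds and moves one hydrogen, it never creates or deletes bonds, so ring membership cannot change. This particular transform needs no formal-charge changes; other transforms would adjust charges in the same way.

```python
from rdkit import Chem

# Illustrative 1,3 keto -> enol pattern (an assumption for this sketch):
# sp3 carbon bearing at least one H, carbonyl carbon, carbonyl oxygen.
KETO_ENOL = Chem.MolFromSmarts("[CX4;!H0]-[CX3]=[OX1]")

def enolize(mol):
    """Apply the keto->enol shift at every match by editing bonds and H counts."""
    products = []
    for donor_idx, center_idx, acceptor_idx in mol.GetSubstructMatches(KETO_ENOL):
        # Read H counts from the original (sanitized) molecule before editing.
        donor_hs = mol.GetAtomWithIdx(donor_idx).GetTotalNumHs()
        acceptor_hs = mol.GetAtomWithIdx(acceptor_idx).GetTotalNumHs()

        rw = Chem.RWMol(mol)
        # Shift the double bond: C-C=O becomes C=C-O (no bonds added or removed).
        rw.GetBondBetweenAtoms(donor_idx, center_idx).SetBondType(Chem.BondType.DOUBLE)
        rw.GetBondBetweenAtoms(center_idx, acceptor_idx).SetBondType(Chem.BondType.SINGLE)

        # Move one hydrogen from the alpha carbon to the oxygen.
        donor = rw.GetAtomWithIdx(donor_idx)
        donor.SetNumExplicitHs(donor_hs - 1)
        donor.SetNoImplicit(True)
        acceptor = rw.GetAtomWithIdx(acceptor_idx)
        acceptor.SetNumExplicitHs(acceptor_hs + 1)
        acceptor.SetNoImplicit(True)

        product = rw.GetMol()
        Chem.SanitizeMol(product)  # recompute valences and ring information
        products.append(product)
    return products
```

Applied to cyclohexanone (`O=C1CCCCC1`), every product still contains exactly one ring, which is the behaviour the reaction-based version failed to guarantee in the cases Matt linked.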
Also I just used the first canonical SMILES alphabetically to break a tie for
the canonical tautomer, whereas I believe the original implementation uses
CACTVS hashcodes.
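The alphabetical tie-break can be sketched as follows. This assumes the tautomers are already available as canonical SMILES strings (for example from RDKit's `Chem.MolToSmiles`). `score` is a placeholder for whatever scoring rules rank the tautomers, and `pick_canonical_tautomer` is a hypothetical name.

```python
def pick_canonical_tautomer(tautomer_smiles, score):
    """Return the highest-scoring tautomer, breaking ties alphabetically.

    `tautomer_smiles` is assumed to hold canonical SMILES strings; `score` is a
    placeholder callable mapping a SMILES string to a comparable score.
    """
    candidates = set(tautomer_smiles)  # drop duplicate tautomers
    if not candidates:
        raise ValueError("no tautomers given")
    best = max(score(s) for s in candidates)
    # Among equally scored tautomers, take the alphabetically first SMILES.
    return min(s for s in candidates if score(s) == best)
```

Sorting canonical SMILES is cheap and deterministic for a given RDKit version, but the result depends on the canonicalization algorithm. A hashcode-based tie-break would have the same dependence on its own hashing scheme.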
Matt
On 3 Apr 2014, at 15:55, Markus Sitzmann
<[email protected]> wrote:
Hello,
I was about to ask the same (I am one of the authors of the mentioned
paper) - I had seen this post (gosh, a year ago) but had no time back
then to answer (job search and a move from the US to Europe).
I was digging into this a bit last week; however, I cannot say much
yet - very initial work. If something comes out of it, I would
contribute it to RDKit. Well, if somebody has already done it, I am
happy, too. Or we could join forces (however, for me it is only some private
hacking without much time).
Markus
On Wed, Apr 2, 2014 at 11:32 AM, Dave W
<[email protected]> wrote:
Hi Markus and all,
Did you or anyone else end up coding this up? I am looking into doing it
myself, but if it's already been done...
Many thanks,
Dave
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
--
Dr David Wood
Molplex Pharmaceuticals
The Biohub at Alderley Park
Macclesfield
Cheshire
SK10 4TG
01625 238702
[email protected]
The information contained in this transmission may contain privileged and
confidential information, including patient information protected by federal
and state privacy laws. It is intended only for the use of the person(s) named
above. If you are not the intended recipient, you are hereby notified that any
review, dissemination, distribution, or duplication of this communication is
strictly prohibited. If you are not the intended recipient, please contact the
sender by reply email and destroy all copies of the original message.