Thank you for your input and clarification!

Thanks,
Steven Pak
On Sun, Nov 15, 2020 at 2:18 PM Peter Gedeck <peter.ged...@gmail.com> wrote:

> The paper is pretty vague on implementation details. However, note that
> the code is copyright Novartis Institutes for BioMedical Research Inc. It
> was released in the public domain and at that point (2013) it was the
> implementation that was used internally at Novartis. You can therefore use
> the Python implementation in RDKit as the reference for this method. I
> would not spend any more time on finding the discrepancy.
>
> Best,
>
> Peter
>
> On Nov 15, 2020, at 11:01 AM, Gustavo Seabra <gustavo.sea...@gmail.com>
> wrote:
>
> So, basically, your code perfectly reproduces RDKit's Python
> implementation. However, those results (both yours and RDKit's) *do not*
> match the original paper.
>
> It does look like a constant shift, but it is not: some molecules have a
> different shift than others.
>
> Questions:
>
> 1. Are those the same molecules as in the original paper?
> 2. How well defined are the equations in the original paper?
>
> I'm guessing that RDKit's implementation is *not* 100% the same as in the
> original paper, as is stated on the GitHub page
> (https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py):
>
> # several small modifications to the original paper are included
> # particularly slightly different formula for marocyclic penalty
> # and taking into account also molecule symmetry (fingerprint density)
>
> --
> Gustavo Seabra
> ------------------------------
> *From:* Steven Pak <steven....@stonybrook.edu>
> *Sent:* Saturday, November 14, 2020 12:20:47 PM
> *To:* Greg Landrum <greg.land...@gmail.com>
> *Cc:* rdkit-discuss@lists.sourceforge.net
> <rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Hello questions about the Synthetic
> Accessibility score
>
> Blue dots are the RDKit-based Python code vs. my CPP implementation.
> Orange dots are my CPP implementation vs. scores extracted from the
> original paper (Estimation of synthetic accessibility score of drug-like
> molecules based on molecular complexity and fragment contributions). My
> CPP implementation of the SA_score is based on the Python version in RDKit.
> I am trying to match the values of the RDKit version exactly (which
> appears to be working). That is why I am a bit confused about why the
> orange dots appear to shift by a constant value. I am wondering why it
> shifts like that.
>
> As for the open source comment, I will let you know. I also did the same
> thing for the QED scoring functions, and I have a couple of questions about
> that too, which I will send in an email soon. I must talk to my team about
> this before we can move forward.
>
> Thanks!
>
> On Sat, Nov 14, 2020 at 2:29 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
> Steven,
>
> Wow, cool! Any thoughts about making that implementation open source?
>
> Did you recalculate the Python SA score with the same version of the RDKit
> you used for the CPP version? Did you do your implementation based on the
> Python code (hopefully) or the algorithm description in the paper?
>
> If the answer to both those questions is “yes”, then I’m going to
> guess we’d need to see the code to diagnose the problem.
>
> Best,
> -greg
>
> On Sat, 14 Nov 2020 at 00:06, Steven Pak <steven....@stonybrook.edu>
> wrote:
>
> Hello.
>
> I have been working on a CPP version of the SA score. Results are fantastic!
> <image.png>
> As you can see in the image, the blue dots represent the SA_scores from
> Python vs. scores from my CPP version. The scores are perfectly in line with
> each other, which is great! However, the orange dots are the values from
> RDKit vs. the original paper's. These are the original 40 compounds
> that I found in the original paper. I was just wondering why the orange
> dots seem to have a constant shift throughout the graph. What part of the
> code was changed to cause this? I am just curious.
>
> Thank you,
> --
> Steven Pak Pharm.D
> Ph.D Student | Rizzo Lab
> Stony Brook University (SUNY)
> Department of Pharmacological Sciences

--
Steven Pak Pharm.D
Ph.D Student | Rizzo Lab
Stony Brook University (SUNY)
Department of Pharmacological Sciences
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
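
The thread turns on whether the offset between the RDKit scores and the paper's scores is really constant or varies from molecule to molecule. A minimal sketch of one way to check this numerically is shown below. It is not taken from RDKit or from the paper: the function names `shift_summary` and `is_constant_shift`, the tolerance of 0.05, and the example numbers are all hypothetical placeholders, and the real inputs would be the 40 paired scores discussed above.

```python
from statistics import mean, pstdev


def shift_summary(reference, candidate):
    """Summarize per-molecule offsets (candidate - reference) for paired scores.

    Returns (mean offset, population std. dev. of offsets,
    largest absolute deviation of any offset from the mean offset).
    """
    if len(reference) != len(candidate):
        raise ValueError("score lists must have the same length")
    if not reference:
        raise ValueError("score lists must not be empty")
    offsets = [c - r for r, c in zip(reference, candidate)]
    avg = mean(offsets)
    spread = pstdev(offsets)
    worst = max(abs(o - avg) for o in offsets)
    return avg, spread, worst


def is_constant_shift(reference, candidate, tol=0.05):
    """True if every offset lies within `tol` of the mean offset.

    `tol` is an arbitrary illustrative threshold, not a published value.
    """
    _, _, worst = shift_summary(reference, candidate)
    return worst <= tol


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT taken from the paper or from RDKit.
    paper_scores = [2.0, 3.5, 5.0, 6.5]
    rdkit_scores = [2.4, 3.7, 5.6, 6.8]

    avg, spread, worst = shift_summary(paper_scores, rdkit_scores)
    print(f"mean offset: {avg:.3f}")
    print(f"std. dev. of offsets: {spread:.3f}")
    print(f"largest deviation from mean offset: {worst:.3f}")
    print("constant shift?", is_constant_shift(paper_scores, rdkit_scores))
```

A spread near zero would indicate a pure constant shift. A clearly non-zero spread would be consistent with the observation in the thread that some molecules shift more than others, as might be expected if a per-molecule term of the score differs between the two implementations.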