Re: [Rdkit-discuss] AP / DP descriptors

2015-08-22 Thread Greg Landrum
Just an FYI on this one: I just merged a Python DP and DT implementation
onto master.
Here's the github issue referencing the commits:
https://github.com/rdkit/rdkit/issues/574

I will try to get a C++ version done in time for the next release.


On Wed, Jul 15, 2015 at 11:02 AM, Greg Landrum greg.land...@gmail.com
wrote:

 Hi Michael,

 On Tuesday, July 14, 2015, Michael Reutlinger rd...@mulchi.de wrote:

 Hi Greg,

 thanks for your answer. I also figured that the invariant feature of the
 AtomPair functions should be well suited. Initially, I thought that the
 pi-electron and neighbouring atom count would always be added, but this is
 clearly not the case.


 It shouldn't be the case for the AP fingerprint. It *does*, however, look
 like there will be a problem with the TT fingerprints if you provide your
 own invariants (this is based on a skim of the code, so I'm not sure that's
 correct).

 The definition of the pharmacophoric patterns was initially described in
 the PATTY paper (Bush & Sheridan, J. Chem. Inf. Comput. Sci., 1993) and as
 far as I know the definitions are still valid.

 Luckily, there is a SMARTS based version floating around (allegedly based
 on an early OpenBabel version):
 http://tripod.nih.gov/files/patty.rules

 I compared those to the original definitions from the paper and they seem
 to match and overall make sense.


  Thanks for that pointer. That is really helpful! If I manage to make the
 time to do so, I will incorporate this into the next RDKit release.

 I implemented a quick Typer based on this definition, taking into account
 the fact that only the first atom in the SMARTS is typed.

 I attached an IPython notebook containing a small example, including the
 example molecule from the paper.

 Btw. is the hashing method used for creating the hashed fingerprint still
 valid / suitable if only a small number of the
 maximal numAtomPairFingerprintBits / codeSize is actually used?


 The quick read-through I did of the code makes me believe that it should
 be fine. I will confirm over the next day or so and let you know.

 Best,
 -greg

--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] AP / DP descriptors

2015-07-15 Thread Greg Landrum
A couple of followups here since I had some time on a plane yesterday.

On Wed, Jul 15, 2015 at 11:02 AM, Greg Landrum greg.land...@gmail.com
wrote:


 On Tuesday, July 14, 2015, Michael Reutlinger rd...@mulchi.de wrote:


 thanks for your answer. I also figured that the invariant feature of the
 AtomPair functions should be well suited. Initially, I thought that the
 pi-electron and neighbouring atom count would always be added, but this is
 clearly not the case.


 It shouldn't be the case for the AP fingerprint. It *does*, however, look
 like there will be a problem with the TT fingerprints if you provide your
 own invariants (this is based on a skim of the code, so I'm not sure that's
 correct).


I skimmed too quickly before writing this response. User-provided atom
invariants should also work properly with TT descriptors. The branching
terms added in the TT calculation do seem to be properly handled with
user-provided invariants.


 Btw. is the hashing method used for creating the hashed fingerprint still
 valid / suitable if only a small number of the maximal
 numAtomPairFingerprintBits / codeSize is actually used?


 The quick read-through I did of the code makes me believe that it should
 be fine. I will confirm over the next day or so and let you know.


Looks like it should be fine.
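
To illustrate why sparse use of the code space need not be a problem, here is a minimal sketch of a generic hash-then-fold scheme in plain Python. It is not the RDKit hashing code: zlib.crc32 is only a stand-in hash, and the function name fold, the 7 integer types and the distance range are assumptions made for this example. In a scheme like this the bin a feature lands in depends only on the hash of that feature, so what matters for collisions is how many distinct features occur relative to the number of bins, not how much of the unfolded code space is populated.

```python
import zlib
from collections import Counter


def fold(sparse_counts, n_bits):
    """Hash each (type_i, distance, type_j) feature into one of n_bits bins.

    crc32 is a stand-in hash chosen because it is deterministic;
    it is NOT the hash function RDKit uses.
    """
    folded = Counter()
    for feature, count in sparse_counts.items():
        key = ",".join(map(str, feature)).encode("ascii")
        folded[zlib.crc32(key) % n_bits] += count
    return folded


# Hypothetical feature set: 7 atom types and distances 1..10, which is only
# a small slice of what a full atom-pair code space could hold.
features = Counter(
    {(a, d, b): 1
     for a in range(1, 8)
     for b in range(a, 8)
     for d in range(1, 11)}
)

folded = fold(features, 2048)
occupied = len(folded)  # number of bins actually used after folding
print(len(features), "distinct features ->", occupied, "occupied bins")
```

Comparing len(features) with occupied gives a quick collision count for whatever typing scheme and bit count is being considered.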

-greg


Re: [Rdkit-discuss] AP / DP descriptors

2015-07-15 Thread Greg Landrum
Hi Michael,

On Tuesday, July 14, 2015, Michael Reutlinger rd...@mulchi.de wrote:

 Hi Greg,

 thanks for your answer. I also figured that the invariant feature of the
 AtomPair functions should be well suited. Initially, I thought that the
 pi-electron and neighbouring atom count would always be added, but this is
 clearly not the case.


It shouldn't be the case for the AP fingerprint. It *does*, however, look
like there will be a problem with the TT fingerprints if you provide your
own invariants (this is based on a skim of the code, so I'm not sure that's
correct).

The definition of the pharmacophoric patterns was initially described in
 the PATTY paper (Bush & Sheridan, J. Chem. Inf. Comput. Sci., 1993) and as
 far as I know the definitions are still valid.

 Luckily, there is a SMARTS based version floating around (allegedly based
 on an early OpenBabel version):
 http://tripod.nih.gov/files/patty.rules

 I compared those to the original definitions from the paper and they seem
 to match and overall make sense.


 Thanks for that pointer. That is really helpful! If I manage to make the
time to do so, I will incorporate this into the next RDKit release.

I implemented a quick Typer based on this definition, taking into account
 the fact that only the first atom in the SMARTS is typed.

 I attached an IPython notebook containing a small example, including the
 example molecule from the paper.

 Btw. is the hashing method used for creating the hashed fingerprint still
 valid / suitable if only a small number of the
 maximal numAtomPairFingerprintBits / codeSize is actually used?


The quick read-through I did of the code makes me believe that it should be
fine. I will confirm over the next day or so and let you know.

Best,
-greg


Re: [Rdkit-discuss] AP / DP descriptors

2015-07-13 Thread Greg Landrum
Hi Michael,

On Sun, Jul 5, 2015 at 2:43 PM, Michael Reutlinger rd...@mulchi.de wrote:


 I would like to use a machine learning method with the AP and DP
 descriptors as described by Robert Sheridan.

 AP descriptors are the 'atom pair' descriptors from Carhart et al. 1985
 and I think they are already available in RDKit.


Indeed, they are.


 DP 'donor−acceptor pair', called 'BP' in Kearsley et al. 1996, is a
 reduced pharmacophore version of AP.

 I would like to know if you think there is a straightforward way to use
 the existing AP functionality (maybe using atomInvariants) to reproduce
 the descriptor as described in Kearsley et al.?


Yes, if you have a way to assign integer invariants (atom types) to atoms
that correspond to the BP features described in Kearsley et al. then it
would be very straightforward to use those in the calculation. Note that
the RDKit atom pair code does require that all atoms have a type (i.e. you
can't have atoms that are ignored like some pharmacophore methods would
do), but looking at the paper it seems like this isn't a problem: any atom
that doesn't get assigned to one of the other classes just gets put into
class 7: other.
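
A minimal sketch of that idea, in plain Python rather than RDKit code: every atom gets an integer class, atoms without an assigned class fall into class 7, and each atom pair is recorded as (type, distance, type). The function names, the toy graph and the class numbers 1 and 2 are hypothetical, and the distance here is simply the number of bonds on the shortest path, which is a choice made for this sketch. In RDKit itself the same per-atom integers would be passed in through the atomInvariants argument mentioned above.

```python
from collections import Counter, deque

OTHER = 7  # catch-all class for atoms that match none of the other classes


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def typed_atom_pairs(adjacency, assigned_types):
    """Count (type_i, distance, type_j) triples over all pairs of atoms.

    assigned_types maps atom index -> integer class; atoms missing from it
    are put into OTHER, so every atom ends up with a type.
    """
    n_atoms = len(adjacency)
    types = [assigned_types.get(i, OTHER) for i in range(n_atoms)]
    counts = Counter()
    for i in range(n_atoms):
        dist = shortest_path_lengths(adjacency, i)
        for j in range(i + 1, n_atoms):
            if j in dist:
                # Sort the two types so the triple does not depend on atom order.
                lo, hi = sorted((types[i], types[j]))
                counts[(lo, dist[j], hi)] += 1
    return counts


# Toy graph: a four-atom chain 0-1-2-3. Atoms 0 and 3 get hypothetical
# classes 1 and 2; atoms 1 and 2 are left unassigned and become OTHER.
adjacency = [[1], [0, 2], [1, 3], [2]]
pairs = typed_atom_pairs(adjacency, {0: 1, 3: 2})
for triple, count in sorted(pairs.items()):
    print(triple, count)
```

The class assignment itself is the hard part and is deliberately left out of this sketch.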

The paper does not, unfortunately, include enough information to directly
implement the fingerprint: you will need to come up with definitions
(probably SMARTS-based?) for the 6 atom classes. I've thought several
times about adding the fingerprint types from the paper to the RDKit (and
then testing them out using Sereina's benchmarking platform), but this has
always ended up getting hung up on the missing atom-type definitions.

-greg


[Rdkit-discuss] AP / DP descriptors

2015-07-05 Thread Michael Reutlinger
Dear all,

I would like to use a machine learning method with the AP and DP
descriptors as described by Robert Sheridan.

AP descriptors are the 'atom pair' descriptors from Carhart et al. 1985 and
I think they are already available in RDKit.
DP 'donor−acceptor pair', called 'BP' in Kearsley et al. 1996, is a reduced
pharmacophore version of AP.

I would like to know if you think there is a straightforward way to use the
existing AP functionality (maybe using atomInvariants) to reproduce the
descriptor as described in Kearsley et al.?

Best,
Michael