On Sep 8, 2020, at 14:30, Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:

> Does anyone know whether it’s possible to obtain not just a fingerprint keys 
> for MACCS (binary values) but the number of occurrences of the keys, 
> particularly these details:

The SMARTS patterns for most of the MACCS keys are available via:

>>> from rdkit.Chem import MACCSkeys
>>> for key, smarts in MACCSkeys.smartsPatts.items():
...   print("[%s] %s" % (key, smarts))
...
[1] ('?', 0)
[2] ('[#104]', 0)
[3] ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0)
[4] ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0)
[5] ('[Sc,Ti,Y,Zr,Hf]', 0)
[6] ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0)
 ...

There are two parts to the right-hand side: the SMARTS pattern and a count.

If the SMARTS pattern is a "?", that means the pattern is not defined at the 
SMARTS level.

For the key to be set there must be at least count+1 matches. That is, if 
the count is 0 then there must be at least one match.
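
As a minimal sketch of that rule in plain Python (the function name 
key_is_set is something I made up for illustration; it is not part of RDKit):

```python
def key_is_set(n_matches, count):
    """Return True if a key with threshold `count` would be set.

    Hypothetical helper: the key is on when there are at least
    count+1 matches, i.e. strictly more than `count`.
    """
    return n_matches > count


# With count 0, a single match is enough.
print(key_is_set(1, 0))
# With count 3, three matches are not enough but four are.
print(key_is_set(3, 3), key_is_set(4, 3))
```

So the count is a threshold on the number of matches, not the number of 
matches itself.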

You write "the number of occurrences of the keys".

I don't know how that makes sense for all the keys. You have things like:

140: (key(164)-3 if key(164)>3; else 0)
141: (key(160)-2 if key(160)>2; else 0)
142: (key(161)-2 if key(161)>1; else 0)

These correspond to RDKit's definitions:

[140] ('[#8]', 3)
[141] ('[CH3]', 2)
[142] ('[#7]', 1)

How would you count the number of occurrences for those?


> On Sep 8, 2020, at 21:56, Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:
>  The KNIME node does a lot of double counting for the RDKit Substructure 
> Counter, so it’s not a useful tool for counting MACCS keys.

Something like [11] ('*1~*~*~*~1', 0) has many matches due to symmetry.

You have to decide if you think this should be counted once, or if all 8 
matches should be counted. 

The molecule method 'GetSubstructMatches()' has a uniquify option; by default 
it only returns unique matches. ("Unique" is based on unique atoms, not unique 
atoms and bonds. I don't think that distinction affects the MACCS patterns.)
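
Here is a rough sketch of how the per-key match counts could be collected 
with and without uniquify. The helper name maccs_match_counts is mine, not 
part of RDKit, and it skips the keys whose pattern is "?". Note that 
GetSubstructMatches() also has a maxMatches limit (1000 by default), so very 
large counts would be truncated.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys


def maccs_match_counts(mol, uniquify=True):
    """Map MACCS key number -> number of SMARTS matches in `mol`.

    Hypothetical helper. Keys with no SMARTS definition ("?") are skipped,
    and the per-key count threshold is ignored: this reports raw matches.
    """
    counts = {}
    for key, (smarts, _count) in MACCSkeys.smartsPatts.items():
        if smarts == "?":
            continue
        patt = Chem.MolFromSmarts(smarts)
        matches = mol.GetSubstructMatches(patt, uniquify=uniquify)
        counts[key] = len(matches)
    return counts


mol = Chem.MolFromSmiles("C1CCC1")  # cyclobutane, a single 4-membered ring
unique_counts = maccs_match_counts(mol, uniquify=True)
all_counts = maccs_match_counts(mol, uniquify=False)

# Key 11 is the 4-membered ring pattern '*1~*~*~*~1'.
print(unique_counts[11], all_counts[11])
```

For cyclobutane this should report 1 match for key 11 with uniquify and 8 
without, which is the once-or-8 choice described above.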

Regards,

                                Andrew
                                da...@dalkescientific.com



