On Sep 8, 2020, at 14:30, Mike Mazanetz <[email protected]> wrote:
> Does anyone know whether it’s possible to obtain not just a fingerprint keys
> for MACCS (binary values) but the number of occurrences of the keys,
> particularly these details:
The SMARTS patterns for most of the MACCS keys are available with:
>>> from rdkit.Chem import MACCSkeys
>>> for key, smarts in MACCSkeys.smartsPatts.items():
...     print("[%s] %s" % (key, smarts))
...
[1] ('?', 0)
[2] ('[#104]', 0)
[3] ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0)
[4] ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0)
[5] ('[Sc,Ti,Y,Zr,Hf]', 0)
[6] ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0)
...
There are two parts to the right-hand side: a SMARTS pattern and a count.
If the SMARTS pattern is a "?", that means the key is not defined at the
SMARTS level.
For the key to be set there must be at least count+1 matches. That is, if the
count is 0 then there must be at least one match.
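Here is a minimal sketch of that count+1 rule. The helper name key_is_set is
my own, for illustration only; it is not part of RDKit, and the match counts
are made-up inputs rather than results from a real molecule.

```python
def key_is_set(n_matches, count):
    """Return True when a (smarts, count) entry should set its bit.

    Hypothetical helper, not an RDKit function. It encodes the rule
    "there must be at least count+1 matches".
    """
    return n_matches >= count + 1


# With count == 0, a single match is enough.
print(key_is_set(1, 0))  # True
print(key_is_set(0, 0))  # False

# With count == 3, four matches are needed.
print(key_is_set(3, 3))  # False
print(key_is_set(4, 3))  # True
```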
You write "the number of occurrences of the keys".
I don't know how that makes sense for all the keys. You have things like:
140: (key(164)-3 if key(164)>3; else 0)
141: (key(160)-2 if key(160)>2; else 0)
142: (key(161)-2 if key(161)>1; else 0)
These correspond to RDKit's definitions:
[140] ('[#8]', 3)
[141] ('[CH3]', 2)
[142] ('[#7]', 1)
How do you count the number of occurrences for those?
> On Sep 8, 2020, at 21:56, Mike Mazanetz <[email protected]> wrote:
> The KNIME node does a lot of double counting for the RDKit Substructure
> Counter, so it’s not a useful tool for counting MACCS keys.
Something like [11] ('*1~*~*~*~1', 0) has many matches due to symmetry.
You have to decide if you think this should be counted once, or if all 8
matches should be counted.
The molecule method 'GetSubstructMatches()' has a uniquify option; by default
it only returns unique matches. ("Unique" is based on unique atoms, not unique
atoms and bonds. I don't think that distinction affects the MACCS patterns.)
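To show where the 8 comes from, here is a small pure-Python sketch that does
not use RDKit at all. It treats a four-membered ring as atoms 0-1-2-3-0 and
enumerates every way to place the four pattern atoms of '*1~*~*~*~1' on it.
The variable names are mine, and this only illustrates the symmetry argument;
it is not how RDKit does the matching.

```python
from itertools import permutations

# A four-membered ring: atoms 0-1-2-3-0 (for example, the ring of cyclobutane).
ring_bonds = {frozenset((i, (i + 1) % 4)) for i in range(4)}

# The pattern is also a 4-cycle: pattern atom i is bonded to atom i+1 (mod 4).
pattern_bonds = [(i, (i + 1) % 4) for i in range(4)]

# Keep each assignment of pattern atoms to ring atoms that maps every
# pattern bond onto a ring bond.
matches = [
    perm
    for perm in permutations(range(4))
    if all(frozenset((perm[a], perm[b])) in ring_bonds for a, b in pattern_bonds)
]

# Collapse matches that cover the same set of atoms.
unique_atom_sets = {frozenset(m) for m in matches}

print(len(matches))           # 8: 4 starting atoms x 2 directions
print(len(unique_atom_sets))  # 1: they all cover the same four atoms
```

Counting by unique atom sets gives one occurrence per ring; counting every
mapping gives eight.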
Regards,
Andrew
[email protected]
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss