On Sep 8, 2020, at 14:30, Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:
> Does anyone know whether it’s possible to obtain not just a fingerprint keys
> for MACCS (binary values) but the number of occurrences of the keys,
> particularly these details:

The SMARTS patterns for most of the MACCS keys are available with:

  >>> from rdkit.Chem import MACCSkeys
  >>> for key, smarts in MACCSkeys.smartsPatts.items():
  ...     print("[%s] %s" % (key, smarts))
  ...
  [1] ('?', 0)
  [2] ('[#104]', 0)
  [3] ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0)
  [4] ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0)
  [5] ('[Sc,Ti,Y,Zr,Hf]', 0)
  [6] ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0)
   ...

There are two parts to the right-hand side: the SMARTS pattern and the count.
If the SMARTS pattern is a "?", that means the pattern is not defined at the
SMARTS level. There must be at least count+1 matches for the key to be set.
That is, if the count is 0 then there must be at least one match.

You write "the number of occurrences of the keys". I don't know how that
makes sense for all the keys. You have things like:

  140: (key(164)-3 if key(164)>3; else 0)
  141: (key(160)-2 if key(160)>2; else 0)
  142: (key(161)-2 if key(161)>1; else 0)

These correspond to RDKit's definitions:

  [140] ('[#8]', 3)
  [141] ('[CH3]', 2)
  [142] ('[#7]', 1)

How do you count the number of occurrences for those?

> On Sep 8, 2020, at 21:56, Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:
> The KNIME node does a lot of double counting for the RDKit Substructure
> Counter, so it’s not a useful tool for counting MACCS keys.

Something like:

  [11] ('*1~*~*~*~1', 0)

has many matches due to symmetry. You have to decide if you think this should
be counted once, or if all 8 matches should be counted.

The molecule method 'GetSubstructMatches()' has a uniquify option; by default
it only returns unique matches. ("Unique" is based on unique atoms, not unique
atoms and bonds. I don't think that distinction affects the MACCS patterns.)
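The points above (skip the "?" keys, read the count as a threshold, and choose whether to uniquify) can be put together in a minimal sketch. The helper names `key_is_set` and `count_maccs_matches` are hypothetical, not part of RDKit; the only RDKit pieces used are `MACCSkeys.smartsPatts`, `Chem.MolFromSmarts()` and `GetSubstructMatches()`. It reports the raw number of SMARTS matches per key, which is only one possible reading of "number of occurrences", and it does nothing for the keys that have no SMARTS definition.

```python
def key_is_set(n_matches, count):
    """Threshold rule described above: a key is on when there are
    at least count+1 matches, i.e. strictly more than `count`."""
    return n_matches > count


def count_maccs_matches(mol, uniquify=True):
    """Return {key: number of SMARTS matches} for the SMARTS-defined keys.

    Hypothetical helper. Keys whose pattern is "?" are skipped because
    they are not defined at the SMARTS level.
    """
    # Imported here so key_is_set() can be used without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import MACCSkeys

    counts = {}
    for key, (smarts, _threshold) in MACCSkeys.smartsPatts.items():
        if smarts == "?":
            continue
        query = Chem.MolFromSmarts(smarts)
        # uniquify=True keeps one match per distinct set of atoms;
        # uniquify=False also returns the symmetry-equivalent matches.
        matches = mol.GetSubstructMatches(query, uniquify=uniquify)
        counts[key] = len(matches)
    return counts
```

With cyclobutane (`Chem.MolFromSmiles("C1CCC1")`), key 11 should come out as 1 with `uniquify=True` and 8 with `uniquify=False`, which is the symmetry question raised above. Note that `GetSubstructMatches()` also takes a `maxMatches` argument, which appears to default to 1000 in recent RDKit releases, so very large counts may be truncated unless it is raised.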
Regards,

  Andrew
  da...@dalkescientific.com