Dear Egon,

Thanks so much for your reply.

I don’t quite understand what you mean by “make a set of compounds which all hit the particular bin”. How do I make this set of compounds? And what is a “bin”? I am not sure if I am right, but I am guessing that you are asking me to find a set of compounds that share a common substructure/subgraph and see whether they share the same feature/integer (for example, in my case, “14” in the Extended Fingerprint). Is my understanding correct? If yes, then I would imagine that creating a set of compounds and testing them for all 1024 features in the Extended Fingerprint is going to be quite a monumental task.

I was thinking about the Klekota-Roth fingerprint because John mentioned using an unfolded fingerprint. From what I know, the KR fingerprint describes 4860 substructures. So I was wondering whether it is possible to figure out which integer corresponds to which KR substructure. Similarly, there is the PubChem Fingerprint with 881 substructural features. I am not sure if CDK can generate this fingerprint, but if it can, then it would also be helpful to know which substructure is assigned to which integer in the fingerprint.

Thanks once again for your time and assistance.

Best regards,
Allen

From: Egon Willighagen
Sent: Saturday, 2 September 2023 4:25 pm
To: Chong Kim San Allen
Cc: John Mayfield; cdkuser
Subject: Re: [Cdk-user] Extended Fingerprint: what do the features represent?

Dear Allen,

On Wed, 30 Aug 2023 at 01:24, Chong Kim San Allen via Cdk-user wrote:

> I am wondering if you can tell me which fingerprints generate unfolded features and if there is a table of the subgraphs that represents these features for these fingerprints?

I would recommend doing this:

- make a set of compounds which all hit the particular bin
- for these, find the maximal common substructure

You can do this visually and algorithmically (depending on your use case).

With kind regards,

Egon

--
Inherited disorders can be hard to interpret when multiple biomarkers are involved. A network approach can help bring insight: https://doi.org/10.1186/s13023-023-02683-9

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Blog: https://chem-bla-ics.blogspot.com/
Mastodon: https://scholar.social/@egonw
PubList: https://orcid.org/0000-0001-7542-0286
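
As an illustration of the two steps suggested above, here is a minimal, self-contained Java sketch. It does not use CDK. Everything in it is an assumption made for illustration: the class name `BinDemo`, the representation of each compound as a set of path strings, and the use of `String.hashCode()` to fold a feature into one of 1024 bins. CDK's hashed fingerprints use their own path enumeration and hashing, so the bin numbers here will not match CDK's. The second step is only a crude stand-in for a maximal common substructure search: it intersects the features that land in the chosen bin.

```java
import java.util.*;

public class BinDemo {

    // Toy folding: map a feature string to one of `size` bins.
    // Assumption: this is NOT the hash CDK uses; it only shows the idea of a "bin".
    public static int bin(String feature, int size) {
        return Math.floorMod(feature.hashCode(), size);
    }

    // Step 1: collect the compounds that have at least one feature folding into `target`.
    public static List<String> compoundsHitting(Map<String, Set<String>> compounds,
                                                int target, int size) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : compounds.entrySet()) {
            for (String f : e.getValue()) {
                if (bin(f, size) == target) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    // Step 2 (stand-in for a common-substructure search): the features that fold
    // into `target` and are present in every compound that hit the bin.
    public static Set<String> sharedFeatures(Map<String, Set<String>> compounds,
                                             List<String> hits, int target, int size) {
        Set<String> shared = null;
        for (String name : hits) {
            Set<String> inBin = new TreeSet<>();
            for (String f : compounds.get(name)) {
                if (bin(f, size) == target) {
                    inBin.add(f);
                }
            }
            if (shared == null) {
                shared = inBin;
            } else {
                shared.retainAll(inBin);
            }
        }
        return shared == null ? new TreeSet<>() : shared;
    }

    public static void main(String[] args) {
        // Hypothetical path features, written by hand for three small molecules.
        Map<String, Set<String>> compounds = new LinkedHashMap<>();
        compounds.put("ethanol", Set.of("C", "O", "C-C", "C-O", "C-C-O"));
        compounds.put("propanol", Set.of("C", "O", "C-C", "C-O", "C-C-C", "C-C-O", "C-C-C-O"));
        compounds.put("propane", Set.of("C", "C-C", "C-C-C"));

        int size = 1024;
        int target = bin("C-C-O", size);

        List<String> hits = compoundsHitting(compounds, target, size);
        System.out.println("bin " + target + " hit by " + hits);
        System.out.println("shared features: " + sharedFeatures(compounds, hits, target, size));
    }
}
```

With this toy data, only ethanol and propanol hit the bin, and the single feature they share in it is `C-C-O`. With a real folded fingerprint, several unrelated paths can collide in one bin, in which case the shared set may be empty and the bin has no single structural meaning.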
_______________________________________________
Cdk-user mailing list
https://lists.sourceforge.net/lists/listinfo/cdk-user

