Dear Egon,

Thanks so much for your reply.

I don’t quite understand what you mean by “make a set of compounds which all 
hit the particular bin”. How do I make this set of compounds? And what is a 
“bin”? I may be wrong, but my guess is that you are asking me to find a set of 
compounds that share a common substructure/subgraph and check whether they all 
share the same feature/integer (for example, in my case, “14” in the Extended 
Fingerprint). Is my understanding correct? If so, I imagine that building and 
testing such a set of compounds for each of the 1024 features in the Extended 
Fingerprint would be quite a monumental task.

I was thinking about the Klekota-Roth (KR) fingerprint because John mentioned 
using an unfolded fingerprint. As far as I know, the KR fingerprint describes 
4860 substructures. Is it possible to find out which integer corresponds to 
which KR substructure? Similarly, there is the PubChem Fingerprint with 881 
substructural features. I am not sure whether CDK can generate this 
fingerprint, but if it can, it would also be helpful to know which 
substructure is assigned to which integer in that fingerprint.

Thanks once again for your time and assistance.

Best regards,
Allen




From: Egon Willighagen
Sent: Saturday, 2 September 2023 4:25 pm
To: Chong Kim San Allen
Cc: John Mayfield; cdkuser
Subject: Re: [Cdk-user] Extended Fingerprint: what do the features represent?


Dear Allen,

On Wed, 30 Aug 2023 at 01:24, Chong Kim San Allen via Cdk-user wrote:
Could you tell me which fingerprints generate unfolded features, and whether 
there is a table of the subgraphs that these features represent for these 
fingerprints?

I would recommend doing this:

- make a set of compounds that all hit the particular bin (bit)
- for these, find the maximal common substructure

You can do this visually or algorithmically (depending on your use case).
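To make the first step concrete: in a hashed fingerprint such as Extended, each 
enumerated feature (for example a short path) is hashed onto one of a fixed 
number of bit positions, and a "bin" is one of those positions, so several 
different substructures can end up in the same bin. The sketch below uses only 
the JDK and rests on stated assumptions: the feature strings are hypothetical, 
hand-written paths, and the helper `binOf` uses `String.hashCode()` modulo the 
fingerprint size, which is NOT CDK's actual hashing scheme. It shows how to 
select the compounds that hit a given bin and how to list the features that 
fold into it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BinHits {

    // Hypothetical folding scheme, for illustration only: hash the feature
    // string and reduce it modulo the fingerprint size. CDK's real hash differs.
    static int binOf(String feature, int size) {
        return Math.floorMod(feature.hashCode(), size);
    }

    // Fold a compound's set of features into a fixed-length bit vector.
    static BitSet fold(Collection<String> features, int size) {
        BitSet bits = new BitSet(size);
        for (String f : features) {
            bits.set(binOf(f, size));
        }
        return bits;
    }

    // Step 1 of the suggested procedure: keep the compounds whose bin k is set.
    static List<String> compoundsHittingBin(Map<String, BitSet> fps, int k) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : fps.entrySet()) {
            if (e.getValue().get(k)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // All distinct features in the data set that fold into bin k.
    // More than one entry means several substructures share this bin.
    static Set<String> featuresInBin(Map<String, Set<String>> features, int k, int size) {
        Set<String> inBin = new TreeSet<>();
        for (Set<String> fs : features.values()) {
            for (String f : fs) {
                if (binOf(f, size) == k) {
                    inBin.add(f);
                }
            }
        }
        return inBin;
    }

    public static void main(String[] args) {
        int size = 1024;
        // Hypothetical path features; in practice these would come from the
        // fingerprinter's own path enumeration.
        Map<String, Set<String>> features = new LinkedHashMap<>();
        features.put("phenol", Set.of("c:c", "c:c:c", "c-O"));
        features.put("toluene", Set.of("c:c", "c:c:c", "c-C"));
        features.put("ethanol", Set.of("C-C", "C-O", "C-C-O"));

        Map<String, BitSet> fps = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : features.entrySet()) {
            fps.put(e.getKey(), fold(e.getValue(), size));
        }

        int k = binOf("c:c:c", size);
        System.out.println("bin " + k + " is hit by " + compoundsHittingBin(fps, k));
        System.out.println("features in bin " + k + ": " + featuresInBin(features, k, size));
    }
}
```

The second step, finding the maximal common substructure of the selected 
compounds, needs a cheminformatics toolkit and is not sketched here. With real 
data, the bit vectors would come from the fingerprinter itself rather than from 
the illustrative `fold` helper.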

With kind regards,

Egon

--
Inherited disorders can be hard to interpret when multiple biomarkers are 
involved. A network approach can help bring insight:
https://doi.org/10.1186/s13023-023-02683-9

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Blog: https://chem-bla-ics.blogspot.com/
Mastodon: https://scholar.social/@egonw
PubList: https://orcid.org/0000-0001-7542-0286

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
