> On Mar 13, 2021, at 20:29, Marawan Hussien via Rdkit-discuss wrote:
> My question is whether this is a valid way to compare the classes, particularly
> when the class sizes vary widely and the average similarity will inevitably be
> affected by the size of each item in each pair. As a check, it looks like the
> diagonal has the highest inter-class similarity overall, which is expected
> anyway.
>
> I am also wondering whether a size-weighted normalization approach could handle
> this situation.
What about a Z-score, rather than the mean score? That is:

  zscore = (score - background_score) / background_standard_deviation

where background_score is the mean score of the background.
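Spelled out as code, the formula is only a few lines. This is a minimal sketch that assumes background_score is the mean of a sample of background scores and that the standard deviation is the sample standard deviation of that same sample; the function name and the numbers in the usage line are hypothetical.

```python
import statistics

def zscore(score, background_scores):
    """Z-score of `score` relative to a sample of background scores."""
    mean = statistics.mean(background_scores)   # background_score
    sd = statistics.stdev(background_scores)    # background_standard_deviation
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean) / sd

# Hypothetical example: an observed mean similarity of 0.42 against a
# background whose mean is 0.30 gives a large positive z-score.
print(zscore(0.42, [0.30, 0.32, 0.28, 0.31, 0.29]))
```

Unlike the raw mean, the result says how many background standard deviations the observed score sits above (or below) what you would expect by chance.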
I worked out something like that a few years ago using chemfp; see
http://www.dalkescientific.com/writings/diary/archive/2017/03/27/chembl_target_sets_association_network.html
If that's a reasonable approach, it could all be done in RDKit if you don't
want to use chemfp.
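One way to get a background that accounts for class size is to draw random subsets with the same sizes as the two real classes and score them the same way. The sketch below does that in plain Python so it runs without dependencies; it is an illustration, not necessarily the exact method in the post above. Assumptions: fingerprints are modeled as sets of on-bit indices (in RDKit you would use bit-vector fingerprints and, e.g., DataStructs.BulkTanimotoSimilarity), the pair score is the mean cross-class Tanimoto, and the function names and n_samples/seed parameters are hypothetical.

```python
import random
import statistics

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def mean_cross_similarity(set_a, set_b):
    """Mean Tanimoto over all (a, b) pairs with a in set_a and b in set_b."""
    total = sum(tanimoto(a, b) for a in set_a for b in set_b)
    return total / (len(set_a) * len(set_b))

def size_matched_zscore(set_a, set_b, pool, n_samples=200, seed=0):
    """Z-score of the observed A-vs-B mean similarity against random
    subsets drawn from `pool` (a list) with the same sizes as A and B."""
    rng = random.Random(seed)
    observed = mean_cross_similarity(set_a, set_b)
    background = []
    for _ in range(n_samples):
        rand_a = rng.sample(pool, len(set_a))
        rand_b = rng.sample(pool, len(set_b))
        background.append(mean_cross_similarity(rand_a, rand_b))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (observed - mean) / sd
```

Because every background sample uses the same pair of class sizes as the real comparison, the spread of the background reflects how noisy a mean over that many pairs is, which is one way to put classes of very different sizes on a common scale.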
Best regards,
Andrew
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss