Hi everyone,
I have released chemfp 4.2. The new "simarray" functionality computes the full
comparison matrix as a NumPy array, e.g., for use in some clustering algorithms.
It has built-in support for Tanimoto, Dice, cosine, and Hamming comparisons,
plus an option to get the individual "a", "b", "c", and "d" components should
you need a specialized metric. It processes roughly 100M comparisons per second
on my laptop, which means if you had 30 TB of free disk space you could
generate the NxN comparisons for ChEMBL in about a day. (I'm curious if someone
will do this!)
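
To illustrate what the individual components can look like, here is a minimal
NumPy sketch. It does not use chemfp's simarray API, and the function names are
hypothetical. It assumes one common convention (a = bits set only in the first
fingerprint, b = bits set only in the second, c = bits set in both, d = bits
set in neither), which may differ from the convention chemfp uses, so check the
chemfp documentation before relying on it.

```python
import numpy as np

def abcd_components(fps):
    """Return (a, b, c, d) NxN count matrices for packed binary fingerprints.

    fps is a uint8 array of shape (N, num_bytes).
    Assumed convention (not necessarily chemfp's):
      a = bits on only in the row fingerprint
      b = bits on only in the column fingerprint
      c = bits on in both
      d = bits off in both
    """
    bits = np.unpackbits(fps, axis=1).astype(np.int64)  # (N, num_bits) of 0/1
    num_bits = bits.shape[1]
    popcount = bits.sum(axis=1)
    c = bits @ bits.T
    a = popcount[:, None] - c
    b = popcount[None, :] - c
    d = num_bits - a - b - c
    return a, b, c, d

def tanimoto(a, b, c):
    """Tanimoto similarity c / (a + b + c) under the convention above."""
    union = a + b + c
    # Two all-zero fingerprints have an empty union; score that as 0.0 here.
    out = np.zeros(c.shape, dtype=np.float64)
    return np.divide(c, union, out=out, where=union > 0)

# Three toy 8-bit fingerprints (real fingerprints are much longer).
fps = np.array([[0b11110000], [0b11000000], [0b00001111]], dtype=np.uint8)
a, b, c, d = abcd_components(fps)
sim = tanimoto(a, b, c)   # full NxN Tanimoto matrix
hamming = a + b           # Hamming distance: number of differing bits
```

Under the same assumed convention, Dice is 2c / (a + b + 2c) and cosine is
c / sqrt((a + c) * (b + c)), so a specialized metric is a one-line expression
over the four arrays.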
Chemfp supports the CDK, RDKit, Open Babel, and OpenEye toolkits. Some of the
specific improvements for the chemfp/CDK interface are:
- new "hydrogens" options for the SMILES and SDF readers ("as-is",
"make-explicit", "make-implicit", and "make-nonchiral-implicit") to change
between implicit and explicit hydrogens.
- added support for the CDK 2.9 PubChem fingerprint improvements
- added support for jCompoundMapper fingerprints
The jCompoundMapper support and "hydrogens" options were added after I read
“Effectiveness of molecular fingerprints for exploring the chemical space of
natural products” by Boldini, Ballabio, Consonni, Todeschini, Grisoni, and
Sieber, J. Cheminform. (2024) 16:35 https://doi.org/10.1186/s13321-024-00830-3
and realized there were a few rough edges chemfp could help smooth out.
For a full description of what's new in this release, see
https://chemfp.com/docs/whats_new_in_42.html .
Chemfp may be the package you’ve been looking for if you work with binary
cheminformatics fingerprints. Chemfp is perhaps best known for its
high-performance fingerprint similarity search. Its Taylor/Butina clustering,
MaxMin diversity selection, and sphere exclusion (including directed sphere
exclusion) are equally world-class. Or, if you simply need a 100K by 100K
distance array to pass into scikit-learn, chemfp’s simarray can generate that
in less than a minute.
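
As a rough sketch of that scikit-learn hand-off, the example below builds a
tiny 1 - Tanimoto distance matrix with plain NumPy, as a stand-in for the array
simarray would produce, and passes it to DBSCAN with metric="precomputed". The
toy fingerprints and the eps and min_samples values are arbitrary illustration
choices, and DBSCAN is only one of the scikit-learn estimators that accept a
precomputed distance matrix.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Four toy 8-bit fingerprints: two similar pairs with no bits shared across pairs.
fps = np.array(
    [[0b11110000], [0b11100000], [0b00001111], [0b00000111]], dtype=np.uint8
)

# Stand-in for a simarray result: an NxN Tanimoto distance matrix.
bits = np.unpackbits(fps, axis=1).astype(np.int64)
popcount = bits.sum(axis=1)
both = bits @ bits.T                                  # bits set in both
union = popcount[:, None] + popcount[None, :] - both  # bits set in either
distance = 1.0 - both / union                         # union is nonzero for these toy inputs

# metric="precomputed" tells scikit-learn the input is already a distance matrix.
labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(distance)
print(labels)  # the first two and the last two fingerprints form separate clusters
```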
The chemfp homepage is https://chemfp.com/ . To install a pre-compiled chemfp
for Linux-based OSes:
python -m pip install chemfp -i https://chemfp.com/packages/
The default installation limits or disables a few chemfp features as described
in the base license agreement at https://chemfp.com/BaseLicense.txt . To
request a license key, which is free for academic use, see
https://chemfp.com/license/ .
Best regards,
Andrew Dalke