Thanks for posting! I have reshared an announcement based on this on Mastodon: https://fosstodon.org/@blueobelisk/113055702502809492 (I decided to post it from the Blue Obelisk account, since this is not just for the CDK.)
Egon

On Mon, 5 Aug 2024 at 10:44, Andrew Dalke <da...@dalkescientific.com> wrote:

> Hi everyone,
>
> I have released chemfp 4.2. The new "simarray" functionality computes the
> full comparison matrix as a NumPy array, e.g., for use in some clustering
> algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming
> comparisons, plus an option to get the individual "a", "b", "c", and "d"
> components should you need a specialized metric. It processes roughly 100M
> comparisons per second on my laptop, which means if you had 30 TB of free
> disk space you could generate the NxN comparisons for ChEMBL in about a
> day. (I'm curious if someone will do this!)
>
> Chemfp supports the CDK, RDKit, Open Babel, and OpenEye toolkits. Some of
> the specific improvements for the chemfp/CDK interface are:
>
> - new "hydrogens" options for the SMILES and SDF readers ("as-is",
>   "make-explicit", "make-implicit", and "make-nonchiral-implicit") to change
>   between implicit and explicit hydrogens.
>
> - added support for the CDK 2.9 PubChem fingerprint improvements
>
> - added support for jCompoundMapper fingerprints
>
> The jCompoundMapper and "hydrogens" options were added after I read
> “Effectiveness of molecular fingerprints for exploring the chemical space
> of natural products” by Boldini, Ballabio, Consonni, Todeschini, Grisoni,
> and Sieber, J. Cheminform. (2024) 16:35
> https://doi.org/10.1186/s13321-024-00830-3 and realized there were a few
> rough edges chemfp could help smooth out.
>
> For a full description of what's new in this release, see
> https://chemfp.com/docs/whats_new_in_42.html .
>
> Chemfp may be the package you’ve been looking for, if you work with binary
> cheminformatics fingerprints. Chemfp is perhaps best known for its
> high-performance fingerprint similarity search. Its Taylor/Butina
> clustering, MaxMin diversity selection, and sphere exclusion (including
> directed sphere exclusion) are equally world-class. Or, if you simply need
> a 100K by 100K distance array to pass into scikit-learn, chemfp’s simarray
> can generate that in less than a minute.
>
> The chemfp homepage is https://chemfp.com/ . To install a pre-compiled
> chemfp for Linux-based OSes:
>
> python -m pip install chemfp -i https://chemfp.com/packages/
>
> The default installation limits or disables a few chemfp features as
> described in the base license agreement at
> https://chemfp.com/BaseLicense.txt . To request a license key, which is
> free for academic use, see https://chemfp.com/license/ .
>
> Best regards,
>
> Andrew Dalke
> da...@dalkescientific.com
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user

--
Okay, you make FAIR. But why? We now can link FAIR maturity indicators to
reuse case scenarios. You can stop asking "Is my data FAIR?" and start asking
"How FAIR do I need to be to allow that reuse?" Read about it in our new
paper "FAIR assessment of nanosafety data reusability with community
standards", https://www.nature.com/articles/s41597-024-03324-x
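For anyone who wants to sanity-check the "about a day" and "30 TB" figures in the quoted announcement, here is a rough back-of-envelope sketch in plain Python. It does not use chemfp at all. The ChEMBL size (about 2.4 million compounds) and the 4 bytes per stored value are my own assumptions, not numbers from the announcement; only the 100M comparisons per second rate comes from Andrew's message. The function name `full_matrix_estimate` is hypothetical.

```python
def full_matrix_estimate(n, comparisons_per_second=100e6, bytes_per_value=4):
    """Estimate the cost of a full n-by-n comparison matrix.

    comparisons_per_second: rate quoted in the announcement (laptop figure).
    bytes_per_value: ASSUMPTION, e.g. 4 for a 32-bit value per matrix entry.
    Returns (number of comparisons, hours of compute, terabytes of storage).
    """
    comparisons = n * n
    hours = comparisons / comparisons_per_second / 3600
    terabytes = comparisons * bytes_per_value / 1e12
    return comparisons, hours, terabytes


# ASSUMPTION: roughly 2.4 million compounds in ChEMBL.
n_chembl = 2_400_000
comparisons, hours, terabytes = full_matrix_estimate(n_chembl)
print(f"{comparisons:.2e} comparisons, {hours:.1f} hours, {terabytes:.1f} TB")
```

Under these assumptions the estimate comes to roughly 16 hours and 23 TB, which is the same order of magnitude as the figures in the announcement. A larger value size or a larger data set would push both numbers up.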