Re: [Cdk-user] ANN: chemfp 4.2

Egon Willighagen Sun, 01 Sep 2024 07:43:56 -0700

Thanks for posting!

I have reshared an announcement based on this on Mastodon:
https://fosstodon.org/@blueobelisk/113055702502809492 (I decided to post from
the Blue Obelisk account, since this is not just for the CDK)


Egon

On Mon, 5 Aug 2024 at 10:44, Andrew Dalke <da...@dalkescientific.com> wrote:

> Hi everyone,
>
> I have released chemfp 4.2. The new "simarray" functionality computes the
> full comparison matrix as a NumPy array, e.g., for use in some clustering
> algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming
> comparisons, plus an option to get the individual "a", "b", "c", and "d"
> components should you need a specialized metric. It processes roughly 100M
> comparisons per second on my laptop, which means if you had 30 TB of free
> disk space you could generate the NxN comparisons for ChEMBL in about a
> day. (I'm curious if someone will do this!)
>
> Chemfp supports the CDK, RDKit, Open Babel, and OpenEye toolkits. Some of
> the specific improvements for the chemfp/CDK interface are:
>
> - new "hydrogens" options for the SMILES and SDF readers ("as-is",
> "make-explicit", "make-implicit", and "make-nonchiral-implicit") to change
> between implicit and explicit hydrogens.
>
> - added support for the CDK 2.9 Pubchem fingerprint improvements
>
> - added support for jCompoundMapper fingerprints
>
> The jCompoundMapper support and "hydrogens" options were added after I read
> “Effectiveness of molecular fingerprints for exploring the chemical space
> of natural products” by Boldini, Ballabio, Consonni, Todeschini, Grisoni,
> and Sieber, J. Cheminform. (2024) 16:35
> https://doi.org/10.1186/s13321-024-00830-3 and realized there were a few
> rough edges chemfp could help smooth out.
>
> For a full description of what's new in this release, see
> https://chemfp.com/docs/whats_new_in_42.html .
>
> Chemfp may be the package you’ve been looking for, if you work with binary
> cheminformatics fingerprints. Chemfp is perhaps best known for its
> high-performance fingerprint similarity search. Its Taylor/Butina
> clustering, MaxMin diversity selection, and sphere exclusion (including
> directed sphere exclusion) are equally world-class. Or, if you simply need
> a 100K by 100K distance array to pass into scikit-learn, chemfp’s simarray
> can generate that in less than a minute.
>
> The chemfp homepage is https://chemfp.com/ . To install a pre-compiled
> chemfp for Linux-based OSes:
>
>   python -m pip install chemfp -i https://chemfp.com/packages/
>
> The default installation limits or disables a few chemfp features as
> described in the base license agreement at
> https://chemfp.com/BaseLicense.txt . To request a license key, which is
> free for academic use, see https://chemfp.com/license/ .
>
> Best regards,
>
>                                 Andrew Dalke
>                                 da...@dalkescientific.com
>
>
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
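
To illustrate what a "full comparison matrix" built from the "a", "b", "c",
and "d" components can look like, here is a minimal NumPy sketch. It is not
chemfp's implementation and does not use the chemfp API. The function name
tanimoto_matrix, the unpacked 0/1 input layout, and the component convention
(a = on only in the first fingerprint, b = on only in the second, c = on in
both, d = off in both) are assumptions made for this example; check the
chemfp documentation for its own definitions.

```python
import numpy as np


def tanimoto_matrix(fps):
    """Return (similarity, (a, b, c, d)) for every pair of fingerprints.

    fps is an (N, num_bits) array of 0/1 values, one row per fingerprint.
    This layout is an assumption for the sketch; real toolkits usually
    store fingerprints as packed bytes.
    """
    bits = np.asarray(fps, dtype=np.int64)
    num_bits = bits.shape[1]
    popcounts = bits.sum(axis=1)

    c = bits @ bits.T                # on-bits shared by row i and column j
    a = popcounts[:, None] - c       # on-bits only in fingerprint i
    b = popcounts[None, :] - c       # on-bits only in fingerprint j
    d = num_bits - (a + b + c)       # bits that are off in both

    union = a + b + c
    # Assumed convention: two all-zero fingerprints get similarity 0.0,
    # which avoids a 0/0 division.
    sim = np.divide(c, union,
                    out=np.zeros(c.shape, dtype=np.float64),
                    where=union > 0)
    return sim, (a, b, c, d)


fps = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 0]])
sim, (a, b, c, d) = tanimoto_matrix(fps)
distance = 1.0 - sim   # a distance array of the kind clustering code expects
```

A specialized metric can be computed from the same four arrays; for example,
Dice would be 2*c / (2*c + a + b) under the convention assumed here.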

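The "about a day" and "30 TB" figures in the announcement can be checked with
a back-of-envelope calculation. Only the rate of 100M comparisons per second
comes from the announcement; the ChEMBL size of roughly 2.4 million
fingerprints and the 4 bytes per stored value are assumptions made for this
estimate.

```python
# Back-of-envelope estimate for an NxN comparison matrix.
n_fingerprints = 2_400_000        # ASSUMPTION: approximate ChEMBL size
comparisons_per_second = 100e6    # rate quoted in the announcement
bytes_per_value = 4               # ASSUMPTION: one 4-byte value per comparison

comparisons = n_fingerprints ** 2
hours = comparisons / comparisons_per_second / 3600
terabytes = comparisons * bytes_per_value / 1e12

print(f"{comparisons:.3g} comparisons, {hours:.1f} hours, {terabytes:.1f} TB")
```

Under these assumptions the estimate is about 16 hours and 23 TB, which is in
line with the quoted figures.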

-- 
Okay, you make FAIR. But why? We now can link FAIR maturity indicators to
reuse case scenarios. You can stop asking "Is my data FAIR?" and start
asking "How FAIR do I need to be to allow that reuse?" Read about it in our
new paper "FAIR assessment of nanosafety data reusability with community
standards", https://www.nature.com/articles/s41597-024-03324-x
