Thanks for the feedback. I don't have an explicit need to perform
clustering; it is more that I want to learn how to do it.
Any pointers to this effect would be greatly appreciated.
Tristan
On Sun, 1 May 2022 at 18:18, Patrick Walters wrote:
> Similarity search on a database of 4 million is pretty quick with ChemFp or
> fpsim2. Do you need to do the clustering?
I use Advanced Chemistry Development software, which is commercial, but I've
always been very pleased with their support. It has the added benefit that you
can view raw analytical data (NMR, LC-MS, etc.) from almost any vendor and
combine structures with the raw data.
John
Thank you both for the feedback.
My primary aim is to run a ligand-based virtual screening (LBVS) experiment,
i.e. a similarity search, using a set of actives against the dataset of
cluster representatives.
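For the LBVS step described here, a minimal sketch of one common scoring scheme, max fusion: each cluster representative is scored by its highest Tanimoto similarity to any of the actives, and the library is ranked by that score. It assumes fingerprints are already available as Python integers used as bitsets (converting RDKit bit vectors to that form is not shown); the function names, identifiers and toy fingerprints are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return bin(a & b).count("1") / union


def max_fusion_rank(actives, library, top_k=10):
    """Rank library entries by their maximum Tanimoto to any active.

    `actives` is a list of int fingerprints; `library` maps an identifier
    (e.g. a cluster representative's ID) to its int fingerprint.
    Returns (identifier, score) pairs, best first.
    """
    scored = [
        (ident, max(tanimoto(fp, act) for act in actives))
        for ident, fp in library.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 8-bit fingerprints (made-up data for illustration only).
actives = [0b11110000, 0b00001111]
library = {
    "rep_A": 0b11110001,
    "rep_B": 0b00111100,
    "rep_C": 0b00001111,
}
ranking = max_fusion_rank(actives, library)
print(ranking)
```

Max fusion is only one option; averaging the similarities over all actives is another. On a large library the inner similarity loop is exactly the part that chemfp or fpsim2 would do much faster.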
On Sun, 1 May 2022, 17:09 Patrick Walters wrote:
> For me, a lot of this depends on what you intend to do with the
> clustering.
Similarity search on a database of 4 million is pretty quick with ChemFp or
fpsim2. Do you need to do the clustering?
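To see why a threshold search over millions of fingerprints can be quick, here is a pure-Python sketch of a brute-force Tanimoto threshold search that first discards candidates using the popcount bound of Swamidass and Baldi. If |q| and |c| are the bit counts of query and candidate, their Tanimoto cannot exceed min(|q|, |c|) / max(|q|, |c|), so any candidate with fewer than t·|q| or more than |q|/t bits can be skipped for threshold t. My understanding is that chemfp and fpsim2 use optimized versions of this kind of pruning; this code, with int bitsets and hypothetical function names, only illustrates the idea.

```python
def popcount(x: int) -> int:
    """Number of set bits in an int bitset."""
    return bin(x).count("1")


def threshold_search(query: int, fps: list, threshold: float) -> list:
    """Return (index, similarity) for every fingerprint with Tanimoto >= threshold.

    `fps` is a list of int bitsets; `threshold` must be in (0, 1].
    """
    q_count = popcount(query)
    # Popcount bound: Tanimoto(q, c) <= min(|q|, |c|) / max(|q|, |c|),
    # so candidates with too few or too many bits can never reach the threshold.
    lo = threshold * q_count
    hi = q_count / threshold
    hits = []
    for i, fp in enumerate(fps):
        c_count = popcount(fp)
        if c_count < lo or c_count > hi:
            continue  # skipped without computing the AND/OR counts
        union = popcount(query | fp)
        if union == 0:
            continue  # both fingerprints empty
        sim = popcount(query & fp) / union
        if sim >= threshold:
            hits.append((i, sim))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy 8-bit fingerprints (made-up data).
query = 0b11110000
fps = [0b11110001, 0b11000000, 0b11111111, 0b11110000]
hits = threshold_search(query, fps, 0.7)
print(hits)  # → [(3, 1.0), (0, 0.8)]
```

In this toy case the second and third fingerprints are rejected by the bound alone; real tools also sort fingerprints by popcount so whole blocks can be skipped at once.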
Here are a couple of relevant blog posts.
http://practicalcheminformatics.blogspot.com/2020/10/what-do-molecules-that-look-like-this.html
http://practicalcheminformatics.blogspo
For me, a lot of this depends on what you intend to do with the
clustering. If you want to pick a "representative" subset from a larger
dataset, k-means may do the trick. As Rajarshi mentioned, Practical
Cheminformatics has a k-means implementation that runs with FAISS.
Depending on your goal, ch
You could consider using FAISS. An example of clustering 2.1M compounds is
described at
http://practicalcheminformatics.blogspot.com/2019/04/clustering-21-million-compounds-for-5.html
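As a rough illustration of the k-means route for picking a representative subset (not the FAISS code from the blog post), here is plain Lloyd's k-means in pure Python on fingerprint bit vectors, followed by choosing the compound closest to each centroid as that cluster's representative. Euclidean distance on 0/1 vectors, the deterministic max-min seeding, the toy data and all function names are assumptions made to keep the sketch short.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def maxmin_init(points, k):
    """Deterministic seeding: start from point 0, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, n_iter=20):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = maxmin_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)      # assignment step
        new_centroids = []
        for j in range(k):                      # update step: mean of each cluster
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break                               # converged
        centroids = new_centroids
    return centroids, assign(points, centroids)


def representatives(points, centroids, labels):
    """For each non-empty cluster, the index of the member closest to its centroid."""
    best = {}
    for i, (p, lab) in enumerate(zip(points, labels)):
        d = sq_dist(p, centroids[lab])
        if lab not in best or d < best[lab][1]:
            best[lab] = (i, d)
    return sorted(i for i, _ in best.values())


# Toy 6-bit "fingerprints" forming two obvious groups (made-up data).
points = [
    [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1], [0, 0, 1, 1, 1, 1],
]
centroids, labels = kmeans(points, k=2)
reps = representatives(points, centroids, labels)
print(labels, reps)  # → [0, 0, 0, 1, 1, 1] [0, 3]
```

On millions of compounds this loop is far too slow; the point is the two alternating steps (assign, then recompute means), which FAISS runs in optimized form. Each iteration costs O(n·k) distance evaluations rather than O(n²), which is why k-means scales where a full similarity matrix does not.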
On Sun, May 1, 2022 at 9:23 AM Tristan Camilleri <
tristan.camilleri...@um.edu.mt> wrote:
> Hi,
>
> I am attempting
Hi,
I am attempting to cluster a database of circa 4M small molecules and I
have hit several snags.
Using BulkTanimoto is not possible because of the resources it requires. I am
now working with fpsim2 and chemfp to get a distance matrix (sparse
matrix). However, I am finding it very challenging to
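To put the resource problem in numbers: a dense pairwise matrix for n = 4,000,000 compounds has n(n-1)/2 ≈ 8.0 × 10^12 unique entries, roughly 32 TB at 4 bytes per similarity, which is why a thresholded sparse neighbour list is the usual workaround. One way to cluster from such a list is Taylor-Butina; RDKit implements it as `rdkit.ML.Cluster.Butina.ClusterData`, but that function works over all pairwise distances. The sketch below assumes the neighbour lists have already been produced (for example by a chemfp or fpsim2 threshold search) and follows the basic variant in which neighbour counts are computed once and not updated as compounds are assigned; the input format and function name are hypothetical.

```python
# Size of a dense similarity matrix for ~4M compounds.
n = 4_000_000
pairs = n * (n - 1) // 2        # unique off-diagonal pairs
dense_tb = pairs * 4 / 1e12     # terabytes if each similarity is a 4-byte float
print(f"{pairs:,} pairs, ~{dense_tb:.0f} TB")  # → 7,999,998,000,000 pairs, ~32 TB


def butina(neighbors):
    """Taylor-Butina clustering from precomputed neighbour lists.

    `neighbors` maps each compound index to the indices whose similarity to it
    is at or above the chosen threshold (self excluded, lists assumed symmetric
    and duplicate-free). Returns clusters as lists whose first element is the
    cluster centroid; compounds with no neighbours become singletons.
    """
    # Visit compounds in order of decreasing neighbour count (ties by index).
    order = sorted(neighbors, key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue  # already absorbed into an earlier cluster
        members = [centroid] + [j for j in neighbors[centroid] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Made-up neighbour lists for five compounds.
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1], 4: []}
clusters = butina(neighbors)
print(clusters)  # → [[1, 0, 2, 3], [4]]
```

Memory then scales with the number of pairs above the threshold rather than with n², so the choice of similarity cutoff largely determines whether the neighbour lists fit in RAM.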
DataWarrior is my favorite.
---Original---
From: "Marawan Hussien via Rdkit-discuss"
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss