Thank you both for the feedback.

My primary aim is to run a ligand-based virtual screening (LBVS) experiment,
i.e. a similarity search, using a set of actives as queries against the
dataset of cluster representatives.
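Here is a minimal sketch of that search. It assumes each fingerprint is stored as a Python set of on-bit indices, standing in for RDKit bit vectors (with RDKit, `DataStructs.BulkTanimotoSimilarity` would handle the per-active comparison). Each cluster representative is scored by its highest Tanimoto similarity to any active, which is MAX group fusion and only one of several possible fusion rules. The best-scoring representatives are returned. The function names and toy fingerprints are illustrative only.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient for fingerprints given as sets of on-bits (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_representatives(
    actives: List[Set[int]],
    representatives: Dict[str, Set[int]],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Score each representative by its best similarity to any active (MAX fusion)."""
    scores = {
        name: max(tanimoto(fp, act) for act in actives)
        for name, fp in representatives.items()
    }
    # Highest-scoring representatives first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    actives = [{1, 2, 3, 4}, {10, 11, 12}]
    reps = {"rep_a": {1, 2, 3}, "rep_b": {10, 11, 20}, "rep_c": {30, 31}}
    for name, score in rank_representatives(actives, reps, top_n=2):
        print(name, round(score, 3))
```

Taking the maximum rewards representatives that closely resemble any single active. Averaging instead (MEAN fusion) favors compounds that are similar to the actives as a group.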



On Sun, 1 May 2022, 17:09 Patrick Walters, <wpwalt...@gmail.com> wrote:

> For me, a lot of this depends on what you intend to do with the
> clustering.  If you want to pick a "representative" subset from a larger
> dataset, k-means may do the trick.  As Rajarshi mentioned, Practical
> Cheminformatics has a k-means implementation that runs with FAISS.
> Depending on your goal, choosing a subset with a diversity picker may fit
> the bill.  One annoying aspect of diversity pickers is that the initial
> selections tend to consist of strange molecules.
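Below is a sketch of a greedy MaxMin diversity picker, again assuming set-of-bits fingerprints. RDKit has its own `MaxMinPicker` in `rdkit.SimDivFilters.rdSimDivPickers`, and this is not that implementation. At each step the picker takes the compound whose nearest already-picked neighbor is farthest away. That also explains the point above: by construction, the earliest picks are the compounds least like everything else, so outliers tend to come first. `maxmin_pick` and its parameters are hypothetical names.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity; fingerprints are sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def maxmin_pick(fps: List[Set[int]], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin diversity selection; returns indices into fps in pick order."""
    n_pick = min(n_pick, len(fps))
    if n_pick <= 0:
        return []
    picked = [first]
    chosen = {first}
    # min_dist[i]: distance from fps[i] to its nearest already-picked compound.
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < n_pick:
        # Take the unchosen compound farthest from everything picked so far
        # (ties go to the lowest index).
        nxt = max(
            (i for i in range(len(fps)) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        picked.append(nxt)
        chosen.add(nxt)
        # Only the newest pick can lower each compound's running minimum.
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
    return picked


if __name__ == "__main__":
    fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 5}, {20, 21}]
    print(maxmin_pick(fps, 3))  # [0, 2, 4]
```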
>
> @Tristan can you provide more information on what you want to do with the
> clustering results?
>
>
> Pat
>
> On Sun, May 1, 2022 at 10:46 AM Rajarshi Guha <rajarshi.g...@gmail.com>
> wrote:
>
>> You could consider using FAISS. An example of clustering 2.1M cmpds is
>> described at
>> http://practicalcheminformatics.blogspot.com/2019/04/clustering-21-million-compounds-for-5.html
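This is a minimal pure-Python sketch of the k-means route. It uses plain Lloyd's algorithm, not the FAISS code from the linked post, and assumes the fingerprints have been unpacked into equal-length numeric vectors. K-means centroids are averages rather than real molecules, so for each cluster the sketch returns the index of the member closest to its centroid as the representative. At millions of compounds the clustering step itself would go to FAISS (`faiss.Kmeans`). The function name here is hypothetical.

```python
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_representatives(
    vectors: List[List[float]], k: int, n_iter: int = 20, seed: int = 0
) -> List[int]:
    """Lloyd's k-means, then the index of the member nearest each centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    reps = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            # A real compound stands in for the (virtual) centroid.
            reps.append(min(idx, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return reps


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(sorted(kmeans_representatives(pts, k=2)))  # [0, 3]
```

As with k-medoids, k still has to be chosen up front. For picking a representative subset that is usually acceptable, because k is simply the size of the subset you want.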
>>
>>
>> On Sun, May 1, 2022 at 9:23 AM Tristan Camilleri <
>> tristan.camilleri...@um.edu.mt> wrote:
>>
>>> Hi,
>>>
>>> I am attempting to cluster a database of circa 4M small molecules and I
>>> have hit several snags.
>>> Using BulkTanimoto is not possible due to the resources required. I
>>> am now working with fpsim2 and chemfp to get a distance matrix (sparse
>>> matrix). However, I am finding it very challenging to identify an
>>> appropriate clustering algorithm. I have considered both k-medoids and
>>> DBSCAN. Each of these has its own limitations: k-medoids requires the
>>> number of clusters to be stated up front, and DBSCAN does not return centroids.
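One option that avoids both limitations is Taylor-Butina (sphere-exclusion) clustering. It is not mentioned elsewhere in this thread. It needs only a distance cutoff, not k, and every cluster's centroid is an actual compound. RDKit's version (`rdkit.ML.Cluster.Butina.ClusterData`) takes a full lower-triangle distance list, which is impractical at 4M compounds. The sketch below therefore works from sparse neighbor lists instead, assumed to look like a dict mapping each compound index to the set of indices within the cutoff (symmetric, self excluded), such as a threshold search in fpsim2 or chemfp could produce. It is the simple variant that does not recount neighbors after each cluster is removed. The function name is hypothetical.

```python
from typing import Dict, List, Set


def butina_from_neighbors(neighbors: Dict[int, Set[int]]) -> List[List[int]]:
    """Taylor-Butina clustering from precomputed neighbor lists.

    neighbors[i] holds the indices within the distance cutoff of compound i
    (excluding i itself); every compound must appear as a key. Each returned
    cluster lists its centroid first.
    """
    # Visit compounds with the most neighbors first (densest regions).
    order = sorted(neighbors, key=lambda i: (-len(neighbors[i]), i))
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        # The centroid claims all of its still-unassigned neighbors.
        members = [centroid] + sorted(
            n for n in neighbors[centroid] if n not in assigned
        )
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    nbrs = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}, 4: set()}
    clusters = butina_from_neighbors(nbrs)
    print(clusters)                      # [[0, 1, 2], [3], [4]]
    print([c[0] for c in clusters])      # centroids: [0, 3, 4]
```

The cutoff effectively replaces k: a tighter cutoff gives more, smaller clusters and more singletons.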
>>>
>>> I was wondering whether there is an implementation of the stochastic
>>> clustering analysis method described in
>>> https://doi.org/10.1021/ci970056l .
>>>
>>> Any suggestions on the best method for clustering large datasets, with
>>> code suggestions, would be greatly appreciated. I am new to the subject and
>>> would appreciate any help.
>>>
>>> Regards,
>>> Tristan
>>>
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>>
>> --
>> Rajarshi Guha | http://blog.rguha.net | @rguha
>> <https://twitter.com/rguha>
>>
>>
>
