On 21/09/2018 16:53, Chris Earnshaw wrote:
Hi

I'm afraid I can't help with an RDkit solution to your question, but
there are a couple of issues which should be borne in mind:
1) The centroid of a cluster is a vector mean of the fingerprints of
all the members of the cluster and probably will not be represented
_exactly_ by any member of the cluster; in this case no structures
will have a distance of 0.0 from the centroid. Do you want to
calculate the distances from the true centroid or from the
structure(s) closest to the centroid?

I have seen 'clustroid' in the literature to mean
cluster member nearest to the centroid of that cluster.
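
To make the centroid/clustroid distinction concrete, here is a minimal
pure-Python sketch. It does not use RDKit: the fingerprints are toy 0/1
lists, and the helper names (tanimoto, centroid, clustroid_index) are
made up for this illustration. It uses the continuous form of the
Tanimoto coefficient so that the non-binary mean vector can be compared
with the members.

```python
def tanimoto(a, b):
    """Tanimoto similarity for non-negative vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 1.0


def centroid(members):
    """Component-wise mean of the member fingerprints."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def clustroid_index(members):
    """Index of the member most similar to the (true) centroid."""
    c = centroid(members)
    return max(range(len(members)), key=lambda i: tanimoto(members[i], c))


# Toy cluster of three 4-bit fingerprints (illustrative data only).
cluster = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
]

c = centroid(cluster)            # a vector of fractions, not a bit vector
idx = clustroid_index(cluster)   # the member nearest to that mean

# Distance of every member from the clustroid; only the clustroid is at 0.
dists = [1.0 - tanimoto(fp, cluster[idx]) for fp in cluster]
```

In this toy case the mean vector is not one of the members, so no member
sits at distance 0 from it, whereas measuring from the clustroid gives
exactly one zero distance.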

2) The Tanimoto metric doesn't obey the triangle inequality and is
therefore sub-optimal for this kind of analysis. It's better to use an
alternative which does obey the triangle inequality - e.g. the Cosine
metric.

The opposite is true.

Sven Kosub. A note on the triangle inequality for the Jaccard distance.
CoRR, abs/1612.02696, 2016.

Alan H. Lipkus. A proof of the triangle inequality for the Tanimoto
distance. Journal of Mathematical Chemistry, 26(1):263–265, Oct 1999.

Cosine similarity, by contrast, is not a metric, according to Wikipedia.

I'm not a mathematician, but I think (1 - Tanimoto) is a proper distance
as long as the molecules are encoded with only non-negative values.
So, Boolean fingerprints are OK, and counted unfolded fingerprints
as well.
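
A quick numerical illustration (not a proof; the proofs are in the two
references above): the sketch below represents Boolean fingerprints as
Python sets of on-bit indices, checks the triangle inequality for
(1 - Tanimoto) over every triple of 4-bit fingerprints, and then
evaluates (1 - cosine) on one small triple. The function names are
invented for this example, and the convention that two empty
fingerprints have distance 0 is an assumption.

```python
from fractions import Fraction
from itertools import combinations, product
from math import sqrt


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for sets of on-bits, as an exact fraction."""
    union = len(a | b)
    if union == 0:
        return Fraction(0)  # assumed convention for two empty fingerprints
    return 1 - Fraction(len(a & b), union)


def cosine_distance(a, b):
    """1 - cosine similarity for non-empty sets of on-bits."""
    return 1.0 - len(a & b) / sqrt(len(a) * len(b))


def all_subsets(n_bits):
    """Every possible Boolean fingerprint on n_bits positions."""
    bits = range(n_bits)
    return [frozenset(s) for k in range(n_bits + 1)
            for s in combinations(bits, k)]


def triangle_violations(dist, points):
    """Triples (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c)."""
    return [(a, b, c) for a, b, c in product(points, repeat=3)
            if dist(a, c) > dist(a, b) + dist(b, c)]


fps = all_subsets(4)
tanimoto_bad = triangle_violations(tanimoto_distance, fps)

# One small triple for (1 - cosine): a and c share no bits, b overlaps both.
a, b, c = frozenset({0}), frozenset({0, 1}), frozenset({1})
cos_gap = cosine_distance(a, c) - (cosine_distance(a, b) + cosine_distance(b, c))
```

The exhaustive check finds no violating triple for (1 - Tanimoto), while
cos_gap comes out positive, i.e. the direct (1 - cosine) distance from a
to c is larger than the detour through b.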

Regards,
Francois.

Regards,
Chris Earnshaw

On Thu, 20 Sep 2018 at 21:55, James T. Metz via Rdkit-discuss
<rdkit-discuss@lists.sourceforge.net> wrote:

RDkit Discussion Group,

I note that RDkit can perform Butina clustering. Given an SDF of small
molecules I would like to cluster the ligands, but obtain additional
information from the clustering algorithm. In particular, I would like
to obtain the cluster number and Tanimoto distance from the centroid
for every ligand in the SDF. The centroid would obviously have a
distance of 0.00.

Has anyone written additional RDkit code to extract this
additional information?

Thank you.

Regards,

Jim Metz

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
