Thanks, Curt!  I'll give those a look.  It'll give me a very good reason to
start digging into SciPy a bit more and to exploit the added functionality
it brings.

Regarding my original question and for anyone else that might be
interested...

I did indeed find an answer, after a lot of code dredging.  The
Murtagh.ClusterData() function in RDKit generates the clusters, and it
returns a single-member list whose one member is the root Cluster object of
the tree.  I can feed that object to ClusterVis.ClusterToImg to get the
dendrogram I wanted.  Here's a short code snippet showing the pieces.

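from rdkit.ML.Cluster import Murtagh, ClusterVis
...
# dists: flat list of pairwise distances, nfps: number of structures
c_tree = Murtagh.ClusterData(dists, nfps, Murtagh.WARDS, isDistData=True)
...
# c_tree is a one-element list; c_tree[0] is the root Cluster object
ClusterVis.ClusterToImg(c_tree[0], size=(500, 500), fileName='test.png')
...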
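The snippet above elides how `dists` was built. As a sketch only (not the code used here), the hypothetical helper `to_murtagh_dists` below flattens a square NumPy distance matrix into a flat list, assuming the lower-triangle layout that `Murtagh.ClusterData` expects for `isDistData=True`: the distance between points i < j stored at index j*(j-1)/2 + i. Verify that layout against the docstring in your RDKit version. Note that it differs from the condensed vector SciPy's `pdist` returns, which walks the upper triangle row by row.

```python
import numpy as np


def to_murtagh_dists(square):
    """Flatten a symmetric (n x n) distance matrix into a lower-triangle list.

    Hypothetical helper. Assumed layout: for i < j, d(i, j) is stored at
    index j*(j-1)//2 + i, i.e. the pairs (1,0), (2,0), (2,1), (3,0), ...
    """
    square = np.asarray(square, dtype=float)
    n = square.shape[0]
    dists = []
    for j in range(1, n):
        # row j of the lower triangle: distances from point j to points 0 .. j-1
        dists.extend(square[j, :j].tolist())
    return dists, n


if __name__ == "__main__":
    d = np.array([[0.0, 0.1, 0.2],
                  [0.1, 0.0, 0.3],
                  [0.2, 0.3, 0.0]])
    dists, nfps = to_murtagh_dists(d)
    print(dists, nfps)  # [0.1, 0.2, 0.3] 3
```

The returned pair can then be passed as `Murtagh.ClusterData(dists, nfps, Murtagh.WARDS, isDistData=True)`, provided the assumed layout matches your RDKit version.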

I can then break the cluster tree into subtrees:

...
from rdkit.ML.Cluster import ClusterUtils
ClusterUtils.SplitIntoNClusters(c_tree[0], 5)  # returns a list of 5 subtree Cluster objects
...
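Conceptually, cutting a dendrogram into N groups means repeatedly undoing the highest (most dissimilar) remaining merge. The sketch below illustrates that idea on a hypothetical minimal `Node` type with a hypothetical `split_into_n` function; it is not RDKit's `SplitIntoNClusters` implementation, whose traversal order and tie-breaking may differ.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal dendrogram node (not RDKit's Cluster class)."""
    metric: float = 0.0                                    # merge height; 0.0 on leaves
    children: List["Node"] = field(default_factory=list)   # empty on leaves
    data: Optional[int] = None                             # point index on leaves


def split_into_n(root: Node, n: int) -> List[Node]:
    """Cut the tree into n subtrees by repeatedly undoing the highest remaining merge."""
    heap = []      # splittable internal nodes, as a max-heap on merge height
    leaves = []    # single points, which cannot be split any further
    counter = 0    # tie-breaker so heapq never has to compare Node objects

    def push(node: Node) -> None:
        nonlocal counter
        if node.children:
            heapq.heappush(heap, (-node.metric, counter, node))
            counter += 1
        else:
            leaves.append(node)

    push(root)
    while len(heap) + len(leaves) < n:
        if not heap:
            raise ValueError("the tree has fewer than n leaves")
        _, _, node = heapq.heappop(heap)  # the merge with the largest height
        for child in node.children:
            push(child)
    return [entry[2] for entry in heap] + leaves
```

For example, with n=2 on a tree whose top merge joins the subtrees {0, 1} and {2, 3}, this returns exactly those two subtrees; asking for more groups than there are leaves raises `ValueError`.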

And I've written a short function to extract out the individual structure
memberships for each group:

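...

# groups: the list of subtree Cluster objects from SplitIntoNClusters
groups = ClusterUtils.SplitIntoNClusters(c_tree[0], 5)

def GetGroupMembers(grp, memberlist=None):
    # Collect the data stored on every leaf under grp.  Internal nodes carry
    # no data (None); leaves carry their point's data.
    if memberlist is None:
        memberlist = []  # fresh list per call; a [] default would be shared between calls
    if grp.GetData() is not None:
        memberlist.append(grp.GetData())  # grp is itself a leaf (a one-member group)
        return memberlist
    for child in grp.GetChildren():
        GetGroupMembers(child, memberlist)
    return memberlist

print(GetGroupMembers(groups[0]))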
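To get one group label per structure (for example to line the groups up with a SMILES list), the member lists can be inverted. The helper `memberships_to_labels` below is hypothetical and assumes the collected values are 0-based point indices; check what your leaves actually store.

```python
from typing import List, Sequence


def memberships_to_labels(member_lists: Sequence[Sequence[int]], n_points: int) -> List[int]:
    """Turn [[idx, ...], [idx, ...], ...] into one group number per point.

    labels[i] is the position of the group that contains point i;
    -1 means the point was not found in any group.
    """
    labels = [-1] * n_points
    for group_no, members in enumerate(member_lists):
        for idx in members:
            if labels[idx] != -1:
                raise ValueError(f"point {idx} appears in more than one group")
            labels[idx] = group_no
    return labels


if __name__ == "__main__":
    print(memberships_to_labels([[0, 3], [1], [2, 4]], 5))  # [0, 1, 2, 0, 2]
```

With the RDKit objects above this would be called as `memberships_to_labels([GetGroupMembers(g, []) for g in groups], nfps)`; passing a fresh `[]` each time keeps the member lists of different groups separate.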




On Sat, May 14, 2016 at 11:21 AM, Curt Fischer <curt.r.fisc...@gmail.com>
wrote:

> Hi Robert,
>
> For the number of molecules you are interested in, it's viable to use
> SciPy / NumPy clustering functions instead of rdkit's built in C-linked
> functions.  This approach will probably not be as fast as rdkit's built-in
> clustering functionalities, and will probably not scale to tens of
> thousands of molecules as well as rdkit's functions, but if you use SciPy
> or NumPy in other types of technical computing, this approach may be more
> transparent, generalizable, and easier to use.
>
> I have an example Jupyter notebook on GitHub that describes what I mean;
> here are the GitHub and nbviewer links:
>
>
> https://github.com/tentrillion/ipython_notebooks/blob/master/chemical_similarity_in_python.ipynb
>
> https://nbviewer.jupyter.org/github/tentrillion/ipython_notebooks/blob/master/chemical_similarity_in_python.ipynb
>
> Here are some of the most important parts of the code for generating a
> dendrogram.
>
> 1. Generate a numpy fingerprint matrix from a list of rdkit Molecules.
>
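> import numpy as np
> from rdkit import Chem
> from rdkit.Chem import rdmolops
>
> mols = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]
> # one 2048-bit RDKit fingerprint per molecule, stacked as rows of a bool matrix
> fingerprint_mat = np.vstack([np.asarray(rdmolops.RDKFingerprint(mol, fpSize=2048), dtype='bool')
>                              for mol in mols])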
>
>
> 2. Generate the distance matrix.  *pdist* and *squareform* are from
> *scipy.spatial.distance*.
>
> dist_mat = pdist(fingerprint_mat, 'jaccard')
> dist_df = pd.DataFrame(squareform(dist_mat), index=smiles_list, columns=smiles_list)
>
> For binary fingerprints like these, the Jaccard distance is exactly one
> minus the Tanimoto similarity.
>
> 3. Perform hierarchical clustering on the distance matrix and show the
> dendrogram (see the github notebook for the plot). *hc* is
> *scipy.cluster.hierarchy*.
>
> z = hc.linkage(dist_mat)
> dendrogram = hc.dendrogram(z, labels=dist_df.columns, leaf_rotation=90)
> plt.show()
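A quick numerical check of the Jaccard/Tanimoto remark in step 2: Tanimoto similarity of two bit vectors is |A & B| / |A | B| (bits on in both, over bits on in either), while SciPy's boolean Jaccard distance counts the bits that differ over the bits on in either, which is one minus that ratio. The `tanimoto` function below is written for illustration and is not an RDKit function; the random boolean rows stand in for real fingerprints.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto similarity of two boolean fingerprints: |A & B| / |A | B|."""
    both = np.count_nonzero(a & b)
    either = np.count_nonzero(a | b)
    return both / either


rng = np.random.default_rng(0)
fps = rng.random((5, 64)) < 0.3          # 5 toy boolean "fingerprints", 64 bits each

jac = squareform(pdist(fps, "jaccard"))  # SciPy Jaccard distances as a square matrix
for i in range(len(fps)):
    for j in range(i + 1, len(fps)):
        # Jaccard distance and (1 - Tanimoto) agree for every pair
        assert np.isclose(jac[i, j], 1.0 - tanimoto(fps[i], fps[j]))
print("Jaccard distance == 1 - Tanimoto for all pairs")
```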
>
>
> A helpful page for dendrograms using SciPy is this one:
> https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/
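To turn a SciPy linkage into flat group memberships (roughly the SciPy counterpart of the SplitIntoNClusters step in the RDKit route above), `scipy.cluster.hierarchy.fcluster` with `criterion='maxclust'` cuts the tree into at most `t` groups. `hc.linkage` defaults to single linkage; the sketch below uses `method='average'` (UPGMA) instead, which is an arbitrary choice here, and small toy boolean rows rather than real fingerprints.

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import pdist

# Toy boolean "fingerprints": rows 0-2 share one bit pattern, rows 3-5 another.
fingerprint_mat = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
], dtype=bool)

dist_mat = pdist(fingerprint_mat, "jaccard")         # condensed distance vector
z = hc.linkage(dist_mat, method="average")           # UPGMA instead of the default 'single'
labels = hc.fcluster(z, t=2, criterion="maxclust")   # cut into at most 2 flat groups

# Group row indices by cluster label (fcluster labels start at 1).
groups = {}
for idx, lab in enumerate(labels):
    groups.setdefault(int(lab), []).append(idx)
print(groups)
```

For about 350 real structures you would pass the `dist_mat` from step 2 and, for example, `t=5` to mirror the five groups used in the RDKit route.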
>
> Good luck!
>
> Curt
>
> On Sat, May 14, 2016 at 9:11 AM, Robert DeLisle <rkdeli...@gmail.com>
> wrote:
>
>> Next up is clustering...
>>
>> I've got about 350 structures to cluster and I've worked through the
>> example code from the RDKit Cookbook (
>> http://www.rdkit.org/docs/Cookbook.html#clustering-molecules).  All
>> seems well and good there, but I would like to see the dendrogram.  I see
>> that there is a ClusterVis module to generate images, PDF, and SVG, but all
>> require a Cluster object as input.  I can't find a description anywhere of
>> how to acquire or build that object from the results of clustering.
>>
>> Any tips?
>>
>> -Kirk
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>