Hello Kev,
I am quite familiar with W. Bialek's work on mutual information, as in my
previous PhD I was working on closed-loop applications for
unsupervised learning controllers.
I will be quite happy to beta test your code!

Cheers.


From: kev devnull [mailto:[email protected]]
Sent: 22 November 2013 00:27
To: [email protected]
Subject: [Scikit-learn-general] Adding a flexible mutual 
information/information theory based clustering method to sklearn.cluster?

Hi all,
I'm currently developing a Python/C application related to a population 
genetics / evolution-based simulation with populations of discrete dynamical 
systems (...). I am using scipy/numpy/scikit-learn/matplotlib for development 
and in the course of writing the code, I've been working on a Python 
implementation of "Information Based Clustering" (Slonim et al.: 
http://www.pnas.org/content/102/51/18297.abstract, including mutual information 
estimation: http://xxx.lanl.gov/abs/cs.IT/0502017).

The clustering algorithm has several interesting features, including the ability 
to swap in various "similarity/difference" matrices (including information-theoretic 
measures of similarity, e.g. a rate-distortion matrix or a matrix of 
mutual information values, though one may use whatever difference measure is most 
appropriate to the data/application). I am implementing both the clustering 
method from the first paper and the estimation of mutual information from 
the second.
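To make the "swappable similarity matrix" idea concrete, here is a rough sketch of how a pairwise mutual information matrix might be built with equal-width binning, using scikit-learn's `mutual_info_score`. The function name, the binning scheme, and the bin count are my own illustrative choices, not part of the IBC code itself:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def pairwise_mi_matrix(X, n_bins=8):
    """Pairwise mutual information (in nats) between the rows of X.

    Rows are discretized into equal-width bins; any other
    similarity/difference matrix could be plugged in instead.
    """
    # Discretize each row into n_bins integer symbols.
    binned = np.empty(X.shape, dtype=int)
    for i, row in enumerate(X):
        edges = np.linspace(row.min(), row.max(), n_bins + 1)
        binned[i] = np.digitize(row, edges[1:-1])

    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # MI is symmetric, so fill both halves at once.
            S[i, j] = S[j, i] = mutual_info_score(binned[i], binned[j])
    return S
```

The diagonal entries equal each row's entropy, which upper-bounds the off-diagonal MI values in the same row, so the matrix can be normalized if a bounded similarity is preferred.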

Much of this work came out of W. Bialek's lab, which originally developed these 
ideas for comparing neural spike-train time series (Bialek is one of the authors 
of the popular computational neuroscience book "Spikes"). I've used a C++ 
implementation that I previously wrote for segmenting genomic time series with 
good results (using just Euclidean distance and Pearson correlation, without 
even delving into the MI-based similarity measurements covered in the second 
paper above).
In any case, I was wondering whether the scikit-learn team would like an 
implementation of this flexible clustering scheme. It is fairly popular in the 
gene regulatory network community and has features that no other clustering 
algorithm I know of has (e.g. if two members of the dataset share more 
than a single bit of mutual information, then their relationship is more 
complicated than simply switching one another off). I'd enjoy formatting the 
Python to the standard scikit code style so that it fits well with the existing 
clustering code. I would also like to contribute additional unsupervised 
learning algorithms if contributors are wanted in this area.
Please let me know if the team is interested, and I will get the IBC code into 
shape for submission to the project.
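For reference, a minimal estimator skeleton following scikit-learn's conventions (hyperparameters in __init__, cluster assignments exposed as labels_ after fit) might look like the following. The class name and the trivial fit() body are hypothetical placeholders, not the actual IBC implementation:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin

class InformationBasedClustering(BaseEstimator, ClusterMixin):
    """Sketch of a scikit-learn-style clustering estimator.

    Hyperparameters are stored under their __init__ names so that
    get_params()/set_params() work; fit() sets labels_ and returns self.
    """

    def __init__(self, n_clusters=2, similarity="mutual_information"):
        self.n_clusters = n_clusters
        self.similarity = similarity

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Placeholder: assign every sample to one cluster; a real
        # implementation would run the IBC optimization here.
        self.labels_ = np.zeros(X.shape[0], dtype=int)
        return self
```

Following this pattern is what lets the estimator interoperate with the rest of the library (pipelines, parameter search, cloning).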
Thank you for your time!
-kc
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
