Hi all,
I'm currently developing a Python/C application for a population
genetics / evolution-based simulation with populations of discrete
dynamical systems (...). I am using scipy/numpy/scikit-learn/matplotlib
for development, and in the course of writing the code I've been working on
a Python implementation of "Information Based Clustering" (Slonim et al.:
http://www.pnas.org/content/102/51/18297.abstract), including mutual
information estimation (http://xxx.lanl.gov/abs/cs.IT/0502017).
The clustering algorithm has several interesting features, including the
ability to swap in different "similarity/difference" matrices (including
information-theoretic measures of similarity, e.g. a rate distortion matrix
or a matrix of mutual information values, but one may use whatever
difference measure is most appropriate to their data/application). I am
implementing both the clustering method from the first paper and the
estimation of mutual information from the second.
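To illustrate what I mean by swapping the similarity matrix, here is a
minimal sketch (not the IBC algorithm itself). The function names and the
toy data are made up for this example; the score is just the size-weighted
average pairwise similarity within each cluster of a fixed hard assignment,
computed once per similarity measure.

```python
import numpy as np


def pearson_similarity(X):
    """Pairwise Pearson correlation between the rows of X."""
    return np.corrcoef(X)


def neg_euclidean_similarity(X):
    """Negative pairwise Euclidean distance, so larger means more similar."""
    diff = X[:, None, :] - X[None, :, :]
    return -np.sqrt((diff ** 2).sum(axis=-1))


def mean_within_cluster_similarity(S, labels):
    """Size-weighted average of S[i, j] over pairs in the same cluster."""
    labels = np.asarray(labels)
    n = len(labels)
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        total += (len(idx) / n) * S[np.ix_(idx, idx)].mean()
    return total


# Toy data (hypothetical): two groups of three noisy copies of a base series.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=(2, 50))
X = np.vstack([base_a + 0.1 * rng.normal(size=(3, 50)),
               base_b + 0.1 * rng.normal(size=(3, 50))])

good = [0, 0, 0, 1, 1, 1]   # matches how the data were generated
bad = [0, 1, 0, 1, 0, 1]    # mixes the two groups

for name, measure in [("pearson", pearson_similarity),
                      ("neg_euclidean", neg_euclidean_similarity)]:
    S = measure(X)
    print(name,
          mean_within_cluster_similarity(S, good),
          mean_within_cluster_similarity(S, bad))
```

Any other symmetric matrix (for instance one of estimated mutual
information values) could be passed to the scoring function in the same way.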
Much of this work came out of the lab of W. Bialek, who originally developed
these ideas for comparing neural spike train time-series (he's one of the
authors of the popular computational neuroscience book "Spikes"). I
previously wrote a C++ implementation and used it to segment genomic
time-series with good results (just using the Euclidean distance and
Pearson correlation, not even delving into the M.I.-based similarity
measurements covered in the second paper above).
In any case, I was wondering if the scikit-learn team might like an
implementation of this flexible clustering scheme, which is fairly popular in
the gene regulatory network community and has features that no other
clustering algorithm I know of has (e.g. if two members of the
dataset share more than a single bit of mutual information, then their
relationship is more complicated than simply switching one another off).
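To make the "more than a single bit" point concrete, here is a small
sketch that evaluates mutual information in bits from a known joint
probability table. This is the textbook plug-in formula, not the
finite-sample estimator from the second paper, and the two joint tables are
made-up examples: a pure on/off switch and a four-level graded relationship.

```python
import numpy as np


def mutual_information_bits(joint):
    """Mutual information (in bits) of a discrete joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalize to probabilities
    px = joint.sum(axis=1, keepdims=True)    # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)    # marginal of the column variable
    nz = joint > 0                           # skip zero cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())


switch = np.array([[0.0, 0.5],
                   [0.5, 0.0]])   # y is always the opposite of x
graded = np.eye(4) / 4.0          # y tracks x across four equally likely levels

print(mutual_information_bits(switch))  # 1.0
print(mutual_information_bits(graded))  # 2.0
```

A binary switch can carry at most one bit, so a pair sharing two bits
must be related through more than two distinguishable states.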
I'd enjoy formatting the Python to the standard scikit-learn code style so
that it fits well with the existing clustering code. I would also like to
contribute additional unsupervised learning algorithms if people would
like contributors in this area.
Please let me know if the team is interested and I will get the IBC code
into shape for submission to the project.
Thank you for your time!
-kc