I got in touch with the author of fastcluster, Daniel Müllner, and he says
that he will release the code under the BSD-2 license if we agree to
integrate it into scikit-learn.

Even better, he suggests (and I agree), would be to make this change
upstream by replacing scipy.cluster's hierarchy code with Müllner's faster
code.  scikit-learn could then benefit simply by building on
scipy.cluster.hierarchy.  However, this would mean that scikit-learn relies
on possibly hard-to-maintain C++ code. Oliver mentioned:

> ...there is a policy of trying to stay away from adding more C++ in
> the scikit code base because of the maintenance cost inherent to C++.


So I'm not sure how this relates to depending on a C++ implementation via
scipy.

Müllner's code has the same interface as the scipy.cluster.hierarchy
implementation, so perhaps the integration with scipy would not be so
difficult.  I have no experience working with the scipy team, so I have a
question: where is the appropriate place to run this suggestion by them?
Should I just post my suggestion on the Scipy-Dev mailing list?
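
To illustrate what "the same interface" means in practice, here is a minimal
sketch, assuming the fastcluster Python package is installed and that its
linkage() accepts the same method/metric arguments as scipy's (the fallback to
scipy, the random data, and the variable names are only there for
illustration):

```python
import numpy as np
from scipy.cluster import hierarchy

try:
    # Assumption: Müllner's fastcluster package is installed.
    import fastcluster
    fast_linkage = fastcluster.linkage
except ImportError:
    # Fall back to scipy so the sketch still runs without fastcluster.
    fast_linkage = hierarchy.linkage

# Small random data set, purely illustrative.
rng = np.random.RandomState(0)
X = rng.rand(50, 3)

# Same call signature for both implementations.
Z_scipy = hierarchy.linkage(X, method="average", metric="euclidean")
Z_fast = fast_linkage(X, method="average", metric="euclidean")

# Either linkage matrix can be passed to the rest of scipy.cluster.hierarchy.
labels_scipy = hierarchy.fcluster(Z_scipy, t=4, criterion="maxclust")
labels_fast = hierarchy.fcluster(Z_fast, t=4, criterion="maxclust")
```

If the two really are interchangeable, the merge heights (third column of the
linkage matrix) should agree up to floating-point error.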

Conrad
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general