2011/10/4 Conrad Lee <[email protected]>: > On Tue, Oct 4, 2011 at 10:36 PM, Gael Varoquaux > <[email protected]> wrote: >> >> On Tue, Oct 04, 2011 at 10:27:01PM +0200, Lars Buitinck wrote: >> > You must mean from Θ(n² lg n) to Θ(n²). A generic Θ(n² lg n) algo is >> > listed in many textbooks [1], I sure hope we don't have the naive algo >> > scikit-learn? > > @Lars: > I haven't gone through the source code and figured out what the actual > complexity is. But if you look at the table on this page, the author of > that paper claims that scipy.cluster.hierarchy (as well as all of the other > implementations) are O(n^3).
I've checked the source, turns out we have the priority queue-based scheme, which should be n² lg n. The author of the arXiv paper lists a similar algo on p.9. > In the third paragraph of his paper, he mentions: > > Even when software packages do not use the inefficient primitive [stepwise, > procedural] algorithm (as SciPy (Eads,2007) and the R (R Development Core > Team, 2011) methods hclust and agnes do), the > author found that implementations largely use suboptimal algorithms rather > than improved It would be interesting to experiment -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam ------------------------------------------------------------------------------ All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense. http://p.sf.net/sfu/splunk-d2dcopy1 _______________________________________________ Scikit-learn-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
