I did some timing of the entropy criterion used in classification tree construction. It turned out that the log (a transcendental function) was taking up a good fraction of the time.
I coded a fast log approximation that is a bit brutal: https://github.com/GaelVaroquaux/scikit-learn/commit/05e707f8dd67eb65948da877371ba62271ba94d1 It gives a factor-of-4 speedup in benchmarks (if you modify bench_tree to use the entropy criterion). All the tests still pass, which suggests that this level of approximation is fine for what we are computing.

The question is: is such an approximation acceptable? I think so; I just wanted confirmation. If people agree, I'll document it better (and maybe add a test for it) and push to master.

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
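For readers who can't follow the commit link: the general idea behind this kind of "brutal" fast log is to reinterpret the IEEE-754 bit pattern of a float as an integer, which gives the exponent exactly and a linear interpolation of the mantissa. The actual commit is written at the C/Cython level; the sketch below is only an illustration of the trick in pure Python, and the names `fast_log2` and `fast_entropy` are mine, not from the commit.

```python
import struct

def fast_log2(x):
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    # For x = m * 2**e (1 <= m < 2), the integer is roughly
    # (e + 127) * 2**23 + (m - 1) * 2**23, so dividing by 2**23 and
    # subtracting the exponent bias (127) approximates log2(x),
    # with the mantissa acting as a linear interpolation between
    # powers of two (max error ~0.09).
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits / 2.0**23 - 127.0

def fast_entropy(counts):
    # Entropy in bits of a class-count vector, using the fast log.
    total = float(sum(counts))
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * fast_log2(p)
    return h
```

For exact powers of two the approximation is exact (`fast_log2(8.0)` is `3.0`), and in between the error stays below about 0.09, which is why an entropy-based split criterion, which only compares impurity values, is largely insensitive to it.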
