2013/9/23 Fred Baba <[email protected]>:
> System performance is currently on the order of ~1 µs, so Python overhead
> would be unacceptable. For SVM, I'll extract the support vectors and
> investigate using libSVM directly, as per Federico Vaggi's advice. +1 for
> PMML support at some point down the road. Thanks for the quick responses.
If you really are dealing with microsecond per-prediction latencies, then even
in C you will probably have to abandon models with a high prediction-time
complexity. I was curious, so I ran the following experiment:

>>> from sklearn.datasets import load_digits
>>> from sklearn.svm import SVC, LinearSVC
>>> digits = load_digits()

Complex non-linear model:

>>> model = SVC(gamma=0.001, C=10).fit(digits.data, digits.target)
>>> %timeit _ = model.predict(digits.data[42:43])
10000 loops, best of 3: 131 µs per loop

Simple linear model:

>>> model2 = LinearSVC(C=10).fit(digits.data, digits.target)
>>> %timeit _ = model2.predict(digits.data[42:43])
10000 loops, best of 3: 49.7 µs per loop

model2's decision function is a very simple dot product over the features. I
have not checked, but it is possible that most of the 50 µs spent in predict
is actually spent in sklearn's input-checking Python boilerplate, and that very
little time is actually spent inside the BLAS call that computes the dot
product. However, one can reasonably assume that an instance of the SVC class
has the same level of Python overhead, and it is at least twice as slow as the
linear model. That means that at least 70 µs are likely spent inside libsvm
itself (sklearn wraps libsvm for this specific model). This model only has 803
support vectors in a 64-dimensional space:

>>> model.support_vectors_.shape
(803, 64)

So for any non-trivial, non-linear model you are likely to spend more than a
couple of tens of µs predicting the outcome of a single sample.

Also, I don't know your architecture, but it is very likely that your input
data is not directly a vector or array of numerical feature values ready for
consumption by libsvm or whatever machine learning implementation you end up
choosing: you probably have to do some feature extraction on your raw data,
be it a row from a DB, JSON events sent by a frontend application, and so on.
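To get a feel for how much of the LinearSVC predict() time is Python
boilerplate rather than BLAS, one can bypass predict() entirely and compute
the one-vs-rest decision values with a single np.dot over the fitted
coefficients. This is just a sketch; fast_predict is an illustrative helper,
not part of sklearn, and it assumes the class labels are the sorted integers
0..9 as in the digits dataset:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

digits = load_digits()
model = LinearSVC(C=10).fit(digits.data, digits.target)

# Precompute the weight matrix (n_classes, n_features) and intercepts once.
W, b = model.coef_, model.intercept_

def fast_predict(x):
    # One-vs-rest decision: the class with the highest score wins.
    # A single dot product, with no input-validation overhead.
    return int(np.argmax(np.dot(W, x) + b))

sample = digits.data[42]
assert fast_predict(sample) == int(model.predict(digits.data[42:43])[0])
```

Timing fast_predict against model.predict on a single sample would show how
much of the per-call latency is attributable to the Python wrapper rather
than the dot product itself.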
This feature extraction layer is probably even slower than computing the
prediction in many cases.

Also remember that some models can really benefit from packing predictions
together:

In [18]: %timeit _ = model.predict(digits.data[0:100])
100 loops, best of 3: 7.22 ms per loop

That's 72 µs per prediction. This is probably explained by the fact that many
predictions can be packed together into a single DGEMM BLAS call for matrix
multiplication before computing the non-linear kernel activations (our
patched libsvm can leverage dense feature representations).

This is even more the case for a dense linear model:

In [19]: %timeit _ = model2.predict(digits.data[0:100])
10000 loops, best of 3: 198 µs per loop

That's 2 µs per prediction, which is getting closer to your requirements.
Packing 1000 predictions at once makes the individual prediction time drop
below 0.6 µs for a linear model:

>>> %timeit _ = model2.predict(digits.data[0:1000])
1000 loops, best of 3: 569 µs per loop

I am using the OSX implementation of BLAS, which is not the best; a hand-built
Atlas or MKL might be a bit faster.

That said, if you are willing to contribute PMML exporters for some sklearn
models, probably as a side project for scikit-learn, I am sure that many users
would like it. However, I am not sure that working around the individual
prediction latency caused by the Python interpreter overhead alone is worth it.

--
Olivier
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
