Thanks, Olivier. The application involves continuous classification of
real-time input features, so I only make one prediction at a time. My
intention is to observe the structure of the fitted model and then
optimize the prediction function by hand in C++. For instance, a linear
classifier would be implemented something like:
#include <cmath>    // std::copysign
#include <utility>
#include <vector>

std::vector<std::pair<double, double>> vals;  // (weight, feature) pairs
...
double val = 0.0;
for (const auto& v : vals)
    val += v.first * v.second;          // dot product of weights and features
double z = std::copysign(1.0, val);     // predicted class: +1 or -1
The above code computes a 20-feature linear classifier in about 25ns.
I'm not sure if a PMML exporter will be necessary for my current
application, but if so I'd be happy to contribute it to scikit-learn.
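If the by-hand C++ step becomes repetitive, it could itself be generated from the fitted model. A minimal sketch of that idea follows; the weights and bias below are placeholder values, whereas in practice they would come from `model.coef_[0]` and `model.intercept_[0]` of a fitted LinearSVC:

```python
# Sketch: emit a C++ linear-classifier fragment from fitted weights.
# The weights/bias here are placeholders, not real fitted values.

def emit_cpp_classifier(weights, bias, name="predict"):
    """Generate C++ source for sign(w . x + b) over a fixed-size feature array."""
    n = len(weights)
    w_init = ", ".join(repr(w) for w in weights)
    return (
        f"static const double W[{n}] = {{{w_init}}};\n"
        f"static const double B = {bias!r};\n"
        f"double {name}(const double (&x)[{n}]) {{\n"
        f"    double val = B;\n"
        f"    for (int i = 0; i < {n}; ++i)\n"
        f"        val += W[i] * x[i];\n"
        f"    return std::copysign(1.0, val);\n"
        f"}}\n"
    )

src = emit_cpp_classifier([0.5, -1.25, 2.0], 0.1)
print(src)
```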
On Mon, Sep 23, 2013 at 11:09 AM, Olivier Grisel
<[email protected]> wrote:
> 2013/9/23 Fred Baba <[email protected]>:
> > System performance is currently on the order of ~1us, so Python overhead
> > would be unacceptable. For SVM, I'll extract the support vectors and
> > investigate using libSVM directly, as per federico vaggi's advice. +1 for
> > PMML support at some point down the road. Thanks for the quick responses.
>
> If you really are dealing with micro-seconds per-prediction latencies
> then even in C you will probably have to abandon models with a high
> prediction time complexity. I was curious so I made the following
> experiment:
>
> >>> from sklearn.datasets import load_digits
> >>> from sklearn.svm import SVC, LinearSVC
> >>> digits = load_digits()
>
> Complex non linear model:
>
> >>> model = SVC(gamma=0.001, C=10).fit(digits.data, digits.target)
> >>> %timeit _ = model.predict(digits.data[42:43])
> 10000 loops, best of 3: 131 µs per loop
>
> Simple linear model:
>
> >>> model2 = LinearSVC(C=10).fit(digits.data, digits.target)
> >>> %timeit _ = model2.predict(digits.data[42:43])
> 10000 loops, best of 3: 49.7 µs per loop
>
> model2's decision function is a very simple dot product between the
> feature vector and the learned weights. I have not checked, but it is
> possible that most of the 50 µs spent in predict actually goes into
> scikit-learn's input-checking Python boilerplate, and that very little
> time is spent inside the BLAS call that computes the dot product.
> However, one can reasonably assume that the SVC instance has the same
> level of Python overhead, and it is at least twice as slow as the
> linear model. That means at least 70 µs are likely spent inside libsvm
> itself (sklearn wraps libsvm for this specific model).
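One way to sanity-check how much of a per-call budget goes to wrapper overhead, without reading sklearn internals, is to time the raw arithmetic separately from the validation layers. A pure-Python toy sketch; the validation steps are invented stand-ins for the kind of boilerplate a library wrapper does, not sklearn's actual checks:

```python
import timeit

# Toy 64-dim weight and input vectors (plain Python, no sklearn/BLAS).
w = [0.01 * i for i in range(64)]
x = [0.02 * i for i in range(64)]

def bare_dot(x):
    """The raw arithmetic: a 64-dim dot product."""
    return sum(wi * xi for wi, xi in zip(w, x))

def checked_dot(x):
    """The same dot product behind invented input-validation boilerplate."""
    if not isinstance(x, list):
        x = list(x)
    if len(x) != len(w):
        raise ValueError("bad shape")
    if any(xi != xi for xi in x):  # NaN check (NaN != NaN)
        raise ValueError("NaN in input")
    return bare_dot(x)

n = 10000
t_bare = timeit.timeit(lambda: bare_dot(x), number=n)
t_checked = timeit.timeit(lambda: checked_dot(x), number=n)
print(f"bare:    {1e6 * t_bare / n:.2f} µs/call")
print(f"checked: {1e6 * t_checked / n:.2f} µs/call")
```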
>
> This model only has 803 support vectors in 64 dim space:
>
> >>> model.support_vectors_.shape
> (803, 64)
>
> So for any non-trivial, non-linear model you are likely to spend more
> than a couple of tens of µs predicting the outcome of a single
> sample.
>
> Also, I don't know your architecture, but it is very likely that your
> input data is not directly a vector or array of numerical feature
> values ready for consumption by libsvm (or whatever machine learning
> implementation you end up choosing). You probably have to do some
> feature extraction from your raw data, be it rows from a DB, JSON
> events sent by a frontend application, or so on. In many cases this
> feature-extraction layer is even slower than computing the prediction
> itself.
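For illustration, a feature-extraction layer of the kind described might look like the sketch below; the event field names and defaulting policy are invented, not from any real schema:

```python
import json

# Hypothetical event schema: these field names are invented for illustration.
FEATURE_ORDER = ["duration_ms", "num_retries", "payload_bytes"]

def extract_features(raw_event):
    """Turn a raw JSON event string into a fixed-order numeric feature vector."""
    event = json.loads(raw_event)
    # Missing fields default to 0.0 so the vector length is always fixed.
    return [float(event.get(name, 0.0)) for name in FEATURE_ORDER]

vec = extract_features('{"duration_ms": 12.5, "num_retries": 2}')
print(vec)  # -> [12.5, 2.0, 0.0]
```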
>
> Also remember that some models can really benefit from packing
> predictions together:
>
> In [18]: %timeit _ = model.predict(digits.data[0:100])
> 100 loops, best of 3: 7.22 ms per loop
>
> That's 72 µs per prediction. The speedup comes from the fact that many
> predictions can be packed into a single DGEMM BLAS call for the matrix
> multiplication performed before computing the non-linear kernel
> activations (our patched libsvm can leverage dense feature
> representations).
>
> This is even more the case for a dense linear model.
>
> In [19]: %timeit _ = model2.predict(digits.data[0:100])
> 10000 loops, best of 3: 198 µs per loop
>
> That's 2 µs per prediction which is getting closer to your
> requirements. Packing 1000 predictions at once will make the
> individual prediction time drop below 0.6 µs for a linear model:
>
> >>> %timeit _ = model2.predict(digits.data[0:1000])
> 1000 loops, best of 3: 569 µs per loop
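Spelling out the amortization implied by the three LinearSVC timings quoted above:

```python
# Per-prediction latency implied by the LinearSVC batch timings quoted above.
timings_us = {1: 49.7, 100: 198.0, 1000: 569.0}  # batch size -> total µs per call
for batch, total in sorted(timings_us.items()):
    print(f"batch={batch:5d}: {total / batch:8.3f} µs per prediction")
```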
>
> I am using the OS X implementation of BLAS, which is not the best. A
> hand-built ATLAS or MKL might be a bit faster.
>
> That said, if you are willing to contribute PMML exporters for some
> sklearn models, probably as a side project to scikit-learn, I am sure
> that many users would like it. However, I am not sure that avoiding
> the per-prediction latency caused by the Python interpreter overhead
> alone makes it worth the effort.
>
> --
> Olivier
>
>
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general