2013/9/30 Eustache DIEMERT <[email protected]>:
> Hi List,
>
> The topic of prediction latency seems to pop up from time to time, either
> on the list or on SO [1].
>
> I'm pondering the idea of creating a performance benchmark in the spirit of
> the out-of-core example.
>
> This benchmark could be used to compare and track prediction time of a few
> popular algorithms in sklearn.
>
> It could be used also to showcase to new users what they can expect from the
> different algorithms and help them choose a solution that fits their
> requirements.
>
> I don't know if other machine learning toolkits have this kind of benchmark
> (and do we already have various pieces out there?), but I think it
> would be a great feature for "enterprise" users, who often need to choose
> a toolkit in a limited amount of time and then stick to it for the life of
> the project.
>
> It could also lend weight to the idea that Python can/should be fast ;)
>
> Any thoughts on that?

I think it would be great to have a topical / howto section on the
main concepts to be aware of when analyzing the performance tradeoffs
at prediction time:

For instance, explain the difference between throughput (the number of
predictions achievable per second, possibly in batch) and latency (the
time to perform a single atomic prediction).
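To make the distinction concrete, here is a minimal measurement sketch; the
estimator and the synthetic data are placeholders (any fitted scikit-learn
model would do):

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=100, random_state=0)
clf = LogisticRegression().fit(X, y)

# Latency: average wall-clock time of one atomic prediction (a single sample).
n_runs = 100
t0 = time.perf_counter()
for i in range(n_runs):
    clf.predict(X[i].reshape(1, -1))
latency = (time.perf_counter() - t0) / n_runs

# Throughput: predictions per second when all samples are passed in one batch.
t0 = time.perf_counter()
clf.predict(X)
throughput = len(X) / (time.perf_counter() - t0)

print("latency per single prediction: %.1f microseconds" % (latency * 1e6))
print("batch throughput: %.0f predictions/s" % throughput)
```

The batch throughput is typically much higher than 1 / latency, because the
per-call Python and validation overhead is amortized over the whole batch.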

Then give orders of magnitude to analyze the latency and throughput
of the predictions by considering:

- feature extraction time [e.g. a couple of MB/s on a single CPU for
typical text feature extraction]
- Python function call and input validation overhead [in the nanosecond
to microsecond range]
- impact of the model complexity:
  - number of (non-zero) dimensions for a linear model (and the
`linear_model.sparsify()` trick),
  - number of support vectors in a non-linear SVM,
  - number and depth of decision trees in RF or GBRT models...
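A quick sketch of the sparsify() trick: after fitting a linear model with an
L1 penalty, most coefficients are exactly zero, so storing coef_ as a sparse
matrix can speed up prediction and shrink the model in memory. The dataset
and hyperparameters below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=300, n_informative=10,
                           random_state=0)
clf = SGDClassifier(penalty="l1", alpha=0.001, random_state=0).fit(X, y)

dense_pred = clf.predict(X)
clf.sparsify()               # converts coef_ to a scipy.sparse CSR matrix
sparse_pred = clf.predict(X)

# Predictions are unchanged; only the coefficient storage format differs.
assert (dense_pred == sparse_pred).all()
print("non-zero coefficients:", clf.coef_.nnz, "out of", clf.coef_.shape[1])
```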

It is also worth highlighting that models that use BLAS internally for
prediction (typically linear models) can benefit from packing hundreds
or thousands of predictions into a single batch, to make better use of
CPU cache locality, SSE instructions and multithreading.

Also: memory usage optimization: loading a model with
joblib.load(filename, mmap_mode='r') makes it possible to share the
model's memory between processes: for instance, several gunicorn
workers (distinct Python processes) of a flask API exposing a
scikit-learn model as an HTTP service can load a large model (e.g. a
big extra trees ensemble) only once in memory, rather than each
loading its own copy.
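A minimal sketch of that pattern, using the standalone joblib package and a
small placeholder ensemble (in a real deployment the dump happens once and
each worker process runs only the load() call):

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
model = ExtraTreesRegressor(n_estimators=20, random_state=0).fit(X, y)

path = os.path.join(tempfile.gettempdir(), "extra_trees_model.joblib")
joblib.dump(model, path)  # done once, at deployment time

# In each worker: mmap_mode='r' maps the large numpy arrays inside the model
# read-only, so the OS can share the pages between processes instead of each
# worker holding a private copy.
shared_model = joblib.load(path, mmap_mode="r")
preds = shared_model.predict(X[:5])
print(preds.shape)
```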

We could even have a subsection for each class of models and give
additional pointers, as Peter did for the SO question on GBRTs.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
