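It has to do with:

- Python iteration vs. C iteration: the per-vector loop runs in Python,
  while a single batched call iterates over the vectors in compiled code
- memory access patterns (cache behaviour): with the batched call the
  outer iteration is over trees and the inner over vectors; with the
  per-vector loop the outer iteration is over vectors and the inner over
  trees
- the overhead of predict_proba book-keeping, which is paid once per call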
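
To see the three effects above on a small scale, here is a minimal sketch.
It assumes synthetic data and a much smaller forest than yours (the sizes,
the labels and the random_state are made up for illustration), so the
absolute timings will not match the ones you quoted. It also reshapes each
vector to a single-row 2-D array, which recent scikit-learn versions
require.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Synthetic stand-in for the real data: 20 features, each a float in [0, 1].
X_train = rng.rand(500, 20)
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(int)
vectors = rng.rand(200, 20)

# Deliberately small forest so that the example runs quickly.
classifier = RandomForestClassifier(
    n_estimators=20, min_samples_split=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)

# One call per vector: the Python-level overhead (input validation,
# dispatching to every tree, combining the results) is paid 200 times.
start = time.perf_counter()
looped = np.vstack(
    [classifier.predict_proba(v.reshape(1, -1)) for v in vectors]
)
loop_seconds = time.perf_counter() - start

# One call for the whole array: the same overhead is paid once, and each
# tree walks over all the vectors in compiled code.
start = time.perf_counter()
batched = classifier.predict_proba(vectors)
batch_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.3f}s")
print(f"batch: {batch_seconds:.3f}s")
```

Both variants return the same probabilities; only the time differs, so
passing the whole array in one call is the cheaper option whenever all the
vectors are available up front.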

On Tue, Nov 18, 2014 at 11:07 AM, Nicola Sambin <[email protected]>
wrote:

> Dear all,
> this is my first post, so I would like to apologize in case I am not
> doing things properly, and to thank the community for all the valuable
> information shared.
>
> I came across an unexpected behavior while working with a Random Forest
> classifier: the time required for prediction depends heavily on whether
> I pass one feature vector at a time or an array of feature vectors.
>
> Let me make the statement clearer.
> I have trained a random forest classifier with the following parameters:
>
> ensemble.RandomForestClassifier(n_estimators=100, max_depth=80,
> min_samples_split=10, criterion='entropy')
> and fitted it with a non-uniform sample_weight.
>
> I am working with 20-dimensional feature vectors, each coordinate being a
> float in [0,1].
>
> After loading the trained classifier (once and for all) with
> classifier = joblib.load('trained_classifier.pkl')
> I observed the following behavior (for the sake of clarity, let me quote
> the results for 10000 randomly generated vectors):
>
> - when I computed:
> for vector in vectors:
>     classifier.predict_proba(vector)
> it took:
> 2227,99s user 90,75s system 21:29,94 total
>
> - while
> classifier.predict_proba(vectors)
> took:
> 1,06s user 0,39s system 1,984 total
>
> What does this (impressive) difference depend on?
> Thanks in advance.
>
> Nicola
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general