Thanks for the great answer!
2014-11-18 11:47 GMT+01:00 Paolo Losi <[email protected]>:
> It has to do with:
>
> - Python-level iteration vs. C-level iteration
> - memory access pattern optimization (cache behaviour): in the batched
>   call the outer iteration is over trees and the inner one over vectors,
>   while in the per-vector loop the outer iteration is over vectors and
>   the inner one over trees
> - overhead of predict_proba book-keeping, paid once per call
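To make the loop-order point concrete, here is a pure-Python toy. It is not scikit-learn's Cython code: `make_stump`, `forest_batched` and `forest_one_sample` are hypothetical names, and each "tree" is a one-split stump returning two class probabilities. Both paths average the trees' outputs and give identical numbers. What differs is which loop is outermost and how many times the entry point is called.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Tree = Callable[[Vector], List[float]]


def make_stump(feature: int, threshold: float) -> Tree:
    """Hypothetical depth-1 tree returning [p(class 0), p(class 1)]."""
    def predict(x: Vector) -> List[float]:
        return [0.2, 0.8] if x[feature] > threshold else [0.9, 0.1]
    return predict


def forest_batched(trees: List[Tree], X: List[Vector]) -> List[List[float]]:
    # Outer loop over trees, inner loop over samples:
    # each tree makes one pass over the whole batch.
    acc = [[0.0, 0.0] for _ in X]
    for tree in trees:
        for i, x in enumerate(X):
            p = tree(x)
            acc[i][0] += p[0]
            acc[i][1] += p[1]
    return [[a / len(trees) for a in row] for row in acc]


def forest_one_sample(trees: List[Tree], x: Vector) -> List[float]:
    # Called once per sample, so the samples form the outer loop
    # and the trees the inner one.
    acc = [0.0, 0.0]
    for tree in trees:
        p = tree(x)
        acc[0] += p[0]
        acc[1] += p[1]
    return [a / len(trees) for a in acc]


trees = [make_stump(0, 0.5), make_stump(1, 0.3), make_stump(0, 0.7)]
X = [[0.1, 0.9], [0.6, 0.2], [0.8, 0.8]]
batched = forest_batched(trees, X)                     # 1 entry-point call
per_sample = [forest_one_sample(trees, x) for x in X]  # len(X) calls
assert batched == per_sample  # same numbers, different loop order
```

In pure Python the two orders cost about the same. In scikit-learn, however, the batched inner loop over samples runs in compiled code, and every extra `predict_proba` call repeats input validation and other setup. This plausibly accounts for most of the gap Nicola measured.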
>
> On Tue, Nov 18, 2014 at 11:07 AM, Nicola Sambin <[email protected]>
> wrote:
>
>> Dear all,
>> this is my first post, so I apologize in advance if I am not doing
>> things properly, and I thank the community for all the valuable
>> information shared.
>>
>> I came across an unexpected behavior while working with a Random Forest
>> classifier: the time required for prediction depends heavily on whether
>> I pass one feature vector at a time or a whole array of feature
>> vectors.
>>
>> Let me make the statement clearer.
>> I have trained a random forest classifier with the following parameters:
>>
>>     ensemble.RandomForestClassifier(n_estimators=100, max_depth=80,
>>                                     min_samples_split=10, criterion='entropy')
>> and fitted it with a non-uniform sample_weight.
>>
>> I am working with 20-dimensional feature vectors, each coordinate being a
>> float in [0,1].
>>
>> After loading the trained classifier once and for all:
>>     classifier = joblib.load('trained_classifier.pkl')
>> I observed the following behavior (for the sake of clarity, here are the
>> results for 10000 randomly generated vectors):
>>
>> - when I computed:
>>     for vector in vectors:
>>         classifier.predict_proba(vector)
>> it took:
>> 2227,99s user 90,75s system 21:29,94 total
>>
>> - whereas computing:
>>     classifier.predict_proba(vectors)
>> took:
>> 1,06s user 0,39s system 1,984 total
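A minimal sketch for reproducing this comparison, assuming a recent scikit-learn and NumPy. The training data, labels and weights are random placeholders (not Nicola's data), and the classifier uses the parameters quoted above. Recent scikit-learn releases reject a 1-D vector in `predict_proba`, so the per-row loop slices `X_test[i:i + 1]` to keep a `(1, 20)` shape.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X_train = rng.rand(2000, 20)                  # 20 features in [0, 1]
y_train = (X_train[:, 0] > 0.5).astype(int)   # placeholder labels
weights = rng.rand(2000)                      # placeholder non-uniform sample_weight

clf = RandomForestClassifier(n_estimators=100, max_depth=80,
                             min_samples_split=10, criterion="entropy",
                             random_state=0)
clf.fit(X_train, y_train, sample_weight=weights)

X_test = rng.rand(200, 20)

# One predict_proba call per row (2-D slice of shape (1, 20) each time).
start = time.perf_counter()
per_row = np.vstack([clf.predict_proba(X_test[i:i + 1])
                     for i in range(len(X_test))])
t_per_row = time.perf_counter() - start

# A single predict_proba call for the whole batch.
start = time.perf_counter()
batched = clf.predict_proba(X_test)
t_batched = time.perf_counter() - start

print(f"per-row: {t_per_row:.3f}s, batched: {t_batched:.3f}s")
```

Both paths return the same probabilities; only the number of `predict_proba` calls changes. If predictions must be produced one at a time, collecting vectors into small batches before calling `predict_proba` is a common compromise.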
>>
>> What does this (impressive) difference depend on?
>> Thanks in advance.
>>
>> Nicola
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
>> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
>> with Interactivity, Sharing, Native Excel Exports, App Integration & more
>> Get technology previously reserved for billion-dollar corporations, FREE
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>