Dear all,
this is my first post, so I would like to apologize in advance in case I am
not doing things properly, and to thank the community for all the valuable
information shared here.
I came across some unexpected behavior while working with a Random Forest
classifier: the time required for prediction depends heavily on whether I
pass one feature vector at a time or an array of feature vectors.
Let me make this more concrete.
I have trained a random forest classifier with the following parameters:
ensemble.RandomForestClassifier(n_estimators=100, max_depth=80,
min_samples_split=10, criterion='entropy')
and fit it with a non-uniform sample_weight.
I am working with 20-dimensional feature vectors, each coordinate being a
float in [0,1].
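For reference, the setup above can be sketched as follows; the data, labels,
and weights here are synthetic stand-ins (the real ones are not shown in this
post), and the pickle filename is the one quoted below:

```python
import numpy as np
import joblib
from sklearn import ensemble

rng = np.random.RandomState(0)
X = rng.rand(1000, 20)            # 20-dimensional features, each in [0, 1]
y = rng.randint(0, 2, size=1000)  # placeholder binary labels
weights = rng.rand(1000)          # stand-in for the non-uniform sample_weight

classifier = ensemble.RandomForestClassifier(
    n_estimators=100, max_depth=80,
    min_samples_split=10, criterion='entropy')
classifier.fit(X, y, sample_weight=weights)

# Persist the trained model so it can be loaded once later
joblib.dump(classifier, 'trained_classifier.pkl')
```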
After loading the trained classifier (once and for all):
classifier = joblib.load('trained_classifier.pkl')
I observed the following behavior (for the sake of clarity, I quote the
results for 10000 randomly generated vectors):
- when I computed:
for vector in vectors:
    classifier.predict_proba(vector)
it took:
2227.99s user 90.75s system 21:29.94 total
- while
classifier.predict_proba(vectors)
took:
1.06s user 0.39s system 1.984 total
What causes this (impressive) difference?
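The two calling patterns can be reproduced with the self-contained sketch
below; it uses a smaller synthetic problem than the one described above, so
the absolute timings will differ, but the per-row loop is still dramatically
slower than the single batched call. Note that recent scikit-learn versions
require a 2-D input, hence the reshape(1, -1) in the loop:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X_train = rng.rand(500, 20)            # synthetic stand-in training data
y_train = rng.randint(0, 2, size=500)

clf = RandomForestClassifier(n_estimators=100, max_depth=80,
                             min_samples_split=10, criterion='entropy')
clf.fit(X_train, y_train)

vectors = rng.rand(200, 20)  # fewer vectors than the 10000 quoted above

# Pattern 1: one predict_proba call per feature vector
t0 = time.time()
for vector in vectors:
    clf.predict_proba(vector.reshape(1, -1))
loop_time = time.time() - t0

# Pattern 2: a single batched predict_proba call
t0 = time.time()
clf.predict_proba(vectors)
batch_time = time.time() - t0

print(loop_time, batch_time)
```

Each predict_proba call pays a fixed overhead (input validation, and a
traversal dispatch per tree), so calling it once per row repeats that
overhead 200 times instead of amortizing it over the whole batch.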
Thanks in advance.
Nicola
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general