How about Mondrian forests ;)
On 05/26/2016 09:28 AM, Dale T Smith wrote:
I think your idea is an excellent candidate for scikit-learn-contrib
https://github.com/scikit-learn-contrib/scikit-learn-contrib
From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys....@python.org] On Behalf Of Nicolas Goix
Sent: Thursday, May 26, 2016 8:51 AM
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] partial_fit implementation for IsolationForest
Hello Isaak,
The authors of iforest also published a paper on streaming data:
http://ijcai.org/Proceedings/11/Papers/254.pdf
For now it is not cited enough (24 citations) to satisfy the sklearn
inclusion requirements. Until it gathers more citations, it could be a
nice addition to sklearn-contrib.
Otherwise, we could imagine extending iforest to streaming data by
building new trees when data arrive (and removing the oldest ones), with
prediction still based on the average depth of the forest. I'm not sure
this heuristic could be merged into scikit-learn, since it is not based
on well-cited papers. At the same time, it is a natural and simple
extension of iforest to streaming data...
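
The heuristic above (grow new trees on each incoming batch, drop the oldest, score by average depth) could be sketched roughly as follows. This is only an illustration in plain Python, not scikit-learn code: StreamingIsolationForest, its parameters (max_trees, trees_per_batch, sample_size) and the helper functions are all hypothetical names, and averaging per-tree normalized depths is an assumption about how mixed subsample sizes might be handled.

```python
import math
import random
from collections import deque


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Grow one isolation tree with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:  # cannot split on a constant feature
        return {"size": len(rows)}
    threshold = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < threshold]
    right = [r for r in rows if r[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the usual correction for unsplit leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x[tree["feature"]] < tree["threshold"] else "right"
    return path_length(tree[branch], x, depth + 1)


class StreamingIsolationForest:
    """Hypothetical sliding-window forest; not a scikit-learn estimator."""

    def __init__(self, max_trees=30, trees_per_batch=10, sample_size=64, seed=0):
        self.trees_per_batch = trees_per_batch
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        # deque(maxlen=...) drops the oldest trees automatically.
        self.trees_ = deque(maxlen=max_trees)

    def partial_fit(self, X):
        if len(X) < 2:
            raise ValueError("each mini-batch needs at least two rows")
        for _ in range(self.trees_per_batch):
            sample = self.rng.sample(X, min(self.sample_size, len(X)))
            max_depth = math.ceil(math.log2(len(sample)))
            tree = build_tree(sample, 0, max_depth, self.rng)
            self.trees_.append((tree, c_factor(len(sample))))
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; higher means more anomalous."""
        depths = [path_length(tree, x) / c for tree, c in self.trees_]
        return 2.0 ** (-sum(depths) / len(depths))


def make_batch(center, n, rng):
    """Synthetic 2-D Gaussian batch, used only for this demonstration."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


data_rng = random.Random(42)
forest = StreamingIsolationForest()
for _ in range(3):
    forest.partial_fit(make_batch(0.0, 200, data_rng))
before = forest.score([10.0, 10.0]), forest.score([0.0, 0.0])
for _ in range(3):  # the stream drifts, so trees from the old regime are evicted
    forest.partial_fit(make_batch(10.0, 200, data_rng))
after = forest.score([10.0, 10.0]), forest.score([0.0, 0.0])
```

In this toy run the point (10, 10) scores as more anomalous than (0, 0) before the drift and less anomalous afterwards, because every tree fitted on the old regime has been replaced. A real contribution would presumably build on the existing IsolationForest internals rather than reimplement the trees.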
Any opinion on it?
Nicolas
2016-05-26 13:32 GMT+02:00 Arthur Mensch <arthur.men...@inria.fr>:
Hi Isaak,
You may have a look at MiniBatchKMeans and MiniBatchDictionaryLearning,
which both provide this API. For now, partial_fit should fit a single
mini-batch to the estimator and update its inner attributes accordingly.
During the first call to partial_fit, you should take care of the memory
allocations needed by the estimator.
Please feel free to create a pull request whenever you think your code
is ready for review.
Good luck!
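
As a rough illustration of the pattern described above, here is a minimal sketch of an estimator whose partial_fit allocates its attributes on the first call and then updates them one mini-batch at a time, while fit starts from scratch. RunningMean is a hypothetical toy class written for this example, not part of scikit-learn; the trailing-underscore attribute names and returning self only mimic scikit-learn conventions.

```python
class RunningMean:
    """Toy estimator (hypothetical) that tracks per-feature means."""

    def fit(self, X):
        # fit() discards any previous state and learns from X alone.
        self._reset()
        return self.partial_fit(X)

    def partial_fit(self, X):
        # First call: allocate the fitted attributes.
        if not hasattr(self, "n_samples_seen_"):
            self.n_samples_seen_ = 0
            self.mean_ = [0.0] * len(X[0])
        # Every call: fold one mini-batch into the existing state.
        for row in X:
            self.n_samples_seen_ += 1
            for j, value in enumerate(row):
                self.mean_[j] += (value - self.mean_[j]) / self.n_samples_seen_
        return self

    def _reset(self):
        for name in ("n_samples_seen_", "mean_"):
            if hasattr(self, name):
                delattr(self, name)


batches = [[[1.0, 10.0], [3.0, 30.0]], [[5.0, 50.0]]]

# Online: feed the mini-batches one at a time.
online = RunningMean()
for batch in batches:
    online.partial_fit(batch)

# Batch: a single fit() on all rows gives the same means here.
batch_model = RunningMean().fit([row for batch in batches for row in batch])
```

Calling fit again on the same object would forget everything seen so far, whereas repeated partial_fit calls keep accumulating.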
On 26 May 2016 at 13:14, <donkey-ho...@cryptolab.net> wrote:
hello scikit-learn devs,
After following the work on IsolationForest so far and testing it on a
real-world problem here, we've found this model to be very promising
for anomaly detection. At present, however, IsolationForest only fits
data in batch, even though it may be well suited to incremental online
learning: one could subsample recent history and progressively drop
older estimators.
I'd like to contribute this feature, but being new to ML and
scikit-learn, I'm curious how I should start on a quick & dirty
version to see how this might work. Are there good examples in other
models that show the difference between .fit and .partial_fit?
thanks
isaak y.
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn