> However, at present, IsolationForest only fits data in batch even while it
> may be well suited to incremental on-line learning since one could subsample
> recent history and older estimators can be dropped progressively.

What you describe is quite different from what sklearn models typically do
with partial_fit. partial_fit is more about out-of-core / streaming fitting
than true online learning with explicit forgetting. In particular, what you
suggest could not accept calls to partial_fit with very small chunks (e.g.
from tens to a hundred samples at a time), because that would not be enough
data to develop deep isolation trees and would harm the performance of the
resulting isolation forest.

If the problem is true online learning (tracking a stream of training data
with expected shifts in its distribution), I think it's better to devise a
dedicated API that does not try to mimic the scikit-learn API (for this
specific part). There will typically have to be an additional hyperparameter
to control how much the model should remember about old samples.

If the problem is more about out-of-core learning, then partial_fit is
suitable, but the trees should grow and get reorganized progressively (as
pointed out by others in previous comments).

BTW, I would be curious to know more about the kind of anomaly detection
problem where you found IsolationForest to work well.

--
Olivier
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
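
P.S. To make the "dedicated API" idea above a bit more concrete, here is a minimal sketch of one possible shape for it, not a proposal for scikit-learn itself. The class name WindowedIsolationForest, its update method, and the max_models and min_chunk_size parameters are all hypothetical names made up for illustration. Only IsolationForest and its fit / score_samples methods are real scikit-learn API. The sketch buffers small chunks until there is enough data to grow reasonable trees, fits a fresh sub-forest on each full chunk, and drops the oldest sub-forest once max_models is reached, so max_models plays the role of the "how much to remember" hyperparameter.

```python
from collections import deque

import numpy as np
from sklearn.ensemble import IsolationForest


class WindowedIsolationForest:
    """Hypothetical sliding-window wrapper; NOT part of scikit-learn."""

    def __init__(self, max_models=5, min_chunk_size=256,
                 n_estimators=20, random_state=0):
        self.max_models = max_models          # forgetting horizon (assumed name)
        self.min_chunk_size = min_chunk_size  # samples needed before fitting
        self.n_estimators = n_estimators
        self.random_state = random_state
        # A bounded deque silently drops the oldest sub-forest when full.
        self._models = deque(maxlen=max_models)
        self._buffer = []
        self._n_buffered = 0
        self._n_fitted = 0

    def update(self, X):
        """Buffer X; fit a new sub-forest once enough samples have arrived."""
        X = np.asarray(X, dtype=float)
        self._buffer.append(X)
        self._n_buffered += X.shape[0]
        if self._n_buffered < self.min_chunk_size:
            return self  # too few samples to grow deep isolation trees
        chunk = np.vstack(self._buffer)
        self._buffer, self._n_buffered = [], 0
        model = IsolationForest(
            n_estimators=self.n_estimators,
            random_state=self.random_state + self._n_fitted,
        )
        model.fit(chunk)
        self._models.append(model)
        self._n_fitted += 1
        return self

    def score_samples(self, X):
        """Average the scores of the retained sub-forests (lower = more abnormal)."""
        if not self._models:
            raise RuntimeError("call update() with enough data first")
        X = np.asarray(X, dtype=float)
        return np.mean([m.score_samples(X) for m in self._models], axis=0)
```

Naming the method update rather than partial_fit is deliberate in this sketch: it avoids suggesting the out-of-core semantics that partial_fit has elsewhere in scikit-learn, since here old data is explicitly forgotten.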