How do I say this... I'm 95% sure this stems from me not having much of
a clue, but:
For trees, you can fit each tree on a small subset of the data.
With SGD, you can iterate over the data and feed the model part of it at
a time.
Are there any other methods in sklearn that let you use only part of the
overall data at a time? So far, trees (one tree per part) and SGD (one
subset per partial_fit call) have been mentioned.
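
To check that I have the two ideas straight, here is a rough sketch of both
as I understand them. The toy data, the number of chunks, and the choice to
combine the trees by averaging their predicted probabilities are all my own
assumptions for illustration, not something prescribed in this thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for a dataset that would really be read chunk by chunk.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, y_train = X[:1600], y[:1600]
X_test, y_test = X[1600:], y[1600:]
classes = np.unique(y_train)

# Hypothetical chunking: 8 chunks of 200 rows each.
chunks = list(zip(np.array_split(X_train, 8), np.array_split(y_train, 8)))

# Idea 1: one independent tree per chunk, combined only at prediction time.
trees = [
    DecisionTreeClassifier(random_state=i).fit(X_chunk, y_chunk)
    for i, (X_chunk, y_chunk) in enumerate(chunks)
]
# Assumption: every chunk contains every class, so the predict_proba
# columns of all trees line up and can be averaged.
avg_proba = np.mean([tree.predict_proba(X_test) for tree in trees], axis=0)
tree_pred = classes[np.argmax(avg_proba, axis=1)]

# Idea 2: a single SGD model, updated incrementally with each chunk.
sgd = SGDClassifier(random_state=0)
for X_chunk, y_chunk in chunks:
    # The full list of classes is passed because a single chunk
    # might not contain all of them.
    sgd.partial_fit(X_chunk, y_chunk, classes=classes)
sgd_pred = sgd.predict(X_test)

print("trees (averaged) accuracy:", np.mean(tree_pred == y_test))
print("SGD (partial_fit) accuracy:", np.mean(sgd_pred == y_test))
```

In the first case there are eight separate models and only one chunk needs to
be in memory while its tree is fit; in the second there is one model that
sees the chunks one after another.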
Thanks,
Shomiron Ghose
On 17 November 2012 11:28, Andreas Mueller <[email protected]> wrote:
> On 11/17/2012 04:19 PM, Ronnie Ghose wrote:
> > See you guys just said I could use trees on subsets and they will work
> > well.
> >
> > So why not partial_fits + trees?
> >
> As I tried to say, these are different stories:
> Gilles said (and wrote about) using a different small subset of the data
> for each tree.
> In total, you still use all the data. You just build multiple models,
> each with a different subset. So you never have to store all the data
> at once.
> In the end you get many classifiers that you can bag.
>
> The partial fit is something different. It builds a single model
> incrementally from the data.
> This is feasible only for some models and some ways to train models.
> It is implemented for SGDClassifier mainly (I think).
> It would be easy to implement this for the naive Bayes models, but
> they don't have partial fit yet afaik.
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>