Are you concerned about storing the whole corpus text in memory, or the
whole corpus' statistics? If it is the text, pass input='file' or
input='filename' to TfidfVectorizer (or pass a generator of texts with
the default input='content').
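
A minimal sketch of both options, assuming a small toy corpus written to
a temporary directory. The file names and the iter_texts helper are
hypothetical and only there for illustration:

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer


def iter_texts(paths):
    """Hypothetical helper: yield one document at a time from disk."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


# Toy corpus (an assumption for this sketch), one document per file.
tmpdir = tempfile.mkdtemp()
docs = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats"]
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass file paths; the vectorizer opens and reads each file itself.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)

# Option 2: pass a generator of strings (the default input="content").
vec_gen = TfidfVectorizer()
X_gen = vec_gen.fit_transform(iter_texts(paths))
```

Either way the raw text is read one document at a time, but the fitted
vocabulary and the resulting sparse matrix are still held in memory.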

On Tue, 28 Jan 2020 at 18:01, Peng Yu <[email protected]> wrote:

> Hi,
>
> To use TfidfVectorizer, the whole corpus must be loaded into memory.
> This can be a problem for machines without a lot of memory. Is there a
> way to use only a small amount of memory by saving most intermediate
> results on disk? Thanks.
>
> --
> Regards,
> Peng
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
