Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

Joel Nothman Thu, 16 May 2019 00:08:44 -0700

The contingency matrix (
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cluster.contingency_matrix.html)
counts how many times each pair of (true cluster, predicted cluster)
occurs. It is a sufficient statistic for every "supervised" (i.e. ground
truth-based) clustering evaluation metric in scikit-learn. In an
incremental setting, you can simply add the counts from each new
predicted batch to the contingency matrix. In
https://github.com/scikit-learn/scikit-learn/issues/8103 I proposed that we
provide an API for calculating clustering metrics from the sufficient
statistics alone, but it has not come to fruition.
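
A minimal sketch of the incremental approach, not an existing scikit-learn
API. It assumes labels are integers in a fixed range known up front, so that
rows and columns mean the same thing in every batch. The helper names
(update_contingency, adjusted_rand_from_contingency) are hypothetical, and
the adjusted Rand index is computed here with the standard pair-counting
formula rather than by calling the library:

```python
import numpy as np


def update_contingency(contingency, labels_true, labels_pred):
    """Add one batch of (true, predicted) label pairs to the running counts.

    Assumes integer labels with 0 <= true < contingency.shape[0]
    and 0 <= pred < contingency.shape[1].
    """
    rows = np.asarray(labels_true, dtype=np.intp)
    cols = np.asarray(labels_pred, dtype=np.intp)
    # np.add.at accumulates correctly when the same (row, col) pair repeats.
    np.add.at(contingency, (rows, cols), 1)
    return contingency


def _pairs(x):
    """Number of unordered pairs, x choose 2, elementwise."""
    x = np.asarray(x, dtype=np.float64)
    return x * (x - 1.0) / 2.0


def adjusted_rand_from_contingency(contingency):
    """Adjusted Rand index computed from the contingency counts alone."""
    n = contingency.sum()
    if n < 2:
        return 1.0  # assumption: treat a degenerate input as perfect agreement
    sum_cells = _pairs(contingency).sum()
    sum_rows = _pairs(contingency.sum(axis=1)).sum()
    sum_cols = _pairs(contingency.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / _pairs(n)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))


# Usage: 2 true classes, 3 predicted clusters, labels arriving in two batches.
running = np.zeros((2, 3), dtype=np.int64)
update_contingency(running, [0, 0], [0, 0])
update_contingency(running, [1, 1], [1, 2])
score = adjusted_rand_from_contingency(running)
```

Only the small counts matrix is kept between batches, so memory does not
grow with the number of samples. If the label sets are not known in advance,
the matrix would need to be grown or re-indexed as new labels appear.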


On Thu, 16 May 2019 at 11:47, lampahome <pahome.c...@mirlab.org> wrote:

> Joel Nothman <joel.noth...@gmail.com> wrote on Wed, 15 May 2019 at 12:16 PM:
>
>> Evaluating on large datasets is easy if the sufficient statistics are
>> just the contingency matrix.
>>
>>
> Sorry, I don't understand. Can you explain in detail?
> Do you mean we could evaluate on a subset of samples if the subset is a
> contingency (normal distribution) matrix?
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

