Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

Uri Goren Tue, 14 May 2019 00:08:39 -0700

Sounds like you need to use Spark.
This project looks promising:
https://github.com/xiaocai00/SparkPinkMST
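
If Spark is more than you need, here is a minimal sketch of doing both
steps chunk by chunk in scikit-learn. It is an illustration under
assumptions, not the method from the video linked below: it swaps
hierarchical clustering for MiniBatchKMeans (which supports partial_fit),
uses synthetic make_blobs data in place of a real dataset, and marks
unlabelled rows with -1. The helper names (iter_chunks, fit_incrementally,
purity_incrementally) are hypothetical. Purity is accumulated in a
cluster-by-label count table, so only one chunk is in memory at a time.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs


def iter_chunks(X, y, chunk_size):
    """Hypothetical stand-in for reading a large dataset chunk by chunk."""
    for start in range(0, X.shape[0], chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


def fit_incrementally(chunks, n_clusters, random_state=0):
    """Update the cluster centers one chunk at a time with partial_fit."""
    model = MiniBatchKMeans(n_clusters=n_clusters, n_init=3,
                            random_state=random_state)
    for X_chunk, _ in chunks:
        model.partial_fit(X_chunk)
    return model


def purity_incrementally(model, chunks, n_labels):
    """Accumulate a cluster-by-label count table over the labelled rows."""
    counts = np.zeros((model.n_clusters, n_labels), dtype=np.int64)
    for X_chunk, y_chunk in chunks:
        mask = y_chunk >= 0  # assumption: -1 marks an unlabelled row
        if not mask.any():
            continue
        clusters = model.predict(X_chunk[mask])
        np.add.at(counts, (clusters, y_chunk[mask]), 1)
    # Purity: share of labelled points that carry their cluster's majority label.
    purity = counts.max(axis=1).sum() / counts.sum()
    # Number of clusters whose labelled points carry more than one label.
    mixed = int(((counts > 0).sum(axis=1) > 1).sum())
    return purity, mixed


# Synthetic data standing in for a dataset that does not fit in memory.
X, y_true = make_blobs(n_samples=3000, centers=[[0, 0], [10, 10], [-10, 10]],
                       cluster_std=0.5, random_state=0)

# Pretend only a small random sample (150 rows) has been labelled by hand.
rng = np.random.default_rng(0)
y = np.full(y_true.shape, -1)
labelled_idx = rng.choice(X.shape[0], size=150, replace=False)
y[labelled_idx] = y_true[labelled_idx]

model = fit_incrementally(iter_chunks(X, y, 500), n_clusters=3)
purity, mixed = purity_incrementally(model, iter_chunks(X, y, 500), n_labels=3)
print(f"purity={purity:.3f}, clusters with mixed labels={mixed}")
```

With real data, iter_chunks would read from disk instead, for example via
numpy.memmap or pandas.read_csv(..., chunksize=...). The first chunk must
contain at least n_clusters rows so that partial_fit can initialise the
centers.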


On Tue, May 14, 2019 at 5:12 AM lampahome <[email protected]> wrote:

>
> Uri Goren <[email protected]> wrote on Fri, May 3, 2019 at 7:29 PM:
>
>> I usually use clustering to save costs on labelling.
>> I like to apply hierarchical clustering, and then label a small sample
>> and fine-tune the clustering algorithm.
>>
>> That way, you can evaluate the effectiveness in terms of cluster purity
>> (how many clusters contain mixed labels)
>>
>> See an example with sklearn here:
>> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU
>>
>>
> But if my dataset is too large to load into memory, will it work?
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>

