If anyone is interested in implementing these, dask-ml would welcome
additional metrics that work well with Dask arrays:
https://github.com/dask/dask-ml/issues/213.
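
For a sense of what such a metric might look like, here is a rough sketch of a
purity score computed chunk-wise on Dask arrays. The function name, chunk
sizes, and toy labels are my own illustrative choices, not existing dask-ml
API:

import numpy as np
import dask.array as da


def dask_cluster_purity(labels_true, labels_pred):
    """Fraction of points whose true label matches the majority
    true label of their assigned cluster (1.0 = perfectly pure)."""
    n_classes = int(labels_true.max().compute()) + 1
    n_clusters = int(labels_pred.max().compute()) + 1

    # Each (cluster, class) pair maps to one bin of a flat histogram,
    # so the contingency table is accumulated chunk by chunk, out of core.
    pair_ids = labels_pred * n_classes + labels_true
    n_bins = n_clusters * n_classes
    counts, _ = da.histogram(pair_ids, bins=n_bins, range=(0, n_bins))
    contingency = counts.reshape((n_clusters, n_classes))

    # Purity = sum of each cluster's majority-class count / total points.
    return (contingency.max(axis=1).sum() / contingency.sum()).compute()


rng = np.random.RandomState(0)
labels_true = da.from_array(rng.randint(0, 3, 100_000), chunks=10_000)
labels_pred = da.from_array(rng.randint(0, 5, 100_000), chunks=10_000)
print(dask_cluster_purity(labels_true, labels_pred))  # roughly 1/3 for random labels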

On Tue, May 14, 2019 at 2:09 AM Uri Goren <ugo...@gmail.com> wrote:

> Sounds like you need to use Spark;
> this project looks promising:
> https://github.com/xiaocai00/SparkPinkMST
>
> On Tue, May 14, 2019 at 5:12 AM lampahome <pahome.c...@mirlab.org> wrote:
>
>>
>> On Fri, May 3, 2019 at 7:29 PM Uri Goren <ugo...@gmail.com> wrote:
>>
>>> I usually use clustering to save costs on labelling.
>>> I like to apply hierarchical clustering, and then label a small sample
>>> and fine-tune the clustering algorithm.
>>>
>>> That way, you can evaluate the effectiveness in terms of cluster purity
>>> (how many clusters contain mixed labels).
>>>
>>> See an example with sklearn here:
>>> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU
>>>
>>>
>> But if my dataset is too large to load into memory, will it work?
>>
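
For reference, the workflow Uri describes in the quoted message (hierarchical
clustering, then hand-labelling a small sample and checking cluster purity)
looks roughly like this in plain scikit-learn when the data does fit in
memory. The purity() helper and the toy blobs are just my own illustration,
not the code from the linked video:

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Toy data standing in for an unlabelled dataset.
X, y_true = make_blobs(n_samples=2_000, centers=4, random_state=0)

# Cluster everything without using any labels.
clusters = AgglomerativeClustering(n_clusters=4).fit_predict(X)

# Pretend we can only afford to hand-label a small random sample.
rng = np.random.RandomState(0)
sample = rng.choice(len(X), size=200, replace=False)


def purity(labels_true, labels_pred):
    """Fraction of points agreeing with the majority label of their cluster."""
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()
    return total / len(labels_true)


print("purity on labelled sample:", purity(y_true[sample], clusters[sample]))

The same purity idea is what the Dask sketch at the top of this message
computes chunk-wise when the labels no longer fit in memory.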
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
