I usually use clustering to cut labelling costs. I like to apply hierarchical clustering, label a small sample, and then fine-tune the clustering algorithm.
That way, you can evaluate the effectiveness in terms of cluster purity (roughly, how many clusters contain samples with mixed labels). See an example with sklearn here: https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU

On Fri, May 3, 2019, 11:03 AM lampahome <pahome.c...@mirlab.org> wrote:
> I see some algorithms can cluster incrementally if the dataset is too
> large, e.g. MiniBatchKMeans and Birch.
>
> But is there any way to evaluate incrementally?
>
> I found the silhouette coefficient and the Calinski-Harabasz index, because
> I don't know the ground-truth labels.
> But they can't evaluate incrementally.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
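A minimal sketch of the label-a-sample-and-check-purity idea from the reply, assuming scikit-learn's `AgglomerativeClustering` as the hierarchical clusterer. The data is synthetic (`make_blobs`), and the generated `y` stands in for labels a human would assign to the small sample; the helper names `cluster_purity` and `mixed_clusters`, the sample size of 50, and the choice of comparing linkage strategies are all illustrative assumptions, not the author's exact procedure. Purity here uses the common definition (fraction of sampled points that share their cluster's majority label), and `mixed_clusters` counts clusters that contain more than one label.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import contingency_matrix


def cluster_purity(labels_true, labels_pred):
    """Fraction of samples whose label equals the majority label of their cluster."""
    cm = contingency_matrix(labels_true, labels_pred)  # rows: labels, cols: clusters
    return cm.max(axis=0).sum() / cm.sum()


def mixed_clusters(labels_true, labels_pred):
    """Number of clusters (seen in the sample) that contain more than one label."""
    cm = contingency_matrix(labels_true, labels_pred)
    return int((np.count_nonzero(cm, axis=0) > 1).sum())


# Synthetic stand-in for an unlabelled dataset (assumption for illustration).
X, y = make_blobs(n_samples=500, centers=4, random_state=0)

# Pick a small random sample to "label"; y[sample] plays the role of hand labels.
rng = np.random.default_rng(0)
sample = rng.choice(len(X), size=50, replace=False)

# "Fine-tune" by comparing linkage strategies at a fixed number of clusters.
results = {}
for linkage in ("ward", "complete", "average", "single"):
    clusters = AgglomerativeClustering(n_clusters=4, linkage=linkage).fit_predict(X)
    results[linkage] = (
        cluster_purity(y[sample], clusters[sample]),
        mixed_clusters(y[sample], clusters[sample]),
    )

for linkage, (purity, mixed) in results.items():
    print(f"{linkage:>8}: purity={purity:.2f}, mixed clusters={mixed}")
```

`n_clusters` is held fixed on purpose: purity tends to rise as the number of clusters grows (one point per cluster is trivially pure), so comparing settings with different cluster counts on purity alone would favour over-splitting.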