I have a huge time-series dataset that has to be loaded batch by batch. My procedure is as follows:

1. Scale the features to the range 0 to 1.
2. Shuffle the data (because I use Birch, not MiniBatchKMeans).
3. Train a Birch model with partial_fit.
4. Evaluate with silhouette_score (larger is better).

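A minimal sketch of the steps above, for reference. The batch loader, the synthetic blobs, the threshold value, and the choice to evaluate on a single batch are all assumptions made for illustration, not part of my real pipeline:

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

rng = np.random.default_rng(0)


def load_batches(n_batches=5, batch_size=200):
    """Hypothetical stand-in for the real batch loader: three synthetic blobs."""
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    for _ in range(n_batches):
        idx = rng.integers(0, len(centers), size=batch_size)
        yield centers[idx] + rng.normal(scale=0.3, size=(batch_size, 2))


# Kept in memory only because this demo is tiny; real data would be re-read.
batches = list(load_batches())

# Pass 1: learn the global min/max so every batch is scaled the same way.
scaler = MinMaxScaler()
for batch in batches:
    scaler.partial_fit(batch)

# Pass 2: scale, shuffle, and feed each batch to Birch incrementally.
# n_clusters=None keeps the raw subclusters, so no cluster count is given.
# threshold=0.1 is an arbitrary illustrative value.
model = Birch(threshold=0.1, n_clusters=None)
for i, batch in enumerate(batches):
    X = shuffle(scaler.transform(batch), random_state=i)
    model.partial_fit(X)

# Evaluate on one scaled batch (an assumption: the full data is too large).
X_eval = scaler.transform(batches[-1])
labels = model.predict(X_eval)
n_clusters = len(np.unique(labels))
score = silhouette_score(X_eval, labels)
print(f"clusters={n_clusters} silhouette={score:.3f}")
```

On the real data, silhouette_score also accepts a sample_size argument, which may help when a full pairwise-distance computation is too expensive.
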
I use Birch because it has partial_fit and does not require specifying the number of clusters. However, I found that when I evaluate with silhouette_score and the Davies-Bouldin (DB) score, the best-scoring results have fewer clusters than I expect. When I look at the data, it should have more clusters than the clustering results show. Should I change the evaluation method, or do something else? Thanks.
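
To make the comparison concrete, here is a rough sketch of how the two scores can be tabulated against the number of clusters Birch returns. It assumes "DB score" means davies_bouldin_score, uses synthetic data with two deliberately close blobs, and tries a few arbitrary threshold values; none of these come from my real dataset:

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(0)

# Hypothetical data in the unit square: four tight blobs, two of them close.
centers = np.array([[0.1, 0.1], [0.25, 0.1], [0.9, 0.9], [0.1, 0.9]])
X = np.vstack([c + rng.normal(scale=0.02, size=(150, 2)) for c in centers])
rng.shuffle(X)  # shuffle rows in place

results = []
for threshold in (0.03, 0.1, 0.3):  # arbitrary illustrative values
    model = Birch(threshold=threshold, n_clusters=None)
    for batch in np.array_split(X, 4):  # mimic batch-by-batch training
        model.partial_fit(batch)
    labels = model.predict(X)
    k = len(np.unique(labels))
    if k < 2:
        continue  # both scores need at least two clusters
    sil = silhouette_score(X, labels)      # larger is better
    db = davies_bouldin_score(X, labels)   # smaller is better
    results.append((threshold, k, sil, db))

for threshold, k, sil, db in results:
    print(f"threshold={threshold:<5} clusters={k:<3} "
          f"silhouette={sil:.3f} davies_bouldin={db:.3f}")
```

Printing the cluster count next to both scores makes it easier to see whether the scores are simply rewarding coarser partitions.
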