I have a huge time-series dataset, so I have to load it batch by batch.

My procedure is as follows:
Scale the features to the range [0, 1]
Shuffle (because I use Birch, not MiniBatchKMeans)
Train a Birch model with partial_fit
Evaluate with silhouette_score (larger is better)
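
The steps above can be sketched roughly as follows. This is only a minimal illustration, not my actual code: iter_batches is a hypothetical loader that yields synthetic data, and threshold=0.1 is an arbitrary value chosen for the toy example. MinMaxScaler.partial_fit is used so that every batch is scaled with the same global min/max.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

rng = np.random.default_rng(0)


def iter_batches(n_batches=5, batch_size=200):
    """Hypothetical loader: yields synthetic batches in place of the real data."""
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    for _ in range(n_batches):
        idx = rng.integers(0, len(centers), size=batch_size)
        yield centers[idx] + rng.normal(scale=0.3, size=(batch_size, 2))


# The toy batches are small, so they are kept in memory here;
# with real data each pass would re-read the batches from disk.
batches = list(iter_batches())

# Pass 1: learn the global min/max incrementally.
scaler = MinMaxScaler(feature_range=(0, 1))
for X in batches:
    scaler.partial_fit(X)

# Pass 2: scale, shuffle, and feed each batch to Birch.
# n_clusters=None keeps the subclusters as the final clusters.
model = Birch(threshold=0.1, n_clusters=None)
for i, X in enumerate(batches):
    X_scaled = shuffle(scaler.transform(X), random_state=i)
    model.partial_fit(X_scaled)

# Evaluate on one batch only; silhouette_score is quadratic in the sample count.
X_eval = scaler.transform(batches[-1])
labels = model.predict(X_eval)
n_found = len(np.unique(labels))
score = silhouette_score(X_eval, labels) if n_found > 1 else float("nan")
print(f"clusters found: {n_found}, silhouette: {score:.3f}")
```

The number of clusters found depends strongly on threshold, since with n_clusters=None every subcluster is reported as its own cluster.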

I use Birch because it has partial_fit and does not require specifying the
number of clusters.
However, I found that when I evaluate with silhouette_score and the DB score,
the best-scoring clustering ends up with fewer clusters.
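
To make the comparison concrete, here is a rough sketch of how the cluster count and both scores can be tabulated side by side. It assumes "DB score" means sklearn.metrics.davies_bouldin_score (lower is better, unlike the silhouette score), uses make_blobs as stand-in data, and scans a few arbitrary Birch threshold values; none of these choices come from my real setup.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data with six blobs, scaled to [0, 1].
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=0.5, random_state=0)
X = MinMaxScaler().fit_transform(X)

results = []
for threshold in (0.05, 0.1, 0.2, 0.4):  # arbitrary values for illustration
    model = Birch(threshold=threshold, n_clusters=None).fit(X)
    labels = model.labels_
    k = len(np.unique(labels))
    # Both scores need at least 2 clusters and fewer clusters than samples.
    if k < 2 or k >= len(X):
        continue
    sil = silhouette_score(X, labels)      # higher is better
    db = davies_bouldin_score(X, labels)   # lower is better
    results.append((threshold, k, sil, db))

for threshold, k, sil, db in results:
    print(f"threshold={threshold:<5} clusters={k:<4} silhouette={sil:.3f} DB={db:.3f}")
```

Printing the cluster count next to the scores shows which threshold each metric favours and how many clusters that setting produces.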

When I look at the data, there should be more clusters than the clustering
results give.

Should I change the evaluation method, or try something else?

thx
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
