[scikit-learn] Is there any general way to make clustering huge time-series dataset better?

lampahome Thu, 20 Jun 2019 07:35:30 -0700

I have a huge time-series dataset that I have to load batch by batch.

My procedure is as follows:
Scale to [0, 1]
Shuffle (because I use Birch, not MiniBatchKMeans)
Train a Birch model with partial_fit
Evaluate with silhouette_score (larger is better)
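A minimal sketch of the procedure above, under several assumptions: `load_batches` is a hypothetical stand-in that yields synthetic data instead of the real series, `threshold=0.1` is an arbitrary value, and rows are shuffled only within each batch. Scaling batch by batch takes two passes: `MinMaxScaler.partial_fit` first learns the global per-feature min/max, so that every batch is mapped to [0, 1] with the same transform rather than its own range.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler


def load_batches(n_batches=5, batch_size=1000, n_features=8, seed=0):
    """Hypothetical loader: yields synthetic batches in place of the real data."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-10, 10, size=(6, n_features))
    for _ in range(n_batches):
        idx = rng.integers(len(centers), size=batch_size)
        yield centers[idx] + rng.normal(scale=0.5, size=(batch_size, n_features))


# Pass 1: learn the global per-feature min/max so every batch is scaled
# to [0, 1] with the same mapping.
scaler = MinMaxScaler()
for batch in load_batches():
    scaler.partial_fit(batch)

# Pass 2: scale, shuffle the rows of each batch, and grow the CF tree.
rng = np.random.default_rng(1)
birch = Birch(threshold=0.1, n_clusters=None)  # threshold is an assumed value
for batch in load_batches():
    scaled = scaler.transform(batch)
    rng.shuffle(scaled)  # in-place shuffle along axis 0 (rows)
    birch.partial_fit(scaled)

# Evaluate on one scaled batch; sample_size keeps silhouette_score affordable.
X_eval = scaler.transform(next(load_batches()))
labels = birch.predict(X_eval)
score = silhouette_score(X_eval, labels, sample_size=500, random_state=0)
n_subclusters = len(birch.subcluster_centers_)
print(f"{n_subclusters} clusters, silhouette={score:.3f}")
```

Shuffling across batches, not just within them, would need a different loading scheme; the sketch does not attempt that.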


I use Birch because it has partial_fit and doesn't require specifying the
number of clusters.
But I found that when I evaluate with silhouette_score and the Davies-Bouldin
(DB) score, the best-scoring results have fewer clusters.
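Note that `n_clusters=None` doesn't remove the choice of cluster count; it moves it into `threshold` (default 0.5). With `n_clusters=None`, every CF subcluster is returned as a final cluster, so the threshold effectively sets how many clusters come out, and whether 0.5 suits the data depends on its scale and dimensionality. A small sketch on toy `make_blobs` data (an assumption standing in for the real series; the threshold values are arbitrary) shows the count changing:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler

# Toy data standing in for one scaled batch (an assumption, not the real series).
X, _ = make_blobs(n_samples=3000, centers=8, n_features=4, random_state=0)
X = MinMaxScaler().fit_transform(X)

counts = {}
for threshold in (0.5, 0.2, 0.1, 0.05):
    # With n_clusters=None every CF subcluster is returned as a final cluster,
    # so the subcluster count is the cluster count.
    model = Birch(threshold=threshold, n_clusters=None).fit(X)
    counts[threshold] = len(model.subcluster_centers_)

print(counts)  # typically, smaller thresholds give more (finer) clusters
```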

When I look at the data, it seems there should be more clusters than the
results show.
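One way to compare the scores with what you see in the data, sketched on the same kind of toy data: build the CF tree once with a fine threshold, then set an integer `n_clusters` and call `partial_fit()` with no data, which re-runs only Birch's global clustering step (agglomerative clustering of the subcluster centroids) without rebuilding the tree. Both scores reward compact, well-separated groups, so the sweep shows which k they prefer; it does not say which k is right for a given series. `threshold=0.05`, `sample_size=1000` and the range of k are arbitrary choices.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Toy data (an assumption), scaled to [0, 1] like the real batches.
X, _ = make_blobs(n_samples=3000, centers=8, n_features=4, random_state=0)
X = MinMaxScaler().fit_transform(X)

# Build the CF tree once with a fine threshold.
birch = Birch(threshold=0.05, n_clusters=None).fit(X)

results = {}
for k in range(2, 13):
    # partial_fit() with no data re-runs only the global clustering step
    # on the existing subcluster centroids, now targeting k clusters.
    birch.set_params(n_clusters=k)
    birch.partial_fit()
    labels = birch.predict(X)
    results[k] = (
        silhouette_score(X, labels, sample_size=1000, random_state=0),  # higher is better
        davies_bouldin_score(X, labels),  # lower is better
    )

for k, (sil, db) in results.items():
    print(f"k={k:2d}  silhouette={sil:.3f}  davies_bouldin={db:.3f}")
```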

Should I change the evaluation method, or something else?

thx

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
