John suggested that I post to pspp-dev. I'm adding code to the k-means
procedure (quick-cluster.c) to show cluster membership.
QUICK CLUSTER works perfectly on a trivial two-dimensional problem, but it
fails miserably on some real data. For example, in one analysis
requesting 3 clusters on 98 cases, it put every case in cluster 3 and
none in clusters 1 and 2. I think part of the problem is that the
starting values appear to be a pattern of 1s and 0s, even though the
comments in the code say that random cases are selected as starting values.
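
For reference, the membership step of k-means assigns each case to the
nearest current center. The sketch below is standalone C, not PSPP's
code: the names sq_distance, nearest_center and assign_members are
hypothetical, and the data are assumed to sit in a flat array of N rows
by DIM columns. It shows how a starting center that lies far from every
case ends up with zero members, which is one way to get the symptom
described above.

    #include <assert.h>
    #include <stddef.h>

    /* Squared Euclidean distance between two points of dimension DIM. */
    static double
    sq_distance (const double *a, const double *b, size_t dim)
    {
      double sum = 0.0;
      for (size_t j = 0; j < dim; j++)
        {
          double d = a[j] - b[j];
          sum += d * d;
        }
      return sum;
    }

    /* Returns the 0-based index of the center in CENTERS (K rows of DIM
       values, row by row) closest to POINT.  Ties go to the lower index. */
    static size_t
    nearest_center (const double *point, const double *centers,
                    size_t k, size_t dim)
    {
      assert (k > 0);
      size_t best = 0;
      double best_d = sq_distance (point, centers, dim);
      for (size_t c = 1; c < k; c++)
        {
          double d = sq_distance (point, centers + c * dim, dim);
          if (d < best_d)
            {
              best_d = d;
              best = c;
            }
        }
      return best;
    }

    /* Assigns each of the N points in DATA to its nearest center, storing
       the cluster of point I in MEMBERSHIP[I] and the number of points per
       cluster in COUNTS (K entries). */
    static void
    assign_members (const double *data, size_t n, const double *centers,
                    size_t k, size_t dim, size_t *membership, size_t *counts)
    {
      for (size_t c = 0; c < k; c++)
        counts[c] = 0;
      for (size_t i = 0; i < n; i++)
        {
          membership[i] = nearest_center (data + i * dim, centers, k, dim);
          counts[membership[i]]++;
        }
    }

With cases (0,0), (0,1), (10,10), (10,11) and starting centers (0,0),
(10,10), (100,100), the first two cases go to cluster 1, the last two to
cluster 2, and cluster 3 stays empty. Choosing actual cases as starting
centers avoids placing a center where no data lie.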
My question is about accessing the data. I copied the pattern from other
code that uses a "casereader" to iterate over the rows of data. Below are
the relevant parts of the code I've added, which appears to display
cluster membership.
If I want to randomly select cases as starting values, is there a way to
retrieve random records directly?
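
Since a casereader is read sequentially, one way to pick random starting
cases without random access is reservoir sampling (Algorithm R). This is
a swapped-in technique, not something quick-cluster.c is known to do: it
chooses K distinct cases uniformly in a single pass. The function name
reservoir_sample is hypothetical, and the sketch records only case
numbers; in a casereader loop like the one below, the counter i would
play the role of the case number, and a real version would more likely
copy each chosen case's values into the reservoir.

    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Selects K distinct case numbers uniformly at random from N cases
       that are visited once, in order.  CHOSEN must have room for K
       entries.  Uses rand(), so call srand() first for a different sample
       per run.  Note: rand () % m is slightly biased; fine for a sketch. */
    static void
    reservoir_sample (size_t n, size_t k, size_t *chosen)
    {
      assert (k <= n);
      for (size_t i = 0; i < n; i++)      /* one sequential pass */
        {
          if (i < k)
            chosen[i] = i;                /* fill reservoir with first K cases */
          else
            {
              /* Keep case I with probability K/(I+1) by overwriting a
                 randomly chosen slot. */
              size_t j = (size_t) rand () % (i + 1);
              if (j < k)
                chosen[j] = i;
            }
        }
    }

Each slot always holds a different case number, so the K starting
values are guaranteed to come from K distinct cases.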
-Alan
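quick_cluster_show_membership (struct Kmeans *kmeans,
                               const struct casereader *reader,
                               const struct qc *qc)
{
  [...]
  struct ccase *c;
  /* Clone READER so that the caller's reader is not consumed. */
  struct casereader *cs = casereader_clone (reader);
  [...]
  /* Read cases in order until casereader_read returns NULL; case_unref
     releases each case at the end of its iteration. */
  for (i = 0; (c = casereader_read (cs)) != NULL; i++, case_unref (c))
    [...]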
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
science + technology = better workers
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org