John suggested that I post to pspp-dev.  I'm adding code to the k-means
(i.e., quick-cluster.c) procedure to show cluster membership.

CLUSTER works perfectly on a trivial two-dimensional problem, but it
fails miserably on some real data. For example, in one analysis
requesting 3 clusters on 98 cases, it put every case in cluster 3 and
none in clusters 1 and 2.  I think part of the problem is that the
starting values seem to be a pattern of 1s and 0s, even though the
comments describe selecting random individuals as starting values.
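
To illustrate why 0/1 starting values could produce that result, here is
a minimal sketch of nearest-center assignment. It is not the code in
quick-cluster.c; the function names and the row-major layout of the
centers are assumptions made for the example. With hypothetical centers
(1,0), (0,1) and (1,1), any case whose values are all well above 1 is
closest to the third center, so every case would land in cluster 3.

```c
#include <stddef.h>

/* Squared Euclidean distance between two points with N_VARS coordinates. */
static double
squared_distance (const double *a, const double *b, size_t n_vars)
{
  double sum = 0.0;
  for (size_t v = 0; v < n_vars; v++)
    {
      double d = a[v] - b[v];
      sum += d * d;
    }
  return sum;
}

/* Returns the zero-based index of the center nearest to POINT.
   CENTERS is a row-major array of N_CENTERS rows by N_VARS columns
   (a layout assumed for this sketch).  Ties go to the lowest index. */
static size_t
nearest_center (const double *point, const double *centers,
                size_t n_centers, size_t n_vars)
{
  size_t best = 0;
  double best_dist = squared_distance (point, centers, n_vars);
  for (size_t k = 1; k < n_centers; k++)
    {
      double d = squared_distance (point, centers + k * n_vars, n_vars);
      if (d < best_dist)
        {
          best = k;
          best_dist = d;
        }
    }
  return best;
}
```

If the real starting centers were drawn from actual cases instead, each
center would at least begin with one case nearest to it.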

My question is about accessing the data.  I copied existing code that
uses a "casereader" to iterate over the rows of data. Below are the
relevant parts of the code I've added, which seems to display cluster
membership. If I want to randomly select cases as starting values, is
there a way to retrieve random records directly?

-Alan

quick_cluster_show_membership (struct Kmeans *kmeans,
                               const struct casereader *reader,
                               const struct qc *qc)
{
[...]
  struct ccase *c;
  struct casereader *cs = casereader_clone (reader);
[...]
  for (i = 0; (c = casereader_read (cs)) != NULL; i++, case_unref (c))
[...]
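
If random access turns out not to be available and the cases can only be
read front to back as in the loop above, reservoir sampling is one way
to pick random starting cases in a single pass. The sketch below is
plain C, not PSPP code: it works on case indexes only, and the names
random_below and sample_case_indexes are hypothetical. It assumes the
number of cases does not exceed RAND_MAX and that the caller has seeded
rand() if different draws are wanted on each run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns a uniform random integer in [0, n).  Assumes 0 < n <= RAND_MAX.
   Values at or above the largest multiple of n are rejected so that the
   modulo does not bias the result. */
static size_t
random_below (size_t n)
{
  unsigned long limit = ((unsigned long) RAND_MAX + 1) / n * n;
  unsigned long r;
  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (size_t) (r % n);
}

/* Visits case indexes 0 .. N_CASES - 1 in order and keeps K of them,
   each subset of size K being equally likely (reservoir sampling).
   Stores the kept indexes in CHOSEN, which must have room for K
   entries, and returns how many were stored: K, or N_CASES if there
   are fewer than K cases. */
static size_t
sample_case_indexes (size_t n_cases, size_t k, size_t *chosen)
{
  size_t n_chosen = 0;
  for (size_t i = 0; i < n_cases; i++)
    {
      if (n_chosen < k)
        chosen[n_chosen++] = i;         /* Fill the reservoir first. */
      else
        {
          /* Case i replaces a random slot with probability k / (i + 1). */
          size_t j = random_below (i + 1);
          if (j < k)
            chosen[j] = i;
        }
    }
  return n_chosen;
}
```

In the procedure itself, the index loop would presumably be the
casereader loop, and each reservoir slot would hold the values of the
kept case rather than its index.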


-- 

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

+815.588.3846 (Office)
+267.334.4143 (Mobile)

http://www.alanmead.org

Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat

