On Fri, May 29, 2015 at 09:33:30AM -0500, Alan Mead wrote:
> John suggested that I post to pspp-dev. I'm adding code to the k-means
> (i.e., quick-cluster.c) procedure to show cluster membership.
>
> CLUSTER works perfectly on a trivial two-dimensional problem but it
> fails miserably on some real data. For example, in one analysis
> requesting 3 clusters on 98 cases, it found that everyone was in cluster
> 3 and zero people were in clusters 1 & 2. I think part of it is that
> the starting values seem to be a pattern of 1's and zero's, even though
> the comments describe selecting random individuals as starting values.
>
> My question is about accessing the data. I copied other code to use a
> "casereader" to iterate over the rows of data. Below are the relevant
> parts of the code I've added that seems to display cluster membership.
>
> If I want to randomly select cases as starting values, is there a way to
> retrieve random records directly?

Ben is the casereader expert! Maybe he can comment?

But I think you might be able to use the function casereader_select
(defined in casereader-select.c):

    casereader_select (subreader, random_number - 1, random_number + 1, 1);

You would have to ensure that random_number was within the range of
subreader.

Alternatively, we might be able to come up with a function similar to
casereader_select, which advances the subreader by a (pseudo) random
number on each read.

Disclaimer: These are first ideas, which I haven't thought through in
any degree.

J'

--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
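
For illustration only, here is a minimal sketch of picking k random starting cases in a single forward pass, which is the access pattern a casereader gives you. It uses reservoir sampling (Algorithm R) over row indices. The name reservoir_sample is hypothetical and not part of PSPP, and rand () is only a stand-in for whatever random number generator the procedure really uses (its modulo step is also slightly biased). In real code the loop index would presumably track the cases returned one by one from the reader, and you would keep the case data rather than the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of PSPP.  Chooses K distinct row
   indices out of N rows that are visited once, in order, and stores
   them in CHOSEN (which must have room for K entries). */
static void
reservoir_sample (size_t n, size_t k, size_t *chosen)
{
  assert (k <= n);

  for (size_t i = 0; i < n; i++)
    {
      if (i < k)
        {
          /* The first K rows fill the reservoir. */
          chosen[i] = i;
        }
      else
        {
          /* Row I replaces a random slot with probability K / (I + 1). */
          size_t j = (size_t) rand () % (i + 1);
          if (j < k)
            chosen[j] = i;
        }
    }
}
```

With n = 98 and k = 3 this yields three distinct row indices in the range 0..97 that could serve as initial cluster centres. The appeal of this approach is that it never needs to know the number of cases in advance or to seek backwards in the data.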
