Re: Are these bugs in cluster?

Ben Pfaff Sat, 30 May 2015 16:25:18 -0700

On Sat, May 30, 2015 at 09:13:16AM +0200, John Darrington wrote:
> On Fri, May 29, 2015 at 09:33:30AM -0500, Alan Mead wrote:
>      John suggested that I post to pspp-dev.  I'm adding code to the k-means
>      (i.e., quick-cluster.c) procedure to show cluster membership.
>      
>      CLUSTER works perfectly on a trivial two-dimensional problem but it
>      fails miserably on some real data. For example, in one analysis
>      requesting 3 clusters on 98 cases, it found that everyone was in cluster
>      3 and zero people were in clusters 1 & 2.  I think part of it is that
>      the starting values seem to be a pattern of ones and zeros, even though
>      the comments describe selecting random individuals as starting values.
>      
>      My question is about accessing the data.  I copied other code to use a
>      "casereader" to iterate over the rows of data. Below are the relevant
>      parts of the code I've added that seems to display cluster membership.
>      If I want to randomly select cases as starting values, is there a way to
>      retrieve random records directly?
>      
> 
> Ben is the casereader expert!  Maybe he can comment?  But I think you might 
> be able to use the function casereader_select (defined in casereader-select.c)
> 
> casereader_select (subreader, random_number - 1, random_number + 1, 1);
> 
> You would have to ensure that random_number was within the range of subreader.


That seems reasonable to me.

If clustering actually wants a shuffled version of the complete data set
(I don't know whether it does), then more efficient algorithms are
probably available.
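
For the narrower case where only k random starting cases are needed
rather than a full shuffle, one option is reservoir sampling
(Algorithm R), which picks k distinct cases in a single sequential
pass, the same access pattern a casereader gives you.  The sketch
below is standalone C and does not use the PSPP API: random_below()
and choose_random_cases() are hypothetical names, rand() stands in
for whatever random number source the procedure really uses, and the
"cases" are just indexes.  In a real procedure the loop would be the
casereader loop and the reservoir would hold cases instead of indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of PSPP.  Returns a uniformly
   distributed integer in [0, n).  Values of rand() that would cause
   modulo bias are rejected.  Requires 0 < n <= RAND_MAX + 1. */
size_t
random_below (size_t n)
{
  unsigned long range = (unsigned long) RAND_MAX + 1;
  unsigned long limit, r;

  assert (n > 0 && n <= range);
  limit = range - range % n;
  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (size_t) (r % n);
}

/* Hypothetical helper, not part of PSPP.  Visits case indexes
   0...n_cases-1 once, in order, and leaves K distinct indexes,
   chosen uniformly at random, in CHOSEN.  Returns the number of
   entries stored, which is the smaller of K and N_CASES. */
size_t
choose_random_cases (size_t n_cases, size_t k, size_t *chosen)
{
  size_t filled = 0;
  size_t i;

  assert (chosen != NULL || k == 0);
  for (i = 0; i < n_cases; i++)
    {
      if (filled < k)
        {
          /* The first K cases fill the reservoir unconditionally. */
          chosen[filled++] = i;
        }
      else
        {
          /* Case I replaces a random slot with probability K / (I + 1). */
          size_t j = random_below (i + 1);
          if (j < k)
            chosen[j] = i;
        }
    }
  return filled;
}
```

With the numbers from the original report, choose_random_cases (98, 3,
chosen) would yield three distinct indexes below 98 to use as initial
cluster centers.  Nothing is ever selected out of range, and if there
are fewer cases than clusters the return value says so.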

_______________________________________________
pspp-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/pspp-dev
