Re: [Perldl] Understanding PDL::Stats::Kmeans

Chris Marshall Sat, 28 Sep 2013 07:14:24 -0700

On Fri, Sep 27, 2013 at 7:08 AM, Hernán De Angelis
<[email protected]> wrote:
>
> I have no problems in getting the module PDL::Stats
> to work, following the examples in the web page
> (http://pdl-stats.sourceforge.net/Kmeans.htm). The module
> accepts tabular data as input, akin to a data frame in R,
> so feeding the images to the algorithm is just a matter of
> clumping the n 2D piddles (images) to a single dimension of
> size m, and stacking them in a new 2D piddle of dimensions n x
> m.
>
> The problems appear when I try to specify the number of desired
> clusters. As suggested in the web page, if I use:
>
> $cluster = random_cluster( $stack->dim(0), $k );


$ pdldoc random_cluster

Module PDL::Stats::Kmeans
  random_cluster
      Signature: (byte [o]cluster(o,c); int obs=>o; int clu=>c)

    Creates masks for random mutually exclusive clusters. Accepts two
    parameters, num_obs and num_cluster. Extra parameter turns into extra
    dim in mask. May loop a long time if num_cluster approaches num_obs
    because empty cluster is not allowed.

        my $cluster = random_cluster( $num_obs, $num_cluster );

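Is it m or n that counts the observations?  random_cluster takes
num_obs as its first argument, and you are passing $stack->dim(0),
which by your description is n, the number of images.  I'm guessing
the observations are along the other dimension (the m pixels).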
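
To check which dimension is which, here is a minimal sketch using only
core PDL. It assumes that PDL::Stats wants observations along dim 0,
which the (o,c) signature above suggests but which I have not
verified. The three 2x2 images and all variable names are made up for
illustration.

```perl
use strict;
use warnings;
use PDL;

# Three hypothetical 2x2 "images" (n = 3 images, m = 4 pixels each).
my @images = map { sequence(2, 2) + 10 * $_ } 0 .. 2;

# Flatten each image to 1D and stack them; cat() appends a new trailing
# dimension, so the result has dims (m, n) = (pixels, images).
my $stack = cat( map { $_->clump(-1) } @images );

my ($num_obs, $num_var) = $stack->dims;
print "observations (pixels): $num_obs, variables (images): $num_var\n";
```

With this layout $stack->dim(0) is the pixel count, so under the
assumption above that is the value to pass as num_obs.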

> where $k is the number of desired clusters, the algorithm
> complains ("more cluster than obs!") if $k > n, although I do
> not see any reason why this should be so because as far as I
> understand there is in principle no limitation to the number of
> clusters with regard to the number of data dimensions.
>
> Another thing that left me scratching my head is how to
> associate every vector (image pixel) to a cluster number, so I
> can fold them back into a classified image. The algorithm puts
> the result in a hash, but I see no obvious way to relate this
> to the observation vectors.

The signature says that random_cluster returns a piddle of
dimensions observations x clusters, where I presume the slice for
each value of the cluster index is a mask of which pixels fall in
that cluster.  You can use which on one of those slices to select
the coordinates of the pixels in that cluster.  To make a
cluster-number image, take an inner product along the cluster
dimension with something like pdl [1..numclusters].  That gives
you an "image" where the "color" of each value is the cluster
number (note the use of a 1-based count).
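
A minimal sketch of that inner-product idea, using only core PDL. The
mask is built by hand as a stand-in for a real cluster result, and I am
assuming it has dims (observations, clusters) as in the signature
above. The 2x2 image shape and the variable names are hypothetical.

```perl
use strict;
use warnings;
use PDL;

# Hand-built mask, dims (obs, cluster) = (4, 2).
# Row 0 marks the pixels in cluster 1, row 1 those in cluster 2.
my $mask = byte( [ [1, 0, 0, 1],
                   [0, 1, 1, 0] ] );

my $k      = $mask->dim(1);
my $labels = pdl( 1 .. $k );    # 1-based cluster numbers

# inner() sums over dim 0, so move the cluster dimension there first.
my $cluster_id = inner( $mask->xchg(0, 1), $labels );    # dims (obs)

# Fold the per-pixel labels back into the (hypothetical) 2x2 image shape.
my $classified = $cluster_id->copy->reshape(2, 2);

# Indices of the pixels assigned to the first cluster.
my $in_first = which( $mask->slice(':,(0)') );

print "labels: $cluster_id\n";
print "classified image: $classified\n";
print "pixels in cluster 1: $in_first\n";
```

Because the clusters are mutually exclusive, each pixel has exactly one
nonzero mask entry, so the inner product picks out a single cluster
number per pixel.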

Hope this helps,
Chris
