Re: [Perldl] Understanding PDL::Stats::Kmeans

Maggie X Mon, 30 Sep 2013 08:18:37 -0700

The first param to random_cluster() should be n, the number of images.

You can use which_cluster()
<http://pdl-stats.sourceforge.net/Kmeans.htm#which_cluster> on the output
of random_cluster() to get the cluster index for each image.
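
As a minimal sketch of that step (the sizes below are made up for
illustration, and this assumes PDL::Stats is installed), the mask from
random_cluster() has dims (obs x cluster), and which_cluster() collapses
it to one 0-based cluster index per observation:

```perl
use strict;
use warnings;
use PDL;
use PDL::Stats::Kmeans;

# Hypothetical sizes: 6 observations split into 2 clusters.
my ($num_obs, $num_clusters) = (6, 2);

# Mask of dims (obs x cluster); each observation is set in exactly
# one cluster because the clusters are mutually exclusive.
my $mask = random_cluster($num_obs, $num_clusters);

# One 0-based cluster index per observation.
my $index = $mask->which_cluster;

print "mask:  $mask\n";
print "index: $index\n";
```

The assignment is random, so the printed values differ from run to run.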


I'm curious why you need to use random_cluster() though? You can use
kmeans() <http://pdl-stats.sourceforge.net/Kmeans.htm#kmeans> directly to
cluster the images. random_cluster() provides only the initial random
assignment of images to clusters.
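
A rough sketch of calling kmeans() directly, assuming PDL::Stats is
installed. The data here is random and purely illustrative; the only
layout requirement shown is that dim 0 holds the observations and dim 1
the variables. NCLUS sets the number of clusters and V => 0 turns off the
status output:

```perl
use strict;
use warnings;
use PDL;
use PDL::Stats::Kmeans;

# Hypothetical data: 10 observations (dim 0) on 3 variables (dim 1).
my $data = random(10, 3);

# Ask for 2 clusters; kmeans() does its own random seeding internally.
my %k = $data->kmeans({ NCLUS => 2, V => 0 });

# $k{cluster} is a mask of dims (obs x cluster); turn it into one
# 0-based cluster index per observation.
my $index = $k{cluster}->which_cluster;

print "cluster index per observation: $index\n";
print "R2: $k{R2}\n";
```

The returned hash also carries the centroids and other summary values, so
there is no need to call random_cluster() yourself.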


Best,
Maggie


On Sat, Sep 28, 2013 at 10:09 AM, Chris Marshall <[email protected]> wrote:

> On Fri, Sep 27, 2013 at 7:08 AM, Hernán De Angelis
> <[email protected]> wrote:
> >
> > I have no problems in getting the module PDL::Stats
> > to work, following the examples in the web page
> > (http://pdl-stats.sourceforge.net/Kmeans.htm). The module
> > accepts tabular data as input, akin to a data frame in R,
> > so feeding the images to the algorithm is just a matter of
> > clumping the n 2D piddles (images) to a single dimension of
> > size m, and stacking them in a new 2D piddle of dimensions n x
> > m.
> >
> > The problems appear when I try to specify the number of desired
> > clusters. As suggested in the web page, if I use:
> >
> > $cluster = random_cluster( $stack->dim(0), $k );
>
> $ pdldoc random_cluster
>
> Module PDL::Stats::Kmeans
>   random_cluster
>       Signature: (byte [o]cluster(o,c); int obs=>o; int clu=>c)
>
>     Creates masks for random mutually exclusive clusters. Accepts two
>     parameters, num_obs and num_cluster. Extra parameter turns into extra
>     dim in mask. May loop a long time if num_cluster approaches num_obs
>     because empty cluster is not allowed.
>
>         my $cluster = random_cluster( $num_obs, $num_cluster );
>
> Is it m or n that is the observations?  I'm guessing it may
> be the other one.
>
> > where $k is the number of desired clusters, the algorithm
> > complains ("more cluster than obs!") if $k > n, although I do
> > not see any reason why this should be so because as far as I
> > understand there is in principle no limitation to the number of
> > clusters with regard to the number of data dimensions.
> >
> > Another thing that left me scratching my head is how to
> > associate every vector (image pixel) to a cluster number, so I
> > can fold them back into a classified image. The algorithm puts
> > the result in a hash, but I see no obvious way to relate this
> > to the observation vectors.
>
> It says that random_cluster returns a piddle of vectors x number
> of clusters where I presume the slice for each value of the
> cluster index is a mask of which pixels fall in that cluster.
> You can use which to select coordinates, make a cluster number
> image by doing an inner product along the cluster dimension
> of something like pdl [1..numclusters] which would give you
> an "image" where the "color" of each value is the cluster
> number (note the use of 1-based count).
>
> Hope this helps,
> Chris
>
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>

