On Wed, Jan 13, 2010 at 11:53 AM, Bogdan Vatkov <[email protected]> wrote:

> Sorry, what does that mean :)?
>

It means that there is probably a programming bug somewhere.  At the very
least, the program is not robust with respect to strange invocations.


> what is a dotted vector? and why aren't they the same?
>

A dot product is a vector operation: the sum of the products of
corresponding elements of the two vectors being operated on.  If the two
vectors don't have the same length, the operation is undefined, so it is
reported as an error.
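
As a minimal sketch of that definition (plain Java arrays, not Mahout's
own vector classes; the class and method names here are made up for
illustration):

```java
final class DotProductSketch {

    // Sum of products of corresponding elements.
    // Rejects vectors of different lengths, since the operation is undefined for them.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "vector lengths differ: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1*4 + 2*5 + 3*6
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
        try {
            dot(new double[] {1, 2, 3}, new double[] {4, 5});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```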

> what should I investigate?
>

I am not familiar with the code, but if I had time to look, my strategy
would be to start in the NormalModel and work back up the stack trace to
find out how the vectors came to be different lengths.  No doubt, the code
in NormalModel will not tell you anything, but you can see which vectors are
involved and by walking up the stack you may be able to see where they come
from.
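
One cheap check that may help narrow this down is to count how many
vectors there are of each length before they reach the model.  The
sketch below uses plain double[] arrays as a stand-in for whatever
vector type the job really reads; the class and method names are
hypothetical and it does not use any Mahout API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LengthCheckSketch {

    // Maps each vector length to the number of vectors that have it.
    // More than one key in the result means the lengths are inconsistent.
    static Map<Integer, Integer> lengthHistogram(List<double[]> vectors) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (double[] v : vectors) {
            counts.merge(v.length, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<double[]> vectors = Arrays.asList(
            new double[] {1, 2, 3},
            new double[] {4, 5, 6},
            new double[] {7, 8});
        // Prints each length with its count, smallest length first.
        System.out.println(lengthHistogram(vectors));
    }
}
```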


> I am basically running my complete kmeans scenario (same input data, same
> number of clusters param, etc.) but just replacing KmeansDriver.main step
> with a DirichletDriver.main call...of course the arguments are adjusted
> since kmeans and dirichlet do not have the same arguments.
>

That sounds very plausible to me.


> I am not sure what number I should give for the alpha argument,


Alpha should have a value in the range from 0.01 to 20.  I would scan in
1, 2, 5 steps per order of magnitude (i.e. 0.01, 0.02, 0.05, 0.1, 0.2 ...
20) to see what works well for your data.  A value of 1 is a fine place to
start.  The effect of different values should be small over a pretty wide
range.
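
A minimal sketch of generating that scan grid follows.  The class and
method names are invented for illustration, and how each value is then
fed to the clustering run is left out; the strings are simply in a form
that could be passed as the "--alpha" argument.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class AlphaGridSketch {

    // Builds 1-2-5 steps per decade from 0.01 up to and including 20.
    // BigDecimal keeps the values exact, so they print as 0.05 rather than
    // a nearby floating-point approximation.
    static List<String> alphaGrid() {
        int[] mantissas = {1, 2, 5};
        BigDecimal max = BigDecimal.valueOf(20);
        List<String> grid = new ArrayList<>();
        for (int exponent = -2; exponent <= 1; exponent++) {
            for (int m : mantissas) {
                BigDecimal value = BigDecimal.valueOf(m).scaleByPowerOfTen(exponent);
                if (value.compareTo(max) <= 0) {
                    grid.add(value.toPlainString());
                }
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        for (String alpha : alphaGrid()) {
            System.out.println(alpha);
        }
    }
}
```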


> iterations
> and reductions...here is my current argument set:
>
> args = new String[] {
> "--input",
>
> "/store/dev/inst/mahout-0.2/email-clustering/1-solr-vectors/solr_index.vec",
> "--output", config.getClustersDir(),
> "--modelClass",
> "org.apache.mahout.clustering.dirichlet.models.NormalModelDistribution",
> "--maxIter", "15",
> "--alpha", "1.0",
> "--k", config.getClustersCount(),
> "--maxRed", "2"
> };
>
>
Not off-hand.
