OK, just reproduced it with code from trunk :|
On Wed, Jan 13, 2010 at 11:07 PM, Bogdan Vatkov wrote:
> I see this stack when the size of the mean vector is set to 2:
>
> Daemon Thread [Thread-9] (Suspended (breakpoint at line 48 in NormalModel))
> NormalModel.<init>(Vector, double) line: 48
> NormalModelDistribution.sampleFromPrior(int) line: 33
> DirichletState<O>.<init>(ModelDistribution<O>, int, double, int, int) line: 48
> DirichletDriver.createState(String, int, double) line: 172
> DirichletDriver.writeInitialState(String, String, String, int, double) line: 150
> DirichletDriver.runJob(String, String, String, int, int, double, int) line: 133
> DirichletDriver.main(String[]) line: 109
> Clusters.doClustering() line: 244
> Clusters.access$0(Clusters) line: 175
> Clusters$1.run() line: 148
> Thread.run() line: 619
>
>
> public class NormalModelDistribution implements ModelDistribution<Vector> {
>   @Override public Model<Vector>[] sampleFromPrior(int howMany) {
>     Model<Vector>[] result = new NormalModel[howMany];
>     for (int i = 0; i < howMany; i++) {
>       // the prior mean is always 2-dimensional, regardless of the input data's size
>       result[i] = new NormalModel(new DenseVector(2), 1);
>     }
>     return result;
>   }
> }
>
> and later this vector is dotted with x in pdf():
> @Override
> public double pdf(Vector x) {
>   double sd2 = stdDev * stdDev;
>   double exp = -(x.dot(x) - 2 * x.dot(mean) + mean.dot(mean)) / (2 * sd2);
>   double ex = Math.exp(exp);
>   return ex / (stdDev * sqrt2pi);
> }
>
> where the x vector comes from Hadoop's MapRunner through the map function:
>
> public void map(WritableComparable<?> key, Vector v,
>     OutputCollector<Text, Vector> output, Reporter reporter)
>     throws IOException {
>
>
> Any idea?
>
> BTW, I am running Mahout 0.2. Should I move to 0.3 or to trunk? Is it safe
> enough to run against trunk?
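In the sampleFromPrior code quoted above, every prior model gets a mean of new DenseVector(2), so if the input vectors have more than two elements, the dot products in pdf() would compare vectors of different lengths. A minimal sketch of one possible fix, assuming the dimension of the input vectors is known before sampling: the prior takes the dimension as a parameter instead of hardcoding 2. SizedNormalPrior and SimpleNormalModel are hypothetical names, not Mahout API, and plain double[] stands in for Mahout's Vector so the sketch compiles on its own.

```java
/** Hypothetical stand-in for a normal model: a mean vector and a standard deviation. */
final class SimpleNormalModel {
    final double[] mean;
    final double stdDev;

    SimpleNormalModel(double[] mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }
}

/** Hypothetical prior sampler whose dimension is a parameter rather than hardcoded to 2. */
final class SizedNormalPrior {
    private final int dimension;

    SizedNormalPrior(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        this.dimension = dimension;
    }

    /** Returns howMany models, each with an all-zero mean of the configured dimension and stdDev 1. */
    SimpleNormalModel[] sampleFromPrior(int howMany) {
        SimpleNormalModel[] result = new SimpleNormalModel[howMany];
        for (int i = 0; i < howMany; i++) {
            // new double[dimension] is all zeros, like new DenseVector(2), but sized to match the data
            result[i] = new SimpleNormalModel(new double[dimension], 1.0);
        }
        return result;
    }
}
```

The design choice is simply to move the dimension from a literal into the constructor, so it can be set from the data being clustered.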
>
> On Wed, Jan 13, 2010 at 10:13 PM, Ted Dunning wrote:
>
>> On Wed, Jan 13, 2010 at 11:53 AM, Bogdan Vatkov wrote:
>>
>> > Sorry, what does that mean :)?
>> >
>>
>> It means that there is probably a programming bug somewhere. At the very
>> least, the program is not robust with respect to strange invocations.
>>
>>
>> > What is a dotted vector? And why aren't they the same?
>> >
>>
>> The dot product is a vector operation: the sum of the products of the
>> corresponding elements of the two vectors being operated on. If the two
>> vectors don't have the same length, the operation is undefined, so it is an error.
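For example, (1, 2, 3) · (4, 5, 6) = 1·4 + 2·5 + 3·6 = 32, while (1, 2, 3) · (4, 5) leaves the third element without a partner. The sketch below applies that rule to the pdf() expression Bogdan quoted, on plain double[] arrays rather than Mahout's Vector. NormalPdf is a hypothetical helper name; the normalization constant is kept exactly as in the quoted code.

```java
/** Sketch of the quoted pdf() computation on plain arrays, with an explicit length check. */
final class NormalPdf {
    private static final double SQRT_2_PI = Math.sqrt(2.0 * Math.PI);

    /** Sum of products of corresponding elements; fails fast if the lengths differ. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "cardinality mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Mirrors the quoted code: exp(-(x.x - 2 x.mean + mean.mean) / (2 sd^2)) / (sd * sqrt(2 pi)). */
    static double pdf(double[] x, double[] mean, double stdDev) {
        double sd2 = stdDev * stdDev;
        // x.x - 2 x.mean + mean.mean equals the squared distance between x and mean;
        // each dot() call here requires x and mean to have the same length
        double exponent = -(dot(x, x) - 2 * dot(x, mean) + dot(mean, mean)) / (2 * sd2);
        return Math.exp(exponent) / (stdDev * SQRT_2_PI);
    }
}
```

With a 2-element mean and a longer x, the first dot(x, mean) call throws, which matches the symptom in this thread.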
>>
>> > What should I investigate?
>> >
>>
>> I am not familiar with the code, but if I had time to look, my strategy
>> would be to start in NormalModel and work back up the stack trace to find
>> out how the vectors came to be different lengths. No doubt the code in
>> NormalModel will not tell you anything, but you can see which vectors are
>> involved, and by walking up the stack you may be able to see where they
>> come from.
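One way to act on this, sketched under the assumption that the input vectors can be loaded as plain double[] arrays: scan them for the first one whose length differs from the model's dimension, so the mismatch is found before the clustering run rather than deep inside pdf(). DimensionCheck and firstMismatch are hypothetical names, not part of Mahout.

```java
/** Hypothetical debugging aid; not part of Mahout. */
final class DimensionCheck {
    /**
     * Returns the index of the first vector whose length differs from
     * expectedDimension, or -1 if every vector matches.
     */
    static int firstMismatch(double[][] vectors, int expectedDimension) {
        for (int i = 0; i < vectors.length; i++) {
            if (vectors[i].length != expectedDimension) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling it with expectedDimension set to 2 (the hardcoded prior size) would flag the first input vector that has any other length.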
>>
>>
>> > I am basically running my complete kmeans scenario (same input data, same
>> > number of clusters param, etc.) but just replacing the KmeansDriver.main
>> > step with a DirichletDriver.main call... of course the arguments are
>> > adjusted, since kmeans and dirichlet do not take the same arguments.
>> >
>>
>> I would think that this sounds very plausible.
>>
>>
>> > I am not sure what number I should give for the alpha argument,
>>
>>
>> Alpha should have a value in the range from 0.01 to 20. I would scan it in
>> 1-2-5 steps per order of magnitude to see what works well for your data
>> (i.e. 0.01, 0.02, 0.05, 0.1, 0.2, ... 20). A value of 1 is a fine place to
>> start. The effect of different values should be small over a pretty wide range.
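The 1-2-5 scan can be generated rather than typed by hand. A small sketch, assuming each value is then passed as the --alpha argument of a separate run; AlphaGrid is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the 1-2-5 alpha values from 0.01 up to and including 20. */
final class AlphaGrid {
    private static final double MAX_ALPHA = 20.0;

    static List<Double> values() {
        List<Double> out = new ArrayList<>();
        int[] mantissas = {1, 2, 5};
        for (int exp = -2; exp <= 1; exp++) {
            // Math.pow with small integer arguments is exact (1, 10, 100)
            double scale = Math.pow(10, Math.abs(exp));
            for (int m : mantissas) {
                // dividing by an exact power of ten gives the nearest double to 0.01, 0.02, ...
                double v = exp < 0 ? m / scale : m * scale;
                if (v <= MAX_ALPHA) {
                    out.add(v);
                }
            }
        }
        return out;
    }
}
```

This yields 11 values, from 0.01 through 20; 50 is dropped by the upper bound.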
>>
>>
>> > iterations and reductions... Here is my current argument set:
>> >
>> > args = new String[] {
>> >     "--input",
>> >     "/store/dev/inst/mahout-0.2/email-clustering/1-solr-vectors/solr_index.vec",
>> >     "--output", config.getClustersDir(),
>> >     "--modelClass",
>> >     "org.apache.mahout.clustering.dirichlet.models.NormalModelDistribution",
>> >     "--maxIter", "15",
>> >     "--alpha", "1.0",
>> >     "--k", config.getClustersCount(),
>> >     "--maxRed", "2"
>> > };
>> >
>> >
>> Not off-hand.
>>
>
>
>
> --
> Best regards,
> Bogdan
>
>