Dear all,
OK, I tried 0.2 and it seems to work. Because of some comments on the list
regarding k-means in 0.2, I decided to move to trunk, since 0.3 seems
close enough.
Nevertheless, I ran into an issue when creating a writer (I am still
working through the example in Ch. 7 of Mahout in Action). Here is the
stack trace I got:
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
when calling:
    writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class,
                                     DenseVector.class);
Is anyone else having the same issue?
Regards
Guillaume
On Tue, Feb 23, 2010 at 7:14 PM, Alleon Guillaume
<[email protected]> wrote:
> Hi all,
>
> I am a complete newbie in action ... even though I have gone through the
> book from the same series ;)
> I would like to classify a number of items, each of them characterized
> by a number of vectors. I thought it would be a good idea to classify
> the vectors first. Unfortunately, the number of items keeps growing, so what
> I have done so far is a small piece of code that constructs Mahout dense
> vectors on the fly, setting each vector's name to my item name. As far as I
> understand, those vectors are kept in memory ...
> What are the next steps for me?
> Storing those vectors on disk, I assume :)
> Then creating some canopies, and then using k-means to create my clusters.
> Can you guide me through the steps?
>
> I also have a few more questions:
> Can Mahout determine an "optimal" number of clusters?
> Once a set of clusters exists and new items are added, is it possible to
> update the existing clusters? Is it possible to add clusters at a lower cost
> than recreating them?
>
> Thanks for your help and time
> Regards
> Guillaume
>
--
PGP KeyID: 1024D/69B00854 subkeys.pgp.net
http://cheztog.blogspot.com