Take a look at o.a.m.clustering.ClusterDumper in mahout-utils. The points file is a SequenceFile<Text,Text> where the key is the vector id and the value is the id of the cluster that vector was assigned to.
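
To go from a cluster back to its input vectors you basically invert that mapping. Here is a minimal sketch of the idea in plain Java, assuming the (vector id, cluster id) pairs have already been read out of the points file (for instance with Hadoop's SequenceFile.Reader) into a map of strings. The class name PointsByCluster, the method groupByCluster and the sample ids are made up for illustration; they are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PointsByCluster {

    // Hypothetical helper (not a Mahout API): inverts vectorId -> clusterId
    // into clusterId -> list of vectorIds, preserving input order.
    public static Map<String, List<String>> groupByCluster(Map<String, String> points) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (Map.Entry<String, String> point : points.entrySet()) {
            String vectorId = point.getKey();
            String clusterId = point.getValue();
            clusters.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(vectorId);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the points file; the ids are invented.
        Map<String, String> points = new LinkedHashMap<>();
        points.put("doc1", "C0");
        points.put("doc2", "C1");
        points.put("doc3", "C0");

        // Print each cluster id followed by the ids of the vectors assigned to it.
        for (Map.Entry<String, List<String>> cluster : groupByCluster(points).entrySet()) {
            System.out.println(cluster.getKey() + " -> " + cluster.getValue());
        }
    }
}
```

With the grouping in hand you can look up the original documents by vector id for whichever cluster you are interested in.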

On Tue, Jan 5, 2010 at 9:51 PM, Bogdan Vatkov <[email protected]> wrote:

> I customized the lucene index-to-vector dumper already quite a lot (e.g.
> applied stop-words (from file), stop-regex) but I am wondering how the input
> vectors are later reachable if I start from cluster vectors, you say points
> are somehow doing that, where can I read more or can you tell me more, or is
> there a piece of code which would best guide me through the points format?
