From the algorithm's point of view, I think it's better to use the -t1 and -t2 parameters instead of -k to get more stable clustering results.
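To make that concrete, here is a rough plain-Java sketch of the canopy idea, assuming -t1 and -t2 are the usual loose and tight canopy distance thresholds (T1 > T2). This is not Mahout's code: the class name CanopySketch, the Euclidean distance, and the sample points are all made up for illustration, and the membership rule is simplified. The point is that the input is scanned in a fixed order, so the same data always gives the same centers, which could then seed k-means instead of randomly chosen centroids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
class CanopySketch {

    /** One canopy: its center plus every candidate point within t1 of it. */
    static final class Canopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
            this.members.add(center);
        }
    }

    // Plain Euclidean distance (an assumption; any distance measure would do).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // t1 is the loose threshold (canopy membership),
    // t2 the tight one (point can no longer become a center).
    static List<Canopy> cluster(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> candidates = new ArrayList<>(points);
        List<Canopy> canopies = new ArrayList<>();
        while (!candidates.isEmpty()) {
            // Always take the first remaining point: no randomness involved.
            Canopy canopy = new Canopy(candidates.remove(0));
            Iterator<double[]> it = candidates.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(canopy.center, p);
                if (d < t1) {
                    canopy.members.add(p);
                }
                if (d < t2) {
                    it.remove();
                }
            }
            canopies.add(canopy);
        }
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {0.5, 0}, new double[] {1, 1},
                new double[] {10, 10}, new double[] {10.5, 10});
        for (Canopy c : cluster(points, 3.0, 1.5)) {
            System.out.println(Arrays.toString(c.center) + " members=" + c.members.size());
        }
    }
}
```

The number of clusters falls out of the thresholds rather than being fixed up front, so the trade-off is that t1 and t2 have to be tuned to the scale of the data.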
What do you think, guys?

2010/2/25 Robin Anil <[email protected]>

> yeah... Now you can't write RandomAccessSparseVectors or any impls
>
> there is a class VectorWritable which does the serialization
>
> use VectorWritable.class in sequence file constructor
>
> for writing
>
> VectorWritable writable = new VectorWritable();
>
> for(each data) {
>   Vector v = new RandomAccessSparseVector(your data);
>
>   writable.set(v);
>   writer.append(key, writable);
> }
>
> On Thu, Feb 25, 2010 at 2:19 PM, Arshad Khan <[email protected]> wrote:
>
> > This may not be related to Mahout but I am seeing an NPE when creating a
> > SequenceFile.Writer. The following code used to work fine with Mahout 0.2
> > but now breaks with 0.3. I suspected that some dependencies may be causing
> > that so updated the classpath with all deps for 0.3 but the exception still
> > happens. The only diff from before is the use of RandomAccessSparseVector
> > instead of SparseVector.
> >
> >   VectorWriter sfWriter;
> >   Path path = new Path(outFile);
> >   Configuration conf = new Configuration();
> >   FileSystem fs = FileSystem.get(conf);
> >
> >   // NPE happens on following line
> >   SequenceFile.Writer seqWriter = SequenceFile.createWriter(fs, conf,
> >       path,
> >       LongWritable.class,
> >       RandomAccessSparseVector.class);
> >
> > Stack trace is as follows:
> > java.lang.NullPointerException
> >   at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
> >   at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
> >   at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
> >   at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
> >   at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
> >   at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
> >   at com.lilly.tma.lm.cluster.Analyzer.getSeqFileWriter(Analyzer.java:119)
> >
> > Any help is greatly appreciated.
> >
> > Thanks
> >
> > On Thu, Feb 25, 2010 at 2:27 PM, Arshad Khan <[email protected]> wrote:
> >
> > > Thanks for the help and explanation. :)
> > >
> > > On Thu, Feb 25, 2010 at 1:20 PM, Jake Mannix <[email protected]> wrote:
> > >
> > >> And to clarify: you can use either one, but you should think of them
> > >> like this:
> > >> RandomAccessSparseVector is useful for vectors whose contents change
> > >> a great deal (the moving centroids of a clustering algorithm, for
> > >> example), and SequentialAccessSparseVectors are useful (i.e. faster)
> > >> in the case where they are built up, and then are essentially used in
> > >> an immutable fashion (you repeatedly compute a lot of dot-products and
> > >> add multiples of them onto other vectors [either DenseVectors or
> > >> RandomAccessSparseVectors]).
> > >>
> > >> -jake
> > >>
> > >> On Wed, Feb 24, 2010 at 7:45 PM, Robin Anil <[email protected]> wrote:
> > >>
> > >> > They are replaced by the two impls RandomAccessSparseVector or
> > >> > SequentialAccessSparseVector
> > >> >
> > >> > On Thu, Feb 25, 2010 at 9:10 AM, Arshad Khan <[email protected]> wrote:
> > >> >
> > >> > > Thanks for the quick reply.
> > >> > >
> > >> > > I have downloaded the latest 0.3 code. There seem to be significant
> > >> > > changes in this version. For example, currently I am using the
> > >> > > org.apache.mahout.matrix.SparseVector class but in 0.3 I cannot
> > >> > > find this class.
> > >> > >
> > >> > > What class is it replaced with?
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> > > On Thu, Feb 25, 2010 at 10:12 AM, Ted Dunning <[email protected]> wrote:
> > >> > >
> > >> > > > There are known problems with that version of k-means.
> > >> > > >
> > >> > > > Try using the trunk version. 0.3 is very close and we are
> > >> > > > entering code freeze for that so you should be fine with the
> > >> > > > latest version.
> > >> > > >
> > >> > > > On Wed, Feb 24, 2010 at 5:46 PM, Arshad Khan <[email protected]> wrote:
> > >> > > >
> > >> > > > > Hello
> > >> > > > >
> > >> > > > > I am using the Mahout 0.2 implementation of KMeans in one of
> > >> > > > > my Text Mining projects. I apply KMeans with a default K value
> > >> > > > > of 4. It seems that every time I repeat the clustering process
> > >> > > > > on the same data set, the results are different and the
> > >> > > > > difference (in terms of cluster size and membership) is great
> > >> > > > > from run to run. The initial set of centroid points is chosen
> > >> > > > > randomly through RandomSeedGenerator. Is there a way to obtain
> > >> > > > > more consistent results that do not differ so greatly? Or
> > >> > > > > maybe I am doing something wrong?
> > >> > > > >
> > >> > > > > Any help or idea is very much appreciated.
> > >> > > > >
> > >> > > > > Thanks and Regards
> > >> > > > > Arshad
> > >> > > >
> > >> > > > --
> > >> > > > Ted Dunning, CTO
> > >> > > > DeepDyve
