Hi all,
When I create vectors from a Lucene index, Mahout uses NamedVector.
But what about creating vectors from a SequenceFile?
Right now I create vectors from text with the following steps:
Step #1
text -> SequenceFile
key = text, value = text
I do not use seqdirectory, because I want to put my own String key into
the SequenceFile, not the doc ID.
Step #2
seq2sparse with TF-IDF weighting
For the output I use tfidf-vectors/
Step #3
canopy -> kmeans
Step #4
clusterdump
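
To make clear what I expect from step #2, here is a minimal plain-Java sketch. It is not Mahout code: the class name KeyedTfidfSketch and the method weigh are made up for illustration, and it uses the textbook tf * ln(N / df) weighting, which is not necessarily the formula Mahout applies. The only point is that each sparse vector stays attached to the String key of the document it came from.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: these names are hypothetical and do not exist in Mahout.
final class KeyedTfidfSketch {

    // Returns key -> (term -> weight), with weight = tf * ln(N / df).
    static Map<String, Map<String, Double>> weigh(Map<String, List<String>> docs) {
        // Document frequency: the number of documents each term occurs in.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : docs.values()) {
            for (String term : new HashSet<>(tokens)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        int n = docs.size();
        Map<String, Map<String, Double>> vectors = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            // Term frequency within this one document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc.getValue()) {
                tf.merge(term, 1, Integer::sum);
            }
            // Sparse vector: only terms that occur in the document are stored.
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                weights.put(e.getKey(), e.getValue() * idf);
            }
            // The original String key travels with its vector.
            vectors.put(doc.getKey(), weights);
        }
        return vectors;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("my-string-key-1", Arrays.asList("apple", "banana", "apple"));
        docs.put("my-string-key-2", Arrays.asList("banana", "cherry"));
        weigh(docs).forEach((key, vector) -> System.out.println(key + " => " + vector));
    }
}
```

In this sketch the key is still available next to every vector after weighting, which is what I would like to see in the clusterdump output as well.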
I found that the vectors are org.apache.mahout.math.RandomAccessSparseVector,
so where can I find the SequenceFile key?
Thanks in advance