Re: sparse matrix format

Jeff Eastman Tue, 02 Dec 2008 16:36:42 -0800

Hi Pradhuman,

All of the clustering algorithms use our vector implementation, and the actual class used (Sparse or Dense) should depend upon the encoding format used. If you write a preprocessor job to get your input vectors in the right format before running a clustering job on them, I suggest using the SparseVector implementation. It will serialize itself in a manner similar to your example (though I'd expect to see just '[sM, ' where M is the cardinality of the vector).
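
To make the text layout concrete, here is a rough sketch in Python of writing and reading strings of the shape discussed in this thread: '[sM, ' followed by index:value pairs. It is not Mahout's code. The function names format_sparse and parse_sparse are made up for illustration, and the trailing ', ' before the closing bracket and the ascending index order are assumptions taken from the example in the original question, so check them against what SparseVector actually emits.

```python
def format_sparse(cardinality, entries):
    """Render nonzero entries as '[sM, i:v, i:v, ]' (hypothetical helper).

    cardinality: total length M of the vector.
    entries: dict mapping index -> nonzero value.
    """
    # Sorting the indices is an assumption made for reproducible output.
    pairs = "".join(f"{i}:{v}, " for i, v in sorted(entries.items()))
    return f"[s{cardinality}, {pairs}]"


def parse_sparse(text):
    """Parse a string produced by format_sparse back into (M, entries)."""
    body = text.strip()
    if not (body.startswith("[s") and body.endswith("]")):
        raise ValueError("not a sparse vector string")
    # Drop the leading '[s' and trailing ']', then split on commas,
    # ignoring the empty field left by the trailing ', '.
    fields = [f.strip() for f in body[2:-1].split(",") if f.strip()]
    cardinality = int(fields[0])
    entries = {}
    for field in fields[1:]:
        index, value = field.split(":")
        entries[int(index)] = float(value)
    return cardinality, entries


if __name__ == "__main__":
    line = format_sparse(5, {3: 4.5, 1: 2.0})
    print(line)
    print(parse_sparse(line))
```

A preprocessor job along these lines would emit one such string per input vector; only the nonzero entries are written, which is the point of the sparse encoding.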


Jeff

Pradhuman Jhala wrote:

Hi,
I am looking for documentation on the input format, particularly the sparse matrix format, supported by the various supervised and unsupervised algorithms available in Mahout. It looks like a 'sparse matrix format' is supported, but I am not able to find details of it. While looking at the way k-means clustering uses the org.apache.mahout.matrix package, it seems it expects data in the "[sM+2, index_1:value_1, index_2:value_2, ...., index_M:value_M, ]" format for it to be considered 'sparse'. Just wondering if this is correct and consistent across all clustering algorithms.

Thanks,
Pradhuman
