On 2 December 2011 19:31, Raphael Cendrillon <cendrillon1...@gmail.com> wrote:
> Is this something people would find useful?
>
> How would you like to sparsify the matrix? Using a threshold, or something
> else like target number of elements per row?

I can't yet swear hand-on-heart that I need this (I was thinking threshold,
btw), but here's the path that led me to think it might be useful:

I first made some nice practical use of RowSimilarityJob with a sparse matrix
of book rows * subject code columns. Later I tried a similar dataset, but
first pre-processed it with dimension reduction (Lanczos in this case).
However, the reduced form of my data as it came out of Lanczos was a full
matrix. From a quick poke into the data it looked like it still had a lot of
zeros in it, but I haven't yet done the work to confirm that it could usefully
be turned back into sparse form, or even counted the zeros or near-zeros.

If the scenario makes sense to others, in terms of plugging together pieces of
Mahout, it might be worthwhile. But I don't want to request it without more
experience / experimentation. Does it sound plausible / useful?

Dan

> On Dec 2, 2011, at 10:04 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> No.
>>
>> On Fri, Dec 2, 2011 at 4:03 AM, Dan Brickley (Commented) (JIRA) <
>> j...@apache.org> wrote:
>>> https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161562#comment-13161562
>>>
>>> Dan Brickley commented on MAHOUT-880:
>>> -------------------------------------
>>>
>>> Does Mahout yet have a method to take a large full matrix, and convert it
>>> to sparse matrix format (losing zero values or perhaps if it makes sense,
>>> near-zero values also...)?
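
For what it's worth, the threshold idea discussed above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class `ThresholdSparsifier` and its methods are hypothetical names, not part of Mahout, and it works on in-memory `double[]` rows rather than Mahout's distributed matrix formats. It shows the two steps mentioned in the thread: counting how many entries are zero or near-zero, and keeping only the entries above a threshold.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Mahout class.
final class ThresholdSparsifier {

    // Keep only the entries whose absolute value exceeds the threshold,
    // returned as a column-index -> value map (a simple sparse row).
    static Map<Integer, Double> sparsifyRow(double[] row, double threshold) {
        Map<Integer, Double> sparse = new LinkedHashMap<>();
        for (int col = 0; col < row.length; col++) {
            if (Math.abs(row[col]) > threshold) {
                sparse.put(col, row[col]);
            }
        }
        return sparse;
    }

    // Fraction of entries that sparsifyRow would drop at this threshold.
    // Useful for checking whether sparsifying is worthwhile at all.
    static double droppedFraction(double[][] dense, double threshold) {
        long total = 0;
        long dropped = 0;
        for (double[] row : dense) {
            for (double v : row) {
                total++;
                if (!(Math.abs(v) > threshold)) {
                    dropped++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) dropped / total;
    }
}
```

Running `droppedFraction` first over a sample of rows would give a rough idea of whether the Lanczos output is sparse enough, at a chosen threshold, to be worth converting before handing it to something like RowSimilarityJob.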