On 2 December 2011 19:31, Raphael Cendrillon <cendrillon1...@gmail.com> wrote:
> Is this something people would find useful?
>
> How would you like to sparsify the matrix? Using a threshold, or something 
> else like target number of elements per row?

I can't yet swear hand-on-heart that I need this (I was thinking
threshold btw), but here's the path that led me to think it might be
useful:

I first made some nice practical use of RowSimilarityJob with a sparse
matrix of book rows * subject code columns. Later I tried a similar
dataset, but first tried pre-processing it with dimension reduction
(Lanczos in this case). However the reduced form of my data as it came
out of Lanczos was a full matrix. From a quick poke into the data it
looked like it still had a lot of zeros in it, but I haven't yet done
the work to confirm that it could usefully be turned back into sparse
form, or even counted the zeros or near-zeros.
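
A minimal sketch of the threshold idea, in plain Python rather than
Mahout's own matrix classes. The function names and the default
threshold are hypothetical, chosen only for illustration; nothing here
is Mahout API. It counts the zero or near-zero entries of a dense
matrix, then keeps only the entries whose magnitude exceeds the
threshold:

```python
def near_zero_fraction(matrix, threshold=1e-6):
    """Fraction of entries whose magnitude is at or below the threshold."""
    total = sum(len(row) for row in matrix)
    small = sum(1 for row in matrix for v in row if abs(v) <= threshold)
    return small / total if total else 0.0


def sparsify(matrix, threshold=1e-6):
    """Convert dense rows (lists of floats) to sparse rows.

    Each sparse row is a dict mapping column index to value, holding
    only the entries whose magnitude is strictly above the threshold.
    """
    return [
        {j: v for j, v in enumerate(row) if abs(v) > threshold}
        for row in matrix
    ]


# Hypothetical data standing in for the output of a dimension reduction.
dense = [
    [0.0, 0.5, 1e-9],
    [2.0, 0.0, -3.0],
]
sparse = sparse_rows = sparsify(dense)
```

Checking the near-zero fraction first gives a rough idea of whether the
sparse form would save anything before committing to the conversion.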

If the scenario makes sense to others, in terms of plugging together
pieces of Mahout, it might be worthwhile. But I don't want to request
it without more experience / experimentation. Does it sound plausible
/ useful?

Dan

> On Dec 2, 2011, at 10:04 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> No.
>>
>> On Fri, Dec 2, 2011 at 4:03 AM, Dan Brickley (Commented) (JIRA) <
>> j...@apache.org> wrote:

>>> https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161562#comment-13161562
>>>
>>> Dan Brickley commented on MAHOUT-880:
>>> -------------------------------------
>>>
>>> Does Mahout yet have a method to take a large full matrix, and convert it
>>> to sparse matrix format (losing zero values or perhaps, if it makes sense,
>>> near-zero values also...)?
