[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629056#comment-13629056 ]
Gokhan Capan commented on MAHOUT-1178:
--------------------------------------

Hi Sebastian,

I did, though I'm not sure if I did it correctly :) Anyway, if it is correct, the diff here and the one on Review Board are not the same: the base directories I created the diffs from are different, and the one on Review Board is a single diff file. The code is the same, though, so I hope this is not a problem.

> GSOC 2013: Improve Lucene support in Mahout
> -------------------------------------------
>
>                 Key: MAHOUT-1178
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1178
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Dan Filimon
>              Labels: gsoc2013, mentor
>         Attachments: MAHOUT-1178.patch, MAHOUT-1178-TEST.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix. This would
> require that we standardize on a way to convert documents to rows. There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document, one per line, and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields, that ought to work sensibly as well.
> Two options are to dump multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document. This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.