[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629056#comment-13629056
 ] 

Gokhan Capan edited comment on MAHOUT-1178 at 4/11/13 4:21 PM:
---------------------------------------------------------------

Hi Sebastian,

I did, though I'm not sure if I did it correctly :) Anyway, if it is correct, 
the diffs here and there are not the same (I created the diffs from different 
base directories, and the one in Review Board is a single diff file). The code 
is the same, though; I hope this is not a problem.

Update: here is the Review Board link: https://reviews.apache.org/r/10420/
                
> GSOC 2013: Improve Lucene support in Mahout
> -------------------------------------------
>
>                 Key: MAHOUT-1178
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1178
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Dan Filimon
>              Labels: gsoc2013, mentor
>         Attachments: MAHOUT-1178.patch, MAHOUT-1178-TEST.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping each document's
> term vector to its own line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields, that ought to work sensibly as well.
> Two options are dumping multiple matrices or converting the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be done by remembering the Lucene doc number
> or by designating a field as holding a unique id.
> e) named vectors and matrices should be used if plausible.
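
As a rough illustration of constraints (a) to (d), here is a minimal Python sketch of one possible document-to-row conversion. It assumes the documents have already been read out of the index into plain dicts (per-field term-frequency maps for text fields, plain numbers for numeric fields); it does not use any Lucene or Mahout API, and the function name `index_to_matrix` and the `"field:term"` column naming are hypothetical choices made only for this example. All text fields are merged into one sparse row per document (the second option in (c)), each numeric field gets a single column (b), and each row is named by a unique-id field when one is given, otherwise by the document number (d).

```python
from typing import Any, Dict, List, Optional, Tuple


def index_to_matrix(
    docs: List[Dict[str, Any]],
    text_fields: List[str],
    numeric_fields: List[str],
    id_field: Optional[str] = None,
) -> Tuple[List[str], Dict[str, int], List[Dict[int, float]]]:
    """Turn each document into one sparse row of a single matrix.

    Hypothetical sketch: docs are plain dicts standing in for Lucene
    documents. Returns (row_names, dictionary, rows), where dictionary
    maps a column key to its column index and each row maps
    column index -> value.
    """
    dictionary: Dict[str, int] = {}
    row_names: List[str] = []
    rows: List[Dict[int, float]] = []

    def column(key: str) -> int:
        # Assign column indices in first-seen order.
        if key not in dictionary:
            dictionary[key] = len(dictionary)
        return dictionary[key]

    for doc_number, doc in enumerate(docs):
        row: Dict[int, float] = {}
        # Merge all text fields into the same row; the "field:term" key
        # keeps the same term in different fields in separate columns.
        for field in text_fields:
            for term, tf in doc.get(field, {}).items():
                row[column(f"{field}:{term}")] = float(tf)
        # Each numeric field contributes one column holding its value.
        for field in numeric_fields:
            if field in doc:
                row[column(field)] = float(doc[field])
        # Name the row by a unique-id field if given, else by doc number,
        # so a row can be traced back to its document.
        if id_field is not None:
            row_names.append(str(doc[id_field]))
        else:
            row_names.append(str(doc_number))
        rows.append(row)

    return row_names, dictionary, rows
```

Called as `index_to_matrix(docs, ["title", "body"], ["year"], id_field="id")` on the two documents `{"id": "a1", "title": {"lucene": 1}, "body": {"lucene": 2, "matrix": 1}, "year": 2013}` and `{"id": "b2", "body": {"matrix": 3}}`, it produces the row names `["a1", "b2"]` and the columns `title:lucene`, `body:lucene`, `body:matrix` and `year`. With a single text field, each row holds exactly that document's term frequencies, which is the kind of equivalence constraint (a) asks for, up to how the dictionary numbers the columns.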

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
