Re: [jira] Updated: (MAHOUT-126) Prepare document vectors from the text

David Hall Fri, 19 Jun 2009 01:01:05 -0700

Ignore this. Wrong issue.


On Fri, Jun 19, 2009 at 12:59 AM, David Hall (JIRA)<j...@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> David Hall updated MAHOUT-126:
> ------------------------------
>
>    Attachment: MAHOUT-123.patch
>
> Ok, I'm going to call this a mostly functional patch.
>
>> Prepare document vectors from the text
>> --------------------------------------
>>
>>                 Key: MAHOUT-126
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-126
>>             Project: Mahout
>>          Issue Type: New Feature
>>    Affects Versions: 0.2
>>            Reporter: Shashikant Kore
>>            Assignee: Grant Ingersoll
>>             Fix For: 0.2
>>
>>         Attachments: mahout-126-benson.patch, 
>> MAHOUT-126-no-normalization.patch, MAHOUT-126-no-normalization.patch, 
>> MAHOUT-126-null-entry.patch, MAHOUT-126-TF.patch, MAHOUT-126.patch, 
>> MAHOUT-126.patch, MAHOUT-126.patch, MAHOUT-126.patch
>>
>>
>> Clustering algorithms presently take the document vectors as input.  
>> Generating these document vectors from the text can be broken into two tasks.
>> 1. Create a Lucene index of the input plain-text documents
>> 2. From the index, generate the document vectors (sparse) with weights as 
>> TF-IDF values of the term. With a Lucene index, this value can be calculated 
>> very easily.
>> Presently, I have created two separate utilities, which could possibly be 
>> invoked from another class.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
