Prepare document vectors from the text --------------------------------------
Key: MAHOUT-126 URL: https://issues.apache.org/jira/browse/MAHOUT-126 Project: Mahout Issue Type: New Feature Reporter: Shashikant Kore Clustering algorithms presently take the document vectors as input. Generating these document vectors from the text can be broken in two tasks. 1. Create lucene index of the input plain-text documents 2. From the index, generate the document vectors (sparse) with weights as TF-IDF values of the term. With lucene index, this value can be calculated very easily. Presently, I have created two separate utilities, which could possibly be invoked from another class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.