Sorry, didn't see this until I had already fixed it.

On Jun 7, 2013, at 5:59 AM, "Suneel Marthi (JIRA)" <j...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677794#comment-13677794 ]
>
> Suneel Marthi commented on MAHOUT-944:
> --------------------------------------
>
> I'll take care of the Version thing, have a JIRA M-1244 open for that.
>
>> LuceneIndexToSequenceFiles (lucene2seq) utility
>> -----------------------------------------------
>>
>> Key: MAHOUT-944
>> URL: https://issues.apache.org/jira/browse/MAHOUT-944
>> Project: Mahout
>> Issue Type: New Feature
>> Components: Integration
>> Affects Versions: 0.5
>> Reporter: Frank Scholten
>> Assignee: Grant Ingersoll
>> Priority: Minor
>> Fix For: 0.8
>>
>> Attachments: MAHOUT-944-minor.patch, MAHOUT-944.patch,
>> MAHOUT-944.patch, MAHOUT-944.patch, MAHOUT-944.patch, MAHOUT-944.patch,
>> MAHOUT-944.patch, MAHOUT-944.patch, MAHOUT-944.patch, MAHOUT-944.patch,
>> MAHOUT-944.patch, MAHOUT-944.patch, MAHOUT-944.patch
>>
>>
>> Here is a lucene2seq tool I used in a project. It creates sequence files
>> based on the stored fields of a lucene index.
>> The output from this tool can be then fed into seq2sparse and from there you
>> can do text clustering.
>> Comes with Java bean configuration.
>> Let me know what you think. Some CLI code can be added later on. I used this
>> for a small-scale project +- 100.000 docs. Is a MR version useful or is that
>> overkill?
>> See https://github.com/frankscholten/mahout/tree/lucene2seq for commits and
>> review comments from Simon Willnauer (Thanks Simon!)
>> or the attached patch.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com