[ https://issues.apache.org/jira/browse/MAHOUT-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611732#action_12611732 ]
Karl Wettin commented on MAHOUT-61:
-----------------------------------

I suppose the next step is to pass the data on to some algorithm. I'm going to start with MAHOUT-19.

> Text problem matrix builder
> ----------------------------
>
>                 Key: MAHOUT-61
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-61
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>            Priority: Minor
>         Attachments: MAHOUT-61.txt, MAHOUT-61.txt, MAHOUT-61.txt
>
>
> A set of classes that builds matrices from text.
> Currently the API consists of TokenMatrixBuilder and TokenInstanceBuilder.
> Should be thread safe.
> PostReader imports 20news-bydate. This takes several GB of heap. It would be
> nice to bounce the data via JDBM or perhaps use the PersistentHashMap in
> MAHOUT-19.