[ 
https://issues.apache.org/jira/browse/MAHOUT-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608839#action_12608839
 ] 

karl.wettin edited comment on MAHOUT-61 at 6/27/08 10:00 AM:
-------------------------------------------------------------

M/R version of previous patch. 

So far it does nothing but compile. I'll be replacing the TODOs with code 
soon enough. It is still Maven only!

One thing I'm not quite sure how to solve is the handling of features 
that are class values, for instance the newsgroup when parsing 20NewsGroups.
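
A minimal sketch of one possible way to treat such a class value, assuming it 
is acceptable to map each distinct label to its own matrix column (a one-hot 
encoding). NominalFeatureEncoder and its methods are hypothetical names for 
illustration only; they are not part of the attached patch or of the 
TokenMatrixBuilder API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the MAHOUT-61 patch. */
final class NominalFeatureEncoder {

  // Insertion-ordered, so column indices follow the order labels were first seen.
  private final Map<String, Integer> indexByLabel = new LinkedHashMap<String, Integer>();

  /** Returns the column index for a label, assigning the next free index on first sight. */
  synchronized int indexOf(String label) {
    Integer index = indexByLabel.get(label);
    if (index == null) {
      index = indexByLabel.size();
      indexByLabel.put(label, index);
    }
    return index;
  }

  /** Number of distinct labels seen so far. */
  synchronized int size() {
    return indexByLabel.size();
  }

  /** One-hot row for a label; its length covers only the labels seen so far. */
  synchronized double[] encode(String label) {
    int index = indexOf(label);
    double[] row = new double[indexByLabel.size()];
    row[index] = 1.0;
    return row;
  }
}
```

Because the row length grows as new labels appear, a real builder would 
probably need a first pass that collects all labels (or a sparse row type) 
before emitting fixed-width rows.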

Comments most appreciated. 

      was (Author: karl.wettin):
    M/R version of previous patch. 

The only thing it does is to compile. I'll be replacing the todos with code 
soon enough. 

One thing I'm not quite certain about how to solve is how to handle features 
that are class values, for instance the news group when parsing 20NewsGroups.

Comments most appreciated. 
  
> Text problem matrix builder 
> ----------------------------
>
>                 Key: MAHOUT-61
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-61
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>            Priority: Minor
>         Attachments: MAHOUT-61.txt, MAHOUT-61.txt
>
>
> A set of classes that builds matrices from text.
> Currently the API consists of TokenMatrixBuilder and TokenInstanceBuilder. 
> Should be thread safe.
> PostReader imports 20news-bydate. This takes several GB of heap. It would be 
> nice to bounce the data via JDBM, or perhaps to use the PersistentHashMap in 
> MAHOUT-19.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
