[ 
https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616531#action_12616531
 ] 

Ted Dunning commented on MAHOUT-56:
-----------------------------------


I spoke poorly.  

In-memory is a misnomer.


It should be possible to use a large ARFF dataset in HDFS as input, and 
equally a large dataset in your format.

However you decide to read your data in, it should be usable by others.  
By symmetry, the same goes for the ARFF input.  

How that works should depend a little on your data.  My feeling is that we will 
need something like a "row-wise splitting matrix input format" that sends 
groups of rows of a matrix to different mappers.  This input format should 
accept a configuration argument which is the class to be used to actually 
decode the format.
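
A minimal sketch of that idea in plain Java, not tied to the Hadoop 
InputFormat API.  All names here (RowDecoder, CsvRowDecoder, RowWiseSplitter) 
are hypothetical and only illustrate the shape: the decoder class is named by 
a configuration string, and the decoded rows are handed out in groups, one 
group per mapper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder contract: turns one line of input into a matrix row. */
interface RowDecoder {
    double[] decode(String line);
}

/** Hypothetical decoder for comma-separated numeric rows. */
class CsvRowDecoder implements RowDecoder {
    @Override
    public double[] decode(String line) {
        String[] fields = line.split(",");
        double[] row = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            row[i] = Double.parseDouble(fields[i].trim());
        }
        return row;
    }
}

/** Plain-Java stand-in for a row-wise splitting input format (assumption, not Hadoop code). */
class RowWiseSplitter {

    /** Instantiates the decoder whose class name came from a configuration argument. */
    static RowDecoder loadDecoder(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(RowDecoder.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load decoder " + className, e);
        }
    }

    /** Decodes every line and returns consecutive groups of at most rowsPerGroup rows. */
    static List<List<double[]>> split(List<String> lines, int rowsPerGroup, RowDecoder decoder) {
        if (rowsPerGroup <= 0) {
            throw new IllegalArgumentException("rowsPerGroup must be positive");
        }
        List<List<double[]>> groups = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += rowsPerGroup) {
            int end = Math.min(start + rowsPerGroup, lines.size());
            List<double[]> group = new ArrayList<>();
            for (String line : lines.subList(start, end)) {
                group.add(decoder.decode(line));
            }
            groups.add(group); // each group would go to a different mapper
        }
        return groups;
    }
}
```

Swapping in an ARFF decoder or your own format would then only mean passing 
a different class name; the splitting logic stays the same.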

It will probably happen that not all algorithms will be quite so happy with 
this, especially the groups of rows part.  They may want all mappers to see the 
entire data set (if the data set is, say, a set of population members rather 
than real data).  They may want the mappers to have some row-wise map input, 
but have some side data that is read without using an input format.
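
As a rough illustration of the side-data case, again in plain Java with 
hypothetical names (SideDataMapper is not an existing Mahout or Hadoop class): 
the mapper receives its group of rows as ordinary map input, while a shared 
table is handed to it directly, outside any input format.  The TSP-style 
distance matrix and tour rows are assumptions chosen only to make the sketch 
concrete.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mapper that combines row-wise input with side data loaded separately. */
class SideDataMapper {

    // Side data: the same full table is visible to every mapper (assumed to be
    // loaded once per mapper, without going through an input format).
    private final double[][] distances;

    SideDataMapper(double[][] distances) {
        this.distances = distances;
    }

    /** Length of a closed tour that visits the cities in the given order. */
    double tourLength(int[] tour) {
        double total = 0.0;
        for (int i = 0; i < tour.length; i++) {
            int from = tour[i];
            int to = tour[(i + 1) % tour.length]; // wrap around to the start
            total += distances[from][to];
        }
        return total;
    }

    /** Processes one group of rows (the map input) against the shared side data. */
    List<Double> map(List<int[]> rowGroup) {
        List<Double> results = new ArrayList<>();
        for (int[] tour : rowGroup) {
            results.add(tourLength(tour));
        }
        return results;
    }
}
```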

You are really one of the first to define a real user story for this so you 
should feel free to define what you need in the context of what you think 
others might be able to use as well. 

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>          Components: Genetic Algorithms
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg, 
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, 
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, 
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, 
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch, 
> watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in 
> Mahout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
