[
https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616531#action_12616531
]
Ted Dunning commented on MAHOUT-56:
-----------------------------------
I spoke poorly.
In-memory is a misnomer.
It should be possible to have a large arff dataset in HDFS to be used as input
as well as a large dataset in your format.
However you decide to read your data in, it should be usable by others.
Likewise, by symmetry, for the arff input.
How that works should depend a little on your data. My feeling is that we will
need something like a "row-wise splitting matrix input format" that sends
groups of rows of a matrix to different mappers. This input format should
accept a configuration argument which is the class to be used to actually
decode the format.
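A minimal sketch of that idea, in plain Java with no Hadoop dependency. All names here (RowDecoder, CsvRowDecoder, RowWiseSplitSketch, loadDecoder, split) are hypothetical and only illustrate the shape: the decoder class is chosen by a configuration string, and rows are handed out in consecutive groups, one group per mapper. A real input format would split on byte ranges in HDFS and decode lazily rather than hold rows in memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder contract: turns one raw text line into a row of doubles. */
interface RowDecoder {
    double[] decode(String line);
}

/** Hypothetical decoder for comma-separated numeric rows. */
class CsvRowDecoder implements RowDecoder {
    @Override
    public double[] decode(String line) {
        String[] parts = line.split(",");
        double[] row = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            row[i] = Double.parseDouble(parts[i].trim());
        }
        return row;
    }
}

public class RowWiseSplitSketch {

    /** Instantiates the decoder whose class name was passed as a configuration value. */
    static RowDecoder loadDecoder(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(RowDecoder.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load decoder " + className, e);
        }
    }

    /** Decodes every line and groups consecutive rows into splits of at most rowsPerSplit rows. */
    static List<List<double[]>> split(List<String> lines, int rowsPerSplit, RowDecoder decoder) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        List<List<double[]>> splits = new ArrayList<>();
        List<double[]> current = new ArrayList<>();
        for (String line : lines) {
            current.add(decoder.decode(line));
            if (current.size() == rowsPerSplit) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            splits.add(current); // last group may be smaller
        }
        return splits;
    }

    public static void main(String[] args) {
        // The class name stands in for the configuration argument.
        RowDecoder decoder = loadDecoder(CsvRowDecoder.class.getName());
        List<String> lines = List.of("1,2", "3,4", "5,6", "7,8", "9,10");
        List<List<double[]>> splits = split(lines, 2, decoder);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("mapper " + i + " gets " + splits.get(i).size() + " rows");
        }
    }
}
```

Supporting another format, arff or your own, would then only mean writing another RowDecoder and naming it in the configuration.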
It will probably happen that not all algorithms will be quite so happy with
this, especially the groups of rows part. They may want all mappers to see the
entire data set (if the data set is, say, a set of population members rather
than real data). They may want the mappers to have some row-wise map input,
but have some side data that is read without using an input format.
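The side-data case can be sketched the same way. This is an illustration under assumptions, not a Hadoop API: SideDataMapper and SideDataSketch are hypothetical names, the side data is a small weight vector passed to the constructor (standing in for something each mapper would read once at setup, outside any input format), and the per-row computation is an arbitrary dot product.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mapper: row groups arrive as map input, side data is supplied once up front. */
class SideDataMapper {
    private final double[] sideData;

    SideDataMapper(double[] sideData) {
        this.sideData = sideData.clone(); // every mapper holds the complete side data
    }

    /** Scores each row in the group as its dot product with the side data. */
    List<Double> map(List<double[]> rowGroup) {
        List<Double> out = new ArrayList<>();
        for (double[] row : rowGroup) {
            if (row.length != sideData.length) {
                throw new IllegalArgumentException("row width does not match side data");
            }
            double sum = 0.0;
            for (int i = 0; i < row.length; i++) {
                sum += row[i] * sideData[i];
            }
            out.add(sum);
        }
        return out;
    }
}

public class SideDataSketch {
    public static void main(String[] args) {
        double[] weights = {0.5, 2.0}; // side data shared by all mappers
        List<List<double[]>> groups = List.of(
                List.of(new double[] {1, 2}, new double[] {3, 4}),
                List.of(new double[] {5, 6}));
        // Each group would go to a different mapper; all of them see the same side data.
        for (List<double[]> group : groups) {
            System.out.println(new SideDataMapper(weights).map(group));
        }
    }
}
```

The case where every mapper must see the entire data set is the degenerate form of this: the whole data set becomes the side data and the map input carries only the work items.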
You are really one of the first to define a real user story for this so you
should feel free to define what you need in the context of what you think
others might be able to use as well.
> Watchmaker Integration
> ----------------------
>
> Key: MAHOUT-56
> URL: https://issues.apache.org/jira/browse/MAHOUT-56
> Project: Mahout
> Issue Type: Task
> Components: Genetic Algorithms
> Reporter: Deneche A. Hakim
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: libs.zip, libs.zip, libs.zip, tsp-screenshot-1.jpg,
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch,
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch,
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch,
> watchmaker-tsp.patch, watchmaker-tsp.patch, watchmaker-tsp.patch,
> watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in
> Mahout.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.