[
https://issues.apache.org/jira/browse/MAHOUT-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004651#comment-13004651
]
Ted Dunning commented on MAHOUT-621:
------------------------------------
The data sources that I have seen most often include:
- document-like things that have semi-structured fields. This includes most of
our recommendation-style inputs if you group by user id and collect
the values of the items being rated. It also includes document inputs, for
which the Lucene document is an excellent example.
- SQL queries that ultimately produce something that looks like a document,
possibly by denormalizing the final query result.
- time series. The OpenTSDB project has the nicest time series schema that I
have seen.
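
As a rough illustration of the first bullet, here is a minimal sketch of how
flat rating rows could be grouped by user id into one document-like record per
user. The (user_id, item_id, rating) row layout, the function name
group_by_user, and the sample values are all assumptions made up for this
example; none of it is Mahout code.

```python
from collections import defaultdict


def group_by_user(ratings):
    """Group flat (user_id, item_id, rating) rows into one record per user.

    Hypothetical helper: each returned record maps item ids to the rating
    that user gave, which is one way to read "collect the values of the
    items being rated".
    """
    docs = defaultdict(dict)
    for user_id, item_id, rating in ratings:
        # A later row for the same (user, item) pair overwrites the earlier one.
        docs[user_id][item_id] = rating
    return dict(docs)


# Made-up sample rows, for illustration only.
ratings = [
    ("u1", "itemA", 4.0),
    ("u2", "itemA", 2.5),
    ("u1", "itemB", 5.0),
]
user_docs = group_by_user(ratings)
```

Each value in user_docs then behaves like a small semi-structured document
whose fields are item ids, the same shape a denormalized SQL result could
be mapped into.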
> Support more data import mechanisms
> -----------------------------------
>
> Key: MAHOUT-621
> URL: https://issues.apache.org/jira/browse/MAHOUT-621
> Project: Mahout
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Labels: gsoc2011, mahout-gsoc-11
>
> We should have more ways of getting data in:
> 1. ARFF (MAHOUT-155)
> 2. CSV (MAHOUT-548)
> 3. Databases
> 4. Behemoth (Tika, Map-Reduce)
> 5. Other