[ https://issues.apache.org/jira/browse/MAHOUT-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004651#comment-13004651 ]

Ted Dunning commented on MAHOUT-621:
------------------------------------

The data sources I have seen most often include:

- document-like things that have semi-structured fields.  This includes most of
our recommendation-style inputs if you group by user ID and collect the values
of the items being rated.  It also includes plain document inputs, of which the
Lucene document is an excellent example.

- SQL queries that ultimately produce something that looks like a document,
possibly after denormalizing the final query result.

- time series.  The OpenTSDB project has the nicest time-series schema I have
seen.
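As a rough illustration of the first case, here is a minimal Python sketch,
assuming rating data arrives as (user_id, item_id, rating) triples.  The
function name `ratings_to_documents` and the field names `items` and `ratings`
are hypothetical and not part of any Mahout API.  Grouping by user turns each
user into one document whose multi-valued fields hold the items that user
rated, much like a Lucene document with repeated fields.

```python
from collections import defaultdict


def ratings_to_documents(ratings):
    """Group (user_id, item_id, rating) triples into one document per user.

    Hypothetical sketch: each document maps a field name to a list of values,
    mirroring a multi-valued field in a Lucene-style document.
    """
    docs = defaultdict(lambda: {"items": [], "ratings": []})
    for user_id, item_id, rating in ratings:
        doc = docs[user_id]            # one document per user ID
        doc["items"].append(item_id)   # collect the items this user rated
        doc["ratings"].append(rating)  # keep ratings aligned with items
    return dict(docs)


if __name__ == "__main__":
    triples = [("u1", "i10", 4.0), ("u2", "i10", 3.0), ("u1", "i42", 5.0)]
    for user, doc in sorted(ratings_to_documents(triples).items()):
        print(user, doc)
```

The same shape falls out of a SQL GROUP BY on user ID, which is one reason the
document view covers both of the first two kinds of source.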



> Support more data import mechanisms
> -----------------------------------
>
>                 Key: MAHOUT-621
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-621
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>              Labels: gsoc2011, mahout-gsoc-11
>
> We should have more ways of getting data in:
> 1. ARFF (MAHOUT-155)
> 2. CSV (MAHOUT-548)
> 3. Databases
> 4. Behemoth (Tika, Map-Reduce)
> 5. Other
