[ 
https://issues.apache.org/jira/browse/MAHOUT-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated MAHOUT-1579:
-----------------------------------

    Description: 
As we all know, FileDataModel can only load data from the local filesystem.
But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
To process data in HDFS, we must run a MapReduce job, and the 
distributed job can only process data in a form like [userID: ItemID1, ItemID2, 
ItemID3...].
It is necessary to implement a DataModel that can load data from a Hadoop 
filesystem directly, so that we can process data in a form like 
[userID,itemID,preference].

  was:
As we all know, FileDataModel can only load data from the local filesystem.
But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
To process data in HDFS, we must run a MapReduce job.
It is necessary to implement a DataModel that can load data from a Hadoop 
filesystem directly.


> Implement a datamodel which can load data from hadoop filesystem directly
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1579
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1579
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Xiaomeng Huang
>            Priority: Minor
>         Attachments: Mahout-1579.patch
>
>
> As we all know, FileDataModel can only load data from the local filesystem.
> But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
> To process data in HDFS, we must run a MapReduce job, and the 
> distributed job can only process data in a form like [userID: ItemID1, ItemID2, 
> ItemID3...].
> It is necessary to implement a DataModel that can load data from a Hadoop 
> filesystem directly, so that we can process data in a form like 
> [userID,itemID,preference].
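
To make the two data forms in the description concrete, here is a minimal
sketch (not the attached patch) that reads [userID,itemID,preference] lines
from any BufferedReader and groups them per user. The class name
PreferenceTripleParser and its parse method are hypothetical and exist only
for illustration. The sketch uses only the JDK; the assumption is that a real
implementation would wrap the stream returned by Hadoop's
FileSystem.open(Path) in the reader instead of the in-memory sample used here,
and would then hand the grouped preferences to an in-memory DataModel.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Mahout or of Mahout-1579.patch.
class PreferenceTripleParser {

    // Reads lines of the form userID,itemID,preference and groups them by user:
    // userID -> (itemID -> preference). A repeated (userID, itemID) pair keeps
    // the last value seen.
    static Map<Long, Map<Long, Float>> parse(BufferedReader reader) throws IOException {
        Map<Long, Map<Long, Float>> byUser = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] fields = line.split(",");
            if (fields.length != 3) {
                throw new IOException("Expected userID,itemID,preference but got: " + line);
            }
            long userID = Long.parseLong(fields[0].trim());
            long itemID = Long.parseLong(fields[1].trim());
            float preference = Float.parseFloat(fields[2].trim());
            byUser.computeIfAbsent(userID, id -> new LinkedHashMap<>())
                  .put(itemID, preference);
        }
        return byUser;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample standing in for a file stored in HDFS (assumption).
        String sample = "1,10,3.5\n1,11,4.0\n2,10,1.0\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(parse(reader));
        }
    }
}
```

Because the parser depends only on a Reader, the same code path works for a
local file and for a stream opened from a Hadoop filesystem, which is the
kind of direct loading the issue asks for.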



--
This message was sent by Atlassian JIRA
(v6.2#6252)
