[ https://issues.apache.org/jira/browse/MAHOUT-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039803#comment-14039803 ]
ASF GitHub Bot commented on MAHOUT-1579:
----------------------------------------
GitHub user sscdotopen commented on a diff in the pull request:
https://github.com/apache/mahout/pull/19#discussion_r14049581
--- Diff: mrlegacy/pom.xml ---
@@ -208,7 +208,35 @@
<artifactId>solr-commons-csv</artifactId>
<version>3.5.0</version>
</dependency>
-
+
+ <dependency>
--- End diff --
I'm not comfortable with adding all of these dependencies just for the mini
cluster. Could you rewrite your tests to simply use the local filesystem via
Hadoop's Path API? That should be sufficient.
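
For illustration, a minimal sketch of what such a test helper might look like,
assuming only hadoop-common is on the classpath. The class name
LocalFsPathSketch, the method roundTrip, and the sample line are hypothetical
and not taken from the patch. A Path built from a file: URI resolves to
Hadoop's local filesystem, so no mini cluster is started.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not part of the pull request under review.
public class LocalFsPathSketch {

  // Writes one line to a temp file and reads it back, going through
  // Hadoop's FileSystem abstraction instead of java.io directly.
  public static String roundTrip(String line) throws IOException {
    Configuration conf = new Configuration();

    // A file: URI makes Hadoop pick its local filesystem implementation.
    File tmp = File.createTempFile("prefs", ".csv");
    Path path = new Path(tmp.toURI());
    FileSystem fs = path.getFileSystem(conf);

    try {
      // Overwrite the empty temp file with the sample line.
      try (Writer writer = new OutputStreamWriter(
          fs.create(path, true), StandardCharsets.UTF_8)) {
        writer.write(line);
      }
      // Read the first line back through the same API.
      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
        return reader.readLine();
      }
    } finally {
      // Non-recursive delete of the temp file.
      fs.delete(path, false);
    }
  }
}
```

Code under test that accepts a Path and a Configuration would behave the same
way against HDFS, since only the URI scheme of the Path changes.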
> Implement a datamodel which can load data from hadoop filesystem directly
> -------------------------------------------------------------------------
>
> Key: MAHOUT-1579
> URL: https://issues.apache.org/jira/browse/MAHOUT-1579
> Project: Mahout
> Issue Type: Improvement
> Reporter: Xiaomeng Huang
> Priority: Minor
> Attachments: Mahout-1579.000.patch
>
>
> As we all know, FileDataModel can only load data from the local filesystem.
> But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
> To process data in HDFS today, we must run a MapReduce job.
> It is necessary to implement a data model that can load data from a Hadoop
> filesystem directly.