[ https://issues.apache.org/jira/browse/MAHOUT-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-535:
-----------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
The patch had some minor style and formatting issues, which I fixed locally
before committing. The change itself looks fine: it essentially replaces File
with Path for the input, so that seqdirectory can read from either the local
filesystem or HDFS.
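
To illustrate why a File-based input can only ever be local, here is a minimal
stdlib-only sketch. It is not the committed patch and does not use the Hadoop
API. The class name, the host namenode.example, the path /user/someone, and the
defaultScheme parameter (standing in for whatever default filesystem the Hadoop
configuration names) are all assumptions made for the example.

```java
import java.io.File;
import java.net.URI;

public class LocalOnlyFileDemo {

    // Returns the URI scheme that java.io.File assigns to an input name.
    // File knows nothing about other filesystems, so this is always "file".
    public static String schemeSeenByFile(String input) {
        return new File(input).toURI().getScheme();
    }

    // Returns the scheme of a location given as a URI string, or the supplied
    // default when the string carries no scheme (e.g. a bare relative name).
    // The default is a hypothetical stand-in for a configured filesystem.
    public static String schemeOfUri(String location, String defaultScheme) {
        String scheme = URI.create(location).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        // A bare name wrapped in File is resolved against the local filesystem.
        System.out.println(schemeSeenByFile("myurls-dfs"));
        // A URI-style location keeps its scheme.
        System.out.println(schemeOfUri(
                "hdfs://namenode.example:8020/user/someone/myurls-dfs", "file"));
        // A bare name treated as a URI can fall back to a configured default.
        System.out.println(schemeOfUri("myurls-dfs", "hdfs"));
    }
}
```

With File, the name myurls-dfs is looked up locally regardless of the Hadoop
settings in the environment, which matches the reported behaviour. A location
type that defers to a scheme or a configured default can point at either
filesystem.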
> mahout seqdirectory reads only from the local filesystem, even when running
> over Hadoop
> ---------------------------------------------------------------------------------------
>
> Key: MAHOUT-535
> URL: https://issues.apache.org/jira/browse/MAHOUT-535
> Project: Mahout
> Issue Type: Improvement
> Components: Utils
> Affects Versions: 0.5
> Environment: local and hadoop
> Reporter: Matt Spitz
> Assignee: Isabel Drost
> Priority: Minor
> Fix For: 0.5
>
> Attachments: 0001-added-HDFS-support-to-seqdirectory.patch
>
>
> It seems as if seqdirectory only reads from the local filesystem, though it
> writes correctly to the HDFS.
> Consider 'myurls-local' and 'myurls-dfs', the former existing in the working
> directory and the latter existing on the home directory of the HDFS.
> Running:
> MAHOUT_HOME=. ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk
> acts as expected (myurls-seqdir is created on the local filesystem)
> Running:
> MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-dfs -o myurls-seqdir -c UTF-8 -chunk
> creates a 12 KB myurls-seqdir directory on the DFS. Presumably, it couldn't
> read myurls-dfs from the DFS and ended up creating a nearly empty sequence
> directory.
> Running:
> MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk
> acts as expected, creating a substantial myurls-seqdir on the DFS.