[ https://issues.apache.org/jira/browse/MAHOUT-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-535:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

The patch had some minor style and formatting issues, which I fixed locally 
before committing. The change itself looks fine: it essentially replaces File 
with Path for the input, so that seqdirectory can read from either local or 
HDFS files.
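
To illustrate why a Path-style input can address either filesystem while a 
java.io.File cannot, here is a minimal sketch that uses only the JDK. It is 
not Hadoop's Path class and not the committed patch. The class 
InputPathSketch, the helper resolveScheme, the defaultScheme parameter, and 
the host name "namenode" are all hypothetical names chosen for illustration. 
The assumption is that an input string either carries its own scheme or falls 
back to the default filesystem of the active configuration. In Hadoop itself, 
the corresponding lookup is done by Path.getFileSystem(Configuration).

```java
import java.net.URI;

// Illustration only: not Hadoop's Path class and not the MAHOUT-535 patch.
class InputPathSketch {

    // Hypothetical helper: returns the filesystem scheme an input string
    // would be read from. A string without a scheme falls back to
    // defaultScheme, which stands in for the configured default filesystem.
    static String resolveScheme(String input, String defaultScheme) {
        String scheme = URI.create(input).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        // Unqualified name with a local default filesystem.
        System.out.println(resolveScheme("myurls-local", "file"));
        // Unqualified name with HDFS as the default filesystem.
        System.out.println(resolveScheme("myurls-dfs", "hdfs"));
        // An explicit scheme takes precedence over the default.
        System.out.println(
            resolveScheme("hdfs://namenode/user/someone/myurls-dfs", "file"));
    }
}
```

A plain java.io.File has no notion of a scheme or of a configured default 
filesystem, which would be consistent with the behavior described in the 
report below.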

> mahout seqdirectory reads only from the local filesystem, even when running 
> over Hadoop
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-535
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-535
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.5
>         Environment: local and hadoop
>            Reporter: Matt Spitz
>            Assignee: Isabel Drost
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: 0001-added-HDFS-support-to-seqdirectory.patch
>
>
> It seems that seqdirectory reads only from the local filesystem, even though 
> it writes correctly to HDFS.
> Consider 'myurls-local' and 'myurls-dfs': the former exists in the local 
> working directory and the latter in the home directory on HDFS.
> Running:
> MAHOUT_HOME=. ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk
> acts as expected (myurls-seqdir is created on the local filesystem).
> Running:
> MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-dfs -o myurls-seqdir -c UTF-8 -chunk
> creates a 12 KB myurls-seqdir directory on the DFS. Presumably, it couldn't 
> read myurls-dfs from the DFS and ended up creating a nearly empty sequence 
> directory.
> Running:
> MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk
> acts as expected, creating a substantial myurls-seqdir on the DFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
