[ 
https://issues.apache.org/jira/browse/MAHOUT-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shige Takeda updated MAHOUT-535:
--------------------------------

    Affects Version/s:     (was: 0.4)
                           (was: 0.3)
                       0.5
               Status: Patch Available  (was: Open)

Hi, I needed HDFS support in Sequence Files from Directory (seqdirectory),
so I put together a patch that does the job, along with a simple test case.
I ran "mvn clean install" on both Mac OS X 10.6.6 and RHL2.6.18-162.2.1.el5.
I hope I'm on the right track.

> mahout seqdirectory reads only from the local filesystem, even when running 
> over Hadoop
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-535
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-535
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.5
>         Environment: local and hadoop
>            Reporter: Matt Spitz
>            Assignee: Isabel Drost
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: 0001-added-HDFS-support-to-seqdirectory.patch
>
>
> It seems that seqdirectory reads only from the local filesystem, though it 
> writes correctly to HDFS.
> Consider 'myurls-local' and 'myurls-dfs': the former exists in the local 
> working directory and the latter in the home directory on HDFS.
>
> Running:
>   MAHOUT_HOME=. ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk 
> acts as expected (myurls-seqdir is created on the local filesystem).
>
> Running:
>   MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-dfs -o myurls-seqdir -c UTF-8 -chunk 
> creates a 12 KB myurls-seqdir directory on the DFS. Presumably it could not 
> read myurls-dfs from the DFS and ended up creating a nearly empty sequence 
> directory.
>
> Running:
>   MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk 
> acts as expected, creating a substantial myurls-seqdir on the DFS.
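
The behaviour reported above is what one would expect if the input path is always opened through the local filesystem instead of being resolved against the configured default filesystem. The sketch below is a plain-Java model of that resolution rule only; it is not Mahout code and does not reproduce the attached patch. The class name InputPathResolver, the method qualify, the namenode address and the working directories are all hypothetical. In Hadoop itself, the corresponding step is usually done with Path.getFileSystem(Configuration), which picks the filesystem from the path's scheme or, for an unqualified path, from the default filesystem setting (fs.default.name in Hadoop 0.20).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical model of resolving an input path against a default filesystem.
// Illustration only: not Mahout code and not Hadoop's implementation.
public class InputPathResolver {

    /**
     * Returns a fully qualified URI for the given path.
     * A path that already carries a scheme (file:, hdfs:, ...) is returned unchanged.
     * Otherwise the path is made absolute against workingDir and given the
     * scheme and authority of defaultFs.
     */
    public static URI qualify(String path, URI defaultFs, String workingDir) {
        URI raw = URI.create(path);
        if (raw.getScheme() != null) {
            return raw; // the path names its own filesystem
        }
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), absolute, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Assumed values: a namenode at namenode:8020 and a user named matt.
        URI hdfs = URI.create("hdfs://namenode:8020");
        URI local = URI.create("file:///");

        // With Hadoop configured, an unqualified input should land on the DFS.
        System.out.println(qualify("myurls-dfs", hdfs, "/user/matt"));
        // Without Hadoop, the same rule resolves to the local filesystem.
        System.out.println(qualify("myurls-local", local, "/home/matt"));
        // An explicit scheme always wins over the default filesystem.
        System.out.println(qualify("file:///home/matt/myurls-local", hdfs, "/user/matt"));
    }
}
```

Under this rule, the second command in the report would read myurls-dfs from the DFS, while a local directory could still be selected explicitly with a file: URI.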

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
