[ https://issues.apache.org/jira/browse/MAHOUT-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shige Takeda updated MAHOUT-535:
--------------------------------

    Affects Version/s: 0.5 (was: 0.3, 0.4)
               Status: Patch Available (was: Open)

Hi,

I needed HDFS support in Sequence Files from Directory (seqdirectory), so I put together a patch, with a simple test case, to get my job done. I ran "mvn clean install" on both Mac OS X 10.6.6 and RHL 2.6.18-162.2.1.el5. I hope I'm on the right track.

> mahout seqdirectory reads only from the local filesystem, even when running over Hadoop
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-535
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-535
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.5
>         Environment: local and hadoop
>            Reporter: Matt Spitz
>            Assignee: Isabel Drost
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: 0001-added-HDFS-support-to-seqdirectory.patch
>
>
> It seems as if seqdirectory only reads from the local filesystem, though it writes correctly to HDFS.
>
> Consider 'myurls-local' and 'myurls-dfs', the former existing in the local working directory and the latter existing in the user's home directory on HDFS.
>
> Running:
>   MAHOUT_HOME=. ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk
> acts as expected (myurls-seqdir is created on the local filesystem).
>
> Running:
>   MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-dfs -o myurls-seqdir -c UTF-8 -chunk
> creates a 12 KB myurls-seqdir directory on the DFS. Presumably, it couldn't read myurls-dfs from the DFS and ended up creating a nearly empty sequence directory.
>
> Running:
>   MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk
> acts as expected, creating a substantial myurls-seqdir on the DFS.
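One plausible reading of the three runs above (assumed here, not confirmed from the patch) is that seqdirectory enumerates its input with local-file APIs, so a bare path like myurls-dfs is looked up on the local disk even when HADOOP_CONF_DIR points at a cluster. The stdlib-only Java sketch below is a simplified model, not Hadoop's API, of how a Hadoop-aware tool is expected to qualify a scheme-less path: resolve it against the working directory and attach the default file system URI (fs.default.name in Hadoop 0.20). The class name QualifyPath, the host namenode:8020, and the /user/matt home directory are hypothetical. In Hadoop itself, the corresponding calls are Path.getFileSystem(conf) and FileSystem.makeQualified(Path).

```java
import java.net.URI;

// Hypothetical, simplified model of Hadoop-style path qualification.
// It is NOT Hadoop's Path/FileSystem API; it only shows which file system
// a bare path such as "myurls-dfs" should end up on.
public class QualifyPath {

    // Keep an explicit scheme (file:, hdfs:) as given; otherwise make the
    // path absolute against the working directory and attach the default FS URI.
    static URI qualify(String path, URI defaultFs, String workingDir) {
        URI parsed = URI.create(path);
        if (parsed.getScheme() != null) {
            return parsed; // already qualified, e.g. file:///home/matt/x
        }
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        return defaultFs.resolve(absolute);
    }

    public static void main(String[] args) {
        // Stands in for the default FS read from the cluster configuration.
        URI hdfs = URI.create("hdfs://namenode:8020");
        // Bare relative path: resolves into the (hypothetical) HDFS home
        // directory, which is where the second run in the report keeps its input.
        System.out.println(qualify("myurls-dfs", hdfs, "/user/matt"));
        // Explicit file: scheme: stays on the local disk even with an HDFS default.
        System.out.println(qualify("file:///home/matt/myurls-local", hdfs, "/user/matt"));
    }
}
```

Under this model, the first line printed is hdfs://namenode:8020/user/matt/myurls-dfs. The design point is that the file system is chosen from the path's scheme (or the configured default), never hard-wired to the local disk; the only way to force local input under an HDFS configuration is an explicit file: URI.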