[ https://issues.apache.org/jira/browse/MAHOUT-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Palumbo updated MAHOUT-1592:
-----------------------------------
    Labels: legacy  (was: )

> bin/mahout's seqdirectory doesn't work when MAHOUT_LOCAL non-empty
> ------------------------------------------------------------------
>
>                 Key: MAHOUT-1592
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1592
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.9
>         Environment: Linux
>            Reporter: Alex Ott
>            Priority: Minor
>              Labels: legacy
>
> Trying to run seqdirectory with MAHOUT_LOCAL set to a non-empty value leads to the following error:
> {noformat}
> >mahout seqdirectory -i ${WORK_DIR}/20news-all -o ${WORK_DIR}/20news-seq -ow
> 13:48 0
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/ott/work/mahout-head/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/ott/work/mahout-head/examples/target/dependency/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/07/08 13:50:39 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/ott/work/exps/mh/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/home/ott/work/exps/mh/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> 14/07/08 13:50:39 INFO common.HadoopUtil: Deleting /home/ott/work/exps/mh/20news-seq
> Exception in thread "main" java.io.FileNotFoundException: File does not exist: /home/ott/work/url-cat-exps/mh/20news-all
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:162)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:91)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:65)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {noformat}
> But the directory exists at the specified path:
> {noformat}
> ott@mercury:work/exps/mh\>ls -lsd 20news-all
> 13:50 0
> 4 drwxrwxr-x 22 ott ott 4096 Jul 8 08:49 20news-all/
> {noformat}
> If I explicitly specify the {{-xm sequential}} flag, there is no error, but the task isn't performed at all:
> {noformat}
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/ott/work/mahout-head/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/ott/work/mahout-head/examples/target/dependency/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/07/08 13:54:19 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/ott/work/exps/mh/20news-all], --keyPrefix=[], --method=[sequential], --output=[/home/ott/work/exps/mh/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> 14/07/08 13:54:19 INFO driver.MahoutDriver: Program took 548 ms (Minutes: 0.009133333333333334)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)