[jira] [Commented] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop
[ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679074#comment-13679074 ]

Grant Ingersoll commented on MAHOUT-1247:

Here's the first error I'm getting: https://paste.apache.org/cik6
{quote}
java.lang.IllegalStateException: /tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
    at org.apache.mahout.vectorizer.term.TFPartialVectorReducer.setup(TFPartialVectorReducer.java:146)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:528)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1479)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1474)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.init(SequenceFileIterator.java:58)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
    ... 9 more
{quote}

Might be related to MAHOUT-992, but not sure.
I added a main to DictionaryVectorizer that allows you to reproduce this off of the prior run of cluster-reuters without having to go re-run everything.

cluster-reuters doesn't work on Hadoop
--------------------------------------

Key: MAHOUT-1247
URL: https://issues.apache.org/jira/browse/MAHOUT-1247
Project: Mahout
Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Fix For: 0.8

At least two issues:
1. MAHOUT-992 messed up the Distributed Cache stuff somehow
2. The ExtractReuters data is not being moved to HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop
[ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679076#comment-13679076 ]

Grant Ingersoll commented on MAHOUT-1247:

After you run cluster-reuters.sh, you can run:
{code}
bin/mahout org.apache.mahout.vectorizer.DictionaryVectorizer -i /tmp/mahout-work-grantingersoll/reuters-out-seqdir-sparse-kmeans/tokenized-documents -o ./dicVec
{code}
Make sure you have HADOOP_HOME set and also substitute in the appropriate work directory.
[jira] [Commented] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop
[ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679085#comment-13679085 ]

Hudson commented on MAHOUT-1247:

Integrated in Mahout-Quality #2065 (See [https://builds.apache.org/job/Mahout-Quality/2065/])
    add some helpers to AbstractJob, add a main to DictionaryVectorizer to try and isolate some issues in testing DicVec on Hadoop for MAHOUT-1247 (Revision 1491225)

Result = SUCCESS
gsingers :
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/AbstractJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/vectorizer/DictionaryVectorizer.java
[jira] [Commented] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop
[ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679090#comment-13679090 ]

Grant Ingersoll commented on MAHOUT-1247:

I think I see the issue. The cache file is local, but the Iterator has a Hadoop conf that expects an HDFS file, so it can't find it.
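The local-versus-HDFS mismatch described in this comment can be sketched with plain java.net.URI, with no Hadoop dependency. This is an illustration only, not Mahout or Hadoop code: the class PathQualification, the helper qualify, and the shortened example path are hypothetical. It assumes that a path carrying no scheme is qualified against the configuration's default filesystem, which would be consistent with the hdfs://localhost:9000 prefix in the FileNotFoundException reported earlier in the thread.

```java
import java.net.URI;

/** Illustration only: mimics qualifying a path against a default filesystem URI. */
final class PathQualification {

    /**
     * Hypothetical helper (not a Hadoop or Mahout API). A path that already
     * carries a scheme is returned unchanged; a scheme-less path is resolved
     * against the supplied default filesystem URI.
     */
    static URI qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // the path already names its own filesystem
        }
        return defaultFs.resolve(uri.getPath());
    }

    public static void main(String[] args) {
        // Assumed default filesystem, taken from the host and port in the stack trace.
        URI defaultFs = URI.create("hdfs://localhost:9000/");
        // Hypothetical, shortened stand-in for the localized cache file path.
        String cached = "/tmp/distcache/dicVec/dictionary.file-0";

        // Scheme-less local path: ends up pointing at HDFS, where the file does not exist.
        System.out.println(qualify(cached, defaultFs));
        // Explicit file scheme: stays on the local filesystem.
        System.out.println(qualify("file://" + cached, defaultFs));
    }
}
```

Under the same assumption, one plausible direction for a fix is to open the cached file through the local filesystem (for example, the one returned by Hadoop's FileSystem.getLocal(conf)) rather than through the default one. Whether that matches the eventual fix for this issue is not stated in the thread.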