[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-1498:
---
    Resolution: Fixed
      Assignee: Sebastian Schelter
        Status: Resolved  (was: Patch Available)

committed with a few cosmetic changes, thank you for the contribution!

> DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
> ---
>
>                 Key: MAHOUT-1498
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1498
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.7
>         Environment: mahout-core-0.7-cdh4.4.0.jar
>            Reporter: Sergey
>            Assignee: Sebastian Schelter
>              Labels: patch
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1498.patch
>
>
> Hi, I get exception
> {code}
> <<< Invocation of Main class completed <<<
> Failing Oozie Launcher, Main class [org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles], main() threw exception, Job failed!
> java.lang.IllegalStateException: Job failed!
>     at org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
>     at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:271)
> {code}
> The root cause is:
> {code}
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247
> {code}
> Looks like it happens because of DictionaryVectorizer.makePartialVectors method.
> It has code:
> {code}
> DistributedCache.setCacheFiles(new URI[] {dictionaryFilePath.toUri()}, conf);
> {code}
> which overrides jars pushed with job by oozie:
> {code}
> public static void setCacheFiles(URI[] files, Configuration conf) {
>     String sfiles = StringUtils.uriToString(files);
>     conf.set("mapred.cache.files", sfiles);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
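The overwrite described in the report can be reproduced without a cluster. What follows is only a minimal sketch: it assumes a plain `Map` standing in for Hadoop's `Configuration`, and a comma-separated value under `mapred.cache.files`, as the quoted `setCacheFiles` body suggests. The class name `CacheFilesSketch`, its two helper methods, and the file paths are hypothetical. In real Hadoop code the appending counterpart of `setCacheFiles` is `DistributedCache.addCacheFile(URI, Configuration)`.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheFilesSketch {
    static final String KEY = "mapred.cache.files";

    // Mirrors the quoted setCacheFiles: replaces whatever was stored before.
    static void setCacheFiles(String[] files, Map<String, String> conf) {
        conf.put(KEY, String.join(",", files));
    }

    // Appends one entry and keeps the entries that are already registered.
    static void addCacheFile(String file, Map<String, String> conf) {
        String existing = conf.get(KEY);
        boolean empty = existing == null || existing.isEmpty();
        conf.put(KEY, empty ? file : existing + "," + file);
    }

    public static void main(String[] args) {
        // Hypothetical jar that a launcher such as Oozie registered earlier.
        Map<String, String> overwritten = new HashMap<>();
        addCacheFile("/workflow/lib/mahout-math.jar", overwritten);
        Map<String, String> appended = new HashMap<>(overwritten);

        // Overwriting drops the jar, so tasks can no longer load its classes.
        setCacheFiles(new String[] {"/tmp/dictionary.file-0"}, overwritten);
        System.out.println(overwritten.get(KEY));

        // Appending keeps the jar next to the dictionary file.
        addCacheFile("/tmp/dictionary.file-0", appended);
        System.out.println(appended.get(KEY));
    }
}
```

With the overwrite, the jar entry disappears from the configuration, which is consistent with the `ClassNotFoundException` for `org.apache.mahout.math.Vector` in the task log above.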
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated MAHOUT-1498:
---
    Status: Patch Available  (was: Open)

From 4c95084229830b88f0bdec63b0967e5d6fceb58d Mon Sep 17 00:00:00 2001
From: seregasheypak
Date: Sun, 18 May 2014 20:20:16 +0400
Subject: [PATCH] MAHOUT-1498 Do not reset app jars stored in DistributedCache.
 Now you can run it as oozie Java action. Just bundle dependent jars in
 workflow/lib folder.

---
 .../mahout/util/DistributedCacheFileLocator.java   | 47
 .../mahout/vectorizer/DictionaryVectorizer.java    | 21 -
 .../vectorizer/term/TFPartialVectorReducer.java    | 14 +++---
 .../mahout/vectorizer/tfidf/TFIDFConverter.java    | 11 +++--
 .../tfidf/TFIDFPartialVectorReducer.java           |  6 ++-
 .../util/DistributedCacheFileLocatorTest.java      | 38
 6 files changed, 109 insertions(+), 28 deletions(-)
 create mode 100644 mrlegacy/src/main/java/org/apache/mahout/util/DistributedCacheFileLocator.java
 create mode 100644 mrlegacy/src/test/java/org/apache/mahout/util/DistributedCacheFileLocatorTest.java

diff --git a/mrlegacy/src/main/java/org/apache/mahout/util/DistributedCacheFileLocator.java b/mrlegacy/src/main/java/org/apache/mahout/util/DistributedCacheFileLocator.java
new file mode 100644
index 000..8a59908
--- /dev/null
+++ b/mrlegacy/src/main/java/org/apache/mahout/util/DistributedCacheFileLocator.java
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.mahout.util;
+
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.URI;
+
+public class DistributedCacheFileLocator {
+
+    private static final Logger log = LoggerFactory.getLogger(DistributedCacheFileLocator.class);
+
+    /**
+     * Finds a file in Distributed cache
+     * @param aPartOfName is a substring in file name
+     * @param localFiles holds references to files stored in distributed cache
+     * @return Path instance to first matched file or null
+     * */
+    public Path findByContainsInName(String aPartOfName, URI[] localFiles){
+        for(URI distCacheFile : localFiles){
+            log.debug("find a file in distributed cache by part of name {}", aPartOfName);
+            if(distCacheFile!=null && distCacheFile.toString().contains(aPartOfName)){
+                log.debug("found a file [{}] using a part of name[{}]", distCacheFile.toString(), aPartOfName);
+                return new Path(distCacheFile.getPath());
+            }
+        }
+        return null;
+    }
+
+}
diff --git a/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DictionaryVectorizer.java b/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DictionaryVectorizer.java
index 99ef019..64e5a67 100644
--- a/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DictionaryVectorizer.java
+++ b/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DictionaryVectorizer.java
@@ -17,11 +17,6 @@
 
 package org.apache.mahout.vectorizer;
 
-import java.io.IOException;
-import java.net.URI;
-import java.util.Collection;
-import java.util.List;
-
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Lists;
 import com.google.common.io.Closeables;
@@ -29,11 +24,7 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.filecache.DistributedCache;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.io.IntWritable;
-import org.apache.hadoop.io.LongWritable;
-import org.apache.hadoop.io.SequenceFile;
-import org.apache.hadoop.io.Text;
-import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
@@ -59,6 +50,10 @@ import org.apache.mahout.vectorizer.term.TermCountReducer;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.io.IOException;
+import java.util.Collection;
+import java.util.List;
+
 /**
  * This class converts a set of input documents in th
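The lookup that the patch's `DistributedCacheFileLocator` performs can be tried outside Hadoop. This is a simplified stand-in, not the committed class: it assumes a `String` return value instead of `org.apache.hadoop.fs.Path` so that it runs with the JDK alone, and it leaves out the logging. The class name `LocatorSketch` and the example URIs are hypothetical.

```java
import java.net.URI;

public class LocatorSketch {

    // Returns the path of the first non-null URI whose string form contains
    // the given substring, or null when nothing matches.
    static String findByContainsInName(String partOfName, URI[] cacheFiles) {
        for (URI cacheFile : cacheFiles) {
            if (cacheFile != null && cacheFile.toString().contains(partOfName)) {
                return cacheFile.getPath();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical cache contents: a jar pushed by the launcher comes
        // first, and the dictionary file is somewhere after it.
        URI[] cacheFiles = {
            URI.create("hdfs://namenode/workflow/lib/mahout-math.jar"),
            null,
            URI.create("hdfs://namenode/tmp/out/dictionary.file-0")
        };
        System.out.println(findByContainsInName("dictionary.file", cacheFiles));
    }
}
```

Matching by name rather than by position presumably matters once the cache also holds jars, because the dictionary file is then no longer guaranteed to be the first entry.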
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated MAHOUT-1498:
---
    Status: Open  (was: Patch Available)
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated MAHOUT-1498:
---
    Status: Open  (was: Patch Available)
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated MAHOUT-1498:
---
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated MAHOUT-1498:
---
    Attachment: MAHOUT-1498.patch

PATCH
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated MAHOUT-1498:
---
    Labels: patch  (was: )
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie
[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-1498:
---
    Fix Version/s: 1.0