[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-05-20 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003334#comment-14003334 ]

Hudson commented on MAHOUT-1498:


FAILURE: Integrated in Mahout-Quality #2610 (See 
[https://builds.apache.org/job/Mahout-Quality/2610/])
MAHOUT-1498 DistributedCache.setCacheFiles in DictionaryVectorizer overwrites 
jars pushed using oozie (ssc: rev 1595643)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/mrlegacy/src/main/java/org/apache/mahout/common/HadoopUtil.java
* /mahout/trunk/mrlegacy/src/main/java/org/apache/mahout/vectorizer/DictionaryVectorizer.java
* /mahout/trunk/mrlegacy/src/main/java/org/apache/mahout/vectorizer/term/TFPartialVectorReducer.java
* /mahout/trunk/mrlegacy/src/main/java/org/apache/mahout/vectorizer/tfidf/TFIDFConverter.java
* /mahout/trunk/mrlegacy/src/main/java/org/apache/mahout/vectorizer/tfidf/TFIDFPartialVectorReducer.java
* /mahout/trunk/mrlegacy/src/test/java/org/apache/mahout/common/DistributedCacheFileLocationTest.java


 DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed 
 using oozie
 -

 Key: MAHOUT-1498
 URL: https://issues.apache.org/jira/browse/MAHOUT-1498
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.7
 Environment: mahout-core-0.7-cdh4.4.0.jar
Reporter: Sergey
Assignee: Sebastian Schelter
  Labels: patch
 Fix For: 1.0

 Attachments: MAHOUT-1498.patch


 Hi, I get an exception:
 {code}
 Invocation of Main class completed
 Failing Oozie Launcher, Main class [org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles], main() threw exception, Job failed!
 java.lang.IllegalStateException: Job failed!
 at org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
 at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
 at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:271)
 {code}
 The root cause is:
 {code}
 Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:247)
 {code}
 It looks like this happens because of the DictionaryVectorizer.makePartialVectors method, which contains:
 {code}
 DistributedCache.setCacheFiles(new URI[] {dictionaryFilePath.toUri()}, conf);
 {code}
 This call overwrites the jars that Oozie pushed with the job, because setCacheFiles replaces the whole mapred.cache.files property instead of appending to it:
 {code}
 public static void setCacheFiles(URI[] files, Configuration conf) {
   String sfiles = StringUtils.uriToString(files);
   conf.set("mapred.cache.files", sfiles);
 }
 {code}
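To make the overwrite concrete, here is a minimal, self-contained sketch that simulates the two Hadoop 1.x helpers on a plain map instead of a real Configuration. The class names CacheFilesDemo and Conf and the HDFS paths are hypothetical. Only the property name mapred.cache.files and the comma-joined format follow the setCacheFiles code quoted above; the addCacheFile body is an assumption modelled on the append behaviour the eventual fix relies on.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simulation (no Hadoop dependency) of how setCacheFiles and addCacheFile
 * treat the "mapred.cache.files" property. Conf is a hypothetical stand-in
 * for org.apache.hadoop.conf.Configuration.
 */
public class CacheFilesDemo {
    static final String CACHE_FILES = "mapred.cache.files";

    /** Hypothetical minimal key/value configuration. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();
        String get(String key) { return props.get(key); }
        void set(String key, String value) { props.put(key, value); }
    }

    /** Like setCacheFiles: joins the URIs with commas and replaces the property. */
    static void setCacheFiles(URI[] files, Conf conf) {
        List<String> parts = new ArrayList<>();
        for (URI uri : files) {
            parts.add(uri.toString());
        }
        conf.set(CACHE_FILES, String.join(",", parts));
    }

    /** Like addCacheFile (assumed): appends the URI to any existing entries. */
    static void addCacheFile(URI uri, Conf conf) {
        String files = conf.get(CACHE_FILES);
        conf.set(CACHE_FILES, files == null ? uri.toString() : files + "," + uri);
    }

    public static void main(String[] args) {
        // Hypothetical paths: a jar registered by the launcher and the dictionary.
        URI jar = URI.create("hdfs://nn/user/oozie/share/lib/mahout-math.jar");
        URI dictionary = URI.create("hdfs://nn/tmp/out/dictionary.file-0");

        Conf overwritten = new Conf();
        overwritten.set(CACHE_FILES, jar.toString());
        setCacheFiles(new URI[] {dictionary}, overwritten);
        System.out.println("set: " + overwritten.get(CACHE_FILES));

        Conf appended = new Conf();
        appended.set(CACHE_FILES, jar.toString());
        addCacheFile(dictionary, appended);
        System.out.println("add: " + appended.get(CACHE_FILES));
    }
}
```

Running main prints `set: hdfs://nn/tmp/out/dictionary.file-0` followed by `add: hdfs://nn/user/oozie/share/lib/mahout-math.jar,hdfs://nn/tmp/out/dictionary.file-0`. In the set case the jar entry is gone, which is consistent with the ClassNotFoundException for org.apache.mahout.math.Vector reported above.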



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-05-19 Sergey (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001434#comment-14001434 ]

Sergey commented on MAHOUT-1498:


Great, nice to hear it.
It looks like I have a similar problem here; it appears when running as an
Oozie java action. I will investigate it and create a separate ticket if I find
the root cause. ClusterClassificationDriver is much harder to read than the
previously patched modules.
{code}
at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:276)
at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:117)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.clustering.canopy.CanopyDriver.main(CanopyDriver.java:64)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{code}



[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-05-18 Sebastian Schelter (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001014#comment-14001014 ]

Sebastian Schelter commented on MAHOUT-1498:


[~serega_sheypak] what's the status here?



[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-05-18 Sergey (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001120#comment-14001120 ]

Sergey commented on MAHOUT-1498:


Hi, I've added a separate class and a test. I hope I followed the guidelines.
Please check whether it's OK or whether I need to work more on it. Thanks.



[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-04-27 Sebastian Schelter (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982236#comment-13982236 ]

Sebastian Schelter commented on MAHOUT-1498:


[~serega_sheypak] Here is the info on how to contribute a patch:
https://mahout.apache.org/developers/how-to-contribute.html

Please work off trunk.



[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-04-21 Sergey (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975537#comment-13975537 ]

Sergey commented on MAHOUT-1498:


Yes, it's possible.

1. Is http://commons.apache.org/patches.html a valid guide?

2. Which branch should I check out for patching? Right now the change is done
'quick-and-dirty'. It could be used as a patch...




[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-04-13 Sebastian Schelter (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967801#comment-13967801 ]

Sebastian Schelter commented on MAHOUT-1498:


Could you provide a patch for your changes?



[jira] [Commented] (MAHOUT-1498) DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie

2014-03-30 Sergey (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954737#comment-13954737 ]

Sergey commented on MAHOUT-1498:


So I've replaced every occurrence of
{code}
DistributedCache.setCacheFiles(new URI[] {dictionaryFilePath.toUri()}, conf);
{code}
with
{code}
DistributedCache.addCacheFile(dictionaryFilePath.toUri(), conf);
{code}

Now my jars are no longer thrown out of the distributed cache; they are used in
subsequent MR job submissions.
I've also modified several reducers, which expected to find exactly one file in
the distributed cache. Here is an example:
{code}
// TFPartialVectorReducer
@Override
protected void setup(Context context) throws IOException, InterruptedException {
  super.setup(context);
  Configuration conf = context.getConfiguration();
  URI[] localFiles = DistributedCache.getCacheFiles(conf);
  Preconditions.checkArgument(localFiles != null && localFiles.length >= 1,
      "missing paths from the DistributedCache");

  dimension = conf.getInt(PartialVectorMerger.DIMENSION, Integer.MAX_VALUE);
  sequentialAccess = conf.getBoolean(PartialVectorMerger.SEQUENTIAL_ACCESS, false);
  namedVector = conf.getBoolean(PartialVectorMerger.NAMED_VECTOR, false);
  maxNGramSize = conf.getInt(DictionaryVectorizer.MAX_NGRAMS, maxNGramSize);

  //Path dictionaryFile = new Path(localFiles[0].getPath());
  Path dictionaryFile = getPathToDictionaryFile(localFiles);
  // key is word, value is id
  for (Pair<Writable, IntWritable> record
      : new SequenceFileIterable<Writable, IntWritable>(dictionaryFile, true, conf)) {
    dictionary.put(record.getFirst().toString(), record.getSecond().get());
  }
}

// Picks the dictionary out of the cache entries instead of assuming it is the only one.
private Path getPathToDictionaryFile(URI[] localFiles) {
  for (URI distCacheFile : localFiles) {
    System.out.println("getPathToDictionaryFile ::: "
        + (distCacheFile == null ? null : distCacheFile.toString()));
    if (distCacheFile != null && distCacheFile.toString().contains("dictionary.file")) {
      System.out.println("getPathToDictionaryFile ::: looks like ["
          + distCacheFile + "] is a dictionary we need");
      return new Path(distCacheFile.getPath());
    }
  }
  // Fallback: no entry matched, so try the last one.
  URI lastUri = localFiles[localFiles.length - 1];
  System.out.println("getPathToDictionaryFile ::: didn't find dict file. "
      + "Trying to return the last one [" + lastUri + "]");
  return new Path(lastUri.getPath());
}
{code}
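The getPathToDictionaryFile helper above falls back to the last cache entry when nothing matches, so a missing dictionary would only surface later as a read error on whatever file happens to be last (possibly a jar). A minimal alternative sketch, using only java.net.URI, returns null instead so the caller can fail fast. The class DictionaryLookup, the method findInCache and the paths are hypothetical; the "dictionary.file" marker is taken from the reducer code above. In the reducer itself the returned string would be wrapped in new Path(...).

```java
import java.net.URI;

/**
 * Hypothetical helper: locate one file among distributed-cache entries that
 * may also include jars pushed by a launcher such as Oozie.
 */
public class DictionaryLookup {

    /**
     * Returns the path component of the first cache URI whose string form
     * contains the marker, or null when no entry matches (no fallback).
     */
    static String findInCache(URI[] cacheFiles, String marker) {
        if (cacheFiles == null) {
            return null;
        }
        for (URI uri : cacheFiles) {
            if (uri != null && uri.toString().contains(marker)) {
                return uri.getPath();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        URI[] cache = {
            URI.create("hdfs://nn/user/oozie/share/lib/mahout-math.jar"),
            URI.create("hdfs://nn/tmp/out/dictionary.file-0")
        };
        String dictionary = findInCache(cache, "dictionary.file");
        if (dictionary == null) {
            // Fail fast rather than guessing which cache entry is the dictionary.
            throw new IllegalStateException("no dictionary file in the distributed cache");
        }
        System.out.println(dictionary); // /tmp/out/dictionary.file-0
    }
}
```

Whether failing fast beats the last-entry fallback is a judgment call; the sketch simply makes a missing dictionary visible at setup time.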

I'm not sure whether this is good or bad, but my Oozie action now runs without
any problems. Here is the workflow action:
{code}
<action name="run-mahout-item_info_catalog_category_id">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
      <delete path="${nameNode}/staging/working/mahout/run-mahout-item_info_catalog_category_id/out"/>
    </prepare>
    <configuration>
      <property>
        <name>mapred.queue.name</name>
        <value>default</value>
      </property>
    </configuration>
    <main-class>org.apache.mahout.vectorizer.SparseVectorsFromSequenceFilesDirtyHack</main-class>
    <arg>-Ddfs.blocksize=1m</arg>
    <arg>--input</arg>
    <arg>${nameNode}/staging/working/mahout/prepare-item_info_catalog_category_id/out</arg>
    <arg>--output</arg>
    <arg>${nameNode}/staging/working/mahout/run-mahout-item_info_catalog_category_id/out</arg>
    <arg>-ow</arg>
    <arg>-x</arg>
    <arg>70</arg>
    <arg>-ng</arg>
    <arg>4</arg>
    <arg>-n</arg>
    <arg>2</arg>
    <arg>-seq</arg>
    <arg>-wt</arg>
    <arg>TFIDF</arg>
  </java>
  <ok to="mahout-join-node"/>
  <error to="kill"/>
</action>
{code}


