[ https://issues.apache.org/jira/browse/MAHOUT-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Palumbo updated MAHOUT-1627:
-----------------------------------
    Labels: legacy  (was: )

> Problem with ALS Factorizer MapReduce version when working with oozie because
> of files in distributed cache. Error: Unable to read sequence file from cache.
> ------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1627
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1627
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 1.0
>         Environment: Hadoop
>            Reporter: Srinivasarao Daruna
>              Labels: legacy
>
> There is a problem with the ALS Factorizer when working in a distributed
> environment with oozie.
> Steps:
> 1) Built the mahout 1.0 jars and picked the mahout-mrlegacy jar.
> 2) Created a Java class that calls ParallelALSFactorizationJob with the
> respective inputs.
> 3) Submitted the job; a series of MapReduce jobs were launched to perform
> the factorization.
> 4) The job failed at MultithreadedSharingMapper with the error "Unable to
> read sequence file <ourprogram>.jar", pointing at the
> readMatrixByRowsFromDistributedCache method of
> org.apache.mahout.cf.taste.hadoop.als.ALS.
> Cause: The ALS class picks up its input files, which are sequence files, from
> the distributed cache via readMatrixByRowsFromDistributedCache. However, in
> an oozie environment the program jar is also copied to the distributed cache
> along with the input files. Because the ALS class tries to read every file in
> the distributed cache, it fails when it encounters the jar.
> The remedy would be a condition that skips any files that are jars.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
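The remedy proposed in the report could be sketched as a filename filter applied to the cached file list before any path is opened as a sequence file. This is only an illustration, not Mahout's actual code: the class name `CacheFileFilter` and both method names are hypothetical, and the paths in `main` are made-up examples standing in for factor-matrix shards and the jar that oozie ships to the cache.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: skip jar files when scanning distributed-cache paths. */
public class CacheFileFilter {

    /** True for paths that should be read as sequence files, i.e. anything that is not a jar. */
    static boolean isSequenceFileCandidate(String path) {
        return !path.toLowerCase().endsWith(".jar");
    }

    /** Keep only the non-jar entries from the cached file list. */
    static List<String> filterCacheFiles(String[] cachedPaths) {
        List<String> result = new ArrayList<>();
        for (String p : cachedPaths) {
            if (isSequenceFileCandidate(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example cache contents: two sequence-file shards plus the application jar
        // that oozie copied alongside them (all paths are illustrative).
        String[] cached = {
            "/cache/U--1/part-m-00000",
            "/cache/ourprogram.jar",
            "/cache/M--1/part-m-00001"
        };
        System.out.println(filterCacheFiles(cached));
        // prints [/cache/U--1/part-m-00000, /cache/M--1/part-m-00001]
    }
}
```

With such a filter in place, readMatrixByRowsFromDistributedCache would iterate only over the surviving paths, so the presence of the program jar in the cache would no longer abort the MultithreadedSharingMapper.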