Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Looks like maybe a mismatch between the Mahout version you compiled code against and the Mahout version installed in the cluster?

On Nov 24, 2014, at 8:08 AM, Ashok Harnal ashokhar...@gmail.com wrote:

Thanks for reply. Here are the facts:

1. I am using the mahout shell command and not a Java program, so I am not passing any arguments to a map function.

2. I am using Hadoop. The input training file is loaded into Hadoop. It is the tab-separated 'u1.base' file of the MovieLens dataset, with one userID, itemID, rating triple per line. It is something like below:

1	1	5
1	2	3
1	3	4
1	4	3
1	5	3
:	:
2	1	4
2	10	2
2	14	4
:	:

All users are there, along with whatever ratings they have given.

3. I use the following mahout command to build the model:

mahout parallelALS --input /user/ashokharnal/u1.base --output /user/ashokharnal/u1.out --lambda 0.1 --implicitFeedback true --alpha 0.8 --numFeatures 15 --numIterations 10 --numThreadsPerSolver 1 --tempDir /tmp/ratings

4. My test file is just a two-line tab-separated file, as below:

1	1
2	1

5. This file is converted to a sequence file using the following mahout command:

mahout seqdirectory -i /user/ashokharnal/ufind2.test -o /user/ashokharnal/seqfiles

6. I then run the following mahout command:

mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures /user/ashokharnal/u1.out/U/ --itemFeatures /user/akh/u1.out/M/ --numRecommendations 1 --output /tmp/reommendation --maxRating 1

7. I am using CentOS 6.5 with Cloudera 5.2 installed.

The error messages are as below:

14/11/24 18:06:48 INFO mapred.MapTask: Processing split: hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+195
14/11/24 18:06:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/24 18:06:49 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/11/24 18:06:49 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/24 18:06:49 WARN mapred.LocalJobRunner: job_local1177125820_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
14/11/24 18:06:49 INFO mapred.JobClient: map 0% reduce 0%
14/11/24 18:06:49 INFO mapred.JobClient: Job complete: job_local1177125820_0001
14/11/24 18:06:49 INFO mapred.JobClient: Counters: 0
14/11/24 18:06:49 INFO driver.MahoutDriver: Program took 2529 ms (Minutes: 0.04215)
14/11/24 18:06:49 ERROR hdfs.DFSClient: Failed to close inode 24733
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/reommendation/_temporary/_attempt_local1177125820_0001_m_00_0/part-m-0 (inode 24733): File does not exist. Holder DFSClient_NONMAPREDUCE_157704469_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:219)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:520)
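[Editor's note: for context, each line of the u1.base training input above is expected to parse as a numeric userID, itemID, rating triple separated by tabs. The following is a minimal sketch of that format check; the RatingLine class is hypothetical, not part of Mahout.]

```java
public class RatingLine {
    final int userId;
    final int itemId;
    final double rating;

    RatingLine(String line) {
        // parallelALS reads one tab-separated userID, itemID, rating triple per line.
        String[] f = line.trim().split("\t");
        if (f.length < 3) {
            throw new IllegalArgumentException("expected userID\\titemID\\trating, got: " + line);
        }
        this.userId = Integer.parseInt(f[0]);
        this.itemId = Integer.parseInt(f[1]);
        this.rating = Double.parseDouble(f[2]);
    }

    public static void main(String[] args) {
        // First line of u1.base as quoted in the message above.
        RatingLine r = new RatingLine("1\t1\t5");
        System.out.println("user=" + r.userId + " item=" + r.itemId + " rating=" + r.rating);
    }
}
```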
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Thanks for reply. I did not compile mahout. Mahout 0.9 comes along with Cloudera 5.2.

Ashok Kumar Harnal

On 24 November 2014 at 18:42, jayunit...@gmail.com wrote:
> Looks like maybe a mismatch between the Mahout version you compiled code against and the Mahout version installed in the cluster?
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
The error message that you got indicated that some input was textual and needed to be an integer. Is there a chance that the type of some of your input is incorrect in your sequence files?

On Mon, Nov 24, 2014 at 3:47 PM, Ashok Harnal ashokhar...@gmail.com wrote:
> Thanks for reply. I did not compile mahout. Mahout 0.9 comes along with Cloudera 5.2.
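[Editor's note: the stack trace above pins the failure to the cast at PredictionMapper.map(PredictionMapper.java:44), which received a Text key where an IntWritable was expected; `mahout seqdirectory` writes Text keys and values, which is consistent with that. The sketch below reproduces the same failure mode in pure Java, with no Hadoop dependencies; the nested classes are stand-ins for org.apache.hadoop.io.Text and IntWritable, not the real writables.]

```java
public class KeyTypeCheck {
    // Hypothetical stand-ins for the Hadoop writable types named in the trace.
    static final class Text { final String v; Text(String v) { this.v = v; } }
    static final class IntWritable { final int v; IntWritable(int v) { this.v = v; } }

    // Mimics a mapper that blindly casts its incoming key to IntWritable,
    // as PredictionMapper does at the line cited in the stack trace.
    static int readUserId(Object key) {
        return ((IntWritable) key).v;  // throws ClassCastException on a Text key
    }

    public static void main(String[] args) {
        Object goodKey = new IntWritable(1); // what the prediction job expects
        Object badKey = new Text("1");       // what seqdirectory produces
        System.out.println("good key -> user " + readUserId(goodKey));
        try {
            readUserId(badKey);
        } catch (ClassCastException e) {
            System.out.println("bad key -> ClassCastException, as in the job log");
        }
    }
}
```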
Re: Bi-Factorization vs Tri-Factorization for recommender systems
There is no inherent mathematical difference, but there may be some pretty significant practical differences. Using the three-matrix form (X = USV') puts the normalization constants into a place where you can control them a bit more easily. This can be useful if you want *both* user and item vectors that are normalized. If you only want item vectors, then it really doesn't matter, since you can incorporate as much of S into the item vectors as you like and the rest winds up in the factor that you aren't looking at anyway.

On Thu, Nov 20, 2014 at 1:34 AM, Parimi Rohit rohit.par...@gmail.com wrote:

Hi All,

Are there any (dis)advantages of using tri-factorization (||X - USV'||) as opposed to bi-factorization (||X - UV'||) for recommender systems? I have been reading a lot about tri-factorization and how it can be seen as co-clustering of rows and columns, and was wondering if such a technique is implemented in Mahout.

Also, I am particularly interested in implicit-feedback datasets, and the only MF approach I am aware of is the ALS-WR for implicit feedback data implemented in Mahout. Are there any other MF techniques? If not, is it possible (and useful) to extend some tri-factorization to handle implicit feedback along the lines of "Collaborative Filtering for Implicit Feedback Datasets" (the approach implemented in Mahout)?

I apologize for any inconvenience, as this question is very general and might not be relevant to Mahout. I would really appreciate any thoughts/feedback.

Thanks,
Rohit
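[Editor's note: the point about absorbing S can be written out explicitly; the notation below is added for illustration and is not from the thread. For any exponent alpha in [0, 1], the three-matrix form collapses to a two-matrix form:]

```latex
X = U S V^{\top}
  = \underbrace{\left(U S^{\alpha}\right)}_{\text{user factors}}
    \underbrace{\left(S^{1-\alpha} V^{\top}\right)}_{\text{item factors}},
\qquad \alpha \in [0, 1].
```

With alpha = 0 the user factors stay normalized and all of S is absorbed into the item factors; with alpha = 1 the reverse holds. Any bi-factorization X ≈ UV' implicitly fixes one such split, which is why the two forms are mathematically equivalent but differ in where the scale ends up.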
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Thanks for the reply. I will recheck and repeat the experiment using self-typed input. I am reinstalling Cloudera 5.2.

Ashok Kumar Harnal

On 24 November 2014 at 21:38, Ted Dunning ted.dunn...@gmail.com wrote:
> The error message that you got indicated that some input was textual and needed to be an integer. Is there a chance that the type of some of your input is incorrect in your sequence files?