Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
I have now tested on a fresh cluster of Cloudera 5.2; Mahout 0.9 comes installed with it. My input data is just five tab-separated lines. I typed this data in myself, so I do not expect anything else in it:

1	100	1
1	200	5
1	400	1
2	200	2
2	300	1

I use the following Mahout command for factorization:

mahout parallelALS --input /user/ashokharnal/mydata --output /user/ashokharnal/outdata --lambda 0.1 --implicitFeedback true --alpha 0.8 --numFeatures 2 --numIterations 5 --numThreadsPerSolver 1 --tempDir /tmp/ratings

I then create the following two-line, tab-separated test file. Again, I typed it out myself, so no text strings are expected:

1	100
2	200

This file was then converted to sequence format, as:

mahout seqdirectory -i /user/ashokharnal/testdata -ow -o /user/ashokharnal/seqfiles

Finally, I ran the following command to get recommendations:

mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures /user/ashokharnal/outdata/U/ --itemFeatures /user/ashokharnal/outdata/M/ --numRecommendations 1 --output recommendations --maxRating 1

I get the same error. The full error trace is below:

$ mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures /user/ashokharnal/outdata/U/ --itemFeatures /user/ashokharnal/outdata/M/ --numRecommendations 1 --output recommendations --maxRating 1
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/mahout/mahout-examples-0.9-cdh5.2.0-job.jar
14/11/25 13:48:46 WARN driver.MahoutDriver: No recommendfactorized.props found on classpath, will use command-line arguments only
14/11/25 13:48:46 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/ashokharnal/seqfiles], --itemFeatures=[/user/ashokharnal/outdata/M/], --maxRating=[1], --numRecommendations=[1], --numThreads=[1], --output=[recommendations], --startPhase=[0], --tempDir=[temp], --userFeatures=[/user/ashokharnal/outdata/U/]}
14/11/25 13:48:47 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/11/25 13:48:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/11/25 13:48:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/11/25 13:48:47 INFO input.FileInputFormat: Total input paths to process : 1
14/11/25 13:48:48 WARN conf.Configuration: file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml: an attempt to override final parameter: hadoop.ssl.keystores.factory.class; Ignoring.
14/11/25 13:48:48 WARN conf.Configuration: file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml: an attempt to override final parameter: hadoop.ssl.client.conf; Ignoring.
14/11/25 13:48:48 WARN conf.Configuration: file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml: an attempt to override final parameter: hadoop.ssl.server.conf; Ignoring.
14/11/25 13:48:48 WARN conf.Configuration: file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml: an attempt to override final parameter: hadoop.ssl.require.client.cert; Ignoring.
14/11/25 13:48:48 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/11/25 13:48:48 INFO mapred.JobClient: Running job: job_local2071551631_0001
14/11/25 13:48:48 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/11/25 13:48:48 INFO mapred.LocalJobRunner: Waiting for map tasks
14/11/25 13:48:48 INFO mapred.LocalJobRunner: Starting task: attempt_local2071551631_0001_m_00_0
14/11/25 13:48:48 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/11/25 13:48:48 INFO util.ProcessTree: setsid exited with exit code 0
14/11/25 13:48:48 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4e7f1fc4
14/11/25 13:48:48 INFO mapred.MapTask: Processing split: hdfs://bigdata1:8020/user/ashokharnal/seqfiles/part-m-0:0+196
14/11/25 13:48:48 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/25 13:48:48 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/11/25 13:48:48 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/25 13:48:48 WARN mapred.LocalJobRunner: job_local2071551631_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable at
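[Editorial note: the tab-separated ratings file described above can be built and sanity-checked from the shell before loading it into HDFS. A sketch only; the local file name and the HDFS destination are illustrative.]

```shell
# Build the five-line ratings file (userID<TAB>itemID<TAB>rating),
# the plain-text format parallelALS parses.
printf '1\t100\t1\n1\t200\t5\n1\t400\t1\n2\t200\t2\n2\t300\t1\n' > mydata.tsv

# Sanity check: every line must have exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad }' mydata.tsv && echo "format OK"

# Then load it into HDFS before factorizing (illustrative path):
# hdfs dfs -put mydata.tsv /user/ashokharnal/mydata
```

Checking the field count up front catches exactly the kind of accidental formatting problem being ruled out in the message above.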
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
The problem is that seqdirectory doesn't do what you want. From the documentation page: "The output of seqdirectory will be a SequenceFile<Text, Text> of all documents (/sub-directory-path/documentFileName, documentText)." Please see http://mahout.apache.org/users/basics/creating-vectors-from-text.html for more details.

Sent from my iPhone

On Nov 25, 2014, at 10:35, Ashok Harnal ashokhar...@gmail.com wrote:
[quoted message and error trace trimmed; identical to the post above]
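[Editorial note: to make the point above concrete, seqdirectory emits one (Text, Text) record per input file, keyed by the file's path. The shell loop below only simulates that pairing locally, without invoking Mahout; the directory and file names are illustrative. Because the keys are strings rather than integer user IDs, the Text-to-IntWritable cast in recommendfactorized fails.]

```shell
# Simulate, locally, the (key, value) pairs seqdirectory would produce
# from a one-file input directory: key = file path, value = file text.
mkdir -p testdata
printf '1\t100\n2\t200\n' > testdata/users.txt

for f in testdata/*; do
  # Both key and value are plain strings (Text), never IntWritable.
  printf 'key=%s\n' "/$f"
done
```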
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Thank you for the reply. I proceeded as per the example listed on the Apache Mahout help page at https://mahout.apache.org/users/recommender/intro-als-hadoop.html. As per Step 4 of that page, after creation of the sequence file, one issues the following command:

$ mahout recommendfactorized --input $als_input --userFeatures $als_output/U/ --itemFeatures $als_output/M/ --numRecommendations 1 --output recommendations --maxRating 1

Now, the folders 'U' and 'M' mentioned in the above command are created during the process of sequence file creation by Mahout, as per the following command:

$ mahout seqdirectory -i /user/ashokharnal/testdata -ow -o /user/ashokharnal/seqfiles

Since these very names were used in the example, I thought nothing more was required to be done in creating the sequence file. What further steps are needed? Please suggest a simple shell command.

Thanks,
Ashok Kumar Harnal

On 25 November 2014 at 14:52, Gokhan Capan gkhn...@gmail.com wrote:
[quoted message trimmed; see Gokhan's reply above]
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
If I'm not missing something, the documentation at that link doesn't say anything about using seqdirectory. I don't remember how it works in 0.7, but it basically says: given a file of lines of userId\titemId\trating,

1. run mahout parallelALS
2. run mahout recommendfactorized

The input file for the 2nd step is the output of the first one.

Hope this helps,
Gokhan

On Tue, Nov 25, 2014 at 4:03 PM, Ashok Harnal ashokhar...@gmail.com wrote:
[quoted message trimmed; see the reply above]
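[Editorial note: the two-step recipe described above can be sketched as a script. Hedged heavily: the mahout invocations are commented out because they require a configured cluster, all paths are illustrative, and pointing --input at a userRatings sequence file under --tempDir is an assumption about where parallelALS keeps its IntWritable-keyed per-user vectors, not something confirmed in this thread.]

```shell
# Step 0: plain-text ratings, one userID<TAB>itemID<TAB>rating per line.
printf '1\t100\t1\n1\t200\t5\n2\t200\t2\n' > ratings.tsv
# hdfs dfs -put ratings.tsv /user/ashokharnal/mydata

# Step 1: factorize the ratings matrix.
# mahout parallelALS --input /user/ashokharnal/mydata \
#   --output /user/ashokharnal/outdata --tempDir /tmp/ratings \
#   --lambda 0.1 --numFeatures 2 --numIterations 5

# Step 2: recommend. Reuse a sequence file that parallelALS wrote itself
# (assumed here to be <tempDir>/userRatings); no seqdirectory step at all.
# mahout recommendfactorized --input /tmp/ratings/userRatings \
#   --userFeatures /user/ashokharnal/outdata/U/ \
#   --itemFeatures /user/ashokharnal/outdata/M/ \
#   --numRecommendations 1 --maxRating 1 --output recommendations
echo "sketch only: uncomment the mahout lines on a real cluster"
```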
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Well, I have tried again. The Mahout documentation at https://mahout.apache.org/users/recommender/intro-als-hadoop.html says that once the user and item features have been obtained, we proceed as follows:

1. For the users we now want to make recommendations for, we list them in a sequence file with two fields: userid and itemid.
2. Feed the user features, the item features and the file in (1) to recommendfactorized to make the recommendations.

If I feed recommendfactorized just a plain text file of userids and itemids, an error is generated that the input is not in sequence file format. So I used the mahout seqdirectory command to create a sequence directory. But this time, as suggested by you, I pointed the command at the sequence file and not at the sequence directory:

mahout recommendfactorized --input /home/ashokharnal/useless/seqfiles/part-m-0 --userFeatures /user/ashokharnal/useless/outdata/U/ --itemFeatures /user/ashokharnal/useless/outdata/M/ --numRecommendations 1 --output recommendations --maxRating 1

(part-m-0 is the sequence file inside the sequence directory.) I get the same error as earlier. There is no respite.

Thanks,
Ashok Kumar Harnal

On 25 November 2014 at 19:48, Gokhan Capan gkhn...@gmail.com wrote:
[quoted message trimmed; see Gokhan's reply above]
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Looks like maybe a mismatch between the Mahout version you compiled your code against and the Mahout version installed on the cluster?

On Nov 24, 2014, at 8:08 AM, Ashok Harnal ashokhar...@gmail.com wrote:

Thanks for the reply. Here are the facts:

1. I am using the mahout shell command and not a Java program, so I am not passing any arguments to a map function.

2. I am using Hadoop. The input training file is loaded into Hadoop. It is the tab-separated 'u1.base' file of the MovieLens dataset, something like the sample below. All users are there, along with whatever ratings they have given:

1	1	5
1	2	3
1	3	4
1	4	3
1	5	3
:
:
2	1	4
2	10	2
2	14	4
:
:

3. I use the following mahout command to build the model:

mahout parallelALS --input /user/ashokharnal/u1.base --output /user/ashokharnal/u1.out --lambda 0.1 --implicitFeedback true --alpha 0.8 --numFeatures 15 --numIterations 10 --numThreadsPerSolver 1 --tempDir /tmp/ratings

4. My test file is just a two-line, tab-separated file as below:

1	1
2	1

5. This file is converted to a sequence file using the following mahout command:

mahout seqdirectory -i /user/ashokharnal/ufind2.test -o /user/ashokharnal/seqfiles

6. I then run the following mahout command:

mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures /user/ashokharnal/u1.out/U/ --itemFeatures /user/akh/u1.out/M/ --numRecommendations 1 --output /tmp/reommendation --maxRating 1

7. I am using CentOS 6.5 with Cloudera 5.2 installed.

The error messages are as below:

14/11/24 18:06:48 INFO mapred.MapTask: Processing split: hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+195
14/11/24 18:06:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/24 18:06:49 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/11/24 18:06:49 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/24 18:06:49 WARN mapred.LocalJobRunner: job_local1177125820_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
14/11/24 18:06:49 INFO mapred.JobClient: map 0% reduce 0%
14/11/24 18:06:49 INFO mapred.JobClient: Job complete: job_local1177125820_0001
14/11/24 18:06:49 INFO mapred.JobClient: Counters: 0
14/11/24 18:06:49 INFO driver.MahoutDriver: Program took 2529 ms (Minutes: 0.04215)
14/11/24 18:06:49 ERROR hdfs.DFSClient: Failed to close inode 24733
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/reommendation/_temporary/_attempt_local1177125820_0001_m_00_0/part-m-0 (inode 24733): File does not exist. Holder DFSClient_NONMAPREDUCE_157704469_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:219)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:520)
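[Editorial note: the MovieLens u1.base file actually carries four tab-separated fields per line (userID, itemID, rating, timestamp). The documented parallelALS input format is the three-field userID/itemID/rating form, so trimming the trailing timestamp column is a safe normalization before training. A sketch with two illustrative sample lines; on a real run, feed the actual u1.base.]

```shell
# Illustrative u1.base-style lines: userID<TAB>itemID<TAB>rating<TAB>timestamp.
printf '1\t1\t5\t874965758\n1\t2\t3\t876893171\n' > u1.base

# Keep only the first three fields, the format shown in the Mahout docs.
cut -f1-3 u1.base > u1.ratings
cat u1.ratings
```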
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Thanks for reply. I did not compile mahout. Mahout 0.9 comes along with Cloudera 5.2. Ashok Kumar Harnal On 24 November 2014 at 18:42, jayunit...@gmail.com wrote: Looks like maybe a mismatch between mahout version you compiled code against and the mahout version installed in the cluster? On Nov 24, 2014, at 8:08 AM, Ashok Harnal ashokhar...@gmail.com wrote: Thanks for reply. Here are the facts: 1. I am using mahout shell command and not a java program. So I am not passing any arguments to map function. 2. I am using hadoop. Input training file is loaded in hadoop. It is a tab separated 'u1.base' file of MovieLens dataset. It is something like below. All users are there along with whatever ratings they have given. 115 123 134 143 153 : : 214 2102 2144 : : 3. I use the following mahout command to build model: mahout parallelALS --input /user/ashokharnal/u1.base --output /user/ashokharnal/u1.out --lambda 0.1 --implicitFeedback true --alpha 0.8 --numFeatures 15 --numIterations 10 --numThreadsPerSolver 1 --tempDir /tmp/ratings 4. My test file is just two-lines tab-separated file as below: 11 21 5. This file is converted to sequence file using the following mahout command: mahout seqdirectory -i /user/ashokharnal/ufind2.test -o /user/ashokharnal/seqfiles 6. I then run the following mahout command: mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures /user/ashokharnal/u1.out/U/ --itemFeatures /user/akh/u1.out/M/ --numRecommendations 1 --output /tmp/reommendation --maxRating 1 7. I am using CentOS 6.5 with Cloudera 5.2 installed. The error messages are as below: 14/11/24 18:06:48 INFO mapred.MapTask: Processing split: hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+195 14/11/24 18:06:49 INFO zlib.ZlibFactory: Successfully loaded initialized native-zlib library 14/11/24 18:06:49 INFO compress.CodecPool: Got brand-new decompressor [.deflate] 14/11/24 18:06:49 INFO mapred.LocalJobRunner: Map task executor complete. 
14/11/24 18:06:49 WARN mapred.LocalJobRunner: job_local1177125820_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406) Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151) at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268) 14/11/24 18:06:49 INFO mapred.JobClient: map 0% reduce 0% 14/11/24 18:06:49 INFO mapred.JobClient: Job complete: job_local1177125820_0001 14/11/24 18:06:49 INFO mapred.JobClient: Counters: 0 14/11/24 18:06:49 INFO driver.MahoutDriver: Program took 2529 ms (Minutes: 0.04215) 14/11/24 18:06:49 ERROR hdfs.DFSClient: Failed to close inode 24733 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/reommendation/_temporary/_attempt_local1177125820_0001_m_00_0/part-m-0 (inode 24733): File does not exist. Holder DFSClient_NONMAPREDUCE_157704469_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
	at
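Since the cast fails on the record key type, one quick diagnostic is to look at the key and value class names recorded in the sequence file's header (`mahout seqdumper -i <path>` prints them as well). The sketch below is a minimal, hypothetical reader, not part of Mahout; it assumes the standard Hadoop SequenceFile header layout (magic `SEQ`, one version byte, then the two class names, each prefixed by a Hadoop vint length, which is a single byte for names shorter than 128 characters):

```python
def read_seqfile_classes(path):
    """Return (key_class, value_class) from a Hadoop SequenceFile header.

    Sketch only: handles the common case where each class-name length
    fits in a single vint byte, which covers the Hadoop Writable names.
    """
    def read_vint_string(f):
        n = f.read(1)[0]
        if n > 127:
            raise ValueError("multi-byte vint length; not handled by this sketch")
        return f.read(n).decode("utf-8")

    with open(path, "rb") as f:
        if f.read(3) != b"SEQ":
            raise ValueError("not a SequenceFile")
        f.read(1)  # version byte
        return read_vint_string(f), read_vint_string(f)
```

If the key class prints as org.apache.hadoop.io.Text rather than org.apache.hadoop.io.IntWritable, the file was written the way seqdirectory writes it, not the way recommendfactorized expects to read it.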
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
The error message that you got indicates that some input was textual when it needed to be an integer. Is there a chance that the type of some of your input is incorrect in your sequence files?

On Mon, Nov 24, 2014 at 3:47 PM, Ashok Harnal ashokhar...@gmail.com wrote:

Thanks for the reply. I did not compile Mahout. Mahout 0.9 comes along with Cloudera 5.2.

Ashok Kumar Harnal
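The diagnosis above can be seen mechanically: seqdirectory keys each record by file name as a Text, while the prediction job reads its input keys as IntWritable user IDs, so the very first record fails. A toy Python analogue follows; the class names here merely mirror the Hadoop Writables, and none of this is Mahout code:

```python
class Text(str):
    """Stand-in for org.apache.hadoop.io.Text (a string key)."""

class IntWritable(int):
    """Stand-in for org.apache.hadoop.io.IntWritable (an integer key)."""

def predict_for_user(key):
    # Like the prediction mapper, insist the record key is an integer
    # user ID; a Text key (e.g. a file name written by seqdirectory)
    # fails before any recommendation work happens.
    if not isinstance(key, IntWritable):
        raise TypeError(f"{type(key).__name__} cannot be cast to IntWritable")
    return int(key)
```

In the Java job the failure happens at the implicit cast generated from the mapper's generic signature, which is why it surfaces as a ClassCastException rather than as a parse error.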
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Thanks for the reply. I will recheck and repeat the experiment using self-typed input. I am reinstalling Cloudera 5.2.

Ashok Kumar Harnal

On 24 November 2014 at 21:38, Ted Dunning ted.dunn...@gmail.com wrote:

The error message that you got indicates that some input was textual when it needed to be an integer. Is there a chance that the type of some of your input is incorrect in your sequence files?
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
I upgraded to Mahout 0.9. The same error persists. Here is the full dump. Incidentally, I am using the local file system and not Hadoop.

[ashokharnal@master ~]$ mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures $res_out_file/U/ --itemFeatures $res_out_file/M/ --numRecommendations 1 --output /tmp/reommendation --maxRating 1
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/mahout/mahout-examples-0.9-cdh5.2.0-job.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/11/23 17:51:35 WARN driver.MahoutDriver: No recommendfactorized.props found on classpath, will use command-line arguments only
14/11/23 17:51:35 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/ashokharnal/seqfiles], --itemFeatures=[/user/ashokharnal/uexp.out/M/], --maxRating=[1], --numRecommendations=[1], --numThreads=[1], --output=[/tmp/reommendation], --startPhase=[0], --tempDir=[temp], --userFeatures=[/user/ashokharnal/uexp.out/U/]}
14/11/23 17:51:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/11/23 17:51:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/11/23 17:51:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/11/23 17:51:36 INFO input.FileInputFormat: Total input paths to process : 1
14/11/23 17:51:37 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/11/23 17:51:37 INFO mapred.JobClient: Running job: job_local1520101691_0001
14/11/23 17:51:37 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/11/23 17:51:37 INFO mapred.LocalJobRunner: Waiting for map tasks
14/11/23 17:51:37 INFO mapred.LocalJobRunner: Starting task: attempt_local1520101691_0001_m_00_0
14/11/23 17:51:37 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/11/23 17:51:37 INFO util.ProcessTree: setsid exited with exit code 0
14/11/23 17:51:37 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3f7b4c84
14/11/23 17:51:37 INFO mapred.MapTask: Processing split: hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+194
14/11/23 17:51:37 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/23 17:51:37 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/11/23 17:51:37 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/23 17:51:37 WARN mapred.LocalJobRunner: job_local1520101691_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
14/11/23 17:51:38 INFO mapred.JobClient: map 0% reduce 0%
14/11/23 17:51:38 INFO
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Can you paste a sample of your input data? The exception is this:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)

On Nov 23, 2014, at 4:31 AM, Ashok Harnal ashokhar...@gmail.com wrote:

I upgraded to Mahout 0.9. The same error persists. Here is the full dump. Incidentally, I am using the local file system and not Hadoop.
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
Please upgrade to Mahout version 0.9, as many things have been fixed since.

On Nov 22, 2014, at 7:00 PM, Ashok Harnal ashokhar...@gmail.com wrote:

I use Mahout 0.7 installed in Cloudera. After creating the user-feature and item-feature matrix in HDFS, I run the following command:

mahout recommendfactorized --input /user/ashokharnal/seqfiles --userFeatures $res_out_file/U/ --itemFeatures $res_out_file/M/ --numRecommendations 1 --output $reommendation --maxRating 1

After some time, I get the following error:

:
:
14/11/23 08:28:20 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/23 08:28:20 WARN mapred.LocalJobRunner: job_local954305987_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)

Not sure what is wrong. Request help.

Ashok Kumar Harnal

--
Visit my blog at: http://ashokharnal.wordpress.com/
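For anyone hitting this thread later: the training input to parallelALS is plain text, one tab-separated userID/itemID/rating triple per line, and a single non-numeric or short line is enough to derail the pipeline. A small pre-flight check like the following (a hypothetical helper, not part of Mahout) can flag malformed lines before training:

```python
def check_ratings_file(path):
    """Return (line_number, line) pairs that do not parse as
    userID<TAB>itemID<TAB>rating, the layout parallelALS expects."""
    bad = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            parts = line.rstrip("\n").split("\t")
            try:
                # IDs must be integers, the rating numeric.
                int(parts[0]), int(parts[1]), float(parts[2])
            except (IndexError, ValueError):
                bad.append((lineno, line.rstrip("\n")))
    return bad
```

An empty result means every line at least has the right shape; it does not rule out the separate key-type problem introduced by converting the file with seqdirectory.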