Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-25 Thread Ashok Harnal
I have now tested on a fresh cluster of Cloudera 5.2. Mahout 0.9 comes
installed with it.

My input data is just five tab-separated lines. I typed this data in
myself, so I do not expect anything unexpected in it.

1	100	1
1	200	5
1	400	1
2	200	2
2	300	1
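Before factorizing, a file in this userID, itemID, rating layout can be sanity-checked from the shell. The sketch below is illustrative only: the file name ratings.tsv and the sample rows are made up, and the check simply counts lines that are not three tab-separated numeric fields.

```shell
# Write a small sample in the userID<TAB>itemID<TAB>rating layout
# (file name and rows are illustrative, not the thread's actual data).
printf '1\t100\t1\n1\t200\t5\n' > ratings.tsv

# Print the number of malformed lines: anything that is not exactly
# three tab-separated fields with numeric IDs and a numeric rating.
awk -F'\t' '!(NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+(\.[0-9]+)?$/) { bad++ } END { print bad + 0 }' ratings.tsv
```

A non-zero count means at least one line would not parse as a numeric (user, item, rating) triple.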

I use the following Mahout command for factorization:

mahout parallelALS --input /user/ashokharnal/mydata --output
/user/ashokharnal/outdata --lambda 0.1 --implicitFeedback true --alpha 0.8
--numFeatures 2 --numIterations 5  --numThreadsPerSolver 1 --tempDir
/tmp/ratings

I then create the following two-line, tab-separated test file.

1   100
2   200

I typed this out myself, so no text strings are expected.

This file was then converted to sequence format, as:

mahout seqdirectory -i /user/ashokharnal/testdata -ow -o
/user/ashokharnal/seqfiles

Finally, I ran the following command to get recommendations:

mahout recommendfactorized --input /user/ashokharnal/seqfiles
--userFeatures /user/ashokharnal/outdata/U/ --itemFeatures
/user/ashokharnal/outdata/M/ --numRecommendations 1 --output
recommendations --maxRating 1

I get the same error. The full error trace is below:


$ mahout recommendfactorized --input /user/ashokharnal/seqfiles
--userFeatures /user/ashokharnal/outdata/U/ --itemFeatures
/user/ashokharnal/outdata/M/ --numRecommendations 1 --output
recommendations --maxRating 1
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop and
HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: 
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/mahout/mahout-examples-0.9-cdh5.2.0-job.jar
14/11/25 13:48:46 WARN driver.MahoutDriver: No
recommendfactorized.props found on classpath, will use command-line
arguments only
14/11/25 13:48:46 INFO common.AbstractJob: Command line arguments:
{--endPhase=[2147483647], --input=[/user/ashokharnal/seqfiles],
--itemFeatures=[/user/ashokharnal/outdata/M/], --maxRating=[1],
--numRecommendations=[1], --numThreads=[1],
--output=[recommendations], --startPhase=[0], --tempDir=[temp],
--userFeatures=[/user/ashokharnal/outdata/U/]}
14/11/25 13:48:47 INFO Configuration.deprecation: session.id is
deprecated. Instead, use dfs.metrics.session-id
14/11/25 13:48:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
14/11/25 13:48:47 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
14/11/25 13:48:47 INFO input.FileInputFormat: Total input paths to process : 1
14/11/25 13:48:48 WARN conf.Configuration:
file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml:an
attempt to override final parameter:
hadoop.ssl.keystores.factory.class;  Ignoring.
14/11/25 13:48:48 WARN conf.Configuration:
file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml:an
attempt to override final parameter: hadoop.ssl.client.conf;
Ignoring.
14/11/25 13:48:48 WARN conf.Configuration:
file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml:an
attempt to override final parameter: hadoop.ssl.server.conf;
Ignoring.
14/11/25 13:48:48 WARN conf.Configuration:
file:/tmp/hadoop-bigdata1/mapred/local/localRunner/bigdata1/job_local2071551631_0001/job_local2071551631_0001.xml:an
attempt to override final parameter: hadoop.ssl.require.client.cert;
Ignoring.
14/11/25 13:48:48 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/11/25 13:48:48 INFO mapred.JobClient: Running job: job_local2071551631_0001
14/11/25 13:48:48 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/11/25 13:48:48 INFO mapred.LocalJobRunner: Waiting for map tasks
14/11/25 13:48:48 INFO mapred.LocalJobRunner: Starting task:
attempt_local2071551631_0001_m_00_0
14/11/25 13:48:48 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
14/11/25 13:48:48 INFO util.ProcessTree: setsid exited with exit code 0
14/11/25 13:48:48 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4e7f1fc4
14/11/25 13:48:48 INFO mapred.MapTask: Processing split:
hdfs://bigdata1:8020/user/ashokharnal/seqfiles/part-m-0:0+196
14/11/25 13:48:48 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/25 13:48:48 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/11/25 13:48:48 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/25 13:48:48 WARN mapred.LocalJobRunner: job_local2071551631_0001
java.lang.Exception: java.lang.RuntimeException:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast
to org.apache.hadoop.io.IntWritable
at 

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-25 Thread Gokhan Capan
The problem is that seqdirectory doesn't do what you want. From the
documentation page:

The output of seqDirectory will be a SequenceFile<Text, Text> of
all documents (/sub-directory-path/documentFileName, documentText).

Please see http://mahout.apache.org/users/basics/creating-vectors-from-text.html
for more details
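One quick way to confirm this is to dump the sequence file and look at its key and value classes. A sketch, assuming the seqdumper utility shipped with this Mahout build, and reusing the thread's HDFS path:

```shell
# Dump the sequence file(s) that seqdirectory produced; the header of the
# dump lists the key and value classes. For seqdirectory output one would
# expect Text keys (the file name) and Text values (the file contents),
# not the IntWritable user IDs that recommendfactorized casts to.
# (Point -i at the part file itself if a directory is not accepted.)
mahout seqdumper -i /user/ashokharnal/seqfiles
```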

Sent from my iPhone

 On Nov 25, 2014, at 10:35, Ashok Harnal ashokhar...@gmail.com wrote:

 I have now tested on a fresh cluster of Cloudera 5.2. Mahout 0.9 comes
 installed with it.

 [...]

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-25 Thread Ashok Harnal
Thank you for the reply.

I proceeded as per the example in the Apache Mahout help page:

https://mahout.apache.org/users/recommender/intro-als-hadoop.html

As per Step 4 at this link, after creating the sequence file, one issues the
following command:

$ mahout recommendfactorized --input $als_input --userFeatures
$als_output/U/ --itemFeatures $als_output/M/ --numRecommendations 1
--output recommendations --maxRating 1

Now, the folders 'U' and 'M' mentioned in the above command were created
by Mahout during sequence file creation, as per the following command:

$mahout seqdirectory -i /user/ashokharnal/testdata -ow -o
/user/ashokharnal/seqfiles

Since these very names were used in the example, I thought nothing more
needed to be done to create the sequence file.

What further steps are needed? Please suggest a simple shell command.

Thanks,

Ashok Kumar Harnal


On 25 November 2014 at 14:52, Gokhan Capan gkhn...@gmail.com wrote:

 The problem is that seqdirectory doesn't do what you want. From the
 documentation page:

 The output of seqDirectory will be a SequenceFile<Text, Text> of
 all documents (/sub-directory-path/documentFileName, documentText).

 Please see
 http://mahout.apache.org/users/basics/creating-vectors-from-text.html
 for more details

 Sent from my iPhone

  [...]

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-25 Thread Gokhan Capan
Unless I'm missing something, the documentation at that link doesn't say
anything about using seqdirectory.

I don't remember how it works in 0.7, but it basically says:
Given a file of lines of userId\titemId\trating,
1- run mahout parallelALS
2- run mahout recommendfactorized

The input file for the 2nd step is the output of the first one.
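Applied to the paths used earlier in the thread, that two-step flow could look like the sketch below. It assumes that this Mahout 0.9 build writes a userRatings sequence file (with IntWritable user-ID keys) under the parallelALS output directory; no seqdirectory step is involved.

```shell
# Step 1: factorize the tab-separated ratings file.
mahout parallelALS --input /user/ashokharnal/mydata \
  --output /user/ashokharnal/outdata \
  --lambda 0.1 --implicitFeedback true --alpha 0.8 \
  --numFeatures 2 --numIterations 5 --numThreadsPerSolver 1 \
  --tempDir /tmp/ratings

# Step 2: recommend, feeding back the userRatings sequence file that
# step 1 wrote (assumed location), instead of seqdirectory output.
mahout recommendfactorized --input /user/ashokharnal/outdata/userRatings \
  --userFeatures /user/ashokharnal/outdata/U/ \
  --itemFeatures /user/ashokharnal/outdata/M/ \
  --numRecommendations 1 --output recommendations --maxRating 1
```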

Hope this helps

Gokhan

On Tue, Nov 25, 2014 at 4:03 PM, Ashok Harnal ashokhar...@gmail.com wrote:

 Thank you for the reply.

  [...]

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-25 Thread Ashok Harnal
Well, I have tried again. The Mahout documentation at this link (
https://mahout.apache.org/users/recommender/intro-als-hadoop.html )
says that once the user and item features have been obtained, we proceed as
follows:

1. For the users we want to make recommendations for, list them in a
sequence file with two fields: userID and itemID.
2. Feed the user features, the item features, and the file from (1) to
recommendfactorized to make recommendations.

Indeed, if I feed recommendfactorized just a plain text file of userIDs and
itemIDs, an error is generated saying the input is not in sequence file
format.

So I used the mahout seqdirectory command to create a sequence directory.
But this time, as you suggested, I ran the following mahout command, i.e.
pointed the command at the sequence file rather than the sequence directory.

mahout recommendfactorized --input
/home/ashokharnal/useless/seqfiles/part-m-0 --userFeatures
/user/ashokharnal/useless/outdata/U/ --itemFeatures
/user/ashokharnal/useless/outdata/M/ --numRecommendations 1 --output
recommendations --maxRating 1

(part-m-0 is the sequence file in the sequence directory).

I get the same error as earlier. There is no respite.

Thanks,

Ashok Kumar Harnal





On 25 November 2014 at 19:48, Gokhan Capan gkhn...@gmail.com wrote:

 If I don't miss it, the documentation in the link doesn't say anything
 about using seqdirectory.

 [...]
Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-24 Thread jayunit100
Looks like maybe a mismatch between the Mahout version you compiled code
against and the Mahout version installed on the cluster?

 On Nov 24, 2014, at 8:08 AM, Ashok Harnal ashokhar...@gmail.com wrote:
 
 Thanks for the reply. Here are the facts:
 
 1. I am using mahout shell command and not a java program. So I am not
 passing any arguments to map function.
 
 2. I am using hadoop. The input training file is loaded into hadoop. It is a
 tab-separated 'u1.base' file of the MovieLens dataset.
 It is something like below. All users are there, along with whatever
 ratings they have given.
 
 1	1	5
 1	2	3
 1	3	4
 1	4	3
 1	5	3
 :
 :
 2	1	4
 2	10	2
 2	14	4
 :
 :
 
 3. I use the following mahout command to build model:
 
  mahout parallelALS --input /user/ashokharnal/u1.base --output
 /user/ashokharnal/u1.out --lambda 0.1 --implicitFeedback true --alpha
 0.8 --numFeatures 15 --numIterations 10  --numThreadsPerSolver 1
 --tempDir /tmp/ratings
 
 4. My test file is just a two-line, tab-separated file, as below:
 
 
 1	1
 2	1
 
 5. This file is converted to sequence file using the following mahout command:
 
 mahout seqdirectory -i /user/ashokharnal/ufind2.test -o
 /user/ashokharnal/seqfiles
 
 6. I then run the following mahout command:
 
 mahout recommendfactorized --input /user/ashokharnal/seqfiles
 --userFeatures  /user/ashokharnal/u1.out/U/ --itemFeatures
 /user/akh/u1.out/M/ --numRecommendations 1 --output /tmp/reommendation
 --maxRating 1
 
 7. I am using CentOS 6.5 with Cloudera 5.2 installed.
 
 The error messages are as below:
 
 14/11/24 18:06:48 INFO mapred.MapTask: Processing split:
 hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+195
 14/11/24 18:06:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
 14/11/24 18:06:49 INFO compress.CodecPool: Got brand-new decompressor 
 [.deflate]
 14/11/24 18:06:49 INFO mapred.LocalJobRunner: Map task executor complete.
 14/11/24 18:06:49 WARN mapred.LocalJobRunner: job_local1177125820_0001
 java.lang.Exception: java.lang.RuntimeException:
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast
 to org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
 Caused by: java.lang.RuntimeException: java.lang.ClassCastException:
 org.apache.hadoop.io.Text cannot be cast to
 org.apache.hadoop.io.IntWritable
at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
at 
 org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
 cannot be cast to org.apache.hadoop.io.IntWritable
at 
 org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
 14/11/24 18:06:49 INFO mapred.JobClient:  map 0% reduce 0%
 14/11/24 18:06:49 INFO mapred.JobClient: Job complete: 
 job_local1177125820_0001
 14/11/24 18:06:49 INFO mapred.JobClient: Counters: 0
 14/11/24 18:06:49 INFO driver.MahoutDriver: Program took 2529 ms
 (Minutes: 0.04215)
 14/11/24 18:06:49 ERROR hdfs.DFSClient: Failed to close inode 24733
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
 /tmp/reommendation/_temporary/_attempt_local1177125820_0001_m_00_0/part-m-0
 (inode 24733): File does not exist. Holder
 DFSClient_NONMAPREDUCE_157704469_1 does not have any open files.
at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
at 
 org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:219)
at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:520)
 

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-24 Thread Ashok Harnal
Thanks for the reply. I did not compile Mahout; Mahout 0.9 comes along with
Cloudera 5.2.

Ashok Kumar Harnal

On 24 November 2014 at 18:42, jayunit...@gmail.com wrote:

 Looks like maybe a mismatch between mahout version you compiled code
 against and the mahout version installed in the cluster?

  On Nov 24, 2014, at 8:08 AM, Ashok Harnal ashokhar...@gmail.com wrote:
 
  Thanks for reply. Here are the facts:
 
  1. I am using mahout shell command and not a java program. So I am not
  passing any arguments to map function.
 
  2. I am using hadoop. Input training file is loaded in hadoop. It is a
 tab
  separated 'u1.base' file of MovieLens dataset.
 It is something like below. All users are there along with whatever
  ratings they have given.
 
  115
  123
  134
  143
  153
  :
  :
  214
  2102
  2144
  :
  :
 
  3. I use the following mahout command to build model:
 
   mahout parallelALS --input /user/ashokharnal/u1.base --output
  /user/ashokharnal/u1.out --lambda 0.1 --implicitFeedback true --alpha
  0.8 --numFeatures 15 --numIterations 10  --numThreadsPerSolver 1
  --tempDir /tmp/ratings
 
  4. My test file is just a two-line tab-separated file, as below:
 
 
  1	1
  2	1
 
  5. This file is converted to a sequence file using the following mahout
  command:
 
  mahout seqdirectory -i /user/ashokharnal/ufind2.test -o
  /user/ashokharnal/seqfiles
 
  6. I then run the following mahout command:
 
  mahout recommendfactorized --input /user/ashokharnal/seqfiles
  --userFeatures  /user/ashokharnal/u1.out/U/ --itemFeatures
  /user/akh/u1.out/M/ --numRecommendations 1 --output /tmp/reommendation
  --maxRating 1
 
  7. I am using CentOS 6.5 with Cloudera 5.2 installed.
 
  The error messages are as below:
 
  14/11/24 18:06:48 INFO mapred.MapTask: Processing split:
  hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+195
  14/11/24 18:06:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
  14/11/24 18:06:49 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
  14/11/24 18:06:49 INFO mapred.LocalJobRunner: Map task executor complete.
  14/11/24 18:06:49 WARN mapred.LocalJobRunner: job_local1177125820_0001
  java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
  Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
  Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
  14/11/24 18:06:49 INFO mapred.JobClient:  map 0% reduce 0%
  14/11/24 18:06:49 INFO mapred.JobClient: Job complete: job_local1177125820_0001
  14/11/24 18:06:49 INFO mapred.JobClient: Counters: 0
  14/11/24 18:06:49 INFO driver.MahoutDriver: Program took 2529 ms (Minutes: 0.04215)
  14/11/24 18:06:49 ERROR hdfs.DFSClient: Failed to close inode 24733
  org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/reommendation/_temporary/_attempt_local1177125820_0001_m_00_0/part-m-0 (inode 24733): File does not exist. Holder DFSClient_NONMAPREDUCE_157704469_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
 
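Incidentally, the cast fails because the keys written by seqdirectory are Text objects (the document names), while PredictionMapper casts each key to an IntWritable user ID. The two Writable types are not even wire-compatible, as this pure-Python illustration of their encodings shows (a sketch of Hadoop's serialization conventions, not Mahout code; the one-byte vint only covers strings under 128 bytes):

```python
import struct

def int_writable_bytes(value):
    # IntWritable serializes as exactly 4 big-endian bytes.
    return struct.pack(">i", value)

def text_writable_bytes(s):
    # Text serializes as a vint length followed by UTF-8 bytes;
    # for strings under 128 bytes the vint is a single byte.
    data = s.encode("utf-8")
    if len(data) > 127:
        raise ValueError("sketch only covers short strings")
    return bytes([len(data)]) + data

# User ID 1 as an IntWritable key vs. the string "1" as a Text key:
print(int_writable_bytes(1))     # b'\x00\x00\x00\x01'
print(text_writable_bytes("1"))  # b'\x011'
```

A reader that expects the fixed 4-byte integer layout has no way to reinterpret the length-prefixed string, hence the ClassCastException rather than a silent misread.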

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-24 Thread Ted Dunning
The error message that you got indicated that some input was textual and
needed to be an integer.

Is there a chance that the type of some of your input is incorrect in your
sequence files?
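One way to check this without writing any Java: a SequenceFile records its key and value class names in plain text at the start of the file, right after the SEQ magic bytes. A rough header reader (my own sketch, assuming class names under 128 bytes so their vint lengths fit in one byte; copy a part file out of HDFS first, e.g. with hdfs dfs -get):

```python
def seqfile_key_value_classes(path):
    """Return (keyClass, valueClass) read from a Hadoop SequenceFile header."""
    with open(path, "rb") as f:
        if f.read(3) != b"SEQ":
            raise ValueError("%s is not a SequenceFile" % path)
        f.read(1)  # one-byte format version
        names = []
        for _ in range(2):  # key class name, then value class name
            length = f.read(1)[0]  # single-byte vint length (names < 128 bytes)
            names.append(f.read(length).decode("utf-8"))
        return tuple(names)
```

If this reports ('org.apache.hadoop.io.Text', 'org.apache.hadoop.io.Text'), which is what seqdirectory typically produces, the IntWritable cast in PredictionMapper is bound to fail.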




Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-24 Thread Ashok Harnal
Thanks for the reply. I will recheck and repeat the experiment using
self-typed input.
I am reinstalling Cloudera 5.2.

Ashok Kumar Harnal
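Before repeating the experiment with self-typed input, it can help to make the separators visible, since editors sometimes insert spaces where tabs are intended. A small helper (my own sketch, nothing Mahout-specific; '_' marks a space, '\t' marks a tab):

```python
def visible_separators(path):
    """Return each line with tabs rendered as '\\t' and spaces as '_',
    so you can see exactly which separator was typed."""
    with open(path) as f:
        return [line.rstrip("\n").replace("\t", "\\t").replace(" ", "_")
                for line in f]
```

A correctly tab-separated line shows up as something like 1\t100\t1, while an accidental space-separated one shows up as 1_100_1.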

On 24 November 2014 at 21:38, Ted Dunning ted.dunn...@gmail.com wrote:

 The error message that you got indicated that some input was textual and
 needed to be an integer.

 Is there a chance that the type of some of your input is incorrect in your
 sequence files?




Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-23 Thread Ashok Harnal
I upgraded to Mahout 0.9, but the same error persists. Here is the full dump.
Incidentally, I am using the local file system, not Hadoop.


[ashokharnal@master ~]$ mahout recommendfactorized --input
/user/ashokharnal/seqfiles  --userFeatures $res_out_file/U/ --itemFeatures
$res_out_file/M/ --numRecommendations 1 --output /tmp/reommendation
--maxRating 1

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop and
HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/mahout/mahout-examples-0.9-cdh5.2.0-job.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/11/23 17:51:35 WARN driver.MahoutDriver: No recommendfactorized.props
found on classpath, will use command-line arguments only
14/11/23 17:51:35 INFO common.AbstractJob: Command line arguments:
{--endPhase=[2147483647], --input=[/user/ashokharnal/seqfiles],
--itemFeatures=[/user/ashokharnal/uexp.out/M/], --maxRating=[1],
--numRecommendations=[1], --numThreads=[1], --output=[/tmp/reommendation],
--startPhase=[0], --tempDir=[temp],
--userFeatures=[/user/ashokharnal/uexp.out/U/]}
14/11/23 17:51:36 INFO Configuration.deprecation: session.id is deprecated.
Instead, use dfs.metrics.session-id
14/11/23 17:51:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
14/11/23 17:51:36 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
14/11/23 17:51:36 INFO input.FileInputFormat: Total input paths to process
: 1
14/11/23 17:51:37 INFO mapred.LocalJobRunner: OutputCommitter set in config
null
14/11/23 17:51:37 INFO mapred.JobClient: Running job:
job_local1520101691_0001
14/11/23 17:51:37 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/11/23 17:51:37 INFO mapred.LocalJobRunner: Waiting for map tasks
14/11/23 17:51:37 INFO mapred.LocalJobRunner: Starting task:
attempt_local1520101691_0001_m_00_0
14/11/23 17:51:37 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
14/11/23 17:51:37 INFO util.ProcessTree: setsid exited with exit code 0
14/11/23 17:51:37 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3f7b4c84
14/11/23 17:51:37 INFO mapred.MapTask: Processing split:
hdfs://master:8020/user/ashokharnal/seqfiles/part-m-0:0+194
14/11/23 17:51:37 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/11/23 17:51:37 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
14/11/23 17:51:37 INFO mapred.LocalJobRunner: Map task executor complete.
14/11/23 17:51:37 WARN mapred.LocalJobRunner: job_local1520101691_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
14/11/23 17:51:38 INFO mapred.JobClient:  map 0% reduce 0%
14/11/23 17:51:38 INFO 

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-23 Thread Andrew Musselman
Can you paste a sample of your input data? The exception is this:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)

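When pasting a sample is not practical, a quick local check of the raw ratings file catches the usual culprits (spaces instead of tabs, stray headers, non-numeric fields). A hypothetical pre-flight script, assuming the userID<TAB>itemID<TAB>rating layout that parallelALS expects:

```python
def check_ratings_file(path):
    """Return (line_number, line) pairs that do not parse as
    userID<TAB>itemID<TAB>rating."""
    bad = []
    with open(path) as f:
        for lineno, raw in enumerate(f, 1):
            fields = raw.rstrip("\n").split("\t")
            ok = len(fields) == 3
            if ok:
                try:
                    # user and item must be integers; ratings may be fractional
                    int(fields[0]), int(fields[1]), float(fields[2])
                except ValueError:
                    ok = False
            if not ok:
                bad.append((lineno, raw.rstrip("\n")))
    return bad
```

A space-separated line such as "1 1 5" shows up immediately, because splitting on tabs leaves it as a single field.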
 On Nov 23, 2014, at 4:31 AM, Ashok Harnal ashokhar...@gmail.com wrote:
 
 I upgraded to mahout 0.9. The same error persists. Here is the full dump.
 Incidentally, I am using local file system and not hadoop.
 
 

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-22 Thread Andrew Musselman
Please upgrade to Mahout version 0.9, as many things have been fixed since.

 On Nov 22, 2014, at 7:00 PM, Ashok Harnal ashokhar...@gmail.com wrote:
 
 I use Mahout 0.7 installed in Cloudera. After creating the user-feature and
 item-feature matrices in HDFS, I run the following command:
 
 mahout recommendfactorized --input /user/ashokharnal/seqfiles
 --userFeatures $res_out_file/U/ --itemFeatures $res_out_file/M/
 --numRecommendations 1 --output $reommendation --maxRating 1
 
 After some time, I get the following error:
 
 :
 :
 14/11/23 08:28:20 INFO mapred.LocalJobRunner: Map task executor complete.
 14/11/23 08:28:20 WARN mapred.LocalJobRunner: job_local954305987_0001
 java.lang.Exception: java.lang.RuntimeException:
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
 org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
 Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:151)
	at org.apache.mahout.cf.taste.hadoop.als.MultithreadedSharingMapper.run(MultithreadedSharingMapper.java:60)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.mahout.cf.taste.hadoop.als.PredictionMapper.map(PredictionMapper.java:44)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
 
 
 I am not sure what is wrong; any help would be appreciated.
 
 Ashok Kumar Harnal
 
 
 
 
 -- 
 Visit my blog at: http://ashokharnal.wordpress.com/