Re: Newbie question on modeling a Recommender using Mahout when the matrix is sparse

2012-09-13 Thread Gokul Pillai
Very true, good catch. I think I was interpreting the results the wrong way.
I expect only the top 5, so I changed the parameter to "5" instead of "10"
and the results are as expected now.

Thanks.

On Wed, Sep 12, 2012 at 11:36 PM, Sean Owen  wrote:

> Well there are only 7 products in the universe! If you ask for 10
> recommendations, you will always get all unrated items back in the
> recommendations. That's always true unless the algorithm can't
> actually establish a value for some items.
>
> What result were you expecting: fewer than 10 recs? Fewer than 7?
>
> On Thu, Sep 13, 2012 at 6:55 AM, Gokul Pillai 
> wrote:
> > I am trying out Mahout to come up with product recommendations for users
> > based on data that show what products they use today.
> > The data is not web-scale: just about 300,000 users and 7 products. A few
> > comments about the data:
> > 1. Since users either have or do not have a particular product, the value
> > in the matrix is either "1" or "0" for all the columns (rows being the
> > userids)
> > 2. All the users have one basic product, so I discounted this from the
> > data-model passed to the Mahout recommender, since I assume that if
> > everyone has the same product, its effect on the recommendations is
> > trivial.
> > 3. The matrix itself is sparse; the per-product user counts are:
> > A=31847 | 54754 | 1897 | 23154 | 2201 | 2766 | 33585
> >
> > Steps followed:
> > 1. Created a data-source from the user-product table in the database
> > File ratingsFile = new File("datasets/products.csv");
> > DataModel model = new FileDataModel(ratingsFile);
> > 2. Created a recommender on this data
> > CachingRecommender recommender =
> >     new CachingRecommender(new SlopeOneRecommender(model));
> > 3. Loop through all users and get the top ten recommendations:
> > List<RecommendedItem> recommendations = recommender.recommend(userId, 10);
> >
> > Issue faced:
> > The problem I am facing is that the recommendations that come out are way
> > too simple: it seems all that is being recommended is "if a user does not
> > have product A, recommend it; if they don't have product B, recommend it,"
> > and so on. Basically a simple inverse of their ownership status.
> >
> > Obviously, I am not doing something right here. How can I do the modeling
> > better to get the right recommendations? Or is it that my dataset (300,000
> > users times 7 products) is too small for Mahout to work with?
> >
> > Look forward to your comments. Thanks.
>


Newbie question on modeling a Recommender using Mahout when the matrix is sparse

2012-09-12 Thread Gokul Pillai
I am trying out Mahout to come up with product recommendations for users
based on data that show what products they use today.
The data is not web-scale: just about 300,000 users and 7 products. A few
comments about the data:
1. Since users either have or do not have a particular product, the value in
the matrix is either "1" or "0" for all the columns (rows being the userids)
2. All the users have one basic product, so I discounted this from the
data-model passed to the Mahout recommender, since I assume that if everyone
has the same product, its effect on the recommendations is trivial.
3. The matrix itself is sparse; the per-product user counts are:
A=31847 | 54754 | 1897 | 23154 | 2201 | 2766 | 33585

Steps followed:
1. Created a data-source from the user-product table in the database
File ratingsFile = new File("datasets/products.csv");
DataModel model = new FileDataModel(ratingsFile);
2. Created a recommender on this data
CachingRecommender recommender =
    new CachingRecommender(new SlopeOneRecommender(model));
3. Loop through all users and get the top ten recommendations:
List<RecommendedItem> recommendations = recommender.recommend(userId, 10);

Issue faced:
The problem I am facing is that the recommendations that come out are way
too simple: it seems all that is being recommended is "if a user does not
have product A, recommend it; if they don't have product B, recommend it,"
and so on. Basically a simple inverse of their ownership status.
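[Editor's note] The degenerate behavior described above is easy to reproduce outside Mahout. A minimal sketch in plain Java (hypothetical data, not from the thread; it assumes the CSV only contains rows for owned products, all with preference 1.0):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: slope-one on all-ones preference data. Every recorded preference
// is 1.0, so every item-item deviation averages to 0 and every estimate for
// an unowned item comes out exactly 1.0 -- i.e. "recommend whatever the
// user doesn't own", the inverse-of-ownership effect described here.
public class SlopeOneBinaryDemo {

    // userId -> (itemId -> preference); only owned items appear, all 1.0.
    static Map<Long, Map<Long, Double>> prefs = new HashMap<>();

    static void own(long user, long... items) {
        Map<Long, Double> m = prefs.computeIfAbsent(user, k -> new HashMap<>());
        for (long i : items) m.put(i, 1.0);
    }

    /** Average of pref(j) - pref(i) over users who have both items. */
    static double deviation(long j, long i) {
        double sum = 0; int n = 0;
        for (Map<Long, Double> m : prefs.values()) {
            if (m.containsKey(i) && m.containsKey(j)) {
                sum += m.get(j) - m.get(i);
                n++;
            }
        }
        return n == 0 ? 0.0 : sum / n;
    }

    /** Slope-one estimate of a user's preference for an unowned item j. */
    static double predict(long user, long j) {
        double sum = 0; int n = 0;
        for (Map.Entry<Long, Double> e : prefs.get(user).entrySet()) {
            sum += e.getValue() + deviation(j, e.getKey());
            n++;
        }
        return sum / n;
    }

    public static void main(String[] args) {
        own(1, 10, 20);        // made-up ownership data
        own(2, 10, 30, 40);
        own(3, 20, 30);
        System.out.println(predict(1, 30)); // prints 1.0
    }
}
```

Since every estimate is identical, the ranking of unowned items is arbitrary, which matches the symptom in this post.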

Obviously, I am not doing something right here. How can I do the modeling
better to get the right recommendations? Or is it that my dataset (300,000
users times 7 products) is too small for Mahout to work with?

Look forward to your comments. Thanks.
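[Editor's note] For pure 0/1 ownership data like this, the usual direction (an editorial aside, not suggested in the thread itself) is a boolean-preference recommender scored with a set-overlap similarity rather than rating deviations. One such measure is the Tanimoto (Jaccard) coefficient between two products' owner sets; the class and owner IDs below are made up for the sketch:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: Tanimoto (Jaccard) similarity between the sets of users that
// own two products. With binary "owns / doesn't own" data, this kind of
// overlap signal is meaningful where rating deltas are not.
public class TanimotoDemo {

    /** |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty. */
    static double tanimoto(Set<Long> ownersA, Set<Long> ownersB) {
        Set<Long> intersection = new HashSet<>(ownersA);
        intersection.retainAll(ownersB);
        Set<Long> union = new HashSet<>(ownersA);
        union.addAll(ownersB);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical owner IDs for two products.
        Set<Long> productA = Set.of(1L, 2L, 3L, 4L);
        Set<Long> productB = Set.of(3L, 4L, 5L, 6L);
        System.out.println(tanimoto(productA, productB)); // 2 shared of 6 total -> 1/3
    }
}
```

Items most similar (by this measure) to what a user already owns can then be ranked, which avoids the all-ties degeneracy of slope-one on all-ones data.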


Re: Help with running clusterdump after running Dirichlet

2010-07-15 Thread Gokul Pillai
My bad. After setting HADOOP_CONF_DIR and HADOOP_HOME, I no longer get the
errors.
However, I don't get any output either.
I tried this command too but again no output:
./bin/mahout clusterdump --seqFileDir dirichlet/output/data/ --pointsDir
dirichlet/output/clusteredPoints/ --output dumpOut

Anybody run the clusterdump successfully?


On Thu, Jul 15, 2010 at 2:19 PM, Gokul Pillai  wrote:

> I have Cloudera's CDH3 running on Ubuntu 10.04, and Apache Mahout (a
> 0.4-SNAPSHOT build from yesterday).
>
> I was trying to get the clustering examples running based on the wiki page
> https://cwiki.apache.org/confluence/display/MAHOUT/Synthetic+Control+Data.
> At the bottom of this page, there is a section that describes how to get
> the data out and process it.
> Get the data out of HDFS and have a look:
>- All example jobs use *testdata* as input and output to directory *
>output*
>- Use *bin/hadoop fs -lsr output* to view all outputs. Copy them all to
>your local machine and you can run the ClusterDumper on them.
>   - Sequence files containing the original points in Vector form are
>   in *output/data*
>   - Computed clusters are contained in *output/clusters-i*
>   - All result clustered points are placed into *
>   output/clusteredPoints*
>
>
> So I got the data out of HDFS onto my local and it looks like this:
>
> had...@ubuntu:~/mahoutOutputs$ ls -l dirichlet/output/
> total 32
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusteredPoints
> drwxr-xr-x 2 hadoop hadoop 4096 2010-07-13 16:06 clusters-0
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-1
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-2
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-3
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-4
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-5
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 data
>
>
> However, when I ran clusterdump on this, I got the following error. Any
> pointers on why clusterdump is complaining about a "_logs" folder would be
> appreciated:
>
> had...@ubuntu:~/mahoutOutputs$ ../mahoutsvn/trunk/bin/mahout clusterdump
> --seqFileDir dirichlet/output/clusters-1 --pointsDir
> dirichlet/output/clusteredPoints/ --output dumpOut
> no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
> Exception in thread "main" java.io.FileNotFoundException:
> /home/hadoop/mahoutOutputs/dirichlet/output/clusteredPoints/_logs (Is a directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:106)
>         at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:63)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:99)
>         at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:169)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>         at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:323)
>         at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:93)
>         at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:86)
>         at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:272)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
>
> Regards
> Gokul
>


Help with running clusterdump after running Dirichlet

2010-07-15 Thread Gokul Pillai
I have Cloudera's CDH3 running on Ubuntu 10.04, and Apache Mahout (a
0.4-SNAPSHOT build from yesterday).

I was trying to get the clustering examples running based on the wiki page
https://cwiki.apache.org/confluence/display/MAHOUT/Synthetic+Control+Data.
At the bottom of this page, there is a section that describes how to get the
data out and process it.
Get the data out of HDFS and have a look:
   - All example jobs use *testdata* as input and output to directory *
   output*
   - Use *bin/hadoop fs -lsr output* to view all outputs. Copy them all to
   your local machine and you can run the ClusterDumper on them.
  - Sequence files containing the original points in Vector form are in
  *output/data*
  - Computed clusters are contained in *output/clusters-i*
  - All result clustered points are placed into *output/clusteredPoints*


So I got the data out of HDFS onto my local and it looks like this:

had...@ubuntu:~/mahoutOutputs$ ls -l dirichlet/output/
total 32
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusteredPoints
drwxr-xr-x 2 hadoop hadoop 4096 2010-07-13 16:06 clusters-0
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-1
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-2
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-3
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-4
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-5
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 data


However, when I ran clusterdump on this, I got the following error. Any
pointers on why clusterdump is complaining about a "_logs" folder would be
appreciated:

had...@ubuntu:~/mahoutOutputs$ ../mahoutsvn/trunk/bin/mahout clusterdump
--seqFileDir dirichlet/output/clusters-1 --pointsDir
dirichlet/output/clusteredPoints/ --output dumpOut
no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
Exception in thread "main" java.io.FileNotFoundException:
/home/hadoop/mahoutOutputs/dirichlet/output/clusteredPoints/_logs (Is a directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:63)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:99)
        at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:169)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:323)
        at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:93)
        at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:86)
        at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:272)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)

Regards
Gokul
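[Editor's note] The trace above shows the dumper opening clusteredPoints/_logs, a Hadoop bookkeeping directory, as if it were a sequence file. A common workaround is to delete _logs from the local copy before dumping, or to skip Hadoop bookkeeping entries (names starting with "_" or ".") when enumerating part files. A sketch of that filtering convention in plain Java (class name and path are illustrative, not part of Mahout):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: list sequence-file parts in a job output directory while skipping
// Hadoop bookkeeping entries (_logs, _SUCCESS, .crc files) that trip up
// readers which try to open every entry as a sequence file.
public class PartFileLister {

    static List<File> dataFiles(File dir) {
        List<File> out = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) return out;   // missing or not a directory
        for (File f : entries) {
            String name = f.getName();
            if (name.startsWith("_") || name.startsWith(".")) continue; // bookkeeping
            if (f.isFile()) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        for (File f : dataFiles(new File("dirichlet/output/clusteredPoints"))) {
            System.out.println(f.getName()); // only part-* style data files
        }
    }
}
```

Equivalently, simply removing the _logs directory from the local copy should let the stock ClusterDumper proceed.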


Re: No class definition found for org.apache.mahout.math.Vector when invoking K-means example

2010-07-09 Thread Gokul Pillai
To clarify, do the fixes relate only to Java code, or is there a script
change in the mahout/trunk/bin folder too?
If so, how does one get that from the repo?

On Fri, Jul 9, 2010 at 4:16 PM, Gokul Pillai  wrote:

> Peter:
> I just noticed that there has been a new patch applied by Drew yesterday
> for this. (https://issues.apache.org/jira/browse/MAHOUT-426)
>
> The last comment on the thread says "Integrated in Mahout-Quality #128
> (See http://hudson.zones.apache.org/hudson/job/Mahout-Quality/128/)"
>
> Does it mean that if I did a "mvn -U" on the trunk, I should be able to get
> the fix and then not run into the issue ?
>
> Regards
> gokul
>
> On Thu, Jul 1, 2010 at 2:31 PM, Gokul Pillai  wrote:
>
>> The version on my instance is 0.20.1+152.
>> I am guessing if the jar file name is
>> "hadoop-0.20.1+152-fairscheduler.jar" then that is the version.
>>
>>
>> On Thu, Jul 1, 2010 at 11:11 AM, Peter M. Goldstein <
>> peter_m_goldst...@yahoo.com> wrote:
>>
>>> Gokul,
>>>
>>> What version of Hadoop are you using?  I'm running with 0.20+320, and I
>>> don't run into this issue.
>>>
>>> Thanks.
>>>
>>> Regards,
>>>
>>> Peter
>>>
>>> -Original Message-
>>> From: Gokul Pillai [mailto:gokoolt...@gmail.com]
>>> Sent: Thursday, July 01, 2010 10:29 AM
>>> To: user@mahout.apache.org
>>> Subject: Re: No class definition found for org.apache.mahout.math.Vector
>>> when invoking K-means example
>>>
>>> Hello Peter:
>>> When I run that command, I get the following (different) error:
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/commons/cli2/OptionException
>>>at java.lang.Class.forName0(Native Method)
>>>at java.lang.Class.forName(Class.java:247)
>>>at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.commons.cli2.OptionException
>>>at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>
>>> Digging into this, I find that hadoop 0.20 ships with
>>> "commons-cli-1.2.jar", but the mahout 0.4-SNAPSHOT depends on
>>> "commons-cli-2.0-mahout.jar", which lands in the
>>> $MAHOUT_HOME/trunk/examples/target/ folder once you run "mvn install" in
>>> the "examples" folder.
>>>
>>> So I replaced the jar file in hadoop/lib with this one, and then it goes
>>> back to the ClassNotFound error for the Vector class.
>>>
>>> Regards
>>> Gokul
>>>
>>> On Thu, Jul 1, 2010 at 9:29 AM, Peter M. Goldstein <
>>> peter_m_goldst...@yahoo.com> wrote:
>>>
>>> > Hi Gokul,
>>> >
>>> > Using the Mahout on Amazon EC2 directions I'm able to get a little
>>> farther
>>> > than this.  Specifically, using the job file created when building the
>>> > examples I don't get any ClassNotFoundExceptions.  Instead I get an
>>> > InvalidInputException (presumably because I didn't prepare any input
>>> data
>>> > for the job).  If you provide some appropriately formatted candidate
>>> data
>>> > I'd be happy to try and run a complete test.
>>> >
>>> > In my test I'm using the command line:
>>> >
>>> > $HADOOP_HOME/bin/hadoop jar
>>> > $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
>>> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> >
>>> > Again, it's critical to build and use the examples Job file.  If I
>>> attempt
>>> > to use the core Job file instead key classes appear to be absent.
>>> >
>>> > Hope that helps.
>>> >
>>> > --Peter
>>> >
>>> > -Original Message-
>>> > From: Gokul Pillai [mailto:gokoolt...@gmail.com]
>>> > Sent: Wednesday, June 30, 2010 2:35 PM
>>> > To: user@mahout.apache.org
>>> > Subject: Re: No class definition found for
>>> org.apache.mahout.math.Vector
>>> > when invoking K-means example
>>> >
>>> > Peter
>>> > Thank you for your response. However, after making your changes, it
>>> still
>>> > does not work for me.
>>> > In fact I tried another way which is:
>>> > 1. create a lib folder within my Job Jar

Re: No class definition found for org.apache.mahout.math.Vector when invoking K-means example

2010-07-09 Thread Gokul Pillai
Peter:
I just noticed that there has been a new patch applied by Drew yesterday for
this. (https://issues.apache.org/jira/browse/MAHOUT-426)

The last comment on the thread says "Integrated in Mahout-Quality #128 (See
http://hudson.zones.apache.org/hudson/job/Mahout-Quality/128/)"

Does it mean that if I did a "mvn -U" on the trunk, I should be able to get
the fix and then not run into the issue ?

Regards
gokul

On Thu, Jul 1, 2010 at 2:31 PM, Gokul Pillai  wrote:

> The version on my instance is 0.20.1+152.
> I am guessing if the jar file name is "hadoop-0.20.1+152-fairscheduler.jar"
> then that is the version.
>
>
> On Thu, Jul 1, 2010 at 11:11 AM, Peter M. Goldstein <
> peter_m_goldst...@yahoo.com> wrote:
>
>> Gokul,
>>
>> What version of Hadoop are you using?  I'm running with 0.20+320, and I
>> don't run into this issue.
>>
>> Thanks.
>>
>> Regards,
>>
>> Peter
>>
>> -Original Message-
>> From: Gokul Pillai [mailto:gokoolt...@gmail.com]
>> Sent: Thursday, July 01, 2010 10:29 AM
>> To: user@mahout.apache.org
>> Subject: Re: No class definition found for org.apache.mahout.math.Vector
>> when invoking K-means example
>>
>> Hello Peter:
>> When I run that command, I get the following (different) error:
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/commons/cli2/OptionException
>>at java.lang.Class.forName0(Native Method)
>>at java.lang.Class.forName(Class.java:247)
>>at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.commons.cli2.OptionException
>>at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>
>> Digging into this, I find that hadoop 0.20 ships with
>> "commons-cli-1.2.jar", but the mahout 0.4-SNAPSHOT depends on
>> "commons-cli-2.0-mahout.jar", which lands in the
>> $MAHOUT_HOME/trunk/examples/target/ folder once you run "mvn install" in
>> the "examples" folder.
>>
>> So I replaced the jar file in hadoop/lib with this one, and then it goes
>> back to the ClassNotFound error for the Vector class.
>>
>> Regards
>> Gokul
>>
>> On Thu, Jul 1, 2010 at 9:29 AM, Peter M. Goldstein <
>> peter_m_goldst...@yahoo.com> wrote:
>>
>> > Hi Gokul,
>> >
>> > Using the Mahout on Amazon EC2 directions I'm able to get a little
>> farther
>> > than this.  Specifically, using the job file created when building the
>> > examples I don't get any ClassNotFoundExceptions.  Instead I get an
>> > InvalidInputException (presumably because I didn't prepare any input
>> data
>> > for the job).  If you provide some appropriately formatted candidate
>> data
>> > I'd be happy to try and run a complete test.
>> >
>> > In my test I'm using the command line:
>> >
>> > $HADOOP_HOME/bin/hadoop jar
>> > $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
>> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> >
>> > Again, it's critical to build and use the examples Job file.  If I
>> attempt
>> > to use the core Job file instead key classes appear to be absent.
>> >
>> > Hope that helps.
>> >
>> > --Peter
>> >
>> > -Original Message-
>> > From: Gokul Pillai [mailto:gokoolt...@gmail.com]
>> > Sent: Wednesday, June 30, 2010 2:35 PM
>> > To: user@mahout.apache.org
>> > Subject: Re: No class definition found for org.apache.mahout.math.Vector
>> > when invoking K-means example
>> >
>> > Peter
>> > Thank you for your response. However, after making your changes, it
>> still
>> > does not work for me.
>> > In fact I tried another way which is:
>> > 1. create a lib folder within my Job Jar-file and put the
>> > "org.apache.mahout.math.Vector" and other dependent jars.
>> > 2. Launch the command via "hadoop jar ./mahout-examples-with-libs.jar
>> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job"
>> >
>> > However, I still get the error:
>> > "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>> >at java.net.URLClassLoader$1.run(URLClassLoader.java:200)"
>> >
>> >
>> > Another thing I tried is modify the $HADOOP_HOME/conf/hadoop-env.sh and
>> > edit
>> > the HADOO

Version compatibility of Mahout 0.4-SNAPSHOT with Hadoop release?

2010-07-01 Thread Gokul Pillai
Which Hadoop release is Mahout 0.4-SNAPSHOT compatible with? Where can I
find this information on the wiki?

Regards
Gokul


Re: No class definition found for org.apache.mahout.math.Vector when invoking K-means example

2010-07-01 Thread Gokul Pillai
The version on my instance is 0.20.1+152.
I am guessing that if the jar file name is
"hadoop-0.20.1+152-fairscheduler.jar", then that is the version.

On Thu, Jul 1, 2010 at 11:11 AM, Peter M. Goldstein <
peter_m_goldst...@yahoo.com> wrote:

> Gokul,
>
> What version of Hadoop are you using?  I'm running with 0.20+320, and I
> don't run into this issue.
>
> Thanks.
>
> Regards,
>
> Peter
>
> -Original Message-
> From: Gokul Pillai [mailto:gokoolt...@gmail.com]
> Sent: Thursday, July 01, 2010 10:29 AM
> To: user@mahout.apache.org
> Subject: Re: No class definition found for org.apache.mahout.math.Vector
> when invoking K-means example
>
> Hello Peter:
> When I run that command, I get the following (different) error:
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/commons/cli2/OptionException
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:247)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.commons.cli2.OptionException
>at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>
> Digging into this, I find that hadoop 0.20 ships with "commons-cli-1.2.jar",
> but the mahout 0.4-SNAPSHOT depends on "commons-cli-2.0-mahout.jar", which
> lands in the $MAHOUT_HOME/trunk/examples/target/ folder once you run
> "mvn install" in the "examples" folder.
>
> So I replaced the jar file in hadoop/lib with this one, and then it goes
> back to the ClassNotFound error for the Vector class.
>
> Regards
> Gokul
>
> On Thu, Jul 1, 2010 at 9:29 AM, Peter M. Goldstein <
> peter_m_goldst...@yahoo.com> wrote:
>
> > Hi Gokul,
> >
> > Using the Mahout on Amazon EC2 directions I'm able to get a little
> farther
> > than this.  Specifically, using the job file created when building the
> > examples I don't get any ClassNotFoundExceptions.  Instead I get an
> > InvalidInputException (presumably because I didn't prepare any input data
> > for the job).  If you provide some appropriately formatted candidate data
> > I'd be happy to try and run a complete test.
> >
> > In my test I'm using the command line:
> >
> > $HADOOP_HOME/bin/hadoop jar
> > $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> >
> > Again, it's critical to build and use the examples Job file.  If I
> attempt
> > to use the core Job file instead key classes appear to be absent.
> >
> > Hope that helps.
> >
> > --Peter
> >
> > -Original Message-
> > From: Gokul Pillai [mailto:gokoolt...@gmail.com]
> > Sent: Wednesday, June 30, 2010 2:35 PM
> > To: user@mahout.apache.org
> > Subject: Re: No class definition found for org.apache.mahout.math.Vector
> > when invoking K-means example
> >
> > Peter
> > Thank you for your response. However, after making your changes, it still
> > does not work for me.
> > In fact I tried another way which is:
> > 1. create a lib folder within my Job Jar-file and put the
> > "org.apache.mahout.math.Vector" and other dependent jars.
> > 2. Launch the command via "hadoop jar ./mahout-examples-with-libs.jar
> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job"
> >
> > However, I still get the error:
> > "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
> >at java.net.URLClassLoader$1.run(URLClassLoader.java:200)"
> >
> >
> > Another thing I tried is modify the $HADOOP_HOME/conf/hadoop-env.sh and
> > edit
> > the HADOOP_CLASSPATH to include the dependent libraries. Even that does
> not
> > work.
> >
> > Regards
> > Gokul
> >
> > On Mon, Jun 28, 2010 at 3:43 PM, Peter M. Goldstein <
> > peter_m_goldst...@yahoo.com> wrote:
> >
> > > Hi Gokul,
> > >
> > > You may want to see the following bug reports:
> > >
> > > https://issues.apache.org/jira/browse/MAHOUT-426
> > >
> > > https://issues.apache.org/jira/browse/MAHOUT-427
> > >
> > > https://issues.apache.org/jira/browse/MAHOUT-428
> > >
> > > I ran into similar issues as you and had to do some minor patching to get
> > > everything to work.
> > >
> > > Also, make sure you run an explicit "mvn install" from within the
> > examples
> > > directory.  I had to do this to get the JOB files 

Re: No class definition found for org.apache.mahout.math.Vector when invoking K-means example

2010-07-01 Thread Gokul Pillai
Hello Peter:
When I run that command, I get the following (different) error:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/cli2/OptionException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException:
org.apache.commons.cli2.OptionException
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)

Digging into this, I find that hadoop 0.20 ships with "commons-cli-1.2.jar",
but the mahout 0.4-SNAPSHOT depends on "commons-cli-2.0-mahout.jar", which
lands in the $MAHOUT_HOME/trunk/examples/target/ folder once you run
"mvn install" in the "examples" folder.

So I replaced the jar file in hadoop/lib with this one, and then it goes
back to the ClassNotFound error for the Vector class.

Regards
Gokul

On Thu, Jul 1, 2010 at 9:29 AM, Peter M. Goldstein <
peter_m_goldst...@yahoo.com> wrote:

> Hi Gokul,
>
> Using the Mahout on Amazon EC2 directions I'm able to get a little farther
> than this.  Specifically, using the job file created when building the
> examples I don't get any ClassNotFoundExceptions.  Instead I get an
> InvalidInputException (presumably because I didn't prepare any input data
> for the job).  If you provide some appropriately formatted candidate data
> I'd be happy to try and run a complete test.
>
> In my test I'm using the command line:
>
> $HADOOP_HOME/bin/hadoop jar
> $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>
> Again, it's critical to build and use the examples Job file.  If I attempt
> to use the core Job file instead key classes appear to be absent.
>
> Hope that helps.
>
> --Peter
>
> -Original Message-
> From: Gokul Pillai [mailto:gokoolt...@gmail.com]
> Sent: Wednesday, June 30, 2010 2:35 PM
> To: user@mahout.apache.org
> Subject: Re: No class definition found for org.apache.mahout.math.Vector
> when invoking K-means example
>
> Peter
> Thank you for your response. However, after making your changes, it still
> does not work for me.
> In fact I tried another way which is:
> 1. create a lib folder within my Job Jar-file and put the
> "org.apache.mahout.math.Vector" and other dependent jars.
> 2. Launch the command via "hadoop jar ./mahout-examples-with-libs.jar
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job"
>
> However, I still get the error:
> "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>at java.net.URLClassLoader$1.run(URLClassLoader.java:200)"
>
>
> Another thing I tried is modify the $HADOOP_HOME/conf/hadoop-env.sh and
> edit
> the HADOOP_CLASSPATH to include the dependent libraries. Even that does not
> work.
>
> Regards
> Gokul
>
> On Mon, Jun 28, 2010 at 3:43 PM, Peter M. Goldstein <
> peter_m_goldst...@yahoo.com> wrote:
>
> > Hi Gokul,
> >
> > You may want to see the following bug reports:
> >
> > https://issues.apache.org/jira/browse/MAHOUT-426
> >
> > https://issues.apache.org/jira/browse/MAHOUT-427
> >
> > https://issues.apache.org/jira/browse/MAHOUT-428
> >
> > I ran into similar issues as you and had to do some minor patching to get
> > everything to work.
> >
> > Also, make sure you run an explicit "mvn install" from within the
> examples
> > directory.  I had to do this to get the JOB files to build correctly.
> > Getting the JOB files to build correctly solves (sort of) issues 426 and
> > 427, so you only need to apply the patch described in 428.
> >
> > Hope that helps.
> >
> > Regards,
> >
> > Peter
> >
> > -Original Message-
> > From: Gokul Pillai [mailto:gokoolt...@gmail.com]
> > Sent: Monday, June 28, 2010 3:27 PM
> > To: user@mahout.apache.org
> > Subject: No class definition found for org.apache.mahout.math.Vector when
> > invoking K-means example
> >
> > I made a new installation of Hadoop 0.20 from Cloudera.
> > I then installed Mahout 0.4-SNAPSHOT by doing a SVN checkout this morning
> > and ran mvn install on core, utils and examples.
> > I tried running the KMeans Clustering example by doing the following as
> > mentioned in the tutorial:
> > hadoop jar
> ~/mahout/trunk/examples/target/mahout-examples-0.4-SNAPSHOT.jar
> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> >
> > And I keep getting this error although I have explicitly set my Classpath
> > to
> >
&

Re: No class definition found for org.apache.mahout.math.Vector when invoking K-means example

2010-06-30 Thread Gokul Pillai
Peter
Thank you for your response. However, after making your changes, it still
does not work for me.
In fact I tried another way, which is:
1. Create a lib folder within my job jar and put the jar containing
"org.apache.mahout.math.Vector" and the other dependent jars in it.
2. Launch the command via "hadoop jar ./mahout-examples-with-libs.jar
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job"

However, I still get the error:
"Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)"


Another thing I tried was modifying $HADOOP_HOME/conf/hadoop-env.sh to add
the dependent libraries to HADOOP_CLASSPATH. Even that does not work.

Regards
Gokul

On Mon, Jun 28, 2010 at 3:43 PM, Peter M. Goldstein <
peter_m_goldst...@yahoo.com> wrote:

> Hi Gokul,
>
> You may want to see the following bug reports:
>
> https://issues.apache.org/jira/browse/MAHOUT-426
>
> https://issues.apache.org/jira/browse/MAHOUT-427
>
> https://issues.apache.org/jira/browse/MAHOUT-428
>
> I ran into similar issues as you and had to do some minor patching to get
> everything to work.
>
> Also, make sure you run an explicit "mvn install" from within the examples
> directory.  I had to do this to get the JOB files to build correctly.
> Getting the JOB files to build correctly solves (sort of) issues 426 and
> 427, so you only need to apply the patch described in 428.
>
> Hope that helps.
>
> Regards,
>
> Peter
>
> -Original Message-
> From: Gokul Pillai [mailto:gokoolt...@gmail.com]
> Sent: Monday, June 28, 2010 3:27 PM
> To: user@mahout.apache.org
> Subject: No class definition found for org.apache.mahout.math.Vector when
> invoking K-means example
>
> I made a new installation of Hadoop 0.20 from Cloudera.
> I then installed Mahout 0.4-SNAPSHOT by doing a SVN checkout this morning
> and ran mvn install on core, utils and examples.
> I tried running the KMeans Clustering example by doing the following as
> mentioned in the tutorial:
> hadoop jar ~/mahout/trunk/examples/target/mahout-examples-0.4-SNAPSHOT.jar
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>
> And I keep getting this error although I have explicitly set my Classpath
> to
>
> "/usr/lib/hadoop/lib/mahout-collections-1.0.jar:/usr/lib/hadoop/lib/mahout-m
>
> ath-0.4-SNAPSHOT.jar:/usr/lib/hadoop/lib/mahout-utils-0.4-SNAPSHOT.jar:/usr/
> lib/hadoop/lib/mahout-core-0.4-SNAPSHOT.jar":
>
> 10/06/28 15:19:21 INFO mapred.JobClient: Task Id :
> attempt_201006280959_0027_r_00_1, Status : FAILED
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.kmeans.KMeansCombiner
>at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:868)
>at
>
> org.apache.hadoop.mapreduce.JobContext.getCombinerClass(JobContext.java:169)
>at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1107)
>at
>
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier.<init>(ReduceTask.java:1780)
>at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:375)
>at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.kmeans.KMeansCombiner
>at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:247)
>at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:815)
>at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
>... 5 more
>
>
> Any help would be appreciated.
>
> Regards
> Gokul
>
>
>
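Peter's suggestion to run an explicit "mvn install" from the examples
directory can be sketched as follows (assuming the SVN trunk layout from the
original message; the exact job artifact name is an assumption):

```shell
# Rebuild the examples module so the bundled "job" artifact is regenerated.
cd ~/mahout/trunk/examples
mvn install
# The artifact with bundled dependencies should appear under target/,
# e.g. mahout-examples-0.4-SNAPSHOT.job (name is illustrative).
ls target/
```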


No class definition found for org.apache.mahout.math.Vector when invoking K-means example

2010-06-28 Thread Gokul Pillai
I made a new installation of Hadoop 0.20 from Cloudera.
I then installed Mahout 0.4-SNAPSHOT by doing a SVN checkout this morning
and ran mvn install on core, utils and examples.
I tried running the KMeans Clustering example by doing the following as
mentioned in the tutorial:
hadoop jar ~/mahout/trunk/examples/target/mahout-examples-0.4-SNAPSHOT.jar
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

And I keep getting this error although I have explicitly set my Classpath to
"/usr/lib/hadoop/lib/mahout-collections-1.0.jar:/usr/lib/hadoop/lib/mahout-math-0.4-SNAPSHOT.jar:/usr/lib/hadoop/lib/mahout-utils-0.4-SNAPSHOT.jar:/usr/lib/hadoop/lib/mahout-core-0.4-SNAPSHOT.jar":

10/06/28 15:19:21 INFO mapred.JobClient: Task Id :
attempt_201006280959_0027_r_00_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.mahout.clustering.kmeans.KMeansCombiner
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:868)
at
org.apache.hadoop.mapreduce.JobContext.getCombinerClass(JobContext.java:169)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1107)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier.<init>(ReduceTask.java:1780)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:375)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException:
org.apache.mahout.clustering.kmeans.KMeansCombiner
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:815)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
... 5 more


Any help would be appreciated.

Regards
Gokul
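For readers hitting the same ClassNotFoundException in the reduce tasks: one
common remedy is to ship the extra jars to the task JVMs with -libjars instead
of relying on the client classpath. This is an assumption on my part, and it
only works if the driver class parses generic options via Hadoop's
ToolRunner/GenericOptionsParser; a hedged sketch, using the jar paths from the
classpath quoted above:

```shell
# Illustrative only; -libjars distributes the listed jars to task JVMs.
hadoop jar ~/mahout/trunk/examples/target/mahout-examples-0.4-SNAPSHOT.jar \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
  -libjars /usr/lib/hadoop/lib/mahout-math-0.4-SNAPSHOT.jar,/usr/lib/hadoop/lib/mahout-core-0.4-SNAPSHOT.jar
```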