Hi Rakesh, the classifier cannot read ARFF at the moment. The input is a
file in which each line is tab separated as key<TAB>value. The key is the
class label; the value is the space-separated tokens. The same format is
used for classification. Let me know if you can get this running correctly.
I will be updating the code to make it run using vectors, but for now you
will have to use this format. See the twenty newsgroups example.
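
As a rough illustration of the key<TAB>value format described above, here is
a small Python sketch that turns a simple dense ARFF file into such lines. It
is not part of Mahout. The function name arff_to_labeled_lines, the choice of
the last attribute as the class, and the "<attribute>_<rounded value>" token
scheme are all assumptions made for this example; how numeric features should
really be turned into tokens is a modelling decision.

```python
def arff_to_labeled_lines(arff_text, decimals=0):
    """Convert simple dense ARFF text to 'label<TAB>token token ...' lines.

    Assumptions (hypothetical, not taken from Mahout): the class is the
    last attribute, rows are plain comma-separated values without quoting,
    and each numeric value becomes a token "<attribute>_<rounded value>".
    """
    names = []
    lines = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                names.append(line.split()[1])  # second field is the name
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        *features, label = values  # last column is assumed to be the class
        tokens = []
        for name, value in zip(names, features):
            try:
                value = format(float(value), f".{decimals}f")
            except ValueError:
                pass  # nominal value: keep the text unchanged
            tokens.append(f"{name}_{value}")
        lines.append(label + "\t" + " ".join(tokens))
    return lines


if __name__ == "__main__":
    sample = """@relation iris
@attribute sepallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
7.0,1.4,Iris-versicolor
"""
    for out in arff_to_labeled_lines(sample):
        print(out)
```

Writing the returned lines to a text file, one per line, gives input in the
shape described above. Rounding to whole numbers is only a placeholder way of
discretizing the values.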

Robin

PS: Subscribe by sending email to [email protected] and replying
to the confirmation email you receive.


*Forwarded message:*

Dear Robin,

First of all, my sincere apologies for emailing you directly about a
problem with Mahout. I have been trying to subscribe to the Apache Mahout
mailing list, but the mailer daemon is not responding. I would really
appreciate it if you could help me find a solution to my problem.

I am trying to use Mahout's Bayesian classifier on the iris dataset.
Please note that I am using mahout-0.3. These are the steps I followed:

*Step 1*: Convert the iris.arff file to Mahout's vector format. I used the
following command:

    java -cp /home/rakesh/mahout/mahout-0.3/utils/target/mahout-utils-0.3.jar:$(echo /home/rakesh/mahout/mahout-0.3/utils/target/dependency/*.jar . | sed 's/ /:/g') \
      org.apache.mahout.utils.vectors.arff.Driver \
      -d /home/rakesh/workspace/mahout/input/ \
      -o /home/rakesh/workspace/mahout/output/ \
      -t /home/rakesh/workspace/mahout/output/dict.txt


This created the iris.arff.mvc file, but dict.txt was empty. Nevertheless,
I went ahead with the training step.

*Step 2*: training

    $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job \
      org.apache.mahout.classifier.bayes.TrainClassifier \
      -i output -o model -type bayes --gramSize 1 -source hdfs

This step also went through and I did not get any exceptions.

*Step 3*: Test over the input dataset

    $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job \
      org.apache.mahout.classifier.bayes.TestClassifier \
      -m model -d output -ng 1 -type bayes -source hdfs -method sequential --verbose

This command gives the following error:

    rak...@ubuntu:~/workspace/mahout$ $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job \
      org.apache.mahout.classifier.bayes.TestClassifier \
      -m model -d output -ng 1 -type bayes -source hdfs -method sequential --verbose
    10/06/08 03:19:09 INFO bayes.TestClassifier: Loading model from: {basePath=model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=true, encoding=UTF-8, defaultCat=unknown, testDirPath=output}
    10/06/08 03:19:09 INFO bayes.TestClassifier: Testing Bayes Classifier
    10/06/08 03:19:09 INFO bayes.TestClassifier: --------------
    10/06/08 03:19:09 INFO bayes.TestClassifier: Testing: output/iris.arff.mvc
    java.lang.NullPointerException
        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:100)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:116)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:120)
        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:88)
        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:256)
        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:176)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

I am getting a NullPointerException and have no idea how to proceed. Could
you please let me know whether I am doing something wrong or whether it's a
bug? I have attached the iris.arff and iris.arff.mvc files for your
reference.

thanks,
rakesh
