Hi Rakesh,

The classifier cannot read ARFF at the moment. The input is a file in which each line is a tab-separated pair, key<TAB>value: the key is the class, and the value is the space-separated tokens. The same format is used for classification. Let me know if you can get this running correctly. I will be updating the code to make it run using vectors, but for now you will have to use this format. See the twenty newsgroups example.
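
A rough sketch of how an ARFF file could be turned into that key<TAB>value format, in Python. This is only an illustration under some assumptions: the function name arff_to_labeled_lines is made up here, the class is taken to be the last attribute, the data rows are plain comma-separated values (no quoting, no sparse rows), and the numeric values are passed through unchanged as tokens, which may or may not be a sensible tokenization for your data.

```python
def arff_to_labeled_lines(arff_text):
    """Convert simple ARFF text to 'label<TAB>token token ...' lines.

    Assumptions (not from Mahout itself): the class is the last
    attribute, and rows are plain comma-separated values.
    """
    lines = []
    in_data = False
    for raw in arff_text.splitlines():
        row = raw.strip()
        if not row or row.startswith("%"):
            # Skip blank lines and ARFF comments.
            continue
        if not in_data:
            # Still in the header: wait for the @data marker.
            if row.lower().startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in row.split(",")]
        label, tokens = values[-1], values[:-1]
        # One line per instance: class, a tab, then the tokens.
        lines.append(label + "\t" + " ".join(tokens))
    return lines


sample = """@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""

for line in arff_to_labeled_lines(sample):
    print(line)
```

The resulting lines would then be written to a text file and used as the input in place of the .mvc file.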

Robin

PS: Subscribe by sending email to [email protected] and replying to the confirmation email that comes.

*Forwarded message:*

Dear Robin,

First of all, my sincere apologies for emailing you directly regarding a problem with Mahout. I have been trying to subscribe to the Apache Mahout mailing list, but the mailer daemon is not responding. I would really appreciate it if you could help me find a solution to my problem.

I am trying to use Mahout's Bayesian classifier over the iris dataset. Please note that I am using mahout-0.3. These are the steps I followed:

*Step 1*: Convert the iris.arff file to Mahout's vector format. I used the following command:

    java -cp /home/rakesh/mahout/mahout-0.3/utils/target/mahout-utils-0.3.jar:$(echo /home/rakesh/mahout/mahout-0.3/utils/target/dependency/*.jar . | sed 's/ /:/g') org.apache.mahout.utils.vectors.arff.Driver -d /home/rakesh/workspace/mahout/input/ -o /home/rakesh/workspace/mahout/output/ -t /home/rakesh/workspace/mahout/output/dict.txt

This created the iris.arff.mvc file, but dict.txt was empty. Nevertheless, I went ahead with the training step.

*Step 2*: Training

    $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job org.apache.mahout.classifier.bayes.TrainClassifier -i output -o model -type bayes --gramSize 1 -source hdfs

This step also went through, and I did not get any exceptions.

*Step 3*: Test over the input dataset

    $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job org.apache.mahout.classifier.bayes.TestClassifier -m model -d output -ng 1 -type bayes -source hdfs -method sequential --verbose

This command gives the following error:

    rak...@ubuntu:~/workspace/mahout$ $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job org.apache.mahout.classifier.bayes.TestClassifier -m model -d output -ng 1 -type bayes -source hdfs -method sequential --verbose
    10/06/08 03:19:09 INFO bayes.TestClassifier: Loading model from: {basePath=model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=true, encoding=UTF-8, defaultCat=unknown, testDirPath=output}
    10/06/08 03:19:09 INFO bayes.TestClassifier: Testing Bayes Classifier
    10/06/08 03:19:09 INFO bayes.TestClassifier: --------------
    10/06/08 03:19:09 INFO bayes.TestClassifier: Testing: output/iris.arff.mvc
    java.lang.NullPointerException
        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:100)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:116)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:120)
        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:88)
        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:256)
        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:176)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

I am getting a NullPointerException and I have no clue how to proceed. Could you please let me know whether I am doing something wrong or whether it's a bug? I have attached the iris.arff file and the iris.arff.mvc file for your use.

Thanks,
Rakesh
