Yeah. It definitely shouldn't crash. I will post a fix soon (I am at work right
now). Meanwhile, you can look at the TestClassifier code and run the classifier
programmatically. It's as easy as setting the params, instantiating a
classifier context, and sending it files one by one.
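Here is a minimal plain-Java sketch of that loop. It deliberately does not use Mahout's classifier context or parameter classes, because their exact API isn't shown in this thread. The `Classifier` interface, the `Result` holder and `classifyDirectory` are all hypothetical stand-ins. Following the suggestion further down, the sketch takes the expected category from each "category<TAB>tokens" line rather than from the file name. That way a single test file can mix categories.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SequentialTestSketch {

  /** Hypothetical stand-in for a trained model; NOT Mahout's ClassifierContext API. */
  interface Classifier {
    String classify(List<String> tokens);
  }

  /** One test instance: the category stated on the line and the category predicted. */
  static final class Result {
    final String correct;
    final String predicted;

    Result(String correct, String predicted) {
      this.correct = correct;
      this.predicted = predicted;
    }
  }

  /**
   * Sends every *.txt file in dir to the classifier, one file at a time.
   * Each line is expected to be "category<TAB>token token ...", so the
   * expected category comes from the line and the file name is ignored.
   */
  static List<Result> classifyDirectory(Path dir, Classifier classifier) throws IOException {
    List<Result> results = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
      for (Path file : files) {
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
          int tab = line.indexOf('\t');
          if (tab < 0) {
            continue; // no category column: skip the line rather than fail
          }
          String category = line.substring(0, tab).trim();
          String body = line.substring(tab + 1).trim();
          List<String> tokens =
              body.isEmpty() ? new ArrayList<String>() : Arrays.asList(body.split("\\s+"));
          results.add(new Result(category, classifier.classify(tokens)));
        }
      }
    }
    return results;
  }
}
```

Each `Result` pair can then be fed into whatever accuracy or confusion-matrix tally you use.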

Robin


On Thu, Feb 18, 2010 at 4:15 PM, Loek Cleophas <[email protected]> wrote:

> Thank you, Robin. Here is the stack trace I got:
>
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:100)
>        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
>        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:88)
>        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:63)
>        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:289)
>        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:204)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Command line was:
>
>   bin/hadoop jar ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job \
>     org.apache.mahout.classifier.bayes.TestClassifier \
>     -m docs-klg-n3-wordLevel-complementary \
>     -d ~/Code/klg/indextrainingvalidation/docs-klg-mahout-validate \
>     -ng 3 -type cbayes -source hdfs -method sequential
>
> It did read the model correctly, and when I substitute a non-existent input
> directory for the one containing the non-category-named .txt file, it runs
> normally (classifying 0 instances).
>
> I presume it should be easy to reproduce. If not, let me know and I'll see
> whether I can give you our small test data set, or a small subset of it that
> reproduces the problem.
>
> Regards,
> Loek
>
>
> On Feb 18, 2010, at 11:25, Robin Anil wrote:
>
>  I will look into this.
>>
>> On Thu, Feb 18, 2010 at 3:42 PM, Loek Cleophas <[email protected]> wrote:
>>
>>  Hi
>>>
>>> While playing around some more with the 20newsgroups example code for the
>>> Bayes classifiers, I ran into an oddity and what I presume is a bug:
>>>
>>> Instead of using (parts of) the 20 Newsgroups data set, which is split
>>> nicely into one file per newsgroup using the 'category, tab, tokens' line
>>> format, I generated files in that format from our company data set.
>>> However, I generated one file to train with and one to test with, so each
>>> file can contain lines with different categories, e.g.
>>>
>>> cars    Ferrari red ....
>>> animals cow cat dog ....
>>>
>>> In training, this works fine. In testing, however, TestClassifier crashes
>>> with a NullPointerException. I presume that is because either the file
>>> name does not match category.txt for any category name, or because there
>>> are multiple categories in a single file; but I also presume that neither
>>> should crash the thing :) It also raises a question: if each line in the
>>> data files already contains the category, why are the file names relevant
>>> at all? That seems redundant to me. Shouldn't TestClassifier simply take
>>> all .txt files in the input data directory and process their contents?
>>>
>>> Regards,
>>> Loek
>>>
>>>
>
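On the NullPointerException itself: the trace ends in ConfusionMatrix.getCount, which is reached from addInstance. One plausible cause, not confirmed against the 0.2 source, is a lookup such as `int id = labelMap.get(label)`. That expression auto-unboxes a null Integer when the test data contains a category the matrix was never given. Below is a hypothetical label-keyed confusion matrix (not Mahout's class) that records unseen labels instead of failing.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical label-keyed confusion matrix; NOT Mahout's ConfusionMatrix class. */
final class TolerantConfusionMatrix {

  // correct label -> (classified label -> count)
  private final Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();

  /** Records one instance; a label not seen before gets a new row/cell instead of an NPE. */
  void addInstance(String correctLabel, String classifiedLabel) {
    counts.computeIfAbsent(correctLabel, k -> new LinkedHashMap<String, Integer>())
        .merge(classifiedLabel, 1, Integer::sum);
  }

  /** Returns 0 for a pair that was never recorded, instead of unboxing a null Integer. */
  int getCount(String correctLabel, String classifiedLabel) {
    Map<String, Integer> row =
        counts.getOrDefault(correctLabel, Collections.<String, Integer>emptyMap());
    return row.getOrDefault(classifiedLabel, 0);
  }
}
```

The design choice here is to grow the label set on demand. An alternative would be to reject unknown labels with a clear IllegalArgumentException naming the offending category, which still beats a bare NPE.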