[ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088
 ] 

Richard Scharrer commented on MAHOUT-1525:
------------------------------------------

Solved it. I don't know why it's programmed like this, but 
validateAdaptiveLogistic gives you a confusion matrix that shows what the 
result would be if every instance were classified correctly, not what the 
model actually predicts. This is easy to fix by changing:

        cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target)); 

to:

        Vector result = learner.classifyFull(v);
        int cat = result.maxValueIndex();
        cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat)); 
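
To illustrate why the original call always yields a perfect diagonal, here is a 
minimal plain-Java sketch. It does not use Mahout at all: ConfusionDemo, argMax, 
confusion and offDiagonal are hypothetical names made up for this example, and 
the scores are invented toy values. It only shows the general effect of 
recording the true label as the "prediction" versus recording the index of the 
highest model score, which is what the change above does with 
maxValueIndex().

```java
import java.util.Arrays;

public class ConfusionDemo {
    // Index of the largest score (the role maxValueIndex() plays in the fix above).
    public static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    // Builds counts[actual][predicted] for category indices in [0, numCategories).
    public static int[][] confusion(int[] actual, int[] predicted, int numCategories) {
        int[][] counts = new int[numCategories][numCategories];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    // Number of instances that fall outside the diagonal, i.e. misclassifications.
    public static int offDiagonal(int[][] counts) {
        int errors = 0;
        for (int r = 0; r < counts.length; r++) {
            for (int c = 0; c < counts[r].length; c++) {
                if (r != c) {
                    errors += counts[r][c];
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Toy data (assumption): true categories and per-category model scores.
        int[] actual = {0, 1, 2, 1};
        double[][] scores = {
            {0.7, 0.2, 0.1},
            {0.5, 0.3, 0.2}, // highest score is category 0, but the true category is 1
            {0.1, 0.2, 0.7},
            {0.2, 0.6, 0.2},
        };

        int[] predicted = new int[scores.length];
        for (int i = 0; i < scores.length; i++) {
            predicted[i] = argMax(scores[i]);
        }

        // Passing the true label as both arguments can never show an error.
        System.out.println(Arrays.deepToString(confusion(actual, actual, 3)));
        // Passing the model's arg-max exposes the misclassified instance.
        System.out.println(Arrays.deepToString(confusion(actual, predicted, 3)));
    }
}
```

With the true label on both sides the matrix is diagonal by construction, no 
matter how good or bad the model is, which matches the symptom reported in the 
issue.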

> train/validateAdaptiveLogistic
> ------------------------------
>
>                 Key: MAHOUT-1525
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1525
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.9
>            Reporter: Richard Scharrer
>              Labels: adaptiveLogisticRegression,, newbie
>
> Hi,
> I tried to use train- and validateAdaptiveLogistic on my data, which looks like:
> category, id, var1, var2, ...var72 (all numeric)
> I used the following settings:
> mahout trainAdaptiveLogistic --input resource/trainingData \
> --output ./model \
> --target category --categories 9 \
> --predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 ..... \
> --types numeric \
> --passes 100 \
> --showperf
> mahout validateAdaptiveLogistic --input resource/testData --model model \
> --confusion --defaultCategory none
> The output of validateAdaptiveLogistic is:
> Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a     b       d       e       f       g       h       i       <--Classified as
> 14    0       0       0       0       0       0       0        |  14    a = projekt
> 0     18      0       0       0       0       0       0        |  18    b = news/aktuelles/presse
> 0     0       24      0       0       0       0       0        |  24    d = lehrveranstaltung
> 0     0       0       19      0       0       0       0        |  19    e = publikation
> 0     0       0       0       20      0       0       0        |  20    f = event
> 0     0       0       0       0       14      0       0        |  14    g = mitarbeiter/person
> 0     0       0       0       0       0       44      0        |  44    h = übersicht
> 0     0       0       0       0       0       0       13       |  13    i = institut
> (in case you were wondering, the categories are in German)
> My problem is that this is impossible. I always get a perfect classification, 
> even with only a small amount of training data. It doesn't even matter how 
> many features I use; I tried it with all 72 and with only one. Am I missing 
> something?
> Regards,
> Richard



--
This message was sent by Atlassian JIRA
(v6.2#6252)
