[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088 ]
Richard Scharrer commented on MAHOUT-1525:
------------------------------------------

Solved it. I don't know why it's programmed like this, but
validateAdaptiveLogistic gives you a confusion matrix that shows how it would
look if everything were classified correctly, instead of the values predicted
by the model. It can easily be fixed by changing:

    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target));

to:

    Vector result = learner.classifyFull(v);
    int cat = result.maxValueIndex();
    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat));

> train/validateAdaptiveLogistic
> ------------------------------
>
>                 Key: MAHOUT-1525
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1525
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.9
>            Reporter: Richard Scharrer
>              Labels: adaptiveLogisticRegression, newbie
>
> Hi,
> I tried to use train- and validateAdaptiveLogistic on my data, which looks like:
> category, id, var1, var2, ...var72 (all numeric)
> I used the following settings:
> mahout trainAdaptiveLogistic --input resource/trainingData \
> --output ./model \
> --target category --categories 9 \
> --predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 .....
> --types numeric \
> --passes 100 \
> --showperf \
> mahout validateAdaptiveLogistic --input resource/testData --model model
> --confusion --defaultCategory none
> The output of validateAdaptiveLogistic is:
> Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a    b    d    e    f    g    h    i    <--Classified as
> 14   0    0    0    0    0    0    0     |  14   a = projekt
> 0    18   0    0    0    0    0    0     |  18   b = news/aktuelles/presse
> 0    0    24   0    0    0    0    0     |  24   d = lehrveranstaltung
> 0    0    0    19   0    0    0    0     |  19   e = publikation
> 0    0    0    0    20   0    0    0     |  20   f = event
> 0    0    0    0    0    14   0    0     |  14   g = mitarbeiter/person
> 0    0    0    0    0    0    44   0     |  44   h = übersicht
> 0    0    0    0    0    0    0    13    |  13   i = institut
> (in case you were wondering, the categories are in German)
> My problem is that this is impossible. I always get a perfect classification,
> even with only a small amount of training data. It doesn't even matter how
> many features I use; I tried it with all 72 and with only one. Am I missing
> something?
> Regards,
> Richard

--
This message was sent by Atlassian JIRA (v6.2#6252)
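
To illustrate the fix described in the comment above, here is a minimal,
self-contained sketch of why recording the target label on both sides always
produces a perfectly diagonal confusion matrix. It does not use Mahout at all:
the class ConfusionMatrixSketch, the stub classifier, and the toy data are
hypothetical stand-ins, and classifyFull / maxValueIndex here only mimic the
names used in the snippet, not the library's actual implementation.

```java
import java.util.Arrays;

// Illustrative sketch only. Nothing here is Mahout code: the class, the stub
// classifier, and the toy data are all hypothetical.
class ConfusionMatrixSketch {
    static final int NUM_CATEGORIES = 3;

    // Toy validation set: one integer "feature" and one true category per row.
    static final int[] FEATURES = {0, 1, 2, 3, 4, 5};
    static final int[] TARGETS  = {0, 1, 2, 0, 2, 1};

    // Stand-in for learner.classifyFull(v): returns one score per category.
    // This stub always favours category (feature % 3), so it is wrong on
    // the last two rows of the toy data.
    static double[] classifyFull(int feature) {
        double[] scores = new double[NUM_CATEGORIES];
        scores[feature % NUM_CATEGORIES] = 1.0;
        return scores;
    }

    // Stand-in for result.maxValueIndex(): index of the highest score.
    static int maxValueIndex(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    // Rows are the true category, columns are the category that was recorded.
    // usePrediction == false mirrors the reported behaviour (the true label is
    // recorded on both sides); usePrediction == true mirrors the proposed fix.
    static int[][] confusion(int[] features, int[] targets, boolean usePrediction) {
        int[][] cm = new int[NUM_CATEGORIES][NUM_CATEGORIES];
        for (int i = 0; i < features.length; i++) {
            int actual = targets[i];
            int recorded = usePrediction
                    ? maxValueIndex(classifyFull(features[i]))
                    : actual;
            cm[actual][recorded]++;
        }
        return cm;
    }

    // Number of instances that fall outside the diagonal (misclassifications).
    static int offDiagonal(int[][] cm) {
        int errors = 0;
        for (int r = 0; r < cm.length; r++) {
            for (int c = 0; c < cm[r].length; c++) {
                if (r != c) {
                    errors += cm[r][c];
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        int[][] before = confusion(FEATURES, TARGETS, false);
        int[][] after = confusion(FEATURES, TARGETS, true);
        System.out.println("target vs target:     " + Arrays.deepToString(before)
                + " errors=" + offDiagonal(before));
        System.out.println("target vs prediction: " + Arrays.deepToString(after)
                + " errors=" + offDiagonal(after));
    }
}
```

With the target recorded on both sides, every instance lands on the diagonal
no matter how poor the model is, which matches the "always perfect" matrix in
the quoted report. Only when the column comes from the model's prediction do
misclassifications show up off the diagonal.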