[jira] [Commented] (MAHOUT-1549) Extracting tfidf-vectors by key
[ https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001224#comment-14001224 ]

Richard Scharrer commented on MAHOUT-1549:
------------------------------------------

Yes! https://github.com/kevinweil/elephant-bird/issues/389 has the solution.

> Extracting tfidf-vectors by key
> -------------------------------
>
> Key: MAHOUT-1549
> URL: https://issues.apache.org/jira/browse/MAHOUT-1549
> Project: Mahout
> Issue Type: Question
> Components: Classification
> Affects Versions: 0.7, 0.8, 0.9
> Reporter: Richard Scharrer
> Labels: documentation, features, newbie
> Fix For: 0.7, 0.8, 0.9
>
> Hi,
> I have about 20 tfidf-vectors and I need to extract 500 of them of which
> I have the keys. Is there some kind of magical option which allows me
> something like taking the output of mahout seqdumper and transforming it back
> into a sequencefile that I can use for trainnb / testnb? The sequencefiles of
> tfidf use the Text class for the keys and the VectorWritable class for the
> values. I tried
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
> with different settings, but the output always gives me the Text class for
> both key and value, which can't be used in trainnb and testnb.
> I posted this question on:
> http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat
> I am asking this question here because I've seen similar questions on
> stackoverflow that were asked last year and still didn't get an answer.
> I really need this information, so in case you know anything please tell me.
> Regards,
> Richard

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAHOUT-1549) Extracting tfidf-vectors by key
[ https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Scharrer resolved MAHOUT-1549.
--------------------------------------
    Resolution: Done
    Fix Version/s: 0.7, 0.8, 0.9
[jira] [Commented] (MAHOUT-1549) Extracting tfidf-vectors by key
[ https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997495#comment-13997495 ]

Richard Scharrer commented on MAHOUT-1549:
------------------------------------------

Hi Andy,
drahcos is actually my account. I'm sorry but I had to ask this question on two or three forums because I was in a hurry. To answer your question, yes this solved my problem. Thank you for your response.
Regards,
Richard
[jira] [Created] (MAHOUT-1549) Extracting tfidf-vectors by key
Richard Scharrer created MAHOUT-1549:
-------------------------------------

Summary: Extracting tfidf-vectors by key
Key: MAHOUT-1549
URL: https://issues.apache.org/jira/browse/MAHOUT-1549
Project: Mahout
Issue Type: Question
Components: Classification
Affects Versions: 0.9, 0.8, 0.7
Reporter: Richard Scharrer

Hi,
I have about 20 tfidf-vectors and I need to extract 500 of them of which
I have the keys. Is there some kind of magical option which allows me
something like taking the output of mahout seqdumper and transforming it back
into a sequencefile that I can use for trainnb / testnb? The sequencefiles of
tfidf use the Text class for the keys and the VectorWritable class for the
values. I tried
https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
with different settings, but the output always gives me the Text class for
both key and value, which can't be used in trainnb and testnb.
I posted this question on:
http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat
I am asking this question here because I've seen similar questions on
stackoverflow that were asked last year and still didn't get an answer.
I really need this information, so in case you know anything please tell me.
Regards,
Richard
[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Scharrer updated MAHOUT-1525:
-------------------------------------
    Affects Version/s: 0.8, 0.9

> train/validateAdaptiveLogistic
> ------------------------------
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
> Issue Type: Question
> Components: Classification
> Affects Versions: 0.7, 0.8, 0.9
> Reporter: Richard Scharrer
> Labels: adaptiveLogisticRegression, newbie
> Fix For: 0.7, 0.8, 0.9
>
> Hi,
> I tried to use train- and validateAdaptiveLogistic on my data, which looks like:
> category, id, var1, var2, ...var72 (all numeric)
> I used the following settings:
>
> mahout trainAdaptiveLogistic --input resource/trainingData \
>     --output ./model \
>     --target category --categories 9 \
>     --predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 . \
>     --types numeric \
>     --passes 100 \
>     --showperf
>
> mahout validateAdaptiveLogistic --input resource/testData --model model \
>     --confusion --defaultCategory none
>
> The output of validateAdaptiveLogistic is:
>
> Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
> ===
> Confusion Matrix
> ---
> a    b    d    e    f    g    h    i    <--Classified as
> 14   0    0    0    0    0    0    0     |  14   a = projekt
> 0    18   0    0    0    0    0    0     |  18   b = news/aktuelles/presse
> 0    0    24   0    0    0    0    0     |  24   d = lehrveranstaltung
> 0    0    0    19   0    0    0    0     |  19   e = publikation
> 0    0    0    0    20   0    0    0     |  20   f = event
> 0    0    0    0    0    14   0    0     |  14   g = mitarbeiter/person
> 0    0    0    0    0    0    44   0     |  44   h = übersicht
> 0    0    0    0    0    0    0    13    |  13   i = institut
>
> (in case you were wondering, the categories are in German)
> My problem is that this is impossible. I always get a perfect classification,
> even with only a small amount of training data. It doesn't even matter how
> many features I use; I tried it with all 72 and with only one. Am I missing
> something?
> Regards,
> Richard
[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Scharrer updated MAHOUT-1525:
-------------------------------------
    Affects Version/s: (was: 0.9)
                       0.7
[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Scharrer updated MAHOUT-1525:
-------------------------------------
    Fix Version/s: 0.7, 0.8, 0.9
[jira] [Comment Edited] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088 ]

Richard Scharrer edited comment on MAHOUT-1525 at 4/26/14 9:28 PM:
-------------------------------------------------------------------

Solved it. I don't know why it's programmed like this, but validateAdaptiveLogistic gives you a confusion matrix that shows what the result would be if everything were classified correctly, instead of the category predicted by the model. This can easily be fixed by changing:

    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target));

to:

    Vector result = learner.classifyFull(v);
    int cat = result.maxValueIndex();
    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat));

was (Author: pilgrim):
Solved it. I don't know why it's programmed like this, but validateAdaptiveLogistic gives you a confusion matrix which shows how it should be if everything is classified correctly instead of the value given by the model. It can easily be changed by changing:

    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target));

too:

    Vector result = learner.classifyFull(v);
    int cat = result.maxValueIndex();
    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat));
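The effect described in the comment on MAHOUT-1525 can be reproduced without Mahout. The following is only a minimal sketch in plain Java: the class `ConfusionTally`, its methods, and the sample scores are hypothetical and are not part of Mahout's API. It tallies a confusion matrix once using the true label for both row and column (the behaviour reported for validateAdaptiveLogistic) and once using the index of the largest score as the predicted category (analogous to what `classifyFull` followed by `maxValueIndex` does in the proposed change).

```java
import java.util.Arrays;

// Hypothetical stand-alone illustration; not Mahout's ConfusionMatrix class.
final class ConfusionTally {

    /** Returns the index of the largest score (the predicted category). */
    public static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Builds a confusion matrix with rows = actual category, columns = recorded category.
     * With usePrediction == false the actual label is recorded in both positions,
     * which always yields a perfect diagonal regardless of the model.
     */
    public static int[][] tally(int[] actual, double[][] scores,
                                int numCategories, boolean usePrediction) {
        int[][] cm = new int[numCategories][numCategories];
        for (int i = 0; i < actual.length; i++) {
            int recorded = usePrediction ? argMax(scores[i]) : actual[i];
            cm[actual[i]][recorded]++;
        }
        return cm;
    }

    public static void main(String[] args) {
        int[] actual = {0, 0, 1, 1, 2};
        // Made-up per-category scores for five test rows.
        double[][] scores = {
            {0.7, 0.2, 0.1},  // predicted 0, correct
            {0.1, 0.8, 0.1},  // predicted 1, wrong
            {0.2, 0.6, 0.2},  // predicted 1, correct
            {0.5, 0.3, 0.2},  // predicted 0, wrong
            {0.1, 0.2, 0.7},  // predicted 2, correct
        };
        // Prints [[2, 0, 0], [0, 2, 0], [0, 0, 1]]
        System.out.println(Arrays.deepToString(tally(actual, scores, 3, false)));
        // Prints [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
        System.out.println(Arrays.deepToString(tally(actual, scores, 3, true)));
    }
}
```

The first matrix is diagonal no matter what the scores are, which matches the "impossible" perfect classification in the issue; only the second matrix reflects the model's actual errors.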
[jira] [Resolved] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Scharrer resolved MAHOUT-1525.
--------------------------------------
    Resolution: Fixed
[jira] [Commented] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088 ]

Richard Scharrer commented on MAHOUT-1525:
------------------------------------------

Solved it. I don't know why it's programmed like this, but validateAdaptiveLogistic gives you a confusion matrix that shows what the result would be if everything were classified correctly, instead of the category predicted by the model. This can easily be fixed by changing:

    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target));

to:

    Vector result = learner.classifyFull(v);
    int cat = result.maxValueIndex();
    cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat));
[jira] [Comment Edited] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981867#comment-13981867 ]

Richard Scharrer edited comment on MAHOUT-1525 at 4/26/14 3:58 AM:
-------------------------------------------------------------------

Thank you for your response. I'm working with 0.9 now but I still have the same problem. Any idea what to do?

was (Author: pilgrim):
Thank you for your response. I'm working with 0.9 now but I still have the same problem. Should I create a new issue with version 0.9?
[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Scharrer updated MAHOUT-1525:
-------------------------------------
    Affects Version/s: (was: 0.7)
                       0.9
[jira] [Commented] (MAHOUT-1525) train/validateAdaptiveLogistic
[ https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981867#comment-13981867 ]

Richard Scharrer commented on MAHOUT-1525:
------------------------------------------

Thank you for your response. I'm working with 0.9 now but I still have the same problem. Should I create a new issue with version 0.9?
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981492#comment-13981492 ]

Richard Scharrer commented on MAHOUT-1329:
------------------------------------------

Hi,
I tried this patch and the build was successful, but when I start mahout I get:

Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/drahcir/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I'm using CDH 4.6 with Hadoop 2.0.0-cdh4.6.0.
Any idea why this doesn't work?

> Mahout for hadoop 2
> -------------------
>
> Key: MAHOUT-1329
> URL: https://issues.apache.org/jira/browse/MAHOUT-1329
> Project: Mahout
> Issue Type: Task
> Components: build
> Affects Versions: 0.9
> Reporter: Sergey Svinarchuk
> Assignee: Gokhan Capan
> Labels: patch
> Fix For: 1.0
>
> Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch
>
> Update mahout for work with hadoop 2.X, targeting this for Mahout 1.0.
[jira] [Created] (MAHOUT-1525) train/validateAdaptiveLogistic
Richard Scharrer created MAHOUT-1525:
-------------------------------------

Summary: train/validateAdaptiveLogistic
Key: MAHOUT-1525
URL: https://issues.apache.org/jira/browse/MAHOUT-1525
Project: Mahout
Issue Type: Question
Components: Classification
Affects Versions: 0.7
Reporter: Richard Scharrer

Hi,
I tried to use train- and validateAdaptiveLogistic on my data, which looks like:
category, id, var1, var2, ...var72 (all numeric)
I used the following settings:

mahout trainAdaptiveLogistic --input resource/trainingData \
    --output ./model \
    --target category --categories 9 \
    --predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 . \
    --types numeric \
    --passes 100 \
    --showperf

mahout validateAdaptiveLogistic --input resource/testData --model model \
    --confusion --defaultCategory none

The output of validateAdaptiveLogistic is:

Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
===
Confusion Matrix
---
a    b    d    e    f    g    h    i    <--Classified as
14   0    0    0    0    0    0    0     |  14   a = projekt
0    18   0    0    0    0    0    0     |  18   b = news/aktuelles/presse
0    0    24   0    0    0    0    0     |  24   d = lehrveranstaltung
0    0    0    19   0    0    0    0     |  19   e = publikation
0    0    0    0    20   0    0    0     |  20   f = event
0    0    0    0    0    14   0    0     |  14   g = mitarbeiter/person
0    0    0    0    0    0    44   0     |  44   h = übersicht
0    0    0    0    0    0    0    13    |  13   i = institut

(in case you were wondering, the categories are in German)
My problem is that this is impossible. I always get a perfect classification,
even with only a small amount of training data. It doesn't even matter how
many features I use; I tried it with all 72 and with only one. Am I missing
something?
Regards,
Richard