[jira] [Commented] (MAHOUT-1549) Extracting tfidf-vectors by key

2014-05-18 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001224#comment-14001224
 ] 

Richard Scharrer commented on MAHOUT-1549:
--

Yes! https://github.com/kevinweil/elephant-bird/issues/389 has the solution.

> Extracting tfidf-vectors by key
> ---
>
> Key: MAHOUT-1549
> URL: https://issues.apache.org/jira/browse/MAHOUT-1549
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
>Reporter: Richard Scharrer
>  Labels: documentation, features, newbie
> Fix For: 0.7, 0.8, 0.9
>
>
> Hi,
> I have about 20 tfidf-vectors and I need to extract 500 of them, for which 
> I have the keys. Is there an option that lets me take the output of 
> mahout seqdumper and transform it back into a sequencefile that I can use 
> for trainnb/testnb? The tfidf sequencefiles use the Text class for the keys 
> and the VectorWritable class for the values. I tried 
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
> with different settings, but the output always uses the Text class for 
> both key and value, which can't be used in trainnb and testnb.
> I posted this question on:
> http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat
> I am asking here because I've seen similar questions on 
> stackoverflow that were asked last year and still haven't received an answer.
> I really need this information, so if you know anything, please tell me.
> Regards,
> Richard



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAHOUT-1549) Extracting tfidf-vectors by key

2014-05-18 Thread Richard Scharrer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Scharrer resolved MAHOUT-1549.
--

   Resolution: Done
Fix Version/s: 0.7
   0.8
   0.9

> Extracting tfidf-vectors by key
> ---
>
> Key: MAHOUT-1549
> URL: https://issues.apache.org/jira/browse/MAHOUT-1549
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
>Reporter: Richard Scharrer
>  Labels: documentation, features, newbie
> Fix For: 0.9, 0.8, 0.7
>
>





[jira] [Commented] (MAHOUT-1549) Extracting tfidf-vectors by key

2014-05-15 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997495#comment-13997495
 ] 

Richard Scharrer commented on MAHOUT-1549:
--

Hi Andy,
drahcos is actually my account. I'm sorry, but I had to ask this question on two 
or three forums because I was in a hurry. To answer your question: yes, this 
solved my problem.
Thank you for your response.

Regards,
Richard

  


> Extracting tfidf-vectors by key
> ---
>
> Key: MAHOUT-1549
> URL: https://issues.apache.org/jira/browse/MAHOUT-1549
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
>Reporter: Richard Scharrer
>  Labels: documentation, features, newbie
>





[jira] [Created] (MAHOUT-1549) Extracting tfidf-vectors by key

2014-05-07 Thread Richard Scharrer (JIRA)
Richard Scharrer created MAHOUT-1549:


 Summary: Extracting tfidf-vectors by key
 Key: MAHOUT-1549
 URL: https://issues.apache.org/jira/browse/MAHOUT-1549
 Project: Mahout
  Issue Type: Question
  Components: Classification
Affects Versions: 0.9, 0.8, 0.7
Reporter: Richard Scharrer


Hi,
I have about 20 tfidf-vectors and I need to extract 500 of them, for which I 
have the keys. Is there an option that lets me take the output of mahout 
seqdumper and transform it back into a sequencefile that I can use for 
trainnb/testnb? The tfidf sequencefiles use the Text class for the keys and 
the VectorWritable class for the values. I tried 
https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
with different settings, but the output always uses the Text class for both 
key and value, which can't be used in trainnb and testnb.
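
As a rough illustration of the extraction step asked about here, the selection logic itself does not depend on the file format. The sketch below is plain Java: the class and method names are hypothetical, and a Map stands in for the sequencefile. The point is only that the values are passed through untouched, so their type never changes. With Hadoop, the same loop would presumably read from a sequencefile and write to one declared with Text keys and VectorWritable values; that part is an assumption and is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not part of Mahout or elephant-bird.
final class VectorsByKey {
    // Copies the entries whose key is in `wanted`, preserving input order.
    // Values are passed through as-is, so their type never changes.
    static <K, V> Map<K, V> select(Map<K, V> all, Set<K> wanted) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : all.entrySet()) {
            if (wanted.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up keys and vectors, standing in for tfidf records.
        Map<String, double[]> tfidf = new LinkedHashMap<>();
        tfidf.put("/doc/1", new double[] {0.0, 1.2});
        tfidf.put("/doc/2", new double[] {0.7, 0.0});
        tfidf.put("/doc/3", new double[] {0.3, 0.3});

        Set<String> wanted = new HashSet<>(Arrays.asList("/doc/1", "/doc/3"));
        Map<String, double[]> picked = select(tfidf, wanted);
        System.out.println(picked.keySet()); // prints [/doc/1, /doc/3]
    }
}
```

Keeping the wanted keys in a set makes each lookup cheap, so the full data only has to be scanned once.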

I posted this question on:

http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat

I am asking here because I've seen similar questions on 
stackoverflow that were asked last year and still haven't received an answer.

I really need this information, so if you know anything, please tell me.

Regards,
Richard





[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-27 Thread Richard Scharrer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Scharrer updated MAHOUT-1525:
-

Affects Version/s: 0.8
   0.9

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
> Fix For: 0.7, 0.8, 0.9
>
>
> Hi,
> I tried to use trainAdaptiveLogistic and validateAdaptiveLogistic on my data, 
> which looks like this:
> category, id, var1, var2, ...var72 (all numeric)
> I used the following settings:
> mahout trainAdaptiveLogistic --input resource/trainingData \
> --output ./model \
> --target category --categories 9 \
> --predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 ... \
> --types numeric \
> --passes 100 \
> --showperf
> mahout validateAdaptiveLogistic --input resource/testData --model model \
> --confusion --defaultCategory none
> The output of validateAdaptiveLogistic is:
> Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a    b    d    e    f    g    h    i    <--Classified as
> 14   0    0    0    0    0    0    0     |  14   a = projekt
> 0    18   0    0    0    0    0    0     |  18   b = news/aktuelles/presse
> 0    0    24   0    0    0    0    0     |  24   d = lehrveranstaltung
> 0    0    0    19   0    0    0    0     |  19   e = publikation
> 0    0    0    0    20   0    0    0     |  20   f = event
> 0    0    0    0    0    14   0    0     |  14   g = mitarbeiter/person
> 0    0    0    0    0    0    44   0     |  44   h = übersicht
> 0    0    0    0    0    0    0    13    |  13   i = institut
> (in case you were wondering, the categories are in German)
> My problem is that this is impossible. I always get a perfect classification, 
> even with only a small amount of training data. It doesn't even matter how 
> many features I use; I tried it with all 72 and with only one. Am I missing 
> something?
> Regards,
> Richard





[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-27 Thread Richard Scharrer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Scharrer updated MAHOUT-1525:
-

Affects Version/s: (was: 0.9)
   0.7

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
> Fix For: 0.7, 0.8, 0.9
>
>





[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-27 Thread Richard Scharrer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Scharrer updated MAHOUT-1525:
-

Fix Version/s: 0.7
   0.8
   0.9

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
> Fix For: 0.7, 0.8, 0.9
>
>





[jira] [Comment Edited] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-26 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088
 ] 

Richard Scharrer edited comment on MAHOUT-1525 at 4/26/14 9:28 PM:
---

Solved it. I don't know why it's programmed like this, but 
validateAdaptiveLogistic builds its confusion matrix from the true target label 
on both axes, so it shows what a perfect classification would look like rather 
than what the model actually predicted. This can be fixed by changing:

cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target)); 

to:

Vector result = learner.classifyFull(v);
int cat = result.maxValueIndex();
cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat)); 
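
To illustrate why the matrix was always diagonal, here is a minimal sketch in plain Java. It does not use Mahout: ConfusionSketch, fromScores and fromTruthOnly are hypothetical names, and argMax only stands in for what maxValueIndex() does in the change above. The scores are made up.

```java
import java.util.Arrays;

// Hypothetical stand-in for the validation loop; no Mahout classes are used.
final class ConfusionSketch {
    // Index of the largest score, i.e. the predicted category.
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    // counts[actual][predicted], with the prediction taken from the scores.
    static int[][] fromScores(int[] actual, double[][] scores, int categories) {
        int[][] counts = new int[categories][categories];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][argMax(scores[i])]++;
        }
        return counts;
    }

    // The pattern described above: the true label is used on both axes,
    // so only diagonal cells can ever be incremented.
    static int[][] fromTruthOnly(int[] actual, int categories) {
        int[][] counts = new int[categories][categories];
        for (int a : actual) {
            counts[a][a]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] actual = {0, 1, 1};
        double[][] scores = {{0.9, 0.1}, {0.8, 0.2}, {0.3, 0.7}};
        // Always diagonal, whatever the scores are:
        System.out.println(Arrays.deepToString(fromTruthOnly(actual, 2)));      // [[1, 0], [0, 2]]
        // The second sample is misclassified and shows up off the diagonal:
        System.out.println(Arrays.deepToString(fromScores(actual, scores, 2))); // [[1, 0], [1, 1]]
    }
}
```

The first matrix never looks at the scores at all, which matches the symptom in the question: the result stays perfect no matter how little training data or how few features are used.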


was (Author: pilgrim):
Solved it. I don't know why it's programmed like this, but 
validateAdaptiveLogistic gives you a confusion matrix which shows how it should 
be if everything is classified correctly instead of the value given by the 
model. It can easily be changed by changing:

cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target)); 

too:

Vector result = learner.classifyFull(v);
int cat = result.maxValueIndex();
cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat)); 

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
>





[jira] [Resolved] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-26 Thread Richard Scharrer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Scharrer resolved MAHOUT-1525.
--

Resolution: Fixed

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
>





[jira] [Commented] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-26 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088
 ] 

Richard Scharrer commented on MAHOUT-1525:
--

Solved it. I don't know why it's programmed like this, but 
validateAdaptiveLogistic builds its confusion matrix from the true target label 
on both axes, so it shows what a perfect classification would look like rather 
than what the model actually predicted. This can be fixed by changing:

cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target)); 

to:

Vector result = learner.classifyFull(v);
int cat = result.maxValueIndex();
cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat)); 

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
>





[jira] [Comment Edited] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-25 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981867#comment-13981867
 ] 

Richard Scharrer edited comment on MAHOUT-1525 at 4/26/14 3:58 AM:
---

Thank you for your response. I'm working with 0.9 now but I still have the same 
problem. Any idea what to do?


was (Author: pilgrim):
Thank you for your response. I'm working with 0.9 now but I still have the same 
problem. Should I create a new issue with version 0.9?

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
>





[jira] [Updated] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-25 Thread Richard Scharrer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Scharrer updated MAHOUT-1525:
-

Affects Version/s: (was: 0.7)
   0.9

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.9
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
>





[jira] [Commented] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-25 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981867#comment-13981867
 ] 

Richard Scharrer commented on MAHOUT-1525:
--

Thank you for your response. I'm working with 0.9 now but I still have the same 
problem. Should I create a new issue with version 0.9?

> train/validateAdaptiveLogistic
> --
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7
>Reporter: Richard Scharrer
>  Labels: adaptiveLogisticRegression,, newbie
>





[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-04-25 Thread Richard Scharrer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981492#comment-13981492
 ] 

Richard Scharrer commented on MAHOUT-1329:
--

Hi,
I tried this patch and the build was successful, but when I start mahout I get:

Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/drahcir/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I'm using CDH 4.6 with Hadoop 2.0.0-cdh4.6.0.

Any idea why this doesn't work?
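
For reading the error: ([Ljava/lang/String;)V is a JVM method descriptor, meaning a method that takes a String[] and returns void. The JVM links a call by name plus the full descriptor, return type included, so a driver method with the same parameters but a different return type would not match. Whether that is what happens with this particular Hadoop build is an assumption and has not been verified here. The sketch below (DescriptorSketch is a hypothetical name, and it touches neither Hadoop nor Mahout) only shows how the descriptors differ.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class; it does not touch Hadoop or Mahout.
final class DescriptorSketch {
    // JVM descriptor for a method taking String[] and returning `returnType`.
    static String descriptorFor(Class<?> returnType) {
        return MethodType.methodType(returnType, String[].class)
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The signature the stack trace says could not be found.
        System.out.println(descriptorFor(void.class)); // ([Ljava/lang/String;)V
        // Same parameters, different return type: a different method to the JVM.
        System.out.println(descriptorFor(int.class));  // ([Ljava/lang/String;)I
    }
}
```

If this is the cause, running javap on the ProgramDriver class from the Hadoop jar on the classpath would show which driver signature that build actually provides.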


> Mahout for hadoop 2
> ---
>
> Key: MAHOUT-1329
> URL: https://issues.apache.org/jira/browse/MAHOUT-1329
> Project: Mahout
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.9
>Reporter: Sergey Svinarchuk
>Assignee: Gokhan Capan
>  Labels: patch
> Fix For: 1.0
>
> Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 
> 1329.patch
>
>
> Update Mahout to work with Hadoop 2.x, targeting this for Mahout 1.0.





[jira] [Created] (MAHOUT-1525) train/validateAdaptiveLogistic

2014-04-24 Thread Richard Scharrer (JIRA)
Richard Scharrer created MAHOUT-1525:


 Summary: train/validateAdaptiveLogistic
 Key: MAHOUT-1525
 URL: https://issues.apache.org/jira/browse/MAHOUT-1525
 Project: Mahout
  Issue Type: Question
  Components: Classification
Affects Versions: 0.7
Reporter: Richard Scharrer


Hi,
I tried to use trainAdaptiveLogistic and validateAdaptiveLogistic on my data, 
which looks like this:
category, id, var1, var2, ...var72 (all numeric)

I used the following settings:
mahout trainAdaptiveLogistic --input resource/trainingData \
--output ./model \
--target category --categories 9 \
--predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 ... \
--types numeric \
--passes 100 \
--showperf

mahout validateAdaptiveLogistic --input resource/testData --model model \
--confusion --defaultCategory none

The output of validateAdaptiveLogistic is:
Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33

=======================================================
Confusion Matrix
-------------------------------------------------------
a    b    d    e    f    g    h    i    <--Classified as
14   0    0    0    0    0    0    0     |  14   a = projekt
0    18   0    0    0    0    0    0     |  18   b = news/aktuelles/presse
0    0    24   0    0    0    0    0     |  24   d = lehrveranstaltung
0    0    0    19   0    0    0    0     |  19   e = publikation
0    0    0    0    20   0    0    0     |  20   f = event
0    0    0    0    0    14   0    0     |  14   g = mitarbeiter/person
0    0    0    0    0    0    44   0     |  44   h = übersicht
0    0    0    0    0    0    0    13    |  13   i = institut


(in case you were wondering, the categories are in German)

My problem is that this is impossible. I always get a perfect classification, 
even with only a small amount of training data. It doesn't even matter how many 
features I use; I tried it with all 72 and with only one. Am I missing something?

Regards,
Richard


