HMM Model error

2015-05-11 Thread Raghuveer
When I am trying to run the sample from
http://mahout.apache.org/users/classification/hidden-markov-models.html the
model runs fine. However, when I give a different sequence like the one below,
I see the error shown below:

echo "0 3 5 8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59 62 65 67 70 73
76 79 82 85 88 91 94 97 100 103 106 54 56 57 59 60 62 63 65" > hmm-input

mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000 
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.mahout.math.DenseMatrix.getQuick(DenseMatrix.java:78)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.HmmAlgorithms.forwardAlgorithm(HmmAlgorithms.java:85)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer.trainBaumWelch(HmmTrainer.java:315)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer.main(BaumWelchTrainer.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

Kindly suggest how I can get rid of this error.
Regards,
Raghuveer


Re: Replacement for DefaultAnalyzer

2015-05-11 Thread Lewis John Mcgibbney
Hi Suneel,
Just for context, I've implemented the following.

@Override
protected void map(Text key, BehemothDocument value, Context context)
        throws IOException, InterruptedException {
    String sContent = value.getText();
    if (sContent == null) {
        // no text available? skip
        context.getCounter("LuceneTokenizer", "BehemothDocWithoutText")
                .increment(1);
        return;
    }
    analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
    TokenStream ts = analyzer.tokenStream(key.toString(),
            new StringReader(sContent));
    // The Analyzer class will construct the Tokenizer, TokenFilter(s),
    // and CharFilter(s), and pass the resulting Reader to the Tokenizer.
    @SuppressWarnings("unused")
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    StringTuple document = new StringTuple();
    try {
        ts.reset(); // Resets this stream to the beginning. (Required)
        while (ts.incrementToken()) {
            if (termAtt.length() > 0) {
                document.add(new String(termAtt.buffer(), 0, termAtt.length()));
            }
        }
        ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
    } finally {
        ts.close(); // Release resources associated with this stream.
    }
    context.write(key, document);
}

I'll be testing and will update if anything else comes up.
Thanks
Lewis
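
For reference, a minimal sketch of an Analyzer written against the Lucene 4.x
createComponents API (the class name and the tokenizer/filter choices below are
illustrative, not what Behemoth ships):

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Minimal Lucene 4.x Analyzer: tokenStream() is final, so all
// customization goes through createComponents().
public class SimpleTextAnalyzer extends Analyzer {

    private final Version matchVersion;

    public SimpleTextAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The Tokenizer splits the raw input; TokenFilters post-process the stream.
        Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream filtered = new LowerCaseFilter(matchVersion, source);
        return new TokenStreamComponents(source, filtered);
    }
}

An analyzer like this is then consumed exactly as in the mapper above, via
analyzer.tokenStream(field, reader), which reuses the components per thread.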


On Mon, May 11, 2015 at 2:12 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 I found Mike's blog post regarding Lucene 4.X from a while ago [0].
 In the 'Other Changes' section Mike states "Analyzers must always
 provide a reusable token stream, by implementing the
 Analyzer.createComponents method (reusableTokenStream has been removed
 and tokenStream is now final, in Analyzer)."
 This provides a good bit more context, therefore I'm going to continue on
 the createComponents route with the aim of implementing the newer 4.X
 Lucene API.
 In the meantime, if you get any updates or have a code sample it would be
 very much appreciated.
 Thanks
 Lewis

 [0]
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html

 On Mon, May 11, 2015 at 2:03 PM, Lewis John Mcgibbney 
 lewis.mcgibb...@gmail.com wrote:

 Hi Suneel,

 On Sat, May 9, 2015 at 11:21 AM, Suneel Marthi smar...@apache.org
 wrote:

 Mahout 0.9 and 0.10.0 are using Lucene 4.6.1. There's been a change in
 the
 TokenStream workflow in Lucene post-Lucene 4.5.


 Yes I know that after looking into the codebase. Thanks for clarifying!



 What exactly are u trying to do and where is it u r stuck now? It would
 help if u posted a code snippet or something.


 In particular I am working on the following implementation [0], which uses
 this code:

 TokenStream stream = analyzer.reusableTokenStream(key.toString(), new
 StringReader(sContent.toString()));

 Of note here is that the analyzer object is instantiated as type
 DefaultAnalyzer [1]. The analyzer.reusableTokenStream API is deprecated, as
 you've noted, so I am just wondering what the suggested API semantics are in
 order to achieve the desired upgrade.
 Thanks in advance again for any input.
 Lewis

 [0]
 https://github.com/DigitalPebble/behemoth/blob/master/mahout/src/main/java/com/digitalpebble/behemoth/mahout/LuceneTokenizerMapper.java#L52-L53
 [1]
 http://svn.apache.org/repos/asf/mahout/tags/mahout-0.7/core/src/main/java/org/apache/mahout/vectorizer/DefaultAnalyzer.java






 --
 *Lewis*




-- 
*Lewis*


Re: Replacement for DefaultAnalyzer

2015-05-11 Thread Lewis John Mcgibbney
I found Mike's blog post regarding Lucene 4.X from a while ago [0].
In the 'Other Changes' section Mike states "Analyzers must always
provide a reusable token stream, by implementing the
Analyzer.createComponents method (reusableTokenStream has been removed and
tokenStream is now final, in Analyzer)."
This provides a good bit more context, therefore I'm going to continue on
the createComponents route with the aim of implementing the newer 4.X Lucene
API.
In the meantime, if you get any updates or have a code sample it would be
very much appreciated.
Thanks
Lewis

[0]
http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html

On Mon, May 11, 2015 at 2:03 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Suneel,

 On Sat, May 9, 2015 at 11:21 AM, Suneel Marthi smar...@apache.org wrote:

 Mahout 0.9 and 0.10.0 are using Lucene 4.6.1. There's been a change in the
 TokenStream workflow in Lucene post-Lucene 4.5.


 Yes I know that after looking into the codebase. Thanks for clarifying!



 What exactly are u trying to do and where is it u r stuck now? It would
 help if u posted a code snippet or something.


 In particular I am working on the following implementation [0], which uses
 this code:

 TokenStream stream = analyzer.reusableTokenStream(key.toString(), new
 StringReader(sContent.toString()));

 Of note here is that the analyzer object is instantiated as type
 DefaultAnalyzer [1]. The analyzer.reusableTokenStream API is deprecated, as
 you've noted, so I am just wondering what the suggested API semantics are in
 order to achieve the desired upgrade.
 Thanks in advance again for any input.
 Lewis

 [0]
 https://github.com/DigitalPebble/behemoth/blob/master/mahout/src/main/java/com/digitalpebble/behemoth/mahout/LuceneTokenizerMapper.java#L52-L53
 [1]
 http://svn.apache.org/repos/asf/mahout/tags/mahout-0.7/core/src/main/java/org/apache/mahout/vectorizer/DefaultAnalyzer.java






-- 
*Lewis*


Re: Replacement for DefaultAnalyzer

2015-05-11 Thread Lewis John Mcgibbney
Hi Suneel,

On Sat, May 9, 2015 at 11:21 AM, Suneel Marthi smar...@apache.org wrote:

 Mahout 0.9 and 0.10.0 are using Lucene 4.6.1. There's been a change in the
 TokenStream workflow in Lucene post-Lucene 4.5.


Yes I know that after looking into the codebase. Thanks for clarifying!



 What exactly are u trying to do and where is it u r stuck now? It would
 help if u posted a code snippet or something.


In particular I am working on the following implementation [0], which uses
this code:

TokenStream stream = analyzer.reusableTokenStream(key.toString(), new
StringReader(sContent.toString()));

Of note here is that the analyzer object is instantiated as type
DefaultAnalyzer [1]. The analyzer.reusableTokenStream API is deprecated, as
you've noted, so I am just wondering what the suggested API semantics are in
order to achieve the desired upgrade.
Thanks in advance again for any input.
Lewis

[0]
https://github.com/DigitalPebble/behemoth/blob/master/mahout/src/main/java/com/digitalpebble/behemoth/mahout/LuceneTokenizerMapper.java#L52-L53
[1]
http://svn.apache.org/repos/asf/mahout/tags/mahout-0.7/core/src/main/java/org/apache/mahout/vectorizer/DefaultAnalyzer.java
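
For reference, the deprecated reusableTokenStream call maps onto
Analyzer.tokenStream plus the reset()/incrementToken()/end()/close() lifecycle
in Lucene 4.x; a minimal standalone sketch (the field name and sample text are
made up):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamMigration {
    public static void main(String[] args) throws IOException {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);

        // Pre-4.x code called analyzer.reusableTokenStream(field, reader);
        // in 4.x tokenStream() is final and already reuses components internally.
        TokenStream ts = analyzer.tokenStream("content",
                new StringReader("Mahout and Lucene token stream migration"));
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();   // mandatory before the first incrementToken()
            while (ts.incrementToken()) {
                System.out.println(termAtt.toString());
            }
            ts.end();     // records the final offset
        } finally {
            ts.close();   // releases resources held by the stream
        }
        analyzer.close();
    }
}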


Re: HMM Model error

2015-05-11 Thread Raghuveer
Can you please tell me how it is 107? I have only 64 elements, and if I remove
all the spaces it's 90 elements; can you kindly explain?


 On Monday, May 11, 2015 5:21 PM, Max Heimel mhei...@gmail.com wrote:
   

 Hi Raghuveer,
the crash happened because you did not provide the correct number of observed
states (in your case: 107) via the -no argument of the BaumWelch trainer. (The
trainer expects the states in the provided sequence to be encoded as integers
from 0 to nr_states-1.)
Max
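
(Max's fix means passing -no 107 here, since the largest value in the sequence
is 106 and the trainer treats the values as indices 0..no-1. If the raw values
are arbitrary codes rather than meaningful indices, an alternative is to remap
them to contiguous IDs first and pass the number of distinct symbols as -no;
the following is an illustrative sketch, not part of Mahout:)

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper: remap an arbitrary observation sequence to the
// contiguous 0..k-1 encoding the BaumWelch trainer expects, so -no can be
// set to the number of distinct symbols (k) rather than max(value)+1.
public class RemapObservations {
    public static void main(String[] args) {
        int[] raw = {0, 3, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41,
                44, 47, 50, 53, 56, 59, 62, 65, 67, 70, 73, 76, 79, 82, 85,
                88, 91, 94, 97, 100, 103, 106, 54, 56, 57, 59, 60, 62, 63, 65};

        Map<Integer, Integer> ids = new LinkedHashMap<Integer, Integer>();
        List<Integer> encoded = new ArrayList<Integer>();
        for (int value : raw) {
            if (!ids.containsKey(value)) {
                ids.put(value, ids.size()); // assign IDs in first-seen order
            }
            encoded.add(ids.get(value));
        }
        System.out.println("-no " + ids.size()); // number of distinct symbols
        System.out.println(encoded);             // sequence to write to hmm-input
    }
}

Either way, -no must be at least one larger than the biggest integer that ends
up in hmm-input, and the mapping has to be kept if the predictions need to be
translated back to the original values.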
2015-05-11 12:25 GMT+02:00 Raghuveer alwaysra...@yahoo.com.invalid:

When I am trying to run the sample from
http://mahout.apache.org/users/classification/hidden-markov-models.html the
model runs fine. However, when I give a different sequence like the one below,
I see the error shown below:

echo "0 3 5 8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59 62 65 67 70 73
76 79 82 85 88 91 94 97 100 103 106 54 56 57 59 60 62 63 65" > hmm-input

mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000 
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.mahout.math.DenseMatrix.getQuick(DenseMatrix.java:78)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.HmmAlgorithms.forwardAlgorithm(HmmAlgorithms.java:85)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer.trainBaumWelch(HmmTrainer.java:315)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer.main(BaumWelchTrainer.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

Kindly suggest how I can get rid of this error.
Regards,
Raghuveer




  

Re: HMM Model error

2015-05-11 Thread Raghuveer
When I run it as you suggested I got the results below:

Initial probabilities: 
0 1 2 
NaN NaN NaN 
Transition matrix:
  0 1 2 
0 NaN NaN NaN 
1 NaN NaN NaN 
2 NaN NaN NaN 
Emission matrix: 
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 
105 106 
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN 
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN 
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
NaN NaN NaN NaN NaN NaN NaN NaN 
15/05/12 11:05:17 INFO driver.MahoutDriver: Program took 569 ms (Minutes: 
0.009483)

So the final result that I see when I run cat hmm-predictions is:

0 0 0 0 0 0 0 0 0 0

Is this correct, or is my initial data incorrect?



 On Tuesday, May 12, 2015 11:16 AM, Raghuveer 
alwaysra...@yahoo.com.INVALID wrote:
   

 Can you please tell me how it is 107? I have only 64 elements, and if I remove
all the spaces it's 90 elements; can you kindly explain?


    On Monday, May 11, 2015 5:21 PM, Max Heimel mhei...@gmail.com wrote:
  

 Hi Raghuveer,
the crash happened because you did not provide the correct number of observed
states (in your case: 107) via the -no argument of the BaumWelch trainer. (The
trainer expects the states in the provided sequence to be encoded as integers
from 0 to nr_states-1.)
Max
2015-05-11 12:25 GMT+02:00 Raghuveer alwaysra...@yahoo.com.invalid:

When I am trying to run the sample from
http://mahout.apache.org/users/classification/hidden-markov-models.html the
model runs fine. However, when I give a different sequence like the one below,
I see the error shown below:

echo "0 3 5 8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59 62 65 67 70 73
76 79 82 85 88 91 94 97 100 103 106 54 56 57 59 60 62 63 65" > hmm-input

mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000 
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.mahout.math.DenseMatrix.getQuick(DenseMatrix.java:78)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.HmmAlgorithms.forwardAlgorithm(HmmAlgorithms.java:85)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer.trainBaumWelch(HmmTrainer.java:315)
    at 
org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer.main(BaumWelchTrainer.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

Kindly suggest how I can get rid of this error.
Regards,
Raghuveer