subject:"\[sage\-support\] possible bug in DiscreteHiddenMarkovModel \- NaN produced in output matrices"

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-09 Thread Jesse Hersch

On Thursday, May 8, 2014 5:59:10 PM UTC-7, William wrote:

Do you recall if you handled the underflow problem in your
implementation?

I believe it does not.

I haven't studied the code yet, but it seems like this could be the
culprit.

I think you're right. You should implement it!

I had a look at the code an it appears that it *is* already handling the
underflow problem.

The scaling factors are computed in _forward_scale_all() and used in both
_forward_scale_all() and _backward_scale_all(). Also _viterbi_scale() is
using log probabilities to avoid underflow in products of small
probabilities.

So I need to dig deeper. btw I am new to both sage and cython. I am eager
to find the cause and fix this though. So here's my question:

If I make a change to hmm.pyx, how do I get sage to pick up that change
without having to rebuild all of sage from source? (that took a few hours).

I read here that I can attach a .pyx file which should force a cython
recompilation of hte file whenever the .pyx file is changed. Is that
right?
http://www.sagemath.org/doc/developer/coding_in_cython.html#attaching-or-loading-spyx-files

I tried that and got a syntax error:

sage: attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
File ipython-input-4-162f4bbc7027, line 1
attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
^
SyntaxError: invalid syntax

What is the usual way sage developers go about making changes in cython
code without rebuilding everything?

Thanks!

Jesse

--
You received this message because you are subscribed to the Google Groups
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email
to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-09 Thread William Stein

On Fri, May 9, 2014 at 12:12 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
On Thursday, May 8, 2014 5:59:10 PM UTC-7, William wrote:

Do you recall if you handled the underflow problem in your
implementation?

I believe it does not.

I haven't studied the code yet, but it seems like this could be the
culprit.

I think you're right. You should implement it!

I had a look at the code an it appears that it is already handling the
underflow problem.

So I need to dig deeper. btw I am new to both sage and cython. I am eager
to find the cause and fix this though. So here's my question:

If I make a change to hmm.pyx, how do I get sage to pick up that change
without having to rebuild all of sage from source? (that took a few hours).

./sage -br, as Lief said.

By the way, 10 minutes ago I just gave a very, very basic lecture on
Cython, which will appear here shortly: http://youtu.be/YrO89QIizxI

I read here that I can attach a .pyx file which should force a cython
recompilation of hte file whenever the .pyx file is changed. Is that right?
http://www.sagemath.org/doc/developer/coding_in_cython.html#attaching-or-loading-spyx-files

I tried that and got a syntax error:

sage: attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
File ipython-input-4-162f4bbc7027, line 1
attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
^
SyntaxError: invalid syntax

What is the usual way sage developers go about making changes in cython code
without rebuilding everything?

Thanks!

Jesse

--
You received this message because you are subscribed to the Google Groups
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an
email to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

[sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread Jesse Hersch

Hi there, 

I think I may have found a bug in the class hmm.DiscreteHiddenMarkovModel. 
 The repro is below.  It probably has something to do with one emission 
value being much more common than the others, but that shouldn't be invalid 
from my understanding of HMMs.

I am running Sage Version 6.2 on Linux (CentOS).  I built it from source 
yesterday.  I am a sage newbie!  

Why am I reporting the bug here?  Because the report a problem link in 
the sage notebook points here: http://ask.sagemath.org/questions/ but I 
cannot post there because of being a new user (karma  10)  That page says 
to use this list instead.  :) 

*repro:*

print version()

# here are two emisison sequences.  each observable has 4 possible values: 
0-3.
# 1 is much more common then 0,2,3 obviously
sequences = [
[1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 3, 1, 1,
 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 
1, 3, 1, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3, 1,
 3, 1, 3, 3, 3, 1, 1, 3, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1]]

transitions = [[0.2, 0.8], [0.2, 0.8]]
pi = [.4, .6]
b = [[.1, .7, .1, .1], [.1, .7, .1, .1]]
model = hmm.DiscreteHiddenMarkovModel(A=transitions, B=b, pi=pi, 
emission_symbols=None, normalize=True)

print 'initial state for hmm:\n', model

# training on the first sequence goes ok.
# but after the second sequence, all elements of the transition, emission, 
and pi matrices are NaN.
for i, seq in enumerate(sequences):
print '\nbaum_welch on sequence ', i
model.baum_welch(obs=seq, max_iter=1000)
print model


*And here is the output.  see the many NaN in the final model*

Sage Version 6.2, Release Date: 2014-05-06
initial state for hmm:
Discrete Hidden Markov Model with 2 States and 4 Emissions
Transition matrix:
[0.2 0.8]
[0.2 0.8]
Emission matrix:
[0.1 0.7 0.1 0.1]
[0.1 0.7 0.1 0.1]
Initial probabilities: [0.4000, 0.6000]

baum_welch on sequence  0
(-18.660162393780404, 128)
Discrete Hidden Markov Model with 2 States and 4 Emissions
Transition matrix:
[0.195469702114 0.804530297886]
[0.197500250574 0.802499749426]
Emission matrix:
[0.0001956779127210.999217288349   0.0 0.000587033738163]
[  0.01363219259310.945471229628   0.0   0.040896594]
Initial probabilities: [0.9812, 0.0188]

baum_welch on sequence  1
(nan, 1000)
Discrete Hidden Markov Model with 2 States and 4 Emissions
Transition matrix:
[NaN NaN]
[NaN NaN]
Emission matrix:
[NaN NaN NaN NaN]
[NaN NaN NaN NaN]
Initial probabilities: [nan, nan]

-- 
You received this message because you are subscribed to the Google Groups 
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread William Stein

On Thu, May 8, 2014 at 3:50 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
 Hi there,

 I think I may have found a bug in the class hmm.DiscreteHiddenMarkovModel.
 The repro is below.  It probably has something to do with one emission value
 being much more common than the others, but that shouldn't be invalid from
 my understanding of HMMs.

I could be wrong, but I don't think the implementation of Baum-Welch
is wrong.  The BM algorithm [1] using double precision numbers (which
is all the HMM algorithm in Sage uses) can lead to overflow, given the
sort of computations that are involved.

[1] http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm

You can see the Sage implementation of Baum-Welch by typing

   model.baum_welch??

after running your code below, or visiting this link:

  https://github.com/sagemath/sage/blob/master/src/sage/stats/hmm/hmm.pyx

The entire implementation starting around line 1250 is only about 1-2
pages, and a straightforward translation of the standard thing.

 -- William


 I am running Sage Version 6.2 on Linux (CentOS).  I built it from source
 yesterday.  I am a sage newbie!

 Why am I reporting the bug here?  Because the report a problem link in the
 sage notebook points here: http://ask.sagemath.org/questions/ but I cannot
 post there because of being a new user (karma  10)  That page says to use
 this list instead.  :)

 repro:

 print version()

 # here are two emisison sequences.  each observable has 4 possible values:
 0-3.
 # 1 is much more common then 0,2,3 obviously
 sequences = [
 [1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 3, 1, 1,
  1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 1,
 3, 1, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3, 1,
  3, 1, 3, 3, 3, 1, 1, 3, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1]]

 transitions = [[0.2, 0.8], [0.2, 0.8]]
 pi = [.4, .6]
 b = [[.1, .7, .1, .1], [.1, .7, .1, .1]]
 model = hmm.DiscreteHiddenMarkovModel(A=transitions, B=b, pi=pi,
 emission_symbols=None, normalize=True)

 print 'initial state for hmm:\n', model

 # training on the first sequence goes ok.
 # but after the second sequence, all elements of the transition, emission,
 and pi matrices are NaN.
 for i, seq in enumerate(sequences):
 print '\nbaum_welch on sequence ', i
 model.baum_welch(obs=seq, max_iter=1000)
 print model


 And here is the output.  see the many NaN in the final model

 Sage Version 6.2, Release Date: 2014-05-06
 initial state for hmm:
 Discrete Hidden Markov Model with 2 States and 4 Emissions
 Transition matrix:
 [0.2 0.8]
 [0.2 0.8]
 Emission matrix:
 [0.1 0.7 0.1 0.1]
 [0.1 0.7 0.1 0.1]
 Initial probabilities: [0.4000, 0.6000]

 baum_welch on sequence  0
 (-18.660162393780404, 128)
 Discrete Hidden Markov Model with 2 States and 4 Emissions
 Transition matrix:
 [0.195469702114 0.804530297886]
 [0.197500250574 0.802499749426]
 Emission matrix:
 [0.0001956779127210.999217288349   0.0 0.000587033738163]
 [  0.01363219259310.945471229628   0.0   0.040896594]
 Initial probabilities: [0.9812, 0.0188]

 baum_welch on sequence  1
 (nan, 1000)
 Discrete Hidden Markov Model with 2 States and 4 Emissions
 Transition matrix:
 [NaN NaN]
 [NaN NaN]
 Emission matrix:
 [NaN NaN NaN NaN]
 [NaN NaN NaN NaN]
 Initial probabilities: [nan, nan]

 --
 You received this message because you are subscribed to the Google Groups
 sage-support group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to sage-support+unsubscr...@googlegroups.com.
 To post to this group, send email to sage-support@googlegroups.com.
 Visit this group at http://groups.google.com/group/sage-support.
 For more options, visit https://groups.google.com/d/optout.



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

-- 
You received this message because you are subscribed to the Google Groups 
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread Jesse Hersch

On Thursday, May 8, 2014 4:14:32 PM UTC-7, William wrote:

 I could be wrong, but I don't think the implementation of Baum-Welch 
 is wrong.  The BM algorithm [1] using double precision numbers (which 
 is all the HMM algorithm in Sage uses) can lead to overflow, given the 
 sort of computations that are involved. 


Thanks for the reply!  

My understanding is that it's underflow that's more common with HMM stuff, 
due to all the products of small probabilities running around.  In some 
implementations I've seen this handled by the logsumexp 
trick: 
http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick

Also in the Rabiner tutorial there's a section on scaling where he talks 
about underflow and how to handle it.  that's on page 16 (272) 
here: 
http://people.sabanciuniv.edu/berrin/cs512/reading/rabiner-tutorial-on-hmm.pdf

Do you recall if you handled the underflow problem in your implementation? 
 I haven't studied the code yet, but it seems like this could be the 
culprit.

-- 
You received this message because you are subscribed to the Google Groups 
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread William Stein

On Thu, May 8, 2014 at 4:57 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
 On Thursday, May 8, 2014 4:14:32 PM UTC-7, William wrote:

 I could be wrong, but I don't think the implementation of Baum-Welch
 is wrong.  The BM algorithm [1] using double precision numbers (which
 is all the HMM algorithm in Sage uses) can lead to overflow, given the
 sort of computations that are involved.


 Thanks for the reply!

 My understanding is that it's underflow that's more common with HMM stuff,
 due to all the products of small probabilities running around.  In some
 implementations I've seen this handled by the logsumexp trick:
 http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick

 Also in the Rabiner tutorial there's a section on scaling where he talks
 about underflow and how to handle it.  that's on page 16 (272) here:
 http://people.sabanciuniv.edu/berrin/cs512/reading/rabiner-tutorial-on-hmm.pdf

 Do you recall if you handled the underflow problem in your implementation?

I believe it does not.

 I haven't studied the code yet, but it seems like this could be the culprit.

I think you're right.  You should implement it!

-- William


 --
 You received this message because you are subscribed to the Google Groups
 sage-support group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to sage-support+unsubscr...@googlegroups.com.
 To post to this group, send email to sage-support@googlegroups.com.
 Visit this group at http://groups.google.com/group/sage-support.
 For more options, visit https://groups.google.com/d/optout.



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

-- 
You received this message because you are subscribed to the Google Groups 
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

[sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

6 matches

Site Navigation

Mail list logo

Footer information