Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-09 Thread Jesse Hersch
On Thursday, May 8, 2014 5:59:10 PM UTC-7, William wrote:

> > Do you recall if you handled the underflow problem in your
> > implementation?
>
> I believe it does not.
>
> > I haven't studied the code yet, but it seems like this could be the
> > culprit.
>
> I think you're right.  You should implement it!


I had a look at the code and it appears that it *is* already handling the 
underflow problem.  

The scaling factors are computed in _forward_scale_all() and used in both 
_forward_scale_all() and _backward_scale_all().  Also _viterbi_scale() is 
using log probabilities to avoid underflow in products of small 
probabilities.
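
For readers unfamiliar with the scaling approach described above, here is a minimal sketch in plain Python of a forward pass that renormalizes at every step, in the style of the scaling section of Rabiner's tutorial. It is not the code from hmm.pyx; the function name forward_scaled and its return value are assumptions made for illustration, and the model values are the initial ones from the repro.

```python
import math

def forward_scaled(A, B, pi, obs):
    """Scaled forward pass (hypothetical helper, not Sage's implementation).

    Returns (log_likelihood, scales), where scales[t] is the normalizer
    applied at step t and log_likelihood is the sum of their logs.
    """
    n = len(pi)
    # unnormalized forward variables at t = 0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scales = []
    log_likelihood = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate one step, then weight by the emission probability
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)
        if c == 0.0:
            # the observation has probability zero under the model
            return float("-inf"), scales + [0.0]
        alpha = [a / c for a in alpha]  # keep alpha summing to 1
        scales.append(c)
        log_likelihood += math.log(c)
    return log_likelihood, scales

A = [[0.2, 0.8], [0.2, 0.8]]
B = [[0.1, 0.7, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
pi = [0.4, 0.6]
log_likelihood, scales = forward_scaled(A, B, pi, [1, 1, 3])
print(log_likelihood)
print(scales)
```

Because the forward variables are rescaled to sum to 1 at each step, only the logs of the scaling factors accumulate, so the result stays finite even for sequences thousands of symbols long.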

So I need to dig deeper.  btw I am new to both sage and cython.  I am eager 
to find the cause and fix this though.  So here's my question:

If I make a change to hmm.pyx, how do I get sage to pick up that change 
without having to rebuild all of sage from source? (that took a few hours). 

I read here that I can attach a .pyx file, which should force a cython 
recompilation of the file whenever the .pyx file is changed.  Is that 
right? 
http://www.sagemath.org/doc/developer/coding_in_cython.html#attaching-or-loading-spyx-files

I tried that and got a syntax error:

sage: attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
  File "<ipython-input-4-162f4bbc7027>", line 1
attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
 ^
SyntaxError: invalid syntax


What is the usual way sage developers go about making changes in cython 
code without rebuilding everything?

Thanks!

Jesse

-- 
You received this message because you are subscribed to the Google Groups 
sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.


Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-09 Thread William Stein
On Fri, May 9, 2014 at 12:12 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
> On Thursday, May 8, 2014 5:59:10 PM UTC-7, William wrote:
>>
>> > Do you recall if you handled the underflow problem in your
>> > implementation?
>>
>> I believe it does not.
>>
>> > I haven't studied the code yet, but it seems like this could be the
>> > culprit.
>>
>> I think you're right.  You should implement it!
>
>
> I had a look at the code and it appears that it is already handling the
> underflow problem.
>
> The scaling factors are computed in _forward_scale_all() and used in both
> _forward_scale_all() and _backward_scale_all().  Also _viterbi_scale() is
> using log probabilities to avoid underflow in products of small
> probabilities.
>
> So I need to dig deeper.  btw I am new to both sage and cython.  I am eager
> to find the cause and fix this though.  So here's my question:
>
> If I make a change to hmm.pyx, how do I get sage to pick up that change
> without having to rebuild all of sage from source? (that took a few hours).

./sage -br, as Lief said.

By the way, 10 minutes ago I just gave a very, very basic lecture on
Cython, which will appear here shortly: http://youtu.be/YrO89QIizxI






-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org



[sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread Jesse Hersch
Hi there, 

I think I may have found a bug in the class hmm.DiscreteHiddenMarkovModel. 
 The repro is below.  It probably has something to do with one emission 
value being much more common than the others, but that shouldn't be invalid 
from my understanding of HMMs.

I am running Sage Version 6.2 on Linux (CentOS).  I built it from source 
yesterday.  I am a sage newbie!  

Why am I reporting the bug here?  Because the "report a problem" link in 
the sage notebook points here: http://ask.sagemath.org/questions/ but I 
cannot post there because of being a new user (karma < 10).  That page says 
to use this list instead.  :) 

*repro:*

print version()

# here are two emission sequences.  each observable has 4 possible values: 0-3.
# 1 is much more common than 0, 2, 3 obviously
sequences = [
    [1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 3, 1, 1,
     1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1,
     1, 3, 1, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3, 1,
     3, 1, 3, 3, 3, 1, 1, 3, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1]]

transitions = [[0.2, 0.8], [0.2, 0.8]]
pi = [.4, .6]
b = [[.1, .7, .1, .1], [.1, .7, .1, .1]]
model = hmm.DiscreteHiddenMarkovModel(A=transitions, B=b, pi=pi,
                                      emission_symbols=None, normalize=True)

print 'initial state for hmm:\n', model

# training on the first sequence goes ok.
# but after the second sequence, all elements of the transition, emission,
# and pi matrices are NaN.
for i, seq in enumerate(sequences):
    print '\nbaum_welch on sequence ', i
    model.baum_welch(obs=seq, max_iter=1000)
    print model


*And here is the output.  see the many NaN in the final model*

Sage Version 6.2, Release Date: 2014-05-06
initial state for hmm:
Discrete Hidden Markov Model with 2 States and 4 Emissions
Transition matrix:
[0.2 0.8]
[0.2 0.8]
Emission matrix:
[0.1 0.7 0.1 0.1]
[0.1 0.7 0.1 0.1]
Initial probabilities: [0.4000, 0.6000]

baum_welch on sequence  0
(-18.660162393780404, 128)
Discrete Hidden Markov Model with 2 States and 4 Emissions
Transition matrix:
[0.195469702114 0.804530297886]
[0.197500250574 0.802499749426]
Emission matrix:
[0.000195677912721    0.999217288349               0.0 0.000587033738163]
[  0.0136321925931    0.945471229628               0.0       0.040896594]
Initial probabilities: [0.9812, 0.0188]

baum_welch on sequence  1
(nan, 1000)
Discrete Hidden Markov Model with 2 States and 4 Emissions
Transition matrix:
[NaN NaN]
[NaN NaN]
Emission matrix:
[NaN NaN NaN NaN]
[NaN NaN NaN NaN]
Initial probabilities: [nan, nan]
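
One detail in the output above may be relevant, offered as a hypothesis and not a confirmed diagnosis: after training on sequence 0, the emission probability for symbol 2 is 0.0 in both states, and sequence 1 contains a 2. In a scaled forward pass the normalizer at that step is then exactly 0.0, and dividing 0.0 by 0.0 in C double arithmetic gives NaN. The sketch below is plain Python, not the hmm.pyx code. scaling_factors is a hypothetical helper, the matrices are copied from the output above, obs is a shortened excerpt of sequence 1 around the 2, and starting that excerpt from pi is a simplifying assumption.

```python
def scaling_factors(A, B, pi, obs):
    """Per-step normalizers of a scaled forward pass (hypothetical helper)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    factors = []
    for t, symbol in enumerate(obs):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                     for j in range(n)]
        c = sum(alpha)
        factors.append(c)
        if c == 0.0:
            # Stop here: in C double arithmetic the next step would compute
            # 0.0 / 0.0, which is NaN (Python would raise ZeroDivisionError).
            break
        alpha = [a / c for a in alpha]
    return factors

# model as printed after baum_welch on sequence 0
A = [[0.195469702114, 0.804530297886], [0.197500250574, 0.802499749426]]
B = [[0.000195677912721, 0.999217288349, 0.0, 0.000587033738163],
     [0.0136321925931, 0.945471229628, 0.0, 0.040896594]]
pi = [0.9812, 0.0188]

obs = [1, 3, 3, 2, 3]  # excerpt of sequence 1 containing the symbol 2
factors = scaling_factors(A, B, pi, obs)
print(factors)  # the last factor, at the step where 2 is observed, is 0.0
```

If this is what happens, the NaN would come from a division by a zero scaling factor and not from underflow, since a model that assigns probability 0.0 to a symbol cannot explain a sequence containing it.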



Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread William Stein
On Thu, May 8, 2014 at 3:50 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
> Hi there,
>
> I think I may have found a bug in the class hmm.DiscreteHiddenMarkovModel.
> The repro is below.  It probably has something to do with one emission value
> being much more common than the others, but that shouldn't be invalid from
> my understanding of HMMs.

I could be wrong, but I don't think the implementation of Baum-Welch
is wrong.  The BW algorithm [1] using double precision numbers (which
is all the HMM algorithm in Sage uses) can lead to overflow, given the
sort of computations that are involved.

[1] http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm

You can see the Sage implementation of Baum-Welch by typing

   model.baum_welch??

after running your code below, or visiting this link:

  https://github.com/sagemath/sage/blob/master/src/sage/stats/hmm/hmm.pyx

The entire implementation starting around line 1250 is only about 1-2
pages, and is a straightforward translation of the standard algorithm.

 -- William






-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org



Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread Jesse Hersch
On Thursday, May 8, 2014 4:14:32 PM UTC-7, William wrote:

> I could be wrong, but I don't think the implementation of Baum-Welch
> is wrong.  The BW algorithm [1] using double precision numbers (which
> is all the HMM algorithm in Sage uses) can lead to overflow, given the
> sort of computations that are involved.


Thanks for the reply!  

My understanding is that it's underflow that's more common with HMM stuff, 
due to all the products of small probabilities running around.  In some 
implementations I've seen this handled by the logsumexp 
trick: 
http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick
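
The log-sum-exp trick mentioned here can be sketched in a few lines of plain Python. This is an illustration only; the helper name logsumexp is an assumption and nothing below is taken from Sage.

```python
import math

def logsumexp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow."""
    m = max(log_values)
    if m == float("-inf"):
        # every term is log(0), so the sum is 0 and its log is -inf
        return m
    # factor out the largest term so the biggest exponent is exp(0) = 1
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Two path probabilities of exp(-1000) each are far too small for a double,
# but the log of their sum is easily representable.
log_probs = [-1000.0, -1000.0]
print(logsumexp(log_probs))
```

Computing the same value directly as math.log(math.exp(-1000.0) + math.exp(-1000.0)) fails, because both exponentials underflow to 0.0 before the log is taken.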

Also in the Rabiner tutorial there's a section on scaling where he talks 
about underflow and how to handle it.  That's on page 16 (272) 
here: 
http://people.sabanciuniv.edu/berrin/cs512/reading/rabiner-tutorial-on-hmm.pdf

Do you recall if you handled the underflow problem in your implementation? 
 I haven't studied the code yet, but it seems like this could be the 
culprit.



Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices

2014-05-08 Thread William Stein
On Thu, May 8, 2014 at 4:57 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
> On Thursday, May 8, 2014 4:14:32 PM UTC-7, William wrote:
>>
>> I could be wrong, but I don't think the implementation of Baum-Welch
>> is wrong.  The BW algorithm [1] using double precision numbers (which
>> is all the HMM algorithm in Sage uses) can lead to overflow, given the
>> sort of computations that are involved.
>
>
> Thanks for the reply!
>
> My understanding is that it's underflow that's more common with HMM stuff,
> due to all the products of small probabilities running around.  In some
> implementations I've seen this handled by the logsumexp trick:
> http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick
>
> Also in the Rabiner tutorial there's a section on scaling where he talks
> about underflow and how to handle it.  that's on page 16 (272) here:
> http://people.sabanciuniv.edu/berrin/cs512/reading/rabiner-tutorial-on-hmm.pdf
>
> Do you recall if you handled the underflow problem in your implementation?

I believe it does not.

> I haven't studied the code yet, but it seems like this could be the culprit.

I think you're right.  You should implement it!

-- William





-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
