Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices
On Thursday, May 8, 2014 5:59:10 PM UTC-7, William wrote:
>> Do you recall if you handled the underflow problem in your implementation?
>
> I believe it does not.
>
>> I haven't studied the code yet, but it seems like this could be the culprit.
>
> I think you're right. You should implement it!

I had a look at the code and it appears that it *is* already handling the underflow problem. The scaling factors are computed in _forward_scale_all() and used in both _forward_scale_all() and _backward_scale_all(). Also, _viterbi_scale() uses log probabilities to avoid underflow in products of small probabilities. So I need to dig deeper.

By the way, I am new to both Sage and Cython, but I am eager to find the cause and fix this. So here's my question: if I make a change to hmm.pyx, how do I get Sage to pick up that change without having to rebuild all of Sage from source? (That took a few hours.)

I read here that I can attach a .pyx file, which should force a Cython recompilation of the file whenever the .pyx file is changed. Is that right?

http://www.sagemath.org/doc/developer/coding_in_cython.html#attaching-or-loading-spyx-files

I tried that and got a syntax error:

    sage: attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
      File "<ipython-input-4-162f4bbc7027>", line 1
        attach /home/jhersch/bin/sage-6.2/src/sage/stats/hmm/hmm.pyx
               ^
    SyntaxError: invalid syntax

What is the usual way Sage developers go about making changes in Cython code without rebuilding everything?

Thanks!
Jesse

--
You received this message because you are subscribed to the Google Groups sage-support group.
To unsubscribe from this group and stop receiving emails from it, send an email to sage-support+unsubscr...@googlegroups.com.
To post to this group, send email to sage-support@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.
Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices
On Fri, May 9, 2014 at 12:12 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
> I had a look at the code and it appears that it is already handling the underflow problem. The scaling factors are computed in _forward_scale_all() and used in both _forward_scale_all() and _backward_scale_all(). Also _viterbi_scale() is using log probabilities to avoid underflow in products of small probabilities. So I need to dig deeper.
>
> btw I am new to both sage and cython. I am eager to find the cause and fix this though. So here's my question: If I make a change to hmm.pyx, how do I get sage to pick up that change without having to rebuild all of sage from source? (that took a few hours).

./sage -br, as Lief said.

By the way, 10 minutes ago I just gave a very, very basic lecture on Cython, which will appear here shortly: http://youtu.be/YrO89QIizxI

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
[sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices
Hi there,

I think I may have found a bug in the class hmm.DiscreteHiddenMarkovModel. The repro is below. It probably has something to do with one emission value being much more common than the others, but that shouldn't be invalid from my understanding of HMMs.

I am running Sage Version 6.2 on Linux (CentOS). I built it from source yesterday. I am a Sage newbie!

Why am I reporting the bug here? Because the "report a problem" link in the Sage notebook points here:

http://ask.sagemath.org/questions/

but I cannot post there because I am a new user (karma 10). That page says to use this list instead. :)

*repro:*

    print version()

    # Here are two emission sequences. Each observable has 4 possible values: 0-3.
    # 1 is much more common than 0, 2, 3, obviously.
    sequences = [
        [1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3, 1, 3, 1, 3, 3, 3, 1, 1, 3, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1]]
    transitions = [[0.2, 0.8], [0.2, 0.8]]
    pi = [.4, .6]
    b = [[.1, .7, .1, .1], [.1, .7, .1, .1]]
    model = hmm.DiscreteHiddenMarkovModel(A=transitions, B=b, pi=pi, emission_symbols=None, normalize=True)
    print 'initial state for hmm:\n', model

    # Training on the first sequence goes OK, but after the second sequence
    # all elements of the transition, emission, and pi matrices are NaN.
    for i, seq in enumerate(sequences):
        print '\nbaum_welch on sequence ', i
        model.baum_welch(obs=seq, max_iter=1000)
        print model

*And here is the output. See the many NaN in the final model:*

    Sage Version 6.2, Release Date: 2014-05-06
    initial state for hmm:
    Discrete Hidden Markov Model with 2 States and 4 Emissions
    Transition matrix:
    [0.2 0.8]
    [0.2 0.8]
    Emission matrix:
    [0.1 0.7 0.1 0.1]
    [0.1 0.7 0.1 0.1]
    Initial probabilities: [0.4000, 0.6000]

    baum_welch on sequence 0
    (-18.660162393780404, 128)
    Discrete Hidden Markov Model with 2 States and 4 Emissions
    Transition matrix:
    [0.195469702114 0.804530297886]
    [0.197500250574 0.802499749426]
    Emission matrix:
    [0.000195677912721 0.999217288349 0.0 0.000587033738163]
    [ 0.0136321925931 0.945471229628 0.0 0.040896594]
    Initial probabilities: [0.9812, 0.0188]

    baum_welch on sequence 1
    (nan, 1000)
    Discrete Hidden Markov Model with 2 States and 4 Emissions
    Transition matrix:
    [NaN NaN]
    [NaN NaN]
    Emission matrix:
    [NaN NaN NaN NaN]
    [NaN NaN NaN NaN]
    Initial probabilities: [nan, nan]
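For readers following the scaling discussion in this thread, here is a minimal pure-Python sketch of a scaled forward pass in the style of the scaling section of Rabiner's tutorial. It is not the code in hmm.pyx; the function name forward_scaled and its return convention are hypothetical. It also illustrates one possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: after training on sequence 0, the emission column for symbol 2 is 0.0 in both states, and sequence 1 contains a 2, so the scale factor at that step would be exactly zero, and dividing by it with C doubles would give 0.0/0.0 = NaN.

```python
import math


def forward_scaled(A, B, pi, obs):
    """Scaled forward pass; returns log P(obs | model), or -inf if impossible.

    A[i][j]: transition probability i -> j, B[i][k]: probability that
    state i emits symbol k, pi[i]: initial probability of state i.
    (Hypothetical helper for illustration, not the Sage implementation.)
    """
    n = len(pi)
    alpha = []
    log_prob = 0.0
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [pi[j] * B[j][symbol] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            # Every state assigns probability zero to this observation.
            # Dividing by scale here would be 0.0/0.0, i.e. NaN for C doubles.
            return float("-inf")
        alpha = [a / scale for a in alpha]  # renormalise so alpha sums to 1
        log_prob += math.log(scale)         # log-likelihood = sum of log scales
    return log_prob


# Matrices as printed in the report above.
A0 = [[0.2, 0.8], [0.2, 0.8]]
B0 = [[0.1, 0.7, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
pi0 = [0.4, 0.6]
# Emission matrix after training on sequence 0: symbol 2 has probability 0.0.
B1 = [[0.000195677912721, 0.999217288349, 0.0, 0.000587033738163],
      [0.0136321925931, 0.945471229628, 0.0, 0.040896594]]

print(forward_scaled(A0, B0, pi0, [1, 1, 3]))  # finite log-likelihood
print(forward_scaled(A0, B1, pi0, [1, 3, 2]))  # -inf: symbol 2 is impossible under B1
```

If this reading is right, scaling alone cannot help, because the problem is an exact zero rather than underflow; a guard on the scale factor (or a small floor on emission probabilities) would be one way to test the hypothesis.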
Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices
On Thu, May 8, 2014 at 3:50 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
> Hi there,
>
> I think I may have found a bug in the class hmm.DiscreteHiddenMarkovModel. The repro is below. It probably has something to do with one emission value being much more common than the others, but that shouldn't be invalid from my understanding of HMMs.

I could be wrong, but I don't think the implementation of Baum-Welch is wrong. The Baum-Welch algorithm [1], using double-precision numbers (which is all the HMM code in Sage uses), can lead to overflow, given the sort of computations that are involved.

[1] http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm

You can see the Sage implementation of Baum-Welch by typing model.baum_welch?? after running your code, or by visiting this link:

https://github.com/sagemath/sage/blob/master/src/sage/stats/hmm/hmm.pyx

The entire implementation, starting around line 1250, is only about 1-2 pages and is a straightforward translation of the standard algorithm.

-- William
Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices
On Thursday, May 8, 2014 4:14:32 PM UTC-7, William wrote:
> I could be wrong, but I don't think the implementation of Baum-Welch is wrong. The Baum-Welch algorithm [1], using double-precision numbers (which is all the HMM code in Sage uses), can lead to overflow, given the sort of computations that are involved.

Thanks for the reply!

My understanding is that it's underflow that's more common with HMM stuff, due to all the products of small probabilities running around. In some implementations I've seen this handled by the logsumexp trick:

http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick

Also, in the Rabiner tutorial there's a section on scaling where he talks about underflow and how to handle it. That's on page 16 (272) here:

http://people.sabanciuniv.edu/berrin/cs512/reading/rabiner-tutorial-on-hmm.pdf

Do you recall if you handled the underflow problem in your implementation? I haven't studied the code yet, but it seems like this could be the culprit.
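A minimal sketch of the underflow problem and the log-sum-exp trick mentioned above, in plain Python. The helper name logsumexp is chosen here for illustration; this is not code from hmm.pyx, and the 400-factor product is just a made-up example of "many small probabilities".

```python
import math


def logsumexp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow."""
    m = max(log_values)
    if m == float("-inf"):
        return m  # every term is exp(-inf) = 0, so the sum is 0
    # Factor out the largest term so the remaining exponents are all <= 0.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# 400 factors of 0.1: the plain product underflows to 0.0 in double precision.
naive = 1.0
for _ in range(400):
    naive *= 0.1

# The same quantity kept in log space stays perfectly representable.
log_p = 400 * math.log(0.1)

print(naive)                      # 0.0
print(logsumexp([log_p, log_p]))  # about -920.34, i.e. log(2) + 400*log(0.1)
```

Working in log space turns products into sums, and logsumexp handles the one place in the forward recursion where probabilities have to be added rather than multiplied.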
Re: [sage-support] possible bug in DiscreteHiddenMarkovModel - NaN produced in output matrices
On Thu, May 8, 2014 at 4:57 PM, Jesse Hersch jesseher...@fastmail.fm wrote:
> My understanding is that it's underflow that's more common with HMM stuff, due to all the products of small probabilities running around.
>
> Do you recall if you handled the underflow problem in your implementation?

I believe it does not.

> I haven't studied the code yet, but it seems like this could be the culprit.

I think you're right. You should implement it!

-- William