doing the Viterbi algorithm with Dijkstra's algorithm

Kragen Javier Sitaker Thu, 24 Jun 2010 00:37:05 -0700

Given infinite memory, you should be able to duplicate the results of
the Viterbi algorithm using Dijkstra’s algorithm. In many cases this
should be more efficient. But it also isn’t subject to the same
limitations as the Viterbi algorithm. In particular, it can handle
insertion and deletion “errors” as well as “substitution errors”.


This also suggests you should be able to handle the
longest-common-subsequence problem with Dijkstra’s algorithm as
well. But I’m not clear that it is necessarily faster to do so.

I haven’t tried either of these, and I don’t really understand much
about this stuff, but here’s the idea.

The Viterbi algorithm
---------------------

The Viterbi algorithm calculates the most probable sequence of hidden
states in a hidden Markov model to have produced some observed output
symbols from the Markov process. It works by calculating, at each time
step, the most probable path to reach each possible hidden state. It’s
a simple dynamic programming algorithm: it examines all the possible
state transitions from the previous state and for each one multiplies
the a priori probability of that transition, transitions (specified by
the HMM), the probabilities of the states they come from (calculated
in the previous step), and the probability that that state produced
the observed output symbol (specified by the HMM).

Since the probabilities are only ever multiplied, this can be done by
adding logarithms of probabilities instead of multiplying them
directly, which allows the Viterbi algorithm to scale to long
sequences.

You can handle the problem of reading a text with substitution errors
by treating the letters of the underlying text as the “hidden
states”. Each letter has some probability of being seen as each
letter; typically it is very likely to be seen as itself, and much
less likely to be seen as other letters.

Dijkstra’s algorithm
--------------------

Dijkstra’s algorithm finds the shortest path from some origin node to
all nodes in a weighted node-link graph by growing a spanning tree of
that graph starting from the origin node, at each step adding the node
closest to the origin node out of all the nodes not yet added to the
tree. If you only care about the paths to one node or a few nodes, you
can stop running the algorithm once the tree reaches them.

The mapping
-----------

The hidden states are the nodes of the tree; the edge weights are, I
think, given by the sum of the log-probability of the transition in
the underlying Markov model and of the observed output symbol
appearing given that state. (Maybe I should say log-improbability: the
log-probability itself is negative, and this is its absolute value.)

This way, Dijkstra’s algorithm finds the same path that the Viterbi
algorithm would. However, my intuition is that, since the edge weights
differ greatly from each other, it will usually be able to find that
path without considering most of the nodes in the graph. This should
enable the use of underlying Markov models with larger sets of states.

The cost, however, is that distances and paths to all the past hidden
Markov states need to be retained (at least until all of their
possible successor states have been explored), while the Viterbi
algorithm can retain only the probability and path for each of the
possible states for the previous timestep.

Maybe more importantly, for domains where there’s no reliable framing
(e.g. OCR of proportional fonts or speech recognition), it should
allow the consideration of insertions and deletions, avoiding the
Viterbi algorithm’s lockstep approach. An insertion error is simply a
transition from a state at time T-2 to T, with the log-improbability
of the observed symbol at T-1 being inserted added to the normal edge
weight. I’m not sure exactly how to account for deletions: there’s an
extra set of nodes associated with the deleted symbol, with
transitions from the T-1 nodes to the deleted-symbol nodes and from
them to the nodes at time T, but what probability should be assigned
to those transitions?

-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-tol

doing the Viterbi algorithm with Dijkstra's algorithm

Reply via email to