Eugene Kirpichov wrote:
Do there exist other nontrivial higher-order algorithms and datastructures?
Is the field of higher-order algorithms indeed as unexplored as it seems?

Many algorithms in natural language processing can be captured by higher-order algorithms parameterized by the choice of semiring (or module space).

For example, consider the inference problem for hidden Markov models (which are often used for things like determining the part of speech tags for some sentence in natural language). To figure out the total probability that the HMM is in some state at some time, you use the Forward algorithm.[1] To figure out the probability of the most likely state sequence that has a specific state at some time, you use the Viterbi algorithm. To figure out not only the probability of the most likely state sequence but also what that tag sequence actually is, you can modify Viterbi to store back pointers.

All of these are the same algorithm, just with different (augmented) semirings. In order to prevent underflow for very small probabilities, we usually run these algorithms with probabilities in the log-domain. Those variants are also the same algorithm, just taking the image of the semiring under the logarithm functor:

Forward     : FW ([0,1], +, 0, *, 1)

Log Forward : FW ([-Inf,0], <+>, -Inf, +, 0)
    where
    -- Ignoring infinities...
    x <+> y | x >= y    = x + log (1 + exp (y-x))
            | otherwise = y + log (1 + exp (x-y))

Viterbi     : FW ([0,1], max, 0, *, 1)

Log Viterbi : FW ([-Inf,0], max, -Inf, +, 0)

ViterbiBP Q : FW (Maybe([0,1],Maybe Q), argmax, Nothing, <*>, Just(1,Nothing))
    where
    -- Q = the type of the states in your HMM
    mx <*> my = do
        (px,x) <- mx
        (py,y) <- my
        return (px*py, y `mappend` x)


Log (ViterbiBP Q)
    : FW ( Maybe([-Inf,0],Maybe Q)
         , argmax, Nothing
         , <+>, Just(0,Nothing))
    where
    mx <+> my = do
        (px,x) <- mx
        (py,y) <- my
        return (px+py, y `mappend` x)

Using augmented semirings we can simplify the backpointer version significantly in order to incorporate the optimizations usually encountered in practice. That is, the Maybes are required to make it a semiring, but we can optimize both of them away in practice, yielding an augmented semiring over (Prob,Q) or (Log Prob, Q).

We get the same sort of thing for variants of the Backward algorithm used in the Forward--Backward algorithm. Of course, there's nothing special about HMMs here. We can extend the Forward--Backward algorithm to operate over tree structures instead of just list structures. That version is called the Inside--Outside algorithm. And semirings show up all over the place in other algorithms too.

Of course, in hindsight this makes perfect sense: the powerset of the free semiring over S is the set of all (automata theoretic) languages over S. So semirings capture languages exactly; in the same way that commutative monoids capture multisets, and monoids capture sequences. This insight also extends to cover things like weighted-logic programming languages, since we can use any semiring we like, not just the Boolean probability semiring. Automata theoretic languages are everywhere.


[1] Or you combine the Forward and Backward algorithms, depending on what exactly you want. Same goes for the others.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to