I've been struggling to create a monadic parser.  I'm parsing XML
documents into a DOM tree.  Step one is to use a SAX parser to create
a stream of XML tokens; each a struct map.  The token types are
:start-element, :attribute, :text and :end-element.

Next I'm feeding this list of tokens into a monadic parser I've
created following
http://intensivesystems.net/tutorials/monads_101.html. The output
should be a DOM tree structure; element nodes will contain an
:attributes key (containing attribute tokens) and a :body key
containing child text and elements nodes.

I've had to make some minor changes to the tutorial's parser-m monad,
to allow for the fact that I'm parsing tokens, not character strings.

As I parse :attribute elements, I take the original element node and
add the attribute token to the :attributes list inside the element
node struct, creating the vector as necessary.

I have this working for the simple case, where I only allow for
at-most one attribute token.

(defmonad parser-m
          [m-result (fn [x]
                        (fn [tokens]
                            (list x tokens)))

           m-bind (fn [parser action]
                      (fn [tokens]
                          (let [result (parser tokens)]
                               (when-not (nil? result)
                                         ((action (first result))
(second result))))))

           m-zero (fn [tokens]
                      nil)

           m-plus (fn [& parsers]
                      (fn [tokens]
                          (first
                            (drop-while nil?
                                        (map #(% tokens) parsers)))))])

; And some actions and parser generators

(defn any-token
  "Fundamental parser action: returns [first, rest] if tokens is not
empty, nil otherwise."
  [tokens]
  (if (empty? tokens)
      nil
      ; This is what actually "consumes" the tokens seq
      (list (first tokens) (rest tokens))))

; Utilities that will likely move elsewhere

(defn add-to-key-list
  "Updates the map adding the value to the list stored in the indicated key."
  [map key value]
  (update-in map [key] #(conj (or % []) value)))

(with-monad
  parser-m

  (defn token-test
    "Parser factory using a predicate. When a token matches the
predicate, it becomes
  the new result."
    [pred]
    (domonad
      [t any-token :when (pred t)]
      ; return the matched token
      t))

  (defn match-type
    "Parser factory that matches a particular token type (making the matched
token the result), or returns nil."
    [type]
    (token-test #(= (% :type) type)))

  (defn optional [parser]
    (m-plus parser (m-result nil)))

  (declare element one-or-more)

  (defn none-or-more [parser]
    (optional (one-or-more parser)))

  (defn one-or-more [parser]
    (domonad [a parser
              as (none-or-more parser)]
             (cons a as)))

  (defn attribute
    "Parser generator that matches attribute tokens, adding them to
    :attributes key of the element-node, and returning the new
    element-node."
    [element-node]
    (domonad [attribute-token (match-type :attribute)]
             (add-to-key-list element-node :attributes attribute-token)))

  (defn text
    "Parser generator that matches text tokens, adding them to the :body
key of the element-node, and returning the new element-node."
    [element-node]
    (domonad [text-token (match-type :text)]
             (add-to-key-list element-node :body text-token)))

  (def match-first m-plus)

  (defn body-token
    [element-node]
      (match-first
        (attribute element-node)
        (text element-node)
        (element element-node)))

  (defn process-body
    [element-node]
    (domonad [modified-element (optional (body-token element-node))]
             (or modified-element element-node)))

  (defn element
    "Parses an element token into an element-node, and then adds the
fully constructed element-node to the
    :body of the containing element node."
    [container-element-node]
    (domonad [token (match-type :start-element)
              element-node (m-result (struct element-node :element token))
              assembled-node (process-body element-node)
              _ (match-type :end-element)]
             (add-to-key-list container-element-node :body assembled-node)))


  (def template
    ; Creates a psuedo-element (an empty map) and parses the root
element into it, then extracts the root element.
    ; Will eventually have to expand to support text and comments,
etc., around the root element."
    (domonad [root (element {})]
             (first (root :body))))
  )


Here's the problem: I need to be able to handle multiple attribute
tokens, as well as :text and sub-elements. To me, that means that
process-body should be recursive, but we need to pass each
modification of the element on each iteration (to accumlate additions
to the :attributes and :body keys).

However, when I do the following:

  (defn process-body
    [element-node]
    (domonad [modified-element (optional (body-token element-node))
              :when modified-element
              final-element (process-body modified-element)]
             (or final-element modified-element element-node)))

Which, to me, says "parse a body token and apply changes to the
original element-node, then recurse; if the current token is not a
body token, then abort, returning the initial element unchanged."

Unfortunately, this breaks the parser entirely, with nothing parsing,
not even a trivial document of just a root element with no attributes.

I've tried variations with (none-or-more), but what happens is the
initial element-node appears to be passed to the (body-token) parser
repeatedly with the end result being that only the final change (the
last attribute node in my current test) is recorded, and the prior
changes are lost.

Perhaps I'm misunderstanding what the :when guard does?

I think I'm missing something subtle here (well, obviously, it doesn't
work!) but my head keeps spinning a little. Within a parser monad,
it's easy to get tripped up by parsers (monadic functions: accept
state and return actions), actions (monadic values: functions that
accept state and return result and new state) and parser generators
(that return parsers).

Thanks!

-- 
Howard M. Lewis Ship

Creator of Apache Tapestry
Director of Open Source Technology at Formos

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to