The dtd-text package[1] provides a parser for XML DTDs. It implements most of the parts of the W3C XML specification relating to DTDs, and is compatible with versions 1.0 and 1.1 of the specification.[2]
The result of the parse is a Haskell DTD object from the dtd-types[3] package. This first preliminary version of dtd-text, version 0.1.0.0, requires at least version 0.3.0.1 of dtd-types.

Synopsis:

    -- Parse a DTD from a Data.Text.Lazy:
    dtdParse :: L.Text -> DTD

That should usually be all you need.

    -- Or, for advanced users, if the DTD contains external
    -- parameter entities and you want to supply their values:
    dtdParseWithExtern :: SymTable -> L.Text -> DTD
    -- where
    type SymTable = M.Map Text L.Text

I really should have edited the Cabal description of this package before I uploaded it. It promises an attoparsec-text parser and blaze-builder renderer for DTDs.

First of all, the renderer is vaporware: I haven't written it yet. The parser alone was quite a bit of work, so I decided to release it before even starting on the renderer.

Second, although dtd-text does use attoparsec-text, and does export parsers for all of the significant components of a DTD, those parsers are of limited usefulness on their own. It turns out that in order to support the full algorithm specified in the spec for parameter entity resolution, which is rather imperative in nature, two layers of parsing are necessary. So the dtd-text package also has some internal plumbing so that it can present a simple interface.

This is a very preliminary alpha release. All I can say so far is that it compiles on my machine (GHC 7.0.2 on 64-bit Linux), and that I tested it against a huge, extremely complicated DTD, and it seems to have done the Right Thing. Since there are likely to be bugs that I will need to fix soon, I will wait until then to fix the package description.

More about external parameter entities, for advanced users:

As mentioned above, this parser does not attempt to go out and fetch the values of external references for you from files and URLs.
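To make the "supply their values yourself" idea concrete, here is a minimal, self-contained sketch. It is not dtd-text's implementation. It only assumes the SymTable shape from the synopsis, and the names expandPERefs and fetched are hypothetical, made up for illustration. It substitutes %name; references from a table you have already filled in, and refuses to re-enter a name that is already being expanded, which is one simple way to guard against looping references.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Data.Char (isAlphaNum)
import qualified Data.Map as M
import qualified Data.Set as S
import           Data.Text (Text)
import qualified Data.Text.Lazy as L
import qualified Data.Text.Lazy.IO as LIO

-- Same shape as the SymTable in the synopsis:
-- parameter entity name -> replacement text you fetched yourself.
type SymTable = M.Map Text L.Text

-- Hypothetical table standing in for values read from files or URLs.
fetched :: SymTable
fetched = M.fromList
  [ ("common", "<!ELEMENT p (#PCDATA)>")
  , ("loop",   "%loop;")                 -- deliberately self-referencing
  ]

-- Toy expansion (an assumption, NOT the algorithm used by dtd-text):
-- replace each %name; with its table entry, expanding recursively.
-- Unknown names, and names already being expanded, are left in place.
expandPERefs :: SymTable -> L.Text -> L.Text
expandPERefs table = go S.empty
  where
    go seen input
      | L.null rest = input                       -- no '%' left
      | L.null after || not (validName name) =    -- a literal '%'
          before <> "%" <> go seen (L.drop 1 rest)
      | otherwise =
          case M.lookup key table of
            Just value | key `S.notMember` seen ->
              before <> go (S.insert key seen) value <> go seen tailText
            _ -> before <> "%" <> name <> ";" <> go seen tailText
      where
        (before, rest) = L.breakOn "%" input
        (name, after)  = L.breakOn ";" (L.drop 1 rest)
        key            = L.toStrict name
        tailText       = L.drop 1 after

    validName n = not (L.null n) && L.all isNameChar n
    isNameChar c = isAlphaNum c || c `elem` ("._-:" :: String)

main :: IO ()
main = do
  LIO.putStrLn (expandPERefs fetched "%common; <!ELEMENT doc (p)*>")
  LIO.putStrLn (expandPERefs fetched "%loop; %missing;")
```

With dtd-text installed, a table of the same type would instead be passed as the first argument of dtdParseWithExtern, per the synopsis, and the parser would do the resolution itself according to the spec.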
If you need to extract information from the DTD before you fetch the external entities yourself, such as system IDs and public IDs, you might be able to get them by applying dtdParse to all or part of the DTD as an initial parse. The parser tries very hard to give partial results when things are missing, while still doing its best to avoid problems like looping references. So if your DTD has many deeply intertwined external parameter entities, this parser may not be very practical for you; on the other hand, I personally have never seen such a DTD in the wild.

A final caveat: this version of dtd-text does not yet support conditional sections.

Enjoy,
Yitz

[1] http://hackage.haskell.org/package/dtd-text
[2] http://www.w3.org/TR/2008/REC-xml-20081126/
[3] http://hackage.haskell.org/package/dtd-types