2010/10/7 Gregory Collins <g...@gregorycollins.net>:
> "Edward Z. Yang" <ezy...@mit.edu> writes:
>
>> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>>> I've got the month of October off, and one of the things I've been
>>> planning on working on is a compliant HTML5 parser for Haskell --
>>> something which is sorely needed! I will ping the list back if/when I
>>> get it finished.
>>
>> I've heard that some of the existing HTML parsers in Haskell were
>> already HTML5 compliant (this topic came up when I was complaining
>> that there were some algorithms that you absolutely had to have
>> state for, because that was how they were specified.) I never
>> verified this assertion though.
>
> If there's already a library which *correctly* parses html5 documents
> into DOM trees, could someone please let me know so I can use it instead
> of wasting a bunch of time writing one?

As far as I know, Neil Mitchell's tagsoup[1] parses according to the
HTML 5 parsing rules, but it only generates a flat list of Tags[2], so
you'd have to build the DOM tree up from there.

I have personally had great experience with tagsoup. It's even the core
of the HTML-scraping technology powering searchonce[3].

Michael

[1] http://hackage.haskell.org/package/tagsoup
[2] http://hackage.haskell.org/packages/archive/tagsoup/0.11.1/doc/html/Text-HTML-TagSoup.html#t:Tag
[3] http://www.search-once.com/
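
To illustrate what "building the tree up from there" could look like, here is a minimal sketch. It does not use tagsoup itself: the `Tag` type below is a simplified, hypothetical stand-in modelled on the shape of tagsoup's Tag, and `Node`, `voidElements`, `siblings` and `buildForest` are names made up for this example. It also does not implement the HTML5 tree construction algorithm; it only nests elements by matching open and close tags, treats a small illustrative subset of void elements as childless, and drops stray closing tags.

```haskell
module Main where

-- Simplified stand-in for tagsoup's Tag type (hypothetical; the real type
-- has more constructors and is parameterised over the string type).
data Tag
  = TagOpen String [(String, String)]
  | TagClose String
  | TagText String
  deriving (Eq, Show)

-- A minimal tree node: an element with attributes and children, or text.
data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- Illustrative subset of elements that never have children.
voidElements :: [String]
voidElements = ["br", "hr", "img", "input", "link", "meta"]

-- Parse a run of sibling nodes until a closing tag or the end of input.
-- Returns the nodes and the unconsumed tags (starting at that closing tag).
siblings :: [Tag] -> ([Node], [Tag])
siblings [] = ([], [])
siblings ts@(TagClose _ : _) = ([], ts)
siblings (TagText s : rest) =
  let (ns, rest') = siblings rest
  in (Text s : ns, rest')
siblings (TagOpen name attrs : rest)
  | name `elem` voidElements =
      let (ns, rest') = siblings rest
      in (Element name attrs [] : ns, rest')
  | otherwise =
      let (children, afterChildren) = siblings rest
          -- Consume the matching close tag; if it is missing or mismatched,
          -- treat the element as implicitly closed and leave the tag alone.
          afterClose = case afterChildren of
            (TagClose n : more) | n == name -> more
            other -> other
          (ns, rest') = siblings afterClose
      in (Element name attrs children : ns, rest')

-- Build a forest from a flat tag list, dropping stray closing tags.
buildForest :: [Tag] -> [Node]
buildForest tags = case siblings tags of
  (ns, []) -> ns
  (ns, _stray : rest) -> ns ++ buildForest rest

example :: [Tag]
example =
  [ TagOpen "p" [("class", "x")]
  , TagText "Hello "
  , TagOpen "b" []
  , TagText "world"
  , TagClose "b"
  , TagClose "p"
  ]

main :: IO ()
main = print (buildForest example)
```

A real HTML5 tree builder needs considerably more state than this (insertion modes, a stack of open elements, implied end tags and so on), which is the point raised earlier in the thread about algorithms that are specified statefully.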