Package: wnpp
Severity: wishlist

* Package name    : htmlcxx
  Version         : 0.83
  Upstream Author : Davi Reis <[EMAIL PROTECTED]>
* URL             : http://htmlcxx.sourceforge.net/
* License         : LGPL
  Programming Lang: C++
  Description     : htmlcxx is a simple non-validating HTML parser library for C++

htmlcxx is a simple non-validating CSS1 and HTML parser for C++. Although
there are several other HTML parsers available, htmlcxx has some
characteristics that make it unique:

    * STL-like navigation of the DOM tree, using the excellent tree.hh
      library by Kasper Peeters
    * It is possible to reproduce exactly, character by character, the
      original document from the parse tree
    * Bundled CSS parser
    * Optional parsing of attributes
    * C++ code that looks like C++ (not so true anymore)
    * Offsets of tags/elements in the original document are stored in the
      nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic the behavior of
Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees
similar to those created by Firefox. Unlike Firefox, however, htmlcxx does
not insert non-existent markup into your HTML. Therefore, serializing the
DOM tree gives exactly the same bytes contained in the original HTML
document.
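
To illustrate why storing offsets makes a byte-exact round trip possible,
here is a minimal standalone sketch. It does not use htmlcxx and is not how
htmlcxx is implemented: the Span struct and the tokenize() and serialize()
functions are hypothetical names invented for this example. The idea is only
that if every node records where it starts and how many bytes it covers, and
the parser never invents or drops bytes, then concatenating the nodes in
order gives back the input unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for illustration only; NOT htmlcxx's node class.
struct Span {
    bool is_tag;         // true for a "<...>" run, false for text between tags
    std::size_t offset;  // byte offset of the span in the original document
    std::size_t length;  // number of bytes the span covers
};

// Split the input into tag and text spans. Every input byte ends up in
// exactly one span, and nothing is added, so no information is lost.
std::vector<Span> tokenize(const std::string& html) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t end = html.size();
        bool is_tag = false;
        if (html[pos] == '<') {
            std::size_t close = html.find('>', pos);
            if (close != std::string::npos) {
                end = close + 1;  // include the closing '>'
                is_tag = true;
            }
            // An unterminated '<' is kept as plain text up to end of input.
        } else {
            std::size_t next = html.find('<', pos);
            if (next != std::string::npos) end = next;
        }
        spans.push_back({is_tag, pos, end - pos});
        pos = end;
    }
    return spans;
}

// Rebuild the document by copying each span's bytes back out in order.
std::string serialize(const std::string& html, const std::vector<Span>& spans) {
    std::string out;
    for (const Span& s : spans) out.append(html, s.offset, s.length);
    return out;
}
```

For the input "<p>Hi <b>there</b></p>", the third span is the "<b>" tag at
offset 6 with length 3, and serialize() returns the input unchanged, odd
casing and whitespace inside tags included.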

-- 
http://syx.googlecode.com - Smalltalk YX
http://lethalman.blogspot.com - Thoughts about computer technologies
http://www.debian.org - The Universal Operating System
