Package: wnpp
Severity: wishlist

* Package name    : html5tidy
  Version         : git master
  Upstream Author : Michael Murtaugh & The active archives contributors
* URL             : https://github.com/aleray/html5tidy.git
* License         : GPLv3+
  Programming Lang: Python 2
  Description     : “tidy” HTML 5 in the wild to well-formed XML or HTML

tidy fails hard on many HTML 5 documents (e.g. it produces zero output). This package can be used to transform in-the-wild HTML 5 documents into input that xmlstarlet can actually act on, e.g. for data extraction with XPath and XSLT via “xmlstarlet sel”.
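
To illustrate why the well-formed output matters, here is a minimal sketch of the downstream extraction step. It uses Python's standard xml.etree.ElementTree in place of xmlstarlet, and it does not call html5tidy itself. The sample document, the XHTML namespace on its root element, and the helper names are assumptions made for the example, not taken from the package.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the kind of well-formed XHTML a tidying step might emit.
# Whether html5tidy's output carries this namespace is an assumption.
TIDIED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <article>
      <h1>First post</h1>
      <a href="/a">A</a>
      <a href="/b">B</a>
    </article>
  </body>
</html>"""

# Prefix-to-URI mapping used by the XPath-style queries below.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def extract_links(xml_text):
    """Return (href, text) pairs for every <a> element in the document."""
    root = ET.fromstring(xml_text)
    return [(a.get("href"), a.text) for a in root.findall(".//h:a", NS)]


# Typical HTML 5 with omitted end tags is rejected by an XML parser,
# which is why a tidying step is needed before any XPath query.
raw_ok = is_well_formed("<ul><li>one<li>two</ul>")
links = extract_links(TIDIED)
```

With xmlstarlet, a comparable query would bind the namespace with -N and select the attribute values with “xmlstarlet sel -t -v”; the Python version only stands in for that step so the sketch stays self-contained.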