All,
I have some code based on Henri Sivonen's HTML5 parser that adds HTML
parsing capabilities to the Abdera API. For instance,
URL url = new URL("http://www.snellspace.com");
Abdera abdera = Abdera.getInstance();
Parser parser = abdera.getParserFactory().getParser("html");
Document doc = parser.parse(url.openStream());
doc.writeTo(System.out);
The parser will repair broken markup and allow it to be accessed using
the Abdera Element objects. The two cases where this is particularly
useful are:
a) Performing autodiscovery of feeds and AtomPub service documents
b) Converting HTML content to XHTML content, which protects feeds
against accidental breakage.
For example,
List<Element> list =
  HtmlHelper.discoverLinks(
    "http://www.snellspace.com/wp",
    "application/atom+xml",
    "alternate");
for (Element el : list) {
  String href = el.getAttributeValue("href");
  String title = el.getAttributeValue("title");
  String type = el.getAttributeValue("type");
  System.out.println(type + ", " + title + ", " + href);
}
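To make the autodiscovery case concrete, here is a rough standalone sketch of the kind of filtering involved: collect the link elements whose type and rel match the requested values. This is not the Abdera or HtmlHelper code. The class name LinkDiscoverySketch and the regex-based scan are assumptions for illustration only, and unlike the real parser a regex does nothing to repair broken markup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Abdera.
class LinkDiscoverySketch {

    // Matches a <link ...> start tag; the attribute text is captured in group 1.
    private static final Pattern LINK_TAG =
        Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" or name='value' pairs inside a tag.
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([A-Za-z][A-Za-z0-9-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns the attributes of every <link> whose type and rel match.
    static List<Map<String, String>> discoverLinks(String html, String type, String rel) {
        List<Map<String, String>> matches = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            if (type.equalsIgnoreCase(attrs.get("type")) && hasRel(attrs.get("rel"), rel)) {
                matches.add(attrs);
            }
        }
        return matches;
    }

    // rel holds space-separated tokens, compared case-insensitively.
    private static boolean hasRel(String relValue, String wanted) {
        if (relValue == null) {
            return false;
        }
        for (String token : relValue.trim().split("\\s+")) {
            if (token.equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<link rel=\"stylesheet\" type=\"text/css\" href=\"/style.css\">"
            + "<link rel=\"alternate\" type=\"application/atom+xml\""
            + " title=\"Atom feed\" href=\"/feed/atom\">"
            + "</head><body><p>test<br>foo</body></html>";
        for (Map<String, String> link : discoverLinks(html, "application/atom+xml", "alternate")) {
            System.out.println(link.get("type") + ", " + link.get("title") + ", " + link.get("href"));
        }
    }
}
```

Run against the sample page in main, this prints "application/atom+xml, Atom feed, /feed/atom" and skips the stylesheet link.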
And another:
Abdera abdera = Abdera.getInstance();
Entry entry = abdera.newEntry();
entry.setContentAsXhtml(HtmlCleaner.parse("<p>test<br>foo"));
System.out.println(entry);
Which outputs:
<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>test<br />foo</p>
    </div>
  </content>
</entry>
Note that the HtmlCleaner repairs the HTML fragment: the unclosed <p>
is closed and <br> becomes <br />.
I could commit this, but doing so means adding two new optional
dependency jars. I think the functionality is valuable enough to justify
the addition, but I wanted to run it past the rest of you first.
- James