Hi Jukka,
I was looking into a failure in a Bixo test, when using BodyContentHandler
(wrapped by XHTMLContentHandler).
The issue is that BodyContentHandler uses MatchingContentHandler to pass
through only the text in nodes under the /html/body hierarchy, and this in
turn winds up not matching the <html> element.
Looking at the MatchingContentHandler's startElement() method (code below), the
issue I see is that the initial matcher.descend() works, in that it matches the
html element, but the Matcher it returns is a NamedElementMatcher. This in turn
always returns false from its matchesElement() method, so you only wind up
actually descending if the html element has attributes, which it doesn't in
this case.
It seems like NamedElementMatcher needs to set state (matched or not) when
descend() is called, that state needs to be returned by its matchesElement(),
and the initial matcher should be queried for matchesElement, not the matcher
you descend into.
But this whole Matcher/XPath parser stuff is pretty convoluted, so I didn't
want to file an issue & try fixing it until I got some input from you.
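To make the proposal a bit more concrete, here is a minimal sketch of the idea.
It is only an illustration under assumptions: SketchMatcher,
StatefulNamedMatcher and emitsStartElement are hypothetical stand-ins I made up
for this mail, not Tika's real Matcher classes, and namespaces and attributes
are left out entirely. It shows a named matcher recording whether the last
descend() call hit, and the handler asking the matcher it descended from.

```java
final class MatcherStateSketch {

    /** Hypothetical stand-in for a matcher; this is NOT Tika's Matcher class. */
    interface SketchMatcher {
        SketchMatcher descend(String localName);
        boolean matchesElement();
    }

    /** Matcher that never matches; returned when a descend() misses. */
    static final SketchMatcher FAIL = new SketchMatcher() {
        @Override public SketchMatcher descend(String localName) { return this; }
        @Override public boolean matchesElement() { return false; }
    };

    /** Named matcher that remembers whether the last descend() matched. */
    static final class StatefulNamedMatcher implements SketchMatcher {
        private final String name;
        private final SketchMatcher then;
        private boolean matched;

        StatefulNamedMatcher(String name, SketchMatcher then) {
            this.name = name;
            this.then = then;
        }

        @Override public SketchMatcher descend(String localName) {
            // Record the outcome so matchesElement() can report it later.
            matched = name.equals(localName);
            return matched ? then : FAIL;
        }

        @Override public boolean matchesElement() { return matched; }
    }

    /** Asks the matcher we descended FROM, not the one descend() returned. */
    static boolean emitsStartElement(SketchMatcher current, String localName) {
        current.descend(localName);
        return current.matchesElement();
    }

    public static void main(String[] args) {
        SketchMatcher path = new StatefulNamedMatcher(
                "html", new StatefulNamedMatcher("body", FAIL));
        System.out.println(emitsStartElement(path, "html"));
        System.out.println(emitsStartElement(path, "head"));
    }
}
```

One caveat with this sketch: keeping mutable state in a matcher means the same
instance cannot safely be shared across sibling elements, so a real fix would
probably need to return a fresh matcher or carry the state some other way.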
Regards,
-- Ken
    public void startElement(
            String uri, String localName, String name, Attributes attributes)
            throws SAXException {
        matchers.addFirst(matcher);
        matcher = matcher.descend(uri, localName);

        AttributesImpl matches = new AttributesImpl();
        for (int i = 0; i < attributes.getLength(); i++) {
            String attributeURI = attributes.getURI(i);
            String attributeName = attributes.getLocalName(i);
            if (matcher.matchesAttribute(attributeURI, attributeName)) {
                matches.addAttribute(
                        attributeURI, attributeName, attributes.getQName(i),
                        attributes.getType(i), attributes.getValue(i));
            }
        }

        if (matcher.matchesElement() || matches.getLength() > 0) {
            super.startElement(uri, localName, name, matches);
            if (!matcher.matchesElement()) {
                // Force the matcher to match the current element, so the
                // endElement method knows to emit the correct event
                matcher =
                    new CompositeMatcher(matcher, ElementMatcher.INSTANCE);
            }
        }
    }
--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr