> the following code would fail in case the meta tags are in upper case
>
> Node nameNode = attrs.getNamedItem("name");
> Node equivNode = attrs.getNamedItem("http-equiv");
> Node contentNode = attrs.getNamedItem("content");
This code works, because the Nutch HTML parser uses the Xerces HTMLDocumentImpl implementation, which lowercases attribute names (whereas element names are uppercased).
For consistency, and to decouple the Nutch HTML parser a little from the Xerces implementation, I suggest replacing these lines with something like:

    Node nameNode = null;
    Node equivNode = null;
    Node contentNode = null;
    for (int i = 0; i < attrs.getLength(); i++) {
      Node attr = attrs.item(i);
      // Match attribute names case-insensitively. equalsIgnoreCase does not
      // depend on the default locale, unlike a bare toLowerCase() call.
      String attrName = attr.getNodeName();
      if (attrName.equalsIgnoreCase("name")) {
        nameNode = attr;
      } else if (attrName.equalsIgnoreCase("http-equiv")) {
        equivNode = attr;
      } else if (attrName.equalsIgnoreCase("content")) {
        contentNode = attr;
      }
    }

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
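To illustrate the difference between an exact-case lookup and a case-insensitive scan, here is a minimal, self-contained sketch. It deliberately uses the JDK's own XML DOM parser instead of Xerces' HTMLDocumentImpl or the Nutch parser (an assumption made so it runs without extra jars). Because an XML parser keeps attribute names exactly as written, an uppercase META element reproduces the failure of getNamedItem("name"). The class MetaAttrLookup and the helper findAttrIgnoreCase are hypothetical names, not part of Nutch or Xerces. The helper uses String.equalsIgnoreCase, which does not depend on the default locale the way a plain toLowerCase() call does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical demo class, not part of Nutch or Xerces.
public class MetaAttrLookup {

    /** Returns the first attribute whose name equals {@code wanted} ignoring case, or null. */
    static Node findAttrIgnoreCase(NamedNodeMap attrs, String wanted) {
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            // Locale-independent, case-insensitive comparison of the attribute name
            if (attr.getNodeName().equalsIgnoreCase(wanted)) {
                return attr;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // An XML parser preserves case, so these attributes are named "NAME" and "CONTENT"
        String xml = "<META NAME=\"robots\" CONTENT=\"noindex,nofollow\"/>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();

        // Exact-case lookup misses the uppercase attribute and returns null
        System.out.println("getNamedItem(\"name\") = " + attrs.getNamedItem("name"));

        // The case-insensitive scan finds both attributes
        Node nameNode = findAttrIgnoreCase(attrs, "name");
        Node contentNode = findAttrIgnoreCase(attrs, "content");
        System.out.println("name = " + nameNode.getNodeValue());
        System.out.println("content = " + contentNode.getNodeValue());
    }
}
```

With an HTML DOM that normalizes attribute names, both lookups would succeed; the scan simply removes the dependency on that normalization.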