> the following code would fail in case the meta tags are in upper case
>
>         Node nameNode = attrs.getNamedItem("name");
>         Node equivNode = attrs.getNamedItem("http-equiv");
>         Node contentNode = attrs.getNamedItem("content");

This code works as it stands, because the Nutch HTML parser uses the Xerces
HTMLDocumentImpl implementation, which lowercases attribute names (element
names, by contrast, are uppercased).
For consistency, and to decouple the Nutch HTML parser a little from the
Xerces implementation, I suggest replacing these lines with something like:

        Node nameNode = null;
        Node equivNode = null;
        Node contentNode = null;
        for (int i=0; i<attrs.getLength(); i++) {
          Node attr = attrs.item(i);
          String attrName = attr.getNodeName().toLowerCase();
          if (attrName.equals("name")) {
            nameNode = attr;
          } else if (attrName.equals("http-equiv")) {
            equivNode = attr;
          } else if (attrName.equals("content")) {
            contentNode = attr;
          }
        }
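
To illustrate the difference outside Nutch, here is a minimal standalone sketch. It is an assumption-laden example, not the Nutch code: it uses the JDK's plain XML parser (which preserves attribute case) as a stand-in for a DOM implementation that does not lowercase attributes, and the class name MetaAttrLookup and helper getAttrIgnoreCase are hypothetical names of mine. It uses equalsIgnoreCase rather than toLowerCase(), which gives the same result for these ASCII names without depending on the default locale.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical demo class; not part of Nutch.
public class MetaAttrLookup {

  /** Returns the first attribute whose name matches `wanted`, ignoring case, or null. */
  public static Node getAttrIgnoreCase(NamedNodeMap attrs, String wanted) {
    for (int i = 0; i < attrs.getLength(); i++) {
      Node attr = attrs.item(i);
      if (attr.getNodeName().equalsIgnoreCase(wanted)) {
        return attr;
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    // Assumption: an upper-case meta tag, parsed by a DOM that keeps the case as written.
    String xml = "<META NAME=\"robots\" CONTENT=\"noindex\"/>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    NamedNodeMap attrs = doc.getDocumentElement().getAttributes();

    // The exact-match lookup misses the upper-case attribute.
    System.out.println(attrs.getNamedItem("name"));                        // prints "null"
    // The case-insensitive scan finds it.
    System.out.println(getAttrIgnoreCase(attrs, "name").getNodeValue());   // prints "robots"
  }
}
```

With the Xerces HTMLDocumentImpl both lookups should succeed, since the attribute names are already lowercased there; the scan only matters if the underlying DOM implementation changes.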


Jérôme


--
http://motrech.free.fr/
http://www.frutch.org/
