Hi, I need to parse and extract the body content from a bunch of XHTML files using XML::LibXML. I figured that should be possible, since XHTML is supposed to be valid XML, right?
Here's the code that I'm using:

> #!/usr/bin/perl
> use XML::LibXML;
> my $parser = XML::LibXML->new();
>
> my $doc = $parser->parse_file("xhtml.htm");
> my $docRoot = $doc->getDocumentElement;
> print $_->toString for $docRoot->findnodes("body")->shift->childNodes;

but I keep getting the error:

> "Can't call method "childNodes" on an undefined value"

as if it can't find a body element.

Now, the file seems to parse without errors, and

> $docRoot->toString();

prints the whole html tag just fine.

After a couple of tests I found that it works if I remove the following from the beginning of the document:

> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">

and replace it with just

> <html>

Why is that? Is there a way to make the script work WITH these tags?

Thanks for any hint!

Ingo