On Mon, Sep 1, 2008 at 11:15 AM, Phil Archer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've used LWP in several apps in which the key bit of information I'm after
> is the headers. I've therefore got used to the fact that if the returned
> resource is HTML, one of the triggers for "OK, that's all the headers and
> everything else must be content" is the presence of anything in the <head>
> section of the document that LWP doesn't recognise.
>
> Take this, for example:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html
> xmlns:creativeCommons='http://backend.userland.com/creativeCommonsRssModule'
>  xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
>
> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
>
> <head profile="http://gmpg.org/xfn/11">
> ...
>
> Perfectly valid XHTML - but... LWP doesn't recognise the <creativecommons...
> tag and so stops parsing the headers.
>
> The User Agent package I'm using is version 2.31.
>
> So, some questions:
>
> 1. Which modules need updating so that LWP can recognise this kind of thing
> as valid <head> content?

It's the HTML::HeadParser module in the HTML-Parser dist.
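
To see the behaviour in isolation, here is a minimal sketch that feeds two
small documents straight to HTML::HeadParser, without going through LWP.
It assumes HTML::HeadParser is installed; the helper name head_title and
both sample documents are made up for illustration and are not taken from
your test pages. The second document puts a non-<head> element before
<head>, as in your example.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical helper: return the <title> that HTML::HeadParser extracts,
# or undef if the parser stopped before reaching it.
sub head_title {
    my ($html) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);
    return $parser->header('Title');
}

# Illustrative document containing only ordinary <head> content.
my $plain = <<'HTML';
<html>
<head>
<title>Test page</title>
</head>
<body><p>Hello</p></body>
</html>
HTML

# Same document with an element the head parser does not know about
# placed before <head>.
my $extended = <<'HTML';
<html xmlns:creativeCommons='http://backend.userland.com/creativeCommonsRssModule'>
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
<head>
<title>Test page</title>
</head>
<body><p>Hello</p></body>
</html>
HTML

for my $case ([plain => $plain], [extended => $extended]) {
    my ($name, $html) = @$case;
    my $title = head_title($html);
    printf "%-8s title: %s\n", $name, defined $title ? $title : '(not seen)';
}
```

If the title is missing for the second document, that points at the head
parser giving up at the unknown element, which is where any change would
have to go.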

> 2. Has anyone written such a module?

Not that I know of.

>
> As a demonstration, [1] and [2] show the status line, headers_as_string and
> content from two versions of the same document, the only difference between
> the two being that in [2], the <creativecommons..> tag is commented out. You
> can get this output from any URI using the form at [3].
>
> Thanks for any help
>
> Phil.
>
> [1]
> http://www.icra.org/cgi-bin/HTTP_Headers.cgi?url=http%3A%2F%2Fwww.icra.org%2Flabel%2FHTTP-Test%2Fspace.htm
> [2]
> http://www.icra.org/cgi-bin/HTTP_Headers.cgi?url=http%3A%2F%2Fwww.icra.org%2Flabel%2FHTTP-Test%2Fspace-mod.htm
> [3] http://www.icra.org/label/HTTP-Test/
