Both the initial DOCTYPE declaration and the following comment need to be
terminated.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
<!-- saved from url=(0023)http://www.iras.gov.sg/ --

should be

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0023)http://www.iras.gov.sg/ -->

If any browsers are managing to read the page, it's probably because they
re-parse it when they find no text on the first pass.  Ask whoever wrote
the page to fix it.  I actually see several '>'s missing, any one of which
could cause the same problem.
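A missing '>' can leave a parser stuck inside the declaration or tag, so the
markup after it is never seen as separate tags.  That kind of breakage can be
spotted mechanically.  The sketch below is a rough heuristic in Python, not
HTML::Parser or anything from libwww.  It flags every '<' that is followed by
another '<' before any '>'.  The helper name find_unterminated_tags is
hypothetical.  A literal '<' inside a script or comment will also be flagged,
so treat the output as a list of places to look, not a verdict.

```python
def find_unterminated_tags(html):
    """Heuristic: return (line_number, snippet) for each '<' whose next '>'
    is missing or comes after another '<'.  False positives are possible
    for a literal '<' inside scripts or comments."""
    problems = []
    pos = 0
    while True:
        start = html.find("<", pos)
        if start == -1:
            break
        close = html.find(">", start + 1)
        next_open = html.find("<", start + 1)
        # Unterminated if there is no '>' at all, or another '<' opens first.
        if close == -1 or (next_open != -1 and next_open < close):
            line_no = html.count("\n", 0, start) + 1
            snippet = html[start:].split("\n", 1)[0]
            problems.append((line_no, snippet))
        pos = start + 1
    return problems


if __name__ == "__main__":
    # The two broken lines from the page, followed by some ordinary markup.
    broken = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"\n'
        "<!-- saved from url=(0023)http://www.iras.gov.sg/ --\n"
        "<HTML>\n"
        '<IMG SRC="logo.gif">\n'
    )
    for line_no, snippet in find_unterminated_tags(broken):
        print(f"line {line_no}: {snippet}")
```

On the sample it reports lines 1 and 2 (the DOCTYPE and the comment).  With
the corrected lines it reports nothing.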
--
Mac :})
** I may forward private database questions to the DBI mail lists. **
----- Original Message -----
From: "Tan Joo Geok" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, March 28, 2000 1:59 AM
Subject: Cannot parse this web page!


I am using the libwww distribution with HTML-Parser-3.07 to measure the
total time it takes to fetch a URL, including all the objects contained in
it.  However, there is a Web page (see attached) from which the parser
consistently fails to extract the image objects.  I hope somebody will be
able to tell me what is wrong and how to get around the problem.

