Thanks Cameron,

It does indeed have that header. How do I make this work?
 XMLDocument1.FileName := 'c:\temp\test.htm';
 XMLDocument1.Active := True;
This gives me various errors. I suspect that the file is not valid XML. Is there some other way of parsing it?

Alister Christie
Computers for People
Ph: 04 471 1849 Fax: 04 471 1266
http://www.salespartner.co.nz
PO Box 13085
Johnsonville
Wellington


Cameron Hart wrote:

Do you know if the websites are XHTML? Do they have anything like the lines below at the start of the page?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

If they are, it would be easier to load them into XML documents and process them that way using the MSXML DOMDocument60 class.
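
A minimal sketch of that approach in Delphi, assuming MSXML 6 is installed and the saved page really is well-formed XHTML. The function name ExtractPropertyInfo and the namespace prefix "x" are illustrative choices, the file path is the one used elsewhere in this thread, and the div class is taken from the example in the original question:

```delphi
program XhtmlScrape;
{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, ComObj;

// Hypothetical helper: returns the text of every
// <div class="propertyInfo"> in an XHTML file, one per line.
function ExtractPropertyInfo(const FileName: string): string;
var
  Doc, Nodes: OleVariant;
  I: Integer;
begin
  Result := '';
  // Late-bound MSXML 6 document (the DOMDocument60 coclass).
  Doc := CreateOleObject('Msxml2.DOMDocument.6.0');
  Doc.async := False;
  Doc.validateOnParse := False;
  Doc.resolveExternals := False;
  // MSXML 6 refuses documents containing a DOCTYPE unless this is disabled.
  Doc.setProperty('ProhibitDTD', False);
  // XHTML elements live in a namespace, so XPath needs a prefix bound to it.
  Doc.setProperty('SelectionNamespaces',
    'xmlns:x="http://www.w3.org/1999/xhtml"');
  if not Doc.load(FileName) then
    raise Exception.Create('Parse error: ' + string(Doc.parseError.reason));
  Nodes := Doc.selectNodes('//x:div[@class="propertyInfo"]');
  for I := 0 to Nodes.length - 1 do
    Result := Result + Nodes.item[I].text + sLineBreak;
end;

begin
  CoInitialize(nil);
  try
    try
      Writeln(ExtractPropertyInfo('c:\temp\test.htm'));
    except
      on E: Exception do
        Writeln(E.Message);
    end;
  finally
    CoUninitialize;
  end;
end.
```

If the load fails, parseError.reason usually says why. Pages that declare an XHTML doctype are often not well-formed XML in practice, and with resolveExternals switched off, named entities such as &nbsp; that are defined only in the DTD will not resolve.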

cameron

*From:* [email protected] [mailto:[email protected]] *On Behalf Of *Alister Christie
*Sent:* Friday, 29 January 2010 12:22 p.m.
*To:* NZ Borland Developers Group - Delphi List
*Subject:* [DUG] web scraping using IHTMLDocument2

I'm trying to do some web page scraping using IHTMLDocument2, which is working fairly well. I can grab the second paragraph on a web page by doing something like:

p := iDoc.all.tags('P');
if p.Length >= 2 then
  result := p.Item(1).InnerText;

Where iDoc is an instance of IHTMLDocument2.

However, say there is an HTML element like

<div class="propertyInfo">Price: <span>Negotiation</span></div>

How would I find the divs where class="propertyInfo" (if anyone has much experience with IHTMLDocument2)?

--
Alister Christie
Computers for People
Ph: 04 471 1849 Fax: 04 471 1266
http://www.salespartner.co.nz
PO Box 13085
Johnsonville
Wellington

_______________________________________________
NZ Borland Developers Group - Delphi mailing list
Post: [email protected]
Admin: http://delphi.org.nz/mailman/listinfo/delphi
Unsubscribe: send an email to [email protected] with Subject: 
unsubscribe