On 27/09/2010 at 19:02, harryos wrote:

> Thanks for the pointer.
> I am trying to get something similar to changedetection, but with
> hourly updates.
> I need to get updates from a number of sites, so I was wondering how
> to implement an updating utility.

You could also try looking at the HTTP headers returned for a request for,
e.g., "index.htm" using urllib, specifically the "Expires" and "Last-Modified"
headers. This would let you ignore banners, Flash content, etc., since those are
fetched in separate requests. If you want to go really lightweight and fast, do
a HEAD request instead of a plain GET; the server then sends only the headers,
not the page body. It's easy to see which headers a specific site is sending
with, e.g., the Firebug plugin for Firefox.
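A minimal sketch of the HEAD-request idea, assuming Python 3's urllib.request (the thread only says "urllib"). The names fetch_headers and looks_updated are hypothetical, and treating a changed "Last-Modified" value as "updated" is an assumption; "Expires" is collected but only as a hint of when a recheck makes sense, not compared.

```python
import urllib.request
from typing import Dict, Optional

# Headers we look at; a server is free to omit either one.
WATCHED_HEADERS = ("Last-Modified", "Expires")


def fetch_headers(url: str, timeout: float = 10.0) -> Dict[str, Optional[str]]:
    """Send a HEAD request (headers only, no body) and return the watched headers.

    Hypothetical helper: missing headers are reported as None.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name: response.headers.get(name) for name in WATCHED_HEADERS}


def looks_updated(previous: Dict[str, Optional[str]],
                  current: Dict[str, Optional[str]]) -> Optional[bool]:
    """Compare Last-Modified between two checks (assumption: a new value means "updated").

    Returns None when either check lacks the header, i.e. the headers can't
    tell us anything and the caller should fall back to another method.
    """
    old = previous.get("Last-Modified")
    new = current.get("Last-Modified")
    if old is None or new is None:
        return None
    return old != new
```

Returning None rather than False for a missing header keeps "unchanged" distinct from "can't tell", which matters given the caveat about trusting headers.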

Using header values requires that you can trust the site to send accurate
headers; web servers and caching proxies can add, strip, or rewrite them.
Otherwise, saving a hash of the raw HTML (without GIFs etc.), as suggested, is
a good approach, depending on what your definition of "updated" is.
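A minimal sketch of the hashing approach, assuming Python 3's urllib.request and hashlib. The names fetch_body, fingerprint and has_changed are hypothetical, SHA-256 is an arbitrary choice of hash, and the in-memory dict stands in for whatever persistent store you'd actually use. Only the HTML document itself is fetched, so images and other embedded content don't affect the hash; anything dynamic inside the HTML (timestamps, server-side ads) will.

```python
import hashlib
import urllib.request
from typing import Callable, Dict


def fetch_body(url: str, timeout: float = 10.0) -> bytes:
    """GET the raw HTML; images, Flash etc. are separate requests and not included."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fingerprint(body: bytes) -> str:
    """Hash the raw bytes of the page."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url: str, seen: Dict[str, str],
                fetch: Callable[[str], bytes] = fetch_body) -> bool:
    """Return True if the page hash differs from the one stored for this URL.

    The first check of a URL counts as a change (nothing stored yet).
    `fetch` is injectable so the logic can be exercised without the network.
    """
    digest = fingerprint(fetch(url))
    changed = seen.get(url) != digest
    seen[url] = digest
    return changed
```

For hourly updates across a number of sites, one option is to call has_changed for each URL from a cron job that runs every hour, persisting `seen` between runs.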

Kind regards,
Erik
