> You could also try looking at the HTTP headers of a response for e.g. 
> "index.htm" using urllib, specifically "Expires" and "Last-Modified".
> Using header values requires that you can trust the site's header 
> content. Web servers and caching proxies can do all sorts of things with the 
> headers. Otherwise, saving the hash of the raw HTML (without GIFs etc.) as 
> suggested is a good approach, depending on what your definition of "updated" 
> is.
>


Thanks Erik,
By 'update' I meant a major addition or removal of text (say, 100
characters).
Initially I thought of hashing the page and comparing the result to a
saved hash of the same page from an earlier time. But this would cause
even a tiny change to be counted as an update. I would like to use a
filter so that only a change of at least x characters counts as an
update.
Maybe using f = urllib.urlopen(url) and currentsize = len(f.read())
would let me find the number of added or removed characters, so that I
can set the filter accordingly.
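
A rough sketch of that filter, under some assumptions: it uses Python 3's urllib.request.urlopen in place of urllib.urlopen, and it counts changed characters with difflib instead of comparing lengths, since a length comparison alone would miss text that was replaced by other text of the same size. The names fetch_text, changed_chars and is_major_update are made up for illustration, and the threshold of 100 is just the figure mentioned above.

```python
import difflib
import urllib.request


def fetch_text(url):
    """Download a page and return its body as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def changed_chars(old, new):
    """Count characters removed from `old` plus characters added in `new`."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (i2 - i1) characters left `old`, (j2 - j1) characters entered `new`
            total += (i2 - i1) + (j2 - j1)
    return total


def is_major_update(old, new, threshold=100):
    """Treat the page as updated only if enough characters changed."""
    return changed_chars(old, new) >= threshold
```

To use it, the text returned by fetch_text would be saved somewhere on one visit and passed as `old` on the next. difflib can be slow on very large pages, so comparing line by line or stripping the markup first may be worth trying.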

Any other suggestions are most welcome.
harry

