> You could also try looking at the HTTP headers for a request for e.g.
> "index.htm" using urllib. Specifically the "Expires" and "Last-Modified".
> Using headers values requires that you can trust the site on the header
> content. Web servers and caching proxies can do all sorts of things with the
> headers. Otherwise, saving the hash of the raw HTML (without GIFs etc.) as
> suggested is a good approach. Depending on what your definition of "updated"
> is.

Thanks Erik,

By "update" I meant a major addition or removal of text (say, 100 characters). Initially I thought of hashing the page and comparing that to a hash of the same page saved at an earlier time. But this would cause even a tiny change to be considered an update. I would like to use a filter so that only a change of at least x characters counts as an update. Maybe using f=urllib.urlopen and currentsize=len(f.read()) will let me find the number of added or removed characters and set the filter accordingly.

Any other suggestions are most welcome.

harry
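
One caveat with comparing len(f.read()) between two fetches: a replacement of the same length (100 characters removed, 100 different ones added) leaves the size unchanged. A rough sketch of a threshold filter that counts changed characters instead is below. It assumes Python 3 (urllib.request.urlopen rather than urllib.urlopen) and uses difflib from the standard library; the names changed_chars, is_major_update, and fetch, the UTF-8 decoding, and the default threshold of 100 are illustrative choices, not something from the thread.

```python
import difflib
from urllib.request import urlopen


def changed_chars(old_text, new_text):
    """Count characters inserted, deleted, or replaced between two strings."""
    # autojunk=False turns off difflib's heuristic that ignores very common
    # characters in long sequences, so the count is not skewed on big pages.
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For a replace, count the longer side of the changed region.
            total += max(i2 - i1, j2 - j1)
    return total


def is_major_update(old_text, new_text, threshold=100):
    """True if at least `threshold` characters changed (hypothetical filter)."""
    return changed_chars(old_text, new_text) >= threshold


def fetch(url):
    """Download a page and return its body as text (assumes UTF-8 content)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

To use it, you would store the text returned by fetch(url) on one run and pass it as old_text on the next. difflib can be slow on very large pages, so the plain length comparison may still be useful as a cheap first check.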