Re: notification in python
hi Erik,
That was food for thought. Content length may not work if substitutions leave the length unchanged. I will look into Levenshtein distance. Thanks for the suggestion.
regards
harry

> Content length (which you could also get using the HTTP header "Content-Length")
> won't necessarily tell you if content has changed. I think your problem is a
> candidate for http://en.wikipedia.org/wiki/Levenshtein_distance (calculating the
> "distance" between two strings), for which I think there are Python
> implementations.
>
> Depending on your requirements, you could add other heuristics to detect
> major changes, e.g. load the page into an XML parser and only check certain
> 's. But further suggestions would require more information on your
> problem.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
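The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions that turn one string into another, so unlike a length comparison it still registers a same-length substitution. Below is a minimal pure-Python sketch of the standard dynamic-programming computation plus a threshold filter. The function names and the 100-character default (borrowed from harry's example) are illustrative assumptions, not from any library:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Put the shorter string in the inner loop so each row is small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)  # free if chars match
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def is_major_update(old: str, new: str, threshold: int = 100) -> bool:
    """Treat the page as updated only if at least `threshold` edits separate it."""
    return levenshtein(old, new) >= threshold
```

This runs in O(len(old) * len(new)) time, which gets slow in pure Python for whole HTML pages; comparing only the text of the part you care about, or using a compiled implementation such as the python-Levenshtein package, is likely more practical.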
Re: notification in python
Harryos,

On 28/09/2010 at 09.56, harryos wrote:
> thanks Erik,
> By 'update' I meant a major addition/removal of text (say 100
> characters).
> Initially I thought of making a hash of a page and comparing it to the
> saved hash of the same page at a different moment in time. But this
> would cause even a tiny change to be considered an update. I would like
> to use a filter that counts it as an update only once x characters have changed.
> Maybe using f = urllib.urlopen(url) and
> currentsize = len(f.read()) will let me find the number of added/removed
> characters, and I can set the filter accordingly.

Content length (which you could also get using the HTTP header "Content-Length") won't necessarily tell you if content has changed. I think your problem is a candidate for http://en.wikipedia.org/wiki/Levenshtein_distance (calculating the "distance" between two strings), for which I think there are Python implementations.

Depending on your requirements, you could add other heuristics to detect major changes, e.g. load the page into an XML parser and only check certain 's. But further suggestions would require more information on your problem.

Kind regards,
Erik
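Erik's parser heuristic could look roughly like the sketch below. It swaps in the standard library's html.parser instead of an XML parser, since real-world HTML is often not well-formed XML, and it assumes the interesting part of the page sits inside an element with a known id. The id "content" and the class and function names are hypothetical. The extracted text can then be hashed or diffed instead of the whole page, so changes to ads or navigation elsewhere on the page are ignored:

```python
from html.parser import HTMLParser


class SectionTextExtractor(HTMLParser):
    """Collect the text inside the element whose id attribute is target_id.

    Assumes end tags are present inside the target element; void elements
    such as <br> or <img> are skipped so they never affect the depth count.
    """

    VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                     "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def section_text(html: str, target_id: str) -> str:
    parser = SectionTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    # Join chunks with spaces and collapse whitespace, so reflowed markup
    # does not count as a change.
    return " ".join(" ".join(parser.chunks).split())
```

For example, `section_text(page, "content")` returns only the words inside `<div id="content">`, whatever the surrounding ads say.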
Re: notification in python
> You could also try looking at the HTTP headers for a request for e.g.
> "index.htm" using urllib, specifically the "Expires" and "Last-Modified"
> headers.
> Using header values requires that you can trust the site on the header
> content. Web servers and caching proxies can do all sorts of things with the
> headers. Otherwise, saving the hash of the raw HTML (without GIFs etc.) as
> suggested is a good approach. It depends on what your definition of "updated"
> is.

thanks Erik,
By 'update' I meant a major addition/removal of text (say 100 characters). Initially I thought of making a hash of a page and comparing it to the saved hash of the same page at a different moment in time. But this would cause even a tiny change to be considered an update. I would like to use a filter that counts it as an update only once x characters have changed. Maybe using f = urllib.urlopen(url) and currentsize = len(f.read()) will let me find the number of added/removed characters, and I can set the filter accordingly.
Any other suggestions are most welcome.
harry
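Note that len(f.read()) only gives the net change in size: 100 characters added in one place and 100 removed in another cancel out, and a same-length substitution shows no change at all. To count added/removed characters directly, one option (swapped in here, not proposed in the thread) is the standard library's difflib.SequenceMatcher. Its matching is heuristic, but it yields a valid edit script, so the count is an upper bound on, not necessarily equal to, the Levenshtein distance. The function names and the 100-character default are illustrative:

```python
import difflib


def changed_characters(old: str, new: str) -> int:
    """Approximate how many characters were added, removed or replaced."""
    # autojunk=False stops SequenceMatcher from treating very frequent
    # characters as junk, which it otherwise does for inputs of 200+ items.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For 'replace' count the larger side so each position counts once;
            # for 'delete' or 'insert' one side is empty anyway.
            changed += max(i2 - i1, j2 - j1)
    return changed


def is_update(old: str, new: str, threshold: int = 100) -> bool:
    return changed_characters(old, new) >= threshold
```

Unlike `len()`, this reports `"cat"` versus `"cut"` as one changed character.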
Re: notification in python
On 27/09/2010 at 19.02, harryos wrote:
> thanks for the pointer
> I am trying to get something similar to changedetection but with
> hourly updates.
> I need to get updates from a number of sites, so I was wondering how
> to implement an updating utility.

You could also try looking at the HTTP headers for a request for e.g. "index.htm" using urllib, specifically the "Expires" and "Last-Modified" headers. This would let you ignore e.g. banners and Flash content, as they are fetched in separate requests. If you want to go really lightweight and fast, do a HEAD request instead of a plain GET. It's easy to look at the headers a specific site is sending with e.g. the Firebug plugin for Firefox.

Using header values requires that you can trust the site on the header content. Web servers and caching proxies can do all sorts of things with the headers. Otherwise, saving the hash of the raw HTML (without GIFs etc.) as suggested is a good approach. It depends on what your definition of "updated" is.

Kind regards,
Erik
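A sketch of the lightweight HEAD-request check Erik describes, written for Python 3 (urllib.request rather than the Python 2 urllib used in the thread). It compares Last-Modified and also ETag, another standard validator header not mentioned above; Expires only says when a cached copy goes stale, not when the page changed, so it is left out. As Erik warns, this is only as reliable as the headers the server sends, and the helper names are hypothetical:

```python
import urllib.request
from typing import Dict, Optional

# Validator headers that change when the server considers the resource changed.
# ETag is an addition of this sketch; it was not mentioned in the thread.
VALIDATOR_HEADERS = ("Last-Modified", "ETag")


def fetch_validators(url: str, timeout: float = 10.0) -> Dict[str, str]:
    """Send a HEAD request (no body is transferred) and return the validators present."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        validators = {}
        for name in VALIDATOR_HEADERS:
            value = response.headers.get(name)
            if value is not None:
                validators[name] = value
        return validators


def headers_changed(previous: Dict[str, str], current: Dict[str, str]) -> Optional[bool]:
    """Compare validators from two runs.

    Returns None when no validator appears in both runs, meaning the headers
    cannot tell you anything and you should fall back to comparing the body.
    """
    shared = set(previous) & set(current)
    if not shared:
        return None
    return any(previous[name] != current[name] for name in shared)
```

Store the dictionary from `fetch_validators` between runs and pass the old and new copies to `headers_changed`; a `None` result is the cue to fall back to hashing the body.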
Re: notification in python
I did a quick Google search and didn't find anything that obviously solves this problem. I did see companies that sell this service, and probably with good reason. Due to dynamic content such as ads, data from RSS feeds, and simply auto-generated content from server-side code, it seems that you'd almost have to customize the configuration on a site-by-site basis. It would be difficult to distinguish between changes to the content you're interested in monitoring and all the other stuff.

On the other hand, it seems that this functionality is something that a lot of people might want, so don't stop looking. If you're sure there's no open-source solution out there, maybe you can create it and put out a call for contributors on this list.

Shawn
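One way to express the site-by-site configuration Shawn describes is a table of patterns for each site's volatile regions, stripped out before hashing. The site name, patterns and function name below are hypothetical placeholders, and regular expressions over HTML are brittle, so treat this as a starting point rather than a robust solution:

```python
import hashlib
import re

# Hypothetical per-site rules: regexes for regions that change on every load
# (ads, timestamps, session tokens) and should be ignored when comparing.
SITE_RULES = {
    "example.com": [
        r'<div class="ad">.*?</div>',
        r"Generated at \d{2}:\d{2}:\d{2}",
    ],
}


def normalized_fingerprint(site: str, html: str) -> str:
    """Strip the site's volatile regions, then hash what is left."""
    for pattern in SITE_RULES.get(site, []):
        # DOTALL lets .*? span line breaks inside the removed region.
        html = re.sub(pattern, "", html, flags=re.DOTALL)
    return hashlib.sha256(html.encode("utf-8")).hexdigest()
```

Sites without an entry fall back to hashing the raw HTML, so any change there still counts as an update.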
Re: notification in python
thanks for the pointer.
I am trying to get something similar to changedetection, but with hourly updates. I need to get updates from a number of sites, so I was wondering how to implement an updating utility.
harry

On Sep 27, 9:16 pm, Shawn Milochik wrote:
> If you're asking for functionality like this: http://www.changedetection.com/
>
> Or are you looking for something to embed in your own code to know when
> something has happened on your own site?
>
> If the former, you can probably do it by scheduling a urlopen and saving its
> hash, comparing it each time. If the latter, you can use the logging module.
>
> Shawn
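Hourly polling over several sites could be driven by a simple loop like this sketch. The `check` callback is a hypothetical placeholder for whatever change test you settle on (a saved hash, a diff threshold, or header checks), and the function name is illustrative. Running a one-shot script from cron every hour is an equally reasonable alternative to a long-running process:

```python
import time
from typing import Callable, Dict, List, Optional


def poll_sites(sites: List[str],
               check: Callable[[str], bool],
               interval_seconds: float = 3600.0,
               rounds: Optional[int] = None) -> Dict[str, int]:
    """Call check(url) for every site once per interval.

    check is a hypothetical callback returning True when a site counts as
    updated. rounds=None means run forever; otherwise stop after that many
    rounds and return how many updates were seen per site.
    """
    updates = {url: 0 for url in sites}
    completed = 0
    while rounds is None or completed < rounds:
        started = time.monotonic()
        for url in sites:
            try:
                if check(url):
                    updates[url] += 1
                    print(f"{url} was updated")
            except OSError as exc:  # includes urllib's URLError: skip this round
                print(f"{url}: check failed ({exc})")
        completed += 1
        if rounds is None or completed < rounds:
            # Sleep for the remainder of the interval, not a full interval,
            # so slow sites do not make the schedule drift.
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_seconds - elapsed))
    return updates
```

A failure on one site is reported and skipped, so a single unreachable server does not stop the others from being checked.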
Re: notification in python
If you're asking for functionality like this: http://www.changedetection.com/

Or are you looking for something to embed in your own code to know when something has happened on your own site?

If the former, you can probably do it by scheduling a urlopen and saving its hash, comparing it each time. If the latter, you can use the logging module.

Shawn
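Shawn's suggestion of scheduling a urlopen and comparing a saved hash might look like this in Python 3. The JSON state file, the logger name and the function names are assumptions for illustration; the comparison is kept separate from the fetch so it can be tested without a network. Because it hashes the raw HTML, any change at all (including a rotating ad) counts as an update, which is the limitation harry raises elsewhere in this thread:

```python
import hashlib
import json
import logging
import urllib.request
from pathlib import Path

log = logging.getLogger("pagewatch")  # hypothetical logger name


def record_hash(state: dict, url: str, body: bytes) -> bool:
    """Store the hash of body under url; return True if it differs from the
    previously stored hash. The first sighting of a url is not a change."""
    new_hash = hashlib.sha256(body).hexdigest()
    changed = url in state and state[url] != new_hash
    state[url] = new_hash
    if changed:
        log.info("%s changed", url)
    return changed


def check_url(url: str, state_file: Path) -> bool:
    """Fetch url, compare with the hash saved in state_file (JSON), save again."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    with urllib.request.urlopen(url, timeout=10) as response:
        changed = record_hash(state, url, response.read())
    state_file.write_text(json.dumps(state))
    return changed
```

For example, call `logging.basicConfig(level=logging.INFO)` once, then `check_url("http://www.example.com/", Path("hashes.json"))` on each scheduled run.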