On Wed, 08 Dec 2004 19:52:53 -0500, Lonnie Princehouse wrote:

> I don't think a Bayesian classifier is going to be very helpful here,
> unless you have tens of thousands of examples to feed it, or unless it

We do have tens of thousands of examples to feed it.

> The series of if host.find(...) lines in is_dynip() is equivalent to a
> regular expression, but much more expensive to execute because of all

It is not equivalent, because the patterns are based on the IP address.
As I mentioned before, I tried building a custom regex from the IP for
each test, but compiling the regex is way too slow to be done for each
test.

> For IP addresses, you really just need a mechanism to filter blocks of
> IP addresses. It might be easiest to first convert them into hex and
> then make liberal use of [0-f] in regular expressions.

The point of the IP address is *not* to recognize IP addresses. The
point is to look for transformations of the IP address in the hostname.
This gives a *huge* bang for the buck. I have been working on this
problem for a while. If the hostname contains a transformation of the
IP address, it is (almost certainly) a dynamic address. The ISPs are
very creative in their transformations, using the parts of the IP in
various orders and encoding them in hex, base64, decimal with or
without zerofill, and even roman numerals. The regex engine is just not
powerful enough to handle parameterized regexes (that I know of).
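
To make the idea concrete, here is a minimal sketch of this kind of check. It is not the actual is_dynip() code: the names `ip_candidates` and `looks_dynamic` are hypothetical, the hostnames are made up, and only three encodings (plain decimal, zero-filled decimal, two-digit hex) in forward and reversed octet order are covered. Base64, roman numerals, and partial-IP patterns are left out. It uses plain substring tests, so no regex is compiled per test.

```python
def ip_candidates(ip):
    """Yield strings that a hostname might embed for a dotted-quad IP."""
    octets = [int(part) for part in ip.split(".")]
    formats = ("%d", "%03d", "%02x")  # decimal, zero-filled decimal, hex
    # Try the octets in forward and in reversed order.
    for order in (octets, octets[::-1]):
        for fmt in formats:
            encoded = [fmt % o for o in order]
            for sep in (".", "-", ""):
                # Unseparated variable-width decimal is ambiguous
                # ("1.23" vs "12.3"), so skip that combination.
                if sep == "" and fmt == "%d":
                    continue
                yield sep.join(encoded)


def looks_dynamic(ip, hostname):
    """Return True if the hostname contains an encoded form of the IP."""
    host = hostname.lower()
    return any(candidate in host for candidate in ip_candidates(ip))


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    print(looks_dynamic("24.61.12.200", "c-24-61-12-200.dyn.example.net"))
    print(looks_dynamic("192.168.1.10", "host-c0a8010a.isp.example"))
    print(looks_dynamic("10.1.2.3", "mail.example.com"))
```

Each IP yields only a small, fixed number of candidate strings here, so the per-test cost is a handful of substring searches. Short candidates can produce false positives, which is one reason a real test would probably want more context than this sketch uses.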