On Saturday 28 August 2004 HH:27:20, Chris Devers wrote: > On Sat, 28 Aug 2004, Philipp Traeder wrote: > > I'm not sure if I understood the Halting Problem completely, but could > > you give a hint what brings you to the assumption that "automatically > > generating regular expressions from a bunch of text" is a variant of > > the Halting Problem? > > I'm not a computer science theorist or anything, and could easily be > wrong about my guess there, but it just seems like the same sort of > broadly unbounded scenario that the halting problem deals with. > > The halting problem, for those that haven't heard of it, basically > states that it is impossible to determine whether any random program > will ever finish properly, because most will succeed or fail quickly but > some will legitimately need infinite time to complete. It was conceived > by Alan Turing before the first electronic computers were yet built, and > is itself a variant on Kurt Godel's incompleteness theorem: > [..] > > Anyway, all of this & more isdescribed at extreme length in the great > book _Godel, Escher, Bach_, if you want to learn more from an author who > can actually explain this stuff without mangling it, as I've probably > done :-)
Thanks for the explanation and the book title - I'll definitely take a look at it. This stuff sounds fascinating. > > my $a = 'abcdXef'; > > my $b = 'abcdYef'; > > my ($similarity, $regexp) = compare_strings($a, $b); > > > > and would return the similarity as percentile of matching chars (6/7 > > in this case) and a regex that looks like > > > > abcd.ef > > > > Is this problem different to the one you were talking about? If not, why > > shouldn't it be solvable? > > As I say before, I'm not a theorist or anything, it's all just hunches > here :-). > > That said, the example you give is pretty well restricted, which would > seem to lend to the problem being solvable. On the other hand, the > example you give is so well restricted that it didn't take you any > trouble at all to come up with the regex yourself. > > My suspicion is that as the problem gets more and more nuanced, > automatic tools will have an increasingly hard time keeping up. > > This also gets into the sorts of topics that artificial intelligence > researchers spend time pondering about -- how do you get a program to > automatically recognize that a random image is a person (rather than, > say, a statue), etc. It sounds simple, but it's really tricy stuff... You're right - the problem I'm trying to solve is quite restricted - and I'm very thankful for this ;-) Basically, I'm trying to write an application that "recognizes" log file formats, so that the following lines are identified as several manifestations of the same log message: could not delete user 3248234 could not delete user 2348723 or even failed to connect to primary server failed to connect to secondary server What I would like to see is a count of how many "manifestations" of each log message are being thrown, independently of the actual data they might contain. Since I do not want to hardcode the log messages into my application, I would like to generate regexes on the fly as they are needed. So far, I'm quite confident that this should be solvable, but I agree that it'll become increasingly difficult if the connections between messages get more complex. Thanks again for this interesting perspective, Philipp -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>