Re: Regular expression generator/creator

Philipp Traeder Sat, 28 Aug 2004 14:25:29 -0700

On Saturday 28 August 2004 HH:27:20, Chris Devers wrote:
> On Sat, 28 Aug 2004, Philipp Traeder wrote:
> > I'm not sure if I understood the Halting Problem completely, but could
> > you give a hint what brings you to the assumption that "automatically
> > generating regular expressions from a bunch of text" is a variant of
> > the Halting Problem?
>
> I'm not a computer science theorist or anything, and could easily be
> wrong about my guess there, but it just seems like the same sort of
> broadly unbounded scenario that the halting problem deals with.
>
> The halting problem, for those that haven't heard of it, basically
> states that it is impossible to determine whether any random program
> will ever finish properly, because most will succeed or fail quickly but
> some will legitimately need infinite time to complete. It was conceived
> by Alan Turing before the first electronic computers were yet built, and
> is itself a variant on Kurt Godel's incompleteness theorem:
>
[..]
>
> Anyway, all of this & more isdescribed at extreme length in the great
> book _Godel, Escher, Bach_, if you want to learn more from an author who
> can actually explain this stuff without mangling it, as I've probably
> done :-)


Thanks for the explanation and the book title - I'll definitely take a look at 
it. This stuff sounds fascinating.

> >  my $a = 'abcdXef';
> >  my $b = 'abcdYef';
> >  my ($similarity, $regexp) = compare_strings($a, $b);
> >
> > and would return the similarity as percentile of matching chars (6/7
> > in this case) and a regex that looks like
> >
> >  abcd.ef
> >
> > Is this problem different to the one you were talking about? If not, why
> > shouldn't it be solvable?
>
> As I say before, I'm not a theorist or anything, it's all just hunches
> here :-).
>
> That said, the example you give is pretty well restricted, which would
> seem to lend to the problem being solvable. On the other hand, the
> example you give is so well restricted that it didn't take you any
> trouble at all to come up with the regex yourself.
>
> My suspicion is that as the problem gets more and more nuanced,
> automatic tools will have an increasingly hard time keeping up.
>
> This also gets into the sorts of topics that artificial intelligence
> researchers spend time pondering about -- how do you get a program to
> automatically recognize that a random image is a person (rather than,
> say, a statue), etc. It sounds simple, but it's really tricy stuff...

You're right - the problem I'm trying to solve is quite restricted - and I'm 
very thankful for this ;-)
Basically, I'm trying to write an application that "recognizes" log file 
formats, so that the following lines are identified as several manifestations 
of the same log message:

  could not delete user 3248234
  could not delete user 2348723

or even

  failed to connect to primary server
  failed to connect to secondary server

What I would like to see is a count of how many "manifestations" of each log 
message are being thrown, independently of the actual data they might 
contain. Since I do not want to hardcode the log messages into my 
application, I would like to generate regexes on the fly as they are needed.

So far, I'm quite confident that this should be solvable, but I agree that 
it'll become increasingly difficult if the connections between messages get 
more complex.

Thanks again for this interesting perspective,

Philipp

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>

Re: Regular expression generator/creator

Reply via email to