From: Philipp Traeder <[EMAIL PROTECTED]>
> I'm facing a roughly similar problem at the moment, and I was planning
> on using String::Compare or something like it for comparing strings
> char by char. Taking a first glance at the code, it doesn't look too
> hard to modify it in a way that it returns not only the similarity
> between two strings, but also a string with special characters at
> those places that are different - something that I could call like:
> 
>   my $a = 'abcdXef';
>   my $b = 'abcdYef';
>   my ($similarity, $regexp) = compare_strings($a, $b);
> 
> and would return the similarity as percentile of matching chars (6/7
> in this case) and a regex that looks like 
>   abcd.ef

There is an unlimited number of different regexps that match both of 
your strings. Which of them do you want? The minimal one would be 
/^abcd[XY]ef$/, the maximal //. 

If you say you want to generate regexps that have the literals in all 
places where all the specified strings match and dots in the other 
places, it's easy, but that only helps if the variable part of the 
messages is always the same length.
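
A minimal sketch of that "literals and dots" approach, assuming all the
strings have the same length. The name compare_strings comes from your
message; the body is only my guess at what you want, not code taken
from String::Compare:

```perl
use strict;
use warnings;

# Hypothetical helper: compare equal-length strings position by position.
# Returns the fraction of positions where all strings agree, and a
# pattern with literals at those positions and dots everywhere else.
sub compare_strings {
    my @strings = @_;
    my $len = length $strings[0];
    for my $s (@strings) {
        die "strings differ in length\n" if length($s) != $len;
    }

    my ($regexp, $same) = ('', 0);
    for my $i (0 .. $len - 1) {
        my $char = substr $strings[0], $i, 1;
        # Number of strings that disagree with the first one at position $i.
        my $differs = grep { substr($_, $i, 1) ne $char } @strings;
        if ($differs) {
            $regexp .= '.';
        }
        else {
            $regexp .= quotemeta $char;   # escape regexp metacharacters
            $same++;
        }
    }
    my $similarity = $len ? $same / $len : 1;
    return ($similarity, $regexp);
}

my ($similarity, $regexp) = compare_strings('abcdXef', 'abcdYef');
print "$similarity $regexp\n";
```

Note that quotemeta escapes every non-word character, spaces included,
so the patterns generated for real log messages will be full of
backslashes. And the sketch simply dies when the lengths differ, which
is exactly the case it cannot handle.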

What about the example you give in another email
        Cannot connect to the primary server
        Cannot connect to the secondary server
what do you expect to get out of this?

What about the other

        unable to delete user 1234567
        unable to delete user 1897584

Do you really want
        /unable to delete user 1...5../
? And what if the IDs are not the same length?

Keep in mind that if you give a person a few examples of strings that 
you want to match and ask him/her to write a regexp for you, he/she 
has much more information than just these strings. He/she knows what 
each part of the string means. If he/she sees 2004/11/27 somewhere in 
the strings, he/she knows it's a date and can write the regexp so that 
it only matches valid dates. If one message contains something like 
"user 138767" and another "user 134795", he/she knows you most 
probably need the regexp to contain something like "user \d+", etc. 
The computer has no chance to know all this.
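
What you can do is put such knowledge into the code by hand. A rough
sketch, assuming (and this is our assumption, the data cannot prove it)
that every run of digits in a message is a variable ID. The name
generalize_digits is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one sample message into a pattern by
# replacing each run of digits with \d+ and escaping everything else.
sub generalize_digits {
    my ($message) = @_;
    my $pattern = '';
    # split with a capturing group keeps the digit runs in the list.
    for my $part (split /(\d+)/, $message) {
        if ($part =~ /^\d+$/) {
            $pattern .= '\d+';
        }
        else {
            $pattern .= quotemeta $part;
        }
    }
    return qr/^$pattern$/;
}

my $re = generalize_digits('unable to delete user 1234567');
for my $msg ('unable to delete user 1897584', 'unable to delete user 42') {
    print "$msg: ", ($msg =~ $re ? "match" : "no match"), "\n";
}
```

This copes with IDs of different lengths, but only because we told it 
that digits are the variable part. It does nothing for the 
primary/secondary example.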

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

