On 14 Feb 02 at 11:18:51AM, Daniel R. Allen wrote:
> Hi,
> 
> Somebody asked on the Toronto Perlmongers list about finding the longest
> common (consecutive) substring of two thousand-character texts using perl.  
> I've given a quick search and can find nothing definitive. [1]


Try this:

---

#!/usr/local/bin/perl -w
use strict;

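# Package variables shared with the embedded code block below.
use vars qw($a $b $longest);
($a, $b) = <DATA>;    # first two lines of the data section
$longest = "";

# (.+) captures a candidate substring of $a; the code block keeps it if it is
# longer than the best so far and also occurs in $b; (?!) always fails, which
# forces the engine to backtrack through every other candidate.
$a =~ m/(.+)(?{$longest = $1 if length $1 > length $longest and index($b, $1) != -1})(?!)/;

print ">>$longest<<\n";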

__DATA__
this is the way we brush our teeth
is his the way to do it?
quick the lazy brown fox the quick brown fox jumps over the lazy dog over quick
jumps lazy dog over quick the quick brown fox jumps over the lazy dog lazy brown fox



---

It's not exactly thoroughly tested, but then, I'm supposed to be doing
something else. Delete the first two lines of data to test the second
two.

It uses the regex engine, but only to enumerate the candidate
substrings. It also needs a reasonably up-to-date perl (say, 5.6.1 - it
didn't work with 5.005_03). I originally tried regex matching within
the embedded code, and it caused a core dump, so I switched to index,
which is the cleaner solution anyway. It only reports the first
longest common substring it finds.
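
For anyone who finds the backtracking trick opaque, here is a rough sketch of what I believe the regex amounts to when written as explicit loops: every start position in the first string, longest candidate first, each one checked with index. The sub name longest_common_substring is made up for illustration, and unlike the regex version this one skips candidates that could not beat the current best, so it makes fewer index calls while still returning the first longest match. Treat it as an untested-in-anger sketch, not a tuned solution for large texts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): the same search
# as the regex version, written as explicit loops.
sub longest_common_substring {
    my ($x, $y) = @_;
    my $longest = "";

    for my $start (0 .. length($x) - 1) {
        # Try the longest candidate from this start first, mirroring the
        # greedy (.+) backtracking. Stop once a candidate could no longer
        # be strictly longer than the best found so far.
        for (my $len = length($x) - $start; $len > length $longest; $len--) {
            my $candidate = substr($x, $start, $len);
            if (index($y, $candidate) != -1) {
                $longest = $candidate;
                last;    # shorter candidates from this start cannot win
            }
        }
    }
    return $longest;
}

# Same first two data lines as the script above, without trailing newlines.
print ">>", longest_common_substring(
    "this is the way we brush our teeth",
    "is his the way to do it?",
), "<<\n";
```

Because the comparison is strict, ties go to the candidate that starts earliest in the first string, which matches the "first longest" behaviour described above.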

Regards,


Ian Boreham
