Keith, Thanks for you input!! we seem to have a nice collaboration on text utilities :)
I ran the code you provided, and found an issue. Given: "This is my dog" & "My Dog does not have fleas" - Your initial code generates NO matches - you use Position with Case sensitivity ON, and as for position > 1, missing matches starting at position 1. - With out case sensitivity, & Position > 0, Your code generates the longest common string as length 5, and "my do" This is because you end up truncating the last character comparison due to using [[$i-1]] Follows, modified code to handle requests for both case sensitivity, and not, as well as removing the truncation issue. I am thinking that it might be slightly more efficient to: - compare the first 2 characters -- removing 1 iteration over the longer string. But it might not be worth the housekeeping to code that. - to build the final longest string as you go, rather then reiterate back over the strings looking for matches of the longest length. Where X = the number of times the longest common string length can be found in the shortest string... it would save X substrings, 2X positions X find in array, 2X assignments, and 2X comparisons + the loop counter incrementation and comparison. For relatively short strings, there would likely be no significant difference in execution time, however, very long strings (what length is 'long' I do not know), or repeated execution of this code in some sort of loop the difference might be large. Code start:: // ---------------------------------------------------- // Method: Suffix_Tree - from Wiki examples // - // INPUT1: Text - to compare // INPUT2: Text - to compare // INPUT3: Pointer - array of longest matches // INPUT4: Boolean - Do case sensitive comparisons (default = false) // OUTPUT: Longint - length // ---------------------------------------------------- C_LONGINT($i;$size;$longest;$0) C_TEXT($longer;$shorter;$temp) C_POINTER($3;$Return_Array) C_BOOLEAN($4;$Case_Sensitive;$Same) $str1:=$1 $str2:=$2 $Return_Array:=$3 If (Count parameters>=4) $Case_Sensitive:=$4 End if If (Length($str1)>Length($str2)) $longer:=$str1 $shorter:=$str2 Else $longer:=$str2 $shorter:=$str1 End if $m:=Length($longer) // is the longer string $n:=Length($shorter) ARRAY LONGINT($LCSub;$m;$n) $longest:=0 For ($i;1;$m) For ($j;1;$n) If ($Case_Sensitive) //doing case sensitive comparision - yes $Same:=(Character code($longer[[$i]])=Character code($shorter[[$j]])) Else //- no $Same:=($longer[[$i]]=$shorter[[$j]]) End if If ($Same) //are the current strings the same? - yes //depending on the comparision point add this character count Case of : ($i=1) $LCSub{$i}{$j}:=$LCSub{$i}{$j}+1 : ($j=1) $LCSub{$i}{$j}:=$LCSub{$i-1}{$j}+1 Else $LCSub{$i}{$j}:=$LCSub{$i-1}{$j-1}+1 End case //longer then the previous longest? If ($longest<$LCSub{$i}{$j}) // $longest:=$LCSub{$i}{$j} //update the longest substring found End if Else $LCSub{$i}{$j}:=0 End if End for End for // search for matches of $longest length ARRAY TEXT($aMatch;0) $size:=Length($shorter)-$longest+1 //for every possible substring of the longest length //try to find substring matches For ($i;1;$size) $temp:=Substring($shorter;$i;$longest) If ($Case_Sensitive) //case sensitive comparison - Yes $Found:=Position($temp;$longer;*) Else //-no $Found:=Position($temp;$longer) End if If ($Found>0) //note was >1 //hve we found this substrig before? If (Find in array($aMatch;$temp)=-1) APPEND TO ARRAY($aMatch;$temp) //- no End if End if End for COPY ARRAY($aMatch;$Return_Array->) $0:=$longest On Sat, 24 Aug 2019 16:23:56 -0500, Keith Culotta via 4D_Tech wrote: > That's great. It's fast, and reduces the number of searches. > > Keith - CDI > > > // ---------------------------------------------------- > // Method: Suffix_Tree - from Wiki examples > // - > // INPUT1: Text - to compare > // INPUT2: Text - to compare > // INPUT3: Pointer - array of longest matches > // OUTPUT: Longint - length > // ---------------------------------------------------- > C_LONGINT($i;$size;$longest;$0) > C_TEXT($longer;$shorter;$temp) > $str1:=$1 > $str2:=$2 > > If (Length($str1)>Length($str2)) > $longer:=$str1 > $shorter:=$str2 > Else > $longer:=$str2 > $shorter:=$str1 > End if > > $m:=Length($longer) // is the longer string > $n:=Length($shorter) > > ARRAY LONGINT($LCSub;$m;$n) > $longest:=0 > > For ($i;1;$m) > For ($j;1;$n) > If ($i=1) | ($j=1) > $LCSub{$i}{$j}:=0 > > Else > If ($longer[[$i-1]]=$shorter[[$j-1]]) > $LCSub{$i}{$j}:=$LCSub{$i-1}{$j-1}+1 > If ($longest<$LCSub{$i}{$j}) > $longest:=$LCSub{$i}{$j} > End if > Else > $LCSub{$i}{$j}:=0 > End if > End if > > End for > End for > > // search for matches of $longest length > ARRAY TEXT($aMatch;0) > $size:=Length($shorter)-$longest+1 > For ($i;1;$size) > $temp:=Substring($shorter;$i;$longest) > If (Position($temp;$longer;*)>1) > If (Find in array($aMatch;$temp)=-1) > APPEND TO ARRAY($aMatch;$temp) > End if > End if > End for > > COPY ARRAY($aMatch;$3->) > $0:=$longest > > > > >> On Aug 24, 2019, at 2:41 PM, John DeSoi via 4D_Tech >> <4d_tech@lists.4d.com> wrote: >> >> See >> >> https://en.wikipedia.org/wiki/Longest_common_substring_problem >> >> John DeSoi, Ph.D. >> >> >>> On Aug 24, 2019, at 10:48 AM, Keith Culotta via 4D_Tech >>> <4d_tech@lists.4d.com> wrote: >>> >>> This version stands alone, and runs a little more efficiently. >>> It's not been tested in every way, but the results are >>> encouraging. Interesting problem. I can't think of a way to do it >>> without comparing every character combination. The new "Split >>> string" command would speed part of this up if Collections could be >>> used. >> > > ********************************************************************** > 4D Internet Users Group (4D iNUG) > Archive: http://lists.4d.com/archives.html > Options: https://lists.4d.com/mailman/options/4d_tech > Unsub: mailto:4d_tech-unsubscr...@lists.4d.com > ********************************************************************** --------------- Gas is for washing parts Alcohol is for drinkin' Nitromethane is for racing ********************************************************************** 4D Internet Users Group (4D iNUG) Archive: http://lists.4d.com/archives.html Options: https://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **********************************************************************