ali-ghanbari commented on a change in pull request #233:
URL: https://github.com/apache/commons-text/pull/233#discussion_r674469665



##########
File path: 
src/main/java/org/apache/commons/text/similarity/LongestCommonSubsequence.java
##########
@@ -120,24 +170,72 @@ public CharSequence longestCommonSubsequence(final 
CharSequence left, final Char
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
-       final StringBuilder longestCommonSubstringArray = new 
StringBuilder(Math.max(left.length(), right.length()));
-       final int[][] lcsLengthArray = longestCommonSubstringLengthArray(left, 
right);
-       int i = left.length() - 1;
-       int j = right.length() - 1;
-       int k = lcsLengthArray[left.length()][right.length()] - 1;
-       while (k >= 0) {
-           if (left.charAt(i) == right.charAt(j)) {
-               longestCommonSubstringArray.append(left.charAt(i));
-               i = i - 1;
-               j = j - 1;
-               k = k - 1;
-           } else if (lcsLengthArray[i + 1][j] < lcsLengthArray[i][j + 1]) {
-               i = i - 1;
-           } else {
-               j = j - 1;
-           }
+       // Find lengths of two strings
+       final int leftSz = left.length();
+       final int rightSz = right.length();
+
+       // Check if we can save even more space
+       if (leftSz < rightSz) {
+           return algorithmC(right, rightSz, left, leftSz);
        }
-       return longestCommonSubstringArray.reverse().toString();
+       return algorithmC(left, leftSz, right, rightSz);
+   }
+
+    /**
+     * An implementation of "ALG C" from Hirschberg's paper <a 
href="https://dl.acm.org/doi/10.1145/360825.360861";>A linear space algorithm 
for computing maximal common subsequences</a>.
+     * Assuming the sequence <code>left</code> is of size <code>m</code> and 
the sequence <code>right</code> is of size <code>n</code>,
+     * this method returns Longest Common Subsequence (LCS) the two sequences.
+     * As per the paper, this method runs in O(m * n) time and O(m + n) space.
+     *
+     * @param left Left sequence
+     * @param m Length of left sequence
+     * @param right Right sequence
+     * @param n Length of right sequence
+     * @return LCS of <code>left</code> and <code>right</code>
+     */
+   static CharSequence algorithmC(final CharSequence left, final int m,

Review comment:
       >  ... Algorithm A and Algorithm B are very similar, but I think A 
requires O(mn) time and O(mn) space. While Algorithm B requires O(mn) time and 
O(m+n) space.
   
   Right, in terms of time complexity, yes, they all are quadratic (O(mn), to 
be precise) in the size of input strings.
   Algorithm B has O(m+n) space complexity, while Algorithm A has O(mn) space 
complexity (it returns an entire DP array; just like the method that we marked 
as deprecated).
   
   > And, Algorithm C is the third one in that paper, with O(mn) time...
   
   Thanks for bring this up @kinow. The method algorithmC allocates heap space 
directly or indirectly (through the calls to subSequence), and there are 2m-1 
calls to algorithmC. The recurrence does not seem to be easy to solve for me, 
but a back of the envelope calculation tells me that this implementation should 
not be of linear space complexity...
   Looking at the space analysis proof given in the paper, it looks like the 
paper makes a simplifying assumption about the storage of substrings. (please 
correct me if I am wrong)
   Perhaps that is why 
[https://www.ics.uci.edu/~eppstein/161/960229.html](https://www.ics.uci.edu/~eppstein/161/960229.html),
 emphasizes that Hirschberg's linear-spaced algorithm is used **only** for 
calculating length of the LCS.
   Please confirm this and I can delete algorithmC and keep algorithmB only.
   I can quickly adjust the rest of the class code accordingly. Thanks!
   
   > Past midnight here, long time with ...
   
   Thanks @kinow your constructive comments and your hard work! 👍 🥇 
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to