rkavanap commented on a change in pull request #11051:
URL: https://github.com/apache/arrow/pull/11051#discussion_r741622617



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1642,6 +1642,55 @@ const char* convert_toUTF8(int64_t context, const char* value, int32_t value_len
   return value;
 }
 
+// Calculate the levenshtein distance between two string values
+FORCE_INLINE
+gdv_int32 levenshtein_utf8_utf8(int64_t context, const char* in1, int32_t in1_len,

Review comment:
       In fact, I now feel more strongly that we should use a memory-efficient
algorithm. What if both columns are 65K wide? If my math is correct, the full
distance matrix would have roughly 4 billion cells, so we would need at least
4 GB to run this algorithm. The Java version uses an algorithm that needs only
2n space, where n is the length of the shorter of the two strings.
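
To make the 2n-space idea concrete, here is a minimal sketch of the two-row variant. It is not the code in this PR and not the Java implementation: the name `levenshtein_two_rows` is hypothetical, it compares raw bytes rather than UTF-8 code points, and it uses `std::vector` for the rows purely for illustration (how the buffers would actually be allocated in Gandiva is left open).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not the Gandiva function under review): byte-wise
// Levenshtein distance that keeps only two rows of the distance matrix,
// each sized by the shorter input.
int32_t levenshtein_two_rows(std::string_view a, std::string_view b) {
  // Ensure b is the shorter string so each row has n + 1 entries.
  if (a.size() < b.size()) std::swap(a, b);
  const std::size_t n = b.size();

  std::vector<int32_t> prev(n + 1), curr(n + 1);

  // Row 0: turning an empty prefix of a into b[0..j) takes j insertions.
  for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int32_t>(j);

  for (std::size_t i = 1; i <= a.size(); ++i) {
    // Column 0: turning a[0..i) into an empty string takes i deletions.
    curr[0] = static_cast<int32_t>(i);
    for (std::size_t j = 1; j <= n; ++j) {
      const int32_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      curr[j] = std::min({prev[j] + 1,           // deletion
                          curr[j - 1] + 1,       // insertion
                          prev[j - 1] + cost});  // substitution or match
    }
    std::swap(prev, curr);  // the finished row becomes the previous row
  }
  return prev[n];
}
```

With two 65K-byte inputs, this sketch would hold 2 * (65536 + 1) `int32_t` values, about 512 KB, instead of a full matrix.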





