rkavanap commented on a change in pull request #11051:
URL: https://github.com/apache/arrow/pull/11051#discussion_r741622617
##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1642,6 +1642,55 @@ const char* convert_toUTF8(int64_t context, const char* value, int32_t value_len
return value;
}
+// Calculate the levenshtein distance between two string values
+FORCE_INLINE
+gdv_int32 levenshtein_utf8_utf8(int64_t context, const char* in1, int32_t in1_len,
Review comment:
In fact, I now feel more strongly that we should use a memory-efficient
algorithm. What if the column width is 65K? A full 65K x 65K matrix has
roughly 4 billion cells, so we would need at least 4 GB to run this
algorithm. The Java version uses an algorithm that needs only 2n space, where
n is the length of the shorter of the two strings.
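
To make the suggestion concrete, here is a minimal sketch of the two-row approach. It is not the Java implementation or the code in this PR: the name `levenshtein_two_rows` and the `std::string_view` signature are hypothetical, and it compares bytes rather than UTF-8 code points, so multi-byte characters would count as several edits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not the PR's code): byte-wise Levenshtein distance
// that keeps only two rows of the DP table instead of the full matrix.
int32_t levenshtein_two_rows(std::string_view a, std::string_view b) {
  // Make b the shorter string so each row holds min(len) + 1 entries.
  if (a.size() < b.size()) std::swap(a, b);
  const std::size_t n = b.size();

  std::vector<int32_t> prev(n + 1), curr(n + 1);
  // Row 0: distance from the empty prefix of a to each prefix of b.
  std::iota(prev.begin(), prev.end(), 0);

  for (std::size_t i = 1; i <= a.size(); ++i) {
    // Column 0: deleting the first i bytes of a.
    curr[0] = static_cast<int32_t>(i);
    for (std::size_t j = 1; j <= n; ++j) {
      const int32_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      curr[j] = std::min({prev[j] + 1,          // deletion
                          curr[j - 1] + 1,      // insertion
                          prev[j - 1] + cost});  // substitution or match
    }
    // The row just computed becomes the previous row for the next pass.
    std::swap(prev, curr);
  }
  return prev[n];
}
```

With this layout, two 65K-byte inputs need two rows of about 65K `int32_t` entries each, roughly 512 KB in total, and the running time stays proportional to the product of the two lengths.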