vvellanki commented on a change in pull request #11522:
URL: https://github.com/apache/arrow/pull/11522#discussion_r756004169



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1642,6 +1642,107 @@ const char* convert_toUTF8(int64_t context, const char* value, int32_t value_len
   return value;
 }
 
+// Calculate the levenshtein distance between two string values
+FORCE_INLINE
+gdv_int32 levenshtein_utf8_utf8(int64_t context, const char* in1, int32_t in1_len,

Review comment:
       I don't think this algorithm is written to work for UTF-8 input. UTF-8 can use multiple bytes to encode a single character, and this algorithm doesn't handle that correctly.
   
   Can you rename the function so that it doesn't include `utf8` in the name?
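   To illustrate the concern, here is a standalone sketch (not the Gandiva code under review) contrasting a byte-oriented Levenshtein distance with one that first decodes UTF-8 into code points. The helper names `levenshtein_bytes`, `levenshtein_codepoints` and `decode_utf8` are hypothetical, and the decoder assumes well-formed UTF-8 input without validating continuation bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Two-row dynamic-programming Levenshtein distance over any sequence of
// units (bytes or code points). Hypothetical helper for illustration only.
template <typename T>
int levenshtein(const std::vector<T>& a, const std::vector<T>& b) {
  std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    curr[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      curr[j] = std::min({prev[j] + 1,           // deletion
                          curr[j - 1] + 1,       // insertion
                          prev[j - 1] + cost});  // substitution
    }
    std::swap(prev, curr);
  }
  return prev[b.size()];
}

// Treats every byte as a "character", as a byte-oriented loop would.
int levenshtein_bytes(const std::string& s1, const std::string& s2) {
  std::vector<unsigned char> a(s1.begin(), s1.end());
  std::vector<unsigned char> b(s2.begin(), s2.end());
  return levenshtein(a, b);
}

// Minimal UTF-8 decoder. Assumes well-formed input: the lead byte decides
// the sequence length (1 to 4 bytes) and continuation bytes are not checked.
std::vector<char32_t> decode_utf8(const std::string& s) {
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    int len = c < 0x80 ? 1 : (c >> 5) == 0x6 ? 2 : (c >> 4) == 0xE ? 3 : 4;
    char32_t cp = len == 1 ? c
                : len == 2 ? (c & 0x1F)
                : len == 3 ? (c & 0x0F)
                           : (c & 0x07);
    // Append 6 payload bits from each continuation byte.
    for (int k = 1; k < len && i + k < s.size(); ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}

// Compares whole characters (code points) instead of raw bytes.
int levenshtein_codepoints(const std::string& s1, const std::string& s2) {
  return levenshtein(decode_utf8(s1), decode_utf8(s2));
}
```

   With this sketch, `"café"` vs `"cafe"` gives a byte distance of 2 (é is the two bytes `0xC3 0xA9`) but a code-point distance of 1, which is the kind of discrepancy a function with `utf8` in its name should not produce.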




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.