vvellanki commented on a change in pull request #11551:
URL: https://github.com/apache/arrow/pull/11551#discussion_r744566673
##########
File path: cpp/src/gandiva/gdv_function_stubs.cc
##########
@@ -794,6 +795,88 @@ const char* gdv_fn_initcap_utf8(int64_t context, const char* data, int32_t data_
*out_len = out_idx;
return out;
}
+
+GANDIVA_EXPORT
+const char* gdv_fn_mask_first_n(int64_t context, const char* data, int32_t data_len,
+ int32_t n_to_mask, int32_t* out_len) {
+ if (data_len <= 0) {
+ *out_len = 0;
+ return nullptr;
+ }
+
+ if (n_to_mask < 0) {
+ n_to_mask = n_to_mask * (-1);
+ }
+
+ *out_len = data_len;
+
+ char* out = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+ if (out == nullptr) {
+ gdv_fn_context_set_error_msg(context, "Could not allocate memory for output string");
+ *out_len = 0;
+ return nullptr;
+ }
+
+ // do the masking
+ for (int i = 0; i < data_len; ++i) {
+ if(isdigit(data[i]) && i < n_to_mask) {
Review comment:
In UTF-8, a character can span multiple bytes. Masking the first n should
mean masking the first n characters, not the first n bytes, and that changes
the logic completely: this loop indexes by byte, so it will not work with
multi-byte characters. Does Hive also assume ASCII characters?
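
To illustrate the difference, here is a rough sketch of counting characters
instead of bytes. It is not the PR's code: `Utf8CharLen` and
`MaskFirstNChars` are hypothetical names, it uses `std::string` instead of
the arena allocator, it assumes a non-negative `n_to_mask`, and the
replacement character is a placeholder parameter because the hunk above does
not show what the PR writes for a digit.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: byte length of the UTF-8 character whose lead byte is c.
// Returns 1 for ASCII and for invalid lead bytes so the scan always advances.
static int32_t Utf8CharLen(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;
}

// Hypothetical sketch: replace ASCII digits that fall within the first
// n_to_mask *characters* (not bytes). digit_mask is an assumed placeholder.
// Assumes n_to_mask >= 0; a negative value masks nothing here.
std::string MaskFirstNChars(const std::string& in, int32_t n_to_mask,
                            char digit_mask = 'n') {
  std::string out = in;  // masking a digit never changes the byte length
  const int32_t len = static_cast<int32_t>(in.size());
  int32_t chars_seen = 0;
  int32_t i = 0;
  while (i < len && chars_seen < n_to_mask) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    // Only single-byte (ASCII) characters can be digits in this sketch.
    if (c < 0x80 && std::isdigit(c)) {
      out[i] = digit_mask;
    }
    i += Utf8CharLen(c);  // skip the whole character, including trailing bytes
    ++chars_seen;
  }
  return out;
}
```

With the input "é1a2" (where "é" is two bytes) and n = 2, this masks the
"1", whereas comparing the byte index against n would stop after the two
bytes of "é" and mask nothing.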