Zach Amsden has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() ......................................................................
Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 211: pattern.is_null > I believe pattern.is_null should also return StringVal::null(). pattern.is_ Will change this semantic. PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) { > It seems a bit arbitrary to choose 256K. Why not just start with a buffer o I dropped this down to 64k. Seems like the benefit for not re-allocating is pretty high and we only want to pay costs on very large replaces, which is what we'll get. PS5, Line 236: haystack.len + replace.len - pattern.len > I believe this can still overflow a 32-bit value, right ? Btw, please keep should not be able to overflow. preferred indent is in 4 and continuing, correct? PS5, Line 240: buffer_space = out_max; > May help to have a test case in which the result string is longer than Stri an excellent idea PS5, Line 249: while (match_pos + needle.len <= haystack.len) { : // Copy in original string : const int bytes = match_pos - consumed; : memcpy(ptr, &haystack.ptr[consumed], bytes); : DCHECK_LE(ptr - result.ptr + bytes, buffer_space); : ptr += bytes; : : // Copy in replacement - always safe since we always leave room for one more replace : DCHECK_LE(ptr - result.ptr + replace.len, buffer_space); : memcpy(ptr, replace.ptr, replace.len); : ptr += replace.len; : : // Don't want to re-match within already replaced pattern : match_pos += needle.len; : consumed = match_pos; : : // If we had an enlarging pattern, we may need more space : if (UNLIKELY(check_space)) { : const int bytes_produced = ptr - result.ptr; : const int bytes_remaining = haystack.len - consumed; : if (UNLIKELY(bytes_produced + bytes_remaining + replace.len > buffer_space)) { : // Ran out of space, double the size. See the note above regarding the choice : // of a power of two sized buffer. : buffer_space <<= 1; : DCHECK((buffer_space & (buffer_space - 1)) == 0); : const auto ofs = ptr - result.ptr; : if (UNLIKELY(!result.Resize(context, buffer_space))) { : return StringVal::null(); : } : ptr = result.ptr + ofs; : } : } : : StringValue haystack_substring = haystack.Substring(match_pos); : int match_pos_in_substring = search.Search(&haystack_substring); : if (match_pos_in_substring < 0) break; : match_pos += match_pos_in_substring; : } > Please ignore my previous comment. We need to keep the memory allocations i I looked at StringBuffer but it uses MemPool directly. I'd rather keep this change locally confined to using the current allocator rather than adapting a new interface. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden <zams...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com> Gerrit-HasComments: Yes