Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 211: pattern.is_null
> I believe pattern.is_null should also return StringVal::null(). pattern.is_
Will change this semantic.


PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) {
> It seems a bit arbitrary to choose 256K. Why not just start with a buffer o
I dropped this down to 64k.  Seems like the benefit for not re-allocating is 
pretty high and we only want to pay costs on very large replaces, which is what 
we'll get.


PS5, Line 236: haystack.len + replace.len - pattern.len
> I believe this can still overflow a 32-bit value, right ? Btw, please keep 
should not be able to overflow.  preferred indent is
   in 4 
   and continuing, correct?


PS5, Line 240: buffer_space = out_max;
> May help to have a test case in which the result string is longer than Stri
an excellent idea


PS5, Line 249:  while (match_pos + needle.len <= haystack.len) {
             :       // Copy in original string
             :       const int bytes = match_pos - consumed;
             :       memcpy(ptr, &haystack.ptr[consumed], bytes);
             :       DCHECK_LE(ptr - result.ptr + bytes, buffer_space);
             :       ptr += bytes;
             : 
             :       // Copy in replacement - always safe since we always leave 
room for one more replace
             :       DCHECK_LE(ptr - result.ptr + replace.len, buffer_space);
             :       memcpy(ptr, replace.ptr, replace.len);
             :       ptr += replace.len;
             : 
             :       // Don't want to re-match within already replaced pattern
             :       match_pos += needle.len;
             :       consumed = match_pos;
             : 
             :       // If we had an enlarging pattern, we may need more space
             :       if (UNLIKELY(check_space)) {
             :         const int bytes_produced = ptr - result.ptr;
             :         const int bytes_remaining = haystack.len - consumed;
             :         if (UNLIKELY(bytes_produced + bytes_remaining + 
replace.len > buffer_space)) {
             :           // Ran out of space, double the size.  See the note 
above regarding the choice
             :           // of a power of two sized buffer.
             :           buffer_space <<= 1;
             :           DCHECK((buffer_space & (buffer_space - 1)) == 0);
             :           const auto ofs = ptr - result.ptr;
             :           if (UNLIKELY(!result.Resize(context, buffer_space))) {
             :             return StringVal::null();
             :           }
             :           ptr = result.ptr + ofs;
             :         }
             :       }
             : 
             :       StringValue haystack_substring = 
haystack.Substring(match_pos);
             :       int match_pos_in_substring = 
search.Search(&haystack_substring);
             :       if (match_pos_in_substring < 0) break;
             :       match_pos += match_pos_in_substring;
             :   }
> Please ignore my previous comment. We need to keep the memory allocations i
I looked at StringBuffer but it uses MemPool directly.  I'd rather keep this 
change locally confined to using the current allocator rather than adapting a 
new interface.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zams...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to