Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 6:

(22 comments)

Thanks.

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream.cc@1118
PS4, Line 1118:       StringValue::SimpleString s = sv->ToSimpleString();
> Current implementation generates much less CPU instructions: https://godbol
Ok, nice detail.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1112
PS4, Line 1112: CodegenWriteStringOrCollectionToSlot
> For now I'd keep it as is, unless you feel strong about it.
It's fine by me to leave it as it is.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1123
PS4, Line 1123:
> For now I'd just remove the conditionals/comments.
Ok.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1133
PS4, Line 1133: raw_type
> I would keep it for readability.
Ok.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@56
PS6, Line 56: so will the newly created SmallableString
We could also mention that it will point to the same data.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@63
PS6, Line 63: int
Could change this to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@86
PS6, Line 86: int
Could change this to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@113
PS6, Line 113: int
Could change this to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@129
PS6, Line 129: int
Could change this to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@130
PS6, Line 130:     DCHECK_LE(len, Len());
It could be mentioned in the function comment that it can only be decreased.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@147
PS6, Line 147:       ret.ptr = const_cast<char*>(rep.small_rep.buf);
Could we use the Ptr() and Len() methods here? Or would it be less efficient?


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@157
PS6, Line 157: int
Could change this to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@161
PS6, Line 161: int
Could change this to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@38
PS4, Line 38: /// character data.
> I introduced it because of two things:
Ok, thanks.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc@785
PS6, Line 785: ptr_type
We could use 'char*' instead of 'ptr_type' on L786.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value-ir.cc
File be/src/runtime/string-value-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value-ir.cc@23
PS4, Line 23: IR_ALWAYS_INLINE int StringValue::IrLen() const { return Len(); }
> I wanted to define them in the header so the compiler can surely inline the
Ok, thanks.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc
File be/src/runtime/string-value.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@54
PS4, Line 54:   int i = len - 1;
> I think so, but this should be fixed independently of this patch.
I opened https://issues.apache.org/jira/browse/IMPALA-12478 for it.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc
File be/src/runtime/string-value.cc:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@50
PS6, Line 50: unsigned
Could use uint32_t, as on L38.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@80
PS6, Line 80: unsigned
Could use uint32_t, as on L38.


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc@87
PS6, Line 87:   for (vector<SlotDescriptor*>::const_iterator slot = 
desc.string_slots().begin();
Can't we use a range-based for-loop?
"for (SlotDescriptor* const slot : desc.string_slots()) { ... }".


http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h@306
PS6, Line 306: .
Nit: it would be better if it was a comma, i.e. "'buffer', except ...".


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc@507
PS4, Line 507:   if (value->IsSmall()) return;
> 'len' can only be smaller than 'value->Len()' when 'value->Len()' is greate
I think we should add a DCHECK for that.



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2023 11:32:36 +0000
Gerrit-HasComments: Yes

Reply via email to