Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20496 )
Change subject: IMPALA-12373: Small String Optimization for StringValue ...................................................................... Patch Set 6: (22 comments) Thanks. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream.cc@1118 PS4, Line 1118: StringValue::SimpleString s = sv->ToSimpleString(); > Current implementation generates much less CPU instructions: https://godbol Ok, nice detail. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1112 PS4, Line 1112: CodegenWriteStringOrCollectionToSlot > For now I'd keep it as is, unless you feel strong about it. It's fine by me to leave it as it is. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1123 PS4, Line 1123: > For now I'd just remove the conditionals/comments. Ok. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1133 PS4, Line 1133: raw_type > I would keep it for readability. Ok. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h File be/src/runtime/smallable-string.h: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@56 PS6, Line 56: so will the newly created SmallableString We could also mention that it will point to the same data. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@63 PS6, Line 63: int Could change this to uint32_t. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@86 PS6, Line 86: int Could change this to uint32_t. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@113 PS6, Line 113: int Could change this to uint32_t. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@129 PS6, Line 129: int Could change this to uint32_t. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@130 PS6, Line 130: DCHECK_LE(len, Len()); It could be mentioned in the function comment that it can only be decreased. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@147 PS6, Line 147: ret.ptr = const_cast<char*>(rep.small_rep.buf); Could we use the Ptr() and Len() methods here? Or would it be less efficient? http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@157 PS6, Line 157: int Could change this to uint32_t. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/smallable-string.h@161 PS6, Line 161: int Could change this to uint32_t. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h File be/src/runtime/smallable-string.h: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@38 PS4, Line 38: /// character data. > I introduced it because of two things: Ok, thanks. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/sorter.cc@785 PS6, Line 785: ptr_type We could use 'char*' instead of 'ptr_type' on L786. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value-ir.cc File be/src/runtime/string-value-ir.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value-ir.cc@23 PS4, Line 23: IR_ALWAYS_INLINE int StringValue::IrLen() const { return Len(); } > I wanted to define them in the header so the compiler can surely inline the Ok, thanks. http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc File be/src/runtime/string-value.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@54 PS4, Line 54: int i = len - 1; > I think so, but this should be fixed independently of this patch. I opened https://issues.apache.org/jira/browse/IMPALA-12478 for it. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc File be/src/runtime/string-value.cc: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@50 PS6, Line 50: unsigned Could use uint32_t, as on L38. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/string-value.cc@80 PS6, Line 80: unsigned Could use uint32_t, as on L38. http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/runtime/tuple.cc@87 PS6, Line 87: for (vector<SlotDescriptor*>::const_iterator slot = desc.string_slots().begin(); Can't we use a range-based for-loop? "for (SlotDescriptor* const slot : desc.string_slots()) { ... }". http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/20496/6/be/src/util/min-max-filter.h@306 PS6, Line 306: . Nit: it would be better if it was a comma, i.e. "'buffer', except ...". http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc@507 PS4, Line 507: if (value->IsSmall()) return; > 'len' can only be smaller than 'value->Len()' when 'value->Len()' is greate I think we should add a DCHECK for that. -- To view, visit http://gerrit.cloudera.org:8080/20496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3 Gerrit-Change-Number: 20496 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 03 Oct 2023 11:32:36 +0000 Gerrit-HasComments: Yes