Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18526 )
Change subject: IMPALA-10851: Codegen for structs ...................................................................... Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/18526/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18526/2//COMMIT_MSG@10 PS2, Line 10: codegen turned off. This commit implements codegen for struct types. Can you also mention that some code has been refactored? As far as I understand the main reason for the refactor is to add recursion inside the codegen logic to write structs in tuples to StructVal (SlotRef::GetCodegendComputeFnImpl()) and that write StructVal to tuples (CodegenAnyVal::WriteToSlot ). http://gerrit.cloudera.org:8080/#/c/18526/2//COMMIT_MSG@11 PS2, Line 11: Can you add some measurement about the codegen time of a deeply nested structure? I don't expect it to be bad, but just to check that it is not too slow. http://gerrit.cloudera.org:8080/#/c/18526/2/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: http://gerrit.cloudera.org:8080/#/c/18526/2/be/src/codegen/codegen-anyval.h@52 PS2, Line 52: /// TYPE_ARRAY/TYPE_MAP/CollectionVal: { i64, i8* } Can you mention StructVal here? http://gerrit.cloudera.org:8080/#/c/18526/2/be/src/codegen/codegen-anyval.h@317 PS2, Line 317: SetToNull Can you add a comment for this? So it sets the null indicatior bit to 0 + for structures it set all children's null indicator bit to 0 http://gerrit.cloudera.org:8080/#/c/18526/2/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc: http://gerrit.cloudera.org:8080/#/c/18526/2/be/src/exprs/slot-ref.cc@328 PS2, Line 328: child can you distinguish somehow between source and dest child, e.g. by using dstChild for destination? http://gerrit.cloudera.org:8080/#/c/18526/2/be/src/exprs/slot-ref.cc@486 PS2, Line 486: /// The generated code can be conceptually divided into the following parts: > just a brain dump: I thought a bit more, and it seems feasible to extend this to structs: so this CodegenValueReader could also have child CodegenValueReaders, and the source would have to create this tree (so all of them would have a block for null and non-null case), while the destination would fill the null and non-null blocks. The logic seems the same in the AnyVal->slot and the slot->AnyVal case: the reading drives which block to branch to while checking nullness, and null/non-null write logic is implemented based on the destination, e.g. set all null bits to 0 if the destination is a slot. I don't want to force this refactor during this patch, but I think that it would generally reduce duplications and some logic could be moved to the place where it actually belongs, e.g. CodegenAnyVal would no longer need a deep understanding of slots and tuples. -- To view, visit http://gerrit.cloudera.org:8080/18526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5272c3f095fd9f07877104ee03c8e43d0c4ec0b6 Gerrit-Change-Number: 18526 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Mon, 23 May 2022 11:01:32 +0000 Gerrit-HasComments: Yes