Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8146 )
Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro ...................................................................... Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/descriptors.cc@87 PS11, Line 87: return ConstantStruct::get(null_indicator_offset_type, : {ConstantInt::get(codegen->int_type(), byte_offset), : ConstantInt::get(codegen->tinyint_type(), bit_mask), : zeroes->getStructElement(2)}); > This needs to be updated when we change the definition of NullIndicatorOffs Added a comment to the class definition. ConstantStruct::get() asserts if the number of elements doesn't match the number of elements in the struct (that's why I had to add the padding explicitly). This behaviour seems better than silently failing. The code in LLVM is: Constant *ConstantStruct::get(StructType *ST, ArrayRef<Constant*> V) { assert((ST->isOpaque() || ST->getNumElements() == V.size()) && "Incorrect # elements specified to ConstantStruct::get"); http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/tuple.h File be/src/runtime/tuple.h: http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/tuple.h@289 PS11, Line 289: ); Looking in the IR I noticed that there were "invoke" instruction emitted when calling this function. http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/tuple.cc@397 PS11, Line 397: Status( > Status::Expected() ? I think finalisation failing should be very unusual - either an Impala bug or a corrupt IR UDF. To be honest I don't know how robust our handling of this is, since we'll continue on and eventually try to optimise and compile the module. http://gerrit.cloudera.org:8080/#/c/8146/11/be/src/runtime/tuple.cc@449 PS11, Line 449: Value* result_val = builder.CreateCall(cross_compiled_fn, > Just curious if you inspected the output IR to see if it's actually unrolli I confirmed that it was unrolled for a handful of string columns. For 100s of string columns, e.g, select count(*) from (select distinct * from widetable_1000_cols) v;, it is not unrolled. I checked some intermediate values too and it seems pretty conservative - it seemed like the switch-over point is around materialising >=8 string columns. E.g. select * from tpch_avro did not unroll the loop. -- To view, visit http://gerrit.cloudera.org:8080/8146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 Gerrit-Change-Number: 8146 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 23 Oct 2017 23:36:23 +0000 Gerrit-HasComments: Yes