[ https://issues.apache.org/jira/browse/IMPALA-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517322#comment-17517322 ]
ASF subversion and git services commented on IMPALA-11204: ---------------------------------------------------------- Commit 85ddd27b640bf42a61d8af238938e16618db537e in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=85ddd27b6 ] IMPALA-11204: Template implementation for OrcStringColumnReader::ReadValue There are some checks in OrcStringColumnReader::ReadValue() that we can determine outside the scope of this method. They should be optimized since this is a critical method that will be executed for each row (and for each string column). With these checks, the method is too complex to be inlined in OrcBatchedReader::ReadValueBatch() by the compiler. This patch templates OrcStringColumnReader::ReadValue() with two parameters, one for the target slot type (i.e. STRING/CHAR/VARCHAR), and the other one for whether the column is dictionary encoded. Also adds an ALWAYS_INLINE marker to force inlining it. OrcStringColumnReader::ReadValueBatch() will call a template version of ReadValue() based on the slot type and the orc batch encoded state. Ran a single node perf test on TPCH(30) on my dev box using 3 impalad instances. There are some improvements and no significant regressions: +----------+--------+-------------+------------+ | Query | Avg(s) | Base Avg(s) | Delta(Avg) | +----------+--------+-------------+------------+ | TPCH-Q19 | 5.62 | 6.07 | I -7.41% | | TPCH-Q6 | 2.56 | 2.78 | I -7.77% | | TPCH-Q4 | 3.85 | 4.25 | I -9.42% | | TPCH-Q12 | 4.25 | 4.99 | I -14.78% | +----------+--------+-------------+------------+ Base commit: ff21728 File Format: orc/snap/block Iterations: 30 Change-Id: I5e5f88c28059fb3d3ac1172e6d383d06ee3bedd5 Reviewed-on: http://gerrit.cloudera.org:8080/18366 Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> > OrcStringColumnReader should be a template class > ------------------------------------------------ > > Key: IMPALA-11204 > URL: https://issues.apache.org/jira/browse/IMPALA-11204 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Reporter: Quanlong Huang > Assignee: Quanlong Huang > Priority: Critical > Attachments: perf-report-tpch30-orc-snap.txt > > > There are some duplicated checks in OrcStringColumnReader::ReadValue() which > is executed for each row. They are static and should only be performed once > per ORC stripe. > {code:cpp} > Status OrcStringColumnReader::ReadValue(int row_idx, Tuple* tuple, MemPool* > pool) { > if (IsNull(DCHECK_NOTNULL(batch_), row_idx)) { > SetNullSlot(tuple); > return Status::OK(); > } > char* src_ptr; > int src_len; > if (batch_->isEncoded) { > orc::EncodedStringVectorBatch* currentBatch = > static_cast<orc::EncodedStringVectorBatch*>(batch_); > orc::DataBuffer<int64_t>& offsets = > currentBatch->dictionary->dictionaryOffset; > int64_t index = currentBatch->index[row_idx]; > if (UNLIKELY(index < 0 || static_cast<uint64_t>(index) + 1 >= > offsets.size())) { > return Status(Substitute("Corrupt ORC file: $0. Index ($1) out of range > [0, $2) in " > "StringDictionary.", scanner_->filename(), index, offsets.size()));; > } > src_ptr = blob_ + offsets[index]; > src_len = offsets[index + 1] - offsets[index]; > } else { > // The pointed data is now in blob_, a buffer handled by Impala. > src_ptr = blob_ + (batch_->data[row_idx] - batch_->blob.data()); > src_len = batch_->length[row_idx]; > } > int dst_len = slot_desc_->type().len; > if (slot_desc_->type().type == TYPE_CHAR) { > int unpadded_len = min(dst_len, static_cast<int>(src_len)); > char* dst_char = reinterpret_cast<char*>(GetSlot(tuple)); > memcpy(dst_char, src_ptr, unpadded_len); > StringValue::PadWithSpaces(dst_char, dst_len, unpadded_len); > return Status::OK(); > } > StringValue* dst = reinterpret_cast<StringValue*>(GetSlot(tuple)); > if (slot_desc_->type().type == TYPE_VARCHAR && src_len > dst_len) { > dst->len = dst_len; > } else { > dst->len = src_len; > } > dst->ptr = src_ptr; > return Status::OK(); > } > {code} > This method is executed for each row. The complexity of it causes the > compiler unable to inline it. I can see this method takes some portion of the > time in TPC-H queries (e.g. 21% in Q12). > Actually, the slot type is determined when we are creating the reader. > Whether the orc vector batch is encoded is determined when we start reading a > new stripe. If the string column is in dictionary encodings in the stripe and > we set EnableLazyDecoding to true in RowReaderOptions, the orc vector batch > will be encoded. > [https://github.com/apache/orc/blob/21259815ae665f6a4beac29edb4fab52c2403e13/c%2B%2B/src/Reader.cc#L1205] > [https://github.com/apache/orc/blob/21259815ae665f6a4beac29edb4fab52c2403e13/c%2B%2B/src/ColumnReader.cc#L1843-L1847] > We can switch to a template implementation so the compile can remove these > checks and inline the method. -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org