[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

Andrew Wong (Code Review) Tue, 14 Aug 2018 23:05:39 -0700

Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )


Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 15:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792
PS14, Line 792: // Currently this block holds encoded composite key.
              :   Arena arena(16*1024);
              :
              :   DCHECK(!prepared_blocks_.empty());
              :
nit: As a general rule of thumb, if something is indicative of a programmer 
error, we'll add a CHECK or DCHECK, whereas if there's an issue with the 
underlying data, we'll return an error.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797
PS14, Line 797: us
nit: I know some of the older code has this reversed, but for new code, we 
favor placing the sigil immediately after the var name (PreparedBlock* pblk), 
here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
              :   last_prepare_count_ = 0;
              :
              :   p
It's worth documenting in the header the side effect of `cache_seeked_value`, 
i.e. if it is set to true, this call to StoreCurrentValue() will seek an extra 
row, right?

Another approach I chatted with you about before was that maybe we can avoid 
pushing this bookkeeping so low by slightly adjusting the APIs:

 Status SeekAtOrAfter(const EncodedKey& encoded_key, bool* exact_match);  // 
Same behavior as before this patch.
 Status ConsumeNextValue(string* val);  // Combines StoreCurrentValue() and 
GetCurrentValue()

Then, callers (i.e. CFileSet::Iterator) could use key_iter->ConsumeNextValue() 
immediately after SeekAtOrAfter() instead of doing SeekAtOrAfter(/* 
cache_seeked_values=*/true), the CFileIterator::cur_val_ member could be moved 
to the CFileSet::Iterator with a more telling name, like 'skip_scan_cur_key_' 
or somesuch, and we wouldn't have to have caveats and comments explaining the 
restricted access to cur_val_.

Might be a pretty substantial change, but it might with the broken 
encapsulation. WDYT?


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc
File src/kudu/common/encoded_key.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92
PS14, Line 92:                                        gscoped_ptr<EncodedKey> 
*key,
             :                                        Arena* arena, int32_t 
num_columns
Per the GSG (https://google.github.io/styleguide/cppguide.html):
nit: spacing
nit: output parameters should come after input parameters


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

PS14:
nit: s/unique prefix/distinct prefix


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174
PS14, Line 174: //
              : // Due to the caveat(see below) listed for 
SkipToNextScan(size_t *remaining),
              : // this class should not reference a separate "client" class 
that uses key_iter->cur_val.
Hmm, I'm not sure what this means. See my comment in cfile_reader.cc re: an 
alternate approach.

Also nit: add a space in "caveat (see"


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@208
PS14, Line 208:
              :   // Decode the currently-seeked key into 'enc_key'.
              :   Status
Before getting into the nitty gritty of each method, I think it'd be helpful to 
start out by adding a block comment with:
- a brief, high-level overview of what a skip scan is
- definitions of "predicate column", "prefix columns", "prefix keys", in the 
context of a skip scan
- definitions of a lower and upper bound in the context of a skip scan, and how 
they're used when seeking

With the definitions in one place, it'd be easier to read about, and you 
wouldn't have to sprinkle definitions throughout the below comments. You've 
already got a good base at L238.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@209
PS14, Line 209: // Decode the currently-seeked key into 'enc_key'.
              :   Status DecodeCurrentKey(Arena* arena, 
gscoped_ptr<EncodedKey>* enc_key);
              :
              :   // This function is used to place the validx_iter_ at the 
next greater prefix key.
              :   // prefix key refers to the first "num_prefix_cols" columns 
of the current key.
              :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling
              :   // this function).
              :   // If 'cache_seeked_value' is true, the validx_iter_ will 
store the value seeked to.
              :   Status SeekToNextPrefixKey(size_t num_prefix_cols, bool 
cache_seeked_value);
              :
              :   // Seek to the next predicate match within the current prefix.
              :   Status SeekToRowWithCurPrefixMatchingPred(const 
gscoped_ptr<EncodedKey>& enc_key);
              :
              :   // Build the key with the same prefix as 'cur_enc_key', that 
has
              :   // 'skip_scan_predicate_value_' in its predicate column,
              :   // and the minimum possible value for all other columns.
              :   gscoped_ptr<EncodedKey> GetKeyWithPredicateVal(KuduPartialRow 
*p_row,
              :                                                  const 
gscoped_ptr<EncodedKey>& cur_enc_key);
              :
              :   // Returns true if the given encoded key matches the skip 
scan predicate.
              :   bool CheckPredicateMatch(const gscoped_ptr<EncodedKey>& 
enc_key) const;
              :
              :   // Check if the column values in the range corresponding to 
the given
              :   // inclusive column id range [start_col_id, end_col_id] are 
equal between the
              :   // two given keys.
              :   bool KeyColumnsMatch(const gscoped_ptr<EncodedKey>& key1,
              :                        const gscoped_ptr<EncodedKey>& key2,
              :                        int start_col_id, int end_col_id) const;
              :
              :   // This method implements a "skip-scan" optimization, 
allowing a scan to use
              :   // the primary key index to efficiently seek to matching rows 
where there are
              :   // predicates on compound key columns that do not necessarily 
include the
              :   // leading column of the primary key. At the time of writing, 
only a single
              :   // equality predicate is supported, although the algorithm 
can support ranges
              :   // of values.
              :   //
              :   // This method should be invoked during the PrepareBatch() 
phase of the row
              :   // iterator lifecycle.
              :   //
              :   // Caveat:
              :   // This method assumes exclusive access to key_iter->cur_val_.
              :   //
              :   // The in-out parameter 'remaining' refers to the number of 
rows remaining to
              :   // scan. When this method is invoked, 'remaining' should 
contain the maximum
              :   // number of remaining rows available to scan. Once this 
method returns,
              :   // 'remaining' will contain the number of rows to scan to 
consume the
              :   // available matching rows according to the equality 
predicate. Note:
              :   // 'remaining' will always be at least 1, although it is a 
TODO to allow it
              :   // to be 0 (0 violates CHECK conditions elsewhere in the scan 
code).
              :   //
              :   // Currently, skip scan will be dynamically disabled when the 
number of seeks
              :   // for unique prefix keys exceeds sqrt(#total rows). We use 
sqrt(#total rows)
              :   // as a cutoff because based on performance tests on upto 10M 
rows per tablet,
              :   // the scan time for skip scan is the same as that of the 
current flow until
              :   // #seeks = sqrt(#total_rows). Further increase in #seeks 
leads to a drop in
              :   // skip scan performance wrt the current flow. This cutoff 
value is stored in
              :   // 'skip_scan_num_seeks_cutoff'.
              :
              :   // Preconditions upon entering this method:
              :   // * key_iter_ is not NULL.
              :   //
              :   // Postconditions upon exiting this method:
              :   // 1. cur_idx_ is updated to the row_id of the next 
row(containing the next
              :   //    higher unique prefix) to scan.
              :   // 2. 'remaining' stores the number the entries to be scanned 
in the current scan range.
              :   //
              :   // See the .cc file for details on the approach and the 
implementation.
              :   Status SkipToNextScan(size_t *remaining);
              : 
These could all be private, right?


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@299
PS14, Line 299: key(PK)
nit: add a space


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@301
PS14, Line 301: predicate_column"
nit: remove underscore


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@362
PS14, Line 362: int64_t skip_scan_num_seeks_cutoff;
              :
nit: follow private members with an underscore


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551
PS12, Line 551:     "Failed to decode current value from primary key index");
              :   return Status::OK();
              : }
> Noted. I will have to look more into this.
Might be helpful: from util/memory/arena.h and the definition of Reset():

 // Unless allocations exceed max_buffer_size, repetitive filling up and
 // resetting normally lead to quickly settling memory footprint and ceasing
 // buffer allocations, as the arena keeps reusing a single, large buffer.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@562
PS14, Line 562:   RETURN_NOT_OK(EncodedKey::IncrementEncodedKeyColumns(
              :       base_data_->tablet_schema(),
              :       &enc_key, &arena, num_prefix_cols));
              :
              :   if (cache_seeked_value) {
              :     // Set the predicate column to the predicate value in case 
we can find a
              :     // predicate match in one search. As a side effect, 
GetKeyWithPredicateVal()
              :
Please document in the .h that if `cache_seeked_value` is true, this will 
search for a different value than if it were not. Is that the intended API?


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@593
PS14, Line 593:       /* cache_seeked_value= */ true,
              :       /* exact_match= */ nullptr);
Please sprinkle a couple TODOs about how to generalize this patch. E.g.:

TODO(anupama): to support in-range predicates, generalize this to return a key 
with the same prefix as cur_enc_key, a predicate column populated with the 
lower bound of the predicate values, and the minimum value for all other 
columns.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@680
PS14, Line 680:     !lower_bound_key_found && loop_num < 
FLAGS_skip_scan_short_circuit_loops;
              :        loop_num++) {
              :     DCHECK_LT(cur_idx_, skip_scan_upper_bound_idx_);
nit: since the steps are described in this method, we can probably get away 
with removing "in three seek approach", here and below. Also probably don't 
need the extra lines of "/////////////..."


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770
PS14, Line 770: reak;
How does this happen? Could you elaborate a bit? What does seeking backwards 
mean?



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 15 Aug 2018 06:04:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

Reply via email to