[kudu-CR] KUDU-1165: Implement partition pruning for C++ client and server

Dan Burkert (Code Review) Thu, 24 Mar 2016 14:34:18 -0700

Dan Burkert has posted comments on this change.

Change subject: KUDU-1165: Implement partition pruning for C++ client and server
......................................................................



Patch Set 20:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

Line 34: #include "kudu/util/auto_release_pool.h"
> I don't see this used. Could you double check the other includes too?
Done


http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

Line 77:     // Don't bother if there are no lower/upper PK bounds
> The code "doesn't bother" if both the lower and upper bounds are missing. B
Done


Line 135:   Arena arena(max<size_t>(Arena::kMinimumChunkSize, 
schema.key_byte_size()), 4096);
> Would be good to add a comment explaining the choices in this constructor.
Done


Line 165:   // DISTRIBUTE BY RANGE (c)
> As we discussed, DISTRIBUTE BY RANGE without any split rows has no effect, 
I think I was unclear in our discussion.  Having range columns, regardless of 
whether there are splits or not, does effect the partition key. If there are 
not splits, then the range portion of the partition key does not have any 
effect on how rows are mapped to tablets.  In this algorithm we are interested 
in the range portion, but don't have any information about whether the table 
has split keys or not.  So I don't think the range should be removed (since it 
makes the example into somewhat of a special case).  Perhaps I should add split 
rows?  It could easily be argued that would be adding extraneous information to 
the example.


Line 173:   // | a = 0      | [(bucket=0, bucket=2, c=0), (bucket=0, bucket=2, 
c=1)) |
> Nit: where is the end bracket for each of these examples?
The opening bracket matches up with a closing paren to indicate a closed, open 
range.


Line 182:   // |            | [(bucket=0, bucket=2, c=0), (bucket=1, bucket=2, 
c=1)) |
> I don't understand this example.
yes, this is a bug in the example.  What you expected is correct.  Another way 
of looking at this particular case is that it's 1/2 the partitions of the c = 0 
example towards the bottom (that one doesn't have the same bug).


Line 202:   // components and a range component, then then a few patterns 
emerge from the
> Nit: double then
Done


Line 203:   // examples above:
> Nit: terminology-wise, perhaps it'd be clearer if you used the word "column
In this example and in the design doc component is not the same as column.  A 
component is either a hash component, of which there can be 0 or more, or a 
range component, of which there is always one.  Hash components are made up of 
one or more columns, while the range component is made up of 0 or more columns.


Line 213:   //    it an exclusive key.
> This seems to be somewhat true if the final constrained component is in a r
That's true.  The reason I didn't point this out is that the transformation of 
the predicate c = 0 to the range bounds of [(0), (1)) happens during key 
encoding, which is a little bit out of scope of this example.


Line 217:   //    before a final constrained component.
> You should also mention the "base case": if there are no unconstrained hash
Done


Line 221:   std::string range_upper_bound;
> Nit: don't need std:: prefix.
Done


Line 224:     if (AreRangeColumnsPrefixOfPrimaryKey(schema, range_columns)) {
> For sanity checking, should we first verify that schema and partition_schem
I've added a check to AreRangeColumnsPrefixOfPrimaryKey.


Line 271:   // The index of the final constrained hash component in the 
partition key.
> Okay, it's roughly from here (well, from the std::distance call) that my ey
OK I've expanded the numbering, added more comments, and improved some variable 
naming.  Hopefully this helps.


Line 283:   // Create the set of ranges.
> Nit: stopped using step numbers?
Done


Line 323:     get<0>(range).append(range_lower_bound);
> Can range_lower_bound/range_upper_bound be empty here? I guess append() wil
correct, I've changed the comment to that effect.


Line 327:   // Remove all ranges past the scan spec's upper bound partition key 
key.
> Nit: key key
Done


Line 330:       if (!get<1>(*range).empty() &&
> Why do we check for empty() on the end of the range but not on the beginnin
correct, the empty string in the upper bound corresponds to unbounded, but it 
doesn't sort that way naturally.


Line 367:   for (auto range = partition_key_ranges_.rbegin();
> Looks similar to the "remove all ranges past..." logic above, but is just d
Yes.  Crucially, the 'remove all ranges' routine above operates on the 
partition key ranges when they are in ascending sort order.  One of the 
invariants of this class is that the key ranges are kept in descending sort 
order, so I didn't want to split out that logic (which isn't used anywhere 
else, anyways).  This method is used elsewhere, and depends on the normal 
descending sort order.


Line 391:          partition.partition_key_end() <= get<0>(*range));
> Nit: indent by one more character to show that it's part of the parens.
Done


http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

Line 18: #pragma once
> So this is what we're doing going forward, instead of the older style of in
I've been using it since the c++11 upgrade.  It's supposed to help compile 
times, and it's supported by every c++11 compiler on platforms we support.


Line 38: class PartitionPruner {
> May be useful to add in the class comment that partition keys are represent
Done


Line 41:    PartitionPruner() = default;
> Nit: indentation is a little off.
Done


Line 44:   // already be optimized.
> It isn't clear what "optimized" means in this context.
Done


Line 50:   bool HasMorePartitions() const;
> Nit: as a general comment, this domain is fraught with terminology (as you 
Done


Line 55:   // Removes partition key ranges through the provided exclusive upper 
bound.
> Nit: Removes all partition key ranges...
Done


Line 60:   // Used for testing.
> Only for testing? If so, add ForTests() to the method name.
Done


Line 63:   std::string ToString(const Schema& schema, const PartitionSchema& 
partition_schema) const;
> Normally ToString() is a no-arg method. Given that Init() is provided the s
that would push the complexity into a restriction that the partition pruner 
can't outlive the schema and partition schema, not sure that's worth it.

In the past having the no-arg ToString has resulted in things like ScanSpec 
having 'ToString' and 'ToStringWithSchema'.  'ToString' was totally worthless 
because it output encoded forms of bounds instead of the proper, easily 
readable forms.  I prefer only one way to do it, with nice printing.  These are 
only ever used for logging and debugging, anyway.


http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/util/memory/arena.h
File src/kudu/util/memory/arena.h:

Line 270:   static const size_t kMinimumChunkSize;
> Add a comment here?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4fc14b44af63a17b34cd9c2a9134127b1afe4d
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1165: Implement partition pruning for C++ client and server

Reply via email to