Dan Burkert has posted comments on this change. Change subject: KUDU-1165: Implement partition pruning for C++ client and server ......................................................................
Patch Set 20: (28 comments) http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/common/partition_pruner-test.cc File src/kudu/common/partition_pruner-test.cc: Line 34: #include "kudu/util/auto_release_pool.h" > I don't see this used. Could you double check the other includes too? Done http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/common/partition_pruner.cc File src/kudu/common/partition_pruner.cc: Line 77: // Don't bother if there are no lower/upper PK bounds > The code "doesn't bother" if both the lower and upper bounds are missing. B Done Line 135: Arena arena(max<size_t>(Arena::kMinimumChunkSize, schema.key_byte_size()), 4096); > Would be good to add a comment explaining the choices in this constructor. Done Line 165: // DISTRIBUTE BY RANGE (c) > As we discussed, DISTRIBUTE BY RANGE without any split rows has no effect, I think I was unclear in our discussion. Having range columns, regardless of whether there are splits or not, does effect the partition key. If there are not splits, then the range portion of the partition key does not have any effect on how rows are mapped to tablets. In this algorithm we are interested in the range portion, but don't have any information about whether the table has split keys or not. So I don't think the range should be removed (since it makes the example into somewhat of a special case). Perhaps I should add split rows? It could easily be argued that would be adding extraneous information to the example. Line 173: // | a = 0 | [(bucket=0, bucket=2, c=0), (bucket=0, bucket=2, c=1)) | > Nit: where is the end bracket for each of these examples? The opening bracket matches up with a closing paren to indicate a closed, open range. Line 182: // | | [(bucket=0, bucket=2, c=0), (bucket=1, bucket=2, c=1)) | > I don't understand this example. yes, this is a bug in the example. What you expected is correct. Another way of looking at this particular case is that it's 1/2 the partitions of the c = 0 example towards the bottom (that one doesn't have the same bug). Line 202: // components and a range component, then then a few patterns emerge from the > Nit: double then Done Line 203: // examples above: > Nit: terminology-wise, perhaps it'd be clearer if you used the word "column In this example and in the design doc component is not the same as column. A component is either a hash component, of which there can be 0 or more, or a range component, of which there is always one. Hash components are made up of one or more columns, while the range component is made up of 0 or more columns. Line 213: // it an exclusive key. > This seems to be somewhat true if the final constrained component is in a r That's true. The reason I didn't point this out is that the transformation of the predicate c = 0 to the range bounds of [(0), (1)) happens during key encoding, which is a little bit out of scope of this example. Line 217: // before a final constrained component. > You should also mention the "base case": if there are no unconstrained hash Done Line 221: std::string range_upper_bound; > Nit: don't need std:: prefix. Done Line 224: if (AreRangeColumnsPrefixOfPrimaryKey(schema, range_columns)) { > For sanity checking, should we first verify that schema and partition_schem I've added a check to AreRangeColumnsPrefixOfPrimaryKey. Line 271: // The index of the final constrained hash component in the partition key. > Okay, it's roughly from here (well, from the std::distance call) that my ey OK I've expanded the numbering, added more comments, and improved some variable naming. Hopefully this helps. Line 283: // Create the set of ranges. > Nit: stopped using step numbers? Done Line 323: get<0>(range).append(range_lower_bound); > Can range_lower_bound/range_upper_bound be empty here? I guess append() wil correct, I've changed the comment to that effect. Line 327: // Remove all ranges past the scan spec's upper bound partition key key. > Nit: key key Done Line 330: if (!get<1>(*range).empty() && > Why do we check for empty() on the end of the range but not on the beginnin correct, the empty string in the upper bound corresponds to unbounded, but it doesn't sort that way naturally. Line 367: for (auto range = partition_key_ranges_.rbegin(); > Looks similar to the "remove all ranges past..." logic above, but is just d Yes. Crucially, the 'remove all ranges' routine above operates on the partition key ranges when they are in ascending sort order. One of the invariants of this class is that the key ranges are kept in descending sort order, so I didn't want to split out that logic (which isn't used anywhere else, anyways). This method is used elsewhere, and depends on the normal descending sort order. Line 391: partition.partition_key_end() <= get<0>(*range)); > Nit: indent by one more character to show that it's part of the parens. Done http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/common/partition_pruner.h File src/kudu/common/partition_pruner.h: Line 18: #pragma once > So this is what we're doing going forward, instead of the older style of in I've been using it since the c++11 upgrade. It's supposed to help compile times, and it's supported by every c++11 compiler on platforms we support. Line 38: class PartitionPruner { > May be useful to add in the class comment that partition keys are represent Done Line 41: PartitionPruner() = default; > Nit: indentation is a little off. Done Line 44: // already be optimized. > It isn't clear what "optimized" means in this context. Done Line 50: bool HasMorePartitions() const; > Nit: as a general comment, this domain is fraught with terminology (as you Done Line 55: // Removes partition key ranges through the provided exclusive upper bound. > Nit: Removes all partition key ranges... Done Line 60: // Used for testing. > Only for testing? If so, add ForTests() to the method name. Done Line 63: std::string ToString(const Schema& schema, const PartitionSchema& partition_schema) const; > Normally ToString() is a no-arg method. Given that Init() is provided the s that would push the complexity into a restriction that the partition pruner can't outlive the schema and partition schema, not sure that's worth it. In the past having the no-arg ToString has resulted in things like ScanSpec having 'ToString' and 'ToStringWithSchema'. 'ToString' was totally worthless because it output encoded forms of bounds instead of the proper, easily readable forms. I prefer only one way to do it, with nice printing. These are only ever used for logging and debugging, anyway. http://gerrit.cloudera.org:8080/#/c/2413/20/src/kudu/util/memory/arena.h File src/kudu/util/memory/arena.h: Line 270: static const size_t kMinimumChunkSize; > Add a comment here? Done -- To view, visit http://gerrit.cloudera.org:8080/2413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f4fc14b44af63a17b34cd9c2a9134127b1afe4d Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
