Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16859 )
Change subject: [master] Range specific hashing at table creation time. ...................................................................... Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/common/partition-test.cc File src/kudu/common/partition-test.cc: http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/common/partition-test.cc@1157 PS4, Line 1157: struct PartitionData { > nit: would it be possible to write this in a similar style to TableLocation Should I refactor other test cases in this test file to include this vector of structs? Also, while going through previous partition tests I came across 'PartitionDebugString()' which prints its hash schemas and its range. For example, "HASH (a) PARTITION 0, HASH (b) PARTITION 0, " RANGE (a, b, c) PARTITION ("a1", "b1", "c1") <= VALUES < ("a2", "b2", "")". It got me thinking about the fields of the Partition class, since in the PartitionDebugString() method it prints the hash bucket schemas for each partition by accessing the 'hash_bucket_schemas_' field of the PartitionSchema class. Currently, a Partition object has fields for its hash buckets and both the start and end of its partition key. With this new feature and the current design, there is no way to access the entire hash schema(s) per a range. Basically, the missing parts are the columns that are hashed on and the hash seed. Technically, the partition key holds all the relevant information about the hash and range schema for a single partition. I'll look into this some more, but do you think it's worth adding a field to the Partition class similar to 'hash_bucket_schemas_' of the PartitionSchema class? http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/common/partition.cc@417 PS3, Line 417: : if (!split_rows.empty()) { : for (const auto& hash_schemas : range_hash_schemas) { : if (!hash_schemas.empty()) { : return Status::InvalidArgument("Both 'split_rows' and 'range_hash_schemas' cannot be " : "populated at the same time."); : } : } : } : : if (!range_hash_schemas.empty() && range_bounds.size() != range_hash_schemas.size()) { : return Status::InvalidArgument("The number of range bounds does not match the number of user " : "supplied hash schemas."); : } > If there are unexercised code paths, more coverage couldn't hurt. That said added another test for this case in table_locations-itest http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/integration-tests/table_locations-itest.cc File src/kudu/integration-tests/table_locations-itest.cc: http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/integration-tests/table_locations-itest.cc@482 PS4, Line 482: G > nit: there doesn't seem to be a point to this extra scope, since we're only Done http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/integration-tests/table_locations-itest.cc@517 PS4, Line 517: t i = 0; i < resp.tablet_locations_size(); i++) { : EXPECT_EQ(partition_key_starts[i], : > nit: Isn't this the default comparator for strings? Can we get by without i Done -- To view, visit http://gerrit.cloudera.org:8080/16859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d Gerrit-Change-Number: 16859 Gerrit-PatchSet: 5 Gerrit-Owner: Mahesh Reddy <mre...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <mre...@cloudera.com> Gerrit-Comment-Date: Thu, 17 Dec 2020 00:00:33 +0000 Gerrit-HasComments: Yes