[kudu-CR] [master] Range specific hashing at table creation time.

Mahesh Reddy (Code Review) Wed, 16 Dec 2020 16:00:59 -0800

Mahesh Reddy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16859 )


Change subject: [master] Range specific hashing at table creation time.
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/common/partition-test.cc
File src/kudu/common/partition-test.cc:

http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/common/partition-test.cc@1157
PS4, Line 1157:   struct PartitionData {
> nit: would it be possible to write this in a similar style to TableLocation
Should I refactor other test cases in this test file to include this vector of 
structs?

Also, while going through previous partition tests I came across 
'PartitionDebugString()' which prints its hash schemas and its range. For 
example,

"HASH (a) PARTITION 0, HASH (b) PARTITION 0, "
 RANGE (a, b, c) PARTITION ("a1", "b1", "c1") <= VALUES < ("a2", "b2", "")".

It got me thinking about the fields of the Partition class, since in the 
PartitionDebugString() method it prints the hash bucket schemas for each 
partition by accessing the 'hash_bucket_schemas_' field of the PartitionSchema 
class. Currently, a Partition object has fields for its hash buckets and both 
the start and end of its partition key. With this new feature and the current 
design, there is no way to access the entire hash schema(s) per a range. 
Basically, the missing parts are the columns that are hashed on and the hash 
seed.

Technically, the partition key holds all the relevant information about the 
hash and range schema for a single partition. I'll look into this some more, 
but do you think it's worth adding a field to the Partition class similar to 
'hash_bucket_schemas_' of the PartitionSchema class?


http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/common/partition.cc@417
PS3, Line 417:
             :   if (!split_rows.empty()) {
             :     for (const auto& hash_schemas : range_hash_schemas) {
             :       if (!hash_schemas.empty()) {
             :         return Status::InvalidArgument("Both 'split_rows' and 
'range_hash_schemas' cannot be "
             :                                        "populated at the same 
time.");
             :       }
             :     }
             :   }
             :
             :   if (!range_hash_schemas.empty() && range_bounds.size() != 
range_hash_schemas.size()) {
             :     return Status::InvalidArgument("The number of range bounds 
does not match the number of user "
             :                                    "supplied hash schemas.");
             :   }
> If there are unexercised code paths, more coverage couldn't hurt. That said
added another test for this case in table_locations-itest


http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/integration-tests/table_locations-itest.cc
File src/kudu/integration-tests/table_locations-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/integration-tests/table_locations-itest.cc@482
PS4, Line 482:   G
> nit: there doesn't seem to be a point to this extra scope, since we're only
Done


http://gerrit.cloudera.org:8080/#/c/16859/4/src/kudu/integration-tests/table_locations-itest.cc@517
PS4, Line 517: t i = 0; i < resp.tablet_locations_size(); i++) {
             :     EXPECT_EQ(partition_key_starts[i],
             :
> nit: Isn't this the default comparator for strings? Can we get by without i
Done



--
To view, visit http://gerrit.cloudera.org:8080/16859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d
Gerrit-Change-Number: 16859
Gerrit-PatchSet: 5
Gerrit-Owner: Mahesh Reddy <mre...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mre...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Dec 2020 00:00:33 +0000
Gerrit-HasComments: Yes

[kudu-CR] [master] Range specific hashing at table creation time.

Reply via email to