Hongjiang Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17273 )

Change subject: KUDU-3223: Quota management for table write
......................................................................


Patch Set 23:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc
File src/kudu/integration-tests/write_quota-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@216
PS21, Line 216:
> Would it speed this test up to set this to a low value?
I tried this, but most of the cases failed because the smallest value of 
FLAGS_flush_threshold_secs is 1 second, so I cannot change this.


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@257
PS21, Line 257:
> nit: in my opinion, it'd be easier to read this with the more traditional
No, this one is a little tricky. This case targets the disk size limit, so I 
enlarged the row count limit. The loop breaks early because the disk size 
reaches its limit before the row count does:

    for (k = 0; k < row_count_limit_for_disk_size_test; k++) {
      ASSERT_OK(InsertKeyToTable(client_table_.get(), good_session.get(), k));
      s = good_session->Flush();
      if (!s.ok()) {
        // break the loop once the write is blocked
        break;
      }
      WaitForTServerUpdatesStatisticsToMaster(1000);
    }


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@270
PS21, Line 270: 1000
> Rather than waiting and hoping for certain invariants to become true, it ma
I found this breaks the test case. I guess the reason is that the in-memory 
data was not flushed to disk, and the disk size is very sensitive in this test 
case.


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@402
PS21, Line 402:  client
> nit: use kCamelCase for const variables, per the Google style guide. Same e
Done


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@407
PS21, Line 407:   NO_FATALS(ModifyLimit(2L *
> Should we also test the size limit?
The size limit test case is very difficult to design and test. Every time I 
change the size limit, I have to tune FLAGS_flush_threshold_mb or 
FLAGS_flush_threshold_secs and change the disk size limit. I'll try, but I'm 
afraid this may cause random failures.


http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc@339
PS16, Line 339:             "the write may be forbidden.");
> nit: while this is under development, let's tag these as experimental as we
Sure.


http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc@334
PS16, Line 334: TAG_FLAG(enable_per_range_hash_schemas, unsafe);
              :
              : DEFINE_bool(enable_table_write_limit, false,
              :             "Enable the table write limit. "
              :             "When the table's size or row count is approaching the limit, "
              :             "the write may be forbidden.");
              : TAG_FLAG(enable_table_write_limit, experimental);
              :
              : DEFINE_int64(table_disk_size_limit,
              :              std::numeric_limits<int64_t>::max(),
              :              "Set the target size of a table to write");
              :
> I was discussing this feature with Grant, and re-reading the Jira, a few qu
I have investigated some databases.

A quick summary:
1. Database-level quota is the mainstream, e.g. MySQL (1) and PostgreSQL (2).
2. Schema- or role-scoped quota: Greenplum (3).
3. Cloud databases do not have disk quotas; instead, they have service-level 
quotas, like Snowflake (4).
4. Cloudera HBase supports both table-level and namespace-level quotas (5).

Kudu does not have a database or namespace concept; a database can only be 
supported when Kudu is integrated with HMS. In addition, I think you are also 
considering how Kudu's quota management aligns with Impala, which relies on 
HDFS. I think it should be database level because HDFS can easily support that.

I agree we should support database-level quota management to align with other 
mainstream databases. However, a table-level limit is also needed for 
streaming scenarios.

So, it looks like both database-level and table-level quotas should be 
supported.

For this feature, it looks like we need some subtasks to implement it step by 
step.

Reference:
1. https://www.cyberciti.biz/tips/linux-unix-setting-up-mysql-database-quotas.html
2. https://www.postgresql.org/message-id/CAB0yre=rp_ho6bq4cv23elkxrcfhv2yqrb1zhp0rfupewcn...@mail.gmail.com
3. https://greenplum.docs.pivotal.io/6-0/ref_guide/modules/diskquota.html
4. https://docs.snowflake.com/en/user-guide/resource-monitors.html
5. https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/hbase-data-access/content/hbase-quota-management.html


http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc@2078
PS16, Line 2078:
               :   return table;
> Why not just not set these values in the protobuf message?
Sure.


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@2754
PS21, Line 2754:                                       l.data().owner(),
               :                                       schema);
               :
> Did you consider baking this into AlterTable() directly? As it stands, we'l
Oh, yes, let me try.


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@3426
PS21, Line 3426:
> nit: it helps a little with readability to write this as:
Done


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@3443
PS21, Line 3443: _lock_.AssertAcquiredForReading();
> I've asked this elsewhere, but could we instead use table_lock.data().pb.ha
Hmm, I think both has_table_disk_size_limit() and the comparison with -1 are 
necessary, because -1 is used to disable the disk size limit:

table_lock.data().pb.has_table_disk_size_limit() &&
table_disk_size_limit != TableInfo::TABLE_WRITE_DEFAULT_LIMIT &&


http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@3462
PS21, Line 3462:
> nit: this will always be "forbidden"
Done


http://gerrit.cloudera.org:8080/#/c/17273/7/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17273/7/src/kudu/master/master.proto@209
PS7, Line 209:
> Seems ok to use -1 for the in-memory values, but for serialization, why -1
Good question. I'm also considering backward compatibility: if a legacy table 
does not have this field, how can I handle it? Do you think this is a 
potential problem?


http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

PS16:
> nit: We typically like to make sure each patch is self-contained and focuse
I mixed them together because my integration test (row count limit and disk 
size limit) needs to dynamically change the quota.



--
To view, visit http://gerrit.cloudera.org:8080/17273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2dbf365ad59f17c0a4e2e7ea6a5afaa7680724b0
Gerrit-Change-Number: 17273
Gerrit-PatchSet: 23
Gerrit-Owner: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 20 Apr 2021 09:27:45 +0000
Gerrit-HasComments: Yes
