Hongjiang Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17273 )
Change subject: KUDU-3223: Quota management for table write
......................................................................

Patch Set 23:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc
File src/kudu/integration-tests/write_quota-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@216
PS21, Line 216:
> Would it speed this test up to set this to a low value?
I tried this, but most of the cases failed because the smallest usable value of FLAGS_flush_threshold_secs is 1 second, so I cannot lower it any further.

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@257
PS21, Line 257:
> nit: in my opinion, it'd be easier to read this with the more traditional
No, this part is a little tricky. This case targets the disk size limit, so I enlarged the row count limit. The loop below breaks early because the disk size reaches its limit before the row count does:

  for (k = 0; k < row_count_limit_for_disk_size_test; k++) {
    ASSERT_OK(InsertKeyToTable(client_table_.get(), good_session.get(), k));
    s = good_session->Flush();
    if (!s.ok()) {
      // Break out of the loop once the write is blocked.
      break;
    }
    WaitForTServerUpdatesStatisticsToMaster(1000);
  }

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@270
PS21, Line 270: 1000
> Rather than waiting and hoping for certain invariants to become true, it ma
I found this broke the test case. My guess is that the in-memory data had not yet been flushed to disk, and the disk size is very sensitive in this test case.

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@402
PS21, Line 402: client
> nit: use kCamelCase for const variables, per the Google style guide.
> Same e
Done

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/integration-tests/write_quota-itest.cc@407
PS21, Line 407: NO_FATALS(ModifyLimit(2L *
> Should we also test the size limit?
A size limit test case is very difficult to design and test. Every time I change the size limit, I have to tune FLAGS_flush_threshold_mb or FLAGS_flush_threshold_secs and adjust the disk size limit as well. I'll try, but I'm afraid this may cause random failures.

http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc@339
PS16, Line 339: "the write may be forbidden.");
> nit: while this is under development, let's tag these as experimental as we
Sure.

http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc@334
PS16, Line 334: TAG_FLAG(enable_per_range_hash_schemas, unsafe);
             :
             : DEFINE_bool(enable_table_write_limit, false,
             :             "Enable the table write limit. "
             :             "When the table's size or row count is approaching the limit, "
             :             "the write may be forbidden.");
             : TAG_FLAG(enable_table_write_limit, experimental);
             :
             : DEFINE_int64(table_disk_size_limit,
             :              std::numeric_limits<int64_t>::max(),
             :              "Set the target size of a table to write");
> I was discussing this feature with Grant, and re-reading the Jira, a few qu
I have investigated several databases. A quick summary:
1. Database-level quota is the mainstream approach, e.g. MySQL (1) and PostgreSQL (2).
2. Schema- or role-scoped quota: Greenplum (3).
3. Cloud databases do not have disk quotas; instead, they have service-level quotas, like Snowflake (4).
4. Cloudera HBase supports both table-level and namespace-level quotas (5).
Kudu does not have a database or namespace concept; only when integrated with HMS can databases be supported. In addition, I think you are also considering how Kudu's quota management aligns with Impala, which relies on HDFS.
I think it should be database level, because HDFS can easily support that. I agree that database-level quota management would align Kudu with other mainstream databases. However, a table-level limit is also needed for streaming scenarios, so it looks like both database-level and table-level quotas should be supported. For this feature, it looks like we need some subtasks to support it step by step.

References:
1. https://www.cyberciti.biz/tips/linux-unix-setting-up-mysql-database-quotas.html
2. https://www.postgresql.org/message-id/CAB0yre=rp_ho6bq4cv23elkxrcfhv2yqrb1zhp0rfupewcn...@mail.gmail.com
3. https://greenplum.docs.pivotal.io/6-0/ref_guide/modules/diskquota.html
4. https://docs.snowflake.com/en/user-guide/resource-monitors.html
5. https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/hbase-data-access/content/hbase-quota-management.html

http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/master/catalog_manager.cc@2078
PS16, Line 2078:
             : return table;
> Why not just not set these values in the protobuf message?
Sure.

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@2754
PS21, Line 2754: l.data().owner(),
             : schema);
> Did you consider baking this into AlterTable() directly? As it stands, we'l
Oh, yes, let me try.

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@3426
PS21, Line 3426:
> nit: it helps a little with readability to write this as:
Done

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@3443
PS21, Line 3443: _lock_.AssertAcquiredForReading();
> I've asked this elsewhere, but could we instead use table_lock.data().pb.ha
Hmm, I think both has_table_disk_size_limit() and the comparison with -1 are necessary, because -1 is used to disable the disk size limit.
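To make the two-part check concrete, here is a self-contained sketch; FakeTablePB and kTableWriteDefaultLimit are hypothetical stand-ins for the generated protobuf message and TableInfo::TABLE_WRITE_DEFAULT_LIMIT (assumed here to be -1), not the real Kudu types:

```cpp
#include <cstdint>

// Hypothetical stand-in for the generated protobuf message; the real code
// reads table_lock.data().pb via has_table_disk_size_limit() /
// table_disk_size_limit().
struct FakeTablePB {
  bool has_limit;
  int64_t limit;
  bool has_table_disk_size_limit() const { return has_limit; }
  int64_t table_disk_size_limit() const { return limit; }
};

// Assumed value of TableInfo::TABLE_WRITE_DEFAULT_LIMIT.
constexpr int64_t kTableWriteDefaultLimit = -1;

// The limit is enforced only when the field is present AND is not the -1
// "disabled" sentinel. Dropping either check changes behavior: for legacy
// tables the field is absent, while -1 means the limit was explicitly
// disabled.
bool DiskSizeLimitEnabled(const FakeTablePB& pb) {
  return pb.has_table_disk_size_limit() &&
         pb.table_disk_size_limit() != kTableWriteDefaultLimit;
}
```

In other words, the condition in the patch stays: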
  table_lock.data().pb.has_table_disk_size_limit() &&
  table_disk_size_limit != TableInfo::TABLE_WRITE_DEFAULT_LIMIT &&

http://gerrit.cloudera.org:8080/#/c/17273/21/src/kudu/master/catalog_manager.cc@3462
PS21, Line 3462:
> nit: this will always be "forbidden"
Done

http://gerrit.cloudera.org:8080/#/c/17273/7/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/17273/7/src/kudu/master/master.proto@209
PS7, Line 209:
> Seems ok to use -1 for the in-memory values, but for serialization, why -1
Good question. I'm also considering backward compatibility: a legacy table will not have this field, so how should I handle that? Do you think this is a potential problem?

http://gerrit.cloudera.org:8080/#/c/17273/16/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

PS16:
> nit: We typically like to make sure each patch is self-contained and focuse
I mixed them together because my integration tests (row count limit and disk size limit) need to change the quota dynamically.

--
To view, visit http://gerrit.cloudera.org:8080/17273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2dbf365ad59f17c0a4e2e7ea6a5afaa7680724b0
Gerrit-Change-Number: 17273
Gerrit-PatchSet: 23
Gerrit-Owner: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 20 Apr 2021 09:27:45 +0000
Gerrit-HasComments: Yes
