Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19278 )
Change subject: [tools] Add test to generate heavy rowset compaction ...................................................................... Patch Set 10: (13 comments) http://gerrit.cloudera.org:8080/#/c/19278/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19278/10//COMMIT_MSG@13 PS10, Line 13: TestLoadgenCompactRowSetsOpHighMemUsage It would be great to specify the number of iterations at various stages as well. 4 GBytes looks like a non-so-great number, so it's interesting to know what were the iteration numbers. http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc File src/kudu/benchmarks/compaction-highmem.cc: http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@18 PS10, Line 18: #include <cstdio> // IWYU pragma: keep : : #include <algorithm> // IWYU pragma: keep : #include <iterator> // IWYU pragma: keep : #include <memory> // IWYU pragma: keep : #include <string> // IWYU pragma: keep : #include <type_traits> // IWYU pragma: keep : #include <vector> // IWYU pragma: keep : : #include <gflags/gflags.h> // IWYU pragma: keep : #include <gtest/gtest.h> // IWYU pragma: keep nit: the 'keep' pragma is needed only when IWYU for some reason recommends removing the include, and doing so breaks compilation. So, consider removing these pragmas and see what's the result. http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@41 PS10, Line 41: reps Consider renaming 'reps' into 'iterations'. http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@43 PS10, Line 43: DEFINE_int32 Could there be negative values for these flags? If not, maybe consider using DEFINE_uint32 instead of DEFINE_int32? http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@89 PS10, Line 89: nit: fix the indent? http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@91 PS10, Line 91: string* tool_stdout, : string* tool_stderr, style nit: out and in-out parameters should come last http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@105 PS10, Line 105: ColumnSchema("bool_val", BOOL), : ColumnSchema("int8_val", INT8), : ColumnSchema("int16_val", INT16), : ColumnSchema("int32_val", INT32), : ColumnSchema("int64_val", INT64), : ColumnSchema("float_val", FLOAT), : ColumnSchema("double_val", DOUBLE), : ColumnSchema("decimal32_val", DECIMAL32, false, false, false, : nullptr, nullptr, ColumnStorageAttributes(), : ColumnTypeAttributes(9, 9)), : ColumnSchema("decimal64_val", DECIMAL64, false, false, false, : nullptr, nullptr, ColumnStorageAttributes(), : ColumnTypeAttributes(18, 2)), : ColumnSchema("decimal128_val", DECIMAL128, false, false, false, : nullptr, nullptr, ColumnStorageAttributes(), : ColumnTypeAttributes(38, 0)), : ColumnSchema("unixtime_micros_val", UNIXTIME_MICROS), : ColumnSchema("string_val", STRING), : ColumnSchema("binary_val", BINARY), Is it crucial to have all these different types of columns? IIRC, the OOM happened in case of system tablet, where it has just three columns: INT8, INT8, STRING, where the primary column is comprised for first two INT8 columns. http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@133 PS10, Line 133: .add_hash_partitions({kKeyColumnName}, 2) Would having just a single partition be a more straight forward case? Otherwise, why not to add tens of partitions if we are shooting for an OOM condition? http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@166 PS10, Line 166: "--table_num_replicas=1", : "--table_num_range_partitions=2", : "--table_num_hash_partitions=1", Here and elsewhere in the code: IIUC, these parameters are irrelevant and can be omitted since the 'perf loadgen' tool uses an already existing table. http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@175 PS10, Line 175: CompactRowSetsOpTable Consider creating a constant and using it elsewhere in the code below? http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@195 PS10, Line 195: for (int reps = 0; reps < FLAGS_stage_2_reps; reps++) { : const vector<string> args = { : "perf", : "loadgen", : cluster_->master()->bound_rpc_addr().ToString(), : "--table_name=CompactRowSetsOpTable", : "--table_num_replicas=1", : "--table_num_range_partitions=2", : "--table_num_hash_partitions=1", : "--string_len=8192", : "--use_random_pk", : "--use_upsert", : "--num_threads=2", : "--num_rows_per_thread=100000", : }; : ASSERT_OK(RunKuduTool(args)); : } : : for (int reps = 0; reps < FLAGS_stage_3_reps; reps++) { : const vector<string> args = { : "perf", : "loadgen", : cluster_->master()->bound_rpc_addr().ToString(), : "--table_name=CompactRowSetsOpTable", : "--table_num_replicas=1", : "--table_num_range_partitions=2", : "--table_num_hash_partitions=1", : "--string_len=8192", : "--use_random_pk", : "--use_upsert", : "--num_threads=2", : "--num_rows_per_thread=1000", : }; : ASSERT_OK(RunKuduTool(args)); : } : : for (int reps = 0; reps < FLAGS_stage_4_reps; reps++) { : const vector<string> args = { : "perf", : "loadgen", : cluster_->master()->bound_rpc_addr().ToString(), : "--table_name=CompactRowSetsOpTable", : "--table_num_replicas=1", : "--table_num_range_partitions=2", : "--table_num_hash_partitions=1", : "--string_len=8192", : "--use_random_pk", : "--use_upsert", : "--num_threads=2", : "--num_rows_per_thread=10000", : }; : ASSERT_OK(RunKuduTool(args)); : } : } Could these be combined? It seems they use the same pattern, the only difference is the number operations per thread, but they use different (but repeating) seed for the random generator. I'd expect there might be a good chance to have the desired effect with many iterations of the very last cycle, but the first two might be omitted. http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@297 PS10, Line 297: for (int reps = 0; reps < FLAGS_stage_1_reps; reps++) { Does it make sense to separate the common part with the iterations of various load patterns to be common between these the TestLoadgenCompactRowSetsOpHighMemUsage and TestLoadgenCompactRowSetsOpOptimalMemUsage scenarios? http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/scripts/benchmarks.sh File src/kudu/scripts/benchmarks.sh: http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/scripts/benchmarks.sh@241 PS10, Line 241: $PERF_STAT ./build/latest/bin/compaction-highmem \ : --gtest_filter=\*TestLoadgenCompactRowSets\* \ : --stage_1_reps=50 \ : --stage_2_reps=50 \ : --stage_3_reps=1000 \ : --stage_4_reps=100 \ : &> $LOGDIR/${COMPACT_HIGHMEM_TEST}.log If you are about to add this piece, you also need to take care of other parts to collect and record the stats into the database. As a reference, track other scenarios like MT_TABLET_TEST in this script. -- To view, visit http://gerrit.cloudera.org:8080/19278 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief4ec03f1bf0f7936f8fb054948f87e71333f824 Gerrit-Change-Number: 19278 Gerrit-PatchSet: 10 Gerrit-Owner: Ashwani Raina <ara...@cloudera.com> Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: Ashwani Raina <ara...@cloudera.com> Gerrit-Reviewer: Attila Bukor <abu...@apache.org> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <mre...@cloudera.com> Gerrit-Reviewer: Zoltan Chovan <zcho...@cloudera.com> Gerrit-Comment-Date: Tue, 28 Mar 2023 05:33:39 +0000 Gerrit-HasComments: Yes