[kudu-CR] [tools] Add test to generate heavy rowset compaction

Alexey Serbin (Code Review) Mon, 27 Mar 2023 22:37:49 -0700

Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19278 )


Change subject: [tools] Add test to generate heavy rowset compaction
......................................................................


Patch Set 10:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/19278/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19278/10//COMMIT_MSG@13
PS10, Line 13: TestLoadgenCompactRowSetsOpHighMemUsage
It would be great to specify the number of iterations at various stages as well.

4 GBytes looks like a non-so-great number, so it's interesting to know what 
were the iteration numbers.


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc
File src/kudu/benchmarks/compaction-highmem.cc:

http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@18
PS10, Line 18: #include <cstdio> // IWYU pragma: keep
             :
             : #include <algorithm> // IWYU pragma: keep
             : #include <iterator> // IWYU pragma: keep
             : #include <memory> // IWYU pragma: keep
             : #include <string> // IWYU pragma: keep
             : #include <type_traits> // IWYU pragma: keep
             : #include <vector> // IWYU pragma: keep
             :
             : #include <gflags/gflags.h> // IWYU pragma: keep
             : #include <gtest/gtest.h> // IWYU pragma: keep
nit: the 'keep' pragma is needed only when IWYU for some reason recommends 
removing the include, and doing so breaks compilation.  So, consider removing 
these pragmas and see what's the result.


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@41
PS10, Line 41: reps
Consider renaming 'reps' into 'iterations'.


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@43
PS10, Line 43: DEFINE_int32
Could there be negative values for these flags?  If not, maybe consider using 
DEFINE_uint32 instead of DEFINE_int32?


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@89
PS10, Line 89:                
nit: fix the indent?


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@91
PS10, Line 91:                           string* tool_stdout,
             :                           string* tool_stderr,
style nit: out and in-out parameters should come last


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@105
PS10, Line 105:         ColumnSchema("bool_val", BOOL),
              :         ColumnSchema("int8_val", INT8),
              :         ColumnSchema("int16_val", INT16),
              :         ColumnSchema("int32_val", INT32),
              :         ColumnSchema("int64_val", INT64),
              :         ColumnSchema("float_val", FLOAT),
              :         ColumnSchema("double_val", DOUBLE),
              :         ColumnSchema("decimal32_val", DECIMAL32, false, false, 
false,
              :                      nullptr, nullptr, 
ColumnStorageAttributes(),
              :                      ColumnTypeAttributes(9, 9)),
              :         ColumnSchema("decimal64_val", DECIMAL64, false, false, 
false,
              :                      nullptr, nullptr, 
ColumnStorageAttributes(),
              :                      ColumnTypeAttributes(18, 2)),
              :         ColumnSchema("decimal128_val", DECIMAL128, false, 
false, false,
              :                      nullptr, nullptr, 
ColumnStorageAttributes(),
              :                      ColumnTypeAttributes(38, 0)),
              :         ColumnSchema("unixtime_micros_val", UNIXTIME_MICROS),
              :         ColumnSchema("string_val", STRING),
              :         ColumnSchema("binary_val", BINARY),
Is it crucial to have all these different types of columns?  IIRC, the OOM 
happened in case of system tablet, where it has just three columns: INT8, INT8, 
STRING, where the primary column is comprised for first two INT8 columns.


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@133
PS10, Line 133:  .add_hash_partitions({kKeyColumnName}, 2)
Would having just a single partition be a more straight forward case?  
Otherwise, why not to add tens of partitions if we are shooting for an OOM 
condition?


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@166
PS10, Line 166:        "--table_num_replicas=1",
              :         "--table_num_range_partitions=2",
              :         "--table_num_hash_partitions=1",
Here and elsewhere in the code: IIUC, these parameters are irrelevant and can 
be omitted since the 'perf loadgen' tool uses an already existing table.


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@175
PS10, Line 175: CompactRowSetsOpTable
Consider creating a constant and using it elsewhere in the code below?


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@195
PS10, Line 195:   for (int reps = 0; reps < FLAGS_stage_2_reps; reps++) {
              :     const vector<string> args = {
              :       "perf",
              :       "loadgen",
              :       cluster_->master()->bound_rpc_addr().ToString(),
              :       "--table_name=CompactRowSetsOpTable",
              :       "--table_num_replicas=1",
              :       "--table_num_range_partitions=2",
              :       "--table_num_hash_partitions=1",
              :       "--string_len=8192",
              :       "--use_random_pk",
              :       "--use_upsert",
              :       "--num_threads=2",
              :       "--num_rows_per_thread=100000",
              :     };
              :     ASSERT_OK(RunKuduTool(args));
              :   }
              :
              :   for (int reps = 0; reps < FLAGS_stage_3_reps; reps++) {
              :     const vector<string> args = {
              :       "perf",
              :       "loadgen",
              :       cluster_->master()->bound_rpc_addr().ToString(),
              :       "--table_name=CompactRowSetsOpTable",
              :       "--table_num_replicas=1",
              :       "--table_num_range_partitions=2",
              :       "--table_num_hash_partitions=1",
              :       "--string_len=8192",
              :       "--use_random_pk",
              :       "--use_upsert",
              :       "--num_threads=2",
              :       "--num_rows_per_thread=1000",
              :     };
              :     ASSERT_OK(RunKuduTool(args));
              :   }
              :
              :   for (int reps = 0; reps < FLAGS_stage_4_reps; reps++) {
              :     const vector<string> args = {
              :       "perf",
              :       "loadgen",
              :       cluster_->master()->bound_rpc_addr().ToString(),
              :       "--table_name=CompactRowSetsOpTable",
              :       "--table_num_replicas=1",
              :       "--table_num_range_partitions=2",
              :       "--table_num_hash_partitions=1",
              :       "--string_len=8192",
              :       "--use_random_pk",
              :       "--use_upsert",
              :       "--num_threads=2",
              :       "--num_rows_per_thread=10000",
              :     };
              :     ASSERT_OK(RunKuduTool(args));
              :   }
              : }
Could these be combined?  It seems they use the same pattern, the only 
difference is the number operations per thread, but they use different (but 
repeating) seed for the random generator.  I'd expect there might be a good 
chance to have the desired effect with many iterations of the very last cycle, 
but the first two might be omitted.


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/benchmarks/compaction-highmem.cc@297
PS10, Line 297:   for (int reps = 0; reps < FLAGS_stage_1_reps; reps++) {
Does it make sense to separate the common part with the iterations of various 
load patterns to be common between these the 
TestLoadgenCompactRowSetsOpHighMemUsage and 
TestLoadgenCompactRowSetsOpOptimalMemUsage  scenarios?


http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/scripts/benchmarks.sh
File src/kudu/scripts/benchmarks.sh:

http://gerrit.cloudera.org:8080/#/c/19278/10/src/kudu/scripts/benchmarks.sh@241
PS10, Line 241:   $PERF_STAT ./build/latest/bin/compaction-highmem \
              :     --gtest_filter=\*TestLoadgenCompactRowSets\* \
              :     --stage_1_reps=50 \
              :     --stage_2_reps=50 \
              :     --stage_3_reps=1000 \
              :     --stage_4_reps=100 \
              :     &> $LOGDIR/${COMPACT_HIGHMEM_TEST}.log
If you are about to add this piece, you also need to take care of other parts 
to collect and record the stats into the database.  As a reference, track other 
scenarios like MT_TABLET_TEST in this script.



--
To view, visit http://gerrit.cloudera.org:8080/19278
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief4ec03f1bf0f7936f8fb054948f87e71333f824
Gerrit-Change-Number: 19278
Gerrit-PatchSet: 10
Gerrit-Owner: Ashwani Raina <ara...@cloudera.com>
Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ara...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mre...@cloudera.com>
Gerrit-Reviewer: Zoltan Chovan <zcho...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Mar 2023 05:33:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] Add test to generate heavy rowset compaction

Reply via email to