Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 )
Change subject: IMPALA-12426: Query History Table ...................................................................... Patch Set 12: (30 comments) I recommend to run auto-format the C++ codes with clang-format. This is how I do it in linux from $IMPALA_HOME: git diff -U0 --no-color HEAD~1 | clang-format-diff -p1 -i Some auto-format, however, should be avoided, such as changing import order in java code (to preserve git history). http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/benchmarks/expr-benchmark.cc File be/src/benchmarks/expr-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/benchmarks/expr-benchmark.cc@591 PS12, Line 591: // prettyprint_duration 5.65 5.7 5.75 0.0441X 0.0439X 0.0441X nit: should update all numbers in StringFn and StringFnCodegen table, not just prettyprint_duration and prettyprint_memory. I think in your benchmark run, the baseline (length function) must have lower iters/ms number than the old number. http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/expr-test.cc@4350 PS9, Line 4350: TestStringValue("prettyprint_duration(-1)", "-1000.000ns"); > done Done http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/string-functions-ir.cc@1811 PS9, Line 1811: const string& fmt_str = PrettyPrinter::Print(val, unit); > Thanks for bringing that to my attention. At least initially we will not b Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt File be/src/service/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt@46 PS12, Line 46: query-state.cc I recommend to rename this to query-state-record.cc to not confuse with be/src/runtime/query-state.cc http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1256 PS12, Line 1256: std::shared_ptr<bool> completed_queries_ready_; Is this protected by completed_queries_lock_ as well? http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1281 PS12, Line 1281: The count is tracked so that the number of attempts can : /// be limited and failing inserts do not retry indefinitely Mention constant/flag that represent maximum number of attempts in this doc. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1797 PS12, Line 1797: /// Expanded snapshot of the query including its state along with other fields relevant : /// to workload management. : struct QueryStateExpanded { Is this also immutable like QueryStateRecord? Please clarify in comment. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc@3192 PS12, Line 3192: this-> nit: redundant "this->" here and below? http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc@3433 PS12, Line 3433: } : : } // namespace impala nit: unnecessary whitespace. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.h File be/src/service/internal-server.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.h@96 PS12, Line 96: /// `persist_in_db` Boolean indicating if the query data should be written to the : /// completed queries table after it is closed. nit: Clarify in the doc that `persist_in_db` is optional and default to True. Same comment for other methods below. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc File be/src/service/internal-server.cc: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@117 PS12, Line 117: internal_query_id.hi != 0 && internal_query_id.lo != 0 Move this check to uid-util.h http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@121 PS12, Line 121: session_id Should this went through the same check as internal_query_id before closing? http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc File be/src/service/query-state.cc: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc@95 PS12, Line 95: beeswax::_QueryState_VALUES_TO_NAMES.find(beeswax_query_state)->second; nit: other place use this from debug-util.h: PrintValue(BeeswaxQueryState()). http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@50 PS12, Line 50: DEFINE_string(enable_workload_mgmt, "", : "Specifies if Impala will automatically write completed queries in the query log " : "table. An empty or not-specified value will result in completed queries not being " : "stored. A value of 'iceberg' will store the query log table as an Iceberg table. If " : "this value is specified and then later removed, the query log table will remain " : "intact and accessible."); Since it only support iceberg table now, shall this turn to bool flag? I think it is better to separate table format vs enablement if it do support other table format in the future. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@63 PS12, Line 63: DEFINE_string_hidden(query_log_table_name, "impala_query_log", "Specifies the name of " : "the query log table where completed queries will be stored. This table will be in " : "the 'sys' database."); Validate against empty string. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@70 PS12, Line 70: Max value is 14400. nit: Valid value is between (0, 14400]. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@75 PS12, Line 75: than 14,400. nit: than or equal to 14400 seconds (24 hours). http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@186 PS12, Line 186: /// Functions used to generate the SQL DML to insert records into the completed queries : /// table. : static std::list<std::tuple<std::string, std::string, WMFieldParser>> fields_; Can the static members and functions of workload-management.cc contained in a dedicated class? http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@494 PS12, Line 494: /// Builds the list of field parsers. Each field has its own parser that defines the field : /// name, field SQL type, and function to produce the value from a QueryStateExpanded : /// object. : static void buildFieldParsers() Is field order matter here? If yes, please add comment to warn against reorder. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@578 PS12, Line 578: didExceedMaxRecords nit: MaxRecordsExceeded sounds better. Impala C++ code style also begin function name with capital letter. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@582 PS12, Line 582: generating the generating nit: generating http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@584 PS12, Line 584: initDB nit: InitDB. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@598 PS12, Line 598: : auto iter = fields_.cbegin(); : DCHECK(iter != fields_.cend()); : create_table_sql << std::get<0>(*iter) << " " << std::get<1>(*iter); : iter++; : : while(iter != fields_.cend()) { : create_table_sql << "," << std::get<0>(*iter) << " " << std::get<1>(*iter); : iter++; : } What happen if content of fileds_ changed in the future? Should this be versioned to handle evolving schema? http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@646 PS12, Line 646: if (!query_handle.query_driver()->IncludedInQueryLog() || : query_handle->stmt_type() == TStmtType::SET || : (query_handle->sql_stmt().length() >= 3 && : boost::algorithm::iequals(query_handle->sql_stmt().substr(0,3), "use")) || : (query_handle->sql_stmt().length() >= 4 && : boost::algorithm::iequals(query_handle->sql_stmt().substr(0,4), "show")) || : !wmEnabled()) { Add indentations and check wmEnabled() first. if (!wmEnabled() || !query_handle.query_driver()->IncludedInQueryLog() || query_handle->stmt_type() == TStmtType::SET || (query_handle->sql_stmt().length() >= 3 && boost::algorithm::iequals(query_handle->sql_stmt().substr(0, 3), "use")) || (query_handle->sql_stmt().length() >= 4 && boost::algorithm::iequals(query_handle->sql_stmt().substr(0, 4), "show"))) { Also, why not mark IncludeInQueryLog(false) for "set", "use", and "show" early on? http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/network-util.h File be/src/util/network-util.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/network-util.h@114 PS12, Line 114: // hostnames were the same, compare on port : return first.port < second.port; missing comparison over uds_address. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/string-util.h File be/src/util/string-util.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/string-util.h@99 PS12, Line 99: /// Iterates through a string converting each character to an underscore (if the character : /// is a space) or to the lower case of the character and writes the converted character : /// to a stringstream. Should this handle special char as well? For example, there is an event key with parenthesis like this in query profile: "Authorization finished (ranger)" http://gerrit.cloudera.org:8080/#/c/20770/12/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/20770/12/tests/beeswax/impala_beeswax.py@215 PS12, Line 215: result.runtime_profile = self.get_runtime_profile(handle) Why this is moved after close_query? http://gerrit.cloudera.org:8080/#/c/20770/9/tests/custom_cluster/test_query_log.py File tests/custom_cluster/test_query_log.py: http://gerrit.cloudera.org:8080/#/c/20770/9/tests/custom_cluster/test_query_log.py@51 PS9, Line 51: # If que > Good idea. I removed sleep statements where I could and switched to metric Done http://gerrit.cloudera.org:8080/#/c/20770/12/tests/custom_cluster/test_query_log.py File tests/custom_cluster/test_query_log.py: http://gerrit.cloudera.org:8080/#/c/20770/12/tests/custom_cluster/test_query_log.py@39 PS12, Line 39: : class TestQueryLogTable(CustomClusterTestSuite): Can you craft a custom cluster with 2 coordinator and verify that impala_query_log table written by one coordinator is also readable by the other coordinator? I think there is an example of multi-coordinator test in tests/custom_cluster/test_executor_groups.py. http://gerrit.cloudera.org:8080/#/c/20770/12/tests/custom_cluster/test_query_log.py@633 PS12, Line 633: : impalad.service.wait_for_metric_value( : "impala-server.completed_queries.written", expected_writes, 30) : : # Allow time for Impala to process the insert. : sleep(3) Why need to sleep? Isn't impala-server.completed_queries.written only incremented upon successful insert to impala_query_log table? -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 12 Gerrit-Owner: Jason Fehr <jf...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Comment-Date: Tue, 23 Jan 2024 04:31:08 +0000 Gerrit-HasComments: Yes