Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
......................................................................


Patch Set 12:

(30 comments)

I recommend to run auto-format the C++ codes with clang-format. This is how I 
do it in linux from $IMPALA_HOME:

git diff -U0 --no-color HEAD~1 | clang-format-diff -p1 -i

Some auto-format, however, should be avoided, such as changing import order in 
java code (to preserve git history).

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/benchmarks/expr-benchmark.cc
File be/src/benchmarks/expr-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/benchmarks/expr-benchmark.cc@591
PS12, Line 591: //                prettyprint_duration               5.65      
5.7     5.75    0.0441X    0.0439X    0.0441X
nit: should update all numbers in StringFn and StringFnCodegen table, not just 
prettyprint_duration and prettyprint_memory.
I think in your benchmark run, the baseline (length function) must have lower 
iters/ms number than the old number.


http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/expr-test.cc@4350
PS9, Line 4350: TestStringValue("prettyprint_duration(-1)", "-1000.000ns");
> done
Done


http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20770/9/be/src/exprs/string-functions-ir.cc@1811
PS9, Line 1811: const string& fmt_str = PrettyPrinter::Print(val, unit);
> Thanks for bringing that to my attention.  At least initially we will not b
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt
File be/src/service/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt@46
PS12, Line 46: query-state.cc
I recommend to rename this to query-state-record.cc to not confuse with 
be/src/runtime/query-state.cc


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1256
PS12, Line 1256:   std::shared_ptr<bool> completed_queries_ready_;
Is this protected by completed_queries_lock_ as well?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1281
PS12, Line 1281: The count is tracked so that the number of attempts can
               :   /// be limited and failing inserts do not retry indefinitely
Mention constant/flag that represent maximum number of attempts in this doc.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1797
PS12, Line 1797: /// Expanded snapshot of the query including its state along 
with other fields relevant
               : /// to workload management.
               : struct QueryStateExpanded {
Is this also immutable like QueryStateRecord? Please clarify in comment.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc@3192
PS12, Line 3192: this->
nit: redundant "this->" here and below?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc@3433
PS12, Line 3433: }
               :
               : } // namespace impala
nit: unnecessary whitespace.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.h
File be/src/service/internal-server.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.h@96
PS12, Line 96:       ///   `persist_in_db` Boolean indicating if the query data 
should be written to the
             :       ///                   completed queries table after it is 
closed.
nit: Clarify in the doc that `persist_in_db` is optional and default to True.
Same comment for other methods below.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc
File be/src/service/internal-server.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@117
PS12, Line 117: internal_query_id.hi != 0 && internal_query_id.lo != 0
Move this check to uid-util.h


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@121
PS12, Line 121: session_id
Should this went through the same check as internal_query_id before closing?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc
File be/src/service/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc@95
PS12, Line 95: 
beeswax::_QueryState_VALUES_TO_NAMES.find(beeswax_query_state)->second;
nit: other place use this from debug-util.h:
PrintValue(BeeswaxQueryState()).


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@50
PS12, Line 50: DEFINE_string(enable_workload_mgmt, "",
             :     "Specifies if Impala will automatically write completed 
queries in the query log "
             :     "table. An empty or not-specified value will result in 
completed queries not being "
             :     "stored. A value of 'iceberg' will store the query log table 
as an Iceberg table. If "
             :     "this value is specified and then later removed, the query 
log table will remain "
             :     "intact and accessible.");
Since it only support iceberg table now, shall this turn to bool flag?
I think it is better to separate table format vs enablement if it do support 
other table format in the future.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@63
PS12, Line 63: DEFINE_string_hidden(query_log_table_name, "impala_query_log", 
"Specifies the name of "
             :     "the query log table where completed queries will be stored. 
This table will be in "
             :     "the 'sys' database.");
Validate against empty string.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@70
PS12, Line 70: Max value is 14400.
nit: Valid value is between (0, 14400].


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@75
PS12, Line 75: than 14,400.
nit: than or equal to 14400 seconds (24 hours).


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@186
PS12, Line 186: /// Functions used to generate the SQL DML to insert records 
into the completed queries
              : /// table.
              : static std::list<std::tuple<std::string, std::string, 
WMFieldParser>> fields_;
Can the static members and functions of workload-management.cc contained in a 
dedicated class?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@494
PS12, Line 494: /// Builds the list of field parsers. Each field has its own 
parser that defines the field
              : /// name, field SQL type, and function to produce the value 
from a QueryStateExpanded
              : /// object.
              : static void buildFieldParsers()
Is field order matter here? If yes, please add comment to warn against reorder.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@578
PS12, Line 578: didExceedMaxRecords
nit: MaxRecordsExceeded sounds better.
Impala C++ code style also begin function name with capital letter.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@582
PS12, Line 582: generating the generating
nit: generating


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@584
PS12, Line 584: initDB
nit: InitDB.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@598
PS12, Line 598:
              :   auto iter = fields_.cbegin();
              :   DCHECK(iter != fields_.cend());
              :   create_table_sql << std::get<0>(*iter) << " " << 
std::get<1>(*iter);
              :   iter++;
              :
              :   while(iter != fields_.cend()) {
              :     create_table_sql << "," << std::get<0>(*iter) << " " << 
std::get<1>(*iter);
              :     iter++;
              :   }
What happen if content of fileds_ changed in the future?
Should this be versioned to handle evolving schema?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@646
PS12, Line 646:   if (!query_handle.query_driver()->IncludedInQueryLog() ||
              :       query_handle->stmt_type() == TStmtType::SET ||
              :       (query_handle->sql_stmt().length() >= 3 &&
              :       
boost::algorithm::iequals(query_handle->sql_stmt().substr(0,3), "use")) ||
              :       (query_handle->sql_stmt().length() >= 4 &&
              :       
boost::algorithm::iequals(query_handle->sql_stmt().substr(0,4), "show")) ||
              :       !wmEnabled()) {
Add indentations and check wmEnabled() first.

  if (!wmEnabled()
      || !query_handle.query_driver()->IncludedInQueryLog()
      || query_handle->stmt_type() == TStmtType::SET
      || (query_handle->sql_stmt().length() >= 3
          && boost::algorithm::iequals(query_handle->sql_stmt().substr(0, 3), 
"use"))
      || (query_handle->sql_stmt().length() >= 4
          && boost::algorithm::iequals(query_handle->sql_stmt().substr(0, 4), 
"show"))) {

Also, why not mark IncludeInQueryLog(false) for "set", "use", and "show" early 
on?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/network-util.h
File be/src/util/network-util.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/network-util.h@114
PS12, Line 114:     // hostnames were the same, compare on port
              :     return first.port < second.port;
missing comparison over uds_address.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/string-util.h
File be/src/util/string-util.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/string-util.h@99
PS12, Line 99: /// Iterates through a string converting each character to an 
underscore (if the character
             : /// is a space) or to the lower case of the character and writes 
the converted character
             : /// to a stringstream.
Should this handle special char as well? For example, there is an event key 
with parenthesis like this in query profile:
"Authorization finished (ranger)"


http://gerrit.cloudera.org:8080/#/c/20770/12/tests/beeswax/impala_beeswax.py
File tests/beeswax/impala_beeswax.py:

http://gerrit.cloudera.org:8080/#/c/20770/12/tests/beeswax/impala_beeswax.py@215
PS12, Line 215: result.runtime_profile = self.get_runtime_profile(handle)
Why this is moved after close_query?


http://gerrit.cloudera.org:8080/#/c/20770/9/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/9/tests/custom_cluster/test_query_log.py@51
PS9, Line 51: # If que
> Good idea.  I removed sleep statements where I could and switched to metric
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/12/tests/custom_cluster/test_query_log.py@39
PS12, Line 39:
             : class TestQueryLogTable(CustomClusterTestSuite):
Can you craft a custom cluster with 2 coordinator and verify that 
impala_query_log table written by one coordinator is also readable by the other 
coordinator?
I think there is an example of multi-coordinator test in 
tests/custom_cluster/test_executor_groups.py.


http://gerrit.cloudera.org:8080/#/c/20770/12/tests/custom_cluster/test_query_log.py@633
PS12, Line 633:
              :     impalad.service.wait_for_metric_value(
              :         "impala-server.completed_queries.written", 
expected_writes, 30)
              :
              :     # Allow time for Impala to process the insert.
              :     sleep(3)
Why need to sleep? Isn't impala-server.completed_queries.written only 
incremented upon successful insert to impala_query_log table?



--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 12
Gerrit-Owner: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jan 2024 04:31:08 +0000
Gerrit-HasComments: Yes

Reply via email to