Hello Thomas Marshall, Fredy Wijaya, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12884

to look at the new patch set (#21).

Change subject: IMPALA-7290: part 2: Add HS2 support to Impala shell
......................................................................

IMPALA-7290: part 2: Add HS2 support to Impala shell

HS2 is the new default and the user-visible differences
are minimal. Beeswax is still supported via --protocol=beeswax
but will be deprecated. Differences are abstracted into
ImpalaClient subclasses.

This support requires Impala-specific extensions to
the HS2 interface, similar to the existing extensions
to Beeswax. Thus the HS2 shell is only
forwards-compatible with newer Impala versions.
I considered trying to gracefully degrade when the
new extensions weren't present, but it didn't seem to be
worth the ongoing testing effort.

Here are the changes required to make it work:
* Switch to TBinaryProtocolAccelerated to avoid perf
  regression. The HS2 protocol requires decoding
  more primitive values (because its not a string-per-row),
  which was slow with the pure python implementation of
  TBinaryProtocol.
* Added bitarray module to efficiently unpack null indicators
* Minimise invasiveness of changes by transposing and stringifying
  the columnar results into rows in impala_client.py. The transposition
  needs to happen before display anyway.
* Add PingImpalaHS2Service() to get back version string and webserver
  address.
* Add CloseImpalaOperation() extension to return DML row counts. This
  possibly addresses IMPALA-1789, although we need to confirm that
  this is a sufficient solution.
* Add is_closed member to query handles to avoid shell independently
  tracking whether the query handle was closed or not.
* Include query status in HS2 log to match beeswax.
* HS2 GetLog() command now includes query status error message for
  consistency with beeswax.
* "set"/"set all" uses the client requests options, not the session
  default. This captures the effective value of TIMEZONE, which
  was previously missing. This also requires test changes where
  the tests set non-default values, e.g. for ABORT_ON_ERROR.
* "set all" on the server side returns REMOVED query options - the
  shell needs to know these so it can correctly ignore them.
* Clean up self.orig_cmd/self.last_leading comment argument
  passing to avoid implicit parameter passing through multiple
  function calls.
* Clean up argument handling in shell tests to consistently pass
  around lists of arguments instead of strings that are subject
  to shell tokenisation rules.
* Consistently close connections in the shell to avoid leaking
  HS2 sessions. This is enforced by making ImpalaShell a context
  manager and also eliminating all sys.exit() calls that would
  bypass the explicit connection closing.

Testing:
* Shell tests can run with both protocols
* Add tests for formatting of all types and NULL values
* Added testing for floating point output formatting, which does
  change as a result of switching to server-side vs client-side
  formatting.
* Verified that newly-added tests were actually going through HS2
  by disabling hs2 on the minicluster and running tests.
* Add checks to test_verify_metrics.py to ensure that no sessions
  are left open at the end of tests.

Performance:
Baseline from beeswax shell for large extract is as follows:

  $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null
  real    0m6.708s
  user    0m5.132s
  sys     0m0.204s

After this change it is somewhat slower, but we generally don't consider
bulk extract performance through the shell to be perf-critical:
  real    0m7.625s
  user    0m6.436s
  sys     0m0.256s

Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
---
M LICENSE.txt
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M bin/rat_exclude_files.txt
M common/thrift/ImpalaService.thrift
M infra/python/deps/compiled-requirements.txt
A shell/ext-py/bitarray-0.9.0/PKG-INFO
A shell/ext-py/bitarray-0.9.0/bitarray/__init__.py
A shell/ext-py/bitarray-0.9.0/bitarray/_bitarray.c
A shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py
A shell/ext-py/bitarray-0.9.0/setup.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/thrift_sasl.py
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/common/impala_service.py
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_shell_interactive.py
M tests/custom_cluster/test_shell_interactive_reconnect.py
M tests/hs2/test_hs2.py
M tests/query_test/test_observability.py
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
M tests/verifiers/metric_verifier.py
M tests/verifiers/test_verify_metrics.py
35 files changed, 7,382 insertions(+), 540 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/12884/21
-- 
To view, visit http://gerrit.cloudera.org:8080/12884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12
Gerrit-Change-Number: 12884
Gerrit-PatchSet: 21
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to