[Impala-ASF-CR] HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint

2019-05-16 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13299 )

Change subject: HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint
..


Patch Set 2:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG@21
PS2, Line 21:  Before this patch is committed, the intention is
: to submit the changes to those files that are shown in this 
review as
: a patch on Impala's native-toolchain Thrift.
It seems to me like it would be easier to just copy-paste this, since the 
Thrift Protocol/Transport interfaces are typically pretty stable across Thrift 
versions. Or, contribute the changes upstream to Thrift itself. Carrying a 
largeish patch in the context of native-toolchain seems kinda like worst of 
both worlds to me.


http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG@28
PS2, Line 28: sends a huge payload that can potentially crash the server.
yea, assuming this is meant to be exposed to the internet, we might want to 
consider fuzz testing the pre-authenticated endpoint in an ASAN build.


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@502
PS2, Line 502: LOG(ERROR) << "Failed to determine buffer length to decode 
base64 auth string.";
for all of the log messages in this function, can we include the remote socket 
address?


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@505
PS2, Line 505:   char out[out_max];
stack allocating here seems quite dangerous without constraining the length (eg 
to 1kb or something)


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@507
PS2, Line 507:   if (!Base64Decode(base64, in_len, out_max, out, _len)) {
Looking at Base64Decode, it doesn't seem to null-terminate the output, but down 
a few lines you're using 'out' as if it's a null-terminated C string.

I'd feel safer about all this code if we used C++ strings. gutil has this 
function:

bool Base64Unescape(const char* src, int slen, string* dest);

which will take care of all the buffer sizing stuff for you


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@521
PS2, Line 521:   size_t pass_len = out_len - user_len - 1;
again I'd feel safer about the above code if we used C++ strings, like:

pair u_p = strings::Split(up_string, 
strings::delimiter::Limit(":", 1));

(I can't necessarily spot a bug in your C string manipulation, but I also know 
that the above won't have a bug)


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@957
PS2, Line 957: SetConnectionUsername
this function is a bit weirdly named, since in the HTTP case, it isn't setting 
the connection username a tall, but rather setting up a hook that will later 
set a connection username. Maybe a moot point based on my comment elsewhere 
that this needs to be per-request state rather than per-connection state for 
HTTP


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@977
PS2, Line 977:   const string& err = Substitute("Bad transport type: $0", 
underlying_transport_type);
can we just LOG(FATAL) here since it would be a coding bug?


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@998
PS2, Line 998:   return Status::Expected(err);
same


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/service/impala-server.cc@2424
PS2, Line 2424: if (hs2_http_port > 0) {
Maybe we should use '-1' to mean disable? port 0 usually means "use an 
ephemeral port'


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h
File be/src/transport/THttpServer.h:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h@71
PS2, Line 71: std::vector
that's a bit of an odd choice of type instead of std::string


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h@80
PS2, Line 80:   THttpServerTransportFactory(bool requireBasicAuth = false)
explicit


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp
File be/src/transport/THttpServer.cpp:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@68
PS2, Line 68: strncmp
probably needs to be made case-insensitive (odd that x-forwarded-for is not 
using THRIFT_strncasecmp).


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@69
PS2, Line 69: 7
probably need to also check that sz >= 7, otherwise we might read past the end 
of value

(one of the advantages of pulling THttpServer into Impala code itself instead 
of trying to patch Thrift woudl be we could use nice convenience functions like 

[Impala-ASF-CR] HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint

2019-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13299 )

Change subject: HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/3185/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/13299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff
Gerrit-Change-Number: 13299
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 11 May 2019 00:10:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint

2019-05-10 Thread Thomas Marshall (Code Review)
Hello Bharath Vissapragada, Michael Ho, Sudhanshu Arora, Todd Lipcon, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13299

to look at the new patch set (#2).

Change subject: HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint
..

HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint

This patch provides an option to use HTTP based transport for
HiveServer2 endpoint on coordinators that the clients can connect
to query. HTTP(S) also works when external TLS is enabled using
--ssl_server_certificate.

Implemented only for HS2 compatible thrift server since, unlike
beeswax, its session management does not need to be tied to the
underlying TCP conneciton.

Thirft's http transport is modified to support BASIC authentication
via ldap. For convenience of developing and reviewing, this patch
is based on another that copied THttpServer and THttpTransport into
Impala's codebase. Before this patch is committed, the intention is
to submit the changes to those files that are shown in this review as
a patch on Impala's native-toolchain Thrift.

TODO
=
- Audit code to see if it can handle DOS kinda cases where the client
sends a huge payload that can potentially crash the server.
- Figure out how to do automated testing with LDAP.

Testing
===
- Parameterized JdbcTest to work for HS2 + HTTP mode (no TLS).

Manual testing with Beeline client (from Apache Hive), which has
builtin support to connect to HTTP(S) based HS2 compatible endpoints.

Example


-- HTTP mode:
> start-impala-cluster.py
> JDBC_URL="jdbc:hive2://localhost:/default;transportMode=http"
> beeline -u "$JDBC_URL"

-- HTTPS mode:
> cd $IMPALA_HOME
> SSL_ARGS="--ssl_client_ca_certificate=./be/src/testutil/server-cert.pem \
--ssl_server_certificate=./be/src/testutil/server-cert.pem \
--ssl_private_key=./be/src/testutil/server-key.pem --hostname=localhost"
> start-impala-cluster.py --impalad_args="$SSL_ARGS" \
--catalogd_args="$SSL_ARGS" --state_store_args="$SSL_ARGS"
- Create a local trust store using 'keytool' and import the certificate
from server-cert.pem (./clientkeystore in the example).
> JDBC_URL="jdbc:hive2://localhost:/default;ssl=true;sslTrustStore= \
./clientkeystore;trustStorePassword=password;transportMode=http"
> beeline -u "$JDBC_URL"

-- BASIC Auth with LDAP:
> LDAP_ARGS="--enable_ldap_auth --ldap_uri='ldap://...' \
--ldap_bind_pattern='...' --ldap_passwords_in_clear_ok"
> start-impala-cluster.py --impalad_args="$LDAP_ARGS"
> JDBC_URL="jdbc:hive2://localhost:28000/default;user=...;password=\
...;transportMode=http"
> beeline -u "$JDBC_URL"

-- HTTPS mode with LDAP:
> start-impala-cluster.py --impalad_args="$LDAP_ARGS $SSL_ARGS" \
--catalogd_args="$SSL_ARGS" --state_store_args="$SSL_ARGS"
> JDBC_URL="jdbc:hive2://localhost:28000/default;user=...;password=\
...;ssl=true;sslTrustStore=./clientkeystore;trustStorePassword=\
password;transportMode=http"
> beeline -u "$JDBC_URL"

Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff
---
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication-test.cc
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/transport/THttpTransport.cpp
M be/src/transport/THttpTransport.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/service/JdbcTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M tests/common/impala_cluster.py
19 files changed, 480 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13299/2
--
To view, visit http://gerrit.cloudera.org:8080/13299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff
Gerrit-Change-Number: 13299
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint

2019-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13299 )

Change subject: HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13299/2/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/13299/2/bin/start-impala-cluster.py@204
PS2, Line 204:
flake8: E203 whitespace before ':'



-- 
To view, visit http://gerrit.cloudera.org:8080/13299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff
Gerrit-Change-Number: 13299
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 10 May 2019 23:14:33 +
Gerrit-HasComments: Yes