[Impala-ASF-CR] [DOCS] An upgrade consideratiobn for IMPALA-7800

2019-07-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13879 )

Change subject: [DOCS] An upgrade consideratiobn for IMPALA-7800
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13879/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13879/1//COMMIT_MSG@7
PS1, Line 7: consideratiobn
typo



--
To view, visit http://gerrit.cloudera.org:8080/13879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia239fec4973722bc2e53239d3dc5df608a341d92
Gerrit-Change-Number: 13879
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 19 Jul 2019 22:09:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-8339: Add local node blacklist to coordinators

2019-07-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13868 )

Change subject: [WIP] IMPALA-8339: Add local node blacklist to coordinators
..


Patch Set 1:

(16 comments)

Some comments on the first pass. May be helpful to document the interaction 
between blacklisted node and cluster membership change.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h@284
PS1, Line 284: Exec()
ExecQueryFInstance()


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc@394
PS1, Line 394:   // The Exec() rpc failed, so blacklist the node.
 :   const TBackendDescriptor& be_desc = 
backend_state->exec_params()->be_desc;
 :   
ExecEnv::GetInstance()->cluster_membership_mgr()->Blacklist(be_desc);
One interesting case is what if the coordinator cannot RPC to itself. I suppose 
the assumption we make here is that blacklisting certain nodes will prevent 
scheduler from scheduling scans and thus the rest of the fragments on them. The 
best practice is to have dedicated coordinators and executors so a coordinator 
usually doesn't take the role of an executor. However, for certain queries, 
there may always be a root fragment which will run on the coordinator.

This may get better as we move the root fragments off the coordinator.


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@96
PS1, Line 96: Ohly
typo


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206
PS1, Line 206:   boost::mutex update_membership_lock_;
Does this also need the mutable modifier like the other two locks ?

Also, membership update seems to touch quite a number of fields in this class. 
May be it's better to clearly document the thread safety of each field.


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206
PS1, Line 206: update_membership_lock_;
Is there any chance this mutex may be held at the same time as various other 
mutex in this class ? If so, we had better document the lock ordering.


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@119
PS1, Line 119:   // Get any previously blacklisted nodes that should be 
unblacklisted.
 :   std::list unblacklisted = 
node_blacklist_.Maintenance();
I suppose this relies on StateStore being alive and keeps sending periodic 
cluster membership update. It's a fair assumption that StateStore should 
recover within a reasonable amount of time.


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@168
PS1, Line 168: // Deleted item
How would this work if the deleted item is in the list unblacklisted ?


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@175
PS1, Line 175:  !blacklisted) {
How would it work for a node which is already removed from the executor group 
due to blacklisting ? Should we remove it altogether from the blacklist if the 
cluster memebership update indicates this is already deleted ? Otherwise, won't 
it eventually get into the probation state and get added back with the stale BE 
descriptor ?

In theory, a blacklisted node can get removed from the cluster and restarts 
with the same or different IP address.


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@265
PS1, Line 265: for (const TBackendDescriptor& be_desc : unblacklisted) {
How would this work if the cluster membership in the past indicates that this 
node is removed from the cluster already ? This probably warrants some more 
detailed documentation.


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h
File be/src/scheduling/node-blacklist.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@30
PS1, Line 30: Class to maintain a local blacklist of backends.
Can you please describe the thread safety of this class ? Is there any 
assumption about the locks held by caller when calling into functions of this 
class ?


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@38
PS1, Line 38: come
typo


http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@48
PS1, Line 48: class NodeBlacklist {
Is this class called only from the ClusterMembershipMgr class 

[Impala-ASF-CR] IMPALA-8656: Re-factor PlanRootSink into blocking and buffered implementations

2019-07-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13873 )

Change subject: IMPALA-8656: Re-factor PlanRootSink into blocking and buffered 
implementations
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8786b1a9af68ab0a8a094970d8f955eb20d04bca
Gerrit-Change-Number: 13873
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jul 2019 19:18:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8656: Re-factor PlanRootSink into blocking and buffered implementations

2019-07-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13873 )

Change subject: IMPALA-8656: Re-factor PlanRootSink into blocking and buffered 
implementations
..


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/buffered-plan-root-sink.h@24
PS6, Line 24: /// PlanRootSink that buffers RowBatches from the 'sender' 
(fragment) thread. RowBatches
: /// are buffered in memory until a memory limit is hit.
This is just for the initial implementation only, right ? We kind need to 
support spilling to disk eventually so we can produce all the possible result 
rows from the underlying plan tree.

As it's described now, it's not completely solving the problem of a client not 
fetching all result rows.


http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/plan-root-sink.h@a67
PS6, Line 67:
:
:
:
:
:
:
:
:
nit: these are part of the interfaces of DataSink. It seems clearer to just 
leave them here as pure virtual functions.



--
To view, visit http://gerrit.cloudera.org:8080/13873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8786b1a9af68ab0a8a094970d8f955eb20d04bca
Gerrit-Change-Number: 13873
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Jul 2019 04:58:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8543: More diagnostics for TAcceptQueueServer

2019-07-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13790 )

Change subject: IMPALA-8543: More diagnostics for TAcceptQueueServer
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp@255
PS1, Line 255: // Insert thre
> Maybe make this two separate variables, eg. setup_timer and thread_wait_tim
Done


http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp@293
PS1, Line 293: ient $
> Can this one also be made const?
Done


http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp@406
PS1, Line 406:
> maybe label these, eg: /* highest_trackable_value */
Done



--
To view, visit http://gerrit.cloudera.org:8080/13790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36
Gerrit-Change-Number: 13790
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 10 Jul 2019 01:26:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8543: More diagnostics for TAcceptQueueServer

2019-07-09 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13790 )

Change subject: IMPALA-8543: More diagnostics for TAcceptQueueServer
..

IMPALA-8543: More diagnostics for TAcceptQueueServer

This change adds more logging information in TAcceptQueueServer
to help diagnose issues at various stages of client connections
establishment.

Two new metrics are also added to measure the connection setup time
and the wait time for service threads to be available.

Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M common/thrift/metrics.json
3 files changed, 189 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13790/2
--
To view, visit http://gerrit.cloudera.org:8080/13790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36
Gerrit-Change-Number: 13790
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-8732: Use a serialized descriptor table in TQueryCtx

2019-07-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13772 )

Change subject: IMPALA-8732: Use a serialized descriptor table in TQueryCtx
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13772/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/13772/4/be/src/runtime/descriptors.h@539
PS4, Line 539: WARN_UNUSED_RESULT;
Do DeserializeThrift() and CreateTblDecriptorInternal() also need this ?



--
To view, visit http://gerrit.cloudera.org:8080/13772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I458aa62dd4d1e4e4a7b1869a604623a69f3b2d9a
Gerrit-Change-Number: 13772
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 10 Jul 2019 00:34:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13818 )

Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
Gerrit-Change-Number: 13818
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 09 Jul 2019 05:07:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (Code Review)
Hello Thomas Tauber-Marshall, Todd Lipcon, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13818

to look at the new patch set (#4).

Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
..

IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

This change fixes some callers of RpcMgr::GetProxy() tp
pass the hostname instead of the resolved IP adddress.
The hostname passed to RpcMgr::GetProxy() is used to
construct the principal name of the remote destination
when Kerberos is enabled.

Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
---
M be/src/rpc/rpc-mgr.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/service/client-request-state.cc
3 files changed, 13 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/13818/4
--
To view, visit http://gerrit.cloudera.org:8080/13818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
Gerrit-Change-Number: 13818
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13818 )

Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
..


Patch Set 2: Code-Review+2

Carry Todd's +2


--
To view, visit http://gerrit.cloudera.org:8080/13818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
Gerrit-Change-Number: 13818
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 09 Jul 2019 05:04:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13818 )

Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
..

IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

This change fixes some callers of RpcMgr::GetProxy() tp
pass the hostname instead of the resolved IP adddress.
A new DCHECK is added to make sure calleres are not
passing the resolved IP address as hostname.

The hostname passed to RpcMgr::GetProxy() is used to
construct the principal name of the remote destination
when Kerberos is enabled.

Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
---
M be/src/rpc/rpc-mgr.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/service/client-request-state.cc
3 files changed, 13 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/13818/2
--
To view, visit http://gerrit.cloudera.org:8080/13818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
Gerrit-Change-Number: 13818
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13818 )

Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.h@148
PS1, Line 148: has to match
> if Kerberos is enabled
Done


http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.h@149
PS1, Line 149: f Kerberos is enabled. 'P' must descend fr
> The "usually" here seems to suggest that the DCHECK could fail in some case
Removed in PS2. Please see replies in rpc-mgr.inline.h


http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.inline.h@44
PS1, Line 44:   // Talk to self via loopback.
> I don't think this DCHECK is valid -- it's OK to run things in an environme
Yes, was having a hard time coming up with a DCHECK which works for all cases. 
I suspect Impala may fail elsewhere if DNS is disabled but there is nothing 
from stopping a user from configuring the hostname as 1.2.3.4 for whatever 
reason. Anyhow, removing it for now.

FWIW, I confirmed that this DCHECK actually catches the bug which this patch is 
trying to fix.



--
To view, visit http://gerrit.cloudera.org:8080/13818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
Gerrit-Change-Number: 13818
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 09 Jul 2019 05:03:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8744: Fix test session expiration.py to work with Python 2.6

2019-07-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13819


Change subject: IMPALA-8744: Fix test_session_expiration.py to work with Python 
2.6
..

IMPALA-8744: Fix test_session_expiration.py to work with Python 2.6

This patches fixes a formatting string to use "{0}" rather than "{}"
as Python 2.6 doesn't support "{}"

Testing done:
- Ran the test on Centos6 platform with Python 2.6

Change-Id: I3a914a9ca881181d47bee68641b9c7affbfc0d54
---
M tests/custom_cluster/test_session_expiration.py
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13819/1
--
To view, visit http://gerrit.cloudera.org:8080/13819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a914a9ca881181d47bee68641b9c7affbfc0d54
Gerrit-Change-Number: 13819
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13818


Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
..

IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()

This change fixes some callers of RpcMgr::GetProxy() tp
pass the hostname instead of the resolved IP adddress.
A new DCHECK is added to make sure calleres are not
passing the resolved IP address as hostname.

The hostname passed to RpcMgr::GetProxy() is used to
construct the principal name of the remote destination
when Kerberos is enabled.

Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
---
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/service/client-request-state.cc
4 files changed, 17 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/13818/1
--
To view, visit http://gerrit.cloudera.org:8080/13818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0
Gerrit-Change-Number: 13818
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8341: [DOCS] Added a note about the experiemental feature

2019-07-02 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13788 )

Change subject: IMPALA-8341: [DOCS] Added a note about the experiemental feature
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13788/1/docs/topics/impala_data_cache.xml
File docs/topics/impala_data_cache.xml:

http://gerrit.cloudera.org:8080/#/c/13788/1/docs/topics/impala_data_cache.xml@64
PS1, Line 64:   The specified directories must exist in the local 
filesystem of each Impala Daemon.
or Impala will fail to start.



--
To view, visit http://gerrit.cloudera.org:8080/13788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9f5795a28738f7df435a3bea06ffa1a216fd04e
Gerrit-Change-Number: 13788
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 03 Jul 2019 01:28:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8732: Use a serialized descriptor table in TQueryCtx

2019-07-02 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13772 )

Change subject: IMPALA-8732: Use a serialized descriptor table in TQueryCtx
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/coordinator-backend-state.cc@200
PS1, Line 200: sidecar.__set_query_ctx(query_ctx());
Makes me wonder if we should factor out the descriptor table from query_ctx() 
and send the entire serialized descriptor table byte array as a sidecar instead 
to avoid another malloc + copy in line 213 below.

If the above makes sense, it could be a follow-up change to make backporting 
easier.


http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/descriptors.h@480
PS1, Line 480: thrift_tbl
stale


http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/descriptors.h@485
PS1, Line 485: thrift_tbl
stale



--
To view, visit http://gerrit.cloudera.org:8080/13772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I458aa62dd4d1e4e4a7b1869a604623a69f3b2d9a
Gerrit-Change-Number: 13772
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 03 Jul 2019 00:48:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8543: More diagnostics for TAcceptQueueServer

2019-07-02 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13790


Change subject: IMPALA-8543: More diagnostics for TAcceptQueueServer
..

IMPALA-8543: More diagnostics for TAcceptQueueServer

This change adds more logging information in TAcceptQueueServer
to help diagnose issues at various stages of client connections
establishment.

Two new metrics are also added to measure the connection setup time
and the wait time for service threads to be available.

Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M common/thrift/metrics.json
3 files changed, 180 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13790/1
--
To view, visit http://gerrit.cloudera.org:8080/13790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36
Gerrit-Change-Number: 13790
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced

2019-07-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13762 )

Change subject: IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 
introduced
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a4c1975721c32a78a003d91babc5d2bb90f3949
Gerrit-Change-Number: 13762
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 02 Jul 2019 01:25:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95

2019-07-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13768 )

Change subject: Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95
..


Patch Set 1: Code-Review+2

Didn't look too much into the diff in squeasel files as they are mostly 
copy-and-paste from upstream I assume.

LGTM otherwise. Please fix the formatting nit in webserver.cc


--
To view, visit http://gerrit.cloudera.org:8080/13768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549
Gerrit-Change-Number: 13768
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Mon, 01 Jul 2019 21:09:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced

2019-06-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13762 )

Change subject: IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 
introduced
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13762/1/docs/topics/impala_client.xml
File docs/topics/impala_client.xml:

http://gerrit.cloudera.org:8080/#/c/13762/1/docs/topics/impala_client.xml@115
PS1, Line 115: >--accepted_client_cnxn_timeout > 0,
May want to add that this is in seconds.



--
To view, visit http://gerrit.cloudera.org:8080/13762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a4c1975721c32a78a003d91babc5d2bb90f3949
Gerrit-Change-Number: 13762
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 28 Jun 2019 22:39:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Update kudu/security from 9ebcb77aa911aae76c48e717af24e643cb81908d

2019-06-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13767 )

Change subject: Update kudu/security from 
9ebcb77aa911aae76c48e717af24e643cb81908d
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13767
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c91193fd49f8ca1234b23cf61fc90c1fdbe2e0
Gerrit-Change-Number: 13767
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 28 Jun 2019 22:37:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections

2019-06-28 Thread Michael Ho (Code Review)
Michael Ho has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/13764 )

Change subject: IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound 
connections
..


Removed reviewer Kudu Jenkins.
--
To view, visit http://gerrit.cloudera.org:8080/13764
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc
Gerrit-Change-Number: 13764
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections

2019-06-28 Thread Michael Ho (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/13764

to review the following change.


Change subject: IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound 
connections
..

IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections

This change enables TCP keepalive for all outbound connections.
This aims to handle cases in which the remote peer may have
dropped off the network without sending a TCP RST. For instance,
a remote host could have hit a kernel panic and got power cycled.
In which case, the existing TCP connection to that host may be
stale. In an idle cluster, this stale connection may not be detected
until the next use of it, in which case it will result in a RPC
failure due to TCP RST sent from the restarted peer.

By enabling TCP keepalive, we ensure that stale TCP connections
in an idle cluster will be detected and closed within a time bound
so a new connection will be created on the next use. This change
introduces 3 different flags:

--tcp_keepalive_probe_period_s: the duration in seconds a TCP connection
has to be idle before keepalive probes started to be sent.

--tcp_keepalive_retry_period_s: the duration in seconds between successive
keepalive probes if previous probes didn't get an ACK from remote peer.

--tcp_keepalive_retry_count: the maximum number of TCP keepalive probes
sent without an ACK before declaring the remote peer as dead.

Testing:
- Used TCP dump to verify that keepalive probes are being sent periodically.
- Verified that blocking all incoming traffic to a server's port via an iptable
rule caused the TCP connection to be closed and the keepalive probes to stop
eventually.

Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc
Reviewed-on: http://gerrit.cloudera.org:8080/13702
Reviewed-by: Alexey Serbin 
Reviewed-by: Todd Lipcon 
Tested-by: Kudu Jenkins
---
M be/src/kudu/rpc/connection.cc
M be/src/kudu/rpc/connection.h
M be/src/kudu/rpc/reactor.cc
M be/src/kudu/rpc/reactor.h
M be/src/kudu/rpc/rpc-test.cc
M be/src/kudu/util/net/socket.cc
M be/src/kudu/util/net/socket.h
7 files changed, 112 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13764/1
--
To view, visit http://gerrit.cloudera.org:8080/13764
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc
Gerrit-Change-Number: 13764
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8341: [DOCS] Describe the setting for remote data caching

2019-06-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13724 )

Change subject: IMPALA-8341: [DOCS] Describe the setting for remote data caching
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13724
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
Gerrit-Change-Number: 13724
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 26 Jun 2019 18:45:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8341: [DOCS] Describe the settings for remote data caching

2019-06-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13724 )

Change subject: IMPALA-8341: [DOCS] Describe the settings for remote data 
caching
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13724/1/docs/topics/impala_data_cache.xml
File docs/topics/impala_data_cache.xml:

http://gerrit.cloudera.org:8080/#/c/13724/1/docs/topics/impala_data_cache.xml@44
PS1, Line 44: --data_cache_dir
--data_cache_dir and --data_cache_size are options built specifically only for 
./bin/start-impala-cluster.py as that script needs to create extra 
sub-directories for the caching directory.

To specify the caching directory, the user should use the flag: 
--data_cache=,,:

With the above configuration, data will be stored in ,  and  
respectively. The user needs to make sure those directories exist in the local 
filesystem to begin with. In addition, the filesystem which the directory 
resides in must support hole punching. Modern filesystems such as ext4 and xfs 
support this feature. The cache may consume up to  bytes for each of the 
directories specified. In other words, with the above configuration, the total 
cache size can be up to 3 * .

Please see 
https://github.com/apache/impala/blob/master/be/src/runtime/io/disk-io-mgr.cc#L58-L63
 for the definition of the flag.



--
To view, visit http://gerrit.cloudera.org:8080/13724
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
Gerrit-Change-Number: 13724
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 25 Jun 2019 17:50:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 4: Code-Review+2

Carry Thomas' +2


--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 20 Jun 2019 18:02:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13607/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13607/2/be/src/service/impala-server.cc@219
PS2, Line 219: DEFINE_int32(idle_client_poll_period_ms, 3, "The poll 
period, in milliseconds, after "
> Maybe make this seconds not ms? I don't think there's a reason anyone would
Done



--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 20 Jun 2019 17:58:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-20 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..

IMPALA-7802: Close connections of idle client sessions

Previously, if idle session timeout is set either via
startup flag or query options, a client session will expire
after that set period of inactivity. However, the network
connection and the service thread of an expired session will
still be around until the session is closed by the client.
This is highly undesirable as these idle sessions still count
towards the quota bound by --fe_esrvice_threads, so if the
total number of sessions (including the idle ones) reaches
that upper bound, all incoming new session will block until
some of the existing sessions exit. There is no time bound on
when those expired sessions will be closed. In some sense,
leaving many idle sessions opened is a denial-of-service attack
on Impala.

This change implements support for closing expired client sessions.
In particular, a new flag --idle_client_poll_time_s is added to
specify a time interval in seconds of client's inactivity which
will cause an idle service thread of a client connection to wake up
and check if all sessions associated with the connection are idle.
If so, the connection will be closed. This allows the service threads
to be freed up without waiting for client to close the connections.

Testing done:
- core build
- new targeted test which verifies the connections of expired sessions
are closed.
- verified the flags function as expected in a secure cluster with Kerberos + 
SSL

Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/client-cache.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M tests/custom_cluster/test_hs2.py
M tests/custom_cluster/test_session_expiration.py
11 files changed, 329 insertions(+), 94 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13607/4
--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 3: Code-Review+1

Carry Tim's +1.


--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Tue, 18 Jun 2019 21:48:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7467: Port ExecQueryFInstances to krpc

2019-06-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13583 )

Change subject: IMPALA-7467: Port ExecQueryFInstances to krpc
..


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13583/2/be/src/runtime/backend-client.h
File be/src/runtime/backend-client.h:

http://gerrit.cloudera.org:8080/#/c/13583/2/be/src/runtime/backend-client.h@48
PS2, Line 48:   /// Callers of TransmitData() should provide their own counter 
to measure the data
:   /// transmission time.
:   void 
SetTransmitDataCounter(RuntimeProfile::ConcurrentTimerCounter* csw) {
: DCHECK(transmit_csw_ == NULL);
: transmit_csw_ = csw;
:   }
:
:   /// ImpalaBackendClient is shared by multiple queries. It's the 
caller's responsibility
:   /// to reset the counter after data transmission.
:   void ResetTransmitDataCounter() {
: transmit_csw_ = NULL;
:   }
Not your change but I believe these are not needed anymore. May as well delete 
them while you are here.


http://gerrit.cloudera.org:8080/#/c/13583/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/13583/2/common/protobuf/control_service.proto@235
PS2, Line 235: 5
Why the jump from 2 to 5 ?



--
To view, visit http://gerrit.cloudera.org:8080/13583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3f1c6099109bd8e5361188005a7d0e892147570
Gerrit-Change-Number: 13583
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 18 Jun 2019 20:37:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 18 Jun 2019 06:05:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@30
PS1, Line 30:
> I do wonder if we should try to handle the specific case where a connection
IIRC, most of the problem involving load balancer we saw in the past never 
quite made it to the point of completing the connection establishment 
handshake. In particular, most of them were stuck in the stage of SASL / 
Kerberos step. There is already a timeout for that step although it's a bit 
high by default (5 mins) but that's to prevent false positive of closing client 
connection when KDC may be overloaded.


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp@78
PS1, Line 78: // Setting a socket timeout for process() may lead to 
false positive
> Ok, makes sense. Maybe leave a comment here to explain that?
Done


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/thrift-server.h
File be/src/rpc/thrift-server.h:

http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/thrift-server.h@181
PS1, Line 181: /// Called by the Thrift server implementation when it has 
acquired its resources and
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/thrift-server.h@182
PS1, Line 182: /// is ready to serve, and signals to StartAndWaitForServer 
that start-up is finished.
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@221
PS1, Line 221: idle_session_timeout
> Understood. It would be good to make it clear in the docs that idle_client_
Points taken. A doc bug will be filed once this patch is in.


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@2072
PS1, Line 2072: unique_lock l(connection_to_sessions_map_lock_);
> What happens if the entry is present but the set is empty? Shouldn't that b
DCHECK added.


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@2074
PS1, Line 2074:
> Why not session_ids = it->second? To avoid the double-lookup.
Done


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@2086
PS1, Line 2086:   // Check if all the sessions associated with the connection 
are idle.
> Minor thing, but this copy increments the refcount, and I don't think it's
Done


http://gerrit.cloudera.org:8080/#/c/13607/1/tests/custom_cluster/test_session_expiration.py
File tests/custom_cluster/test_session_expiration.py:

http://gerrit.cloudera.org:8080/#/c/13607/1/tests/custom_cluster/test_session_expiration.py@119
PS1, Line 119: @
> flake8: E303 too many blank lines (2)
Done


http://gerrit.cloudera.org:8080/#/c/13607/1/tests/custom_cluster/test_session_expiration.py@122
PS1, Line 122:   def test_closing_idle_connection(self, vector):
> I guess it's still expected that these will stay open, so maybe we don't ne
Done



--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 17 Jun 2019 20:48:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..

IMPALA-7802: Close connections of idle client sessions

Previously, if idle session timeout is set either via
startup flag or query options, a client session will expire
after that set period of inactivity. However, the network
connection and the service thread of an expired session will
still be around until the session is closed by the client.
This is highly undesirable as these idle sessions still count
towards the quota bound by --fe_esrvice_threads, so if the
total number of sessions (including the idle ones) reaches
that upper bound, all incoming new session will block until
some of the existing sessions exit. There is no time bound on
when those expired sessions will be closed. In some sense,
leaving many idle sessions opened is a denial-of-service attack
on Impala.

This change implements support for closing expired client sessions.
In particular, a new flag --idle_client_poll_time_ms is added to
specify a time interval in milliseconds of client's inactivity which
will cause an idle service thread of a client connection to wake up
and check if all sessions associated with the connection are idle.
If so, the connection will be closed. This allows the service threads
to be freed up without waiting for client to close the connections.

Testing done:
- core build
- new targeted test which verifies the connections of expired sessions
are closed.

Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M tests/custom_cluster/test_hs2.py
M tests/custom_cluster/test_session_expiration.py
10 files changed, 315 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13607/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98
PS3, Line 98:  "impala-cache-file-";
> it currently creates the trace file with open() flags to truncate the exist
Yes, makes sense. Just wanna make sure those files won't accidentally pile up 
over time.



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 04:59:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@28
PS1, Line 28: If so, the connection will be closed. This allows the service 
threads
> I was trying to understand this scenario:
If there are no open sessions associated with a connection, the 
connection_to_session state map shouldn't find an entry if I understand the 
code correctly. In that case, we will by default mark the connection as not 
idle. Please see 
https://gerrit.cloudera.org/#/c/13607/1/be/src/service/impala-server.cc@2072

Given the above, I am not sure I understand the concern about setting it to a 
very low value.


http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@30
PS1, Line 30:
> One thing I didn't understand about the scope of this - are we trying to ha
It's the former for sure. Will update commit message to make it clearer.


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp@78
PS1, Line 78: if (!processor_->process(input_, output_, 
connectionContext) ||
> Is it possible that the thread can get stuck in either the process() or pro
Yes but that's not the intention of this patch to address those cases. It's 
hard to tell between a slow network/client vs a stuck client so setting a 
timeout here may sometimes lead to false positive and prematurely closes a 
client's connection.

A better thing to do instead of a socket timeout is to use TCP keepalive or 
something similar to TCP_USER_TIMEOUT to check the remote end is still alive.


http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@221
PS1, Line 221: idle_session_timeout
> So if idle_session_timeout is set to 10 minutes and idle_client_poll_period
To the first point, yes, that's potentially the latest time the connection of 
an idle session will be closed.

Nope. If idle_session_timeout is not set for a session, a session cannot become 
idle by definition.

Please keep in mind that idle_session_timeout can be set per session during 
session establishment via query option. The global flag --idle_session_timeout 
is mostly there to set the default value.



--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Anonymous Coward (443)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 14 Jun 2019 04:46:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7467: Ported ExecQueryFInstances over to krpc

2019-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13583 )

Change subject: IMPALA-7467: Ported ExecQueryFInstances over to krpc
..


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13583/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13583/1//COMMIT_MSG@9
PS1, Line 9: ExecQuerryFInstances
ExecQueryFInstances


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@176
PS1, Line 176: VLOG_QUERY
LOG(ERROR) ?


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@209
PS1, Line 209: serialized_buf
May help to add a remark about the ownership of this buffer.


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@214
PS1, Line 214: serialized_buf = nullptr;
Why is this necessary ?

Also, leave a comment on why we don't need to free the buffer explicitly here.


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@218
PS1, Line 218: serialized_buf = nullptr;
Same as above.


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@233
PS1, Line 233: LOG(DFATAL) << FromKuduStatus(sidecar_status, "Failed to add 
sidecar").GetDetail();
If we fail to add the sidecar, it doesn't seem to make sense to continue with 
the RPC as the executor cannot do much without the content in the sidecar, 
right ?


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.h@112
PS1, Line 112: QueryExecMgr* query_exec_mgr_;
Is this necessary ? What's wrong with getting it from the singleton ExecEnv() 
on each RPC call ? It shouldn't be that expensive.


http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.cc@73
PS1, Line 73:   query_exec_mgr_ = ExecEnv::GetInstance()->query_exec_mgr();
:   DCHECK(query_exec_mgr_ != nullptr);
Move to initializer list.


http://gerrit.cloudera.org:8080/#/c/13583/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/13583/1/common/thrift/ImpalaInternalService.thrift@579
PS1, Line 579: struct TExecQueryFInstancesSidecar {
Do we plan to clean it up eventually to use protobuf only ? How about leaving a 
TODO here about converting the followings to protobuf eventually ?



--
To view, visit http://gerrit.cloudera.org:8080/13583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3f1c6099109bd8e5361188005a7d0e892147570
Gerrit-Change-Number: 13583
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 13 Jun 2019 21:31:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@28
PS1, Line 28: If so, the connection will be closed. This allows the service 
threads
> I think it's unlikely that HS2 clients are deliberately leaving connections
The logic will not mark a connection as idle if we cannot find any sessions 
with a given connection in the connection to session state map. Based on my 
understanding, we don't remove the session from the connection to session state 
map even after it's closed so we should be fine in that case.



--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 13 Jun 2019 19:18:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13607 )

Change subject: IMPALA-7802: Close connections of idle client sessions
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@28
PS1, Line 28: If so, the connection will be closed. This allows the service 
threads
> Do we expect the connection to be closed idle_client_poll_time_ms after the
We expect the session to be closed within idle_client_poll_time_ms after the 
last session has _expired_.

The goal of this patch is to address clients which don't close their idle 
sessions in time, which is the biggest complaint from users now.

May be it's worth adding some logic to also close the connection if there are 
no sessions associated with it for extended period of time. Based on your 
experience, do you expect it to be a common thing for clients to call 
CloseSession() without closing the connection ?



--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 13 Jun 2019 18:25:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions

2019-06-12 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13607


Change subject: IMPALA-7802: Close connections of idle client sessions
..

IMPALA-7802: Close connections of idle client sessions

Previously, if idle session timeout is set either via
startup flag or query options, a client session will expire
after that set period of inactivity. However, the network
connection and the service thread of an expired session will
still be around until the session is closed by the client.
This is highly undesirable as these idle sessions still count
towards the quota bound by --fe_esrvice_threads, so if the
total number of sessions (including the idle ones) reaches
that upper bound, all incoming new session will block until
some of the existing sessions exit. There is no time bound on
when those expired sessions will be closed. In some sense,
leaving many idle sessions opened is a denial-of-service attack
on Impala.

This change implements support for closing expired client sessions.
In particular, a new flag --idle_client_poll_time_ms is added to
specify a time interval in milliseconds of client's inactivity which
will cause an idle service thread of a client connection to wake up
and check if all sessions associated with the connection are idle.
If so, the connection will be closed. This allows the service threads
to be freed up without waiting for client to close the connections.

Testing done:
- core build
- new targeted test which verifies the connections of expired sessions
are closed.

Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M tests/custom_cluster/test_hs2.py
M tests/custom_cluster/test_session_expiration.py
10 files changed, 291 insertions(+), 85 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13607/1
--
To view, visit http://gerrit.cloudera.org:8080/13607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38
Gerrit-Change-Number: 13607
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8659: Allow self-RPCs for KRPC to go via loopback

2019-06-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13592 )

Change subject: IMPALA-8659: Allow self-RPCs for KRPC to go via loopback
..


Patch Set 8: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13592/8/be/src/rpc/rpc-mgr.cc
File be/src/rpc/rpc-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13592/8/be/src/rpc/rpc-mgr.cc@194
PS8, Line 194: // Listen on all addresses, including loopback.
May help to DCHECK that the s_addr is INADDR_ANY in some way. This seems to be 
the assumption being made about the default ctor of Sockaddr.



-- 
To view, visit http://gerrit.cloudera.org:8080/13592
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9dbd477769ed49c05e624f06da4e51afaaf1670d
Gerrit-Change-Number: 13592
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 13 Jun 2019 01:21:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

2019-06-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 11 Jun 2019 17:30:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8561: Eliminate mtime=-1 for HDFS scan ranges (part 1)

2019-06-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13522 )

Change subject: IMPALA-8561: Eliminate mtime=-1 for HDFS scan ranges (part 1)
..


Patch Set 4: Code-Review+2

(3 comments)

Thanks for fixing it.

http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/handle-cache.inline.h
File be/src/runtime/io/handle-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/handle-cache.inline.h@93
PS4, Line 93: DCHECK_GT(mtime, 0);
super minor nit: is 0 invalid ?


http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/request-ranges.h@170
PS4, Line 170: 'mtime' matches
 :   // the modified time of the cached file in the HDFS cache
stale


http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/request-ranges.h@244
PS4, Line 244: int64_t mtime,
Nice to have one comment about it.



--
To view, visit http://gerrit.cloudera.org:8080/13522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48b7ed60d6ab9104b993237b4fe23de5dc058672
Gerrit-Change-Number: 13522
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 10 Jun 2019 23:31:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

2019-06-04 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13369 )

Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable 
ScanRanges
..

IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

As shown in IMPALA-8561, there are some paths in the code which
create uncacheable ScanRanges. These uncacheable ScanRanges have
mtime of -1. 'mtime' is used for differentiating versions of files
with the same names. An mtime == -1 means the cache entry could
potentially be from any versions of a file with the same name.

This change skips lookup or insertion of ScanRange with negative
mtime, file offset or buffer length.

Testing done: Added targeted test cases in data-cache-test

Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/request-ranges.h
4 files changed, 47 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/13369/3
--
To view, visit http://gerrit.cloudera.org:8080/13369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Gerrit-Change-Number: 13369
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

2019-06-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13369 )

Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable 
ScanRanges
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc@267
PS1, Line 267:   // Test with uncacheable 'mtime' to make sure the entry is not 
stored.
> I actually meant those should be additional test cases, i.e. test -1 (the m
Added one more test case for the uncacheable scan ranges in the latest PS.



--
To view, visit http://gerrit.cloudera.org:8080/13369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Gerrit-Change-Number: 13369
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 04 Jun 2019 20:08:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

2019-06-03 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13369 )

Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable 
ScanRanges
..

IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

As shown in IMPALA-8561, there are some paths in the code which
create uncacheable ScanRanges. These uncacheable ScanRanges have
mtime of -1. 'mtime' is used for differentiating versions of files
with the same names. An mtime == -1 means the cache entry could
potentially be from any versions of a file with the same name.

This change skips lookup or insertion of ScanRange with negative
mtime, file offset or buffer length.

Testing done: Added targeted test cases in data-cache-test

Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 39 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/13369/2
--
To view, visit http://gerrit.cloudera.org:8080/13369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Gerrit-Change-Number: 13369
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

2019-06-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13369 )

Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable 
ScanRanges
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc@267
PS1, Line 267:   ASSERT_FALSE(cache.Store(FNAME, -1000, 0, test_buffer(), 
TEMP_BUFFER_SIZE));
> Maybe test with a different bogus mtime, e.g. -1 just to demonstrate th
Done


http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache.cc@287
PS1, Line 287:   explicit CacheKey(const string& filename, int64_t mtime, 
int64_t offset)
> Maybe add DCHECKs here to make sure that invalid values didn't slip through
Done


http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache.cc@660
PS1, Line 660: int64_t bytes_to_read, uint8_t* buffer) {
> I think I prefer "uncachable" instead of "illegitimate". In some sense read
Done



--
To view, visit http://gerrit.cloudera.org:8080/13369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Gerrit-Change-Number: 13369
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 04 Jun 2019 01:07:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges

2019-06-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13369 )

Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable 
ScanRanges
..


Patch Set 2: Code-Review+1

Carry Lars' +1


--
To view, visit http://gerrit.cloudera.org:8080/13369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Gerrit-Change-Number: 13369
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 04 Jun 2019 01:07:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

2019-06-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
..


Patch Set 11: Code-Review+2

(6 comments)

LGTM. Please address the nits.

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h
File be/src/rpc/impala-service-pool.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h@72
PS11, Line 72: string
char*

See 
https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc@192
PS11, Line 192: !debug_status.ok()
UNLIKELY(!debug_status.ok())


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h@281
PS11, Line 281: getNumberOfCalls()
nit: number_of_calls() or GetNumberOfCalls()


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@310
PS11, Line 310: NULL
nit: nullptr


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@331
PS11, Line 331: // DoRpcWithRetry will fail 1 in 2^40 times.
  : ASSERT_TRUE(status.ok());
  :   }
  :   // There will be no overflows (i.e. service too busy) 1 in 
2^40 times.
  :   ASSERT_GT(overflow_metric->GetValue(), 0);
:-).

Subtle nits: Although they are similar this is referring to the probability of 
the debug action not kicking in at all in the loop above is: (1/2)^40 times.

The other one on line 331 refers to the probability of the debugging action 
kicking in 40 times in a row.

I suppose it's slightly clearer to state in (1/2)^num_rpc_retry_calls and 
(1/2)^num_retries. Please feel free to ignore as they are rather minor nits.


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h@183
PS11, Line 183:   /// Pass 'debug_action' to DebugAction() to potentially 
inject errors.
Please add a TODO:

TODO: Clean up this interface. Replace the debug action with fault injection in 
RPC callbacks or other places.



--
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 03 Jun 2019 23:30:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 3:

(8 comments)

LGTM. Mostly nits.

http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG@10
PS3, Line 10: trace.txt
impala-cache-trace.txt ?


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88
PS3, Line 88: "(Advanced) Collect a trace of all lookups in the 
data cache.");
nit: the indent here and other places in this change seems to be different from 
the existing code. We usually indent 4 for line continuation. Seems more 
consistent to follow the indent 4 convention.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98
PS3, Line 98: impala-cache-trace.txt
The data cache by default removes all files with names containing prefix 
CACHE_FILE_PREFIX at startup. I suppose we want to keep the trace files across 
restart to allow more flexibility for experiment, right ? In other words, it's 
up to users to remove the trace files.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116
PS3, Line 116: if (file_) Flush();
Does it make sense to also call file_->Close() here or does the d'tor take care 
of it ?


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148
PS3, Line 148:
nit: unnecessary blank line.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186
PS3, Line 186:   unique_ptr underlying_logger_;
 :   unique_ptr logger_;
Would be nice to briefly document these variables.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498
PS3, Line 498:   if (FLAGS_data_cache_enable_tracing) {
 : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME));
 : RETURN_IF_ERROR(tracer_->Start());
 :   }
I wonder how this may affect the available storage space allocated for the data 
cache ? The traces are relatively small compared to the cached data but I 
wonder if a long running Impala process may accumulate a somewhat large trace 
file.

On the other hand, tracing is mostly for experimentation so it's not a severe 
concern for production environment.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887
PS3, Line 887: char buf[BUF_LEN];
nit: same name as 'buf' declared outside of the scope. While functionally 
correct, it'd be less confusing to have no name collision.



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 03 Jun 2019 22:37:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of illegitimate ScanRanges

2019-05-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13369


Change subject: IMPALA-8562: Data cache should skip insertion of illegitimate 
ScanRanges
..

IMPALA-8562: Data cache should skip insertion of illegitimate ScanRanges

As shown in IMPALA-8561, there are some paths in the code which
create uncacheable ScanRanges. These uncacheable ScanRanges have
mtime of -1. 'mtime' is used for differentiating versions of files
with the same names. An mtime == -1 means the cache entry could
potentially be from any versions of a file with the same name.

This change skips lookup or insertion of ScanRange with negative
mtime, file offset or buffer length.

Testing done: Added targeted test cases in data-cache-test

Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
2 files changed, 24 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/13369/1
--
To view, visit http://gerrit.cloudera.org:8080/13369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Gerrit-Change-Number: 13369
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8563: Update SSL ciphers used in BE tests

2019-05-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13364 )

Change subject: IMPALA-8563: Update SSL ciphers used in BE tests
..


Patch Set 1: Code-Review+1

Thanks for fixing it.


--
To view, visit http://gerrit.cloudera.org:8080/13364
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12b014361fb90afe63aed4b4608f6d6031e49cca
Gerrit-Change-Number: 13364
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 17 May 2019 20:44:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] Allow data cache to be enabled optionally when running tests

2019-05-16 Thread Michael Ho (Code Review)
Hello Lars Volker, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13330

to look at the new patch set (#4).

Change subject: Allow data cache to be enabled optionally when running tests
..

Allow data cache to be enabled optionally when running tests

This change adds two environment variables ${DATA_CACHE_DIR}
and ${DATA_CACHE_SIZE} which specify the root directory path
of the data cache and the data cache size respectively. If both
are set, Impala cluster will be started with data cache enabled
when running E2E tests. When running with HDFS, it will also
set the config option --always_use_data_cache to true to force
use of the data cache all the time.

Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
---
M bin/run-all-tests.sh
1 file changed, 15 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/4
--
To view, visit http://gerrit.cloudera.org:8080/13330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
Gerrit-Change-Number: 13330
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] Allow data cache to be enabled optionally when running tests

2019-05-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13330 )

Change subject: Allow data cache to be enabled optionally when running tests
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13330/2/bin/run-all-tests.sh
File bin/run-all-tests.sh:

http://gerrit.cloudera.org:8080/#/c/13330/2/bin/run-all-tests.sh@64
PS2, Line 64:
> Do we know if the choice of filesystem matters for the cache? For example,
It may matter if the filesystem doesn't support hole punching. I leave it as 
empty in the new patch.


http://gerrit.cloudera.org:8080/#/c/13330/2/bin/run-all-tests.sh@78
PS2, Line 78: _START_CLUSTER_ARGS="${TEST_S
> I think the data caching is orthogonal to the filesystem, and we might want
Done



--
To view, visit http://gerrit.cloudera.org:8080/13330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
Gerrit-Change-Number: 13330
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 16 May 2019 18:11:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Allow data cache to be enabled optionally when running tests

2019-05-16 Thread Michael Ho (Code Review)
Hello Lars Volker, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13330

to look at the new patch set (#3).

Change subject: Allow data cache to be enabled optionally when running tests
..

Allow data cache to be enabled optionally when running tests

This change adds two environment variables ${DATA_CACHE_DIR}
and ${DATA_CACHE_SIZE} which specify the root directory path
of the data cache and the data cache size respectively. If both
are set, Impala cluster will be started with data cache enabled
when running E2E tests. When running with HDFS, it will also
set the config option --always_use_data_cache to true to force
use of the data cache all the time.

Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
---
M bin/run-all-tests.sh
1 file changed, 15 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/3
--
To view, visit http://gerrit.cloudera.org:8080/13330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
Gerrit-Change-Number: 13330
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-1653: Don't close hiveserver2 session when connection is closed

2019-05-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13306 )

Change subject: IMPALA-1653: Don't close hiveserver2 session when connection is 
closed
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@30
PS3, Line 30: We may want to consider changing
: some of this behavior.
> We had started discussing this a bit offline. Agree that we rely on the ses
Enabling --idle_query_timeout by default sounds like a reasonable mitigation 
against clients which don't fetch results. As Thomas said, the session state 
isn't that big so we can bound it at some large numbers and do LRU reclamation 
or set a reasonable large limit (e.g. 24 hrs) on how long a session will be 
maintained by default.

What was the idle_session-timeout set to when you ran tests with it enabled ?



--
To view, visit http://gerrit.cloudera.org:8080/13306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78
Gerrit-Change-Number: 13306
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 May 2019 19:03:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1653: Don't close hiveserver2 session when connection is closed

2019-05-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13306 )

Change subject: IMPALA-1653: Don't close hiveserver2 session when connection is 
closed
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@30
PS3, Line 30: We may want to consider changing
: some of this behavior.
> We probably should consider removing states of expired sessions or Impala m
To be fair, Impala is already vulnerable to similar kind of DoS attack today. A 
client can establish many connections to an Impala server, consuming all the FE 
service threads while leaving the sessions idle. All other clients will be 
blocked by this misbehaving client. See IMPALA-7802.

Before this change, we could have as many sessions as number of FE service 
threads. After this change, the session will be kept alive even after the 
connection is closed so we could in theory have way more sessions than number 
of FE service threads. It seems there is a more pressing need for idle session 
timeout after this patch as it seems easier to create the situation of issuing 
a query without fetching all the results, holding up resources in the cluster 
but I could be missing something. Previously, the client disconnection will 
automatically close the session and unregistering all in-flight queries, 
freeing up the resources.



--
To view, visit http://gerrit.cloudera.org:8080/13306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78
Gerrit-Change-Number: 13306
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Wed, 15 May 2019 07:22:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs

2019-05-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12797 )

Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
..


Patch Set 13: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc@143
PS12, Line 143:   // staging_input_vals, which will be reused across calls to 
scalar_fn_ on the
  :   // interpreted code path..
  :   vector* input_vals = 
fn_ctx->impl()->staging_input_vals();
  :   for (int i = 0; i < NumFixedArgs(); ++i) {
  : AnyVal* input_val;
  : RETURN_IF_ERROR(AllocateAnyVal(state, 
eval->expr_perm_pool(), children_[i]->type(),
  : "Could not allocate expression value", _val));
  : input_vals->push_back(input_val);
  :   }
  :
> Yes, the problem I ran into is that code like the below code to GetConstVal
I suppose an alternative is to do:

  bool is_interpreted = codegend_compute_fn_ == nullptr;

Of course, this makes assumption about the order in which  codegend_compute_fn_ 
is populated and when this function is called so it may add unnecessary 
complication. So, I think it's okay to leave it as-is but would be nice to 
briefly document the reasoning behind it.


http://gerrit.cloudera.org:8080/#/c/12797/13/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

http://gerrit.cloudera.org:8080/#/c/12797/13/be/src/exprs/scalar-fn-call.cc@142
PS13, Line 142: We're in the interpreted path (i.e. no JIT)
Prepare input_vals in case the interpreted path is invoked during 
initialization (e.g. GetConstantValue() below).


http://gerrit.cloudera.org:8080/#/c/12797/13/be/src/exprs/scalar-fn-call.cc@144
PS13, Line 144: .
nit: extra .



--
To view, visit http://gerrit.cloudera.org:8080/12797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I839d7a3a2f5e1309c33a1f66013ef11628c5dc11
Gerrit-Change-Number: 12797
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 14 May 2019 21:30:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1653: Don't close hiveserver2 session when connection is closed

2019-05-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13306 )

Change subject: IMPALA-1653: Don't close hiveserver2 session when connection is 
closed
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@28
PS3, Line 28: timeout
: is off by default
May be worth considering setting it to some large values instead of being "off".


http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@30
PS3, Line 30: We may want to consider changing
: some of this behavior.
We probably should consider removing states of expired sessions or Impala may 
risk being attacked indirectly by clients which open many sessions and never 
close them. Previously, a client will need to keep a connection opened to DoS 
Impala but with this change, it doesn't even have to.



--
To view, visit http://gerrit.cloudera.org:8080/13306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78
Gerrit-Change-Number: 13306
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 14 May 2019 20:14:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

2019-05-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with 
S3Guard
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Tue, 14 May 2019 18:31:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] Enable data cache by default for all S3 builds

2019-05-14 Thread Michael Ho (Code Review)
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13330

to look at the new patch set (#2).

Change subject: Enable data cache by default for all S3 builds
..

Enable data cache by default for all S3 builds

Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
---
M bin/run-all-tests.sh
1 file changed, 11 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/2
--
To view, visit http://gerrit.cloudera.org:8080/13330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
Gerrit-Change-Number: 13330
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] Enable data cache by default for all S3 builds

2019-05-14 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13330


Change subject: Enable data cache by default for all S3 builds
..

Enable data cache by default for all S3 builds

Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
---
M bin/run-all-tests.sh
1 file changed, 11 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/1
--
To view, visit http://gerrit.cloudera.org:8080/13330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b
Gerrit-Change-Number: 13330
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-05-10 Thread Michael Ho (Code Review)
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Abandoned

See https://gerrit.cloudera.org/#/c/12987/
--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs

2019-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12797 )

Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
..


Patch Set 12:

(12 comments)

Thanks for the explanation. Can you please answer the question in 
scalar-fn-call.cc. Otherwise, I think this can be +2'ed.

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exec/grouping-aggregator.cc
File be/src/exec/grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exec/grouping-aggregator.cc@119
PS12, Line 119:   intermediate_row_desc_, /* is_entry_point */ false, 
state));
nit: indentation


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@110
PS12, Line 110: interpreted code.
(e.g. an operator which doesn't support codegen yet).


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@117
PS12, Line 117: /// TODO: Fix subclasses which call 
GetCodegendComputeFnWrapper() to not call interpreted
  : /// functions.
Stale ?


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@170
PS12, Line 170:   /// compilation.  Returns error status on failure.
Can you please also add something similar to below:

This function is invoked either by some other codegen functions (e.g. the 
codegen code of a join) or by RuntimeState::CodegenScalarExprs() which is 
called from FIS::Open() before LLVM compilation. These two call sites 
correspond to the two usage patterns in the class comment.


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@246
PS12, Line 246: for each
to be overridden by


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@388
PS12, Line 388: If this is true, then after the expressions are codegen'd,
  :   /// then 'codegend_compute_fn_' is non-NULL.
If this is true, 'codegend_compute_fn_' will be set to the JIT'd function 
produced by GetCodegendComputeFn() after LLVM compilation.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@284
PS10, Line 284:
> This assumption is false - GetCodegendComputeFnWrapper() generates
 > a codegen'd function that calls the interpreted path of its child.
 > We need to generate entry points there to avoid a perf cliff where
 > the whole subtree doesn't get codegen'd (which would be a
 > regression from the old behaviour too).
 >

That's true. I suppose this mostly affects ScalarFnCall, right ? Before this 
change, if the expression is not codegen'd, its children are not codegen'd 
unless the children are ScalarFnCall. In that sense, that'll be a regression. 
Are there other cases which I may have missed ?

 >
 > ScalarExpr doesn't know the context in which it is used. E.g. if
 > it's the child of an AggregateExpr, then it's not an entry point
 > but doesn't know that based on local information. There are
 > actually a bunch more places where the expr is always called from
 > codegen'd code and we could save some overhead, but I didn't go
 > through and find them all.

I see. I can see an alternate implementation is to revert the polarity of the 
"is_entry_point" flag to false by default and makes the owners of expressions 
to explicitly set 'is_entry_point' when creating / initialization the 
expressions. Those owners most likely include some operators which don't yet 
support codegen but I haven't looked through the entire code to see if there 
could be other non-trivial cases.

On the other hand, this may make it more likely to regress if we miss some 
cases as ScalarFnCall currently codegen all the time regardless of the callers. 
I suppose we kind of need to go through all owners of expressions and mark 
those which don't need entry points when addressing the TODO in Create() above.


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.cc@86
PS12, Line 86: ///
nit:  //


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.inline.h
File be/src/exprs/scalar-expr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.inline.h@24
PS12, Line 24: //
nit: ///


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.inline.h@28
PS12, Line 28: ScalarFnCall
ScalarExpr


http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc@95
PS12, Line 95: // CHAR or VARCHAR are not supported as input arguments or 
return values for UDFs.
stale ?



[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

2019-05-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h@322
PS10, Line 322: kudu::rpc::ResponseCallback callback = []() {};
  : kudu::rpc::ErrorStatusPB* error_status_pb = new 
kudu::rpc::ErrorStatusPB();
  : 
error_status_pb->set_code(kudu::rpc::ErrorStatusPB::ERROR_SERVER_TOO_BUSY);
  : kudu::rpc::OutboundCall* outbound_call = new 
kudu::rpc::OutboundCall(
  : *conn_id_, *remote_method_, resp, controller, callback);
  : outbound_call->status_ = kudu::Status::RemoteError("Remote 
error");
  : outbound_call->state_ = 
kudu::rpc::OutboundCall::FINISHED_ERROR;
  : outbound_call->error_pb_.reset(error_status_pb);
  : controller->call_.reset(outbound_call);
As discussed offline last week, we may be better off using a debug action in 
ImpalaServicePool to simulate the busy queue behavior instead of cooking this 
failure up on the client side.



--
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 09 May 2019 17:43:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7176: Increase wait time in test insert mem limit

2019-05-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13292 )

Change subject: IMPALA-7176: Increase wait time in test_insert_mem_limit
..


Patch Set 1:

Didn't know you were working on the same thing. Oh well..may be you can  update 
the commit message of your patch to include that there was testing done with 
timeout of 180s.


--
To view, visit http://gerrit.cloudera.org:8080/13292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438
Gerrit-Change-Number: 13292
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 09 May 2019 01:43:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7176: Increase wait time in test insert mem limit

2019-05-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13292


Change subject: IMPALA-7176: Increase wait time in test_insert_mem_limit
..

IMPALA-7176: Increase wait time in test_insert_mem_limit

test_insert_mem_limit in test_insert.py will wait for all fragments
to exit before proceeding after the test query hits a memory limit.
On some slower EC2 instances with Centos6, the default wait time of
60s may occasionally cause the test to fail due to time out waiting
for all fragments to exit.

This change increases the timeout to 180 seconds.

Testing done:
- Looped test_insert_mem_limit for 100 times on Centos6;

Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438
---
M tests/query_test/test_insert.py
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/13292/1
--
To view, visit http://gerrit.cloudera.org:8080/13292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438
Gerrit-Change-Number: 13292
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13271 )

Change subject: IMPALA-8512: Disable certain tests on Centos6
..


Patch Set 4:

GVO failed due to IMPALA-8527


--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 08 May 2019 17:54:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7665: Fix unwarranted query cancellation on statestore restart

2019-05-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13061 )

Change subject: IMPALA-7665: Fix unwarranted query cancellation on statestore 
restart
..


Patch Set 5: Code-Review+1

Not sure if Lars wants to take another look. Otherwise, I can +2 it.


--
To view, visit http://gerrit.cloudera.org:8080/13061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959
Gerrit-Change-Number: 13061
Gerrit-PatchSet: 5
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 07 May 2019 23:47:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-07 Thread Michael Ho (Code Review)
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13271

to look at the new patch set (#3).

Change subject: IMPALA-8512: Disable certain tests on Centos6
..

IMPALA-8512: Disable certain tests on Centos6

The data cache related tests rely on data cache files being created
successfully on local filesystem. The cache initialization may fail
if the cache directory resides on a ext filesystem which is affected
by KUDU-1508 (metadata corruption after hole punching in some files).
On some older versions of Centos6, the tests fail as a result of
this bug.

This change skips these tests if they detect that it's running on
an old system affected by KUDU-1508. This patch also disables a
filesystem-util test which relies on readdir() returning the correct
entries' types. On some older platforms such as Centos6, this feature
may not be fully supported on all filesystems.

Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
---
M be/src/runtime/io/data-cache-test.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M tests/common/environ.py
M tests/common/skip.py
M tests/custom_cluster/test_data_cache.py
7 files changed, 86 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13271/3
--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13271 )

Change subject: IMPALA-8512: Disable certain tests on Centos6
..


Patch Set 3: Code-Review+2

Carry Tim's +2


--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 07 May 2019 23:17:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-07 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13271 )

Change subject: IMPALA-8512: Disable certain tests on Centos6
..

IMPALA-8512: Disable certain tests on Centos6

The data cache related tests rely on data cache files being created
successfully on local filesystem. The cache initialization may fail
if the cache directory resides on a ext filesystem which is affected
by KUDU-1508 (metadata corruption after hole punching in some files).
On some older versions of Centos6, the tests fail as a result of
this bug.

This change skips these tests if they detect that it's running on
an old system affected by KUDU-1508. This patch also disables a
filesystem-util test which relies on readdir() returning the correct
entries' types. On some older platforms such as Centos6, this feature
may not be fully supported on all filesystems.

Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
---
M be/src/runtime/io/data-cache-test.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M tests/common/environ.py
M tests/common/skip.py
M tests/custom_cluster/test_data_cache.py
7 files changed, 86 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13271/2
--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13271 )

Change subject: IMPALA-8512: Disable certain tests on Centos6
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py
File tests/common/environ.py:

http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py@54
PS1, Line 54: IS_BUGGY_EL6_KERNEL = 'el6' in kernel_release and kernel_release 
< '2.6.32-674'
> Yes, it probably requires some more sophisticated parsing to make it more f
There are probably other python modules which allow simple parsing. Otherwise, 
it shouldn't be too hard to parse them via regular expression. Will update the 
patch on it.



--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 07 May 2019 20:14:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13271 )

Change subject: IMPALA-8512: Disable certain tests on Centos6
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py
File tests/common/environ.py:

http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py@54
PS1, Line 54: IS_BUGGY_EL6_KERNEL = 'el6' in kernel_release and kernel_release 
< '2.6.32-674'
> Does string comparison of the kernel version always work? As opposed to num
Yes, it probably requires some more sophisticated parsing to make it more 
foolproof but it gets most of what we need for our current testing environment.



--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 07 May 2019 19:56:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6

2019-05-07 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13271


Change subject: IMPALA-8512: Disable certain tests on Centos6
..

IMPALA-8512: Disable certain tests on Centos6

The data cache related tests rely on data cache files being created
successfully on local filesystem. The cache initialization may fail
if the cache directory resides on a ext filesystem which is affected
by KUDU-1508 (metadata corruption after hole punching in some files).
On some older versions of Centos6, the tests fail as a result of
this bug.

This change skips these tests if they detect that it's running on
an old system affected by KUDU-1508. This patch also disables a
filesystem-util test which relies on readdir() returning the correct
entries' types. On some older platforms such as Centos6, this feature
may not be fully supported on all filesystems.

Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
---
M be/src/runtime/io/data-cache-test.cc
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M tests/common/environ.py
M tests/common/skip.py
M tests/custom_cluster/test_data_cache.py
7 files changed, 79 insertions(+), 32 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13271/1
--
To view, visit http://gerrit.cloudera.org:8080/13271
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Gerrit-Change-Number: 13271
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-7665: Fix unwarranted query cancellation on statestore restart

2019-05-06 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13061 )

Change subject: IMPALA-7665: Fix unwarranted query cancellation on statestore 
restart
..


Patch Set 4:

(3 comments)

Thanks for fixing it. Glad that this is also pretty straightforward.

http://gerrit.cloudera.org:8080/#/c/13061/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13061/4//COMMIT_MSG@23
PS4, Line 23: fo
typo


http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.h@214
PS4, Line 214: //
nit: ///


http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.cc@196
PS4, Line 196: last_registration_ms_.Store(MonotonicMillis());
Should this be set iff status.ok() ?



--
To view, visit http://gerrit.cloudera.org:8080/13061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959
Gerrit-Change-Number: 13061
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 06 May 2019 22:13:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13242 )

Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..


Patch Set 4: Code-Review+2

(3 comments)

Carry Lars' +2.

http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py@58
PS3, Line 58: t count m
> ...verifies that the cache hit count metrics are correct.
Done


http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py@59
PS3, Line 59: es hit is
> "The exact number of bytes.."
Done


http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py@63
PS3, Line 63:  warm up the cache. Expect no hits.
> make the query a variable?
Done



--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Sat, 04 May 2019 20:48:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Hello Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13242

to look at the new patch set (#4).

Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..

IMPALA-8496: Fix flakiness of test_data_cache.py

test_data_cache.py was added as part of IMPALA-8341 to verify
that the DataCache hit / miss counts and the DataCache metrics
are working as expected. The test seems to fail intermittently
due to unexpected cache misses.

Part of the test creates a temp table from tpch_parquet.lineitem
and uses it to join against tpch_parquet.lineitem itself on the
l_orderkey column. The test expects a complete cache hit for
tpch_parquet.lineitem when joining against the temp table as it
should be cached entirely as part of CTAS statement. However, this
doesn't work as expected all the time. In particular, the data cache
internally divides up the key space into multiple shards and a key
is hashed to determine the shard it belongs to. By default, the
number of shards is the same as number of CPU cores (e.g. 16 for AWS
m5-4xlarge instance). Since the cache size is set to 500MB, each shard
will have a capacity of 31MB only. In some cases, it's possible that
some rows of l_orderkey are evicted if the shard they belong to grow
beyond 31MB. The problem is not deterministic as part of the cache key
is the modification time of the file, which changes from run-to-run as
it's essentially determined by the data loading time of the job. This
leads to flakiness of the test.

To fix this problem, this patch forces the data cache to use a single
shard only for determinisim. In addition, the test is also skipped for
non-HDFS and HDFS erasure encoding builds as it's dependent on the scan
range assignment. To exercise the cache more extensively, the plan is
to enable it by default for S3 builds instead of relying on BE and E2E
tests only.

Testing done:
- Ran test_data_cache.py 10+ times, each with different mtime
  for tpch_parquet.lineitem; Used to fail 2 out of 3 runs.

Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
---
M testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/custom_cluster/test_data_cache.py
2 files changed, 36 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/4
--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13242 )

Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13242/2/testdata/workloads/functional-query/queries/QueryTest/data-cache.test
File testdata/workloads/functional-query/queries/QueryTest/data-cache.test:

http://gerrit.cloudera.org:8080/#/c/13242/2/testdata/workloads/functional-query/queries/QueryTest/data-cache.test@41
PS2, Line 41: # Verifies that stale data from the cache is not used.
> nit: . instead of ;
Done


http://gerrit.cloudera.org:8080/#/c/13242/2/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/13242/2/tests/custom_cluster/test_data_cache.py@39
PS2, Line 39: cache_force_single_shard
> Should we have a second test without that flag that just checks that the hi
I added a second test which only checks the metrics but not the counters in the 
profile. Seems a bit tricky to support both cases in data-cache.test unless we 
loosen the checks.



--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Sat, 04 May 2019 17:56:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Hello Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13242

to look at the new patch set (#3).

Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..

IMPALA-8496: Fix flakiness of test_data_cache.py

test_data_cache.py was added as part of IMPALA-8341 to verify
that the DataCache hit / miss counts and the DataCache metrics
are working as expected. The test seems to fail intermittently
due to unexpected cache misses.

Part of the test creates a temp table from tpch_parquet.lineitem
and uses it to join against tpch_parquet.lineitem itself on the
l_orderkey column. The test expects a complete cache hit for
tpch_parquet.lineitem when joining against the temp table as it
should be cached entirely as part of CTAS statement. However, this
doesn't work as expected all the time. In particular, the data cache
internally divides up the key space into multiple shards and a key
is hashed to determine the shard it belongs to. By default, the
number of shards is the same as number of CPU cores (e.g. 16 for AWS
m5-4xlarge instance). Since the cache size is set to 500MB, each shard
will have a capacity of 31MB only. In some cases, it's possible that
some rows of l_orderkey are evicted if the shard they belong to grow
beyond 31MB. The problem is not deterministic as part of the cache key
is the modification time of the file, which changes from run-to-run as
it's essentially determined by the data loading time of the job. This
leads to flakiness of the test.

To fix this problem, this patch forces the data cache to use a single
shard only for determinisim. In addition, the test is also skipped for
non-HDFS and HDFS erasure encoding builds as it's dependent on the scan
range assignment. To exercise the cache more extensively, the plan is
to enable it by default for S3 builds instead of relying on BE and E2E
tests only.

Testing done:
- Ran test_data_cache.py 10+ times, each with different mtime
  for tpch_parquet.lineitem; Used to fail 2 out of 3 runs.

Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
---
M testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/custom_cluster/test_data_cache.py
2 files changed, 35 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/3
--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13242 )

Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..

IMPALA-8496: Fix flakiness of test_data_cache.py

test_data_cache.py was added as part of IMPALA-8341 to verify
that the DataCache hit / miss counts and the DataCache metrics
are working as expected. The test seems to fail intermittently
due to unexpected cache misses.

Part of the test creates a temp table from tpch_parquet.lineitem
and uses it to join against tpch_parquet.lineitem itself on the
l_orderkey column. The test expects a complete cache hit for
tpch_parquet.lineitem when joining against the temp table as it
should be cached entirely as part of CTAS statement. However, this
doesn't work as expected all the time. In particular, the data cache
internally divides up the key space into multiple shards and a key
is hashed to determine the shard it belongs to. By default, the
number of shards is the same as number of CPU cores (e.g. 16 for AWS
m5-4xlarge instance). Since the cache size is set to 500MB, each shard
will have a capacity of 31MB only. In some cases, it's possible that
some rows of l_orderkey are evicted if the shard they belong to grow
beyond 31MB. The problem is not deterministic as part of the cache key
is the modification time of the file, which changes from run-to-run as
it's essentially determined by the data loading time of the job. This
leads to flakiness of the test.

To fix this problem, this patch forces the data cache to use a single
shard only for determinisim. In addition, the test is also skipped for
non-HDFS and HDFS erasure encoding builds as it's dependent on the scan
range assignment. To exercise the cache more extensively, the plan is
to enable it by default for S3 builds instead of relying on BE and E2E
tests only.

Testing done:
- Ran test_data_cache.py 10+ times, each with different mtime
  for tpch_parquet.lineitem; Used to fail 2 out of 3 runs.

Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
---
M testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/custom_cluster/test_data_cache.py
2 files changed, 15 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/2
--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13242 )

Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13242/1/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/13242/1/tests/custom_cluster/test_data_cache.py@38
PS1, Line 38:
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/13242/1/tests/custom_cluster/test_data_cache.py@39
PS1, Line 39: c
> flake8: E131 continuation line unaligned for hanging indent
Done



--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Sat, 04 May 2019 17:13:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py

2019-05-04 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13242


Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py
..

IMPALA-8496: Fix flakiness of test_data_cache.py

test_data_cache.py was added as part of IMPALA-8341 to verify
that the DataCache hit / miss counts and the DataCache metrics
are working as expected. The test seems to fail intermittently
due to unexpected cache misses.

Part of the test creates a temp table from tpch_parquet.lineitem
and uses it to join against tpch_parquet.lineitem itself on the
l_orderkey column. The test expects a complete cache hit for
tpch_parquet.lineitem when joining against the temp table as it
should be cached entirely as part of CTAS statement. However, this
doesn't work as expected all the time. In particular, the data cache
internally divides up the key space into multiple shards and a key
is hashed to determine the shard it belongs to. By default, the
number of shards is the same as number of CPU cores (e.g. 16 for AWS
m5-4xlarge instance). Since the cache size is set to 500MB, each shard
will have a capacity of 31MB only. In some cases, it's possible that
some rows of l_orderkey are evicted if the shard they belong to grow
beyond 31MB. The problem is not deterministic as part of the cache key
is the modification time of the file, which changes from run-to-run as
it's essentially determined by the data loading time of the job. This
leads to flakiness of the test.

To fix this problem, this patch forces the data cache to use a single
shard only for determinisim. In addition, the test is also skipped for
non-HDFS and HDFS erasure encoding builds as it's dependent on the scan
range assignment. To exercise the cache more extensively, the plan is
to enable it by default for S3 builds instead of relying on BE and E2E
tests only.

Testing done:
- Ran test_data_cache.py 10+ times, each with different mtime
  for tpch_parquet.lineitem; Used to fail 2 out of 3 runs.

Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
---
M testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/custom_cluster/test_data_cache.py
2 files changed, 15 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/1
--
To view, visit http://gerrit.cloudera.org:8080/13242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Gerrit-Change-Number: 13242
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-05-02 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 10: Code-Review+2

(1 comment)

Carry Todd's +2

http://gerrit.cloudera.org:8080/#/c/12987/9/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/9/be/src/runtime/io/data-cache.cc@260
PS9, Line 260:   explicit CacheEntry(const Slice& value) {
> explicit :)
Done



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 03 May 2019 02:46:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-05-02 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_data_cache.py
M tests/custom_cluster/test_krpc_metrics.py
23 files changed, 2,150 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/10
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-05-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 9: Code-Review+1

Carry Lars' +1


--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 02 May 2019 02:24:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-05-01 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_data_cache.py
M tests/custom_cluster/test_krpc_metrics.py
23 files changed, 2,150 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/9
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-05-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 8:

(5 comments)

Accidentally pushed a draft. Will push PS9.

http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache-test.cc@274
PS7, Line 274:
> I think gflags has a built in 'google::FlagsSaver', no? It handles restorin
Nice. I don't think we have an equivalent of that in Impala although it's a 
nice improvement. Filed IMPALA-8480.


http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc@245
PS7, Line 245:   explicit CacheFile(std::string path) : path_(move(path)) { }
> nit: explicit
Done


http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc@301
PS7, Line 301:
> here I think you could just call lock_.DCheckLocked()  right?
Done


http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc@409
PS7, Line 409: uint8_t* buffer) {
> why not have UnpackCacheEntry return the CacheEntry object? (the generated
Done


http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/util/impalad-metrics.h
File be/src/util/impalad-metrics.h:

http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/util/impalad-metrics.h@91
PS7, Line 91: not ins
> In the rest of the code "skipped" often means "pruned" and is a good thing,
Done



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 02 May 2019 00:50:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-05-01 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_data_cache.py
M tests/custom_cluster/test_krpc_metrics.py
23 files changed, 2,150 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/8
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 7:

(41 comments)

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache-test.cc@273
PS5, Line 273:   auto s =
> I just found out we have ScopedFlagSetter in scoped-flag-setter.h, I think
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h@112
PS6, Line 112:
 :   /// 'config' is the configurati
> per the commit message, we've moved to a single quota rather than per-direc
Yes, comments were stale. Fixed in new patch.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h@261
PS6, Line 261:
> perhaps init to -1?
Done. Switched to initializing it to 0 in CreateCacheFile() instead.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@215
PS5, Line 215:
> 'start_reclaim'?
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@337
PS5, Line 337:   };
> Can you mention in the comment that the pool has only 1 thread and why you'
The reason for using a pool is that we want to be able to queue the work for 
deferred processing by another thread and ThreadPool seems to provide the right 
abstraction. May be there are other classes in utility/ which are also 
applicable ?


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@341
PS5, Line 341:
> Some functions around deleting files are called "Close...". We should point
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@306
PS4, Line 306:   cache_files_.emplace_back(std::mo
> This check is racy. More allocation could have happened since 'insertion_of
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@95
PS5, Line 95: ng files opened, ol
> switch to single thread, or mention pool here
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@112
PS5, Line 112: DeleteFile
> Can we call this DeleteFile? Otherwise there's a third thing to keep track
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@125
PS5, Line 125: d the lock in
> It's not obvious to me why we only need a percpu_rwlock here. Can you add a
Done. Please also see explanation in comments above.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@208
PS5, Line 208:
> nit: singular
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@335
PS5, Line 335: ing files left over fro
> Similar to other comments, I'd call this "VerifySizeAndDeleteFiles", I thin
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@395
PS5, Line 395: }
> Will this handle hole punching through the eviction logic?
Yes.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@436
PS5, Line 436:   // Try verifying the checksum of the new buffer matches that 
of the existing entry.
> nit: only append the "checksum $3" part if checksumming is enabled? I don't
Keeping it for simplicity.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@457
PS5, Line 457: == nullptr)) r
> start_reclaim?
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@633
PS5, Line 633: URN_IF_ERROR(p
> start_reclaim?
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@64
PS6, Line 64: DEFINE_int64(data_cache_file_max_size_bytes, 1L << 40 /* 1TB */,
> - can you add a comment here like /* 4TB */?
Done.

Yes, according to https://access.redhat.com/solutions/1532, ext4 should support 
up to 16TB. Apparently, there is still need to support ext3 which has a limit 
of 2TB. So, I will set it to 1TB to be safe.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@68
PS6, Line 68: "(Advanced) The maximum number of allowed opened files. This 
must be at least the "
> Setting this per-partition creates a dependency between this and the number
Makes sense. Done.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@70
PS6, Line 70: 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-30 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#7).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_data_cache.py
M tests/custom_cluster/test_krpc_metrics.py
23 files changed, 2,157 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/7
--
To view, visit http://gerrit.cloudera.org:8080/12987
To 

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

2019-04-29 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with 
S3Guard
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh@62
PS2, Line 62: if [[ "${S3GUARD_ENABLED}" = "true" ]]; then
May be useful to verify that S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_REGION 
are not empty if S3GUARD_ENABLED is "true" ?



--
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Mon, 29 Apr 2019 20:24:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-29 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#6).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_data_cache.py
M tests/custom_cluster/test_krpc_metrics.py
23 files changed, 2,062 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/6
--
To view, visit http://gerrit.cloudera.org:8080/12987
To 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@61
PS4, Line 61: Testing done: a new BE test was added; core test with cache 
enabled.
> I have a bit of trouble in getting this to work in the mini-cluster. May be
Ended up adding a startup flag to force use of the cache even for local reads. 
A new custom cluster test was also added for sanity check.



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 27 Apr 2019 07:27:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds,
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A testdata/workloads/functional-query/queries/QueryTest/data-cache.test
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_data_cache.py
M tests/custom_cluster/test_krpc_metrics.py
23 files changed, 2,059 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/5
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 5:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@61
PS4, Line 61: '--data_cache_dir' specifies the base directory in which each 
Impalad
> Can we add a custom cluster test to sanity check that it works end to end.
I have a bit of trouble in getting this to work in the mini-cluster. May be 
it's easier with the dockerised test.


http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@62
PS4, Line 62: creates the caching directory
> Do we want to consider enabling this by default for the dockerised tests as
Definitely. Will also do so for S3 builds.


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc@365
PS4, Line 365:   "DataCacheHitCount", TUnit::UNIT);
> Do we have tests for these to make sure they show up in the profiles and wo
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@138
PS4, Line 138:
> Move to the other test constants (TEMP_BUFFER_SIZE etc)?
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@226
PS4, Line 226:
> nit: This could now be 4 * FLAGS_data_cache_file_max_size ?
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@349
PS4, Line 349:  num_entries = 0
> Can they be to separate tests?
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@410
PS4, Line 410: mp_buf
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@83
PS4, Line 83: eviction from it happen s
> That's controlled by --data_cache_write_concurrency, right? Mention here?
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@87
PS4, Line 87: /// of 4KB so any data inserted will be rounded up to the nearest 
multiple of 4KB.
> Do we plan to look into picking partitions on faster disks with higher prob
Ideally, we want to keep the hotter data in the faster media while keeping the 
lukewarm data in the slower media. Added a TODO.


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@88
PS4, Line 88: ///
> Yeah this scenario is a bit concerning for me still since it's conceivable
I added a "simple" implementation with rw-lock and lazy cache entry eviction. 
If it's deemed too complicated, please let me know and I can undo it. Also 
added a test case for it.


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@145
PS4, Line 145: ,
> nit: formatting
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@197
PS4, Line 197: /// - removes any stale backing file in this partition
> Should we also delete the files when we close them? There's a distinction i
This was needed for data-cache-test.cc as we need to close the files before 
verifying their sizes. However, it seems that we can hide all those internal 
details in VerifyFileSizes(), which is renamed to CloseAndVerifyFileSizes();


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@203
PS4, Line 203:
> Should we pass const CacheKey& here and convert it in the implementation?
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@224
PS4, Line 224:  void EvictedEnt
> nit: VerifyFileSizes
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@72
PS4, Line 72: "(Advanced) Enable checksumming for the cached buffer.");
> static const char*?
This is actually a static class member of DataCache.


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@75
PS4, Line 75: namespace io {
> Should this be a class, given it has a c'tor and d'tor?
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@173
PS4, Line 173: ock(lock_.get_lock());
 : if (UNLIKELY(!file_)) return fals
> I think you can merge these two lines, which also reduces the risk that som
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@187
PS4, Line 187: inline
> nit: missing include, but we might generally omit this one.
Not sure which one you are referring to ? Isn't it in #include "common/names.h" 
?



[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

2019-04-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 9
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 25 Apr 2019 18:43:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8295: Fix flakiness in TestAdmissionControllerStress

2019-04-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13103 )

Change subject: IMPALA-8295: Fix flakiness in TestAdmissionControllerStress
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9ada7bdbe2461d164100badabb905cf9a8cf4cc
Gerrit-Change-Number: 13103
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Apr 2019 19:01:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8295: Fix flakiness in TestAdmissionControllerStress

2019-04-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13103 )

Change subject: IMPALA-8295: Fix flakiness in TestAdmissionControllerStress
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13103/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13103/1//COMMIT_MSG@9
PS1, Line 9: ecounters
encounters



--
To view, visit http://gerrit.cloudera.org:8080/13103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9ada7bdbe2461d164100badabb905cf9a8cf4cc
Gerrit-Change-Number: 13103
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Apr 2019 18:32:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

2019-04-23 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
..


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG@34
PS8, Line 34:  
'REPORT_EXEC_STATUS_SEND:FAIL@0.1|REPORT_EXEC_STATUS_RECV:FAIL@0.1'
Stale ?


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@400
PS8, Line 400: FLAGS_backend_client_rpc_timeout_ms
Can you please leave a TODO on rethinking the timeout period for RPCs in 
general. As we convert more of the RPCs from Thrift To KRPC, this flag's role 
has changed.

Historically, this is used for setting socket timeout with Thrift RPC so that 
an Impala thread will pop out of the Thrift RPC stack and check for 
cancellation.


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@414
PS8, Line 414: spend
spent


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@217
PS8, Line 217: "reporting is disabled.");
and only the final report is sent.


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@218
PS8, Line 218: 180
To avoid regression, let's be more conservative and set it to be at least 300s, 
if not larger.


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@221
PS8, Line 221: The percent padding that a "
 : "coordinator should wait to recieve a report after 
--status_report_max_retry_s "
 : "before deciding that a backend is unresponsive and the 
query should be cancelled.
May be clearer to include the formula you have in the commit message.


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@421
PS8, Line 421:   }
May wanna check if --status_report_cancellation_padding is set to something 
sane.


http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@2248
PS8, Line 2248:   * (1 + FLAGS_status_report_cancellation_padding / 100.0);
DCHECK_GT(max_lag_ms, 0);



--
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 23 Apr 2019 22:34:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs

2019-04-22 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12797 )

Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
..


Patch Set 10:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@87
PS10, Line 87: Get*Val() dispatches to either
 : /// a codegen'd function pointer or to an interpreted 
implementation Get*ValInterpreted()
 : /// These interpreted functions must be overridden by subclasses 
of ScalarExpr for every
 : /// type that they may return.
 : ///
 : /// The two main usage patterns for ScalarExpr are:
 : /// * The codegen'd expressions are called from other codegen'd 
functions, e.g. from a
 : ///   codegen'd join implementation
 : /// * Get*Val() is called on the root of each expression subtree 
by interpreted code.
 : /// We can optimize for the second usage pattern by filling in 
the codegen'd function
 : /// pointer (codegend_compute_fn_) in root of each ScalarExpr 
tree. Individual callsites
 : /// can disable this optimisation if it's not needed. Expr 
subtrees can be evaluated
 : /// (e.g. by ScalarExprEvaluator::GetConstValue()) but may fail 
back to a slower
 : /// interpreted implementation.
These sets of comments seem to fit better after line 107-114. Reading this 
comment, it's a bit hard to understand what's being talked about without 
knowing what codegen is. It will be clearer to first do an introduction of 
codegen and its relationship with ScalarExpr and various functions before 
pointing out that Get*Val() is a wrapper which dispatches to codegen'd function 
by default but can also fall back to the interpreted version if it's not 
codegen'd.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@170
PS10, Line 170: bool is_codegen_entry_point,
I wonder how much measurable benefit we get from this. IMHO, this can be an 
optimization as a follow-up patch. Doesn't seem 100% necessary functionally for 
this patch.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@288
PS10, Line 288: If 'is_entry_point' is true, this indicates that Get*Val()
  :   /// may be called directly from interpreted code and that we 
should generate an entry
  :   /// point into the codegen'd code.
Please see comments in scalar-expr.cc. 'is_entry_point' may not need to be 
exposed.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@349
PS10, Line 349:   /// The vast majority of exprs support interpretation, so 
default to true.
May want to point out that expressions which aren't interpretable should 
override this function etc.

Will also help to explain more elaborately why an expression is not 
interpretable. IMHO, this concept is a bit more on the side of implementation 
details and exposing this interface is a bit unfortunate as it inter-mingles 
the already complicated concept of ScalarExpr with codegen.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@375
PS10, Line 375: codegend_compute_fn_ = nullptr;
Please document that this is left as null if this scalar expression is not an 
"entry point".


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@284
PS10, Line 284: bool is_entry_point
May be I didn't follow all the details correctly but it seems to me that all 
these plumbing for passing this flag around is only used in 
GetCodegendComputeFn().

In theory, only the top level expression (i.e. the root of the scalar 
expression tree) may require exposing itself with a function pointer after 
codegen (i.e. it's an entry point).

So, why cannot 'is_entry_point' be a property of a ScalarExpr itself ? In other 
words, a simple implementation is to set 'is_entry_point_' to true for the root 
of all ScalarExpr and leave it as false for any sub-expressions in the tree. In 
this way, we don't need to plumb this flag around. More importantly, this seems 
to be implementation specific details / optimization that we probably should 
try not to expose it in the interface of Init() if possible.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@409
PS10, Line 409:   llvm::Function* static_getval_fn = 
GetStaticGetValWrapper(type(), codegen);
if (static_getval_fn == nullptr) {
   return 
}

We only get a DCHECK in debug build. Failing to codegen shouldn't be fatal.


http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.inline.h
File be/src/exprs/scalar-expr.inline.h:


<    1   2   3   4   5   6   7   8   9   10   >