[Impala-ASF-CR] [DOCS] An upgrade consideratiobn for IMPALA-7800
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13879 ) Change subject: [DOCS] An upgrade consideratiobn for IMPALA-7800 .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13879/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13879/1//COMMIT_MSG@7 PS1, Line 7: consideratiobn typo -- To view, visit http://gerrit.cloudera.org:8080/13879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia239fec4973722bc2e53239d3dc5df608a341d92 Gerrit-Change-Number: 13879 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 19 Jul 2019 22:09:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-8339: Add local node blacklist to coordinators
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13868 ) Change subject: [WIP] IMPALA-8339: Add local node blacklist to coordinators .. Patch Set 1: (16 comments) Some comments on the first pass. May be helpful to document the interaction between blacklisted node and cluster membership change. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h@284 PS1, Line 284: Exec() ExecQueryFInstance() http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc@394 PS1, Line 394: // The Exec() rpc failed, so blacklist the node. : const TBackendDescriptor& be_desc = backend_state->exec_params()->be_desc; : ExecEnv::GetInstance()->cluster_membership_mgr()->Blacklist(be_desc); One interesting case is what if the coordinator cannot RPC to itself. I suppose the assumption we make here is that blacklisting certain nodes will prevent scheduler from scheduling scans and thus the rest of the fragments on them. The best practice is to have dedicated coordinators and executors so a coordinator usually doesn't take the role of an executor. However, for certain queries, there may always be a root fragment which will run on the coordinator. This may get better as we move the root fragments off the coordinator. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@96 PS1, Line 96: Ohly typo http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206 PS1, Line 206: boost::mutex update_membership_lock_; Does this also need the mutable modifier like the other two locks ? Also, membership update seems to touch quite a number of fields in this class. May be it's better to clearly document the thread safety of each field. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206 PS1, Line 206: update_membership_lock_; Is there any chance this mutex may be held at the same time as various other mutex in this class ? If so, we had better document the lock ordering. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@119 PS1, Line 119: // Get any previously blacklisted nodes that should be unblacklisted. : std::list unblacklisted = node_blacklist_.Maintenance(); I suppose this relies on StateStore being alive and keeps sending periodic cluster membership update. It's a fair assumption that StateStore should recover within a reasonable amount of time. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@168 PS1, Line 168: // Deleted item How would this work if the deleted item is in the list unblacklisted ? http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@175 PS1, Line 175: !blacklisted) { How would it work for a node which is already removed from the executor group due to blacklisting ? Should we remove it altogether from the blacklist if the cluster memebership update indicates this is already deleted ? Otherwise, won't it eventually get into the probation state and get added back with the stale BE descriptor ? In theory, a blacklisted node can get removed from the cluster and restarts with the same or different IP address. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@265 PS1, Line 265: for (const TBackendDescriptor& be_desc : unblacklisted) { How would this work if the cluster membership in the past indicates that this node is removed from the cluster already ? This probably warrants some more detailed documentation. http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h File be/src/scheduling/node-blacklist.h: http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@30 PS1, Line 30: Class to maintain a local blacklist of backends. Can you please describe the thread safety of this class ? Is there any assumption about the locks held by caller when calling into functions of this class ? http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@38 PS1, Line 38: come typo http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@48 PS1, Line 48: class NodeBlacklist { Is this class called only from the ClusterMembershipMgr class
[Impala-ASF-CR] IMPALA-8656: Re-factor PlanRootSink into blocking and buffered implementations
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13873 ) Change subject: IMPALA-8656: Re-factor PlanRootSink into blocking and buffered implementations .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8786b1a9af68ab0a8a094970d8f955eb20d04bca Gerrit-Change-Number: 13873 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jul 2019 19:18:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8656: Re-factor PlanRootSink into blocking and buffered implementations
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13873 ) Change subject: IMPALA-8656: Re-factor PlanRootSink into blocking and buffered implementations .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/buffered-plan-root-sink.h File be/src/exec/buffered-plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/buffered-plan-root-sink.h@24 PS6, Line 24: /// PlanRootSink that buffers RowBatches from the 'sender' (fragment) thread. RowBatches : /// are buffered in memory until a memory limit is hit. This is just for the initial implementation only, right ? We kind need to support spilling to disk eventually so we can produce all the possible result rows from the underlying plan tree. As it's described now, it's not completely solving the problem of a client not fetching all result rows. http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/plan-root-sink.h File be/src/exec/plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13873/6/be/src/exec/plan-root-sink.h@a67 PS6, Line 67: : : : : : : : : nit: these are part of the interfaces of DataSink. It seems clearer to just leave them here as pure virtual functions. -- To view, visit http://gerrit.cloudera.org:8080/13873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8786b1a9af68ab0a8a094970d8f955eb20d04bca Gerrit-Change-Number: 13873 Gerrit-PatchSet: 6 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 18 Jul 2019 04:58:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8543: More diagnostics for TAcceptQueueServer
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13790 ) Change subject: IMPALA-8543: More diagnostics for TAcceptQueueServer .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp@255 PS1, Line 255: // Insert thre > Maybe make this two separate variables, eg. setup_timer and thread_wait_tim Done http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp@293 PS1, Line 293: ient $ > Can this one also be made const? Done http://gerrit.cloudera.org:8080/#/c/13790/1/be/src/rpc/TAcceptQueueServer.cpp@406 PS1, Line 406: > maybe label these, eg: /* highest_trackable_value */ Done -- To view, visit http://gerrit.cloudera.org:8080/13790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36 Gerrit-Change-Number: 13790 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 10 Jul 2019 01:26:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8543: More diagnostics for TAcceptQueueServer
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13790 ) Change subject: IMPALA-8543: More diagnostics for TAcceptQueueServer .. IMPALA-8543: More diagnostics for TAcceptQueueServer This change adds more logging information in TAcceptQueueServer to help diagnose issues at various stages of client connections establishment. Two new metrics are also added to measure the connection setup time and the wait time for service threads to be available. Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json 3 files changed, 189 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13790/2 -- To view, visit http://gerrit.cloudera.org:8080/13790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36 Gerrit-Change-Number: 13790 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-8732: Use a serialized descriptor table in TQueryCtx
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13772 ) Change subject: IMPALA-8732: Use a serialized descriptor table in TQueryCtx .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13772/4/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/13772/4/be/src/runtime/descriptors.h@539 PS4, Line 539: WARN_UNUSED_RESULT; Do DeserializeThrift() and CreateTblDecriptorInternal() also need this ? -- To view, visit http://gerrit.cloudera.org:8080/13772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I458aa62dd4d1e4e4a7b1869a604623a69f3b2d9a Gerrit-Change-Number: 13772 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 10 Jul 2019 00:34:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13818 ) Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 09 Jul 2019 05:07:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Hello Thomas Tauber-Marshall, Todd Lipcon, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13818 to look at the new patch set (#4). Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() This change fixes some callers of RpcMgr::GetProxy() tp pass the hostname instead of the resolved IP adddress. The hostname passed to RpcMgr::GetProxy() is used to construct the principal name of the remote destination when Kerberos is enabled. Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 --- M be/src/rpc/rpc-mgr.h M be/src/runtime/coordinator-backend-state.cc M be/src/service/client-request-state.cc 3 files changed, 13 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/13818/4 -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13818 ) Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. Patch Set 2: Code-Review+2 Carry Todd's +2 -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 09 Jul 2019 05:04:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13818 ) Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() This change fixes some callers of RpcMgr::GetProxy() tp pass the hostname instead of the resolved IP adddress. A new DCHECK is added to make sure calleres are not passing the resolved IP address as hostname. The hostname passed to RpcMgr::GetProxy() is used to construct the principal name of the remote destination when Kerberos is enabled. Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 --- M be/src/rpc/rpc-mgr.h M be/src/runtime/coordinator-backend-state.cc M be/src/service/client-request-state.cc 3 files changed, 13 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/13818/2 -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13818 ) Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.h@148 PS1, Line 148: has to match > if Kerberos is enabled Done http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.h@149 PS1, Line 149: f Kerberos is enabled. 'P' must descend fr > The "usually" here seems to suggest that the DCHECK could fail in some case Removed in PS2. Please see replies in rpc-mgr.inline.h http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.inline.h File be/src/rpc/rpc-mgr.inline.h: http://gerrit.cloudera.org:8080/#/c/13818/1/be/src/rpc/rpc-mgr.inline.h@44 PS1, Line 44: // Talk to self via loopback. > I don't think this DCHECK is valid -- it's OK to run things in an environme Yes, was having a hard time coming up with a DCHECK which works for all cases. I suspect Impala may fail elsewhere if DNS is disabled but there is nothing from stopping a user from configuring the hostname as 1.2.3.4 for whatever reason. Anyhow, removing it for now. FWIW, I confirmed that this DCHECK actually catches the bug which this patch is trying to fix. -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 09 Jul 2019 05:03:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8744: Fix test session expiration.py to work with Python 2.6
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13819 Change subject: IMPALA-8744: Fix test_session_expiration.py to work with Python 2.6 .. IMPALA-8744: Fix test_session_expiration.py to work with Python 2.6 This patches fixes a formatting string to use "{0}" rather than "{}" as Python 2.6 doesn't support "{}" Testing done: - Ran the test on Centos6 platform with Python 2.6 Change-Id: I3a914a9ca881181d47bee68641b9c7affbfc0d54 --- M tests/custom_cluster/test_session_expiration.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13819/1 -- To view, visit http://gerrit.cloudera.org:8080/13819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3a914a9ca881181d47bee68641b9c7affbfc0d54 Gerrit-Change-Number: 13819 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8748: Always pass hostname to RpcMgr::GetProxy()
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13818 Change subject: IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() .. IMPALA-8748: Always pass hostname to RpcMgr::GetProxy() This change fixes some callers of RpcMgr::GetProxy() tp pass the hostname instead of the resolved IP adddress. A new DCHECK is added to make sure calleres are not passing the resolved IP address as hostname. The hostname passed to RpcMgr::GetProxy() is used to construct the principal name of the remote destination when Kerberos is enabled. Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 --- M be/src/rpc/rpc-mgr.h M be/src/rpc/rpc-mgr.inline.h M be/src/runtime/coordinator-backend-state.cc M be/src/service/client-request-state.cc 4 files changed, 17 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/13818/1 -- To view, visit http://gerrit.cloudera.org:8080/13818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I85b661c8c3b3b67bfc1ce9e29aecb90a862666c0 Gerrit-Change-Number: 13818 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8341: [DOCS] Added a note about the experiemental feature
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13788 ) Change subject: IMPALA-8341: [DOCS] Added a note about the experiemental feature .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13788/1/docs/topics/impala_data_cache.xml File docs/topics/impala_data_cache.xml: http://gerrit.cloudera.org:8080/#/c/13788/1/docs/topics/impala_data_cache.xml@64 PS1, Line 64: The specified directories must exist in the local filesystem of each Impala Daemon. or Impala will fail to start. -- To view, visit http://gerrit.cloudera.org:8080/13788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9f5795a28738f7df435a3bea06ffa1a216fd04e Gerrit-Change-Number: 13788 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 03 Jul 2019 01:28:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8732: Use a serialized descriptor table in TQueryCtx
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13772 ) Change subject: IMPALA-8732: Use a serialized descriptor table in TQueryCtx .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/coordinator-backend-state.cc@200 PS1, Line 200: sidecar.__set_query_ctx(query_ctx()); Makes me wonder if we should factor out the descriptor table from query_ctx() and send the entire serialized descriptor table byte array as a sidecar instead to avoid another malloc + copy in line 213 below. If the above makes sense, it could be a follow-up change to make backporting easier. http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/descriptors.h@480 PS1, Line 480: thrift_tbl stale http://gerrit.cloudera.org:8080/#/c/13772/1/be/src/runtime/descriptors.h@485 PS1, Line 485: thrift_tbl stale -- To view, visit http://gerrit.cloudera.org:8080/13772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I458aa62dd4d1e4e4a7b1869a604623a69f3b2d9a Gerrit-Change-Number: 13772 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 03 Jul 2019 00:48:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8543: More diagnostics for TAcceptQueueServer
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13790 Change subject: IMPALA-8543: More diagnostics for TAcceptQueueServer .. IMPALA-8543: More diagnostics for TAcceptQueueServer This change adds more logging information in TAcceptQueueServer to help diagnose issues at various stages of client connections establishment. Two new metrics are also added to measure the connection setup time and the wait time for service threads to be available. Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json 3 files changed, 180 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13790/1 -- To view, visit http://gerrit.cloudera.org:8080/13790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I33b32352b457a2c8ec7bae6da46bb9c555dc9c36 Gerrit-Change-Number: 13790 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13762 ) Change subject: IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1a4c1975721c32a78a003d91babc5d2bb90f3949 Gerrit-Change-Number: 13762 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 02 Jul 2019 01:25:30 + Gerrit-HasComments: No
[Impala-ASF-CR] Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13768 ) Change subject: Update squeasel to 7973705170f4744d1806e32695f7ea1e8308ee95 .. Patch Set 1: Code-Review+2 Didn't look too much into the diff in squeasel files as they are mostly copy-and-paste from upstream I assume. LGTM otherwise. Please fix the formatting nit in webserver.cc -- To view, visit http://gerrit.cloudera.org:8080/13768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17f90561e0ea6b0917fff51b055225060a4fa549 Gerrit-Change-Number: 13768 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Mon, 01 Jul 2019 21:09:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13762 ) Change subject: IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13762/1/docs/topics/impala_client.xml File docs/topics/impala_client.xml: http://gerrit.cloudera.org:8080/#/c/13762/1/docs/topics/impala_client.xml@115 PS1, Line 115: >--accepted_client_cnxn_timeout > 0, May want to add that this is in seconds. -- To view, visit http://gerrit.cloudera.org:8080/13762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1a4c1975721c32a78a003d91babc5d2bb90f3949 Gerrit-Change-Number: 13762 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 28 Jun 2019 22:39:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Update kudu/security from 9ebcb77aa911aae76c48e717af24e643cb81908d
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13767 ) Change subject: Update kudu/security from 9ebcb77aa911aae76c48e717af24e643cb81908d .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13767 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c91193fd49f8ca1234b23cf61fc90c1fdbe2e0 Gerrit-Change-Number: 13767 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 28 Jun 2019 22:37:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections
Michael Ho has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/13764 ) Change subject: IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections .. Removed reviewer Kudu Jenkins. -- To view, visit http://gerrit.cloudera.org:8080/13764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc Gerrit-Change-Number: 13764 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13764 to review the following change. Change subject: IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections .. IMPALA-6159 / KUDU-2192: Enable TCP keepalive for all outbound connections This change enables TCP keepalive for all outbound connections. This aims to handle cases in which the remote peer may have dropped off the network without sending a TCP RST. For instance, a remote host could have hit a kernel panic and got power cycled. In which case, the existing TCP connection to that host may be stale. In an idle cluster, this stale connection may not be detected until the next use of it, in which case it will result in a RPC failure due to TCP RST sent from the restarted peer. By enabling TCP keepalive, we ensure that stale TCP connections in an idle cluster will be detected and closed within a time bound so a new connection will be created on the next use. This change introduces 3 different flags: --tcp_keepalive_probe_period_s: the duration in seconds a TCP connection has to be idle before keepalive probes started to be sent. --tcp_keepalive_retry_period_s: the duration in seconds between successive keepalive probes if previous probes didn't get an ACK from remote peer. --tcp_keepalive_retry_count: the maximum number of TCP keepalive probes sent without an ACK before declaring the remote peer as dead. Testing: - Used TCP dump to verify that keepalive probes are being sent periodically. - Verified that blocking all incoming traffic to a server's port via an iptable rule caused the TCP connection to be closed and the keepalive probes to stop eventually. Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc Reviewed-on: http://gerrit.cloudera.org:8080/13702 Reviewed-by: Alexey Serbin Reviewed-by: Todd Lipcon Tested-by: Kudu Jenkins --- M be/src/kudu/rpc/connection.cc M be/src/kudu/rpc/connection.h M be/src/kudu/rpc/reactor.cc M be/src/kudu/rpc/reactor.h M be/src/kudu/rpc/rpc-test.cc M be/src/kudu/util/net/socket.cc M be/src/kudu/util/net/socket.h 7 files changed, 112 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/13764/1 -- To view, visit http://gerrit.cloudera.org:8080/13764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaa1d66d83aea1cc82d07fc6217be5fc1306695bc Gerrit-Change-Number: 13764 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8341: [DOCS] Describe the setting for remote data caching
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13724 ) Change subject: IMPALA-8341: [DOCS] Describe the setting for remote data caching .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f Gerrit-Change-Number: 13724 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 26 Jun 2019 18:45:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8341: [DOCS] Describe the settings for remote data caching
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13724 ) Change subject: IMPALA-8341: [DOCS] Describe the settings for remote data caching .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13724/1/docs/topics/impala_data_cache.xml File docs/topics/impala_data_cache.xml: http://gerrit.cloudera.org:8080/#/c/13724/1/docs/topics/impala_data_cache.xml@44 PS1, Line 44: --data_cache_dir --data_cache_dir and --data_cache_size are options built specifically only for ./bin/start-impala-cluster.py as that script needs to create extra sub-directories for the caching directory. To specify the caching directory, the user should use the flag: --data_cache=,,: With the above configuration, data will be stored in , and respectively. The user needs to make sure those directories exist in the local filesystem to begin with. In addition, the filesystem which the directory resides in must support hole punching. Modern filesystems such as ext4 and xfs support this feature. The cache may consume up to bytes for each of the directories specified. In other words, with the above configuration, the total cache size can be up to 3 * . Please see https://github.com/apache/impala/blob/master/be/src/runtime/io/disk-io-mgr.cc#L58-L63 for the definition of the flag. -- To view, visit http://gerrit.cloudera.org:8080/13724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f Gerrit-Change-Number: 13724 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 25 Jun 2019 17:50:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 4: Code-Review+2 Carry Thomas' +2 -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 20 Jun 2019 18:02:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13607/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/13607/2/be/src/service/impala-server.cc@219 PS2, Line 219: DEFINE_int32(idle_client_poll_period_ms, 3, "The poll period, in milliseconds, after " > Maybe make this seconds not ms? I don't think there's a reason anyone would Done -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 20 Jun 2019 17:58:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. IMPALA-7802: Close connections of idle client sessions Previously, if idle session timeout is set either via startup flag or query options, a client session will expire after that set period of inactivity. However, the network connection and the service thread of an expired session will still be around until the session is closed by the client. This is highly undesirable as these idle sessions still count towards the quota bound by --fe_esrvice_threads, so if the total number of sessions (including the idle ones) reaches that upper bound, all incoming new session will block until some of the existing sessions exit. There is no time bound on when those expired sessions will be closed. In some sense, leaving many idle sessions opened is a denial-of-service attack on Impala. This change implements support for closing expired client sessions. In particular, a new flag --idle_client_poll_time_s is added to specify a time interval in seconds of client's inactivity which will cause an idle service thread of a client connection to wake up and check if all sessions associated with the connection are idle. If so, the connection will be closed. This allows the service threads to be freed up without waiting for client to close the connections. Testing done: - core build - new targeted test which verifies the connections of expired sessions are closed. - verified the flags function as expected in a secure cluster with Kerberos + SSL Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/runtime/client-cache.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M tests/custom_cluster/test_hs2.py M tests/custom_cluster/test_session_expiration.py 11 files changed, 329 insertions(+), 94 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13607/4 -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 3: Code-Review+1 Carry Tim's +1. -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 18 Jun 2019 21:48:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7467: Port ExecQueryFInstances to krpc
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13583 ) Change subject: IMPALA-7467: Port ExecQueryFInstances to krpc .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/13583/2/be/src/runtime/backend-client.h File be/src/runtime/backend-client.h: http://gerrit.cloudera.org:8080/#/c/13583/2/be/src/runtime/backend-client.h@48 PS2, Line 48: /// Callers of TransmitData() should provide their own counter to measure the data : /// transmission time. : void SetTransmitDataCounter(RuntimeProfile::ConcurrentTimerCounter* csw) { : DCHECK(transmit_csw_ == NULL); : transmit_csw_ = csw; : } : : /// ImpalaBackendClient is shared by multiple queries. It's the caller's responsibility : /// to reset the counter after data transmission. : void ResetTransmitDataCounter() { : transmit_csw_ = NULL; : } Not your change but I believe these are not needed anymore. May as well delete them while you are here. http://gerrit.cloudera.org:8080/#/c/13583/2/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/13583/2/common/protobuf/control_service.proto@235 PS2, Line 235: 5 Why the jump from 2 to 5 ? -- To view, visit http://gerrit.cloudera.org:8080/13583 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id3f1c6099109bd8e5361188005a7d0e892147570 Gerrit-Change-Number: 13583 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 18 Jun 2019 20:37:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 18 Jun 2019 06:05:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 2: (10 comments) http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@30 PS1, Line 30: > I do wonder if we should try to handle the specific case where a connection IIRC, most of the problem involving load balancer we saw in the past never quite made it to the point of completing the connection establishment handshake. In particular, most of them were stuck in the stage of SASL / Kerberos step. There is already a timeout for that step although it's a bit high by default (5 mins) but that's to prevent false positive of closing client connection when KDC may be overloaded. http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp@78 PS1, Line 78: // Setting a socket timeout for process() may lead to false positive > Ok, makes sense. Maybe leave a comment here to explain that? Done http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/thrift-server.h File be/src/rpc/thrift-server.h: http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/thrift-server.h@181 PS1, Line 181: /// Called by the Thrift server implementation when it has acquired its resources and > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/thrift-server.h@182 PS1, Line 182: /// is ready to serve, and signals to StartAndWaitForServer that start-up is finished. > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@221 PS1, Line 221: idle_session_timeout > Understood. It would be good to make it clear in the docs that idle_client_ Points taken. A doc bug will be filed once this patch is in. http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@2072 PS1, Line 2072: unique_lock l(connection_to_sessions_map_lock_); > What happens if the entry is present but the set is empty? Shouldn't that b DCHECK added. http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@2074 PS1, Line 2074: > Why not session_ids = it->second? To avoid the double-lookup. Done http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@2086 PS1, Line 2086: // Check if all the sessions associated with the connection are idle. > Minor thing, but this copy increments the refcount, and I don't think it's Done http://gerrit.cloudera.org:8080/#/c/13607/1/tests/custom_cluster/test_session_expiration.py File tests/custom_cluster/test_session_expiration.py: http://gerrit.cloudera.org:8080/#/c/13607/1/tests/custom_cluster/test_session_expiration.py@119 PS1, Line 119: @ > flake8: E303 too many blank lines (2) Done http://gerrit.cloudera.org:8080/#/c/13607/1/tests/custom_cluster/test_session_expiration.py@122 PS1, Line 122: def test_closing_idle_connection(self, vector): > I guess it's still expected that these will stay open, so maybe we don't ne Done -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 17 Jun 2019 20:48:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. IMPALA-7802: Close connections of idle client sessions Previously, if idle session timeout is set either via startup flag or query options, a client session will expire after that set period of inactivity. However, the network connection and the service thread of an expired session will still be around until the session is closed by the client. This is highly undesirable as these idle sessions still count towards the quota bound by --fe_esrvice_threads, so if the total number of sessions (including the idle ones) reaches that upper bound, all incoming new session will block until some of the existing sessions exit. There is no time bound on when those expired sessions will be closed. In some sense, leaving many idle sessions opened is a denial-of-service attack on Impala. This change implements support for closing expired client sessions. In particular, a new flag --idle_client_poll_time_ms is added to specify a time interval in milliseconds of client's inactivity which will cause an idle service thread of a client connection to wake up and check if all sessions associated with the connection are idle. If so, the connection will be closed. This allows the service threads to be freed up without waiting for client to close the connections. Testing done: - core build - new targeted test which verifies the connections of expired sessions are closed. Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M tests/custom_cluster/test_hs2.py M tests/custom_cluster/test_session_expiration.py 10 files changed, 315 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13607/2 -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98 PS3, Line 98: "impala-cache-file-"; > it currently creates the trace file with open() flags to truncate the exist Yes, makes sense. Just wanna make sure those files won't accidentally pile up over time. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 04:59:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@28 PS1, Line 28: If so, the connection will be closed. This allows the service threads > I was trying to understand this scenario: If there are no open sessions associated with a connection, the connection_to_session state map shouldn't find an entry if I understand the code correctly. In that case, we will by default mark the connection as not idle. Please see https://gerrit.cloudera.org/#/c/13607/1/be/src/service/impala-server.cc@2072 Given the above, I am not sure I understand the concern about setting it to a very low value. http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@30 PS1, Line 30: > One thing I didn't understand about the scope of this - are we trying to ha It's the former for sure. Will update commit message to make it clearer. http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/rpc/TAcceptQueueServer.cpp@78 PS1, Line 78: if (!processor_->process(input_, output_, connectionContext) || > Is it possible that the thread can get stuck in either the process() or pro Yes but that's not the intention of this patch to address those cases. It's hard to tell between a slow network/client vs a stuck client so setting a timeout here may sometimes lead to false positive and prematurely closes a client's connection. A better thing to do instead of a socket timeout is to use TCP keepalive or something similar to TCP_USER_TIMEOUT to check the remote end is still alive. http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/13607/1/be/src/service/impala-server.cc@221 PS1, Line 221: idle_session_timeout > So if idle_session_timeout is set to 10 minutes and idle_client_poll_period To the first point, yes, that's potentially the latest time the connection of an idle session will be closed. Nope. If idle_session_timeout is not set for a session, a session cannot become idle by definition. Please keep in mind that idle_session_timeout can be set per session during session establishment via query option. The global flag --idle_session_timeout is mostly there to set the default value. -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Anonymous Coward (443) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 14 Jun 2019 04:46:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7467: Ported ExecQueryFInstances over to krpc
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13583 ) Change subject: IMPALA-7467: Ported ExecQueryFInstances over to krpc .. Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/13583/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13583/1//COMMIT_MSG@9 PS1, Line 9: ExecQuerryFInstances ExecQueryFInstances http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@176 PS1, Line 176: VLOG_QUERY LOG(ERROR) ? http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@209 PS1, Line 209: serialized_buf May help to add a remark about the ownership of this buffer. http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@214 PS1, Line 214: serialized_buf = nullptr; Why is this necessary ? Also, leave a comment on why we don't need to free the buffer explicitly here. http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@218 PS1, Line 218: serialized_buf = nullptr; Same as above. http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/runtime/coordinator-backend-state.cc@233 PS1, Line 233: LOG(DFATAL) << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail(); If we fail to add the sidecar, it doesn't seem to make sense to continue with the RPC as the executor cannot do much without the content in the sidecar, right ? http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.h@112 PS1, Line 112: QueryExecMgr* query_exec_mgr_; Is this necessary ? What's wrong with getting it from the singleton ExecEnv() on each RPC call ? It shouldn't be that expensive. http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/13583/1/be/src/service/control-service.cc@73 PS1, Line 73: query_exec_mgr_ = ExecEnv::GetInstance()->query_exec_mgr(); : DCHECK(query_exec_mgr_ != nullptr); Move to initializer list. http://gerrit.cloudera.org:8080/#/c/13583/1/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/13583/1/common/thrift/ImpalaInternalService.thrift@579 PS1, Line 579: struct TExecQueryFInstancesSidecar { Do we plan to clean it up eventually to use protobuf only ? How about leaving a TODO here about converting the followings to protobuf eventually ? -- To view, visit http://gerrit.cloudera.org:8080/13583 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id3f1c6099109bd8e5361188005a7d0e892147570 Gerrit-Change-Number: 13583 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 13 Jun 2019 21:31:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@28 PS1, Line 28: If so, the connection will be closed. This allows the service threads > I think it's unlikely that HS2 clients are deliberately leaving connections The logic will not mark a connection as idle if we cannot find any sessions with a given connection in the connection to session state map. Based on my understanding, we don't remove the session from the connection to session state map even after it's closed so we should be fine in that case. -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 13 Jun 2019 19:18:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13607 ) Change subject: IMPALA-7802: Close connections of idle client sessions .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13607/1//COMMIT_MSG@28 PS1, Line 28: If so, the connection will be closed. This allows the service threads > Do we expect the connection to be closed idle_client_poll_time_ms after the We expect the session to be closed within idle_client_poll_time_ms after the last session has _expired_. The goal of this patch is to address clients which don't close their idle sessions in time, which is the biggest complaint from users now. May be it's worth adding some logic to also close the connection if there are no sessions associated with it for extended period of time. Based on your experience, do you expect it to be a common thing for clients to call CloseSession() without closing the connection ? -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 13 Jun 2019 18:25:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7802: Close connections of idle client sessions
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13607 Change subject: IMPALA-7802: Close connections of idle client sessions .. IMPALA-7802: Close connections of idle client sessions Previously, if idle session timeout is set either via startup flag or query options, a client session will expire after that set period of inactivity. However, the network connection and the service thread of an expired session will still be around until the session is closed by the client. This is highly undesirable as these idle sessions still count towards the quota bound by --fe_esrvice_threads, so if the total number of sessions (including the idle ones) reaches that upper bound, all incoming new session will block until some of the existing sessions exit. There is no time bound on when those expired sessions will be closed. In some sense, leaving many idle sessions opened is a denial-of-service attack on Impala. This change implements support for closing expired client sessions. In particular, a new flag --idle_client_poll_time_ms is added to specify a time interval in milliseconds of client's inactivity which will cause an idle service thread of a client connection to wake up and check if all sessions associated with the connection are idle. If so, the connection will be closed. This allows the service threads to be freed up without waiting for client to close the connections. Testing done: - core build - new targeted test which verifies the connections of expired sessions are closed. Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M tests/custom_cluster/test_hs2.py M tests/custom_cluster/test_session_expiration.py 10 files changed, 291 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/13607/1 -- To view, visit http://gerrit.cloudera.org:8080/13607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I97c4fb8e1b741add273f8a913fb0967303683e38 Gerrit-Change-Number: 13607 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8659: Allow self-RPCs for KRPC to go via loopback
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13592 ) Change subject: IMPALA-8659: Allow self-RPCs for KRPC to go via loopback .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13592/8/be/src/rpc/rpc-mgr.cc File be/src/rpc/rpc-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13592/8/be/src/rpc/rpc-mgr.cc@194 PS8, Line 194: // Listen on all addresses, including loopback. May help to DCHECK that the s_addr is INADDR_ANY in some way. This seems to be the assumption being made about the default ctor of Sockaddr. -- To view, visit http://gerrit.cloudera.org:8080/13592 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9dbd477769ed49c05e624f06da4e51afaaf1670d Gerrit-Change-Number: 13592 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 13 Jun 2019 01:21:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 ) Change subject: IMPALA-8143: Enhance DoRpcWithRetry(). .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12672 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390 Gerrit-Change-Number: 12672 Gerrit-PatchSet: 12 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 11 Jun 2019 17:30:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8561: Eliminate mtime=-1 for HDFS scan ranges (part 1)
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13522 ) Change subject: IMPALA-8561: Eliminate mtime=-1 for HDFS scan ranges (part 1) .. Patch Set 4: Code-Review+2 (3 comments) Thanks for fixing it. http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/handle-cache.inline.h File be/src/runtime/io/handle-cache.inline.h: http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/handle-cache.inline.h@93 PS4, Line 93: DCHECK_GT(mtime, 0); super minor nit: is 0 invalid ? http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/request-ranges.h@170 PS4, Line 170: 'mtime' matches : // the modified time of the cached file in the HDFS cache stale http://gerrit.cloudera.org:8080/#/c/13522/4/be/src/runtime/io/request-ranges.h@244 PS4, Line 244: int64_t mtime, Nice to have one comment about it. -- To view, visit http://gerrit.cloudera.org:8080/13522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48b7ed60d6ab9104b993237b4fe23de5dc058672 Gerrit-Change-Number: 13522 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 10 Jun 2019 23:31:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges
Michael Ho has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13369 ) Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges .. IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges As shown in IMPALA-8561, there are some paths in the code which create uncacheable ScanRanges. These uncacheable ScanRanges have mtime of -1. 'mtime' is used for differentiating versions of files with the same names. An mtime == -1 means the cache entry could potentially be from any versions of a file with the same name. This change skips lookup or insertion of ScanRange with negative mtime, file offset or buffer length. Testing done: Added targeted test cases in data-cache-test Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/request-ranges.h 4 files changed, 47 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/13369/3 -- To view, visit http://gerrit.cloudera.org:8080/13369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 Gerrit-Change-Number: 13369 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13369 ) Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc@267 PS1, Line 267: // Test with uncacheable 'mtime' to make sure the entry is not stored. > I actually meant those should be additional test cases, i.e. test -1 (the m Added one more test case for the uncacheable scan ranges in the latest PS. -- To view, visit http://gerrit.cloudera.org:8080/13369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 Gerrit-Change-Number: 13369 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 04 Jun 2019 20:08:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13369 ) Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges .. IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges As shown in IMPALA-8561, there are some paths in the code which create uncacheable ScanRanges. These uncacheable ScanRanges have mtime of -1. 'mtime' is used for differentiating versions of files with the same names. An mtime == -1 means the cache entry could potentially be from any versions of a file with the same name. This change skips lookup or insertion of ScanRange with negative mtime, file offset or buffer length. Testing done: Added targeted test cases in data-cache-test Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 39 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/13369/2 -- To view, visit http://gerrit.cloudera.org:8080/13369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 Gerrit-Change-Number: 13369 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13369 ) Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache-test.cc@267 PS1, Line 267: ASSERT_FALSE(cache.Store(FNAME, -1000, 0, test_buffer(), TEMP_BUFFER_SIZE)); > Maybe test with a different bogus mtime, e.g. -1 just to demonstrate th Done http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache.cc@287 PS1, Line 287: explicit CacheKey(const string& filename, int64_t mtime, int64_t offset) > Maybe add DCHECKs here to make sure that invalid values didn't slip through Done http://gerrit.cloudera.org:8080/#/c/13369/1/be/src/runtime/io/data-cache.cc@660 PS1, Line 660: int64_t bytes_to_read, uint8_t* buffer) { > I think I prefer "uncachable" instead of "illegitimate". In some sense read Done -- To view, visit http://gerrit.cloudera.org:8080/13369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 Gerrit-Change-Number: 13369 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 04 Jun 2019 01:07:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13369 ) Change subject: IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges .. Patch Set 2: Code-Review+1 Carry Lars' +1 -- To view, visit http://gerrit.cloudera.org:8080/13369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 Gerrit-Change-Number: 13369 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 04 Jun 2019 01:07:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 ) Change subject: IMPALA-8143: Enhance DoRpcWithRetry(). .. Patch Set 11: Code-Review+2 (6 comments) LGTM. Please address the nits. http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h File be/src/rpc/impala-service-pool.h: http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h@72 PS11, Line 72: string char* See https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc@192 PS11, Line 192: !debug_status.ok() UNLIKELY(!debug_status.ok()) http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h File be/src/rpc/rpc-mgr-test.h: http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h@281 PS11, Line 281: getNumberOfCalls() nit: number_of_calls() or GetNumberOfCalls() http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@310 PS11, Line 310: NULL nit: nullptr http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@331 PS11, Line 331: // DoRpcWithRetry will fail 1 in 2^40 times. : ASSERT_TRUE(status.ok()); : } : // There will be no overflows (i.e. service too busy) 1 in 2^40 times. : ASSERT_GT(overflow_metric->GetValue(), 0); :-). Subtle nits: Although they are similar this is referring to the probability of the debug action not kicking in at all in the loop above is: (1/2)^40 times. The other one on line 331 refers to the probability of the debugging action kicking in 40 times in a row. I suppose it's slightly clearer to state in (1/2)^num_rpc_retry_calls and (1/2)^num_retries. Please feel free to ignore as they are rather minor nits. http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h@183 PS11, Line 183: /// Pass 'debug_action' to DebugAction() to potentially inject errors. Please add a TODO: TODO: Clean up this interface. Replace the debug action with fault injection in RPC callbacks or other places. -- To view, visit http://gerrit.cloudera.org:8080/12672 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390 Gerrit-Change-Number: 12672 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 03 Jun 2019 23:30:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 3: (8 comments) LGTM. Mostly nits. http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG@10 PS3, Line 10: trace.txt impala-cache-trace.txt ? http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88 PS3, Line 88: "(Advanced) Collect a trace of all lookups in the data cache."); nit: the indent here and other places in this change seems to be different from the existing code. We usually indent 4 for line continuation. Seems more consistent to follow the indent 4 convention. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98 PS3, Line 98: impala-cache-trace.txt The data cache by default removes all files with names containing prefix CACHE_FILE_PREFIX at startup. I suppose we want to keep the trace files across restart to allow more flexibility for experiment, right ? In other words, it's up to users to remove the trace files. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116 PS3, Line 116: if (file_) Flush(); Does it make sense to also call file_->Close() here or does the d'tor take care of it ? http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148 PS3, Line 148: nit: unnecessary blank line. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186 PS3, Line 186: unique_ptr underlying_logger_; : unique_ptr logger_; Would be nice to briefly document these variables. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498 PS3, Line 498: if (FLAGS_data_cache_enable_tracing) { : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME)); : RETURN_IF_ERROR(tracer_->Start()); : } I wonder how this may affect the available storage space allocated for the data cache ? The traces are relatively small compared to the cached data but I wonder if a long running Impala process may accumulate a somewhat large trace file. On the other hand, tracing is mostly for experimentation so it's not a severe concern for production environment. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887 PS3, Line 887: char buf[BUF_LEN]; nit: same name as 'buf' declared outside of the scope. While functionally correct, it'd be less confusing to have no name collision. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 03 Jun 2019 22:37:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8562: Data cache should skip insertion of illegitimate ScanRanges
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13369 Change subject: IMPALA-8562: Data cache should skip insertion of illegitimate ScanRanges .. IMPALA-8562: Data cache should skip insertion of illegitimate ScanRanges As shown in IMPALA-8561, there are some paths in the code which create uncacheable ScanRanges. These uncacheable ScanRanges have mtime of -1. 'mtime' is used for differentiating versions of files with the same names. An mtime == -1 means the cache entry could potentially be from any versions of a file with the same name. This change skips lookup or insertion of ScanRange with negative mtime, file offset or buffer length. Testing done: Added targeted test cases in data-cache-test Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc 2 files changed, 24 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/13369/1 -- To view, visit http://gerrit.cloudera.org:8080/13369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391 Gerrit-Change-Number: 13369 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-8563: Update SSL ciphers used in BE tests
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13364 ) Change subject: IMPALA-8563: Update SSL ciphers used in BE tests .. Patch Set 1: Code-Review+1 Thanks for fixing it. -- To view, visit http://gerrit.cloudera.org:8080/13364 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b014361fb90afe63aed4b4608f6d6031e49cca Gerrit-Change-Number: 13364 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 May 2019 20:44:08 + Gerrit-HasComments: No
[Impala-ASF-CR] Allow data cache to be enabled optionally when running tests
Hello Lars Volker, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13330 to look at the new patch set (#4). Change subject: Allow data cache to be enabled optionally when running tests .. Allow data cache to be enabled optionally when running tests This change adds two environment variables ${DATA_CACHE_DIR} and ${DATA_CACHE_SIZE} which specify the root directory path of the data cache and the data cache size respectively. If both are set, Impala cluster will be started with data cache enabled when running E2E tests. When running with HDFS, it will also set the config option --always_use_data_cache to true to force use of the data cache all the time. Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b --- M bin/run-all-tests.sh 1 file changed, 15 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/4 -- To view, visit http://gerrit.cloudera.org:8080/13330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b Gerrit-Change-Number: 13330 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] Allow data cache to be enabled optionally when running tests
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13330 ) Change subject: Allow data cache to be enabled optionally when running tests .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13330/2/bin/run-all-tests.sh File bin/run-all-tests.sh: http://gerrit.cloudera.org:8080/#/c/13330/2/bin/run-all-tests.sh@64 PS2, Line 64: > Do we know if the choice of filesystem matters for the cache? For example, It may matter if the filesystem doesn't support hole punching. I leave it as empty in the new patch. http://gerrit.cloudera.org:8080/#/c/13330/2/bin/run-all-tests.sh@78 PS2, Line 78: _START_CLUSTER_ARGS="${TEST_S > I think the data caching is orthogonal to the filesystem, and we might want Done -- To view, visit http://gerrit.cloudera.org:8080/13330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b Gerrit-Change-Number: 13330 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 16 May 2019 18:11:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Allow data cache to be enabled optionally when running tests
Hello Lars Volker, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13330 to look at the new patch set (#3). Change subject: Allow data cache to be enabled optionally when running tests .. Allow data cache to be enabled optionally when running tests This change adds two environment variables ${DATA_CACHE_DIR} and ${DATA_CACHE_SIZE} which specify the root directory path of the data cache and the data cache size respectively. If both are set, Impala cluster will be started with data cache enabled when running E2E tests. When running with HDFS, it will also set the config option --always_use_data_cache to true to force use of the data cache all the time. Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b --- M bin/run-all-tests.sh 1 file changed, 15 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/3 -- To view, visit http://gerrit.cloudera.org:8080/13330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b Gerrit-Change-Number: 13330 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-1653: Don't close hiveserver2 session when connection is closed
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13306 ) Change subject: IMPALA-1653: Don't close hiveserver2 session when connection is closed .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@30 PS3, Line 30: We may want to consider changing : some of this behavior. > We had started discussing this a bit offline. Agree that we rely on the ses Enabling --idle_query_timeout by default sounds like a reasonable mitigation against clients which don't fetch results. As Thomas said, the session state isn't that big so we can bound it at some large numbers and do LRU reclamation or set a reasonable large limit (e.g. 24 hrs) on how long a session will be maintained by default. What was the idle_session-timeout set to when you ran tests with it enabled ? -- To view, visit http://gerrit.cloudera.org:8080/13306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78 Gerrit-Change-Number: 13306 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 15 May 2019 19:03:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1653: Don't close hiveserver2 session when connection is closed
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13306 ) Change subject: IMPALA-1653: Don't close hiveserver2 session when connection is closed .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@30 PS3, Line 30: We may want to consider changing : some of this behavior. > We probably should consider removing states of expired sessions or Impala m To be fair, Impala is already vulnerable to similar kind of DoS attack today. A client can establish many connections to an Impala server, consuming all the FE service threads while leaving the sessions idle. All other clients will be blocked by this misbehaving client. See IMPALA-7802. Before this change, we could have as many sessions as number of FE service threads. After this change, the session will be kept alive even after the connection is closed so we could in theory have way more sessions than number of FE service threads. It seems there is a more pressing need for idle session timeout after this patch as it seems easier to create the situation of issuing a query without fetching all the results, holding up resources in the cluster but I could be missing something. Previously, the client disconnection will automatically close the session and unregistering all in-flight queries, freeing up the resources. -- To view, visit http://gerrit.cloudera.org:8080/13306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78 Gerrit-Change-Number: 13306 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 15 May 2019 07:22:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12797 ) Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs .. Patch Set 13: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc@143 PS12, Line 143: // staging_input_vals, which will be reused across calls to scalar_fn_ on the : // interpreted code path.. : vector* input_vals = fn_ctx->impl()->staging_input_vals(); : for (int i = 0; i < NumFixedArgs(); ++i) { : AnyVal* input_val; : RETURN_IF_ERROR(AllocateAnyVal(state, eval->expr_perm_pool(), children_[i]->type(), : "Could not allocate expression value", _val)); : input_vals->push_back(input_val); : } : > Yes, the problem I ran into is that code like the below code to GetConstVal I suppose an alternative is to do: bool is_interpreted = codegend_compute_fn_ == nullptr; Of course, this makes assumption about the order in which codegend_compute_fn_ is populated and when this function is called so it may add unnecessary complication. So, I think it's okay to leave it as-is but would be nice to briefly document the reasoning behind it. http://gerrit.cloudera.org:8080/#/c/12797/13/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: http://gerrit.cloudera.org:8080/#/c/12797/13/be/src/exprs/scalar-fn-call.cc@142 PS13, Line 142: We're in the interpreted path (i.e. no JIT) Prepare input_vals in case the interpreted path is invoked during initialization (e.g. GetConstantValue() below). http://gerrit.cloudera.org:8080/#/c/12797/13/be/src/exprs/scalar-fn-call.cc@144 PS13, Line 144: . nit: extra . -- To view, visit http://gerrit.cloudera.org:8080/12797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I839d7a3a2f5e1309c33a1f66013ef11628c5dc11 Gerrit-Change-Number: 12797 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 14 May 2019 21:30:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1653: Don't close hiveserver2 session when connection is closed
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13306 ) Change subject: IMPALA-1653: Don't close hiveserver2 session when connection is closed .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@28 PS3, Line 28: timeout : is off by default May be worth considering setting it to some large values instead of being "off". http://gerrit.cloudera.org:8080/#/c/13306/3//COMMIT_MSG@30 PS3, Line 30: We may want to consider changing : some of this behavior. We probably should consider removing states of expired sessions or Impala may risk being attacked indirectly by clients which open many sessions and never close them. Previously, a client will need to keep a connection opened to DoS Impala but with this change, it doesn't even have to. -- To view, visit http://gerrit.cloudera.org:8080/13306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78 Gerrit-Change-Number: 13306 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 14 May 2019 20:14:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 14 May 2019 18:31:21 + Gerrit-HasComments: No
[Impala-ASF-CR] Enable data cache by default for all S3 builds
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13330 to look at the new patch set (#2). Change subject: Enable data cache by default for all S3 builds .. Enable data cache by default for all S3 builds Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b --- M bin/run-all-tests.sh 1 file changed, 11 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/2 -- To view, visit http://gerrit.cloudera.org:8080/13330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b Gerrit-Change-Number: 13330 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] Enable data cache by default for all S3 builds
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13330 Change subject: Enable data cache by default for all S3 builds .. Enable data cache by default for all S3 builds Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b --- M bin/run-all-tests.sh 1 file changed, 11 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/13330/1 -- To view, visit http://gerrit.cloudera.org:8080/13330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I09117ab289c2355408212a5fc6493ab751f4853b Gerrit-Change-Number: 13330 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] Prototype for a remote read byte cache.
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/12683 ) Change subject: Prototype for a remote read byte cache. .. Abandoned See https://gerrit.cloudera.org/#/c/12987/ -- To view, visit http://gerrit.cloudera.org:8080/12683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5 Gerrit-Change-Number: 12683 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12797 ) Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs .. Patch Set 12: (12 comments) Thanks for the explanation. Can you please answer the question in scalar-fn-call.cc. Otherwise, I think this can be +2'ed. http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exec/grouping-aggregator.cc@119 PS12, Line 119: intermediate_row_desc_, /* is_entry_point */ false, state)); nit: indentation http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@110 PS12, Line 110: interpreted code. (e.g. an operator which doesn't support codegen yet). http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@117 PS12, Line 117: /// TODO: Fix subclasses which call GetCodegendComputeFnWrapper() to not call interpreted : /// functions. Stale ? http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@170 PS12, Line 170: /// compilation. Returns error status on failure. Can you please also add something similar to below: This function is invoked either by some other codegen functions (e.g. the codegen code of a join) or by RuntimeState::CodegenScalarExprs() which is called from FIS::Open() before LLVM compilation. These two call sites correspond to the two usage patterns in the class comment. http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@246 PS12, Line 246: for each to be overridden by http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.h@388 PS12, Line 388: If this is true, then after the expressions are codegen'd, : /// then 'codegend_compute_fn_' is non-NULL. If this is true, 'codegend_compute_fn_' will be set to the JIT'd function produced by GetCodegendComputeFn() after LLVM compilation. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@284 PS10, Line 284: > This assumption is false - GetCodegendComputeFnWrapper() generates > a codegen'd function that calls the interpreted path of its child. > We need to generate entry points there to avoid a perf cliff where > the whole subtree doesn't get codegen'd (which would be a > regression from the old behaviour too). > That's true. I suppose this mostly affects ScalarFnCall, right ? Before this change, if the expression is not codegen'd, its children are not codegen'd unless the children are ScalarFnCall. In that sense, that'll be a regression. Are there other cases which I may have missed ? > > ScalarExpr doesn't know the context in which it is used. E.g. if > it's the child of an AggregateExpr, then it's not an entry point > but doesn't know that based on local information. There are > actually a bunch more places where the expr is always called from > codegen'd code and we could save some overhead, but I didn't go > through and find them all. I see. I can see an alternate implementation is to revert the polarity of the "is_entry_point" flag to false by default and makes the owners of expressions to explicitly set 'is_entry_point' when creating / initialization the expressions. Those owners most likely include some operators which don't yet support codegen but I haven't looked through the entire code to see if there could be other non-trivial cases. On the other hand, this may make it more likely to regress if we miss some cases as ScalarFnCall currently codegen all the time regardless of the callers. I suppose we kind of need to go through all owners of expressions and mark those which don't need entry points when addressing the TODO in Create() above. http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.cc@86 PS12, Line 86: /// nit: // http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.inline.h File be/src/exprs/scalar-expr.inline.h: http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.inline.h@24 PS12, Line 24: // nit: /// http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-expr.inline.h@28 PS12, Line 28: ScalarFnCall ScalarExpr http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: http://gerrit.cloudera.org:8080/#/c/12797/12/be/src/exprs/scalar-fn-call.cc@95 PS12, Line 95: // CHAR or VARCHAR are not supported as input arguments or return values for UDFs. stale ?
[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 ) Change subject: IMPALA-8143: Enhance DoRpcWithRetry(). .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h File be/src/rpc/rpc-mgr-test.h: http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h@322 PS10, Line 322: kudu::rpc::ResponseCallback callback = []() {}; : kudu::rpc::ErrorStatusPB* error_status_pb = new kudu::rpc::ErrorStatusPB(); : error_status_pb->set_code(kudu::rpc::ErrorStatusPB::ERROR_SERVER_TOO_BUSY); : kudu::rpc::OutboundCall* outbound_call = new kudu::rpc::OutboundCall( : *conn_id_, *remote_method_, resp, controller, callback); : outbound_call->status_ = kudu::Status::RemoteError("Remote error"); : outbound_call->state_ = kudu::rpc::OutboundCall::FINISHED_ERROR; : outbound_call->error_pb_.reset(error_status_pb); : controller->call_.reset(outbound_call); As discussed offline last week, we may be better off using a debug action in ImpalaServicePool to simulate the busy queue behavior instead of cooking this failure up on the client side. -- To view, visit http://gerrit.cloudera.org:8080/12672 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390 Gerrit-Change-Number: 12672 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 09 May 2019 17:43:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7176: Increase wait time in test insert mem limit
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13292 ) Change subject: IMPALA-7176: Increase wait time in test_insert_mem_limit .. Patch Set 1: Didn't know you were working on the same thing. Oh well..may be you can update the commit message of your patch to include that there was testing done with timeout of 180s. -- To view, visit http://gerrit.cloudera.org:8080/13292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438 Gerrit-Change-Number: 13292 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 09 May 2019 01:43:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7176: Increase wait time in test insert mem limit
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13292 Change subject: IMPALA-7176: Increase wait time in test_insert_mem_limit .. IMPALA-7176: Increase wait time in test_insert_mem_limit test_insert_mem_limit in test_insert.py will wait for all fragments to exit before proceeding after the test query hits a memory limit. On some slower EC2 instances with Centos6, the default wait time of 60s may occasionally cause the test to fail due to time out waiting for all fragments to exit. This change increases the timeout to 180 seconds. Testing done: - Looped test_insert_mem_limit for 100 times on Centos6; Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438 --- M tests/query_test/test_insert.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/13292/1 -- To view, visit http://gerrit.cloudera.org:8080/13292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438 Gerrit-Change-Number: 13292 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13271 ) Change subject: IMPALA-8512: Disable certain tests on Centos6 .. Patch Set 4: GVO failed due to IMPALA-8527 -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 May 2019 17:54:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7665: Fix unwarranted query cancellation on statestore restart
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13061 ) Change subject: IMPALA-7665: Fix unwarranted query cancellation on statestore restart .. Patch Set 5: Code-Review+1 Not sure if Lars wants to take another look. Otherwise, I can +2 it. -- To view, visit http://gerrit.cloudera.org:8080/13061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959 Gerrit-Change-Number: 13061 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 May 2019 23:47:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13271 to look at the new patch set (#3). Change subject: IMPALA-8512: Disable certain tests on Centos6 .. IMPALA-8512: Disable certain tests on Centos6 The data cache related tests rely on data cache files being created successfully on local filesystem. The cache initialization may fail if the cache directory resides on a ext filesystem which is affected by KUDU-1508 (metadata corruption after hole punching in some files). On some older versions of Centos6, the tests fail as a result of this bug. This change skips these tests if they detect that it's running on an old system affected by KUDU-1508. This patch also disables a filesystem-util test which relies on readdir() returning the correct entries' types. On some older platforms such as Centos6, this feature may not be fully supported on all filesystems. Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 --- M be/src/runtime/io/data-cache-test.cc M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M tests/common/environ.py M tests/common/skip.py M tests/custom_cluster/test_data_cache.py 7 files changed, 86 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13271/3 -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13271 ) Change subject: IMPALA-8512: Disable certain tests on Centos6 .. Patch Set 3: Code-Review+2 Carry Tim's +2 -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 May 2019 23:17:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13271 ) Change subject: IMPALA-8512: Disable certain tests on Centos6 .. IMPALA-8512: Disable certain tests on Centos6 The data cache related tests rely on data cache files being created successfully on local filesystem. The cache initialization may fail if the cache directory resides on a ext filesystem which is affected by KUDU-1508 (metadata corruption after hole punching in some files). On some older versions of Centos6, the tests fail as a result of this bug. This change skips these tests if they detect that it's running on an old system affected by KUDU-1508. This patch also disables a filesystem-util test which relies on readdir() returning the correct entries' types. On some older platforms such as Centos6, this feature may not be fully supported on all filesystems. Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 --- M be/src/runtime/io/data-cache-test.cc M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M tests/common/environ.py M tests/common/skip.py M tests/custom_cluster/test_data_cache.py 7 files changed, 86 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13271/2 -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13271 ) Change subject: IMPALA-8512: Disable certain tests on Centos6 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py File tests/common/environ.py: http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py@54 PS1, Line 54: IS_BUGGY_EL6_KERNEL = 'el6' in kernel_release and kernel_release < '2.6.32-674' > Yes, it probably requires some more sophisticated parsing to make it more f There are probably other python modules which allow simple parsing. Otherwise, it shouldn't be too hard to parse them via regular expression. Will update the patch on it. -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 May 2019 20:14:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13271 ) Change subject: IMPALA-8512: Disable certain tests on Centos6 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py File tests/common/environ.py: http://gerrit.cloudera.org:8080/#/c/13271/1/tests/common/environ.py@54 PS1, Line 54: IS_BUGGY_EL6_KERNEL = 'el6' in kernel_release and kernel_release < '2.6.32-674' > Does string comparison of the kernel version always work? As opposed to num Yes, it probably requires some more sophisticated parsing to make it more foolproof but it gets most of what we need for our current testing environment. -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 May 2019 19:56:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8512: Disable certain tests on Centos6
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13271 Change subject: IMPALA-8512: Disable certain tests on Centos6 .. IMPALA-8512: Disable certain tests on Centos6 The data cache related tests rely on data cache files being created successfully on local filesystem. The cache initialization may fail if the cache directory resides on a ext filesystem which is affected by KUDU-1508 (metadata corruption after hole punching in some files). On some older versions of Centos6, the tests fail as a result of this bug. This change skips these tests if they detect that it's running on an old system affected by KUDU-1508. This patch also disables a filesystem-util test which relies on readdir() returning the correct entries' types. On some older platforms such as Centos6, this feature may not be fully supported on all filesystems. Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 --- M be/src/runtime/io/data-cache-test.cc M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M tests/common/environ.py M tests/common/skip.py M tests/custom_cluster/test_data_cache.py 7 files changed, 79 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/13271/1 -- To view, visit http://gerrit.cloudera.org:8080/13271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Gerrit-Change-Number: 13271 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-7665: Fix unwarranted query cancellation on statestore restart
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13061 ) Change subject: IMPALA-7665: Fix unwarranted query cancellation on statestore restart .. Patch Set 4: (3 comments) Thanks for fixing it. Glad that this is also pretty straightforward. http://gerrit.cloudera.org:8080/#/c/13061/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13061/4//COMMIT_MSG@23 PS4, Line 23: fo typo http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.h File be/src/statestore/statestore-subscriber.h: http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.h@214 PS4, Line 214: // nit: /// http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.cc File be/src/statestore/statestore-subscriber.cc: http://gerrit.cloudera.org:8080/#/c/13061/4/be/src/statestore/statestore-subscriber.cc@196 PS4, Line 196: last_registration_ms_.Store(MonotonicMillis()); Should this be set iff status.ok() ? -- To view, visit http://gerrit.cloudera.org:8080/13061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959 Gerrit-Change-Number: 13061 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 06 May 2019 22:13:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13242 ) Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. Patch Set 4: Code-Review+2 (3 comments) Carry Lars' +2. http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py File tests/custom_cluster/test_data_cache.py: http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py@58 PS3, Line 58: t count m > ...verifies that the cache hit count metrics are correct. Done http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py@59 PS3, Line 59: es hit is > "The exact number of bytes.." Done http://gerrit.cloudera.org:8080/#/c/13242/3/tests/custom_cluster/test_data_cache.py@63 PS3, Line 63: warm up the cache. Expect no hits. > make the query a variable? Done -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Sat, 04 May 2019 20:48:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13242 to look at the new patch set (#4). Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. IMPALA-8496: Fix flakiness of test_data_cache.py test_data_cache.py was added as part of IMPALA-8341 to verify that the DataCache hit / miss counts and the DataCache metrics are working as expected. The test seems to fail intermittently due to unexpected cache misses. Part of the test creates a temp table from tpch_parquet.lineitem and uses it to join against tpch_parquet.lineitem itself on the l_orderkey column. The test expects a complete cache hit for tpch_parquet.lineitem when joining against the temp table as it should be cached entirely as part of CTAS statement. However, this doesn't work as expected all the time. In particular, the data cache internally divides up the key space into multiple shards and a key is hashed to determine the shard it belongs to. By default, the number of shards is the same as number of CPU cores (e.g. 16 for AWS m5-4xlarge instance). Since the cache size is set to 500MB, each shard will have a capacity of 31MB only. In some cases, it's possible that some rows of l_orderkey are evicted if the shard they belong to grow beyond 31MB. The problem is not deterministic as part of the cache key is the modification time of the file, which changes from run-to-run as it's essentially determined by the data loading time of the job. This leads to flakiness of the test. To fix this problem, this patch forces the data cache to use a single shard only for determinisim. In addition, the test is also skipped for non-HDFS and HDFS erasure encoding builds as it's dependent on the scan range assignment. To exercise the cache more extensively, the plan is to enable it by default for S3 builds instead of relying on BE and E2E tests only. Testing done: - Ran test_data_cache.py 10+ times, each with different mtime for tpch_parquet.lineitem; Used to fail 2 out of 3 runs. Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 --- M testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/custom_cluster/test_data_cache.py 2 files changed, 36 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/4 -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13242 ) Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13242/2/testdata/workloads/functional-query/queries/QueryTest/data-cache.test File testdata/workloads/functional-query/queries/QueryTest/data-cache.test: http://gerrit.cloudera.org:8080/#/c/13242/2/testdata/workloads/functional-query/queries/QueryTest/data-cache.test@41 PS2, Line 41: # Verifies that stale data from the cache is not used. > nit: . instead of ; Done http://gerrit.cloudera.org:8080/#/c/13242/2/tests/custom_cluster/test_data_cache.py File tests/custom_cluster/test_data_cache.py: http://gerrit.cloudera.org:8080/#/c/13242/2/tests/custom_cluster/test_data_cache.py@39 PS2, Line 39: cache_force_single_shard > Should we have a second test without that flag that just checks that the hi I added a second test which only checks the metrics but not the counters in the profile. Seems a bit tricky to support both cases in data-cache.test unless we loosen the checks. -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Sat, 04 May 2019 17:56:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13242 to look at the new patch set (#3). Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. IMPALA-8496: Fix flakiness of test_data_cache.py test_data_cache.py was added as part of IMPALA-8341 to verify that the DataCache hit / miss counts and the DataCache metrics are working as expected. The test seems to fail intermittently due to unexpected cache misses. Part of the test creates a temp table from tpch_parquet.lineitem and uses it to join against tpch_parquet.lineitem itself on the l_orderkey column. The test expects a complete cache hit for tpch_parquet.lineitem when joining against the temp table as it should be cached entirely as part of CTAS statement. However, this doesn't work as expected all the time. In particular, the data cache internally divides up the key space into multiple shards and a key is hashed to determine the shard it belongs to. By default, the number of shards is the same as number of CPU cores (e.g. 16 for AWS m5-4xlarge instance). Since the cache size is set to 500MB, each shard will have a capacity of 31MB only. In some cases, it's possible that some rows of l_orderkey are evicted if the shard they belong to grow beyond 31MB. The problem is not deterministic as part of the cache key is the modification time of the file, which changes from run-to-run as it's essentially determined by the data loading time of the job. This leads to flakiness of the test. To fix this problem, this patch forces the data cache to use a single shard only for determinisim. In addition, the test is also skipped for non-HDFS and HDFS erasure encoding builds as it's dependent on the scan range assignment. To exercise the cache more extensively, the plan is to enable it by default for S3 builds instead of relying on BE and E2E tests only. Testing done: - Ran test_data_cache.py 10+ times, each with different mtime for tpch_parquet.lineitem; Used to fail 2 out of 3 runs. Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 --- M testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/custom_cluster/test_data_cache.py 2 files changed, 35 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/3 -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13242 ) Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. IMPALA-8496: Fix flakiness of test_data_cache.py test_data_cache.py was added as part of IMPALA-8341 to verify that the DataCache hit / miss counts and the DataCache metrics are working as expected. The test seems to fail intermittently due to unexpected cache misses. Part of the test creates a temp table from tpch_parquet.lineitem and uses it to join against tpch_parquet.lineitem itself on the l_orderkey column. The test expects a complete cache hit for tpch_parquet.lineitem when joining against the temp table as it should be cached entirely as part of CTAS statement. However, this doesn't work as expected all the time. In particular, the data cache internally divides up the key space into multiple shards and a key is hashed to determine the shard it belongs to. By default, the number of shards is the same as number of CPU cores (e.g. 16 for AWS m5-4xlarge instance). Since the cache size is set to 500MB, each shard will have a capacity of 31MB only. In some cases, it's possible that some rows of l_orderkey are evicted if the shard they belong to grow beyond 31MB. The problem is not deterministic as part of the cache key is the modification time of the file, which changes from run-to-run as it's essentially determined by the data loading time of the job. This leads to flakiness of the test. To fix this problem, this patch forces the data cache to use a single shard only for determinisim. In addition, the test is also skipped for non-HDFS and HDFS erasure encoding builds as it's dependent on the scan range assignment. To exercise the cache more extensively, the plan is to enable it by default for S3 builds instead of relying on BE and E2E tests only. Testing done: - Ran test_data_cache.py 10+ times, each with different mtime for tpch_parquet.lineitem; Used to fail 2 out of 3 runs. Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 --- M testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/custom_cluster/test_data_cache.py 2 files changed, 15 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/2 -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13242 ) Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/13242/1/tests/custom_cluster/test_data_cache.py File tests/custom_cluster/test_data_cache.py: http://gerrit.cloudera.org:8080/#/c/13242/1/tests/custom_cluster/test_data_cache.py@38 PS1, Line 38: > flake8: E502 the backslash is redundant between brackets Done http://gerrit.cloudera.org:8080/#/c/13242/1/tests/custom_cluster/test_data_cache.py@39 PS1, Line 39: c > flake8: E131 continuation line unaligned for hanging indent Done -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Sat, 04 May 2019 17:13:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8496: Fix flakiness of test data cache.py
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13242 Change subject: IMPALA-8496: Fix flakiness of test_data_cache.py .. IMPALA-8496: Fix flakiness of test_data_cache.py test_data_cache.py was added as part of IMPALA-8341 to verify that the DataCache hit / miss counts and the DataCache metrics are working as expected. The test seems to fail intermittently due to unexpected cache misses. Part of the test creates a temp table from tpch_parquet.lineitem and uses it to join against tpch_parquet.lineitem itself on the l_orderkey column. The test expects a complete cache hit for tpch_parquet.lineitem when joining against the temp table as it should be cached entirely as part of CTAS statement. However, this doesn't work as expected all the time. In particular, the data cache internally divides up the key space into multiple shards and a key is hashed to determine the shard it belongs to. By default, the number of shards is the same as number of CPU cores (e.g. 16 for AWS m5-4xlarge instance). Since the cache size is set to 500MB, each shard will have a capacity of 31MB only. In some cases, it's possible that some rows of l_orderkey are evicted if the shard they belong to grow beyond 31MB. The problem is not deterministic as part of the cache key is the modification time of the file, which changes from run-to-run as it's essentially determined by the data loading time of the job. This leads to flakiness of the test. To fix this problem, this patch forces the data cache to use a single shard only for determinisim. In addition, the test is also skipped for non-HDFS and HDFS erasure encoding builds as it's dependent on the scan range assignment. To exercise the cache more extensively, the plan is to enable it by default for S3 builds instead of relying on BE and E2E tests only. Testing done: - Ran test_data_cache.py 10+ times, each with different mtime for tpch_parquet.lineitem; Used to fail 2 out of 3 runs. Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 --- M testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/custom_cluster/test_data_cache.py 2 files changed, 15 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/13242/1 -- To view, visit http://gerrit.cloudera.org:8080/13242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9 Gerrit-Change-Number: 13242 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 10: Code-Review+2 (1 comment) Carry Todd's +2 http://gerrit.cloudera.org:8080/#/c/12987/9/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/9/be/src/runtime/io/data-cache.cc@260 PS9, Line 260: explicit CacheEntry(const Slice& value) { > explicit :) Done -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 May 2019 02:46:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds, --data_cache_max_files_per_partition, files are deleted in the order in which they are created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: - added a new BE and EE test - exhaustive (debug, release) builds with cache enabled - core ASAN build with cache enabled Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json A testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/common/custom_cluster_test_suite.py M tests/common/impala_test_suite.py A tests/custom_cluster/test_data_cache.py M tests/custom_cluster/test_krpc_metrics.py 23 files changed, 2,150 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/10 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id:
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 9: Code-Review+1 Carry Lars' +1 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 02 May 2019 02:24:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds, --data_cache_max_files_per_partition, files are deleted in the order in which they are created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: - added a new BE and EE test - exhaustive (debug, release) builds with cache enabled - core ASAN build with cache enabled Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json A testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/common/custom_cluster_test_suite.py M tests/common/impala_test_suite.py A tests/custom_cluster/test_data_cache.py M tests/custom_cluster/test_krpc_metrics.py 23 files changed, 2,150 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/9 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id:
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 8: (5 comments) Accidentally pushed a draft. Will push PS9. http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache-test.cc@274 PS7, Line 274: > I think gflags has a built in 'google::FlagsSaver', no? It handles restorin Nice. I don't think we have an equivalent of that in Impala although it's a nice improvement. Filed IMPALA-8480. http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc@245 PS7, Line 245: explicit CacheFile(std::string path) : path_(move(path)) { } > nit: explicit Done http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc@301 PS7, Line 301: > here I think you could just call lock_.DCheckLocked() right? Done http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/runtime/io/data-cache.cc@409 PS7, Line 409: uint8_t* buffer) { > why not have UnpackCacheEntry return the CacheEntry object? (the generated Done http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/util/impalad-metrics.h File be/src/util/impalad-metrics.h: http://gerrit.cloudera.org:8080/#/c/12987/7/be/src/util/impalad-metrics.h@91 PS7, Line 91: not ins > In the rest of the code "skipped" often means "pruned" and is a good thing, Done -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 02 May 2019 00:50:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds, --data_cache_max_files_per_partition, files are deleted in the order in which they are created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: - added a new BE and EE test - exhaustive (debug, release) builds with cache enabled - core ASAN build with cache enabled Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json A testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/common/custom_cluster_test_suite.py M tests/common/impala_test_suite.py A tests/custom_cluster/test_data_cache.py M tests/custom_cluster/test_krpc_metrics.py 23 files changed, 2,150 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/8 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id:
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 7: (41 comments) http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache-test.cc@273 PS5, Line 273: auto s = > I just found out we have ScopedFlagSetter in scoped-flag-setter.h, I think Done http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h@112 PS6, Line 112: : /// 'config' is the configurati > per the commit message, we've moved to a single quota rather than per-direc Yes, comments were stale. Fixed in new patch. http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h@261 PS6, Line 261: > perhaps init to -1? Done. Switched to initializing it to 0 in CreateCacheFile() instead. http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@215 PS5, Line 215: > 'start_reclaim'? Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@337 PS5, Line 337: }; > Can you mention in the comment that the pool has only 1 thread and why you' The reason for using a pool is that we want to be able to queue the work for deferred processing by another thread and ThreadPool seems to provide the right abstraction. May be there are other classes in utility/ which are also applicable ? http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@341 PS5, Line 341: > Some functions around deleting files are called "Close...". We should point Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@306 PS4, Line 306: cache_files_.emplace_back(std::mo > This check is racy. More allocation could have happened since 'insertion_of Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@95 PS5, Line 95: ng files opened, ol > switch to single thread, or mention pool here Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@112 PS5, Line 112: DeleteFile > Can we call this DeleteFile? Otherwise there's a third thing to keep track Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@125 PS5, Line 125: d the lock in > It's not obvious to me why we only need a percpu_rwlock here. Can you add a Done. Please also see explanation in comments above. http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@208 PS5, Line 208: > nit: singular Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@335 PS5, Line 335: ing files left over fro > Similar to other comments, I'd call this "VerifySizeAndDeleteFiles", I thin Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@395 PS5, Line 395: } > Will this handle hole punching through the eviction logic? Yes. http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@436 PS5, Line 436: // Try verifying the checksum of the new buffer matches that of the existing entry. > nit: only append the "checksum $3" part if checksumming is enabled? I don't Keeping it for simplicity. http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@457 PS5, Line 457: == nullptr)) r > start_reclaim? Done http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@633 PS5, Line 633: URN_IF_ERROR(p > start_reclaim? Done http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@64 PS6, Line 64: DEFINE_int64(data_cache_file_max_size_bytes, 1L << 40 /* 1TB */, > - can you add a comment here like /* 4TB */? Done. Yes, according to https://access.redhat.com/solutions/1532, ext4 should support up to 16TB. Apparently, there is still need to support ext3 which has a limit of 2TB. So, I will set it to 1TB to be safe. http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@68 PS6, Line 68: "(Advanced) The maximum number of allowed opened files. This must be at least the " > Setting this per-partition creates a dependency between this and the number Makes sense. Done. http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@70 PS6, Line 70:
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12987 to look at the new patch set (#7). Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds, --data_cache_max_files_per_partition, files are deleted in the order in which they are created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: - added a new BE and EE test - exhaustive (debug, release) builds with cache enabled - core ASAN build with cache enabled Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json A testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/common/custom_cluster_test_suite.py M tests/common/impala_test_suite.py A tests/custom_cluster/test_data_cache.py M tests/custom_cluster/test_krpc_metrics.py 23 files changed, 2,157 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/7 -- To view, visit http://gerrit.cloudera.org:8080/12987 To
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh@62 PS2, Line 62: if [[ "${S3GUARD_ENABLED}" = "true" ]]; then May be useful to verify that S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_REGION are not empty if S3GUARD_ENABLED is "true" ? -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 29 Apr 2019 20:24:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12987 to look at the new patch set (#6). Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds, --data_cache_max_files_per_partition, files are deleted in the order in which they are created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: - added a new BE and EE test - exhaustive (debug, release) builds with cache enabled - core ASAN build with cache enabled Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json A testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/common/custom_cluster_test_suite.py M tests/common/impala_test_suite.py A tests/custom_cluster/test_data_cache.py M tests/custom_cluster/test_krpc_metrics.py 23 files changed, 2,062 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/6 -- To view, visit http://gerrit.cloudera.org:8080/12987 To
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@61 PS4, Line 61: Testing done: a new BE test was added; core test with cache enabled. > I have a bit of trouble in getting this to work in the mini-cluster. May be Ended up adding a startup flag to force use of the cache even for local reads. A new custom cluster test was also added for sanity check. -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 27 Apr 2019 07:27:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds, --data_cache_max_files_per_partition, files are deleted in the order in which they are created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: - added a new BE and EE test - exhaustive (debug, release) builds with cache enabled - core ASAN build with cache enabled Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json A testdata/workloads/functional-query/queries/QueryTest/data-cache.test M tests/common/custom_cluster_test_suite.py M tests/common/impala_test_suite.py A tests/custom_cluster/test_data_cache.py M tests/custom_cluster/test_krpc_metrics.py 23 files changed, 2,059 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/5 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id:
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 5: (32 comments) http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@61 PS4, Line 61: '--data_cache_dir' specifies the base directory in which each Impalad > Can we add a custom cluster test to sanity check that it works end to end. I have a bit of trouble in getting this to work in the mini-cluster. May be it's easier with the dockerised test. http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@62 PS4, Line 62: creates the caching directory > Do we want to consider enabling this by default for the dockerised tests as Definitely. Will also do so for S3 builds. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc@365 PS4, Line 365: "DataCacheHitCount", TUnit::UNIT); > Do we have tests for these to make sure they show up in the profiles and wo Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@138 PS4, Line 138: > Move to the other test constants (TEMP_BUFFER_SIZE etc)? Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@226 PS4, Line 226: > nit: This could now be 4 * FLAGS_data_cache_file_max_size ? Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@349 PS4, Line 349: num_entries = 0 > Can they be to separate tests? Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@410 PS4, Line 410: mp_buf > nit: typo Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@83 PS4, Line 83: eviction from it happen s > That's controlled by --data_cache_write_concurrency, right? Mention here? Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@87 PS4, Line 87: /// of 4KB so any data inserted will be rounded up to the nearest multiple of 4KB. > Do we plan to look into picking partitions on faster disks with higher prob Ideally, we want to keep the hotter data in the faster media while keeping the lukewarm data in the slower media. Added a TODO. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@88 PS4, Line 88: /// > Yeah this scenario is a bit concerning for me still since it's conceivable I added a "simple" implementation with rw-lock and lazy cache entry eviction. If it's deemed too complicated, please let me know and I can undo it. Also added a test case for it. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@145 PS4, Line 145: , > nit: formatting Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@197 PS4, Line 197: /// - removes any stale backing file in this partition > Should we also delete the files when we close them? There's a distinction i This was needed for data-cache-test.cc as we need to close the files before verifying their sizes. However, it seems that we can hide all those internal details in VerifyFileSizes(), which is renamed to CloseAndVerifyFileSizes(); http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@203 PS4, Line 203: > Should we pass const CacheKey& here and convert it in the implementation? Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@224 PS4, Line 224: void EvictedEnt > nit: VerifyFileSizes Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@72 PS4, Line 72: "(Advanced) Enable checksumming for the cached buffer."); > static const char*? This is actually a static class member of DataCache. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@75 PS4, Line 75: namespace io { > Should this be a class, given it has a c'tor and d'tor? Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@173 PS4, Line 173: ock(lock_.get_lock()); : if (UNLIKELY(!file_)) return fals > I think you can merge these two lines, which also reduces the risk that som Done http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@187 PS4, Line 187: inline > nit: missing include, but we might generally omit this one. Not sure which one you are referring to ? Isn't it in #include "common/names.h" ?
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 9 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 25 Apr 2019 18:43:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8295: Fix flakiness in TestAdmissionControllerStress
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13103 ) Change subject: IMPALA-8295: Fix flakiness in TestAdmissionControllerStress .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9ada7bdbe2461d164100badabb905cf9a8cf4cc Gerrit-Change-Number: 13103 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 19:01:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8295: Fix flakiness in TestAdmissionControllerStress
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13103 ) Change subject: IMPALA-8295: Fix flakiness in TestAdmissionControllerStress .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13103/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13103/1//COMMIT_MSG@9 PS1, Line 9: ecounters encounters -- To view, visit http://gerrit.cloudera.org:8080/13103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9ada7bdbe2461d164100badabb905cf9a8cf4cc Gerrit-Change-Number: 13103 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 18:32:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG@34 PS8, Line 34: 'REPORT_EXEC_STATUS_SEND:FAIL@0.1|REPORT_EXEC_STATUS_RECV:FAIL@0.1' Stale ? http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@400 PS8, Line 400: FLAGS_backend_client_rpc_timeout_ms Can you please leave a TODO on rethinking the timeout period for RPCs in general. As we convert more of the RPCs from Thrift To KRPC, this flag's role has changed. Historically, this is used for setting socket timeout with Thrift RPC so that an Impala thread will pop out of the Thrift RPC stack and check for cancellation. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@414 PS8, Line 414: spend spent http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@217 PS8, Line 217: "reporting is disabled."); and only the final report is sent. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@218 PS8, Line 218: 180 To avoid regression, let's be more conservative and set it to be at least 300s, if not larger. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@221 PS8, Line 221: The percent padding that a " : "coordinator should wait to recieve a report after --status_report_max_retry_s " : "before deciding that a backend is unresponsive and the query should be cancelled. May be clearer to include the formula you have in the commit message. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@421 PS8, Line 421: } May wanna check if --status_report_cancellation_padding is set to something sane. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@2248 PS8, Line 2248: * (1 + FLAGS_status_report_cancellation_padding / 100.0); DCHECK_GT(max_lag_ms, 0); -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 22:34:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12797 ) Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs .. Patch Set 10: (12 comments) http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@87 PS10, Line 87: Get*Val() dispatches to either : /// a codegen'd function pointer or to an interpreted implementation Get*ValInterpreted() : /// These interpreted functions must be overridden by subclasses of ScalarExpr for every : /// type that they may return. : /// : /// The two main usage patterns for ScalarExpr are: : /// * The codegen'd expressions are called from other codegen'd functions, e.g. from a : /// codegen'd join implementation : /// * Get*Val() is called on the root of each expression subtree by interpreted code. : /// We can optimize for the second usage pattern by filling in the codegen'd function : /// pointer (codegend_compute_fn_) in root of each ScalarExpr tree. Individual callsites : /// can disable this optimisation if it's not needed. Expr subtrees can be evaluated : /// (e.g. by ScalarExprEvaluator::GetConstValue()) but may fail back to a slower : /// interpreted implementation. These sets of comments seem to fit better after line 107-114. Reading this comment, it's a bit hard to understand what's being talked about without knowing what codegen is. It will be clearer to first do an introduction of codegen and its relationship with ScalarExpr and various functions before pointing out that Get*Val() is a wrapper which dispatches to codegen'd function by default but can also fall back to the interpreted version if it's not codegen'd. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@170 PS10, Line 170: bool is_codegen_entry_point, I wonder how much measurable benefit we get from this. IMHO, this can be an optimization as a follow-up patch. Doesn't seem 100% necessary functionally for this patch. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@288 PS10, Line 288: If 'is_entry_point' is true, this indicates that Get*Val() : /// may be called directly from interpreted code and that we should generate an entry : /// point into the codegen'd code. Please see comments in scalar-expr.cc. 'is_entry_point' may not need to be exposed. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@349 PS10, Line 349: /// The vast majority of exprs support interpretation, so default to true. May want to point out that expressions which aren't interpretable should override this function etc. Will also help to explain more elaborately why an expression is not interpretable. IMHO, this concept is a bit more on the side of implementation details and exposing this interface is a bit unfortunate as it inter-mingles the already complicated concept of ScalarExpr with codegen. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@375 PS10, Line 375: codegend_compute_fn_ = nullptr; Please document that this is left as null if this scalar expression is not an "entry point". http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@284 PS10, Line 284: bool is_entry_point May be I didn't follow all the details correctly but it seems to me that all these plumbing for passing this flag around is only used in GetCodegendComputeFn(). In theory, only the top level expression (i.e. the root of the scalar expression tree) may require exposing itself with a function pointer after codegen (i.e. it's an entry point). So, why cannot 'is_entry_point' be a property of a ScalarExpr itself ? In other words, a simple implementation is to set 'is_entry_point_' to true for the root of all ScalarExpr and leave it as false for any sub-expressions in the tree. In this way, we don't need to plumb this flag around. More importantly, this seems to be implementation specific details / optimization that we probably should try not to expose it in the interface of Init() if possible. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@409 PS10, Line 409: llvm::Function* static_getval_fn = GetStaticGetValWrapper(type(), codegen); if (static_getval_fn == nullptr) { return } We only get a DCHECK in debug build. Failing to codegen shouldn't be fatal. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.inline.h File be/src/exprs/scalar-expr.inline.h: