[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags
Henry Robinson has posted comments on this change. Change subject: IMPALA-5659: Build shared libs for gflags .. Patch Set 1: Verified+1 Passed a toolchain build -- To view, visit http://gerrit.cloudera.org:8080/7415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags
Henry Robinson has submitted this change and it was merged. Change subject: IMPALA-5659: Build shared libs for gflags .. IMPALA-5659: Build shared libs for gflags Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244 --- M source/gflags/build.sh 1 file changed, 2 insertions(+), 1 deletion(-) Approvals: Henry Robinson: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil
Kim Jin Chul has posted comments on this change. Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7414/3/be/src/gutil/hash/hash.h File be/src/gutil/hash/hash.h: PS3, Line 218: As I told in the previous comment, _gnu_ctx is replaced into std. https://gerrit.cloudera.org/#/c/7414/1/be/src/gutil/hash/hash.h@a278 Line 251 You may know std already defined pointer type as below. /// Partial specializations for pointer types. template struct hash<_Tp*> : public __hash_base { size_t operator()(_Tp* __p) const noexcept { return reinterpret_cast(__p); } }; I guess your hash function is more faster than std's implementation, but there are pros and cons. I found the interesting article: https://stackoverflow.com/questions/20953390/what-is-the-fastest-hash-function-for-pointers I have the following issue on this. Please answer it. How to implement a customized hash specialization for pointer type. I removed your code because it cannot be redefined. I think we need to introduce a new template class which extends std::hash if we want to keep own hash. -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil
Kim Jin Chul has uploaded a new patch set (#3). Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil .. IMPALA-5116: Remove deprecated hash_* types in gutil The following class templates are substituted from C++11 standard. __gnu_cxx::hash_map => std::unordered_map __gnu_cxx::hash_set => std::unordered_set __gnu_cxx::hash => std::hash Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 --- M be/src/gutil/hash/hash.cc M be/src/gutil/hash/hash.h M be/src/gutil/strings/join.h M be/src/gutil/strings/serialize.cc M be/src/gutil/strings/serialize.h M be/src/gutil/strings/split.cc M be/src/gutil/strings/split.h 7 files changed, 37 insertions(+), 121 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7414/3 -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. IMPALA-5650: Make sum_init_zero a SUM function The recent Parquet count(*) optimization (IMPALA-5036) introduced a new aggregate function sum_init_zero(). This caused an issue for non partitioned aggregation node, which checks that the function type is either count, min, max, sum or ndv. Since sum_init_zero() was not labelled as a sum() function, a DCHECK was hit. This issue is fixed by labelling sum_init_zero() as a sum() function. Testing: - Verified that we no longer hit a DCHECK when running a query with a Parquet count(*) optimization with legacy agg node enabled. Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Reviewed-on: http://gerrit.cloudera.org:8080/7404 Reviewed-by: Michael Ho Tested-by: Impala Public Jenkins --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn.cc 2 files changed, 10 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h File be/src/exec/topn-node.h: Line 100: boost::scoped_ptr> tuple_row_comparator_wrapper_; > True, but every time POP_HEAP or PUSH_HEAP is used, it will incur the cost The only member of ComparatorWrapper is the TupleRowComparator&. If you treat the wrapper object as a value (i.e. don't heap allocate it) any decent compiler will be able to optimise that into just passing the TupleRowComparator* pointer around. -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Henry Robinson has posted comments on this change. Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7419/1//COMMIT_MSG Commit Message: PS1, Line 10: expilictly explicitly PS1, Line 10: expilictly set : default query options were also populated in the runtime profile. can you clarify what this means - what's an explicitly set default query option? One that's set by the user to the default value? http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: PS1, Line 151: exec_request nit: use exec_request_ http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/query-options.cc File be/src/service/query-options.cc: PS1, Line 84: can you quickly explain why this is right? It looks to me like we would want to include any option that is set, if there is no default value for it. http://gerrit.cloudera.org:8080/#/c/7419/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: PS1, Line 112: query = "set mem_limit = 0" can you change this to some other query option, then test that MEM_LIMIT=8589934592 is still in the profile as well? -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has uploaded a new patch set (#4). Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. IMPALA-5520: TopN node periodically reclaims old allocations Currently TopN retains old string allocations in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage. With this commit, the TopN node will periodically re-materialise the rows stored in the priority queue and reclaim the old allocations. This is done when the number of rows removed from the priority queue is more than twice the N (limit + offset). Moreover, a new counter called "TuplePoolReclamations" is added to the TopN node that keeps track of the number of times the tuple pool is reclaimed. Testing: Test added to test_tpch_queries.py which sets a low mem_limit such that the test would fail if reclamation is not implemented and pass otherwise. Performance: Query 1 (expected general case): select * from tpch.lineitem order by l_orderkey desc limit 10; Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node): select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10; With Reclaim Without Reclaim Query 1 Query 2 Query 1 Query 2 MaxTuplePoolMem3.96 KB 3.43 KB 110.2 MB708.8 MB Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms Time (stdev) 74.38ms 67.45ms 102.71ms70.44ms Reclaims910 5861 N/A N/A We notice that memory footprint is orders of magnitude lower while maintaining similar query runtimes. Cluster perf testing will be done later. Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 --- M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M tests/query_test/test_tpch_queries.py 4 files changed, 124 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/4 -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: PS3, Line 244: tuple_row_comparator_wrapper_ > I think something like ComparatorWrapper(*tuple_row_less_than_) will work. Please refer to related comment in topn-node.h PS3, Line 259: NULL > It would be best to pass in the RuntimeState* - that enables better logging Done PS3, Line 271: NULL > Same here. Done http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h File be/src/exec/topn-node.h: Line 36: #define PUSH_HEAP(v, comp, elem) v.push_back(elem); \ > It's probably best to just make it an inline function, it doesn't seem like Done Line 100: boost::scoped_ptr> tuple_row_comparator_wrapper_; > We don't need to store this additional state, since all the state is in tup True, but every time POP_HEAP or PUSH_HEAP is used, it will incur the cost of creating a new wrapper object,which will esp be a problem in InsertTupleRow method PS3, Line 134: vector > Also mention that it's updated with push_heap()/pop_heap(). Done -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections
Henry Robinson has posted comments on this change. Change subject: IMPALA-5394: Handle blocked HS2 connections .. Patch Set 3: (1 comment) Patch looks pretty good. It would be great to still respect fe_service_threads, if possible, as I know of some users who set that flag to control the concurrency on a particular Impala instance. It shouldn't be too hard to have TAcceptQueueServer::Task limit itself to have only fe_service_threads active at one time with a condition variable and a shared atomic integer. Do you think that's something you could add to this patch? http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: Line 210 > The prior comment mentioned potential thread safety issues, but I didn't se IIRC, the issue was that we didn't know for sure whether the various transport and protocol factories (particularly with SSL enabled) were thread-safe. Might be worth keeping this at 1 until we know for sure one way or the other. -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Sherman Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations .. IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations We were missing accounting for this, since it is part of the expected difference between query and process memory consumption. The identity that applies is is: buffers allocated from system = reservation + cached buffers - unused reservation Where "cached buffers" includes free buffers and buffers attached to clean pages. The reservation is accounted against queries and "buffers allocated from system" is accounted against the process MemTracker. Reporting this in a direct way required adding a concept of a MemTracker with negative consumption, which fortunately did not require any major changes to the MemTracker code. Example output when applied to buffer pool branch: Process: Limit=8.35 GB Total=579.18 MB Peak=590.41 MB Buffer Pool: Free Buffers: Total=268.25 MB Buffer Pool: Clean Pages: Total=172.25 MB Buffer Pool: Unused Reservation: Total=-8.25 MB Free Disk IO Buffers: Total=21.98 MB Peak=21.98 MB RequestPool=default-pool: Total=12.07 MB Peak=71.58 MB ... ... RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB Untracked Memory: Total=112.88 MB Testing: Added a basic test for MemTrackers with negative metrics. Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad --- M be/src/runtime/exec-env.cc M be/src/runtime/mem-tracker-test.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M be/src/util/pretty-printer-test.cc M be/src/util/pretty-printer.h M common/thrift/metrics.json 9 files changed, 78 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7380/3 -- To view, visit http://gerrit.cloudera.org:8080/7380 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Bikramjeet Vig has uploaded a new change for review. http://gerrit.cloudera.org:8080/7419 Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. IMPALA-4276: Profile displays non-default query options set by planner Fix to populate the non-default query options set by planner in the runtime profile. Additionally fixed a bug where expilictly set default query options were also populated in the runtime profile. Added a corresponding test case. Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab --- M be/src/service/client-request-state.cc M be/src/service/query-options.cc M tests/query_test/test_observability.py 3 files changed, 24 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/1 -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/7418 Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty libraries .. IMPALA-5659: Begin standardizing treatment of thirdparty libraries If Impala was built with --build_shared_libs, some thirdparty libraries were still statically linked; this could cause runtime errors if the libraries were also linked into a .so. This patch fixes that issue (for gflags, glog and protobuf at least) by ensuring that build_shared_libs is respected for those libraries. * Standardize thirdparty library handling w/CMake by adding IMPALA_ADD_THIRDPARTY_LIB. This creates a symbolic name for each library, allowing us to switch the underlying library files (e.g. change from static to dynamic linking) without having to individually change the link clauses for each target. * Remove most cases of add_library() from cmake_modules/* - that is all handled by IMPALA_ADD_THIRDPARTY_LIB. * Add shared library detection for a couple of thirdparty dependencies (many only detect static libraries), just to prove the concept. * All thirdparty libraries now print a standard set of messages. For example: -- --> Adding thirdparty library protoc. <-- -- Header files: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/include -- Added shared library dependency protoc: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/lib/libprotoc.so -- --> Adding thirdparty library libev. <-- -- Header files: /data/henry/src/cloudera/impala-toolchain/libev-4.20/include -- Added shared library dependency libev: -- /data/henry/src/cloudera/impala-toolchain/libev-4.20/lib/libev.so * Some libraries don't quite fit this pattern (LLVM and Boost) - leave them as is for now. * Remove FindOpenSSL.cmake - the toolchain one is more modern. Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d --- M CMakeLists.txt M be/CMakeLists.txt M cmake_modules/FindAvro.cmake M cmake_modules/FindBreakpad.cmake M cmake_modules/FindGFlags.cmake M cmake_modules/FindGLog.cmake M cmake_modules/FindGTest.cmake M cmake_modules/FindHDFS.cmake M cmake_modules/FindLdap.cmake M cmake_modules/FindLz4.cmake D cmake_modules/FindOpenSSL.cmake M cmake_modules/FindRe2.cmake M cmake_modules/FindSnappy.cmake M cmake_modules/FindZlib.cmake M cmake_modules/kudu_cmake_fns.txt 15 files changed, 117 insertions(+), 260 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/7418/1 -- To view, visit http://gerrit.cloudera.org:8080/7418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4703: reservation denial debug action
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4703: reservation denial debug action .. Patch Set 7: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/7022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil
Kim Jin Chul has posted comments on this change. Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h File be/src/gutil/hash/hash.h: Line 222 > OK, so it looks like you've added it back in. Before we can decide if it sh I guess the reason is that __gnu_cxx::hash<> was not used anywhere. I didn't find any use case for this. Would you please double check? Line 278 > So, do you need it to still exist, and if so, how do you make sure std::has I got it. My idea is that some specializations in __gnu_cxx should be moved to std namespace. I can be sure this will not create a potential problem. Do you agree on this? Or any suggestion? -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil
Jim Apple has posted comments on this change. Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h File be/src/gutil/hash/hash.h: Line 222 > You're right. std::hash template class is specialized between 1 byte and 8 OK, so it looks like you've added it back in. Before we can decide if it should stay or not, let'd understand why it didn't break the build before. Line 278 > You're right. It should make wrong result or corruption. So, do you need it to still exist, and if so, how do you make sure std::hash, which may not be __gnu_cxx::hash, is specialized? -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7416/1//COMMIT_MSG Commit Message: Line 1: Parent: 1d8274a8 (IMPALA-5585: Fix expr-test to call last_day() tests) > Thanks! can you also add a ToSqlTest? Done -- To view, visit http://gerrit.cloudera.org:8080/7416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
Bharath Vissapragada has uploaded a new patch set (#2). Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS .. IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS Bugs: - FunctionCallExpr's toSql() doesn't include IGNORE NULLS if present causing view definitions to break and leading to incorrect results. - FunctionCallExpr's clone() implementation doesn't carry forward IGNORE NULLS option if present. One case that breaks with this is querying views containing analytic exprs causing wrong plans. Fixed both the bugs and added a test that can reliably reproduce this. Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test 3 files changed, 33 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7416/2 -- To view, visit http://gerrit.cloudera.org:8080/7416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4674: Part 3: fix null-aware anti join
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 3: fix null-aware anti join .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/7367/1//COMMIT_MSG Commit Message: PS1, Line 13: note not Line 25: Instead we just iterate over the rows of the stream. > are there join class comments that should be updated to explain this strate I updated the member declarations to mentioned when things are pinned/unpinned and updated the EvaluateNullProbe() comment. I think this covers all of the points in the commit message. http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 251: RETURN_IF_ERROR(null_aware_partition_->Spill(BufferedTupleStreamV2::UNPIN_ALL)); > it's a bit confusing that we call Partition::Spill() when Partition::is_spi Improved the comment. http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS1, Line 95: /// Null aware anti-join (NAAJ) extends the above algorithm by accumulating rows with : /// NULLs into several different streams, which are processed in a separate step to : /// produce additional output rows. The NAAJ algorithm is documented in more detail in : /// header comments for the null aware functions and data structures. > any of this need updates (or the comments this references)? Done. It would be nice to have a clearer explanation of the NAAJ here (i.e. why NULLs should be treated that wasy) but that seems out of scope for this change. http://gerrit.cloudera.org:8080/#/c/7367/1/testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test File testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test: Line 3: set max_block_mgr_memory=10m; Moved below comment Line 8: # This returns the rows returned from as garbled Line 17: where l_suppkey = 4162 and l_shipmode = 'AIR' and l_returnflag = 'A' and l_shipdate > '1993-01-01' and Fixed long lines in files -- To view, visit http://gerrit.cloudera.org:8080/7367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 3: fix null-aware anti join
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4674: Part 3: fix null-aware anti join .. IMPALA-4674: Part 3: fix null-aware anti join Part 2 regressed NAAJ by tightening up the spilling invariants (e.g. can't unpin with spilling disabled) but we didn't have tests for spilling NAAJs that could detect the regression. This patch adds those tests, fixes the regressions, and improves NAAJ by reliably spilling the probe side and not trying to bring the whole probe side into memory. The changes are: * All null-aware streams start off in memory and are only unpinned if spilling is enabled. * The null-aware build partition can be spilled in the same way as hash partitions. * Probe streams are unpinned whenever there is memory pressure - if spilling is enabled and either a build partition is spilled or appending to the probe stream fails. * Spilled probe streams are not re-pinned in EvaluateNullProbe(). Instead we just iterate over the rows of the stream. Testing: Add query tests where the three different buckets of rows are large enough to spill: the build and probe of the null-aware partition and the null probe rows. Test both spilling and in-memory (with spilling disabled) cases. Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1 --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h A testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test M tests/query_test/test_spilling.py 7 files changed, 425 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7367/2 -- To view, visit http://gerrit.cloudera.org:8080/7367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil
Kim Jin Chul has posted comments on this change. Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil .. Patch Set 2: (7 comments) Thanks for your kind review. Would you please review again? http://gerrit.cloudera.org:8080/#/c/7414/1//COMMIT_MSG Commit Message: PS1, Line 7: Remove deprecated hash_* types in gutil > "Remove deprecated hash_* types in gutil" Done PS1, Line 9: tut > from Done PS1, Line 9: class tem > class templates Done http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h File be/src/gutil/hash/hash.h: Line 222 > I don't think gcc ships with a std::hash overload for uint128. Why doesn't You're right. std::hash template class is specialized between 1 byte and 8 bytes. I agree this is necessary for unsigned 16 bytes. Line 278 > http://en.cppreference.com/w/cpp/utility/hash : You're right. It should make wrong result or corruption. http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/strings/split.h File be/src/gutil/strings/split.h: Line 135 > Has this fact about container-ism changed? You're right. It's my mistake. pair cannot be a container. Line 132: // multimap, unordered_set and unordered_map, and even std::pair which is not > Fix early line break Done -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil
Kim Jin Chul has uploaded a new patch set (#2). Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil .. IMPALA-5116: Remove deprecated hash_* types in gutil The following class templates are substituted from C++11 standard. __gnu_cxx::hash_map => std::unordered_map __gnu_cxx::hash_set => std::unordered_set __gnu_cxx::hash => std::hash Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 --- M be/src/gutil/hash/hash.cc M be/src/gutil/hash/hash.h M be/src/gutil/strings/join.h M be/src/gutil/strings/serialize.cc M be/src/gutil/strings/serialize.h M be/src/gutil/strings/split.cc M be/src/gutil/strings/split.h 7 files changed, 35 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7414/2 -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil
Jim Apple has posted comments on this change. Change subject: IMPALA-5116: Removal of uses for the deprecated functions in gutil .. Patch Set 1: (7 comments) THank you for sending this patch! A few questions: http://gerrit.cloudera.org:8080/#/c/7414/1//COMMIT_MSG Commit Message: PS1, Line 7: Removal of uses for the deprecated functions in gutil "Remove deprecated hash_* types in gutil" PS1, Line 9: functions class templates PS1, Line 9: for from http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h File be/src/gutil/hash/hash.h: Line 222 I don't think gcc ships with a std::hash overload for uint128. Why doesn't removing this break the build? Line 278 http://en.cppreference.com/w/cpp/utility/hash : "std::hash produces a hash of the value of the pointer (the memory address)" http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/strings/split.h File be/src/gutil/strings/split.h: Line 135 Has this fact about container-ism changed? Line 132: // multimap, unordered_set and unordered_map, and Fix early line break -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. Patch Set 26: (1 comment) http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS26, Line 149: if (mem_limit == -1) mem_limit = numeric_limits::max(); > Filed IMPALA-5661. What did the old system do? was there a block mgr limit per query when there was no mem limit? i'm wondering if the new system can starve out queries more easily when running without mem limits. Of course, the real answer is integrating with admission control, but since that won't happen till later. -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 26 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. Patch Set 32: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 32 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5801 to look at the new patch set (#32). Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. IMPALA-4674: Part 2: port backend exec to BufferPool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). The query's initial reservation is computed in the planner, claimed centrally (managed by the InitialReservations class) and distributed to query operators from there. min_spillable_buffer_size and default_spillable_buffer_size query options control the buffer size that the planner selects for spilling operators. Port ExecNodes to use BufferPool: * Each ExecNode has to claim its reservation during Open() * Port Sorter to use BufferPool. * Switch from BufferedTupleStream to BufferedTupleStreamV2 * Port HashTable to use BufferPool via a Suballocator. This also makes PAGG memory consumption more efficient (avoid wasting buffers) and improve the spilling algorithm: * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. * Rebuild spilled partitions instead of repartitioning (IMPALA-2708) TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream * Implement max_row_size query option. Testing: * Updated tests to reflect new memory requirements Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/CMakeLists.txt D be/src/runtime/buffered-block-mgr-test.cc D be/src/runtime/buffered-block-mgr.cc D be/src/runtime/buffered-block-mgr.h D be/src/runtime/buffered-tuple-stream-test.cc D be/src/runtime/buffered-tuple-stream.cc D be/src/runtime/buffered-tuple-stream.h D be/src/runtime/buffered-tuple-stream.inline.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc A be/src/runtime/initial-reservations.cc A be/src/runtime/initial-reservations.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.h M be/src/service/client-request-state.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter.h M be/src/util/static-asserts.cc M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Kudu
[Impala-ASF-CR] IMPALA-4674: Part 1: remove old aggs and joins
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: remove old aggs and joins .. Patch Set 9: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/7102 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. Patch Set 31: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 31 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5801 to look at the new patch set (#31). Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. IMPALA-4674: Part 2: port backend exec to BufferPool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). The query's initial reservation is computed in the planner, claimed centrally (managed by the InitialReservations class) and distributed to query operators from there. min_spillable_buffer_size and default_spillable_buffer_size query options control the buffer size that the planner selects for spilling operators. Port ExecNodes to use BufferPool: * Each ExecNode has to claim its reservation during Open() * Port Sorter to use BufferPool. * Switch from BufferedTupleStream to BufferedTupleStreamV2 * Port HashTable to use BufferPool via a Suballocator. This also makes PAGG memory consumption more efficient (avoid wasting buffers) and improve the spilling algorithm: * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. * Rebuild spilled partitions instead of repartitioning (IMPALA-2708) TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream * Implement max_row_size query option. Testing: * Updated tests to reflect new memory requirements Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/CMakeLists.txt D be/src/runtime/buffered-block-mgr-test.cc D be/src/runtime/buffered-block-mgr.cc D be/src/runtime/buffered-block-mgr.h D be/src/runtime/buffered-tuple-stream-test.cc D be/src/runtime/buffered-tuple-stream.cc D be/src/runtime/buffered-tuple-stream.h D be/src/runtime/buffered-tuple-stream.inline.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc A be/src/runtime/initial-reservations.cc A be/src/runtime/initial-reservations.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.h M be/src/service/client-request-state.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter.h M be/src/util/static-asserts.cc M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Kudu
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. Patch Set 29: (4 comments) http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS30, Line 1246: > typo Done http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 366: << PrettyPrinter::Print(bytes_limit, TUnit::BYTES); > maybe log bufffer_pool_capacity as well? or do we do that already somewhere Done http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS26, Line 149: bytes_limit = query_options().mem_limit; > How about we file a jira for this and consider it for a follow on change? o Filed IMPALA-5661. I agree that the process-wide buffer pool limit kicks in automatically so we don't need to add a query-local buffer pool limit if there's no query mem_limit. Restructured the code to exploit that fact and be less confusing. http://gerrit.cloudera.org:8080/#/c/5801/29/tests/query_test/test_mem_usage_scaling.py File tests/query_test/test_mem_usage_scaling.py: Line 126:'Q21' : 187, 'Q22' : 125} > Is that because we have to assume all fragment executions overlap, or pessi I think the fragment overlapping is the main problem. Mostly within a single fragment I'd expect it to be pretty accurate, since in the old code the small buffers get allocated on Open() around the same time as the reservation is acquired. -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 29 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Fold some lines
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/7417 Change subject: [DOCS] Fold some lines .. [DOCS] Fold some lines Putting on a separate line causes an extra leading blank line in PDF output. (I think that's a bug in the PDF transform, but what can you do...) I took one file that had a few instances of such examples, and folded the first line of the code example onto the same line as the tag. Using this minor change to demo upstream -> downstream cherry-picking techniques. Change-Id: I563a71ab9e59c691264c1f6a088e71f4722d3097 --- M docs/topics/impala_connecting.xml 1 file changed, 3 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7417/1 -- To view, visit http://gerrit.cloudera.org:8080/7417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I563a71ab9e59c691264c1f6a088e71f4722d3097 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/859/ -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5659: Build shared libs for gflags .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Michael Ho has posted comments on this change. Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7416/1//COMMIT_MSG Commit Message: Line 1: Parent: 1d8274a8 (IMPALA-5585: Fix expr-test to call last_day() tests) Thanks! can you also add a ToSqlTest? -- To view, visit http://gerrit.cloudera.org:8080/7416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. IMPALA-5650: Make sum_init_zero a SUM function The recent Parquet count(*) optimization (IMPALA-5036) introduced a new aggregate function sum_init_zero(). This caused an issue for non partitioned aggregation node, which checks that the function type is either count, min, max, sum or ndv. Since sum_init_zero() was not labelled as a sum() function, a DCHECK was hit. This issue is fixed by labelling sum_init_zero() as a sum() function. Testing: - Verified that we no longer hit a DCHECK when running a query with a Parquet count(*) optimization with legacy agg node enabled. Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn.cc 2 files changed, 10 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/7404/4 -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Hello Lars Volker, Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7404 to look at the new patch set (#4). Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. IMPALA-5650: Make sum_init_zero a SUM function The recent Parquet count(*) optimization (IMPALA-5036) introduced a new aggregate function sum_init_zero(). This caused an issue for non partitioned aggregation node, which checks that the function type is either count, min, max, sum or ndv. Since sum_init_zero() was not labelled as a sum() function, a DCHECK was hit. This issue is fixed by labelling sum_init_zero() as a sum() function. Testing: - Verified that we no longer hit a DCHECK when running a query with a Parquet count(*) optimization with legacy agg node enabled. Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn.cc 2 files changed, 10 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/7404/4 -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7404/3/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1590: // 'slot_desc' is not nullable if the aggregate function is sum_init_zero(). > Also add a comment stating that: In which case, the slot is initialized to Done Line 1592: slot_desc->CodegenSetNullIndicator( > Alternately, DCHECK(slot_desc->is_nullable() || agg_fn->fn_name() == "sum_i Added a similar DCHECK to what was suggested. -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
Bharath Vissapragada has uploaded a new change for review. http://gerrit.cloudera.org:8080/7416 Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS .. IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS Bugs: - FunctionCallExpr's toSql() doesn't include IGNORE NULLS if present causing view definitions to break and leading to incorrect results. - FunctionCallExpr's clone() implementation doesn't carry forward IGNORE NULLS option if present. One case that breaks with this is querying views containing analytic exprs causing wrong plans. Fixed both the bugs and added a test that can reliably reproduce this. Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test 2 files changed, 27 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7416/1 -- To view, visit http://gerrit.cloudera.org:8080/7416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada
[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/7415 Change subject: IMPALA-5659: Build shared libs for gflags .. IMPALA-5659: Build shared libs for gflags Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244 --- M source/gflags/build.sh 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/15/7415/1 -- To view, visit http://gerrit.cloudera.org:8080/7415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
Michael Brown has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: PS3, Line 143: grant_rev_db > def test_role_privilege_case(self, vector, unique_database): Got it. Filed IMPALA-5660 which shouldn't block this. -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 3: fix null-aware anti join
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 3: fix null-aware anti join .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/7367/1//COMMIT_MSG Commit Message: PS1, Line 13: note not Line 25: Instead we just iterate over the rows of the stream. are there join class comments that should be updated to explain this strategy? http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 251: RETURN_IF_ERROR(null_aware_partition_->Spill(BufferedTupleStreamV2::UNPIN_ALL)); it's a bit confusing that we call Partition::Spill() when Partition::is_spilled() is already true. The comment here helps, but maybe something we can say in the Spill() function comment about this use of Spill()? do we do this elsewhere? http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS1, Line 95: /// Null aware anti-join (NAAJ) extends the above algorithm by accumulating rows with : /// NULLs into several different streams, which are processed in a separate step to : /// produce additional output rows. The NAAJ algorithm is documented in more detail in : /// header comments for the null aware functions and data structures. any of this need updates (or the comments this references)? -- To view, visit http://gerrit.cloudera.org:8080/7367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function
Michael Ho has posted comments on this change. Change subject: IMPALA-5650: Make sum_init_zero a SUM function .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7404/3/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1590: // 'slot_desc' is not nullable if the aggregate function is sum_init_zero(). Also add a comment stating that: In which case, the slot is initialized to be zero and null bit is cleared. Line 1592: slot_desc->CodegenSetNullIndicator( > Can you add a DCHECK to make sure it's not sum_init_zero()? Alternately, DCHECK(slot_desc->is_nullable() || agg_fn->fn_name() == "sum_init_zero"); -- To view, visit http://gerrit.cloudera.org:8080/7404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool .. Patch Set 30: Code-Review+2 (4 comments) +2 assuming we do the addition flag in a separate change. Good luck with the checkin. http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS30, Line 1246: nitialise typo http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 366: << PrettyPrinter::Print(bytes_limit, TUnit::BYTES); maybe log bufffer_pool_capacity as well? or do we do that already somewhere else? http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS26, Line 149: bytes_limit = query_options().mem_limit; > hmm i'm not sure, maybe. What about the case where there is no query mem_l How about we file a jira for this and consider it for a follow on change? or do you want to settle it in this commit? http://gerrit.cloudera.org:8080/#/c/5801/29/tests/query_test/test_mem_usage_scaling.py File tests/query_test/test_mem_usage_scaling.py: Line 126:'Q21' : 187, 'Q22' : 125} > Part of it is that the values were either stale or obtained in a different Is that because we have to assume all fragment executions overlap, or pessimism within a single fragment, or both? anyway, I added a comment to the IMPALA-3200 test doc, we can discuss there. -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 30 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: Line 61: } > Done Looks like you missed the second part of this (I wasn't that clear). I don't think the call to ReclaimTuplePool() needs to be cross-compiled to LLVM IR. It can be called after InsertBatch() instead of from inside InsertBatch(). http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: PS3, Line 244: tuple_row_comparator_wrapper_ I think something like ComparatorWrapper(*tuple_row_less_than_) will work. PS3, Line 259: NULL It would be best to pass in the RuntimeState* - that enables better logging of the failure. PS3, Line 271: NULL Same here. http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h File be/src/exec/topn-node.h: Line 36: #define PUSH_HEAP(v, comp, elem) v.push_back(elem); \ It's probably best to just make it an inline function, it doesn't seem like it needs to be a macro. Line 100: boost::scoped_ptr> tuple_row_comparator_wrapper_; We don't need to store this additional state, since all the state is in tuple_row_less_than_ - we can just create a wrapper where needed. PS3, Line 134: vector Also mention that it's updated with push_heap()/pop_heap(). -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5627: fix dropped statuses in HDFS writers .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/7372/4/be/src/exec/hdfs-avro-table-writer.h File be/src/exec/hdfs-avro-table-writer.h: Line 71: virtual Status Init(); > Can we add WARN_UNUSED_RESULT to virtual functions, too? I intended to add it to the base class but missed doing that. Thanks for catching. Added that and then added "override" declarations here so it's clearer that they're overridden functions. http://gerrit.cloudera.org:8080/#/c/7372/4/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 194: // aggregates. Returns true if the value was appended successfully to the current page. > Should this also mention that the method updates next_page_encoding_? Done Line 260: Encoding::type next_page_encoding_; // Preferred encoding for the next page. > Is this ever not used as the encoding of the next page? Otherwise it may be It always is used. Updated. I'd prefer not to document where the variable is set in this case since I think it will get out of sync with the code too easily, so you need to grep the code anyway to verify that the comment is correct. Instead expanded a little on why it is set. http://gerrit.cloudera.org:8080/#/c/7372/4/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 123: RETURN_IF_ERROR(buffer->Append(value->ptr, value->len)); > can you add WARN_IF_UNUSED to StringBuffer::Append()? Done -- To view, visit http://gerrit.cloudera.org:8080/7372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-5627: fix dropped statuses in HDFS writers .. IMPALA-5627: fix dropped statuses in HDFS writers The change is mostly mechanical - added Status returns where need. In one place I restructured the the logic around 'current_encoding_' for Parquet to allow a cleaner solution to the dropped status from FinalizeCurrentPage() call in ProcessValue(): after the restructuring the call was no longer needed. 'current_encoding_' was overloaded to represent both the encoding of the current page and the preferred encoding for subsequent pages. Testing: Ran exhaustive build. Change-Id: I753d352c640faf5eaef650cd743e53de53761431 --- M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/runtime/string-buffer.h 10 files changed, 98 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/7372/5 -- To view, visit http://gerrit.cloudera.org:8080/7372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
anujphadke has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7332/3/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS3, Line 41: DEFINE_int64 > int32 should be fine. int64_t maps to a long? scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit) http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: PS3, Line 143: grant_rev_db > We looked at that, but it actually doesn't work for these custom cluster te def test_role_privilege_case(self, vector, unique_database): This is the error I got on doing it. - ERROR at setup of TestGrantRevoke.test_role_privilege_case[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] conftest.py:294: in unique_database E ImpalaBeeswaxException: ImpalaBeeswaxException: EINNER EXCEPTION: EMESSAGE: AuthorizationException: User 'anuj' does not have privileges to execute 'DROP' on: test_role_privilege_case_2af3a508 -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: PS3, Line 143: grant_rev_db > I suggest using the unique_database fixture instead of creating databases a We looked at that, but it actually doesn't work for these custom cluster tests because the code that would set up the unique_database don't have authorization to create a database. As a separate task, it might be worth considering a way to extend that to support granting access before creating the unique_database, but I think only this test file would make use of it. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
Michael Brown has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: Line 25: from __builtin__ import any as b_any > why is this necessary? It shouldn't be unless any is shadowed, which it ought not be. If it is, that should be fixed. any has been a builtin library function since python 2.5. PS3, Line 143: grant_rev_db > Looks like a common db name used in authz tests. Could you randomize it? El I suggest using the unique_database fixture instead of creating databases and having a use side-effect command. "git grep unique_database" for examples. -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: (9 comments) Few more in addition to MJ's comments. http://gerrit.cloudera.org:8080/#/c/7332/3//COMMIT_MSG Commit Message: PS3, Line 16: If the catalogObjectCache on a update adds the new : rolePrivilege object first and removes the old object later qq, Isn't this order always the same? http://gerrit.cloudera.org:8080/#/c/7332/3/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS3, Line 41: DEFINE_int64 int32 should be fine. PS3, Line 41: sentry_catalog_polling_frequency how about sentry_catalog_polling_frequency_s ? (indicates seconds) http://gerrit.cloudera.org:8080/#/c/7332/3/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java: PS3, Line 336: Fix GRANTs on URIs with uppercase letters s/ URIs are case sensitive ? http://gerrit.cloudera.org:8080/#/c/7332/3/fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java File fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java: Line 75: toLowerCase())); nit: 4 space formatting everywhere http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: PS3, Line 130: statestored_args=("--statestore_heartbeat_frequency_ms=300 " : "--statestore_update_frequency_ms=300")) Is this required? PS3, Line 143: grant_rev_db Looks like a common db name used in authz tests. Could you randomize it? Else we might hit flaky tests if this DB already exists from test_grant_revoke for ex. Line 149: self.client.execute("grant select on table test1 to test_role") Could you also add some grants on camel case DB names? PS3, Line 165: self.client.execute("drop role test_role") How about adding it to a finally block incase the test fails, also clean up the dbs created? -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has uploaded a new patch set (#3). Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. IMPALA-5520: TopN node periodically reclaims old allocations Currently TopN retains old string allocations in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage. With this commit, the TopN node will periodically re-materialise the rows stored in the priority queue and reclaim the old allocations. This is done when the number of rows removed from the priority queue is more than twice the N (limit + offset). Moreover, a new counter called "TuplePoolReclamations" is added to the TopN node that keeps track of the number of times the tuple pool is reclaimed. Testing: Test added to test_tpch_queries.py which sets a low mem_limit such that the test would fail if reclamation is not implemented and pass otherwise. Performance: Query 1 (expected general case): select * from tpch.lineitem order by l_orderkey desc limit 10; Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node): select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10; With Reclaim Without Reclaim Query 1 Query 2 Query 1 Query 2 MaxTuplePoolMem3.96 KB 3.43 KB 110.2 MB708.8 MB Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms Time (stdev) 74.38ms 67.45ms 102.71ms70.44ms Reclaims910 5861 N/A N/A We notice that memory footprint is orders of magnitude lower while maintaining similar query runtimes. Cluster perf testing will be done later. Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 --- M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M tests/query_test/test_tpch_queries.py 4 files changed, 120 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/3 -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5521: TopN node periodically reclaims old allocations
Bikramjeet Vig has uploaded a new patch set (#2). Change subject: IMPALA-5521: TopN node periodically reclaims old allocations .. IMPALA-5521: TopN node periodically reclaims old allocations Currently TopN retains old string allocations in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage. With this commit, the TopN node will periodically re-materialise the rows stored in the priority queue and reclaim the old allocations. This is done when the number of rows removed from the priority queue is more than twice the N (limit + offset). Moreover, a new counter called "TuplePoolReclamations" is added to the TopN node that keeps track of the number of times the tuple pool is reclaimed. Testing: Test added to test_tpch_queries.py which sets a low mem_limit such that the test would fail if reclamation is not implemented and pass otherwise. Performance: Query 1 (expected general case): select * from tpch.lineitem order by l_orderkey desc limit 10; Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node): select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10; With Reclaim Without Reclaim Query 1 Query 2 Query 1 Query 2 MaxTuplePoolMem3.96 KB 3.43 KB 110.2 MB708.8 MB Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms Time (stdev) 74.38ms 67.45ms 102.71ms70.44ms Reclaims910 5861 N/A N/A We notice that memory footprint is orders of magnitude lower while maintaining similar query runtimes. Cluster perf testing will be done later. Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 --- M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M tests/query_test/test_tpch_queries.py 4 files changed, 120 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/2 -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 1: (19 comments) http://gerrit.cloudera.org:8080/#/c/7400/1//COMMIT_MSG Commit Message: PS1, Line 9: tuple > in a tuple pool which is held longer than necessary, resulting in unnecessa Done PS1, Line 11: With this commit TopN node > commit, the TopN node Done PS1, Line 12: its > stored in the priority queue Done PS1, Line 12: This is done every time the : number of old rows (removed from the priority queue it uses) is more : than twice the row limit (N in TopN) > This is done when the number of rows removed from the priority queue is mor Done PS1, Line 24: general > expected general case Done PS1, Line 27: data > example worst case: data stored... Done PS1, Line 39: performance > query runtimes Done PS1, Line 39: More extensive perf testing will be : done in the future to recognize pathological cases > Not necessarily pathological cases, more that we're gonna do cluster testin Done http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: PS1, Line 28: limit_ > why isn't this (limit_ + offset_), since that's the size of the PQ? Done Line 61: void TopNNode::ReclaimTuplePool() { > I don't think this needs to be in the *-ir.cc - the stuff in here gets reco Done Line 62: auto temp_pool = new MemPool(mem_tracker()); > It would be good to use unique_ptr here just so we don't accidentally creat Done Line 63: auto temp_pq = new std::priority_queue, > Rebuilding the priority queue like this seems unnecessary. It's a std::vect Done PS1, Line 69: Allocate > We're trying to move to using TryAllocate() instead and failing synchronous Done http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: PS1, Line 88: NumOfTimesTuplePoolReclaimed > TuplePoolReclamations is more concise Done PS1, Line 88: > indentation Done http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS1, Line 71: Create new tuple pool with only the elements inside the priority queue and release : // memory from the old tuple pool > Re-materialize the elements in the priority queue into a new tuple pool, an Done PS1, Line 115: poped > popped has two 'p's. Regardless, let's call this rows_to_reclaim_ because n Done http://gerrit.cloudera.org:8080/#/c/7400/1/tests/query_test/test_tpch_queries.py File tests/query_test/test_tpch_queries.py: Line 67: return 'tpch' > newline after this Done PS1, Line 75: \ > nit: remove space to be consistent Done -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 22: PS22 takes the blunt-instrument approach to preventing clang-tidy from running on be/src/kudu/*. I have not been able to test in my local environment, so running another dry run on jenkins.impala.io. -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Hello Impala Public Jenkins, Michael Ho, Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5715 to look at the new patch set (#22). Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Finally, a few changes to allow compilation on RHEL5: * Only use sched_getcpu() if supported * Only include magic.h if available * Workaround for kernels that don't have SOCK_NONBLOCK * Workaround for kernels that don't have O_CLOEXEC (ignore the flag) * Provide non-working implementation of fallocate() * Disable inclusion of linux/fiemap.h - although this exists on RHEL5, it does not compile due to other #includes in env_posix.cc. We disable the path this is used for, since Impala does not call that code. * Use Kudu's implementation of pipe(2), preadv(2) and pwritev(2) where it doesn't exist. In most cases these changes simply force kutil to revert to a different implementation that was already written for OSX support - this patch generalises the logic to provide the implementation whenever the required function doesn't exist. This patch compiles on RHEL5.5 and 6.0, SLES11 and 12, Ubuntu 12.04 and 14.04 and Debian 7.0 and 8.0. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/common/config.h.in A be/src/common/kudu_version.cc M be/src/common/logging.cc M be/src/exec/kudu-util.h A be/src/kudu/gutil M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/cache.cc M be/src/kudu/util/compression/compression_codec.cc M be/src/kudu/util/env.cc M be/src/kudu/util/env_posix.cc M be/src/kudu/util/flags.cc A be/src/kudu/util/kudu_export.h M be/src/kudu/util/locks.h M be/src/kudu/util/logging.cc M be/src/kudu/util/minidump.cc M be/src/kudu/util/net/socket.cc M be/src/kudu/util/subprocess.cc M bin/run_clang_tidy.sh D cmake_modules/FindOpenSSL.cmake 22 files changed, 223 insertions(+), 107 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/22 -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page
Henry Robinson has posted comments on this change. Change subject: IMPALA-5511: Add process start time to debug web page .. Patch Set 5: To that end, it might be better to define a single process-start-time metric, and to have each daemon call a new InitCommonMetrics(MetricGroup* root) method. Then the handler for the root URL can just grab the process start time metric, without needing to know if it's a statestore, catalog server or impalad. -- To view, visit http://gerrit.cloudera.org:8080/7363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 21: PS21 passed all tests. clang-tidy issue is still outstanding, as there is no obvious way to disable all checks for a single directory. -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page
Henry Robinson has posted comments on this change. Change subject: IMPALA-5511: Add process start time to debug web page .. Patch Set 5: Can you add this information to the root (/) web page for each daemon? I think it's very useful to be able to see this prominently. -- To view, visit http://gerrit.cloudera.org:8080/7363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7332/3/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS3, Line 42: ( nit space http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: Line 25: from __builtin__ import any as b_any why is this necessary? PS3, Line 129: 10 why not just 1 second? PS3, Line 156: # Sleep for 15 seconds and make sure that the privileges : # on all 3 tables still persist on a sentryProxy thread : # update. sentry_catalog_polling_frequency is set to 10 : # seconds. : sleep(15) I wonder if we can lower this if we reduce the sentry polling freq. Even a 15 sec wait is unfortunate in our already long running tests -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil
Kim Jin Chul has uploaded a new change for review. http://gerrit.cloudera.org:8080/7414 Change subject: IMPALA-5116: Removal of uses for the deprecated functions in gutil .. IMPALA-5116: Removal of uses for the deprecated functions in gutil The following functions are substituted for C++11 standard. __gnu_cxx::hash_map => std::unordered_map __gnu_cxx::hash_set => std::unordered_set __gnu_cxx::hash => std::hash Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 --- M be/src/gutil/hash/hash.cc M be/src/gutil/hash/hash.h M be/src/gutil/strings/join.h M be/src/gutil/strings/serialize.cc M be/src/gutil/strings/serialize.h M be/src/gutil/strings/split.cc M be/src/gutil/strings/split.h 7 files changed, 35 insertions(+), 221 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7414/1 -- To view, visit http://gerrit.cloudera.org:8080/7414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul
[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil
Kim Jin Chul has abandoned this change. Change subject: IMPALA-5116: Removal of uses for the deprecated functions in gutil .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ia4bd8ae4cc4ca5aa72e181ce17bac9d1c91f9fb3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul
[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil
Kim Jin Chul has uploaded a new change for review. http://gerrit.cloudera.org:8080/7413 Change subject: IMPALA-5116: Removal of uses for the deprecated functions in gutil .. IMPALA-5116: Removal of uses for the deprecated functions in gutil The following functions are substituted for C++11 standard. __gnu_cxx::hash_map => std::unordered_map __gnu_cxx::hash_set => std::unordered_set __gnu_cxx::hash => std::hash Change-Id: Ia4bd8ae4cc4ca5aa72e181ce17bac9d1c91f9fb3 --- M be/src/gutil/hash/hash.cc M be/src/gutil/hash/hash.h M be/src/gutil/strings/join.h M be/src/gutil/strings/serialize.cc M be/src/gutil/strings/serialize.h M be/src/gutil/strings/split.cc M be/src/gutil/strings/split.h 7 files changed, 35 insertions(+), 221 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7413/1 -- To view, visit http://gerrit.cloudera.org:8080/7413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia4bd8ae4cc4ca5aa72e181ce17bac9d1c91f9fb3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul
[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page
Gabor Kaszab has posted comments on this change. Change subject: IMPALA-5511: Add process start time to debug web page .. Patch Set 5: comments incorporated -- To view, visit http://gerrit.cloudera.org:8080/7363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; The issue is that HdfsSequenceTableWriter::Flush() frees the per-file memory pool, but the buffer used for compression is still in use after Flush() returns. If there are additional rows to process after calling Flush(), impalad might crash. This fix removes the FreeAll() call from Flush(). It was tested on a local minicluster with a table containing around 100 million rows. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc 1 file changed, 0 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has posted comments on this change. Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7394/1//COMMIT_MSG Commit Message: PS1, Line 16: The issue is that HdfsSequenceTableWriter::Flush() frees the per-file : memory pool, but the buffer used for compression is still in use after : Flush() returns. If there are additional > Would you mind also explaining the original problem which triggered the cra The buffer used for compression is still in used after Flush() returns. If there are additional rows to process after calling Flush(), impalad might crash. I've updated the commit message. After taking another look at the code I realized that InitNewFile() is called only once for each HdfsSequenceTableWriter instance, so it doesn't really make sense to clear the mem pool there. I've removed mem_pool_->Clear() from InitNewFile(). -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter
Attila Jeges has uploaded a new patch set (#2). Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter .. IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; The issue is that HdfsSequenceTableWriter::Flush() frees the per-file memory pool, but the buffer used for compression is still in use after Flush() returns. If there are additional rows to process after calling Flush(), impalad might crash. This fix removes the FreeAll() call from Flush(). It was tested on a local minicluster with a table containing around 100 million rows. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec --- M be/src/exec/hdfs-sequence-table-writer.cc 1 file changed, 0 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2 -- To view, visit http://gerrit.cloudera.org:8080/7394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] test
Kim Jin Chul has abandoned this change. Change subject: test .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I8e3f1e28004e9d2a7526c83e00fcd8ecd2e0abc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul
[Impala-ASF-CR] test
Kim Jin Chul has uploaded a new change for review. http://gerrit.cloudera.org:8080/7412 Change subject: test .. test Change-Id: I8e3f1e28004e9d2a7526c83e00fcd8ecd2e0abc2 --- M README.md 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/7412/1 -- To view, visit http://gerrit.cloudera.org:8080/7412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8e3f1e28004e9d2a7526c83e00fcd8ecd2e0abc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kim Jin Chul
[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel
anujphadke has posted comments on this change. Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7393/2//COMMIT_MSG Commit Message: PS2, Line 15: PartitionedHashJoinNode::EvaluateNullProbe > Add '()' after like you have above. Done http://gerrit.cloudera.org:8080/#/c/7393/2/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS2, Line 350: u > extra space Done -- To view, visit http://gerrit.cloudera.org:8080/7393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel
anujphadke has uploaded a new patch set (#3). Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel .. IMPALA-5586: Null-aware anti-join can take a long time to cancel Queries with a null-aware anti-join joining on a large number of NULLs can take a long time to cancel if threads are stuck in PartitionedHashJoinNode::EvaluateNullProbe(). This change adds the RETURN_IF_CANCELLED macro to the function. Testing: Added logs to PartitionedHashJoinNode::EvaluateNullProbe() and made sure that the function returns right away on cancellation. Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053 --- M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 2 files changed, 12 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7393/3 -- To view, visit http://gerrit.cloudera.org:8080/7393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
anujphadke has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 3: Discussed this issue with MJ. This issue does occur due to inconsistent handling of casing. Changed buildRolePrivilegeName to return the privilege names in lower case instead of changing it in privilegeSpec. Also changed getRolePrivileges to return the output of show grant roles in lower case since we are not lower casing privilegeSpec object anymore. Added a test in test_grant_revoke.py and made sentryProxy configurable. -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
anujphadke has uploaded a new patch set (#3). Change subject: IMPALA-5582: Store sentry privileges in lower case .. IMPALA-5582: Store sentry privileges in lower case Privileges granted to a db/table whose name contains upper case characters can disappear after a few seconds. A privilege is inserted into the catalogObjectCache using a key that uses the db/table name. The key gets converted to lower case before inserting. The db/table name is stored in upper case in the rolePrivilege object. The sentryProxy thread on the other hand returns the db/table name in lower case. If the catalogObjectCache on a update adds the new rolePrivilege object first and removes the old object later, it ends up deleting the newer verison of the rolePrivilege since they both use the same key to add/remove from the cache. This change sets the privilege name in lower case which does not trigger these chain of events on a sentryProxy thread update. This change also adds a new catalogd startup option (sentry_catalog_polling_frequency) to configure the frequency at which catalogd polls the sentry service to update any policy changes. The default value is 60 seconds. Test: Added a test which adds select privileges to 3 tables specified in lower case, upper case and mixed case. The test verifies that the privileges on the 3 tables do not disappear on a sentry update. Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/util/SentryProxy.java M tests/authorization/test_grant_revoke.py 8 files changed, 90 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7332/3 -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke