[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL
Donghui Xu has posted comments on this change. Change subject: IMPALA-5513: Fix display message exception when using invalid KEYVAL .. Patch Set 7: -Code-Review -- To view, visit http://gerrit.cloudera.org:8080/7187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Donghui Xu Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Donghui Xu Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL
Donghui Xu has posted comments on this change. Change subject: IMPALA-5513: Fix display message exception when using invalid KEYVAL .. Patch Set 7: Code-Review+1 Thank you for the suggestion that I have modified the code. -- To view, visit http://gerrit.cloudera.org:8080/7187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Donghui Xu Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Donghui Xu Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL
Hello Bharath Vissapragada, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7187 to look at the new patch set (#7). Change subject: IMPALA-5513: Fix display message exception when using invalid KEYVAL .. IMPALA-5513: Fix display message exception when using invalid KEYVAL Function print_to_stderr() has a syntax error when an error message is displayed. I solved this problem by exchanging the position of variable and the subsequent strings in function print_to_stderr(). Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6 --- M shell/impala_shell.py M tests/shell/test_shell_commandline.py 2 files changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/7187/7 -- To view, visit http://gerrit.cloudera.org:8080/7187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Donghui Xu Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Donghui Xu Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Dan Hecht has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 1: What is the perf implication of the backend changes? -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Dan Hecht has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7438/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java: PS1, Line 261:if (resultPrecision > 38) { : int minScale = Math.min(resultScale, ScalarType.MIN_ADJUSTED_SCALE); : // How low would the scale need to be in the ideal case, in order to be able to : // store all the digits to the LEFT of the dot. This number may be negative. : int idealScale = resultScale - (resultPrecision - ScalarType.MAX_PRECISION); : resultScale = Math.max(idealScale, minScale); : resultPrecision = 38; : } doesn't the call to ScalarType.createAdjustedDecimalType() on line 277 do that? -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/7438 Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4939: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M be/src/util/decimal-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 5 files changed, 385 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/1 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7419/3/tests/query_test/test_observability.py File tests/query_test/test_observability.py: PS3, Line 101: self.execute_query("set mem_limit = 8589934592") : # For this query, the pla > here and below, I don't see any reason to define these short queries as a l Done -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7419 to look at the new patch set (#4). Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. IMPALA-4276: Profile displays non-default query options set by planner Fix to populate the non-default query options set by planner in the runtime profile. Added a corresponding test case. Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab --- M be/src/service/client-request-state.cc M tests/query_test/test_observability.py 2 files changed, 19 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/4 -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has uploaded a new patch set (#6). Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. IMPALA-5520: TopN node periodically reclaims old allocations Currently TopN retains old string allocations in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage. With this commit, the TopN node will periodically re-materialise the rows stored in the priority queue and reclaim the old allocations. This is done when the number of rows removed from the priority queue is more than twice the N (limit + offset). Moreover, a new counter called "TuplePoolReclamations" is added to the TopN node that keeps track of the number of times the tuple pool is reclaimed. Testing: Test added to test_queries.py which sets a low mem_limit such that the test would fail if reclamation is not implemented and pass otherwise. Performance: Query 1 (expected general case): select * from tpch.lineitem order by l_orderkey desc limit 10; Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node): select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10; With Reclaim Without Reclaim Query 1 Query 2 Query 1 Query 2 MaxTuplePoolMem3.96 KB 3.43 KB 110.2 MB708.8 MB Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms Time (stdev) 74.38ms 67.45ms 102.71ms70.44ms Reclaims910 5861 N/A N/A We notice that memory footprint is orders of magnitude lower while maintaining similar query runtimes. Cluster perf testing will be done later. Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 --- M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M tests/query_test/test_queries.py 4 files changed, 120 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/6 -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 5: (11 comments) http://gerrit.cloudera.org:8080/#/c/7400/5//COMMIT_MSG Commit Message: PS5, Line 19: test_tpch_queries > The test doesn't belong in this file, see my comment in the test file itsel Done http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: PS5, Line 53: ) : PushHeap(priority_queue_, : ComparatorWrapper(*tuple_row_less_than_), insert_tuple); > wrap this in { } per the c++ style guide: Done http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: PS5, Line 255: std:: > Shouldn't be needed (common/names.h has "using std::unique_ptr"). Done http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS5, Line 114: tuple_pool > tuple_pool_ Done PS5, Line 126: modified only using : /// push_heap()/pop_heap() to maintain ordered heap invariants > this makes it sound like it should be encapsulated in an inner class, with Done PS5, Line 133: priority_queue_ > remove the trailing _ Done PS5, Line 135: inline > Could be static, since it doesn't reference member variables. Done PS5, Line 135: std::vector& > Our convention is to pass modified arguments by pointer - https://google.g Done PS5, Line 133: /// Helper methods for modifying priority_queue_ while maintaining ordered heap : /// invariants : inline void PushHeap(std::vector& priority_queue, : const ComparatorWrapper& comparator, Tuple* const insert_row) { : priority_queue.push_back(insert_row); : std::push_heap(priority_queue.begin(), priority_queue.end(), comparator); : } : : inline void PopHeap(std::vector& priority_queue, : const ComparatorWrapper& comparator) { : std::pop_heap(priority_queue.begin(), priority_queue.end(), comparator); : priority_queue.pop_back(); > move these functions up where the other private functions are Done http://gerrit.cloudera.org:8080/#/c/7400/5/tests/query_test/test_tpch_queries.py File tests/query_test/test_tpch_queries.py: PS5, Line 74: cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format == 'text') : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').compression_codec == 'none') > helper fn to do this: Done PS5, Line 56: class TestTopNReclaimQuery(ImpalaTestSuite): : """Test class to validate that TopN periodically reclaims tuple pool memory :and runs with a lower memory footprint.""" : QUERY = "select * from tpch.lineitem order by l_orderkey desc limit 10;" : : # Mem limit empirically selected so that the query fails if tuple pool reclamation : # is not implemented for TopN : MEM_LIMIT = "50m" : : @classmethod : def get_workload(self): : return 'tpch' : : @classmethod : def add_test_dimensions(cls): : super(TestTopNReclaimQuery, cls).add_test_dimensions() : # The tpch tests take a long time to execute so restrict the combinations they : # execute over. : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format == 'text') : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').compression_codec == 'none') : : def test_top_n_reclaim(self, vector): : exec_options = vector.get_value('exec_option') : exec_options['mem_limit'] = self.MEM_LIMIT : result = self.execute_query(self.QUERY, exec_options) : runtime_profile = str(result.runtime_profile) : num_of_times_tuple_pool_reclaimed = re.findall( : 'TuplePoolReclamations: ([0-9]*)', runtime_profile) : # Confirm newly added counter is visible : assert len(num_of_times_tuple_pool_reclaimed) > 0 : # Tuple pool is expected to be reclaimed for this query : for n in num_of_times_tuple_pool_reclaimed: : assert int(n) > 0 > this doesn't belong in test_tpch_queries.py, it just happens to use a tpch Done -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd5
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. Patch Set 3: Code-Review+1 (1 comment) LGTM with 1 small test change. Let's wait to see if Henry wants to add anything else. http://gerrit.cloudera.org:8080/#/c/7419/3/tests/query_test/test_observability.py File tests/query_test/test_observability.py: PS3, Line 101: query = "set mem_limit = 8589934592" : self.execute_query(query) here and below, I don't see any reason to define these short queries as a local variable, just pass the strings directly to execute_query(...) -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5513: Fix display message exception when using invalid KEYVAL .. Patch Set 6: Code-Review+1 Didn't mean to -1, feel free to carry my +1 once you fix the test. -- To view, visit http://gerrit.cloudera.org:8080/7187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Donghui Xu Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Donghui Xu Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5513: Fix display message exception when using invalid KEYVAL .. Patch Set 6: -Code-Review Sorry for holding this up, didn't realize it was waiting on me until Jim pointed out (thanks Jim). So I think it is ok to club those tests together. We generally group them by functionality and in this case I don't think we need to create a separate unit test function for such a trivial test. -- To view, visit http://gerrit.cloudera.org:8080/7187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Donghui Xu Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Donghui Xu Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5498: Support for partial sorts
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5498: Support for partial sorts .. Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/7267/6//COMMIT_MSG Commit Message: Line 7: IMPALA-5498: Support for partial sorts lets specify in Kudu for now PS6, Line 28: A nit: This also adds... Line 32: used previously for inserts into Kudu. "for inserts into Kudu"->"for DML statements into Kudu tables only. Future work can extend this to other table sinks IMPALA-" Line 42: Now: no spills, sort takes 6m19s, for ~18% speedup comparison vs no partition/sort at all? http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/exec/partial-sort-node.h File be/src/exec/partial-sort-node.h: PS6, Line 62: sort_exec_exprs_ update PS6, Line 62: derived based awkward/confusing http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: PS6, Line 1383: RunSize can you indicate the units, e.g. #rows -> NumRowsPerRun? PS6, Line 1387: if (no_spill_) { : // 'min_buffers_required' is 1 since the sorter will only have 1 run at a time, so : // there won't be any merges. : min_buffers_required = 1; : } else { : min_buffers_required = MIN_BUFFERS_PER_MERGE; : } nit: ternary operator http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS6, Line 45: Specifying 'no_spill' as true is For this use case, 'no_spill' should be set to true so the sorter can reduce... PS6, Line 103: bool no_spill = false comment on this even though it's mentioned above Line 117: /// up, it is sorted and a new unsorted run is created. comment that this may not be called if no_spill is true PS6, Line 127: InputDone comment that this may not be called if no_spill is true (are there any other cases I missed?) http://gerrit.cloudera.org:8080/#/c/7267/6/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: PS6, Line 56: TODO: run experiments to : // determine the value for this, consider making it configurable, enforce it in the BE. JIRA? PS6, Line 124: { : return type_ != TSortType.PARTIAL; : } nit: oneline? -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5498: Support for partial sorts
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5498: Support for partial sorts .. Patch Set 6: Code-Review+1 (1 comment) LGTM aside from the variable name http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS6, Line 103: no_spill A positive argument would be marginally easier to follow. E.g. 'enable_spilling' -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7393/3/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 1018: // For each row, iterate over all rows in the build table. The need for RETURN_IF_CANCELLED is a little subtle. A one-line comment would be useful, e.g. "This loop can run for a long time. Periodically check for cancellation." Line 1023: for (int j = 0; j < build_rows->num_rows(); ++j) { This loop can also potentially run for a long time if build_rows is large. It will be a bit expensive to check every iteration but maybe check every 1024 rows or something like that (I'm suggesting a power-of-two because computing % is cheap for a constant power-of-two). if (j % 1024 == 0) RETURN_IF_CANCELLED(state); -- To view, visit http://gerrit.cloudera.org:8080/7393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption .. Patch Set 1: > Uploaded patch set 1. I will stress test this patch to make sure that we don't hit the DCHECK anymore and post my findings here. I wanted to push this to gerrit so that it would be easier to have a discussion. -- To view, visit http://gerrit.cloudera.org:8080/7436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7742551236205c49393a7351291eae6aefb4fa09 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty libraries .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7418/1/CMakeLists.txt File CMakeLists.txt: PS1, Line 125: impala_add_thirdparty_lib upper case function name? It seems weird that the declaration is lower case and all the callsites are upper case. Line 136: if (NOT STATIC_LIB) Could we simplify this to: if (BUILD_SHARED_LIBS OR NOT STATIC_LIB) else endif() -- To view, visit http://gerrit.cloudera.org:8080/7418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] [SECURITY] Use KRPC's Kinit code to avoid expensive fork
Sailesh Mukil has abandoned this change. Change subject: [SECURITY] Use KRPC's Kinit code to avoid expensive fork .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I576ff4d2ffcca75f9247132e11c7a2ca50b255bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] Downgrade log level of "sampled call" message
Sailesh Mukil has abandoned this change. Change subject: Downgrade log level of "sampled call" message .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I77d50f0f5b47cf3ff28253cf7d0f970853f8a017 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] [SECURITY] Enable MiniKdc and add kerberized BE tests
Sailesh Mukil has abandoned this change. Change subject: [SECURITY] Enable MiniKdc and add kerberized BE tests .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I231c7cbc7433fa84da780b6168ae1b30ddeeab57 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4671: Add Impala-native ServicePool
Sailesh Mukil has abandoned this change. Change subject: IMPALA-4671: Add Impala-native ServicePool .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I1fe97dcdd45f0984913a52677f3d13136c641786 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] tmp: force static linking against ssl
Sailesh Mukil has abandoned this change. Change subject: tmp: force static linking against ssl .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I0e927f9a51241a3ade4eb253d4a8d33b2650bedb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] Honor enable shared from this<> rule of sharing only pre-shared objects
Sailesh Mukil has abandoned this change. Change subject: Honor enable_shared_from_this<> rule of sharing only pre-shared objects .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7433 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ib32e247c270f480ce203058f1a5aa8445b5bcec6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5498: Support for partial sorts
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5498: Support for partial sorts .. Patch Set 6: (1 comment) > Can you extend the Sort metrics in the query profile to include > Avg, Min and Max run size? > This will be needed to reflect on the quality of the sorted data? Done. http://gerrit.cloudera.org:8080/#/c/7267/4/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS4, Line 99: to the merger and retrieve r > I don't feel strongly - the change you made looks good to me. Maybe 'is_par Done -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4875: Bound maximum statestore topic update size to 2GB.
Sailesh Mukil has abandoned this change. Change subject: IMPALA-4875: Bound maximum statestore topic update size to 2GB. .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Icbe42704d3c98d9619e3737786b75ee67fb2a264 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries
Henry Robinson has posted comments on this change. Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty libraries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7418/1/CMakeLists.txt File CMakeLists.txt: PS1, Line 232: FLATBUFFERS_LIBS > nit: FLATBUFFERS_STATIC_LIB for consistency. May need to update flatbuffer' Done -- To view, visit http://gerrit.cloudera.org:8080/7418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Add two-way negotiation thread pools
Sailesh Mukil has abandoned this change. Change subject: Add two-way negotiation thread pools .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ia4cbdd0e7e3b570e8d4e8c52fbff5d353532a310 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries
Henry Robinson has uploaded a new patch set (#2). Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty libraries .. IMPALA-5659: Begin standardizing treatment of thirdparty libraries If Impala was built with --build_shared_libs, some thirdparty libraries were still statically linked; this could cause runtime errors if the libraries were also linked into a .so. This patch fixes that issue (for gflags, glog and protobuf at least) by ensuring that build_shared_libs is respected for those libraries. * Standardize thirdparty library handling w/CMake by adding IMPALA_ADD_THIRDPARTY_LIB. This creates a symbolic name for each library, allowing us to switch the underlying library files (e.g. change from static to dynamic linking) without having to individually change the link clauses for each target. * Remove most cases of add_library() from cmake_modules/* - that is all handled by IMPALA_ADD_THIRDPARTY_LIB. * Add shared library detection for a couple of thirdparty dependencies (many only detect static libraries), just to prove the concept. * All thirdparty libraries now print a standard set of messages. For example: -- --> Adding thirdparty library protoc. <-- -- Header files: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/include -- Added shared library dependency protoc: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/lib/libprotoc.so -- --> Adding thirdparty library libev. <-- -- Header files: /data/henry/src/cloudera/impala-toolchain/libev-4.20/include -- Added shared library dependency libev: -- /data/henry/src/cloudera/impala-toolchain/libev-4.20/lib/libev.so * Some libraries don't quite fit this pattern (LLVM and Boost) - leave them as is for now. * Remove FindOpenSSL.cmake - the toolchain one is more modern. Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d --- M CMakeLists.txt M be/CMakeLists.txt M cmake_modules/FindAvro.cmake M cmake_modules/FindBreakpad.cmake M cmake_modules/FindFlatBuffers.cmake M cmake_modules/FindGFlags.cmake M cmake_modules/FindGLog.cmake M cmake_modules/FindGTest.cmake M cmake_modules/FindHDFS.cmake M cmake_modules/FindLdap.cmake M cmake_modules/FindLz4.cmake D cmake_modules/FindOpenSSL.cmake M cmake_modules/FindRe2.cmake M cmake_modules/FindSnappy.cmake M cmake_modules/FindZlib.cmake M cmake_modules/kudu_cmake_fns.txt 16 files changed, 120 insertions(+), 263 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/7418/2 -- To view, visit http://gerrit.cloudera.org:8080/7418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Disable Kerberos testing to unblock builds
Sailesh Mukil has abandoned this change. Change subject: Disable Kerberos testing to unblock builds .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Icad5e69c7d63012cbefdd384ac05b7750221f8e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] Test: change linking order for crypto and ssl
Sailesh Mukil has abandoned this change. Change subject: Test: change linking order for crypto and ssl .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7431 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I41438edd8728bb8746a85aa0b2861daa4a16752c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] Unbreak the build
Sailesh Mukil has abandoned this change. Change subject: Unbreak the build .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ie8bc5e41e8d5ae2b5ebf93c2380d651d99df10aa Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5592: Fix computation of uncompressed row batch size
Sailesh Mukil has abandoned this change. Change subject: IMPALA-5592: Fix computation of uncompressed row batch size .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ic8d285d1e4a1aa97c98c505351c18fea04c60311 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5498: Support for partial sorts
Thomas Tauber-Marshall has uploaded a new patch set (#6). Change subject: IMPALA-5498: Support for partial sorts .. IMPALA-5498: Support for partial sorts Impala currently supports total sorts (the entire set of data is sorted) and top-n sorts (only the highest/lowest n elements are sorted). This patch adds the ability to do partial sorts, where the data is divided up into some number of subsets, each of which is sorted individually. It accomplishes this by adding a new exec node, PartialSortNode. When PartialSortNode::GetNext() is called, it retrieves input up to its memory limit, uses the existing Sorter class to sort it, and outputs it. This is faster than a total sort with SortNode as it avoids the need to spill if the input is larger than the memory limit. Future work will look into setting a more restrictive memory limit on the PartialSortNode. In the planner, the SortNode plan node is used, with an enum value indicating if it is a total or partial sort. Adds a new counter 'RunSize' to the runtime profile which tracks the min, max, and avg size of the generated runs, in tuples. As a first use case, partial sort is used where a total sort was used previously for inserts into Kudu. Testing: - E2E test with a large INSERT into a Kudu table with a mem limit. Checks that no spills occurred. - Updated planner tests. - Existing E2E tests and stress test verify correctness of INSERT. - Perf tests on the 10 node cluster: inserting tpch_100.lineitem into a Kudu table with mem_limit=3gb: Previously: 5 runs are spilled, sort took 7m33s Now: no spills, sort takes 6m19s, for ~18% speedup Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc A be/src/exec/partial-sort-node.cc A be/src/exec/partial-sort-node.h M be/src/exec/sort-node.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/runtime-profile-counters.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 16 files changed, 460 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7267/6 -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] [SECURITY] Make KRPC work with Kerberos
Sailesh Mukil has abandoned this change. Change subject: [SECURITY] Make KRPC work with Kerberos .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/7426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ic27ac6a42fd43ef67994f6b0120c38b15971d24e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5659: Bump toolchain version
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5659: Bump toolchain version .. IMPALA-5659: Bump toolchain version This bumps the toolchain version to include the fix which ensured gflags built shared libraries. Impala does not currently depend on that change, but an upcoming change will do so. Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Reviewed-on: http://gerrit.cloudera.org:8080/7422 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5659: Bump toolchain version
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5659: Bump toolchain version .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC
Sailesh Mukil has uploaded a new patch set (#16). Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC .. IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC This patch adds the core abstractions necessary for using Kudu RPC for Impala's backend services. The StatestoreService and StatestoreSubscriberService are both ported to KRPC to demonstrate how the new RPC framework operates. The main new class is RpcMgr, which is responsible for both services run by a daemon and for obtaining clients to remote services. The new Rpc class helps clients build rpc invocation objects, and use them to call remote methods. Also done: * Utility code to convert from a Thrift structure to a Protobuf wrapper. Thrift RPCs do not need to be ported to Protobuf to work. * Added more build support for Protobuf generation. All library targets now depend on the Protobuf generation step, just as they do Thrift. * thrift-server-test is rewritten to use Beeswax as a test service, since the old StatestoreService has been removed. * Remove InProcessStatestore that was only used for statestore-test and mini-impala-cluster. As far as we know, mini-impala-cluster is unused by anyone. (IMPALA-4784) TESTING: * Impala's test suite passes. * statestore-test has been rewritten to use new statestore code. It includes all test cases from the now-removed Python test_statestore.py * rpc-mgr-test unit-tests the RpcMgr and Rpc classes. TODO: * Enable SSL and Kerberos support (already in Kudu's imported libraries) Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 --- M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/catalog-op-executor.cc M be/src/exec/kudu-util.h M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/rpc/CMakeLists.txt A be/src/rpc/common.proto A be/src/rpc/rpc-mgr-test.cc A be/src/rpc/rpc-mgr.cc A be/src/rpc/rpc-mgr.h A be/src/rpc/rpc-mgr.inline.h A be/src/rpc/rpc.h A be/src/rpc/rpc_test.proto M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-util.h M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/service/client-request-state.cc M be/src/statestore/CMakeLists.txt D be/src/statestore/statestore-service-client-wrapper.h D be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h A be/src/statestore/statestore.proto M be/src/statestore/statestored-main.cc M be/src/testutil/CMakeLists.txt M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h D be/src/testutil/mini-impala-cluster.cc M be/src/transport/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/collection-metrics.h M be/src/util/counting-barrier.h M bin/start-impala-cluster.py M bin/start-impalad.sh M common/thrift/CMakeLists.txt M common/thrift/StatestoreService.thrift D tests/statestore/test_statestore.py 53 files changed, 1,733 insertions(+), 1,219 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/5720/16 -- To view, visit http://gerrit.cloudera.org:8080/5720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] Unbreak the build
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7435 Change subject: Unbreak the build .. Unbreak the build Change-Id: Ie8bc5e41e8d5ae2b5ebf93c2380d651d99df10aa --- M CMakeLists.txt 1 file changed, 9 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/7435/1 -- To view, visit http://gerrit.cloudera.org:8080/7435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie8bc5e41e8d5ae2b5ebf93c2380d651d99df10aa Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Hello Impala Public Jenkins, Michael Ho, Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5715 to look at the new patch set (#23). Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/CMakeLists.txt A be/src/common/kudu_version.cc M be/src/common/logging.cc M be/src/exec/kudu-util.h A be/src/kudu/gutil M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/cache.cc M be/src/kudu/util/compression/compression_codec.cc M be/src/kudu/util/env.cc M be/src/kudu/util/env_posix.cc M be/src/kudu/util/flags.cc A be/src/kudu/util/kudu_export.h M be/src/kudu/util/locks.h M be/src/kudu/util/logging.cc M be/src/kudu/util/minidump.cc D cmake_modules/FindOpenSSL.cmake 18 files changed, 140 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/23 -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4671: Add Impala-native ServicePool
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7423 Change subject: IMPALA-4671: Add Impala-native ServicePool .. IMPALA-4671: Add Impala-native ServicePool Add a ServicePool implementation that uses Impala's native threads, and hooks into Impala's instrumentation APIs. Change-Id: I1fe97dcdd45f0984913a52677f3d13136c641786 --- M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/rpc/CMakeLists.txt A be/src/rpc/impala-service-pool.cc A be/src/rpc/impala-service-pool.h M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/rpc-trace.cc M be/src/rpc/rpc-trace.h M be/src/service/impalad-main.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/util/histogram-metric.h M www/common-header.tmpl M www/rpcz.tmpl 15 files changed, 570 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7423/1 -- To view, visit http://gerrit.cloudera.org:8080/7423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1fe97dcdd45f0984913a52677f3d13136c641786 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] [SECURITY] Enable MiniKdc and add kerberized BE tests
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7427 Change subject: [SECURITY] Enable MiniKdc and add kerberized BE tests .. [SECURITY] Enable MiniKdc and add kerberized BE tests This patch enables the MiniKdc and converts the rpc-mgr-test to have a kerberized mode, which uses the MiniKdc to run the tests configured with Kerberos enabled. There are a couple of small code changes on the KuduRPC side as those changes are currently required to avoid a clash between Impala's and Kudu's util libraries. Change-Id: I231c7cbc7433fa84da780b6168ae1b30ddeeab57 --- M CMakeLists.txt M be/src/kudu/security/CMakeLists.txt M be/src/kudu/security/test/mini_kdc.cc M be/src/rpc/CMakeLists.txt M be/src/rpc/authentication.cc M be/src/rpc/rpc-mgr-test.cc A cmake_modules/FindKerberosPrograms.cmake 7 files changed, 165 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/7427/1 -- To view, visit http://gerrit.cloudera.org:8080/7427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I231c7cbc7433fa84da780b6168ae1b30ddeeab57 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Sailesh Mukil has uploaded a new patch set (#6). Change subject: IMPALA-4856: Port data stream service to KRPC .. IMPALA-4856: Port data stream service to KRPC This patch ports the data-flow parts of ImpalaInternalService to KRPC. * ImpalaInternalService is split into two services. The first, ImpalaInternalService, deals with control messages for plan fragment instance execution, cancellation and reporting, and remains implemented in Thrift for now. The second, DataStreamService, handles large-payload RPCs for transmitting runtime filters and row batches between hosts. * In the DataStreamService, all RPCs use 'native' protobuf. The DataStreamService starts on the port previously reserved for the StatestoreSubscriberService (which is also a KRPC service), to avoid having to configure another port when starting Impala. When the ImpalaInternalService is ported to KRPC, all services will run on one port. * To support needing to address two different backend services, a data service port has been added to TBackendDescriptor. * This patch adds support for asynchronous RPCs to the RpcMgr and Rpc classes. Previously, Impala used fixed size thread pools + synchronous RPCs to achieve some parallelism for 'broadcast' RPCs like filter propagation, or a dedicated per-sender+receiver pair thread on the sender side in the DataStreamSender case. In this patch, the PublishFilter() and TransmitData() RPCs are sent asynchronously using KRPC's thread pools. * The TransmitData() protocol has changed to adapt to asynchronous RPCs. The full details are in data-stream-mgr.h. * As a result, DataStreamSender no longer creates a thread-per-connection on the sender side. * Both tuple transmission and runtime filter publication use sidecars to minimise the number of copies and serialization steps required. * Also include a fix for KUDU-2011 that properly allows sidecars to be shared between KRPC and the RPC caller (fixing IMPALA-5093, a corruption bug). * A large portion of this patch is the replacement of TRowBatch with its Protobuf equivalent, RowBatchPb. The replacement is a literal port of the data structure, and row-batch-test, row-batch-list-test and row-batch-serialize-benchmark continue to execute without major logic changes. * Simplify FindRecvr() logic in DataStreamManager. No-longer need to handle blocking sender-side, so no need for complex promise-based machinery. Instead, all senders with no receiver are added to a per-receiver list, which is processed when the receiver arrives. If it does not arrive promptly, the DataStreamManager cleans them up after FLAGS_datastream_sender_timeout_ms. * This patch also begins a clean-up of how ImpalaServer instances are created (by removing CreateImpalaServer), and clarifying the relationship between ExecEnv and ImpalaServer. ImpalaServer now follows the standard construct->Init()->Start()->Join() lifecycle that we use for other services. * Ensure that all addresses used for KRPCs are fully resolved, avoiding the need to resolve them for each RPC. TESTING --- * New tests added to rpc-mgr-test. TO DO - * Re-enable throughput and latency measurements per data-stream sender when that information is exposed from KRPC (KUDU-1738). * TLS and Kerberos are still not supported by KRPC in this patch. Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b --- M .clang-format M CMakeLists.txt M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/common/init.cc M be/src/common/status.cc M be/src/common/status.h M be/src/exprs/expr-test.cc M be/src/kudu/rpc/rpc_sidecar.cc M be/src/kudu/rpc/rpc_sidecar.h M be/src/rpc/CMakeLists.txt M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/common.proto M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/rpc-mgr.inline.h M be/src/rpc/rpc.h M be/src/runtime/backend-client.h M be/src/runtime/client-cache-types.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-mgr.h M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/data-stream-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/scheduling/backend-config-test.cc M be/src/scheduling/backe
[Impala-ASF-CR] IMPALA-5592: Fix computation of uncompressed row batch size
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7434 Change subject: IMPALA-5592: Fix computation of uncompressed row batch size .. IMPALA-5592: Fix computation of uncompressed row batch size The uncompressed row batch size is the size of a row batch if the tuple data is sent without compression. It's computed by the following formula: serialized row batch size - compressed tuple data size + uncompressed tuple data size The existing code is using the wrong field for the compressed tuple data size and it causes uncompressed data size to be always identical to the compressed row batch size. This change fixes the problem by using the right field for the compressed tuple data size. Change-Id: Ic8d285d1e4a1aa97c98c505351c18fea04c60311 --- M be/src/runtime/data-stream-sender.cc 1 file changed, 9 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/7434/1 -- To view, visit http://gerrit.cloudera.org:8080/7434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic8d285d1e4a1aa97c98c505351c18fea04c60311 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] tmp: force static linking against ssl
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7432 Change subject: tmp: force static linking against ssl .. tmp: force static linking against ssl Change-Id: I0e927f9a51241a3ade4eb253d4a8d33b2650bedb --- M CMakeLists.txt 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7432/1 -- To view, visit http://gerrit.cloudera.org:8080/7432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0e927f9a51241a3ade4eb253d4a8d33b2650bedb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] Add two-way negotiation thread pools
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7430 Change subject: Add two-way negotiation thread pools .. Add two-way negotiation thread pools Change-Id: Ia4cbdd0e7e3b570e8d4e8c52fbff5d353532a310 --- M be/src/kudu/rpc/messenger.cc M be/src/kudu/rpc/messenger.h M be/src/kudu/rpc/reactor.cc M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h 5 files changed, 18 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7430/1 -- To view, visit http://gerrit.cloudera.org:8080/7430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia4cbdd0e7e3b570e8d4e8c52fbff5d353532a310 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7436 Change subject: IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption .. IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption This is caused by a race in the DataStreamSender where if a query is cancelled while 2 TransmitData RPCs for HASH_PARTITIONED exchanges are in progress, a third TransmitData RPC may overwrite the sidecar pointed to by the first RPC while the first RPC is in progress. This could also happen if the DataStreamSender instance has somehow asynchronously been closed, or if a previous RPC has returned with a DATASTREAM_RECVR_ALREADY_GONE error. The entire sequence of events is listed in the JIRA and explained more clearly there as it's a long and complicated explanation. The fix is basically to return a cancelled status if the DSS realizes that the query is being cancelled. We previously returned an OK status since some higher level execution node would take care of propogating this cancelled status to the rest of the execution tree. This doesn't work well for TransmitData to HASH_PARTITIONED exchanges since we don't check for cancellation until we've consumed the entire RowBatch input to the DSS. In that time, we could potentially have multiple RPCs attempting to serialize the outbound protobuf sidecars only to later realize that the query has been cancelled and throw them away. Change-Id: I7742551236205c49393a7351291eae6aefb4fa09 --- M be/src/runtime/data-stream-sender.cc 1 file changed, 8 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7436/1 -- To view, visit http://gerrit.cloudera.org:8080/7436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7742551236205c49393a7351291eae6aefb4fa09 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] [SECURITY] Make KRPC work with Kerberos
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7426 Change subject: [SECURITY] Make KRPC work with Kerberos .. [SECURITY] Make KRPC work with Kerberos KuduRPC itself has experimental support for Kerberos. However, since Impala's client transport still uses the Thrift transport stack, we need to make sure that a single security configuration applies to both internal communication (KuduRPC) and external communication (Thrift's TSaslTransport). This patch achieves that. However, slight modification of the KuduRPC code is required which isn't ideal, but there's currently no way around it. Change-Id: Ic27ac6a42fd43ef67994f6b0120c38b15971d24e --- M be/src/kudu/rpc/client_negotiation.cc M be/src/kudu/rpc/server_negotiation.cc A be/src/rpc/authentication-export.h M be/src/rpc/authentication-test.cc M be/src/rpc/authentication.cc 5 files changed, 104 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7426/1 -- To view, visit http://gerrit.cloudera.org:8080/7426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic27ac6a42fd43ef67994f6b0120c38b15971d24e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] Honor enable shared from this<> rule of sharing only pre-shared objects
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7433 Change subject: Honor enable_shared_from_this<> rule of sharing only pre-shared objects .. Honor enable_shared_from_this<> rule of sharing only pre-shared objects Change-Id: Ib32e247c270f480ce203058f1a5aa8445b5bcec6 --- M be/src/runtime/data-stream-sender.cc 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/7433/1 -- To view, visit http://gerrit.cloudera.org:8080/7433 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib32e247c270f480ce203058f1a5aa8445b5bcec6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4874: Increase the maximum KRPC message size to 4GB
Sailesh Mukil has uploaded a new patch set (#2). Change subject: IMPALA-4874: Increase the maximum KRPC message size to 4GB .. IMPALA-4874: Increase the maximum KRPC message size to 4GB KRPC restricts the maximum message size with --rpc_max_message_size. The Protobuf API used to deserialize messages also has a maximum size. Both can be raised, and should be as a stop-gap until Impala has mechanisms to prevent all its RPC payloads getting too large. Note that some statestore payloads may still be larger than the 4GB max. IMPALA-4875 will handle that for the statestore case particularly. TESTING --- * This allows test_very_large_string to pass with KRPC. * Tests added to statestore-test and rpc-mgr-test for large payloads. Change-Id: Idb97dd4604548c404278fe8094db2e419021a7c3 --- M be/src/kudu/rpc/serialization.cc M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/rpc_test.proto M be/src/statestore/statestore-test.cc 6 files changed, 87 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/5887/2 -- To view, visit http://gerrit.cloudera.org:8080/5887 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb97dd4604548c404278fe8094db2e419021a7c3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] Test: change linking order for crypto and ssl
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7431 Change subject: Test: change linking order for crypto and ssl .. Test: change linking order for crypto and ssl Change-Id: I41438edd8728bb8746a85aa0b2861daa4a16752c --- M be/src/kudu/security/CMakeLists.txt 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7431/1 -- To view, visit http://gerrit.cloudera.org:8080/7431 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I41438edd8728bb8746a85aa0b2861daa4a16752c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] Disable Kerberos testing to unblock builds
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7429 Change subject: Disable Kerberos testing to unblock builds .. Disable Kerberos testing to unblock builds Change-Id: Icad5e69c7d63012cbefdd384ac05b7750221f8e0 --- M be/src/rpc/rpc-mgr-test.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7429/1 -- To view, visit http://gerrit.cloudera.org:8080/7429 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icad5e69c7d63012cbefdd384ac05b7750221f8e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] [SECURITY] Use KRPC's Kinit code to avoid expensive fork
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7428 Change subject: [SECURITY] Use KRPC's Kinit code to avoid expensive fork .. [SECURITY] Use KRPC's Kinit code to avoid expensive fork Impala currently kinits by forking off a subprocess. This has proved to be expensive in many cases since the subprocess tries to reserve as much memory as Impala is currently using which can be quite a lot. This patch removes our kinit-thread and uses KRPCs kinit code instead which programatically uses the krb5 library to kinit and renew tickets. Testing: Verified with rpc-mgr-test and also manually on a live kerberized cluster. Change-Id: I576ff4d2ffcca75f9247132e11c7a2ca50b255bc --- M be/src/kudu/security/init.cc M be/src/rpc/CMakeLists.txt M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc 4 files changed, 11 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/7428/1 -- To view, visit http://gerrit.cloudera.org:8080/7428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I576ff4d2ffcca75f9247132e11c7a2ca50b255bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4875: Bound maximum statestore topic update size to 2GB.
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7425 Change subject: IMPALA-4875: Bound maximum statestore topic update size to 2GB. .. IMPALA-4875: Bound maximum statestore topic update size to 2GB. This patch enforces a maximum topic update size for the statestore to be 2GB. If that limit is exceeded, the statestore will split topic updates over several RPCs to the subscriber. If a single topic key + value exceeds the limit, it will be dropped by the statestore when the subscriber tries to publish it. Doing so introduces the possibility that a catalog update (all updates pertaining to a single catalog version) might be split across two separate UpdateState() RPCs. To ensure that all updates are flushed to the catalog FE at the same time, we buffer updates until a new object of type CATALOG is received. Remove the special treatment of deletions in the UpdateState() RPC. Previously they were not sent with their version. New statestore tests added to ensure: * Update sizes don't exceed the configured max. * 'from' version is properly handled (not previously tested) so that subscribers can process only a prefix of the update log. TODO : Check min subscriber version Change-Id: Icbe42704d3c98d9619e3737786b75ee67fb2a264 --- M be/src/catalog/catalog-server.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/StatestoreService.thrift 11 files changed, 308 insertions(+), 163 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7425/1 -- To view, visit http://gerrit.cloudera.org:8080/7425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icbe42704d3c98d9619e3737786b75ee67fb2a264 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] Downgrade log level of "sampled call" message
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7424 Change subject: Downgrade log level of "sampled call" message .. Downgrade log level of "sampled call" message Change-Id: I77d50f0f5b47cf3ff28253cf7d0f970853f8a017 --- M be/src/kudu/rpc/rpcz_store.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/7424/1 -- To view, visit http://gerrit.cloudera.org:8080/7424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I77d50f0f5b47cf3ff28253cf7d0f970853f8a017 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries
Michael Ho has posted comments on this change. Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty libraries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7418/1/CMakeLists.txt File CMakeLists.txt: PS1, Line 232: FLATBUFFERS_LIBS nit: FLATBUFFERS_STATIC_LIB for consistency. May need to update flatbuffer's FindCMake files. -- To view, visit http://gerrit.cloudera.org:8080/7418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/7400/5//COMMIT_MSG Commit Message: PS5, Line 19: test_tpch_queries The test doesn't belong in this file, see my comment in the test file itself. http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: PS5, Line 53: ) : PushHeap(priority_queue_, : ComparatorWrapper(*tuple_row_less_than_), insert_tuple); wrap this in { } per the c++ style guide: https://google.github.io/styleguide/cppguide.html#Conditionals http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS5, Line 114: tuple_pool tuple_pool_ PS5, Line 126: modified only using : /// push_heap()/pop_heap() to maintain ordered heap invariants this makes it sound like it should be encapsulated in an inner class, with the priority queue made private PS5, Line 133: priority_queue_ remove the trailing _ this should reference the parameter, which is priority_queue (not the member var) PS5, Line 133: /// Helper methods for modifying priority_queue_ while maintaining ordered heap : /// invariants : inline void PushHeap(std::vector& priority_queue, : const ComparatorWrapper& comparator, Tuple* const insert_row) { : priority_queue.push_back(insert_row); : std::push_heap(priority_queue.begin(), priority_queue.end(), comparator); : } : : inline void PopHeap(std::vector& priority_queue, : const ComparatorWrapper& comparator) { : std::pop_heap(priority_queue.begin(), priority_queue.end(), comparator); : priority_queue.pop_back(); move these functions up where the other private functions are http://gerrit.cloudera.org:8080/#/c/7400/5/tests/query_test/test_tpch_queries.py File tests/query_test/test_tpch_queries.py: PS5, Line 74: cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format == 'text') : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').compression_codec == 'none') helper fn to do this: cls.ImpalaTestMatrix.add_dimension(create_uncompressed_text_dimension(cls.get_workload())) PS5, Line 56: class TestTopNReclaimQuery(ImpalaTestSuite): : """Test class to validate that TopN periodically reclaims tuple pool memory :and runs with a lower memory footprint.""" : QUERY = "select * from tpch.lineitem order by l_orderkey desc limit 10;" : : # Mem limit empirically selected so that the query fails if tuple pool reclamation : # is not implemented for TopN : MEM_LIMIT = "50m" : : @classmethod : def get_workload(self): : return 'tpch' : : @classmethod : def add_test_dimensions(cls): : super(TestTopNReclaimQuery, cls).add_test_dimensions() : # The tpch tests take a long time to execute so restrict the combinations they : # execute over. : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format == 'text') : cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').compression_codec == 'none') : : def test_top_n_reclaim(self, vector): : exec_options = vector.get_value('exec_option') : exec_options['mem_limit'] = self.MEM_LIMIT : result = self.execute_query(self.QUERY, exec_options) : runtime_profile = str(result.runtime_profile) : num_of_times_tuple_pool_reclaimed = re.findall( : 'TuplePoolReclamations: ([0-9]*)', runtime_profile) : # Confirm newly added counter is visible : assert len(num_of_times_tuple_pool_reclaimed) > 0 : # Tuple pool is expected to be reclaimed for this query : for n in num_of_times_tuple_pool_reclaimed: : assert int(n) > 0 this doesn't belong in test_tpch_queries.py, it just happens to use a tpch query test_sort.py or test_queries.py could both be appropriate. -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 5: Thanks, Tim. Yeah I'll do another pass. -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 5: (3 comments) Looks good aside from some minor things. MJ, I'm ok if you want to +2 after the remaining things are fixed. http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: PS5, Line 255: std:: Shouldn't be needed (common/names.h has "using std::unique_ptr"). http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS5, Line 135: std::vector& Our convention is to pass modified arguments by pointer - https://google.github.io/styleguide/cppguide.html#Reference_Arguments PS5, Line 135: inline Could be static, since it doesn't reference member variables. -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has uploaded a new patch set (#5). Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. IMPALA-5520: TopN node periodically reclaims old allocations Currently TopN retains old string allocations in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage. With this commit, the TopN node will periodically re-materialise the rows stored in the priority queue and reclaim the old allocations. This is done when the number of rows removed from the priority queue is more than twice the N (limit + offset). Moreover, a new counter called "TuplePoolReclamations" is added to the TopN node that keeps track of the number of times the tuple pool is reclaimed. Testing: Test added to test_tpch_queries.py which sets a low mem_limit such that the test would fail if reclamation is not implemented and pass otherwise. Performance: Query 1 (expected general case): select * from tpch.lineitem order by l_orderkey desc limit 10; Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node): select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10; With Reclaim Without Reclaim Query 1 Query 2 Query 1 Query 2 MaxTuplePoolMem3.96 KB 3.43 KB 110.2 MB708.8 MB Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms Time (stdev) 74.38ms 67.45ms 102.71ms70.44ms Reclaims910 5861 N/A N/A We notice that memory footprint is orders of magnitude lower while maintaining similar query runtimes. Cluster perf testing will be done later. Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 --- M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M tests/query_test/test_tpch_queries.py 4 files changed, 120 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/5 -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has uploaded a new patch set (#5). Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. IMPALA-5520: TopN node periodically reclaims old allocations Currently TopN retains old string allocations in a tuple pool which is held longer than necessary, resulting in unnecessary memory usage. With this commit, the TopN node will periodically re-materialise the rows stored in the priority queue and reclaim the old allocations. This is done when the number of rows removed from the priority queue is more than twice the N (limit + offset). Moreover, a new counter called "TuplePoolReclamations" is added to the TopN node that keeps track of the number of times the tuple pool is reclaimed. Testing: Test added to test_tpch_queries.py which sets a low mem_limit such that the test would fail if reclamation is not implemented and pass otherwise. Performance: Query 1 (expected general case): select * from tpch.lineitem order by l_orderkey desc limit 10; Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node): select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10; With Reclaim Without Reclaim Query 1 Query 2 Query 1 Query 2 MaxTuplePoolMem3.96 KB 3.43 KB 110.2 MB708.8 MB Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms Time (stdev) 74.38ms 67.45ms 102.71ms70.44ms Reclaims910 5861 N/A N/A We notice that memory footprint is orders of magnitude lower while maintaining similar query runtimes. Cluster perf testing will be done later. Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 --- M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M tests/query_test/test_tpch_queries.py 4 files changed, 120 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/5 -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5520: TopN node periodically reclaims old allocations .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h File be/src/exec/topn-node.h: Line 100: > The only member of ComparatorWrapper is the TupleRowComparator&. If you tre Done -- To view, visit http://gerrit.cloudera.org:8080/7400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Bikramjeet Vig has uploaded a new patch set (#3). Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. IMPALA-4276: Profile displays non-default query options set by planner Fix to populate the non-default query options set by planner in the runtime profile. Added a corresponding test case. Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab --- M be/src/service/client-request-state.cc M tests/query_test/test_observability.py 2 files changed, 23 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/3 -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Bikramjeet Vig has uploaded a new patch set (#2). Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. IMPALA-4276: Profile displays non-default query options set by planner Fix to populate the non-default query options set by planner in the runtime profile. Added a corresponding test case. Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab --- M be/src/service/client-request-state.cc M be/src/service/query-options.cc M tests/query_test/test_observability.py 3 files changed, 25 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/2 -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-4276: Profile displays non-default query options set by planner .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7419/1//COMMIT_MSG Commit Message: PS1, Line 10: expilictly > explicitly Removing this, according to the comment in query-options.cc, this is not a bug PS1, Line 10: expilictly set : default query options were also populated in the runtime profile. > can you clarify what this means - what's an explicitly set default query op same as above http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: PS1, Line 151: exec_request > nit: use exec_request_ Done http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/query-options.cc File be/src/service/query-options.cc: PS1, Line 84: > can you quickly explain why this is right? It looks to me like we would wan Thanks for pointing that out. I was under the impression that all values had a default value set during initialization when I looked at ImpalaInternalService_types.h I now realize that some values dont have a default value set in the thrift file and this is indicated by __isset.NAME. Hence reverting this. http://gerrit.cloudera.org:8080/#/c/7419/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: PS1, Line 112: query = "set mem_limit = 0" > can you change this to some other query option, then test that MEM_LIMIT=8 Done -- To view, visit http://gerrit.cloudera.org:8080/7419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5659: Bump toolchain version
Henry Robinson has posted comments on this change. Change subject: IMPALA-5659: Bump toolchain version .. Patch Set 1: This could be combined with https://gerrit.cloudera.org/#/c/7418/, but I wanted to get it committed early so that the build machines have a chance to pick up the change (because the gflags version hasn't actually changed, the toolchain bootstrap won't replace an existing gflags install, so we want to wait for the build machines to be restarted, which happens periodocally). -- To view, visit http://gerrit.cloudera.org:8080/7422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5659: Bump toolchain version
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5659: Bump toolchain version .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5659: Bump toolchain version
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5659: Bump toolchain version .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/860/ -- To view, visit http://gerrit.cloudera.org:8080/7422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5659: Bump toolchain version
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/7422 Change subject: IMPALA-5659: Bump toolchain version .. IMPALA-5659: Bump toolchain version This bumps the toolchain version to include the fix which ensured gflags built shared libraries. Impala does not currently depend on that change, but an upcoming change will do so. Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a --- M bin/impala-config.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7422/1 -- To view, visit http://gerrit.cloudera.org:8080/7422 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-5539: Fix Kudu timestamp with -use local tz for unix ts
Matthew Jacobs has uploaded a new patch set (#4). Change subject: IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts .. IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts The -use_local_tz_for_unix_timestamp_conversion flag exists to specify if TIMESTAMPs should be interpreted as localtime or UTC when converting to/from Unix time via builtins: from_unixtime(bigint unixtime) unix_timestamp(string datetime[, ...]) unix_timestamp(timestamp datetime) However, the KuduScanner was calling into code that, when the gflag above was set, interpreted Unix times as local time. Unfortunately the write path (KuduTableSink) and some FE TIMESTAMP code (see KuduUtil.java) did not have this behavior, i.e. we were handling the gflag inconsistently. Tests: * Adds a custom cluster test to run Kudu test cases with -use_local_tz_for_unix_timestamp_conversion. * Adds tests for the new builtin unix_micros_to_utc_timestamp() which run in a custom cluster test (added test_local_tz_conversion.py) as well as in the regular tests (added to test_exprs.py). Change-Id: I423a810427353be76aa64442044133a9a22cdc9b --- M be/src/exec/kudu-scanner.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test M tests/custom_cluster/test_kudu.py A tests/custom_cluster/test_local_tz_conversion.py M tests/query_test/test_exprs.py M tests/query_test/test_kudu.py 14 files changed, 106 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/7311/4 -- To view, visit http://gerrit.cloudera.org:8080/7311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I423a810427353be76aa64442044133a9a22cdc9b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5539: Fix Kudu timestamp with -use local tz for unix ts
Matthew Jacobs has uploaded a new patch set (#3). Change subject: IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts .. IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts The -use_local_tz_for_unix_timestamp_conversion flag exists to specify if TIMESTAMPs should be interpreted as localtime or UTC when converting to/from Unix time via builtins: from_unixtime(bigint unixtime) unix_timestamp(string datetime[, ...]) unix_timestamp(timestamp datetime) However, the KuduScanner was calling into code that, when the gflag above was set, interpreted Unix times as local time. Unfortunately the write path (KuduTableSink) and some FE TIMESTAMP code (see KuduUtil.java) did not have this behavior, i.e. we were handling the gflag inconsistently. Tests: * Adds a custom cluster test to run Kudu test cases with -use_local_tz_for_unix_timestamp_conversion. * Adds tests for the new builtin * unix_micros_to_utc_timestamp() which run in a custom cluster test (added test_local_tz_conversion.py) as well as in the regular tests (added to test_exprs.py). Change-Id: I423a810427353be76aa64442044133a9a22cdc9b --- M be/src/exec/kudu-scanner.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test M tests/custom_cluster/test_kudu.py A tests/custom_cluster/test_local_tz_conversion.py M tests/query_test/test_exprs.py M tests/query_test/test_kudu.py 14 files changed, 106 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/7311/3 -- To view, visit http://gerrit.cloudera.org:8080/7311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I423a810427353be76aa64442044133a9a22cdc9b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections
John Sherman has posted comments on this change. Change subject: IMPALA-5394: Handle blocked HS2 connections .. Patch Set 3: Yeah, I'll give it a go. > Do you think that's something you could add to this patch? -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Sherman Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers
Lars Volker has posted comments on this change. Change subject: IMPALA-5627: fix dropped statuses in HDFS writers .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5417: make I/O buffer queue fixed-size
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5417: make I/O buffer queue fixed-size .. IMPALA-5417: make I/O buffer queue fixed-size This removes the dynamically-varying queue size behaviour in the I/O manager. The motivation is to bound resource consumption of scans and make it possible to reserve memory for I/O buffers upfront. Does some cleanup/documentation of the locking policy. Fix some cases in ScanRange::GetNext() where members documented as being protected by ScanRange::lock_ were accessed without holding it. I think the races were either benign or prevented by holding DiskIoRequestContext::lock_ in practice. Testing: Ran exhaustive build. Change-Id: If7cc3f7199f5320db00b7face97a96cdadb6f83f --- M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h 6 files changed, 266 insertions(+), 309 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7408/2 -- To view, visit http://gerrit.cloudera.org:8080/7408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7cc3f7199f5320db00b7face97a96cdadb6f83f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5627: fix dropped statuses in HDFS writers .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7372/5/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS5, Line 264: next_page_encoding_ > Should this say "same as 'current_encoding_'"? Done PS5, Line 265: if when > nit: superfluous word? Done -- To view, visit http://gerrit.cloudera.org:8080/7372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-5627: fix dropped statuses in HDFS writers .. IMPALA-5627: fix dropped statuses in HDFS writers The change is mostly mechanical - added Status returns where need. In one place I restructured the the logic around 'current_encoding_' for Parquet to allow a cleaner solution to the dropped status from FinalizeCurrentPage() call in ProcessValue(): after the restructuring the call was no longer needed. 'current_encoding_' was overloaded to represent both the encoding of the current page and the preferred encoding for subsequent pages. Testing: Ran exhaustive build. Change-Id: I753d352c640faf5eaef650cd743e53de53761431 --- M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/runtime/string-buffer.h 10 files changed, 98 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/7372/6 -- To view, visit http://gerrit.cloudera.org:8080/7372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers
Lars Volker has posted comments on this change. Change subject: IMPALA-5627: fix dropped statuses in HDFS writers .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7372/5/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS5, Line 264: next_page_encoding_ Should this say "same as 'current_encoding_'"? PS5, Line 265: if when nit: superfluous word? -- To view, visit http://gerrit.cloudera.org:8080/7372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5498: Support for partial sorts
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-5498: Support for partial sorts .. Patch Set 4: Can you extend the Sort metrics in the query profile to include Avg, Min and Max run size? This will be needed to reflect on the quality of the sorted data? -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No