[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL

2017-07-14 Thread Donghui Xu (Code Review)
Donghui Xu has posted comments on this change.

Change subject: IMPALA-5513: Fix display message exception when using invalid 
KEYVAL
..


Patch Set 7: -Code-Review

-- 
To view, visit http://gerrit.cloudera.org:8080/7187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL

2017-07-14 Thread Donghui Xu (Code Review)
Donghui Xu has posted comments on this change.

Change subject: IMPALA-5513: Fix display message exception when using invalid 
KEYVAL
..


Patch Set 7: Code-Review+1

Thank you for the suggestion that I have modified the code.

-- 
To view, visit http://gerrit.cloudera.org:8080/7187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL

2017-07-14 Thread Donghui Xu (Code Review)
Hello Bharath Vissapragada,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7187

to look at the new patch set (#7).

Change subject: IMPALA-5513: Fix display message exception when using invalid 
KEYVAL
..

IMPALA-5513: Fix display message exception when using invalid KEYVAL

Function print_to_stderr() has a syntax error when an error message is
displayed.

I solved this problem by exchanging the position of variable and the
subsequent strings in function print_to_stderr().

Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/7187/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..


Patch Set 1:

What is the perf implication of the backend changes?

-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/1/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java:

PS1, Line 261:if (resultPrecision > 38) {
 :   int minScale = Math.min(resultScale, 
ScalarType.MIN_ADJUSTED_SCALE);
 :   // How low would the scale need to be in the ideal 
case, in order to be able to
 :   // store all the digits to the LEFT of the dot. This 
number may be negative.
 :   int idealScale = resultScale - (resultPrecision - 
ScalarType.MAX_PRECISION);
 :   resultScale = Math.max(idealScale, minScale);
 :   resultPrecision = 38;
 : }
doesn't the call to ScalarType.createAdjustedDecimalType() on line 277 do that?


-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

2017-07-14 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7438

Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
..

IMPALA-4939, IMPALA-4939: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,38)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,30)|
+---+

DECIMAL V2:

+---+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+---+
| DECIMAL(38,37)|
+---+

+---+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+---+
| DECIMAL(38,6) |
+---+

In this patch, we also fix the early multiplication overflow. We compute
an int256 intermediate value, which we then attempt to scale down and
round to int128.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M be/src/util/decimal-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
5 files changed, 385 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7419/3/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

PS3, Line 101: self.execute_query("set mem_limit = 8589934592")
 : # For this query, the pla
> here and below, I don't see any reason to define these short queries as a l
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-14 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7419

to look at the new patch set (#4).

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..

IMPALA-4276: Profile displays non-default query options set by planner

Fix to populate the non-default query options set by planner in the
runtime profile.

Added a corresponding test case.

Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
---
M be/src/service/client-request-state.cc
M tests/query_test/test_observability.py
2 files changed, 19 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#6).

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_queries.py which sets a low mem_limit such
that the test would fail if reclamation is not implemented and pass
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

   With Reclaim   Without Reclaim
   Query 1 Query 2  Query 1 Query 2
MaxTuplePoolMem3.96 KB 3.43 KB  110.2 MB708.8 MB
Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms
Time (stdev)   74.38ms 67.45ms  102.71ms70.44ms
Reclaims910 5861  N/A N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_queries.py
4 files changed, 120 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7400/5//COMMIT_MSG
Commit Message:

PS5, Line 19: test_tpch_queries
> The test doesn't belong in this file, see my comment in the test file itsel
Done


http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

PS5, Line 53: )
: PushHeap(priority_queue_,
: 
ComparatorWrapper(*tuple_row_less_than_), insert_tuple);
> wrap this in { } per the c++ style guide:
Done


http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS5, Line 255: std::
> Shouldn't be needed (common/names.h has "using std::unique_ptr").
Done


http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS5, Line 114: tuple_pool
> tuple_pool_
Done


PS5, Line 126: modified only using
 :   /// push_heap()/pop_heap() to maintain ordered heap invariants
> this makes it sound like it should be encapsulated in an inner class, with 
Done


PS5, Line 133: priority_queue_
> remove the trailing _
Done


PS5, Line 135: inline
> Could be static, since it doesn't reference member variables.
Done


PS5, Line 135: std::vector&
> Our convention is to pass modified arguments by pointer  - https://google.g
Done


PS5, Line 133:   /// Helper methods for modifying priority_queue_ while 
maintaining ordered heap
 :   /// invariants
 :   inline void PushHeap(std::vector& priority_queue,
 :   const ComparatorWrapper& comparator, 
Tuple* const insert_row) {
 : priority_queue.push_back(insert_row);
 : std::push_heap(priority_queue.begin(), priority_queue.end(), 
comparator);
 :   }
 : 
 :   inline void PopHeap(std::vector& priority_queue,
 :   const ComparatorWrapper& comparator) {
 : std::pop_heap(priority_queue.begin(), priority_queue.end(), 
comparator);
 : priority_queue.pop_back();
> move these functions up where the other private functions are
Done


http://gerrit.cloudera.org:8080/#/c/7400/5/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

PS5, Line 74: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').file_format == 'text')
: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').compression_codec == 'none')
> helper fn to do this:
Done


PS5, Line 56: class TestTopNReclaimQuery(ImpalaTestSuite):
:   """Test class to validate that TopN periodically reclaims tuple 
pool memory
:and runs with a lower memory footprint."""
:   QUERY = "select * from tpch.lineitem order by l_orderkey desc 
limit 10;"
: 
:   # Mem limit empirically selected so that the query fails if 
tuple pool reclamation
:   # is not implemented for TopN
:   MEM_LIMIT = "50m"
: 
:   @classmethod
:   def get_workload(self):
: return 'tpch'
: 
:   @classmethod
:   def add_test_dimensions(cls):
: super(TestTopNReclaimQuery, cls).add_test_dimensions()
: # The tpch tests take a long time to execute so restrict the 
combinations they
: # execute over.
: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').file_format == 'text')
: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').compression_codec == 'none')
: 
:   def test_top_n_reclaim(self, vector):
: exec_options = vector.get_value('exec_option')
: exec_options['mem_limit'] = self.MEM_LIMIT
: result = self.execute_query(self.QUERY, exec_options)
: runtime_profile = str(result.runtime_profile)
: num_of_times_tuple_pool_reclaimed = re.findall(
:   'TuplePoolReclamations: ([0-9]*)', runtime_profile)
: # Confirm newly added counter is visible
: assert len(num_of_times_tuple_pool_reclaimed) > 0
: # Tuple pool is expected to be reclaimed for this query
: for n in num_of_times_tuple_pool_reclaimed:
:   assert int(n) > 0
> this doesn't belong in test_tpch_queries.py, it just happens to use a tpch 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd5

[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..


Patch Set 3: Code-Review+1

(1 comment)

LGTM with 1 small test change. Let's wait to see if Henry wants to add anything 
else.

http://gerrit.cloudera.org:8080/#/c/7419/3/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

PS3, Line 101: query = "set mem_limit = 8589934592"
 : self.execute_query(query)
here and below, I don't see any reason to define these short queries as a local 
variable, just pass the strings directly to execute_query(...)


-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL

2017-07-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5513: Fix display message exception when using invalid 
KEYVAL
..


Patch Set 6: Code-Review+1

Didn't mean to -1, feel free to carry my +1 once you fix the test.

-- 
To view, visit http://gerrit.cloudera.org:8080/7187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5513: Fix display message exception when using invalid KEYVAL

2017-07-14 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5513: Fix display message exception when using invalid 
KEYVAL
..


Patch Set 6: -Code-Review

Sorry for holding this up, didn't realize it was waiting on me until Jim 
pointed out (thanks Jim). So I think it is ok to club those tests together. We 
generally group them by functionality and in this case I don't think we need to 
create a separate unit test function for such a trivial test.

-- 
To view, visit http://gerrit.cloudera.org:8080/7187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib883499a88f39d91b69bea4291f1ce5dd264ccf6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Donghui Xu 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5498: Support for partial sorts

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5498: Support for partial sorts
..


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7267/6//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5498: Support for partial sorts
lets specify in Kudu for now


PS6, Line 28: A
nit: This also adds...


Line 32: used previously for inserts into Kudu.
"for inserts into Kudu"->"for DML statements into Kudu tables only. Future work 
can extend this to other table sinks IMPALA-"


Line 42:   Now: no spills, sort takes 6m19s, for ~18% speedup
comparison vs no partition/sort at all?


http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/exec/partial-sort-node.h
File be/src/exec/partial-sort-node.h:

PS6, Line 62: sort_exec_exprs_
update


PS6, Line 62: derived based
awkward/confusing


http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

PS6, Line 1383: RunSize
can you indicate the units, e.g. #rows -> NumRowsPerRun?


PS6, Line 1387:   if (no_spill_) {
  : // 'min_buffers_required' is 1 since the sorter will only 
have 1 run at a time, so
  : // there won't be any merges.
  : min_buffers_required = 1;
  :   } else {
  : min_buffers_required = MIN_BUFFERS_PER_MERGE;
  :   }
nit: ternary operator


http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS6, Line 45: Specifying 'no_spill' as true is
For this use case, 'no_spill' should be set to true so the sorter can reduce...


PS6, Line 103: bool no_spill = false
comment on this even though it's mentioned above


Line 117:   /// up, it is sorted and a new unsorted run is created.
comment that this may not be called if no_spill is true


PS6, Line 127: InputDone
comment that this may not be called if no_spill is true
(are there any other cases I missed?)


http://gerrit.cloudera.org:8080/#/c/7267/6/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

PS6, Line 56: TODO: run experiments to
:   // determine the value for this, consider making it 
configurable, enforce it in the BE.
JIRA?


PS6, Line 124: {
 : return type_ != TSortType.PARTIAL;
 :   }
nit: oneline?


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5498: Support for partial sorts

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5498: Support for partial sorts
..


Patch Set 6: Code-Review+1

(1 comment)

LGTM aside from the variable name

http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS6, Line 103: no_spill
A positive argument would be marginally easier to follow. E.g. 'enable_spilling'


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7393/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 1018:   // For each row, iterate over all rows in the build table.
The need for RETURN_IF_CANCELLED is a little subtle. A one-line comment would 
be useful, e.g. "This loop can run for a long time. Periodically check for 
cancellation."


Line 1023: for (int j = 0; j < build_rows->num_rows(); ++j) {
This loop can also potentially run for a long time if build_rows is large. It 
will be a bit expensive to check every iteration but maybe check every 1024 
rows or something like that (I'm suggesting a power-of-two because computing % 
is cheap for a constant power-of-two).

  if (j % 1024 == 0) RETURN_IF_CANCELLED(state);


-- 
To view, visit http://gerrit.cloudera.org:8080/7393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5667: Race in DataStremSender could cause TransmitData 
sidecar corruption
..


Patch Set 1:

> Uploaded patch set 1.

I will stress test this patch to make sure that we don't hit the DCHECK anymore 
and post my findings here. I wanted to push this to gerrit so that it would be 
easier to have a discussion.

-- 
To view, visit http://gerrit.cloudera.org:8080/7436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7742551236205c49393a7351291eae6aefb4fa09
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty 
libraries
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7418/1/CMakeLists.txt
File CMakeLists.txt:

PS1, Line 125: impala_add_thirdparty_lib
upper case function name? It seems weird that the declaration is lower case and 
all the callsites are upper case.


Line 136:   if (NOT STATIC_LIB)
Could we simplify this to:

if (BUILD_SHARED_LIBS OR NOT STATIC_LIB)

else

endif()


-- 
To view, visit http://gerrit.cloudera.org:8080/7418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel

2017-07-14 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] [SECURITY] Use KRPC's Kinit code to avoid expensive fork

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: [SECURITY] Use KRPC's Kinit code to avoid expensive fork
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I576ff4d2ffcca75f9247132e11c7a2ca50b255bc
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] Downgrade log level of "sampled call" message

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: Downgrade log level of "sampled call" message
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7424
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I77d50f0f5b47cf3ff28253cf7d0f970853f8a017
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] [SECURITY] Enable MiniKdc and add kerberized BE tests

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: [SECURITY] Enable MiniKdc and add kerberized BE tests
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I231c7cbc7433fa84da780b6168ae1b30ddeeab57
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4671: Add Impala-native ServicePool

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: IMPALA-4671: Add Impala-native ServicePool
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I1fe97dcdd45f0984913a52677f3d13136c641786
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] tmp: force static linking against ssl

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: tmp: force static linking against ssl
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I0e927f9a51241a3ade4eb253d4a8d33b2650bedb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] Honor enable shared from this<> rule of sharing only pre-shared objects

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: Honor enable_shared_from_this<> rule of sharing only pre-shared 
objects
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ib32e247c270f480ce203058f1a5aa8445b5bcec6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5498: Support for partial sorts

2017-07-14 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5498: Support for partial sorts
..


Patch Set 6:

(1 comment)

> Can you extend the Sort metrics in the query profile to include
 > Avg, Min and Max run size?
 > This will be needed to reflect on the quality of the sorted data?

Done.

http://gerrit.cloudera.org:8080/#/c/7267/4/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS4, Line 99: to the merger and retrieve r
> I don't feel strongly - the change you made looks good to me. Maybe 'is_par
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4875: Bound maximum statestore topic update size to 2GB.

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: IMPALA-4875: Bound maximum statestore topic update size to 2GB.
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Icbe42704d3c98d9619e3737786b75ee67fb2a264
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries

2017-07-14 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty 
libraries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7418/1/CMakeLists.txt
File CMakeLists.txt:

PS1, Line 232: FLATBUFFERS_LIBS
> nit: FLATBUFFERS_STATIC_LIB for consistency. May need to update flatbuffer'
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add two-way negotiation thread pools

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: Add two-way negotiation thread pools
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia4cbdd0e7e3b570e8d4e8c52fbff5d353532a310
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries

2017-07-14 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#2).

Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty 
libraries
..

IMPALA-5659: Begin standardizing treatment of thirdparty libraries

If Impala was built with --build_shared_libs, some thirdparty libraries
were still statically linked; this could cause runtime errors if the
libraries were also linked into a .so. This patch fixes that issue (for
gflags, glog and protobuf at least) by ensuring that build_shared_libs
is respected for those libraries.

* Standardize thirdparty library handling w/CMake by adding
  IMPALA_ADD_THIRDPARTY_LIB. This creates a symbolic name for each
  library, allowing us to switch the underlying library
  files (e.g. change from static to dynamic linking) without having to
  individually change the link clauses for each target.

* Remove most cases of add_library() from cmake_modules/* - that is all
  handled by IMPALA_ADD_THIRDPARTY_LIB.

* Add shared library detection for a couple of thirdparty
  dependencies (many only detect static libraries), just to prove the concept.

* All thirdparty libraries now print a standard set of messages. For example:

-- --> Adding thirdparty library protoc. <--
-- Header files: 
/data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/include
-- Added shared library dependency protoc: 
/data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/lib/libprotoc.so
-- --> Adding thirdparty library libev. <--
-- Header files: /data/henry/src/cloudera/impala-toolchain/libev-4.20/include
-- Added shared library dependency libev:
-- /data/henry/src/cloudera/impala-toolchain/libev-4.20/lib/libev.so

* Some libraries don't quite fit this pattern (LLVM and Boost) - leave
  them as is for now.

* Remove FindOpenSSL.cmake - the toolchain one is more modern.

Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
---
M CMakeLists.txt
M be/CMakeLists.txt
M cmake_modules/FindAvro.cmake
M cmake_modules/FindBreakpad.cmake
M cmake_modules/FindFlatBuffers.cmake
M cmake_modules/FindGFlags.cmake
M cmake_modules/FindGLog.cmake
M cmake_modules/FindGTest.cmake
M cmake_modules/FindHDFS.cmake
M cmake_modules/FindLdap.cmake
M cmake_modules/FindLz4.cmake
D cmake_modules/FindOpenSSL.cmake
M cmake_modules/FindRe2.cmake
M cmake_modules/FindSnappy.cmake
M cmake_modules/FindZlib.cmake
M cmake_modules/kudu_cmake_fns.txt
16 files changed, 120 insertions(+), 263 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/7418/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Disable Kerberos testing to unblock builds

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: Disable Kerberos testing to unblock builds
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Icad5e69c7d63012cbefdd384ac05b7750221f8e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] Test: change linking order for crypto and ssl

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: Test: change linking order for crypto and ssl
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I41438edd8728bb8746a85aa0b2861daa4a16752c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] Unbreak the build

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: Unbreak the build
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie8bc5e41e8d5ae2b5ebf93c2380d651d99df10aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5592: Fix computation of uncompressed row batch size

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: IMPALA-5592: Fix computation of uncompressed row batch size
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ic8d285d1e4a1aa97c98c505351c18fea04c60311
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5498: Support for partial sorts

2017-07-14 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#6).

Change subject: IMPALA-5498: Support for partial sorts
..

IMPALA-5498: Support for partial sorts

Impala currently supports total sorts (the entire set of data
is sorted) and top-n sorts (only the highest/lowest n elements
are sorted). This patch adds the ability to do partial sorts,
where the data is divided up into some number of subsets, each
of which is sorted individually.

It accomplishes this by adding a new exec node, PartialSortNode.
When PartialSortNode::GetNext() is called, it retrieves input
up to its memory limit, uses the existing Sorter class to sort
it, and outputs it. This is faster than a total sort with SortNode
as it avoids the need to spill if the input is larger than the
memory limit.

Future work will look into setting a more restrictive memory limit
on the PartialSortNode.

In the planner, the SortNode plan node is used, with an enum value
indicating if it is a total or partial sort.

Adds a new counter 'RunSize' to the runtime profile which tracks
the min, max, and avg size of the generated runs, in tuples.

As a first use case, partial sort is used where a total sort was
used previously for inserts into Kudu.

Testing:
- E2E test with a large INSERT into a Kudu table with a mem limit.
  Checks that no spills occurred.
- Updated planner tests.
- Existing E2E tests and stress test verify correctness of INSERT.
- Perf tests on the 10 node cluster: inserting tpch_100.lineitem
  into a Kudu table with mem_limit=3gb:
  Previously: 5 runs are spilled, sort took 7m33s
  Now: no spills, sort takes 6m19s, for ~18% speedup

Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
A be/src/exec/partial-sort-node.cc
A be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/runtime-profile-counters.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
16 files changed, 460 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7267/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] [SECURITY] Make KRPC work with Kerberos

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: [SECURITY] Make KRPC work with Kerberos
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ic27ac6a42fd43ef67994f6b0120c38b15971d24e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5659: Bump toolchain version

2017-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5659: Bump toolchain version
..


IMPALA-5659: Bump toolchain version

This bumps the toolchain version to include the fix which ensured gflags
built shared libraries. Impala does not currently depend on that change,
but an upcoming change will do so.

Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Reviewed-on: http://gerrit.cloudera.org:8080/7422
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5659: Bump toolchain version

2017-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5659: Bump toolchain version
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#16).

Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore 
services to KRPC
..

IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

This patch adds the core abstractions necessary for using Kudu RPC for
Impala's backend services. The StatestoreService and
StatestoreSubscriberService are both ported to KRPC to demonstrate how
the new RPC framework operates.

The main new class is RpcMgr, which is responsible for both services run
by a daemon and for obtaining clients to remote services. The new Rpc
class helps clients build rpc invocation objects, and use them to call
remote methods.

Also done:
 * Utility code to convert from a Thrift structure to a Protobuf
   wrapper. Thrift RPCs do not need to be ported to Protobuf to work.
 * Added more build support for Protobuf generation. All library targets
   now depend on the Protobuf generation step, just as they do Thrift.
 * thrift-server-test is rewritten to use Beeswax as a test service,
   since the old StatestoreService has been removed.
 * Remove InProcessStatestore that was only used for statestore-test and
   mini-impala-cluster. As far as we know, mini-impala-cluster is unused
   by anyone. (IMPALA-4784)

TESTING:
 * Impala's test suite passes.
 * statestore-test has been rewritten to use new statestore code. It
   includes all test cases from the now-removed Python test_statestore.py
 * rpc-mgr-test unit-tests the RpcMgr and Rpc classes.

TODO:
 * Enable SSL and Kerberos support (already in Kudu's imported
   libraries)

Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/catalog-op-executor.cc
M be/src/exec/kudu-util.h
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
A be/src/rpc/common.proto
A be/src/rpc/rpc-mgr-test.cc
A be/src/rpc/rpc-mgr.cc
A be/src/rpc/rpc-mgr.h
A be/src/rpc/rpc-mgr.inline.h
A be/src/rpc/rpc.h
A be/src/rpc/rpc_test.proto
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/statestore/CMakeLists.txt
D be/src/statestore/statestore-service-client-wrapper.h
D be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
A be/src/statestore/statestore.proto
M be/src/statestore/statestored-main.cc
M be/src/testutil/CMakeLists.txt
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
D be/src/testutil/mini-impala-cluster.cc
M be/src/transport/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/collection-metrics.h
M be/src/util/counting-barrier.h
M bin/start-impala-cluster.py
M bin/start-impalad.sh
M common/thrift/CMakeLists.txt
M common/thrift/StatestoreService.thrift
D tests/statestore/test_statestore.py
53 files changed, 1,733 insertions(+), 1,219 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/5720/16
-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] Unbreak the build

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7435

Change subject: Unbreak the build
..

Unbreak the build

Change-Id: Ie8bc5e41e8d5ae2b5ebf93c2380d651d99df10aa
---
M CMakeLists.txt
1 file changed, 9 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/7435/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie8bc5e41e8d5ae2b5ebf93c2380d651d99df10aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-07-14 Thread Sailesh Mukil (Code Review)
Hello Impala Public Jenkins, Michael Ho, Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5715

to look at the new patch set (#23).

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
M be/src/exec/kudu-util.h
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/env_posix.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/locks.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
D cmake_modules/FindOpenSSL.cmake
18 files changed, 140 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/23
-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4671: Add Impala-native ServicePool

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7423

Change subject: IMPALA-4671: Add Impala-native ServicePool
..

IMPALA-4671: Add Impala-native ServicePool

Add a ServicePool implementation that uses Impala's native threads, and
hooks into Impala's instrumentation APIs.

Change-Id: I1fe97dcdd45f0984913a52677f3d13136c641786
---
M be/src/catalog/catalog-server.h
M be/src/catalog/catalogd-main.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/impala-service-pool.cc
A be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-trace.cc
M be/src/rpc/rpc-trace.h
M be/src/service/impalad-main.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/util/histogram-metric.h
M www/common-header.tmpl
M www/rpcz.tmpl
15 files changed, 570 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7423/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1fe97dcdd45f0984913a52677f3d13136c641786
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] [SECURITY] Enable MiniKdc and add kerberized BE tests

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7427

Change subject: [SECURITY] Enable MiniKdc and add kerberized BE tests
..

[SECURITY] Enable MiniKdc and add kerberized BE tests

This patch enables the MiniKdc and converts the rpc-mgr-test
to have a kerberized mode, which uses the MiniKdc to run the tests
configured with Kerberos enabled.

There are a couple of small code changes on the KuduRPC side as
those changes are currently required to avoid a clash between
Impala's and Kudu's util libraries.

Change-Id: I231c7cbc7433fa84da780b6168ae1b30ddeeab57
---
M CMakeLists.txt
M be/src/kudu/security/CMakeLists.txt
M be/src/kudu/security/test/mini_kdc.cc
M be/src/rpc/CMakeLists.txt
M be/src/rpc/authentication.cc
M be/src/rpc/rpc-mgr-test.cc
A cmake_modules/FindKerberosPrograms.cmake
7 files changed, 165 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/7427/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I231c7cbc7433fa84da780b6168ae1b30ddeeab57
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#6).

Change subject: IMPALA-4856: Port data stream service to KRPC
..

IMPALA-4856: Port data stream service to KRPC

This patch ports the data-flow parts of ImpalaInternalService to KRPC.

* ImpalaInternalService is split into two services. The first,
  ImpalaInternalService, deals with control messages for plan fragment
  instance execution, cancellation and reporting, and remains
  implemented in Thrift for now. The second, DataStreamService, handles
  large-payload RPCs for transmitting runtime filters and row batches
  between hosts.

* In the DataStreamService, all RPCs use 'native' protobuf. The
  DataStreamService starts on the port previously reserved for the
  StatestoreSubscriberService (which is also a KRPC service), to avoid
  having to configure another port when starting Impala. When the
  ImpalaInternalService is ported to KRPC, all services will run on one
  port.

* To support needing to address two different backend services, a data
  service port has been added to TBackendDescriptor.

* This patch adds support for asynchronous RPCs to the RpcMgr and Rpc
  classes. Previously, Impala used fixed size thread pools + synchronous
  RPCs to achieve some parallelism for 'broadcast' RPCs like filter
  propagation, or a dedicated per-sender+receiver pair thread on the
  sender side in the DataStreamSender case. In this patch, the
  PublishFilter() and TransmitData() RPCs are sent asynchronously using
  KRPC's thread pools.

* The TransmitData() protocol has changed to adapt to asynchronous
  RPCs. The full details are in data-stream-mgr.h.

* As a result, DataStreamSender no longer creates a
  thread-per-connection on the sender side.

* Both tuple transmission and runtime filter publication use sidecars to
  minimise the number of copies and serialization steps required.

* Also include a fix for KUDU-2011 that properly allows sidecars to be
  shared between KRPC and the RPC caller (fixing IMPALA-5093, a
  corruption bug).

* A large portion of this patch is the replacement of TRowBatch with its
  Protobuf equivalent, RowBatchPb. The replacement is a literal port of
  the data structure, and row-batch-test, row-batch-list-test and
  row-batch-serialize-benchmark continue to execute without major logic
  changes.

* Simplify FindRecvr() logic in DataStreamManager. No-longer need to
  handle blocking sender-side, so no need for complex promise-based
  machinery. Instead, all senders with no receiver are added to a
  per-receiver list, which is processed when the receiver arrives. If it
  does not arrive promptly, the DataStreamManager cleans them up after
  FLAGS_datastream_sender_timeout_ms.

* This patch also begins a clean-up of how ImpalaServer instances are
  created (by removing CreateImpalaServer), and clarifying the
  relationship between ExecEnv and ImpalaServer. ImpalaServer now
  follows the standard construct->Init()->Start()->Join() lifecycle that
  we use for other services.

* Ensure that all addresses used for KRPCs are fully resolved, avoiding
  the need to resolve them for each RPC.

TESTING
---

* New tests added to rpc-mgr-test.

TO DO
-

* Re-enable throughput and latency measurements per data-stream sender
  when that information is exposed from KRPC (KUDU-1738).

* TLS and Kerberos are still not supported by KRPC in this patch.

Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b
---
M .clang-format
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/common/init.cc
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exprs/expr-test.cc
M be/src/kudu/rpc/rpc_sidecar.cc
M be/src/kudu/rpc/rpc_sidecar.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/common.proto
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/rpc.h
M be/src/runtime/backend-client.h
M be/src/runtime/client-cache-types.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/scheduling/backend-config-test.cc
M be/src/scheduling/backe

[Impala-ASF-CR] IMPALA-5592: Fix computation of uncompressed row batch size

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7434

Change subject: IMPALA-5592: Fix computation of uncompressed row batch size
..

IMPALA-5592: Fix computation of uncompressed row batch size

The uncompressed row batch size is the size of a row batch
if the tuple data is sent without compression. It's computed
by the following formula:

serialized row batch size - compressed tuple data size + uncompressed tuple 
data size

The existing code is using the wrong field for the compressed tuple
data size and it causes uncompressed data size to be always identical
to the compressed row batch size. This change fixes the problem by
using the right field for the compressed tuple data size.

Change-Id: Ic8d285d1e4a1aa97c98c505351c18fea04c60311
---
M be/src/runtime/data-stream-sender.cc
1 file changed, 9 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/7434/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic8d285d1e4a1aa97c98c505351c18fea04c60311
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] tmp: force static linking against ssl

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7432

Change subject: tmp: force static linking against ssl
..

tmp: force static linking against ssl

Change-Id: I0e927f9a51241a3ade4eb253d4a8d33b2650bedb
---
M CMakeLists.txt
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7432/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0e927f9a51241a3ade4eb253d4a8d33b2650bedb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] Add two-way negotiation thread pools

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7430

Change subject: Add two-way negotiation thread pools
..

Add two-way negotiation thread pools

Change-Id: Ia4cbdd0e7e3b570e8d4e8c52fbff5d353532a310
---
M be/src/kudu/rpc/messenger.cc
M be/src/kudu/rpc/messenger.h
M be/src/kudu/rpc/reactor.cc
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
5 files changed, 18 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7430/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia4cbdd0e7e3b570e8d4e8c52fbff5d353532a310
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7436

Change subject: IMPALA-5667: Race in DataStremSender could cause TransmitData 
sidecar corruption
..

IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption

This is caused by a race in the DataStreamSender where if a query is
cancelled while 2 TransmitData RPCs for HASH_PARTITIONED exchanges
are in progress, a third TransmitData RPC may overwrite the sidecar
pointed to by the first RPC while the first RPC is in progress. This
could also happen if the DataStreamSender instance has somehow
asynchronously been closed, or if a previous RPC has returned with a
DATASTREAM_RECVR_ALREADY_GONE error.

The entire sequence of events is listed in the JIRA and explained more
clearly there as it's a long and complicated explanation.

The fix is basically to return a cancelled status if the DSS realizes
that the query is being cancelled. We previously returned an OK status
since some higher level execution node would take care of propogating
this cancelled status to the rest of the execution tree. This doesn't
work well for TransmitData to HASH_PARTITIONED exchanges since we don't
check for cancellation until we've consumed the entire RowBatch input
to the DSS. In that time, we could potentially have multiple RPCs
attempting to serialize the outbound protobuf sidecars only to later
realize that the query has been cancelled and throw them away.

Change-Id: I7742551236205c49393a7351291eae6aefb4fa09
---
M be/src/runtime/data-stream-sender.cc
1 file changed, 8 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7436/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7742551236205c49393a7351291eae6aefb4fa09
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] [SECURITY] Make KRPC work with Kerberos

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7426

Change subject: [SECURITY] Make KRPC work with Kerberos
..

[SECURITY] Make KRPC work with Kerberos

KuduRPC itself has experimental support for Kerberos. However, since
Impala's client transport still uses the Thrift transport stack, we
need to make sure that a single security configuration applies to both
internal communication (KuduRPC) and external communication (Thrift's
TSaslTransport).

This patch achieves that. However, slight modification of the KuduRPC
code is required which isn't ideal, but there's currently no way around
it.

Change-Id: Ic27ac6a42fd43ef67994f6b0120c38b15971d24e
---
M be/src/kudu/rpc/client_negotiation.cc
M be/src/kudu/rpc/server_negotiation.cc
A be/src/rpc/authentication-export.h
M be/src/rpc/authentication-test.cc
M be/src/rpc/authentication.cc
5 files changed, 104 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7426/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic27ac6a42fd43ef67994f6b0120c38b15971d24e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] Honor enable shared from this<> rule of sharing only pre-shared objects

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7433

Change subject: Honor enable_shared_from_this<> rule of sharing only pre-shared 
objects
..

Honor enable_shared_from_this<> rule of sharing only pre-shared objects

Change-Id: Ib32e247c270f480ce203058f1a5aa8445b5bcec6
---
M be/src/runtime/data-stream-sender.cc
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/7433/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib32e247c270f480ce203058f1a5aa8445b5bcec6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4874: Increase the maximum KRPC message size to 4GB

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2).

Change subject: IMPALA-4874: Increase the maximum KRPC message size to 4GB
..

IMPALA-4874: Increase the maximum KRPC message size to 4GB

KRPC restricts the maximum message size with --rpc_max_message_size. The
Protobuf API used to deserialize messages also has a maximum size. Both
can be raised, and should be as a stop-gap until Impala has mechanisms
to prevent all its RPC payloads getting too large.

Note that some statestore payloads may still be larger than the 4GB
max. IMPALA-4875 will handle that for the statestore case particularly.

TESTING
---

* This allows test_very_large_string to pass with KRPC.
* Tests added to statestore-test and rpc-mgr-test for large payloads.

Change-Id: Idb97dd4604548c404278fe8094db2e419021a7c3
---
M be/src/kudu/rpc/serialization.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc_test.proto
M be/src/statestore/statestore-test.cc
6 files changed, 87 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/5887/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb97dd4604548c404278fe8094db2e419021a7c3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] Test: change linking order for crypto and ssl

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7431

Change subject: Test: change linking order for crypto and ssl
..

Test: change linking order for crypto and ssl

Change-Id: I41438edd8728bb8746a85aa0b2861daa4a16752c
---
M be/src/kudu/security/CMakeLists.txt
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7431/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I41438edd8728bb8746a85aa0b2861daa4a16752c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] Disable Kerberos testing to unblock builds

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7429

Change subject: Disable Kerberos testing to unblock builds
..

Disable Kerberos testing to unblock builds

Change-Id: Icad5e69c7d63012cbefdd384ac05b7750221f8e0
---
M be/src/rpc/rpc-mgr-test.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7429/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icad5e69c7d63012cbefdd384ac05b7750221f8e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] [SECURITY] Use KRPC's Kinit code to avoid expensive fork

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7428

Change subject: [SECURITY] Use KRPC's Kinit code to avoid expensive fork
..

[SECURITY] Use KRPC's Kinit code to avoid expensive fork

Impala currently kinits by forking off a subprocess. This
has proved to be expensive in many cases since the subprocess
tries to reserve as much memory as Impala is currently using
which can be quite a lot.

This patch removes our kinit-thread and uses KRPCs kinit code
instead which programatically uses the krb5 library to kinit
and renew tickets.

Testing: Verified with rpc-mgr-test and also manually on a
live kerberized cluster.

Change-Id: I576ff4d2ffcca75f9247132e11c7a2ca50b255bc
---
M be/src/kudu/security/init.cc
M be/src/rpc/CMakeLists.txt
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
4 files changed, 11 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/7428/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I576ff4d2ffcca75f9247132e11c7a2ca50b255bc
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4875: Bound maximum statestore topic update size to 2GB.

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7425

Change subject: IMPALA-4875: Bound maximum statestore topic update size to 2GB.
..

IMPALA-4875: Bound maximum statestore topic update size to 2GB.

This patch enforces a maximum topic update size for the statestore to be
2GB. If that limit is exceeded, the statestore will split topic updates
over several RPCs to the subscriber. If a single topic key + value
exceeds the limit, it will be dropped by the statestore when the
subscriber tries to publish it.

Doing so introduces the possibility that a catalog update (all updates
pertaining to a single catalog version) might be split across two
separate UpdateState() RPCs. To ensure that all updates are flushed to
the catalog FE at the same time, we buffer updates until a new object of
type CATALOG is received.

Remove the special treatment of deletions in the UpdateState()
RPC. Previously they were not sent with their version.

New statestore tests added to ensure:

* Update sizes don't exceed the configured max.
* 'from' version is properly handled (not previously tested) so that
  subscribers can process only a prefix of the update log.

TODO : Check min subscriber version

Change-Id: Icbe42704d3c98d9619e3737786b75ee67fb2a264
---
M be/src/catalog/catalog-server.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/StatestoreService.thrift
11 files changed, 308 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7425/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icbe42704d3c98d9619e3737786b75ee67fb2a264
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] Downgrade log level of "sampled call" message

2017-07-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7424

Change subject: Downgrade log level of "sampled call" message
..

Downgrade log level of "sampled call" message

Change-Id: I77d50f0f5b47cf3ff28253cf7d0f970853f8a017
---
M be/src/kudu/rpc/rpcz_store.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/7424/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7424
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I77d50f0f5b47cf3ff28253cf7d0f970853f8a017
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries

2017-07-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty 
libraries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7418/1/CMakeLists.txt
File CMakeLists.txt:

PS1, Line 232: FLATBUFFERS_LIBS
nit: FLATBUFFERS_STATIC_LIB for consistency. May need to update flatbuffer's 
FindCMake files.


-- 
To view, visit http://gerrit.cloudera.org:8080/7418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7400/5//COMMIT_MSG
Commit Message:

PS5, Line 19: test_tpch_queries
The test doesn't belong in this file, see my comment in the test file itself.


http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

PS5, Line 53: )
: PushHeap(priority_queue_,
: 
ComparatorWrapper(*tuple_row_less_than_), insert_tuple);
wrap this in { } per the c++ style guide:
https://google.github.io/styleguide/cppguide.html#Conditionals


http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS5, Line 114: tuple_pool
tuple_pool_


PS5, Line 126: modified only using
 :   /// push_heap()/pop_heap() to maintain ordered heap invariants
this makes it sound like it should be encapsulated in an inner class, with the 
priority queue made private


PS5, Line 133: priority_queue_
remove the trailing _
this should reference the parameter, which is priority_queue (not the member 
var)


PS5, Line 133:   /// Helper methods for modifying priority_queue_ while 
maintaining ordered heap
 :   /// invariants
 :   inline void PushHeap(std::vector& priority_queue,
 :   const ComparatorWrapper& comparator, 
Tuple* const insert_row) {
 : priority_queue.push_back(insert_row);
 : std::push_heap(priority_queue.begin(), priority_queue.end(), 
comparator);
 :   }
 : 
 :   inline void PopHeap(std::vector& priority_queue,
 :   const ComparatorWrapper& comparator) {
 : std::pop_heap(priority_queue.begin(), priority_queue.end(), 
comparator);
 : priority_queue.pop_back();
move these functions up where the other private functions are


http://gerrit.cloudera.org:8080/#/c/7400/5/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

PS5, Line 74: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').file_format == 'text')
: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').compression_codec == 'none')
helper fn to do this:


cls.ImpalaTestMatrix.add_dimension(create_uncompressed_text_dimension(cls.get_workload()))


PS5, Line 56: class TestTopNReclaimQuery(ImpalaTestSuite):
:   """Test class to validate that TopN periodically reclaims tuple 
pool memory
:and runs with a lower memory footprint."""
:   QUERY = "select * from tpch.lineitem order by l_orderkey desc 
limit 10;"
: 
:   # Mem limit empirically selected so that the query fails if 
tuple pool reclamation
:   # is not implemented for TopN
:   MEM_LIMIT = "50m"
: 
:   @classmethod
:   def get_workload(self):
: return 'tpch'
: 
:   @classmethod
:   def add_test_dimensions(cls):
: super(TestTopNReclaimQuery, cls).add_test_dimensions()
: # The tpch tests take a long time to execute so restrict the 
combinations they
: # execute over.
: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').file_format == 'text')
: cls.ImpalaTestMatrix.add_constraint(lambda v:\
: v.get_value('table_format').compression_codec == 'none')
: 
:   def test_top_n_reclaim(self, vector):
: exec_options = vector.get_value('exec_option')
: exec_options['mem_limit'] = self.MEM_LIMIT
: result = self.execute_query(self.QUERY, exec_options)
: runtime_profile = str(result.runtime_profile)
: num_of_times_tuple_pool_reclaimed = re.findall(
:   'TuplePoolReclamations: ([0-9]*)', runtime_profile)
: # Confirm newly added counter is visible
: assert len(num_of_times_tuple_pool_reclaimed) > 0
: # Tuple pool is expected to be reclaimed for this query
: for n in num_of_times_tuple_pool_reclaimed:
:   assert int(n) > 0
this doesn't belong in test_tpch_queries.py, it just happens to use a tpch query

test_sort.py or test_queries.py could both be appropriate.


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet 

[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 5:

Thanks, Tim. Yeah I'll do another pass.

-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 5:

(3 comments)

Looks good aside from some minor things. MJ, I'm ok if you want to +2 after the 
remaining things are fixed.

http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS5, Line 255: std::
Shouldn't be needed (common/names.h has "using std::unique_ptr").


http://gerrit.cloudera.org:8080/#/c/7400/5/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS5, Line 135: std::vector&
Our convention is to pass modified arguments by pointer  - 
https://google.github.io/styleguide/cppguide.html#Reference_Arguments


PS5, Line 135: inline
Could be static, since it doesn't reference member variables.


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#5).

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_tpch_queries.py which sets a low mem_limit such
that the test would fail if reclamation is not implemented and pass
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

   With Reclaim   Without Reclaim
   Query 1 Query 2  Query 1 Query 2
MaxTuplePoolMem3.96 KB 3.43 KB  110.2 MB708.8 MB
Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms
Time (stdev)   74.38ms 67.45ms  102.71ms70.44ms
Reclaims910 5861  N/A N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 120 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#5).

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_tpch_queries.py which sets a low mem_limit such
that the test would fail if reclamation is not implemented and pass
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

   With Reclaim   Without Reclaim
   Query 1 Query 2  Query 1 Query 2
MaxTuplePoolMem3.96 KB 3.43 KB  110.2 MB708.8 MB
Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms
Time (stdev)   74.38ms 67.45ms  102.71ms70.44ms
Reclaims910 5861  N/A N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 120 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

Line 100: 
> The only member of ComparatorWrapper is the TupleRowComparator&. If you tre
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#3).

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..

IMPALA-4276: Profile displays non-default query options set by planner

Fix to populate the non-default query options set by planner in the
runtime profile.

Added a corresponding test case.

Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
---
M be/src/service/client-request-state.cc
M tests/query_test/test_observability.py
2 files changed, 23 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#2).

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..

IMPALA-4276: Profile displays non-default query options set by planner

Fix to populate the non-default query options set by planner in the
runtime profile.

Added a corresponding test case.

Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M tests/query_test/test_observability.py
3 files changed, 25 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-14 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7419/1//COMMIT_MSG
Commit Message:

PS1, Line 10: expilictly
> explicitly
Removing this, according to the comment in query-options.cc, this is not a bug


PS1, Line 10: expilictly set
: default query options were also populated in the runtime profile.
> can you clarify what this means - what's an explicitly set default query op
same as above


http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

PS1, Line 151: exec_request
> nit: use exec_request_
Done


http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

PS1, Line 84: 
> can you quickly explain why this is right? It looks to me like we would wan
Thanks for pointing that out. I was under the impression that all values had a 
default value set during initialization when I looked at 
ImpalaInternalService_types.h
I now realize that some values dont have a default value set in the thrift file 
and this is indicated by __isset.NAME.
Hence reverting this.


http://gerrit.cloudera.org:8080/#/c/7419/1/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

PS1, Line 112: query = "set mem_limit = 0"
> can you change this to some other  query option, then test that MEM_LIMIT=8
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5659: Bump toolchain version

2017-07-14 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5659: Bump toolchain version
..


Patch Set 1:

This could be combined with https://gerrit.cloudera.org/#/c/7418/, but I wanted 
to get it committed early so that the build machines have a chance to pick up 
the change (because the gflags version hasn't actually changed, the toolchain 
bootstrap won't replace an existing gflags install, so we want to wait for the 
build machines to be restarted, which happens periodocally).

-- 
To view, visit http://gerrit.cloudera.org:8080/7422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5659: Bump toolchain version

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5659: Bump toolchain version
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5659: Bump toolchain version

2017-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5659: Bump toolchain version
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/860/

-- 
To view, visit http://gerrit.cloudera.org:8080/7422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5659: Bump toolchain version

2017-07-14 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7422

Change subject: IMPALA-5659: Bump toolchain version
..

IMPALA-5659: Bump toolchain version

This bumps the toolchain version to include the fix which ensured gflags
built shared libraries. Impala does not currently depend on that change,
but an upcoming change will do so.

Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7422/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9fe91b37ae00cd3d2e35b6a5c465f55238f32d1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-5539: Fix Kudu timestamp with -use local tz for unix ts

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#4).

Change subject: IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts
..

IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts

The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
  from_unixtime(bigint unixtime)
  unix_timestamp(string datetime[, ...])
  unix_timestamp(timestamp datetime)

However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time.  Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.

Tests:
* Adds a custom cluster test to run Kudu test cases with
  -use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
  unix_micros_to_utc_timestamp() which run in a custom
  cluster test (added test_local_tz_conversion.py) as well
  as in the regular tests (added to test_exprs.py).

Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
---
M be/src/exec/kudu-scanner.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_kudu.py
A tests/custom_cluster/test_local_tz_conversion.py
M tests/query_test/test_exprs.py
M tests/query_test/test_kudu.py
14 files changed, 106 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/7311/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5539: Fix Kudu timestamp with -use local tz for unix ts

2017-07-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#3).

Change subject: IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts
..

IMPALA-5539: Fix Kudu timestamp with -use_local_tz_for_unix_ts

The -use_local_tz_for_unix_timestamp_conversion flag exists
to specify if TIMESTAMPs should be interpreted as localtime
or UTC when converting to/from Unix time via builtins:
  from_unixtime(bigint unixtime)
  unix_timestamp(string datetime[, ...])
  unix_timestamp(timestamp datetime)

However, the KuduScanner was calling into code that, when
the gflag above was set, interpreted Unix times as local
time.  Unfortunately the write path (KuduTableSink) and some
FE TIMESTAMP code (see KuduUtil.java) did not have this
behavior, i.e. we were handling the gflag inconsistently.

Tests:
* Adds a custom cluster test to run Kudu test cases with
  -use_local_tz_for_unix_timestamp_conversion.
* Adds tests for the new builtin
* unix_micros_to_utc_timestamp() which run in a custom
  cluster test (added test_local_tz_conversion.py) as well
  as in the regular tests (added to test_exprs.py).

Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
---
M be/src/exec/kudu-scanner.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
M tests/custom_cluster/test_kudu.py
A tests/custom_cluster/test_local_tz_conversion.py
M tests/query_test/test_exprs.py
M tests/query_test/test_kudu.py
14 files changed, 106 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/7311/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I423a810427353be76aa64442044133a9a22cdc9b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

2017-07-14 Thread John Sherman (Code Review)
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
..


Patch Set 3:

Yeah, I'll give it a go.

> Do you think that's something you could add to this patch?

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers

2017-07-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5627: fix dropped statuses in HDFS writers
..


Patch Set 6: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5417: make I/O buffer queue fixed-size

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5417: make I/O buffer queue fixed-size
..

IMPALA-5417: make I/O buffer queue fixed-size

This removes the dynamically-varying queue size behaviour in the I/O
manager. The motivation is to bound resource consumption of scans
and make it possible to reserve memory for I/O buffers upfront.

Does some cleanup/documentation of the locking policy. Fix some cases
in ScanRange::GetNext() where members documented as being protected by
ScanRange::lock_ were accessed without holding it. I think the races
were either benign or prevented by holding DiskIoRequestContext::lock_
in practice.

Testing:
Ran exhaustive build.

Change-Id: If7cc3f7199f5320db00b7face97a96cdadb6f83f
---
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
6 files changed, 266 insertions(+), 309 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7408/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7cc3f7199f5320db00b7face97a96cdadb6f83f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5627: fix dropped statuses in HDFS writers
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7372/5/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS5, Line 264: next_page_encoding_
> Should this say "same as 'current_encoding_'"?
Done


PS5, Line 265: if when
> nit: superfluous word?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers

2017-07-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-5627: fix dropped statuses in HDFS writers
..

IMPALA-5627: fix dropped statuses in HDFS writers

The change is mostly mechanical - added Status returns where
need.

In one place I restructured the the logic around
'current_encoding_' for Parquet to allow a cleaner solution
to the dropped status from FinalizeCurrentPage() call in
ProcessValue(): after the restructuring the call was no longer
needed. 'current_encoding_' was overloaded to represent both the
encoding of the current page and the preferred encoding
for subsequent pages.

Testing:
Ran exhaustive build.

Change-Id: I753d352c640faf5eaef650cd743e53de53761431
---
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/runtime/string-buffer.h
10 files changed, 98 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/7372/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers

2017-07-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5627: fix dropped statuses in HDFS writers
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7372/5/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS5, Line 264: next_page_encoding_
Should this say "same as 'current_encoding_'"?


PS5, Line 265: if when
nit: superfluous word?


-- 
To view, visit http://gerrit.cloudera.org:8080/7372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5498: Support for partial sorts

2017-07-14 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5498: Support for partial sorts
..


Patch Set 4:

Can you extend the Sort metrics in the query profile to include Avg, Min and 
Max run size?
This will be needed to reflect on the quality of the sorted data?

-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No