[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5659: Build shared libs for gflags
..


Patch Set 1: Verified+1

Passed a toolchain build

-- 
To view, visit http://gerrit.cloudera.org:8080/7415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has submitted this change and it was merged.

Change subject: IMPALA-5659: Build shared libs for gflags
..


IMPALA-5659: Build shared libs for gflags

Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244
---
M source/gflags/build.sh
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Henry Robinson: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change.

Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7414/3/be/src/gutil/hash/hash.h
File be/src/gutil/hash/hash.h:

PS3, Line 218: 
As I told in the previous comment, _gnu_ctx is replaced into std.

https://gerrit.cloudera.org/#/c/7414/1/be/src/gutil/hash/hash.h@a278


Line 251
You may know std already defined pointer type as below.

   /// Partial specializations for pointer types.
   template
 struct hash<_Tp*> : public __hash_base
 {
   size_t
   operator()(_Tp* __p) const noexcept
   { return reinterpret_cast(__p); }
 };

I guess your hash function is more faster than std's implementation, but there 
are pros and cons. I found the interesting article: 
https://stackoverflow.com/questions/20953390/what-is-the-fastest-hash-function-for-pointers

I have the following issue on this. Please answer it.

How to implement a customized hash specialization for pointer type.
I removed your code because it cannot be redefined. I think we need to 
introduce a new template class which extends std::hash if we want to keep 
own hash.


-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has uploaded a new patch set (#3).

Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil
..

IMPALA-5116: Remove deprecated hash_* types in gutil

The following class templates are substituted from C++11 standard.
__gnu_cxx::hash_map => std::unordered_map
__gnu_cxx::hash_set => std::unordered_set
__gnu_cxx::hash => std::hash

Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
---
M be/src/gutil/hash/hash.cc
M be/src/gutil/hash/hash.h
M be/src/gutil/strings/join.h
M be/src/gutil/strings/serialize.cc
M be/src/gutil/strings/serialize.h
M be/src/gutil/strings/split.cc
M be/src/gutil/strings/split.h
7 files changed, 37 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7414/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..


IMPALA-5650: Make sum_init_zero a SUM function

The recent Parquet count(*) optimization (IMPALA-5036) introduced a new
aggregate function sum_init_zero(). This caused an issue for non
partitioned aggregation node, which checks that the function type is
either count, min, max, sum or ndv. Since sum_init_zero() was not
labelled as a sum() function, a DCHECK was hit. This issue is fixed by
labelling sum_init_zero() as a sum() function.

Testing:
- Verified that we no longer hit a DCHECK when running a query with a
  Parquet count(*) optimization with legacy agg node enabled.

Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Reviewed-on: http://gerrit.cloudera.org:8080/7404
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exprs/agg-fn.cc
2 files changed, 10 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Ho: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

Line 100:   boost::scoped_ptr> 
tuple_row_comparator_wrapper_;
> True, but every time POP_HEAP or PUSH_HEAP is used, it will incur the cost 
The only member of ComparatorWrapper is the TupleRowComparator&. If you treat 
the wrapper object as a value (i.e. don't heap allocate it) any decent compiler 
will be able to optimise that into just passing the TupleRowComparator* pointer 
around.


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7419/1//COMMIT_MSG
Commit Message:

PS1, Line 10: expilictly
explicitly


PS1, Line 10: expilictly set
: default query options were also populated in the runtime profile.
can you clarify what this means - what's an explicitly set default query 
option? One that's set by the user to the default value?


http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

PS1, Line 151: exec_request
nit: use exec_request_


http://gerrit.cloudera.org:8080/#/c/7419/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

PS1, Line 84: 
can you quickly explain why this is right? It looks to me like we would want to 
include any option that is set, if there is no default value for it.


http://gerrit.cloudera.org:8080/#/c/7419/1/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

PS1, Line 112: query = "set mem_limit = 0"
can you change this to some other  query option, then test that 
MEM_LIMIT=8589934592 is still in the profile as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#4).

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_tpch_queries.py which sets a low mem_limit such
that the test would fail if reclamation is not implemented and pass
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

   With Reclaim   Without Reclaim
   Query 1 Query 2  Query 1 Query 2
MaxTuplePoolMem3.96 KB 3.43 KB  110.2 MB708.8 MB
Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms
Time (stdev)   74.38ms 67.45ms  102.71ms70.44ms
Reclaims910 5861  N/A N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 124 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS3, Line 244: tuple_row_comparator_wrapper_
> I think something like ComparatorWrapper(*tuple_row_less_than_) will work.
Please refer to related comment in topn-node.h


PS3, Line 259: NULL
> It would be best to pass in the RuntimeState* - that enables better logging
Done


PS3, Line 271: NULL
> Same here.
Done


http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

Line 36: #define PUSH_HEAP(v, comp, elem) v.push_back(elem); \
> It's probably best to just make it an inline function, it doesn't seem like
Done


Line 100:   boost::scoped_ptr> 
tuple_row_comparator_wrapper_;
> We don't need to store this additional state, since all the state is in tup
True, but every time POP_HEAP or PUSH_HEAP is used, it will incur the cost of 
creating a new wrapper object,which will esp be a problem in InsertTupleRow 
method


PS3, Line 134: vector
> Also mention that it's updated with push_heap()/pop_heap().
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
..


Patch Set 3:

(1 comment)

Patch looks pretty good. It would be great to still respect fe_service_threads, 
if possible, as I know of some users who set that flag to control the 
concurrency on a particular Impala instance. It shouldn't be too hard to have 
TAcceptQueueServer::Task limit itself to have only fe_service_threads active at 
one time with a condition variable and a shared atomic integer. Do you think 
that's something you could add to this patch?

http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

Line 210
> The prior comment mentioned potential thread safety issues, but I didn't se
IIRC, the issue was that we didn't know for sure whether the various transport 
and protocol factories (particularly with SSL enabled) were thread-safe. Might 
be worth keeping this at 1 until we know for sure one way or the other.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5158,IMPALA-5236: account for unused buffer pool 
reservations
..

IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations

We were missing accounting for this, since it is part of the expected
difference between query and process memory consumption. The identity
that applies is is:

  buffers allocated from system =
  reservation + cached buffers - unused reservation

Where "cached buffers" includes free buffers and buffers attached to
clean pages. The reservation is accounted against queries and "buffers
allocated from system" is accounted against the process MemTracker.

Reporting this in a direct way required adding a concept of a MemTracker
with negative consumption, which fortunately did not require any major
changes to the MemTracker code.

Example output when applied to buffer pool branch:

  Process: Limit=8.35 GB Total=579.18 MB Peak=590.41 MB
Buffer Pool: Free Buffers: Total=268.25 MB
Buffer Pool: Clean Pages: Total=172.25 MB
Buffer Pool: Unused Reservation: Total=-8.25 MB
Free Disk IO Buffers: Total=21.98 MB Peak=21.98 MB
RequestPool=default-pool: Total=12.07 MB Peak=71.58 MB
  ...  ...
RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
Untracked Memory: Total=112.88 MB

Testing:
Added a basic test for MemTrackers with negative metrics.

Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad
---
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M be/src/util/pretty-printer-test.cc
M be/src/util/pretty-printer.h
M common/thrift/metrics.json
9 files changed, 78 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7380/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4276: Profile displays non-default query options set by planner

2017-07-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7419

Change subject: IMPALA-4276: Profile displays non-default query options set by 
planner
..

IMPALA-4276: Profile displays non-default query options set by planner

Fix to populate the non-default query options set by planner in the
runtime profile. Additionally fixed a bug where expilictly set
default query options were also populated in the runtime profile.

Added a corresponding test case.

Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M tests/query_test/test_observability.py
3 files changed, 24 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7419/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 


[Impala-ASF-CR] IMPALA-5659: Begin standardizing treatment of thirdparty libraries

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7418

Change subject: IMPALA-5659: Begin standardizing treatment of thirdparty 
libraries
..

IMPALA-5659: Begin standardizing treatment of thirdparty libraries

If Impala was built with --build_shared_libs, some thirdparty libraries
were still statically linked; this could cause runtime errors if the
libraries were also linked into a .so. This patch fixes that issue (for
gflags, glog and protobuf at least) by ensuring that build_shared_libs
is respected for those libraries.

* Standardize thirdparty library handling w/CMake by adding
  IMPALA_ADD_THIRDPARTY_LIB. This creates a symbolic name for each
  library, allowing us to switch the underlying library
  files (e.g. change from static to dynamic linking) without having to
  individually change the link clauses for each target.

* Remove most cases of add_library() from cmake_modules/* - that is all
  handled by IMPALA_ADD_THIRDPARTY_LIB.

* Add shared library detection for a couple of thirdparty
  dependencies (many only detect static libraries), just to prove the concept.

* All thirdparty libraries now print a standard set of messages. For example:

-- --> Adding thirdparty library protoc. <--
-- Header files: 
/data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/include
-- Added shared library dependency protoc: 
/data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/lib/libprotoc.so
-- --> Adding thirdparty library libev. <--
-- Header files: /data/henry/src/cloudera/impala-toolchain/libev-4.20/include
-- Added shared library dependency libev:
-- /data/henry/src/cloudera/impala-toolchain/libev-4.20/lib/libev.so

* Some libraries don't quite fit this pattern (LLVM and Boost) - leave
  them as is for now.

* Remove FindOpenSSL.cmake - the toolchain one is more modern.

Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
---
M CMakeLists.txt
M be/CMakeLists.txt
M cmake_modules/FindAvro.cmake
M cmake_modules/FindBreakpad.cmake
M cmake_modules/FindGFlags.cmake
M cmake_modules/FindGLog.cmake
M cmake_modules/FindGTest.cmake
M cmake_modules/FindHDFS.cmake
M cmake_modules/FindLdap.cmake
M cmake_modules/FindLz4.cmake
D cmake_modules/FindOpenSSL.cmake
M cmake_modules/FindRe2.cmake
M cmake_modules/FindSnappy.cmake
M cmake_modules/FindZlib.cmake
M cmake_modules/kudu_cmake_fns.txt
15 files changed, 117 insertions(+), 260 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/7418/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-4703: reservation denial debug action

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4703: reservation denial debug action
..


Patch Set 7: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/7022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change.

Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h
File be/src/gutil/hash/hash.h:

Line 222
> OK, so it looks like you've added it back in. Before we can decide if it sh
I guess the reason is that __gnu_cxx::hash<> was not used anywhere.  I didn't 
find any use case for this. Would you please double check?


Line 278
> So, do you need it to still exist, and if so, how do you make sure std::has
I got it. My idea is that some specializations in __gnu_cxx should be moved to 
std namespace. I can be sure this will not create a potential problem. Do you 
agree on this? Or any suggestion?


-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil

2017-07-13 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h
File be/src/gutil/hash/hash.h:

Line 222
> You're right. std::hash template class is specialized between 1 byte and 8 
OK, so it looks like you've added it back in. Before we can decide if it should 
stay or not, let'd understand why it didn't break the build before.


Line 278
> You're right. It should make wrong result or corruption.
So, do you need it to still exist, and if so, how do you make sure std::hash, 
which may not be __gnu_cxx::hash, is specialized?


-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS

2017-07-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and 
IGNORE NULLS
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7416/1//COMMIT_MSG
Commit Message:

Line 1: Parent: 1d8274a8 (IMPALA-5585: Fix expr-test to call last_day() 
tests)
> Thanks! can you also add a ToSqlTest?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7416
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS

2017-07-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and 
IGNORE NULLS
..

IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS

Bugs:

- FunctionCallExpr's toSql() doesn't include IGNORE NULLS if present
  causing view definitions to break and leading to incorrect results.

- FunctionCallExpr's clone() implementation doesn't carry forward
  IGNORE NULLS option if present. One case that breaks with this is
  querying views containing analytic exprs causing wrong plans.

Fixed both the bugs and added a test that can reliably reproduce this.

Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
3 files changed, 33 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7416/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7416
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4674: Part 3: fix null-aware anti join

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 3: fix null-aware anti join
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7367/1//COMMIT_MSG
Commit Message:

PS1, Line 13: note
not


Line 25:   Instead we just iterate over the rows of the stream.
> are there join class comments that should be updated to explain this strate
I updated the member declarations to mentioned when things are pinned/unpinned 
and updated the EvaluateNullProbe() comment. I think this covers all of the 
points in the commit message.


http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 251: 
RETURN_IF_ERROR(null_aware_partition_->Spill(BufferedTupleStreamV2::UNPIN_ALL));
> it's a bit confusing that we call Partition::Spill() when Partition::is_spi
Improved the comment.


http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS1, Line 95: /// Null aware anti-join (NAAJ) extends the above algorithm by 
accumulating rows with
: /// NULLs into several different streams, which are processed in 
a separate step to
: /// produce additional output rows. The NAAJ algorithm is 
documented in more detail in
: /// header comments for the null aware functions and data 
structures.
> any of this need updates (or the comments this references)?
Done.

It would be nice to have a clearer explanation of the NAAJ here (i.e. why NULLs 
should be treated that wasy) but that seems out of scope for this change.


http://gerrit.cloudera.org:8080/#/c/7367/1/testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test
File testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test:

Line 3: set max_block_mgr_memory=10m;
Moved below comment


Line 8: # This returns the rows returned from as
garbled


Line 17: where l_suppkey = 4162 and l_shipmode = 'AIR' and l_returnflag = 'A' 
and l_shipdate > '1993-01-01' and
Fixed long lines in files


-- 
To view, visit http://gerrit.cloudera.org:8080/7367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 3: fix null-aware anti join

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-4674: Part 3: fix null-aware anti join
..

IMPALA-4674: Part 3: fix null-aware anti join

Part 2 regressed NAAJ by tightening up the spilling invariants (e.g.
can't unpin with spilling disabled) but we didn't have tests for
spilling NAAJs that could detect the regression. This patch adds
those tests, fixes the regressions, and improves NAAJ by reliably
spilling the probe side and not trying to bring the whole probe side
into memory.

The changes are:
* All null-aware streams start off in memory and are only unpinned if
  spilling is enabled.
* The null-aware build partition can be spilled in the same way as hash
  partitions.
* Probe streams are unpinned whenever there is memory pressure - if
  spilling is enabled and either a build partition is spilled or
  appending to the probe stream fails.
* Spilled probe streams are not re-pinned in EvaluateNullProbe().
  Instead we just iterate over the rows of the stream.

Testing:
Add query tests where the three different buckets of rows are large
enough to spill: the build and probe of the null-aware partition and the
null probe rows.

Test both spilling and in-memory (with spilling disabled) cases.

Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
A testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test
M tests/query_test/test_spilling.py
7 files changed, 425 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7367/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change.

Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil
..


Patch Set 2:

(7 comments)

Thanks for your kind review. Would you please review again?

http://gerrit.cloudera.org:8080/#/c/7414/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Remove deprecated hash_* types in gutil
> "Remove deprecated hash_* types in gutil"
Done


PS1, Line 9: tut
> from
Done


PS1, Line 9: class tem
> class templates
Done


http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h
File be/src/gutil/hash/hash.h:

Line 222
> I don't think gcc ships with a std::hash overload for uint128. Why doesn't 
You're right. std::hash template class is specialized between 1 byte and 8 
bytes. I agree this is necessary for unsigned 16 bytes.


Line 278
> http://en.cppreference.com/w/cpp/utility/hash :
You're right. It should make wrong result or corruption.


http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/strings/split.h
File be/src/gutil/strings/split.h:

Line 135
> Has this fact about container-ism changed?
You're right. It's my mistake. pair cannot be a container.


Line 132: // multimap, unordered_set and unordered_map, and even std::pair 
which is not
> Fix early line break
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5116: Remove deprecated hash * types in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has uploaded a new patch set (#2).

Change subject: IMPALA-5116: Remove deprecated hash_* types in gutil
..

IMPALA-5116: Remove deprecated hash_* types in gutil

The following class templates are substituted from C++11 standard.
__gnu_cxx::hash_map => std::unordered_map
__gnu_cxx::hash_set => std::unordered_set
__gnu_cxx::hash => std::hash

Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
---
M be/src/gutil/hash/hash.cc
M be/src/gutil/hash/hash.h
M be/src/gutil/strings/join.h
M be/src/gutil/strings/serialize.cc
M be/src/gutil/strings/serialize.h
M be/src/gutil/strings/split.cc
M be/src/gutil/strings/split.h
7 files changed, 35 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7414/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil

2017-07-13 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5116: Removal of uses for the deprecated functions in 
gutil
..


Patch Set 1:

(7 comments)

THank you for sending this patch! A few questions:

http://gerrit.cloudera.org:8080/#/c/7414/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Removal of uses for the deprecated functions in gutil
"Remove deprecated hash_* types in gutil"


PS1, Line 9: functions
class templates


PS1, Line 9: for
from


http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/hash/hash.h
File be/src/gutil/hash/hash.h:

Line 222
I don't think gcc ships with a std::hash overload for uint128. Why doesn't 
removing this break the build?


Line 278
http://en.cppreference.com/w/cpp/utility/hash :

"std::hash produces a hash of the value of the pointer (the memory 
address)"


http://gerrit.cloudera.org:8080/#/c/7414/1/be/src/gutil/strings/split.h
File be/src/gutil/strings/split.h:

Line 135
Has this fact about container-ism changed?


Line 132: // multimap, unordered_set and unordered_map, and
Fix early line break


-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..


Patch Set 26:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS26, Line 149: if (mem_limit == -1) mem_limit = numeric_limits::max();
> Filed IMPALA-5661.
What did the old system do? was there a block mgr limit per query when there 
was no mem limit? i'm wondering if the new system can starve out queries more 
easily when running without mem limits. Of course, the real answer is 
integrating with admission control, but since that won't happen till later.


-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 26
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..


Patch Set 32: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 32
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5801

to look at the new patch set (#32).

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..

IMPALA-4674: Part 2: port backend exec to BufferPool

Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.

min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.

Port ExecNodes to use BufferPool:
  * Each ExecNode has to claim its reservation during Open()
  * Port Sorter to use BufferPool.
  * Switch from BufferedTupleStream to BufferedTupleStreamV2
  * Port HashTable to use BufferPool via a Suballocator.

This also makes PAGG memory consumption more efficient (avoid wasting buffers)
and improve the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.

Testing:
* Updated tests to reflect new memory requirements

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/buffered-block-mgr-test.cc
D be/src/runtime/buffered-block-mgr.cc
D be/src/runtime/buffered-block-mgr.h
D be/src/runtime/buffered-tuple-stream-test.cc
D be/src/runtime/buffered-tuple-stream.cc
D be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/initial-reservations.cc
A be/src/runtime/initial-reservations.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.h
M be/src/util/static-asserts.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Kudu

[Impala-ASF-CR] IMPALA-4674: Part 1: remove old aggs and joins

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: remove old aggs and joins
..


Patch Set 9: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/7102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..


Patch Set 31: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 31
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5801

to look at the new patch set (#31).

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..

IMPALA-4674: Part 2: port backend exec to BufferPool

Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.

min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.

Port ExecNodes to use BufferPool:
  * Each ExecNode has to claim its reservation during Open()
  * Port Sorter to use BufferPool.
  * Switch from BufferedTupleStream to BufferedTupleStreamV2
  * Port HashTable to use BufferPool via a Suballocator.

This also makes PAGG memory consumption more efficient (avoid wasting buffers)
and improve the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.

Testing:
* Updated tests to reflect new memory requirements

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/buffered-block-mgr-test.cc
D be/src/runtime/buffered-block-mgr.cc
D be/src/runtime/buffered-block-mgr.h
D be/src/runtime/buffered-tuple-stream-test.cc
D be/src/runtime/buffered-tuple-stream.cc
D be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/initial-reservations.cc
A be/src/runtime/initial-reservations.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.h
M be/src/util/static-asserts.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Kudu

[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..


Patch Set 29:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS30, Line 1246: 
> typo
Done


http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 366: << PrettyPrinter::Print(bytes_limit, TUnit::BYTES);
> maybe log bufffer_pool_capacity as well? or do we do that already somewhere
Done


http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS26, Line 149:   bytes_limit = query_options().mem_limit;
> How about we file a jira for this and consider it for a follow on change? o
Filed IMPALA-5661.

I agree that the process-wide buffer pool limit kicks in automatically so we 
don't need to add a query-local buffer pool limit if there's no query mem_limit.

Restructured the code to exploit that fact and be less confusing.


http://gerrit.cloudera.org:8080/#/c/5801/29/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

Line 126:'Q21' : 187, 'Q22' : 125}
> Is that because we have to assume all fragment executions overlap, or pessi
I think the fragment overlapping is the main problem. Mostly within a single 
fragment I'd expect it to be pretty accurate, since in the old code the small 
buffers get allocated on Open() around the same time as the reservation is 
acquired.


-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 29
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Fold some lines

2017-07-13 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7417

Change subject: [DOCS] Fold some  lines
..

[DOCS] Fold some  lines

Putting  on a separate line causes an extra
leading blank line in PDF output. (I think that's a
bug in the PDF transform, but what can you do...)

I took one file that had a few instances of such
examples, and folded the first line of the code
example onto the same line as the  tag.

Using this minor change to demo upstream -> downstream
cherry-picking techniques.

Change-Id: I563a71ab9e59c691264c1f6a088e71f4722d3097
---
M docs/topics/impala_connecting.xml
1 file changed, 3 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/7417/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I563a71ab9e59c691264c1f6a088e71f4722d3097
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/859/

-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags

2017-07-13 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5659: Build shared libs for gflags
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS

2017-07-13 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and 
IGNORE NULLS
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7416/1//COMMIT_MSG
Commit Message:

Line 1: Parent: 1d8274a8 (IMPALA-5585: Fix expr-test to call last_day() 
tests)
Thanks! can you also add a ToSqlTest?


-- 
To view, visit http://gerrit.cloudera.org:8080/7416
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..

IMPALA-5650: Make sum_init_zero a SUM function

The recent Parquet count(*) optimization (IMPALA-5036) introduced a new
aggregate function sum_init_zero(). This caused an issue for non
partitioned aggregation node, which checks that the function type is
either count, min, max, sum or ndv. Since sum_init_zero() was not
labelled as a sum() function, a DCHECK was hit. This issue is fixed by
labelling sum_init_zero() as a sum() function.

Testing:
- Verified that we no longer hit a DCHECK when running a query with a
  Parquet count(*) optimization with legacy agg node enabled.

Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exprs/agg-fn.cc
2 files changed, 10 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/7404/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Taras Bobrovytsky (Code Review)
Hello Lars Volker, Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7404

to look at the new patch set (#4).

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..

IMPALA-5650: Make sum_init_zero a SUM function

The recent Parquet count(*) optimization (IMPALA-5036) introduced a new
aggregate function sum_init_zero(). This caused an issue for non
partitioned aggregation node, which checks that the function type is
either count, min, max, sum or ndv. Since sum_init_zero() was not
labelled as a sum() function, a DCHECK was hit. This issue is fixed by
labelling sum_init_zero() as a sum() function.

Testing:
- Verified that we no longer hit a DCHECK when running a query with a
  Parquet count(*) optimization with legacy agg node enabled.

Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exprs/agg-fn.cc
2 files changed, 10 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/7404/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7404/3/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1590: // 'slot_desc' is not nullable if the aggregate function is 
sum_init_zero().
> Also add a comment stating that: In which case, the slot is initialized to 
Done


Line 1592:   slot_desc->CodegenSetNullIndicator(
> Alternately, DCHECK(slot_desc->is_nullable() || agg_fn->fn_name() == "sum_i
Added a similar DCHECK to what was suggested.


-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS

2017-07-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7416

Change subject: IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and 
IGNORE NULLS
..

IMPALA-5657: Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS

Bugs:

- FunctionCallExpr's toSql() doesn't include IGNORE NULLS if present
  causing view definitions to break and leading to incorrect results.

- FunctionCallExpr's clone() implementation doesn't carry forward
  IGNORE NULLS option if present. One case that breaks with this is
  querying views containing analytic exprs causing wrong plans.

Fixed both the bugs and added a test that can reliably reproduce this.

Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
2 files changed, 27 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7416/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7416
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 


[native-toolchain-CR] IMPALA-5659: Build shared libs for gflags

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7415

Change subject: IMPALA-5659: Build shared libs for gflags
..

IMPALA-5659: Build shared libs for gflags

Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244
---
M source/gflags/build.sh
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/15/7415/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If1842768f21444ccf231cfd3fe5e51a4ca0d7244
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

PS3, Line 143: grant_rev_db
> def test_role_privilege_case(self, vector, unique_database):
Got it. Filed IMPALA-5660 which shouldn't block this.


-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 3: fix null-aware anti join

2017-07-13 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 3: fix null-aware anti join
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7367/1//COMMIT_MSG
Commit Message:

PS1, Line 13: note
not


Line 25:   Instead we just iterate over the rows of the stream.
are there join class comments that should be updated to explain this strategy?


http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 251: 
RETURN_IF_ERROR(null_aware_partition_->Spill(BufferedTupleStreamV2::UNPIN_ALL));
it's a bit confusing that we call Partition::Spill() when 
Partition::is_spilled() is already true. The comment here helps, but maybe 
something we can say in the Spill() function comment about this use of Spill()? 
do we do this elsewhere?


http://gerrit.cloudera.org:8080/#/c/7367/1/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS1, Line 95: /// Null aware anti-join (NAAJ) extends the above algorithm by 
accumulating rows with
: /// NULLs into several different streams, which are processed in 
a separate step to
: /// produce additional output rows. The NAAJ algorithm is 
documented in more detail in
: /// header comments for the null aware functions and data 
structures.
any of this need updates (or the comments this references)?


-- 
To view, visit http://gerrit.cloudera.org:8080/7367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5650: Make sum init zero a SUM function

2017-07-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5650: Make sum_init_zero a SUM function
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7404/3/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1590: // 'slot_desc' is not nullable if the aggregate function is 
sum_init_zero().
Also add a comment stating that: In which case, the slot is initialized to be 
zero and null bit is cleared.


Line 1592:   slot_desc->CodegenSetNullIndicator(
> Can you add a DCHECK to make sure it's not sum_init_zero()?
Alternately, DCHECK(slot_desc->is_nullable() || agg_fn->fn_name() == 
"sum_init_zero");


-- 
To view, visit http://gerrit.cloudera.org:8080/7404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibca53e3ef6bad5283550100b33db55d674cff0b9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool

2017-07-13 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
..


Patch Set 30: Code-Review+2

(4 comments)

+2 assuming we do the addition flag in a separate change.  Good luck with the 
checkin.

http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS30, Line 1246: nitialise
typo


http://gerrit.cloudera.org:8080/#/c/5801/30/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 366: << PrettyPrinter::Print(bytes_limit, TUnit::BYTES);
maybe log bufffer_pool_capacity as well? or do we do that already somewhere 
else?


http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS26, Line 149:   bytes_limit = query_options().mem_limit;
> hmm i'm not sure, maybe.  What about the case where there is no query mem_l
How about we file a jira for this and consider it for a follow on change? or do 
you want to settle it in this commit?


http://gerrit.cloudera.org:8080/#/c/5801/29/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

Line 126:'Q21' : 187, 'Q22' : 125}
> Part of it is that the values were either stale or obtained in a different 
Is that because we have to assume all fragment executions overlap, or pessimism 
within a single fragment, or both?  anyway, I added a comment to the 
IMPALA-3200 test doc, we can discuss there.


-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 30
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

Line 61: }
> Done
Looks like you missed the second part of this (I wasn't that clear). I don't 
think the call to ReclaimTuplePool() needs to be cross-compiled to LLVM IR. It 
can be called after InsertBatch() instead of from inside InsertBatch().


http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS3, Line 244: tuple_row_comparator_wrapper_
I think something like ComparatorWrapper(*tuple_row_less_than_) will work.


PS3, Line 259: NULL
It would be best to pass in the RuntimeState* - that enables better logging of 
the failure.


PS3, Line 271: NULL
Same here.


http://gerrit.cloudera.org:8080/#/c/7400/3/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

Line 36: #define PUSH_HEAP(v, comp, elem) v.push_back(elem); \
It's probably best to just make it an inline function, it doesn't seem like it 
needs to be a macro.


Line 100:   boost::scoped_ptr> 
tuple_row_comparator_wrapper_;
We don't need to store this additional state, since all the state is in 
tuple_row_less_than_ - we can just create a wrapper where needed.


PS3, Line 134: vector
Also mention that it's updated with push_heap()/pop_heap().


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5627: fix dropped statuses in HDFS writers
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7372/4/be/src/exec/hdfs-avro-table-writer.h
File be/src/exec/hdfs-avro-table-writer.h:

Line 71:   virtual Status Init();
> Can we add WARN_UNUSED_RESULT to virtual functions, too?
I intended to add it to the base class but missed doing that. Thanks for 
catching. Added that and then added "override" declarations here so it's 
clearer that they're overridden functions.


http://gerrit.cloudera.org:8080/#/c/7372/4/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 194:   // aggregates. Returns true if the value was appended successfully 
to the current page.
> Should this also mention that the method updates next_page_encoding_?
Done


Line 260:   Encoding::type next_page_encoding_; // Preferred encoding for the 
next page.
> Is this ever not used as the encoding of the next page? Otherwise it may be
It always is used. Updated.

I'd prefer not to document where the variable is set in this case since I think 
it will get out of sync with the code too easily, so you need to grep the code 
anyway to verify that the comment is correct. Instead expanded a little on why 
it is set.


http://gerrit.cloudera.org:8080/#/c/7372/4/be/src/exec/parquet-column-stats.cc
File be/src/exec/parquet-column-stats.cc:

Line 123:   RETURN_IF_ERROR(buffer->Append(value->ptr, value->len));
> can you add WARN_IF_UNUSED to StringBuffer::Append()?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5627: fix dropped statuses in HDFS writers

2017-07-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-5627: fix dropped statuses in HDFS writers
..

IMPALA-5627: fix dropped statuses in HDFS writers

The change is mostly mechanical - added Status returns where
need.

In one place I restructured the the logic around
'current_encoding_' for Parquet to allow a cleaner solution
to the dropped status from FinalizeCurrentPage() call in
ProcessValue(): after the restructuring the call was no longer
needed. 'current_encoding_' was overloaded to represent both the
encoding of the current page and the preferred encoding
for subsequent pages.

Testing:
Ran exhaustive build.

Change-Id: I753d352c640faf5eaef650cd743e53de53761431
---
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/runtime/string-buffer.h
10 files changed, 98 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/7372/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I753d352c640faf5eaef650cd743e53de53761431
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7332/3/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS3, Line 41: DEFINE_int64
> int32 should be fine.
int64_t maps to a long?

scheduleAtFixedRate(Runnable command,
 long initialDelay,
 long period,
 TimeUnit unit)


http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

PS3, Line 143: grant_rev_db
> We looked at that, but it actually doesn't work for these custom cluster te
def test_role_privilege_case(self, vector, unique_database):

This is the error I got on doing it. -
ERROR at setup of TestGrantRevoke.test_role_privilege_case[exec_option: 
{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
'disable_codegen': False, 'abort_on_error': 1, 
'exec_single_node_rows_threshold': 0} | table_format: text/none] 
conftest.py:294: in unique_database
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EINNER EXCEPTION: 
EMESSAGE: AuthorizationException: User 'anuj' does not have privileges to 
execute 'DROP' on: test_role_privilege_case_2af3a508


-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

PS3, Line 143: grant_rev_db
> I suggest using the unique_database fixture instead of creating databases a
We looked at that, but it actually doesn't work for these custom cluster tests 
because the code that would set up the unique_database don't have authorization 
to create a database. As a separate task, it might be worth considering a way 
to extend that to support granting access before creating the unique_database, 
but I think only this test file would make use of it. What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

Line 25: from  __builtin__ import any as b_any
> why is this necessary?
It shouldn't be unless any is shadowed, which it ought not be. If it is, that 
should be fixed. any has been a builtin library function since python 2.5.


PS3, Line 143: grant_rev_db
> Looks like a common db name used in authz tests. Could you randomize it? El
I suggest using the unique_database fixture instead of creating databases and 
having a use side-effect command. "git grep unique_database" for examples.


-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

(9 comments)

Few more in addition to MJ's comments.

http://gerrit.cloudera.org:8080/#/c/7332/3//COMMIT_MSG
Commit Message:

PS3, Line 16: If the catalogObjectCache on a update adds the new
: rolePrivilege object first and removes the old object later
qq, Isn't this order always the same?


http://gerrit.cloudera.org:8080/#/c/7332/3/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS3, Line 41: DEFINE_int64
int32 should be fine.


PS3, Line 41: sentry_catalog_polling_frequency
how about sentry_catalog_polling_frequency_s ? (indicates seconds)


http://gerrit.cloudera.org:8080/#/c/7332/3/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java:

PS3, Line 336: Fix GRANTs on URIs with uppercase letters
s/ URIs are case sensitive ?


http://gerrit.cloudera.org:8080/#/c/7332/3/fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java
File fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java:

Line 75: toLowerCase()));
nit: 4 space formatting everywhere


http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

PS3, Line 130: statestored_args=("--statestore_heartbeat_frequency_ms=300 "
 : "--statestore_update_frequency_ms=300"))
Is this required?


PS3, Line 143: grant_rev_db
Looks like a common db name used in authz tests. Could you randomize it? Else 
we might hit flaky tests if this DB already exists from test_grant_revoke for 
ex.


Line 149: self.client.execute("grant select on table test1 to test_role")
Could you also add some grants on camel case DB names?


PS3, Line 165: self.client.execute("drop role test_role")
How about adding it to a finally block incase the test fails, also clean up the 
dbs created?


-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#3).

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_tpch_queries.py which sets a low mem_limit such
that the test would fail if reclamation is not implemented and pass
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

   With Reclaim   Without Reclaim
   Query 1 Query 2  Query 1 Query 2
MaxTuplePoolMem3.96 KB 3.43 KB  110.2 MB708.8 MB
Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms
Time (stdev)   74.38ms 67.45ms  102.71ms70.44ms
Reclaims910 5861  N/A N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 120 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5521: TopN node periodically reclaims old allocations

2017-07-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#2).

Change subject: IMPALA-5521: TopN node periodically reclaims old allocations
..

IMPALA-5521: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_tpch_queries.py which sets a low mem_limit such
that the test would fail if reclamation is not implemented and pass
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

   With Reclaim   Without Reclaim
   Query 1 Query 2  Query 1 Query 2
MaxTuplePoolMem3.96 KB 3.43 KB  110.2 MB708.8 MB
Time (mean)2s 218ms6s 391ms 2s 021ms6s 406ms
Time (stdev)   74.38ms 67.45ms  102.71ms70.44ms
Reclaims910 5861  N/A N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_tpch_queries.py
4 files changed, 120 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7400/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5520: TopN node periodically reclaims old allocations

2017-07-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
..


Patch Set 1:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/7400/1//COMMIT_MSG
Commit Message:

PS1, Line 9: tuple
> in a tuple pool which is held longer than necessary, resulting in unnecessa
Done


PS1, Line 11: With this commit TopN node
> commit, the TopN node
Done


PS1, Line 12: its
> stored in the priority queue
Done


PS1, Line 12: This is done every time the
: number of old rows (removed from the priority queue it uses) is 
more
: than twice the row limit (N in TopN)
> This is done when the number of rows removed from the priority queue is mor
Done


PS1, Line 24: general
> expected general case
Done


PS1, Line 27: data
> example worst case: data stored...
Done


PS1, Line 39: performance
> query runtimes
Done


PS1, Line 39: More extensive perf testing will be
: done in the future to recognize pathological cases
> Not necessarily pathological cases, more that we're gonna do cluster testin
Done


http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

PS1, Line 28: limit_
> why isn't this (limit_ + offset_), since that's the size of the PQ?
Done


Line 61: void TopNNode::ReclaimTuplePool() {
> I don't think this needs to be in the *-ir.cc - the stuff in here gets reco
Done


Line 62:   auto temp_pool = new MemPool(mem_tracker());
> It would be good to use unique_ptr here just so we don't accidentally creat
Done


Line 63:   auto temp_pq = new std::priority_queue,
> Rebuilding the priority queue like this seems unnecessary. It's a std::vect
Done


PS1, Line 69: Allocate
> We're trying to move to using TryAllocate() instead and failing synchronous
Done


http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS1, Line 88: NumOfTimesTuplePoolReclaimed
> TuplePoolReclamations is more concise
Done


PS1, Line 88:   
> indentation
Done


http://gerrit.cloudera.org:8080/#/c/7400/1/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS1, Line 71: Create new tuple pool with only the elements inside the priority 
queue and release
:   // memory from the old tuple pool
> Re-materialize the elements in the priority queue into a new tuple pool, an
Done


PS1, Line 115: poped
> popped has two 'p's. Regardless, let's call this rows_to_reclaim_ because n
Done


http://gerrit.cloudera.org:8080/#/c/7400/1/tests/query_test/test_tpch_queries.py
File tests/query_test/test_tpch_queries.py:

Line 67: return 'tpch'
> newline after this
Done


PS1, Line 75:  \
> nit: remove space to be consistent
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 22:

PS22 takes the blunt-instrument approach to preventing clang-tidy from running 
on be/src/kudu/*. I have not been able to test in my local environment, so 
running another dry run on jenkins.impala.io.

-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-07-13 Thread Henry Robinson (Code Review)
Hello Impala Public Jenkins, Michael Ho, Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5715

to look at the new patch set (#22).

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Finally, a few changes to allow compilation on RHEL5:

* Only use sched_getcpu() if supported
* Only include magic.h if available
* Workaround for kernels that don't have SOCK_NONBLOCK
* Workaround for kernels that don't have O_CLOEXEC (ignore the flag)
* Provide non-working implementation of fallocate()
* Disable inclusion of linux/fiemap.h - although this exists on RHEL5,
  it does not compile due to other #includes in env_posix.cc. We disable
  the path this is used for, since Impala does not call that code.
* Use Kudu's implementation of pipe(2), preadv(2) and pwritev(2) where
  it doesn't exist.

In most cases these changes simply force kutil to revert to a different
implementation that was already written for OSX support - this patch
generalises the logic to provide the implementation whenever the
required function doesn't exist.

This patch compiles on RHEL5.5 and 6.0, SLES11 and 12, Ubuntu 12.04 and
14.04 and Debian 7.0 and 8.0.

Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/common/config.h.in
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
M be/src/exec/kudu-util.h
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/env_posix.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/locks.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
M be/src/kudu/util/net/socket.cc
M be/src/kudu/util/subprocess.cc
M bin/run_clang_tidy.sh
D cmake_modules/FindOpenSSL.cmake
22 files changed, 223 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/22
-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5511: Add process start time to debug web page
..


Patch Set 5:

To that end, it might be better to define a single process-start-time metric, 
and to have each daemon call a new InitCommonMetrics(MetricGroup* root) method. 
Then the handler for the root URL can just grab the process start time metric, 
without needing to know if it's a statestore, catalog server or impalad.

-- 
To view, visit http://gerrit.cloudera.org:8080/7363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 21:

PS21 passed all tests. clang-tidy issue is still outstanding, as there is no 
obvious way to disable all checks for a single directory.

-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page

2017-07-13 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5511: Add process start time to debug web page
..


Patch Set 5:

Can you add this information to the root (/) web page for each daemon? I think 
it's very useful to be able to see this prominently.

-- 
To view, visit http://gerrit.cloudera.org:8080/7363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7332/3/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS3, Line 42: (
nit space


http://gerrit.cloudera.org:8080/#/c/7332/3/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

Line 25: from  __builtin__ import any as b_any
why is this necessary?


PS3, Line 129: 10
why not just 1 second?


PS3, Line 156: # Sleep for 15 seconds and make sure that the privileges
 : # on all 3 tables still persist on a sentryProxy thread
 : # update. sentry_catalog_polling_frequency is set to 10
 : # seconds.
 : sleep(15)
I wonder if we can lower this if we reduce the sentry polling freq. Even a 15 
sec wait is unfortunate in our already long running tests


-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7414

Change subject: IMPALA-5116: Removal of uses for the deprecated functions in 
gutil
..

IMPALA-5116: Removal of uses for the deprecated functions in gutil

The following functions are substituted for C++11 standard.
__gnu_cxx::hash_map => std::unordered_map
__gnu_cxx::hash_set => std::unordered_set
__gnu_cxx::hash => std::hash

Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
---
M be/src/gutil/hash/hash.cc
M be/src/gutil/hash/hash.h
M be/src/gutil/strings/join.h
M be/src/gutil/strings/serialize.cc
M be/src/gutil/strings/serialize.h
M be/src/gutil/strings/split.cc
M be/src/gutil/strings/split.h
7 files changed, 35 insertions(+), 221 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7414/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I06af30fd15acd43a60e3c30af54056c96520b6e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 


[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has abandoned this change.

Change subject: IMPALA-5116: Removal of uses for the deprecated functions in 
gutil
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia4bd8ae4cc4ca5aa72e181ce17bac9d1c91f9fb3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 


[Impala-ASF-CR] IMPALA-5116: Removal of uses for the deprecated functions in gutil

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7413

Change subject: IMPALA-5116: Removal of uses for the deprecated functions in 
gutil
..

IMPALA-5116: Removal of uses for the deprecated functions in gutil

The following functions are substituted for C++11 standard.
__gnu_cxx::hash_map => std::unordered_map
__gnu_cxx::hash_set => std::unordered_set
__gnu_cxx::hash => std::hash

Change-Id: Ia4bd8ae4cc4ca5aa72e181ce17bac9d1c91f9fb3
---
M be/src/gutil/hash/hash.cc
M be/src/gutil/hash/hash.h
M be/src/gutil/strings/join.h
M be/src/gutil/strings/serialize.cc
M be/src/gutil/strings/serialize.h
M be/src/gutil/strings/split.cc
M be/src/gutil/strings/split.h
7 files changed, 35 insertions(+), 221 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7413/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia4bd8ae4cc4ca5aa72e181ce17bac9d1c91f9fb3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 


[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page

2017-07-13 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change.

Change subject: IMPALA-5511: Add process start time to debug web page
..


Patch Set 5:

comments incorporated

-- 
To view, visit http://gerrit.cloudera.org:8080/7363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-13 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

The issue is that HdfsSequenceTableWriter::Flush() frees the per-file
memory pool, but the buffer used for compression is still in use after
Flush() returns. If there are additional rows to process after calling
Flush(), impalad might crash.

This fix removes the FreeAll() call from Flush(). It was tested on a
local minicluster with a table containing around 100 million rows.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
1 file changed, 0 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-13 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7394/1//COMMIT_MSG
Commit Message:

PS1, Line 16: The issue is that HdfsSequenceTableWriter::Flush() frees the 
per-file
: memory pool, but the buffer used for compression is still in use 
after
: Flush() returns. If there are additional
> Would you mind also explaining the original problem which triggered the cra
The buffer used for compression is still in used after Flush() returns. If 
there are additional rows to process after calling Flush(), impalad might 
crash. I've updated the commit message.

After taking another look at the code I realized that InitNewFile() is called 
only once for each HdfsSequenceTableWriter instance, so it doesn't really make 
sense to clear the mem pool there. I've removed mem_pool_->Clear() from 
InitNewFile().


-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5407: Fix crash in HdfsSequenceTableWriter

2017-07-13 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2).

Change subject: IMPALA-5407: Fix crash in HdfsSequenceTableWriter
..

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

The issue is that HdfsSequenceTableWriter::Flush() frees the per-file
memory pool, but the buffer used for compression is still in use after
Flush() returns. If there are additional rows to process after calling
Flush(), impalad might crash.

This fix removes the FreeAll() call from Flush(). It was tested on a
local minicluster with a table containing around 100 million rows.

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
---
M be/src/exec/hdfs-sequence-table-writer.cc
1 file changed, 0 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/7394/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] test

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has abandoned this change.

Change subject: test
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/7412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I8e3f1e28004e9d2a7526c83e00fcd8ecd2e0abc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 


[Impala-ASF-CR] test

2017-07-13 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7412

Change subject: test
..

test

Change-Id: I8e3f1e28004e9d2a7526c83e00fcd8ecd2e0abc2
---
M README.md
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/7412/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e3f1e28004e9d2a7526c83e00fcd8ecd2e0abc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kim Jin Chul 


[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel

2017-07-13 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7393/2//COMMIT_MSG
Commit Message:

PS2, Line 15: PartitionedHashJoinNode::EvaluateNullProbe
> Add '()' after like you have above.
Done


http://gerrit.cloudera.org:8080/#/c/7393/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS2, Line 350: u
> extra space
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5586: Null-aware anti-join can take a long time to cancel

2017-07-13 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#3).

Change subject: IMPALA-5586: Null-aware anti-join can take a long time to cancel
..

IMPALA-5586: Null-aware anti-join can take a long time to cancel

Queries with a null-aware anti-join joining on a large number of NULLs
can take a long time to cancel if threads are stuck in
PartitionedHashJoinNode::EvaluateNullProbe(). This change adds the
RETURN_IF_CANCELLED macro to the function.

Testing:
Added logs to PartitionedHashJoinNode::EvaluateNullProbe() and made sure
that the function returns right away on cancellation.

Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053
---
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
2 files changed, 12 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7393/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 3:

Discussed this issue with MJ. This issue does occur due to inconsistent 
handling of casing.
Changed buildRolePrivilegeName to return the privilege names in lower case 
instead of changing  it in privilegeSpec.
 Also changed getRolePrivileges to return the output of show grant roles in 
lower case since we are not lower casing privilegeSpec object anymore.

Added a test in test_grant_revoke.py and made sentryProxy configurable.

-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-07-13 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#3).

Change subject: IMPALA-5582: Store sentry privileges in lower case
..

IMPALA-5582: Store sentry privileges in lower case

Privileges granted to a db/table whose name contains upper case
characters can disappear after a few seconds. A privilege
is inserted into the catalogObjectCache using a key that
uses the db/table name. The key gets converted to lower case
before inserting. The db/table name is stored in upper case in
the rolePrivilege object.
The sentryProxy thread on the other hand returns the db/table name
in lower case. If the catalogObjectCache on a update adds the new
rolePrivilege object first and removes the old object later, it
ends up deleting the newer verison of the rolePrivilege since they
both use the same key to add/remove from the cache.
This change sets the privilege name in lower case which does not
trigger these chain of events on a sentryProxy thread update.

This change also adds a new catalogd startup option
(sentry_catalog_polling_frequency)
to configure the frequency at which catalogd polls the sentry service
to update any policy changes. The default value is 60 seconds.

Test:
Added a test which adds select privileges to 3 tables  specified in
lower case, upper case and mixed case. The test verifies that the
privileges on the 3 tables do not disappear on a sentry update.

Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/util/SentryProxy.java
M tests/authorization/test_grant_revoke.py
8 files changed, 90 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7332/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke