[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/383/

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 8: Code-Review+2

Fix a linker issue - order of boost libraries was wrong

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins, Jim Apple, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6402

to look at the new patch set (#8).

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..

IMPALA-5077: add NUMA and current cpu to CpuInfo

NUMA info is found using the /sys filesystem.

The current CPU can be found using sched_getcpu(), which is supported
on all recent Linux kernels (unfortunately CentOS 5 shipped with an
older kernel, so we need a fallback).

Testing:
Confirmed that this built on a range of different Linux distros,
including CentOS 5, which is missing support for features like
sched_getcpu().

Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
---
M CMakeLists.txt
M be/CMakeLists.txt
A be/src/common/.gitignore
M be/src/common/CMakeLists.txt
A be/src/common/config.h.in
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
7 files changed, 168 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 14: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/382/

-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 7: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/380/

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 14:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/382/

-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 14: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
..


Patch Set 6: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/377/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 17:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/381/

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4831: enforce BufferPool reservation invariants
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4831: enforce BufferPool reservation invariants
..


IMPALA-4831: enforce BufferPool reservation invariants

Before this patch ill-behaved code outside BufferPool could
violate BufferPool invariants by calling methods on ReservationTracker()
such as DecreaseReservation() or ReleaseFrom() or by hooking
up Clients and ReservationTrackers in the wrong way (e.g. sharing
a ReservationTracker between two Clients).

Now each client creates and owns its ReservationTracker and restricts
which methods can be called from outside BufferPool. This also reduces
the amount of boilerplate code required to set up and tear down a
client.

Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573
Reviewed-on: http://gerrit.cloudera.org:8080/6313
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/bufferpool/suballocator-test.cc
M be/src/runtime/bufferpool/suballocator.cc
7 files changed, 163 insertions(+), 150 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 17: Code-Review+2

Forgot to update the alloc test. Made the tiny change to the test. Forwarding 
the +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#17).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchrmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1  124.726us  124.726us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   29.544us   29.544us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   86.406us  120.372us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE31s840ms2s824ms   2.00K  -1   1.02 GB  
128.00 MB  FINALIZE
03:EXCHANGE 31s163ms1s989ms   6.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE33s356ms3s416ms   6.00K  -1   1.95 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   64.962ms   65.490ms  65.54M  -1  25.97 MB   
64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator   #Hosts   Avg Time  Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail

06:AGGREGATE1   73.961us  73.961us   1   1  28.00 KB
-1.00 B  FINALIZE
05:EXCHANGE 1   18.101us  18.101us   3   1 0
-1.00 B  UNPARTITIONED
02:AGGREGATE3   75.795us  83.969us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE31s608ms   2s683ms   2.00K  -1   1.02 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  826.683ms   1s322ms   6.00K  -1 0
  0  HASH(c1)
01:AGGREGATE32s457ms   2s672ms   6.00K  -1   3.14 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   81.514ms  89.056ms  65.54M  -1  25.94 MB   
64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail
-
06:AGGREGATE1  114.686us  114.686us1   1  28.00 KB  
  -1.00 B  FINALIZE
05:EXCHANGE 1   18.214us   18.214us3   1 0  
  -1.00 B  UNPARTITIONED
02:AGGREGATE3  147.055us  165.464us3   1  28.00 KB  
 10.00 MB
04:AGGREGATE32s043ms2s147ms   14.82K  -1   4.94 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  840.528ms  943.254ms   15.61K  -1 0  
0  HASH(ss_customer_sk)
01:AGGREGATE31s769ms1s869ms   15.61K  -1   5.32 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   17.941ms   37.109ms  183.59K  -1   1.94 MB  
 16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail
-
06:AGGREGATE1  125.915us  125.915us1   1  28.00 KB  
  -1.00 B  FINALIZE
05:EXCHANGE 1   72.179us   72.179us3   1 0  
  -1.00 B  UNPARTITIONED
02:AGGREGATE3   79.054us   83.385us3   1  28.00 KB  
 10.00 MB
04:AGGREGATE36.559ms7.669ms   14.82K  -1  17.32 MB  
128.00 MB  FINALIZE
03:EXCHANGE 3   67.370us   85.068us   15.60K  -1 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   19.245ms   24.472ms   15.60K  -1   9.48 MB  
128.00 MB  STREAMING
00:SCAN HDFS3   53.173ms   55.844ms  183.59K  -1   1.18 MB  
 16.00 MB  tpcds_parquet.store_sales


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)
Hello Marcel Kornacker, Impala Public Jenkins, Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#17).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchrmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1  124.726us  124.726us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   29.544us   29.544us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   86.406us  120.372us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE31s840ms2s824ms   2.00K  -1   1.02 GB  
128.00 MB  FINALIZE
03:EXCHANGE 31s163ms1s989ms   6.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE33s356ms3s416ms   6.00K  -1   1.95 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   64.962ms   65.490ms  65.54M  -1  25.97 MB   
64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator   #Hosts   Avg Time  Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail

06:AGGREGATE1   73.961us  73.961us   1   1  28.00 KB
-1.00 B  FINALIZE
05:EXCHANGE 1   18.101us  18.101us   3   1 0
-1.00 B  UNPARTITIONED
02:AGGREGATE3   75.795us  83.969us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE31s608ms   2s683ms   2.00K  -1   1.02 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  826.683ms   1s322ms   6.00K  -1 0
  0  HASH(c1)
01:AGGREGATE32s457ms   2s672ms   6.00K  -1   3.14 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   81.514ms  89.056ms  65.54M  -1  25.94 MB   
64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail
-
06:AGGREGATE1  114.686us  114.686us1   1  28.00 KB  
  -1.00 B  FINALIZE
05:EXCHANGE 1   18.214us   18.214us3   1 0  
  -1.00 B  UNPARTITIONED
02:AGGREGATE3  147.055us  165.464us3   1  28.00 KB  
 10.00 MB
04:AGGREGATE32s043ms2s147ms   14.82K  -1   4.94 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  840.528ms  943.254ms   15.61K  -1 0  
0  HASH(ss_customer_sk)
01:AGGREGATE31s769ms1s869ms   15.61K  -1   5.32 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   17.941ms   37.109ms  183.59K  -1   1.94 MB  
 16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail
-
06:AGGREGATE1  125.915us  125.915us1   1  28.00 KB  
  -1.00 B  FINALIZE
05:EXCHANGE 1   72.179us   72.179us3   1 0  
  -1.00 B  UNPARTITIONED
02:AGGREGATE3   79.054us   83.385us3   1  28.00 KB  
 10.00 MB
04:AGGREGATE36.559ms7.669ms   14.82K  -1  17.32 MB  
128.00 MB  FINALIZE
03:EXCHANGE 3   67.370us   85.068us   15.60K  -1 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   19.245ms   24.472ms   15.60K  -1   

[native-toolchain-CR] IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)

2017-03-15 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)
..


Patch Set 2:

Lars took a look at the glog breakpad patch, and Matt took a quick look at the 
glog message listener patch.

-- 
To view, visit http://gerrit.cloudera.org:8080/6372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5
Gerrit-PatchSet: 2
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)

2017-03-15 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6372/2/source/gflags/build.sh
File source/gflags/build.sh:

PS2, Line 33:   if [ -e CMakeLists.txt ]; then
: wrap cmake -DBUILD_STATIC_LIBS=ON 
-DCMAKE_INSTALL_PREFIX=$LOCAL_INSTALL -DCMAKE_BUILD_TYPE=RELEASE
:   else
: wrap ./configure --with-pic --prefix=$LOCAL_INSTALL
> I assume the CMakeLists.txt path is for the newer version? I think it'd be 
Would prefer not to use PACKAGE_VERSION, as I find version checking in bash to 
be pretty error-prone (at least with my skill level in bash). I added a comment 
to clarify.


-- 
To view, visit http://gerrit.cloudera.org:8080/6372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5
Gerrit-PatchSet: 2
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)

2017-03-15 Thread Henry Robinson (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6372

to look at the new patch set (#3).

Change subject: IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)
..

IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)

Upgrade gflags and glog together, glog is dependent on gflags.

Fix patch 0001 to install a message listener that no longer worked in
glog 0.3.4. Patched branch available for inspection at:

https://github.com/henryr/glog/commits/0.3.4-p2

Tested message listener reimplementation with Impala. glog unit tests
also pass.

Impala breakpad DCHECK test, which exercises the patched path, passes.

Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5
---
M buildall.sh
M source/gflags/build.sh
M source/glog/build.sh
A 
source/glog/glog-0.3.4-patches/0001-Allow-glog-to-accept-a-message-listener.patch
A 
source/glog/glog-0.3.4-patches/0002-Preserve-custom-signal-handler-for-dcheck.patch
5 files changed, 284 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/72/6372/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5
Gerrit-PatchSet: 3
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6410

Change subject: IMPALA-3203: Part 1: Free list implementation
..

IMPALA-3203: Part 1: Free list implementation

We will have a single free list per size class, protected by a
SpinLock. Free buffers are stored in a heap, ordered by the
memory address of the buffer, to help reduce address space
fragmentation.

A follow-up patch will use the free lists in BufferPool.

Testing:
Added unit tests for sanity checks and verification of behaviour
that is trickier to check in integration or system tests.
The cost will be exercised more thoroughly via BufferPool
in Part 2.

Performance:
Includes a benchmark that demonstrates the scalability of
the free lists under concurrency. When measuring pure throughput
of free list operations, having a free list per core is
significantly faster than a shared free list, or allocating
directly from TCMalloc. On 8 cores, if the memory allocated is
actually touched then for 64KB+ buffers, memory access is the
bottleneck rather than lock contention.

This suggests that having a free list per core is more than sufficient
(however, we will need to validate this with system concurrency
benchmarks once we switch to using this during query execution).

Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/free-lists-benchmark.cc
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/runtime/bufferpool/free-list-test.cc
A be/src/runtime/bufferpool/free-list.h
9 files changed, 860 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6410/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/378/

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 13: Code-Review+2

Rebased onto https://gerrit.cloudera.org/#/c/6313/

-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5811

to look at the new patch set (#13).

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..

IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

Add a copy of BufferedTupleStream that allocates memory from BufferPool.
This will replace the original implementation when IMPALA-3200 is
completed.

The major changes are:
* Terminology is updated from "blocks" to "pages"
* No small buffer support is needed (hooray!).
* BufferedTupleStream needs to do its own tracking of # of rows per
  page, etc instead of relying on BufferedBlockMgr to do it. A
  wrapper around PageHandle is used.
* Profile counters (unpin, pin, get new block time) are removed -
  similar counters in the BufferPool client are more useful.
* Changed the tuple null indicators so that they are allocated
  before each tuple, rather than in a block at the start of the
  page. This slightly reduces the memory density, but greatly
  simplifies much logic. In particular, it avoids problems with
  RowIdx and larger pages with offsets that don't fit in 16 bits.
* Buffer management of read/write streams uses the new pin-counting
  support to separate pinning of the read and write pages.
  This means that the reservation usage of an unpinned read/write
  stream does not fluctuate and the read/write iterators can always
  advance without requiring additional reservation.

Testing this required some further changes. TestEnv was refactored
so it can set up either BufferPool or BufferedBlockMgr. Some
BufferPool-related state is added to ExecEnv, QueryState and
RuntimeState, but is only created for backend tests that explicitly
create a BufferPool.

The following is left for future work:
* IMPALA-3808 (large row support) is not added. I've added
  TODOs to the code to places that will require changes.
* IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since
  it requires global changes to operators that accumulate memory.

Testing:
All of the BufferedTupleStream unit tests are ported to the new
implementation, except for ones specifically testing the small
buffer functionality.

Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
---
M be/src/exec/hash-table-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-tuple-stream-test.cc
A be/src/runtime/buffered-tuple-stream-v2-test.cc
A be/src/runtime/buffered-tuple-stream-v2.cc
A be/src/runtime/buffered-tuple-stream-v2.h
A be/src/runtime/buffered-tuple-stream-v2.inline.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
20 files changed, 3,002 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 12: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/380/

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 176:   CHECK_CONSISTENCY();
> maybe move these CHECK_CONSISTECY() to the end of the functions so they val
We don't check at the exit points of all functions so I wanted to at least 
check at the entry to some of these functions before they start changing state.


Line 270:   RETURN_IF_ERROR(PinPageIfNeeded(write_page_, pinned_));
> wouldn't it be better to move this to AdvanceWritePage()?  The other caller
Done


Line 299:   // May need to pin the new page for both reading and writing.
> Add: "See ExpectedPinCount()."
Done


Line 374:   pinned_ || 
buffer_pool_client_->reservation()->IncreaseReservationToFit(page_len_);
> let's add a comment, something like:
Done


Line 443: MemTracker* tracker, scoped_ptr* batch, bool* got_rows) 
{
> you don't have to do it now, but I kinda think this function should be in P
Yeah, I think we should really remove it: IMPALA-2758


http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS11, Line 498: and to pin it for reading if the stream's current
  :   /// state requires it. 
> see comment in cc file on this function. if you take that suggestion, then 
Done


PS11, Line 518: write iterator is
> iterators are
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5811

to look at the new patch set (#12).

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..

IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

Add a copy of BufferedTupleStream that allocates memory from BufferPool.
This will replace the original implementation when IMPALA-3200 is
completed.

The major changes are:
* Terminology is updated from "blocks" to "pages"
* No small buffer support is needed (hooray!).
* BufferedTupleStream needs to do its own tracking of # of rows per
  page, etc instead of relying on BufferedBlockMgr to do it. A
  wrapper around PageHandle is used.
* Profile counters (unpin, pin, get new block time) are removed -
  similar counters in the BufferPool client are more useful.
* Changed the tuple null indicators so that they are allocated
  before each tuple, rather than in a block at the start of the
  page. This slightly reduces the memory density, but greatly
  simplifies much logic. In particular, it avoids problems with
  RowIdx and larger pages with offsets that don't fit in 16 bits.
* Buffer management of read/write streams uses the new pin-counting
  support to separate pinning of the read and write pages.
  This means that the reservation usage of an unpinned read/write
  stream does not fluctuate and the read/write iterators can always
  advance without requiring additional reservation.

Testing this required some further changes. TestEnv was refactored
so it can set up either BufferPool or BufferedBlockMgr. Some
BufferPool-related state is added to ExecEnv, QueryState and
RuntimeState, but is only created for backend tests that explicitly
create a BufferPool.

The following is left for future work:
* IMPALA-3808 (large row support) is not added. I've added
  TODOs to the code to places that will require changes.
* IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since
  it requires global changes to operators that accumulate memory.

Testing:
All of the BufferedTupleStream unit tests are ported to the new
implementation, except for ones specifically testing the small
buffer functionality.

Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
---
M be/src/exec/hash-table-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-tuple-stream-test.cc
A be/src/runtime/buffered-tuple-stream-v2-test.cc
A be/src/runtime/buffered-tuple-stream-v2.cc
A be/src/runtime/buffered-tuple-stream-v2.h
A be/src/runtime/buffered-tuple-stream-v2.inline.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
20 files changed, 3,012 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 7: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Hello Jim Apple, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6402

to look at the new patch set (#7).

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..

IMPALA-5077: add NUMA and current cpu to CpuInfo

NUMA info is found using the /sys filesystem.

The current CPU can be found using sched_getcpu(), which is supported
on all recent Linux kernels (unfortunately CentOS 5 shipped with an
older kernel, so we need a fallback).

Testing:
Confirmed that this built on a range of different Linux distros,
including CentOS 5, which is missing support for features like
sched_getcpu().

Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
---
M be/CMakeLists.txt
A be/src/common/.gitignore
M be/src/common/CMakeLists.txt
A be/src/common/config.h.in
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
6 files changed, 167 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6402/6/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

Line 183: for (int core = 0; core < max_num_cores_; ++core) 
core_to_numa_node_[core] = 0;
> std::fill
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
Any benchmark numbers? Just curious. Also I think it is good to add them here.


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
Trailing white spaces at multiple places in this file.


Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
Can this be moved outside the loop like JAVA_FE_ARGS


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
extraneous white spaces.


PS1, Line 54: file_name_prefix_idx
Didn't understand what this prefix means till I read 
HdfsTable.fileNamePrefixes_. I think it should be pointed out here.


Line 70: root_type FbFileDesc;
What is the significance of this? I'm reading about FlatBuffers for the first 
time, so pardon my ignorance. From what I understand, this isn't required in 
strongly typed systems. Please correct me if I'm wrong.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
Document these?


PS1, Line 318: REPLICA_HOST_IDX_MASK
Looks like it used for UNMASKing, rename may be?


PS1, Line 357: getHostIdx
how about just making hostIdx_ a short rather than typecasting everytime?


PS1, Line 360: REPLICA_HOST_CACHE_MASK
Shouldn't this be short rather than int? May be it works fine this way, just 
want to be sure.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315: Preconditions.checkNotNull(fd);
I don't think this FileDescriptor.create() can ever return null? If yes, worth 
removing Preconditions check here and elsewhere.


Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' 
and returns a
I like the idea but I'm wondering if this kind of optimization is a little 
premature. Especially given the filenames generated by Impala writes are 
typically random UUID style strings and would likely not qualify this 
optimization. Thoughts? Please correct me if I got this wrong.


Line 420:   prefixCompressFileName(currentFileName, prevFileName);
Also I think it may not be optimal to construct the prefixes this way by 
comparing the previous name with the current name. May be we should huffman 
style encoding that minimizes the no. of bits across the entire set of file 
names?


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet SUPPORTED_TABLE_TYPES = 
EnumSet.of(
May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by 
making it public?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 11: Code-Review+2

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 176:   CHECK_CONSISTENCY();
maybe move these CHECK_CONSISTECY() to the end of the functions so they 
validate the resulting state, but feel free to ignore.


Line 270:   RETURN_IF_ERROR(PinPageIfNeeded(write_page_, pinned_));
wouldn't it be better to move this to AdvanceWritePage()?  The other callers 
handle it for their specific case (i.e. by calling PrepareForReadInternal(), or 
not needing it), so this is really just for when we're adding a new write page 
to an existing stream, no?


Line 299:   // May need to pin the new page for both reading and writing.
Add: "See ExpectedPinCount()."
since that's where we really document how we manage reservations/pins.


Line 374:   pinned_ || 
buffer_pool_client_->reservation()->IncreaseReservationToFit(page_len_);
let's add a comment, something like:
// If already pinned, no additional pin is needed (see ExpectedPinCount()).


Line 443: MemTracker* tracker, scoped_ptr* batch, bool* got_rows) 
{
you don't have to do it now, but I kinda think this function should be in PHJ. 
It's really just a composition of buffered-tuple-stream operations, and it's 
not something that we should be doing generally.


http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS11, Line 498: and to pin it for reading if the stream's current
  :   /// state requires it. 
see comment in cc file on this function. if you take that suggestion, then 
update this.


PS11, Line 518: write iterator is
iterators are


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6402/6/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

Line 183: for (int core = 0; core < max_num_cores_; ++core) 
core_to_numa_node_[core] = 0;
std::fill


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Hello Jim Apple, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6402

to look at the new patch set (#6).

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..

IMPALA-5077: add NUMA and current cpu to CpuInfo

NUMA info is found using the /sys filesystem.

The current CPU can be found using sched_getcpu(), which is supported
on all recent Linux kernels (unfortunately CentOS 5 shipped with an
older kernel, so we need a fallback).

Testing:
Confirmed that this built on a range of different Linux distros,
including CentOS 5, which is missing support for features like
sched_getcpu().

Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
---
M be/CMakeLists.txt
A be/src/common/.gitignore
M be/src/common/CMakeLists.txt
A be/src/common/config.h.in
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
6 files changed, 166 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6402/5/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

PS5, Line 161: getcpu
> should that say sched_getcpu()?
Yes - sched_getcpu() is a glibc wrapper around the getcpu() syscall.


Line 186: LOG(FATAL) << "Could not find nodes in /sys/devices/system/node";
> are we confident that all distros impala runs on will have this?
Oops, I intended to check for the existence of the directory (it's not present 
if the kernel has NUMA support disabled). Now all errors should be non-fatal 
and just fall back to assuming one node.


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Pare down ODBC info

2017-03-15 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Pare down ODBC info
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6409/1/docs/topics/impala_odbc.xml
File docs/topics/impala_odbc.xml:

PS1, Line 39: can be integrate
"can be integrated"


PS1, Line 42: all been approved
by whom?


-- 
To view, visit http://gerrit.cloudera.org:8080/6409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 10:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/5811/9/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS9, Line 322: 
> !has_read_iterator() ?
Done


http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 101:   for (const Page& page : pages_) {
> will this be too expensive, even in debug builds (and we should make sure t
We were already doing a pass for the list in one of the Calc* methods, so it's 
not likely to be significantly worse. I removed the consistency check from 
GetNextInternal(), which is likely to be the most commonly-called.

I added a CHECK_CONSISTENCY macro to guarantee the calls get removed in release 
builds.


PS10, Line 125: read_page_ == pages_.end()
> has_read_iterator() and switch clauses?
Done


Line 129:   }
> does it make sense to print the write page as well?
It's printed on line 123


PS10, Line 149:   // This must be the first call to PrepareForWrite().
> Maybe say, this must be the first iterator? i.e. you can't call this functi
Done


PS10, Line 158:   // This must be the first call to PrepareForWrite().
> same
Done


PS10, Line 205: if
> when
Done


PS10, Line 211: If
> When
Done


PS10, Line 258: get_extra_reservation
> it's unfortunate we need this. i wonder if moving the reservation logic (an
Done


PS10, Line 272:  bool pin_for_read = has_read_iterator() && pinned_;
  :   get_extra_reservation |= pin_for_read;
> this is still pretty confusing. if you don't move the reservation logic com
I restructured this code and move the reservation logic into the callers. It 
adds a bit more boilerplate but seems less magical.


Line 283:   if (pin_for_read) RETURN_IF_ERROR(PinPage(_page));
> see comment above.
Done


Line 298: int64_t row_size, bool get_extra_reservation, bool* got_page) 
noexcept {
> this wrapper now seems unnecessary and not helpful for readability because 
Done


PS10, Line 339: read_page_ == pages_.end()
> !has_read_iterator()?
Done


Line 379:   ResetReadPage();
> this seems to make more sense to do in PrepareForRead() since that's the on
Done


http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 217: got_buffer
> maybe this should be 'got_reservation' now, and explained in terms of reser
Works for me - done.


PS10, Line 228: got_buffer
> same
Done


PS10, Line 496: get_extra_reservation
> this needs explanation, but first see comments in the cc file about it.
Reworked this.


Line 503:   bool* got_page) noexcept WARN_UNUSED_RESULT;
> seems like we can put this code in NewWritePage() now.
This ended up being restructured a bit - some of the code is in a new function 
AdvanceWritePage().


PS10, Line 509: got_buffer
> got_page?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#11).

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..

IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

Add a copy of BufferedTupleStream that allocates memory from BufferPool.
This will replace the original implementation when IMPALA-3200 is
completed.

The major changes are:
* Terminology is updated from "blocks" to "pages"
* No small buffer support is needed (hooray!).
* BufferedTupleStream needs to do its own tracking of # of rows per
  page, etc instead of relying on BufferedBlockMgr to do it. A
  wrapper around PageHandle is used.
* Profile counters (unpin, pin, get new block time) are removed -
  similar counters in the BufferPool client are more useful.
* Changed the tuple null indicators so that they are allocated
  before each tuple, rather than in a block at the start of the
  page. This slightly reduces the memory density, but greatly
  simplifies much logic. In particular, it avoids problems with
  RowIdx and larger pages with offsets that don't fit in 16 bits.
* Buffer management of read/write streams uses the new pin-counting
  support to separate pinning of the read and write pages.
  This means that the reservation usage of an unpinned read/write
  stream does not fluctuate and the read/write iterators can always
  advance without requiring additional reservation.

Testing this required some further changes. TestEnv was refactored
so it can set up either BufferPool or BufferedBlockMgr. Some
BufferPool-related state is added to ExecEnv, QueryState and
RuntimeState, but is only created for backend tests that explicitly
create a BufferPool.

The following is left for future work:
* IMPALA-3808 (large row support) is not added. I've added
  TODOs to the code to places that will require changes.
* IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since
  it requires global changes to operators that accumulate memory.

Testing:
All of the BufferedTupleStream unit tests are ported to the new
implementation, except for ones specifically testing the small
buffer functionality.

Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
---
M be/src/exec/hash-table-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-tuple-stream-test.cc
A be/src/runtime/buffered-tuple-stream-v2-test.cc
A be/src/runtime/buffered-tuple-stream-v2.cc
A be/src/runtime/buffered-tuple-stream-v2.h
A be/src/runtime/buffered-tuple-stream-v2.inline.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
20 files changed, 3,010 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] [DOCS] Pare down ODBC info

2017-03-15 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: [DOCS] Pare down ODBC info
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Pare down ODBC info

2017-03-15 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6409

Change subject: [DOCS] Pare down ODBC info
..

[DOCS] Pare down ODBC info

Remove the Cloudera-specific portion of the
Impala + ODBC page, leaving just the background
info about why to use ODBC. Defer to third parties
who supply ODBC drivers.

Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8
---
M docs/topics/impala_odbc.xml
1 file changed, 5 insertions(+), 174 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6409/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4831: enforce BufferPool reservation invariants
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/379/

-- 
To view, visit http://gerrit.cloudera.org:8080/6313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-15 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6261/6/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

PS6, Line 201: Ingested
> how about analyzeExistingKuduTableParams ?
Done


http://gerrit.cloudera.org:8080/#/c/6261/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 2793: 
> nit extra line
Done


http://gerrit.cloudera.org:8080/#/c/6261/6/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS6, Line 360: kudu_tbl_name = 
KuduTestSuite.to_kudu_table_name(unique_database, "foo2")
 : assert kudu_client.table_exists(kudu_tbl_name)
 : cursor.execute("SELECT * FROM %s.foo2" % (unique_database))
 : assert len(cursor.fetchall()) == 1
 : cursor.execute("DROP TABLE %s.foo2" % (unique_database))
 : assert kudu_client.table_exists(kudu_tbl_name)
> I think you wanna wrap this in a try/finally and make sure to drop the kudu
Unless I'm mistaken, the entire unique_database gets dropped at the end of the 
test.  The only reason I'm dropping the tables at all here is to test the 
external vs. managed table behavior.  See for example test_kudu_col_removed, 
test_kudu_rename_table which don't bother to clean up.


-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 1: Code-Review+2

In terms of testing, you may run a private jenkins jobs to test your 
uncommitted changes against s3 using 
impala-auxiliary-tests/jenkins/run_impala_tests.py.

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-15 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6408

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..

IMPALA-5072: Fix test_recover_partitions on S3

On S3, directory creation is a nop, so we have to actually
create values to materialize the partitions.

Testing: I'm still trying to figure out how to run the tests against
S3, but reasonably confident this change will actually fix the problem.

Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
---
M tests/metadata/test_recover_partitions.py
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6408/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 


[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants

2017-03-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4831: enforce BufferPool reservation invariants
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 5: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6402/5/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

PS5, Line 161: getcpu
should that say sched_getcpu()?


Line 186: LOG(FATAL) << "Could not find nodes in /sys/devices/system/node";
are we confident that all distros impala runs on will have this?
why is this fatal, but !found_numa_node is not? why not be consistent?


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16:

Thanks for all the reviews, Alex, Jim, MJ, Mostafa, Marcel!

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/378/

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:

Line 15: as a precaution and for testing purposes.
> if we're worried about bugs, we should add more tests. i'm not in favor of 
I don't think it's unusual to have "feature flags" for disabling invasive 
changes quickly to unblock users if they do hit a problem - these flags exist 
because it's infeasible to test all possible scenarios.

Let's do more testing and remove the flag.

Taras, let's talk about what additional testing we should do.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3748: minimum buffer requirements in planner

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: minimum buffer requirements in planner
..


Patch Set 13:

Updated the explain plan output to use more concise names for estimates and 
reservations. Removed the doc changes from this patch - will update them in a 
follow-on.

-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3748: minimum buffer requirements in planner

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#13).

Change subject: IMPALA-3748: minimum buffer requirements in planner
..

IMPALA-3748: minimum buffer requirements in planner

Compute the minimum buffer requirement for spilling nodes and
per-host estimates for the entire plan tree.

This builds on top of the existing resource estimation code, which
computes the sets of plan nodes that can execute concurrently. This is
cleaned up so that the process of producing resource requirements is
clearer. It also removes the unused VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* Fix the computation for mt_dop > 1 by distinguishing per-instance and
  per-host estimates.
* Always generate an estimate instead of unpredictably returning
  -1/"unavailable" in many circumstances - there was little rhyme or
  reason to when this happened.
* Remove the special "trivial plan" estimates. With the rest of the
  cleanup we generate estimates <= 10MB for those trivial plans through
  the normal code path.

I left one bug (IMPALA-4862) unfixed because it is subtle, will affect
estimates for many plans and will be easier to review once we have the
test infra in place.

Testing:
Added basic planner tests for resource requirements in both the MT and
non-MT cases.

Re-enabled the explain_level tests, which appears to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.h
M be/src/runtime/sorter.cc
M be/src/scheduling/query-schedule.cc
M be/src/service/query-exec-state.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
A fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M 

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-15 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:

Line 15: as a precaution and for testing purposes.
> I agree that "blindly" adding query options is bad, but there are so many t
if we're worried about bugs, we should add more tests. i'm not in favor of 
piling on query options as a work-around for missing test coverage (and i don't 
think we're talking about "testing to death").

query options make the product harder to use. and we do not guarantee that for 
every query we're able to run at the moment that the runtime behavior 
(including memory consumption) will never change in the future. that would be 
unreasonable.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 91:   bool const_todo_;
> Minor naming thing, I don't feel too strongly but 'todo' seems unusual. Alt
Rather than having these variables, how about just saving the index of the 
first materialize child.  Then, these can be simple functions, e.g. 
HasMorePassthrough(), HasMoreMaterialized(), that is derived from existing 
state. that way, there's less dynamic state that needs to be updated and 
reasoned about.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-03-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 10:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/5811/9/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS9, Line 322: 
!has_read_iterator() ?


http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 101:   for (const Page& page : pages_) {
will this be too expensive, even in debug builds (and we should make sure this 
loop is compiled out of release builds)?


PS10, Line 125: read_page_ == pages_.end()
has_read_iterator() and switch clauses?


Line 129:   }
does it make sense to print the write page as well?


PS10, Line 149:   // This must be the first call to PrepareForWrite().
Maybe say, this must be the first iterator? i.e. you can't call this function 
after starting a read iteration or read/write iteration, either, right?


PS10, Line 158:   // This must be the first call to PrepareForWrite().
same


PS10, Line 205: if
when
(sorry, i wrote the comment but 'if' sounds confusing since it's not currently 
pinned)


PS10, Line 211: If
When


PS10, Line 258: get_extra_reservation
it's unfortunate we need this. i wonder if moving the reservation logic (and 
the ResetWritePage() call) to the callers would be simpler?

In any case, if we leave it here, how about calling this parameter 
'get_read_reservation'? also see next comment.


PS10, Line 272:  bool pin_for_read = has_read_iterator() && pinned_;
  :   get_extra_reservation |= pin_for_read;
this is still pretty confusing. if you don't move the reservation logic 
completely into the callers, it may still make sense to move the calculation 
'pinned_ && has_read_iterator()' into AddRowSlow(). The other callsites are 
known to be either static true or false. 

THen you would also move the PinPage() call into the caller (using 
PinPageIfNeeded()), which seems to make sense since in the case of 
PrepareForReadWrite(), you really want PrepareForReadInternal() to do that 
anyway, no?  Or at least you could move this after the write iterator is set up 
and still use PinPageIfNeeded().


Line 283:   if (pin_for_read) RETURN_IF_ERROR(PinPage(_page));
see comment above.


Line 298: int64_t row_size, bool get_extra_reservation, bool* got_page) 
noexcept {
this wrapper now seems unnecessary and not helpful for readability because 
NewWritePage() has no other callers.


PS10, Line 339: read_page_ == pages_.end()
!has_read_iterator()?


Line 379:   ResetReadPage();
this seems to make more sense to do in PrepareForRead() since that's the only 
case you can possible have a read iterator, right? and that assumption is built 
into the reservation management code.


http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 217: got_buffer
maybe this should be 'got_reservation' now, and explained in terms of 
reservation. or at least got_page?


PS10, Line 228: got_buffer
same


PS10, Line 496: get_extra_reservation
this needs explanation, but first see comments in the cc file about it.


Line 503:   bool* got_page) noexcept WARN_UNUSED_RESULT;
seems like we can put this code in NewWritePage() now.


PS10, Line 509: got_buffer
got_page?


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables

2017-03-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6037/7/be/src/runtime/data-stream-partitioner.cc
File be/src/runtime/data-stream-partitioner.cc:

PS7, Line 51:   KuduTableDescriptor* table_desc_ = 
static_cast(table_desc);
:   sp::shared_ptr client_;
:   RETURN_IF_ERROR(CreateKuduClient(table_desc_->kudu
> probably a codegen opportunity here for later, can you leave a TODO?
Done


http://gerrit.cloudera.org:8080/#/c/6037/7/be/src/runtime/data-stream-partitioner.h
File be/src/runtime/data-stream-partitioner.h:

PS7, Line 45:   // partition number as parameters.
:   // TODO: make this cleaner without sacrifi
> I'm sorry if I wasn't clear when we spoke last, but I think it is still bet
Done


PS7, Line 63: 
> typo
Done


PS7, Line 87:   ++next_unknow
> rename to indicate used for R.R. only, e.g. next_unknown_partition_ or some
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables

2017-03-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#8).

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
..

IMPALA-3742: partitions DMLs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions the rows
according to Kudu's partitioning scheme. A followup patch
will deal with sorting.

It accomplishes this by inserting an exchange node into the
plan before the DML operation. The DataStreamSender then uses
a new abstraction, DataStreamPartitioner, that calls into the
Kudu client to determine the partition for each row.

Testing:
- Updated planner tests.
- Manually verified the partitioning works as expected.

Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator.cc
A be/src/runtime/data-stream-partitioner.cc
A be/src/runtime/data-stream-partitioner.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
23 files changed, 586 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Hello Jim Apple,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6402

to look at the new patch set (#5).

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..

IMPALA-5077: add NUMA and current cpu to CpuInfo

NUMA info is found using the /sys filesystem.

The current CPU can be found using sched_getcpu(), which is supported
on all recent Linux kernels (unfortunately CentOS 5 shipped with an
older kernel, so we need a fallback).

Testing:
Confirmed that this built on a range of different Linux distros,
including CentOS 5, which is missing support for features like
sched_getcpu().

Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
---
M be/CMakeLists.txt
A be/src/common/.gitignore
M be/src/common/CMakeLists.txt
A be/src/common/config.h.in
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
6 files changed, 155 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83: storageIdGenerator.put(host, new Short(diskId));
Drive-by comment: using 'new' here is pessimising memory consumption. If you 
use Short.valueof() or just let Java automatically box it, it will use a 
built-in cache for values -128 to 127, e.g. see 
http://docs.oracle.com/javase/7/docs/api/java/lang/Short.html#valueOf(short)


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 3: Code-Review+1

Carry Jim's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6402/3/be/CMakeLists.txt
File be/CMakeLists.txt:

PS3, Line 282:  
> nit: remove space?
Done


http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/common/CMakeLists.txt
File be/src/common/CMakeLists.txt:

PS3, Line 53: CONFIGURE_FILE
> Does cmake automatically understand that compilation depends on this step?
The config file is generated during configuration (i.e. the "cmake ." step), so 
it will always be generated before compilation. I think the only catch is that 
if you modify config.h.in without touching a CMakeLists.txt file (very 
unlikely) then it doesn't trigger regeneration of the file.


http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

PS3, Line 181: for (; dir_it != fs::directory_iterator(); ++dir_it) {
> directory_iterator doesn't work with ranged for syntax?
Doesn't look like it: https://github.com/Beman/filesystem-proposal/issues/1


http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.h
File be/src/util/cpu-info.h:

PS3, Line 87: hould
:   /// be used by Impala.
> what does 'should be used' mean? Available for use?
I added a bit more explanation - it's usually just the number of online cores 
but there's a --num_cores command line options.


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6406

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 653 insertions(+), 298 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6402/3/be/CMakeLists.txt
File be/CMakeLists.txt:

PS3, Line 282:  
nit: remove space?


http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/common/CMakeLists.txt
File be/src/common/CMakeLists.txt:

PS3, Line 53: CONFIGURE_FILE
Does cmake automatically understand that compilation depends on this step?


http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

PS3, Line 181: for (; dir_it != fs::directory_iterator(); ++dir_it) {
directory_iterator doesn't work with ranged for syntax?


http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.h
File be/src/util/cpu-info.h:

PS3, Line 87: hould
:   /// be used by Impala.
what does 'should be used' mean? Available for use?


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

2017-03-15 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
..


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/377/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

2017-03-15 Thread Joe McDonnell (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6312

to look at the new patch set (#6).

Change subject: IMPALA-4996: Single-threaded KuduScanNode
..

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 558 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

2017-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
..


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/376/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

2017-03-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
..

IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Previously, exprs used in sorts were evaluated lazily. This can
potentially be bad for performance if the exprs are expensive to
evaluate, and it can lead to crashes if the exprs are
non-deterministic, as this violates assumptions of our sorting
algorithm.

This patch addresses these issues by materializing ordering exprs.
It does so when the expr is non-deterministic (including when it
contains a UDF, which we cannot currently know if they are
non-deterministic), or when its cost exceeds a threshold (or the
cost is unknown).

Testing:
- Added e2e tests in test_sort.py and test_queries.py.
- Updated planner tests.

Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M tests/query_test/test_sort.py
M tests/query_test/test_udfs.py
12 files changed, 326 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6322/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

2017-03-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
..


Patch Set 2:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 212:   // True if this expr always returns the same value given the same 
input, false if it
> True if this exprs always returns the same value given the same input. Fals
Done


Line 213:   // does not, or if the behavior is unknown (e.g. UDFs). Set at the 
end of analyze() and
> e.g.
Done


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 292: // We currently don't have a way to indicate if a UDF is 
deterministic, so just always
> Ideally we should not conflate the UDF and deterministic concepts. For exam
Not sure what you mean by 'two separate members". Maybe 
Expr.containsNonDeterministicBuiltin and Expr.containsUDF?

Also, it doesn't seem like FunctionCallExpr.isNondeterministicBuiltinFn() and 
Expr.isDeterministic() could ever be guaranteed to return opposite values, 
since a deterministic function that has non-deterministic parameters would 
constitute a non-deterministic expr.


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

Line 253:   if (!(smap.getLhs().get(i) instanceof SlotRef)
> use {} for multi-line if
Done


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 39:   // All Exprs with cost greater than this will be materialized. Since 
we don't currently
> Remove TODO, instead comment where this value came from, maybe with some ex
Done


Line 46: 
> needs a brief comment, in particular why we need to store them
Done


Line 172:*/
> We should collect the SlotRefs only for non-materialized ordering exprs.
Done


Line 180: // substOrderBy is a mapping from slot refs in the sort node's 
input and ordering
> update comment to reflect the new contents of the substOrderBy smap
Done


Line 196:   SlotDescriptor origSlotDesc = origSlotRef.getDesc();
> update comment, we are not only substituting slot refs
Done


Line 208: substituteOrderingExprs(substOrderBy, analyzer);
> Say something about the args of this function. Also mention side-effects (p
Done


Line 210: // Update the tuple descriptor used to materialize the input of 
the sort.
> make this the first sentence in this comment
Done


Line 211: setMaterializedTupleInfo(sortTupleDesc, sortTupleExprs);
> no more hint, right?
Done


Line 214:   }
> from the materialized sort exprs to the new SlotRefs
Done


Line 216:   /**
> Instead of returning a new smap and combining with the other one, why not p
I updated it to take the smap as a parameter, but the problem with returning 
the materialized exprs to avoid side effects is that we also need to update 
isAsc and nullsFirst.


Line 217:* Materialize ordering exprs that are non-deterministic, are more 
expensive than a cost
> analyzer needed?
For 'addSlotDescriptor'


Line 219:* 'sortTupleDesc' as their parent.
> remove
Done


Line 228:* exprs that get materialized.
> Why TODO? What happens when there are no sort exprs at all left?
There's already a lot in this review, and I wasn't sure how easy this would be. 
I'm happy to take a look if you want.

If there are no sort exprs, we still go through the whole sort but every row is 
considered equal (by TupleRowComparator::Compare"). This is the same behavior 
as would currently happen, except slightly faster as the literal expressions 
don't get repeatedly evaluated and compared.

I added a test to at least show it works as expected.


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 374:   public void testSortExprMaterialization() {
> testSortExprMaterialization
Done


http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test:

Line 19
> I think these tests need more examples cheap and expensive exprs to make su
I'm not sure how to make that work with PlannerTest, but I added a test to 
query_tests/test_udfs that works.

For UDAs, assuming that you mean something like:
select bool_col, sum(min(int_col)) over(order by uda(int_col)) from 
functional.alltypes group by 1;
I believe that will already be materialized, in a SlotRef created in 

[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..

IMPALA-5077: add NUMA and current cpu to CpuInfo

NUMA information is found using the /sys filesystem.

The current CPU can be found using sched_getcpu(), which is supported
on all recent Linux kernels (unfortunately CentOS 5 shipped with an
older kernel, so we need a fallback).

Testing:
Confirmed that this built on a range of different Linux distros,
including CentOS 5, which is missing support for features like
sched_getcpu().

Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
---
M be/CMakeLists.txt
A be/src/common/.gitignore
M be/src/common/CMakeLists.txt
A be/src/common/config.h.in
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
6 files changed, 154 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6402/2/be/src/util/cpu-info.cc
File be/src/util/cpu-info.cc:

PS2, Line 161: result
> Just inline sched_getcpu() if you aren't going to use result.
Done


PS2, Line 165: Running
> "Built"?
Good point - done.


Line 183: string filename = dir_it->path().filename().string();
> const
Done


Line 192:   core_to_numa_node_ = new int[max_num_cores_];
> Will ASAN complain about this memory leak?
LeakSanitiser would report it as "reachable" but it's easy enough to switch to 
unique_ptr.


-- 
To view, visit http://gerrit.cloudera.org:8080/6402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes