[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Remove caching of llvm StructType by 
TupleDescriptor::GetLlvmStruct().
..


Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().

This caching is causing problems when multiple threads are
involved in query startup, which is being implemented as
part of the move to a per-query exec rpc (IMPALA-2550).

Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06
Reviewed-on: http://gerrit.cloudera.org:8080/6511
Reviewed-by: Marcel Kornacker 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
2 files changed, 40 insertions(+), 46 deletions(-)

Approvals:
  Marcel Kornacker: Looks good to me, approved
  Impala Public Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Remove caching of llvm StructType by 
TupleDescriptor::GetLlvmStruct().
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
..


Patch Set 4:

I ran the query above on Patch Set 4 as well, complete results:

Apache/master: 17.60s
Patch set 4: 10.58s
Patch set 5: 9.98s

-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
..


Patch Set 5:

I reran the benchmark on patch 5 on a larger table where we select only 1 
column:
SELECT
  COUNT(c)
FROM (
  select fnv_hash(ss_sold_time_sk) c from 
tpcds_10_parquet.store_sales_unpartitioned_big
  union all
  select fnv_hash(ss_sold_time_sk) c from 
tpcds_10_parquet.store_sales_unpartitioned_big
  union all
  select fnv_hash(ss_sold_time_sk) c from 
tpcds_10_parquet.store_sales_unpartitioned_big
  union all
  select fnv_hash(ss_sold_time_sk) c from 
tpcds_10_parquet.store_sales_unpartitioned_big
) t

Before: 17.6s
After: 9.98s

Not a huge difference. I think the bottleneck is scanning (not union), that's 
why the improvement is not as big. Maybe the difference will be more 
significant on a large cluster?

-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4883: Union Codegen
..

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  194.504us  194.504us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   17.284us   17.284us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s202ms2s934ms3   1  115.00 KB  
 10.00 MB
00:UNION   3   32s514ms   34s926ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  158.373ms  216.085ms   28.80M  28.80M  489.71 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS3  167.002ms  171.738ms   28.80M  28.80M  489.74 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS3  125.331ms  145.496ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS3  148.478ms  194.311ms   28.80M  28.80M  489.69 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS3  143.995ms  162.781ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS3  169.731ms  250.201ms   28.80M  28.80M  489.58 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS3  164.110ms  254.374ms   28.80M  28.80M  489.61 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS3  135.631ms  162.117ms   28.80M  28.80M  489.63 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS3  138.736ms  167.778ms   28.80M  28.80M  489.67 MB  
  1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS   3  202.015ms  248.728ms   28.80M  28.80M  489.68 MB  
  1.88 GB  tpcds_10_parquet.store_sales

After: 20s664ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  167.757us  167.757us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   16.592us   16.592us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s924ms3s715ms3   1  115.00 KB  
 10.00 MB
00:UNION   34s971ms6s082ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS31s189ms1s588ms   28.80M  28.80M  483.82 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS31s117ms1s157ms   28.80M  28.80M  484.85 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS31s226ms1s454ms   28.80M  28.80M  483.00 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS31s141ms1s3

[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4883: Union Codegen
..

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  194.504us  194.504us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   17.284us   17.284us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s202ms2s934ms3   1  115.00 KB  
 10.00 MB
00:UNION   3   32s514ms   34s926ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  158.373ms  216.085ms   28.80M  28.80M  489.71 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS3  167.002ms  171.738ms   28.80M  28.80M  489.74 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS3  125.331ms  145.496ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS3  148.478ms  194.311ms   28.80M  28.80M  489.69 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS3  143.995ms  162.781ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS3  169.731ms  250.201ms   28.80M  28.80M  489.58 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS3  164.110ms  254.374ms   28.80M  28.80M  489.61 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS3  135.631ms  162.117ms   28.80M  28.80M  489.63 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS3  138.736ms  167.778ms   28.80M  28.80M  489.67 MB  
  1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS   3  202.015ms  248.728ms   28.80M  28.80M  489.68 MB  
  1.88 GB  tpcds_10_parquet.store_sales

After: 20s664ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  167.757us  167.757us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   16.592us   16.592us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s924ms3s715ms3   1  115.00 KB  
 10.00 MB
00:UNION   34s971ms6s082ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS31s189ms1s588ms   28.80M  28.80M  483.82 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS31s117ms1s157ms   28.80M  28.80M  484.85 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS31s226ms1s454ms   28.80M  28.80M  483.00 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS31s141ms1s3

[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6459/4/be/src/exec/union-node-ir.cc
File be/src/exec/union-node-ir.cc:

Line 28:   while (!dst_batch->AtCapacity() && child_row_idx_ < 
child_batch_->num_rows()) {
> The remaining optimisation is to pull all references to member variables ou
Done. I think I took everything out. Let me know if you see any other ways to 
further improve performance.


Line 34: if (ReachedLimit()) break;
> We can avoid checking limits for each row if we check it at the end and tru
Good idea, done. (We already do a similar thing in the passthrough case)


-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#7).

Change subject: IMPALA-4883: Union Codegen
..

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  194.504us  194.504us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   17.284us   17.284us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s202ms2s934ms3   1  115.00 KB  
 10.00 MB
00:UNION   3   32s514ms   34s926ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  158.373ms  216.085ms   28.80M  28.80M  489.71 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS3  167.002ms  171.738ms   28.80M  28.80M  489.74 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS3  125.331ms  145.496ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS3  148.478ms  194.311ms   28.80M  28.80M  489.69 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS3  143.995ms  162.781ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS3  169.731ms  250.201ms   28.80M  28.80M  489.58 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS3  164.110ms  254.374ms   28.80M  28.80M  489.61 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS3  135.631ms  162.117ms   28.80M  28.80M  489.63 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS3  138.736ms  167.778ms   28.80M  28.80M  489.67 MB  
  1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS   3  202.015ms  248.728ms   28.80M  28.80M  489.68 MB  
  1.88 GB  tpcds_10_parquet.store_sales

After: 20s664ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  167.757us  167.757us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   16.592us   16.592us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s924ms3s715ms3   1  115.00 KB  
 10.00 MB
00:UNION   34s971ms6s082ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS31s189ms1s588ms   28.80M  28.80M  483.82 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS31s117ms1s157ms   28.80M  28.80M  484.85 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS31s226ms1s454ms   28.80M  28.80M  483.00 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS31s141ms1s3

[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-03-30 Thread anujphadke (Code Review)
anujphadke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6522

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..

IMPALA-4893: Efficiently update the rows read counter for sequence file

Update the rows read counter after processing the scan range instead of updating
it after reading every row for sequence files to save CPU cycles.

Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
---
M be/src/exec/hdfs-sequence-scanner.cc
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

2017-03-30 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#10).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
..

IMPALA-5003: Constant propagation in scan nodes and inline views

N.B. - preview diff again.  I added an inequality collation phase which
is not yet tested or finished, but should combine conjuncts such as
a < k1, a < k2  into a < min(k1, k2).

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 607 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4883: Union Codegen
..

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  194.504us  194.504us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   17.284us   17.284us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s202ms2s934ms3   1  115.00 KB  
 10.00 MB
00:UNION   3   32s514ms   34s926ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  158.373ms  216.085ms   28.80M  28.80M  489.71 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS3  167.002ms  171.738ms   28.80M  28.80M  489.74 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS3  125.331ms  145.496ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS3  148.478ms  194.311ms   28.80M  28.80M  489.69 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS3  143.995ms  162.781ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS3  169.731ms  250.201ms   28.80M  28.80M  489.58 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS3  164.110ms  254.374ms   28.80M  28.80M  489.61 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS3  135.631ms  162.117ms   28.80M  28.80M  489.63 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS3  138.736ms  167.778ms   28.80M  28.80M  489.67 MB  
  1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS   3  202.015ms  248.728ms   28.80M  28.80M  489.68 MB  
  1.88 GB  tpcds_10_parquet.store_sales

After: 20s664ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  167.757us  167.757us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   16.592us   16.592us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s924ms3s715ms3   1  115.00 KB  
 10.00 MB
00:UNION   34s971ms6s082ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS31s189ms1s588ms   28.80M  28.80M  483.82 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS31s117ms1s157ms   28.80M  28.80M  484.85 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS31s226ms1s454ms   28.80M  28.80M  483.00 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS31s141ms1s3

[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4883: Union Codegen
..

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
does per row tuple materialization.

Testing:
Ran test_queries.py test locally in exchaustive mode.

Benchmark:
Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from 
tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  194.504us  194.504us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   17.284us   17.284us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s202ms2s934ms3   1  115.00 KB  
 10.00 MB
00:UNION   3   32s514ms   34s926ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  158.373ms  216.085ms   28.80M  28.80M  489.71 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS3  167.002ms  171.738ms   28.80M  28.80M  489.74 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS3  125.331ms  145.496ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS3  148.478ms  194.311ms   28.80M  28.80M  489.69 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS3  143.995ms  162.781ms   28.80M  28.80M  489.57 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS3  169.731ms  250.201ms   28.80M  28.80M  489.58 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS3  164.110ms  254.374ms   28.80M  28.80M  489.61 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS3  135.631ms  162.117ms   28.80M  28.80M  489.63 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS3  138.736ms  167.778ms   28.80M  28.80M  489.67 MB  
  1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS   3  202.015ms  248.728ms   28.80M  28.80M  489.68 MB  
  1.88 GB  tpcds_10_parquet.store_sales

After: 20s664ms
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  167.757us  167.757us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   16.592us   16.592us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s924ms3s715ms3   1  115.00 KB  
 10.00 MB
00:UNION   34s971ms6s082ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS31s189ms1s588ms   28.80M  28.80M  483.82 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS31s117ms1s157ms   28.80M  28.80M  484.85 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS31s226ms1s454ms   28.80M  28.80M  483.00 MB  
  1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS31s141ms1s3

[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors

2017-03-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors
..

IMPALA-5137: pt1, Refactor TimestampValue constructors

In preparation for supporting Kudu TIMESTAMPs, some cleanup
of the TimestampValue code will be helpful first.

This change just refactors some of the TimestampValue
constructors which do more than simple construction, e.g.
parsing or converting, to static factory methods.

Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1
---
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/util/dict-test.cc
21 files changed, 254 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6510/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors

2017-03-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

PS1, Line 284: FromUnixTimeNanos
> does it make sense that this conversion is subjected to use local tz flag? 
Hm I'm not sure, though if so I'll file a separate JIRA so it can be tracked 
externally as this would be a breaking change (which probably affects nearly 
nobody).


http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS1, Line 354: TimestampValue
> maybe use "const TimetampValue& tv = ..." here (and elsewhere), though mayb
Done


http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS1, Line 81: lexical_cast(t)
> how about getting rid of this implicit code as well by defining a ToString(
yeah this is weird, agreed


http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/runtime/timestamp-test.cc
File be/src/runtime/timestamp-test.cc:

PS1, Line 632: coverted
> converted
Done


PS1, Line 641:   EXPECT_TRUE(leap_tv1.ToUnixTime(&leap_time_t));
 :   EXPECT_EQ(915148800, leap_time_t);
 :   // Converted to the Unix time representation
 :   EXPECT_EQ("1999-01-01 00:00:00", leap_tv1.DebugString());
> maybe do these cases on leap_ptime2 as well, just to be sure everything is 
Done


http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

PS1, Line 91: Conversion to local time will be done if
> Return the corresponding timestamp in the local timezone if 
Done


PS1, Line 92: .
> Otherwise, return the corresponding timestamp in UTC.
Done


PS1, Line 97: 'FromUnixTime'
> For functions, we usually write: FromUnixTime()
Done


PS1, Line 105: unix_time
> 'unix_time'
Done


PS1, Line 287: operator
> do we need this other than that lexical_cast I commented on?
I think we could work around it, but this does get used in a few places where 
we branch on the type and directly use this operator, e.g. literal.cc and 
raw-value.cc


-- 
To view, visit http://gerrit.cloudera.org:8080/6510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3637,IMPALA-3636: refactor codegen constant replacement

2017-03-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has abandoned this change.

Change subject: IMPALA-3637,IMPALA-3636: refactor codegen constant replacement
..


Abandoned

Abandoning for now since there wasn't a consensus about it and I don't have 
time right now to push it forward. I think we should consider going forward 
with it even if some of it will be superseded by other mechanisms because it 
provides immediate performance benefits by inlining CpuInfo::IsSupported() and 
speeding up the interpreted path.

-- 
To view, visit http://gerrit.cloudera.org:8080/3843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I6ba029ed8589698eb15dbfb0a20dd2a7ea752635
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3203: Part 2: per-core free lists in buffer pool

2017-03-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool
..


Patch Set 15:

(14 comments)

Next batch. I've gone through buffer-allocator.h/.cc.

http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

PS15, Line 137:  Return the buffer size for a buffer of the given length.
doesn't match


PS15, Line 138: GetBuffersOfSize
GetListsForSize?


PS15, Line 419: FreeBuffers
this name is a bit confusing now that i see it in use since "free buffers" is 
also used as a noun.  Anyway, this code might be clearer if instead of this 
method, the free-list exposes a way to get N buffers from the tail, and then 
this class calls the system_allocator_->Free(). That way, all system_allocator_ 
Allocate()/Free() happens in one layer and FreeList doesn't have to know about 
SystemAllocator. (You'd have to remove FreeAll() also, but that seems like you 
could just use the same routine by passing the number of buffers in the list as 
the count).


PS15, Line 421: so that other threads
  : // can claim the memory.
not sure what this comment is saying.


PS15, Line 423: bytes_freed > 0
when would this be false?


PS15, Line 478: !al.owns_lock() && 
why have this condition here? it's still correct to skip the whole block if the 
lock is already held, right?


Line 488: // Figure out how many buffers we should free of this size.
i was misled by this comment since at this point, buffers_to_free is only how 
many buffers from the free list (not  how many buffers we should free of this 
size).


PS15, Line 489: bytes_to_free
this is so similarly named to buffers_to_free and buffer_bytes_to_free but has 
a different meaning in that it's the target, not the accumulated. That makes 
the loop a bit harder to understand.  So, how about renaming this to 
'target_bytes' or something?


Line 496: // that they are freed based on memory address in the expected 
order.
this is basically the same as evicting the pages, right? it would help to use 
that terminology here to make the connection explicit.


PS15, Line 525:   int64_t max_to_claim = claim_memory ? bytes_to_free : 0;
  :   if (bytes_freed > max_to_claim) {
  : // Add back the extra for other threads before releasing 
the lock to avoid race
  : // where the other thread may not be able to find enough 
buffers.
  : parent_->system_bytes_remaining_.Add(bytes_freed - 
max_to_claim);
  : bytes_freed = max_to_claim;
  :   }
as mentioned earlier in the function comment, this is pretty confusing.


PS15, Line 551: BufferHandle buffer;
  : {
  :   lock_guard pl(page->buffer_lock);
  :   buffer = move(page->buffer);
  : }
  : lists->free_buffers.AddFreeBuffer(move(buffer));
  : lists->num_free_buffers.Add(1);
why doesn't this need to do the other stuff that 
FreeBufferArena::AddFreeBuffer() does?  Why not just remove 'claim_buffer' and 
this block, and then have the buffer-pool just always call back down into 
BufferAllocator::Free().

Oh we just talked about that and you pointed out that then a client would 
temporarily be over its reservation while the lock is dropped.

But, shouldn't this block do everything FreeBufferArena::AddFreeBuffer() does?


PS15, Line 569: touched
needed?


Line 571:   // least one, we guarantee that an idle list will shrink to 
zero entries.
can you add some motivation for why we need to care about the previous two 
intervals, rather than, say, just the last interval?


PS15, Line 576: bytes_freed > 0
when would this be false?


-- 
To view, visit http://gerrit.cloudera.org:8080/6414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I612bd1cd0f0e87f7d8186e5bedd53a22f2d80832
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3079: Fix sequence file writer
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-03-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3079: Fix sequence file writer
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.cc
File be/src/exec/hdfs-sequence-table-writer.cc:

PS4, Line 179: 
 : 
> The old sequence file writer had multiple issues:
Thanks for listing them out. Please also put this list in the commit message.


http://gerrit.cloudera.org:8080/#/c/6107/5/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

Line 214: // Returns size of the encoded long value, including the 1 byte for 
length.
for val < -112 or val > 127.


PS5, Line 228: 
nit: long line


PS5, Line 245: -(num_bytes + 111);
nit: help to comment why it's 119 here (which is different from 120 in the 
comment above) as we already include the length of the extra 1 byte to 
differentiate between the case of -112 <= v <= 127 vs other values.


-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build

2017-03-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4669: [SECURITY] Add security library to build
..


Patch Set 5:

(1 comment)

Some of the same questions/comments apply from the util/ review

http://gerrit.cloudera.org:8080/#/c/5717/5/be/src/kudu/security/token_verifier.cc
File be/src/kudu/security/token_verifier.cc:

Line 160:   DCHECK(false);
seems reasonable to add this upstream


-- 
To view, visit http://gerrit.cloudera.org:8080/5717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-03-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 5:

(5 comments)

1) Brining in all the gflags is maybe concerning. will any conflict with ours? 
even if not, they will crowd our already large number of gflags. perhaps we 
need to look more closely at them.
2) I'm curious to know how this (and more particularly the actual code import 
change) affects the binary size.
3) It'd be good if we could push some of these changes upstream if possible.

Some additional comments inline.

http://gerrit.cloudera.org:8080/#/c/5715/5/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 315:   maintenance_manager_proto
It might be nice to keep kudu specific libs in a separate list and merge that 
list in.


http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/CMakeLists.txt
File be/src/kudu/util/CMakeLists.txt:

Line 208: if(NOT NO_NVM_SUPPORT)
I think the NO_NVM_SUPPORT flag would probably be useful to add to Kudu, and of 
course if its upstream it's less to maintain in our codebase in the future.


Line 331: target_link_libraries(protoc-gen-insertions gutil glog gflags 
protobuf protoc ${KUDU_BASE_LIBS})
can this be added upstream?


http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/env.cc
File be/src/kudu/util/env.cc:

Line 10: #include "kudu/util/status.h"
seems like they're missing this include, can this be added upstream?


http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/logging.cc
File be/src/kudu/util/logging.cc:

Line 47: DECLARE_string(log_filename);
what does this mean for our log files? will this code start writing to our log?


-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().

2017-03-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: Remove caching of llvm StructType by 
TupleDescriptor::GetLlvmStruct().
..


Patch Set 2: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Remove caching of llvm StructType by 
TupleDescriptor::GetLlvmStruct().
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/424/

-- 
To view, visit http://gerrit.cloudera.org:8080/6511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4701: make distcc work reliably with clang

2017-03-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4701: make distcc work reliably with clang
..


Patch Set 7:

Any more comments?

-- 
To view, visit http://gerrit.cloudera.org:8080/6493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I199b245fb14b6c3484b66339a7d4b37d74755af7
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
> in general, the commit message shouldn't act as documentation. for things l
That's fine by me. Feel free to simplify the description here and move it to 
the JIRA. I just wanted to understand the implemented semantics instead of 
trying to infer them from the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-03-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
..


Patch Set 4:

(2 comments)

Hmm, I guess the per-row overhead is probably not as significant for the case 
when we're returning a bunch of columns. There might be a more pronounced 
effect if we're just returning a handful of columns. 

I think we can still squeeze some more cycles out of this with minimal effort, 
but if we stop seeing a measurable improvement for a query that returns a 
smaller number of columns we could consider stopping then.

http://gerrit.cloudera.org:8080/#/c/6459/4/be/src/exec/union-node-ir.cc
File be/src/exec/union-node-ir.cc:

Line 28:   while (!dst_batch->AtCapacity() && child_row_idx_ < 
child_batch_->num_rows()) {
The remaining optimisation is to pull all references to member variables out of 
the loop. I.e. child_row_idx_, child_batch_, child_row_idx_, child_expr_lists_, 
tuple_desc_->byte_size(). This will reduce the number of loads and stores quite 
a bit. E.g.

int child_row_idx = child_row_idx_;
int tuple_byte_size = tuple_desc_->byte_size;
while (...) {

}
child_row_idx_ = child_row_idx;

Currently it will do a load and store to variables like 'child_row_idx_' via 
the 'this' pointer on every loop iteration.

The compiler could do that automatically if it could deduce that the values 
aren't modified via a different pointer, but I don't think it's deducible in 
this case because the compiler has  to generate code that's "correct" in a 
weird case like 'this' and *tuple_buf pointing to the same memory.


Line 34: if (ReachedLimit()) break;
We can avoid checking limits for each row if we check it at the end and 
truncate the batch using RowBatch::set_num_rows. E.g. see SortNode::GetNext().


-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
> Done
in general, the commit message shouldn't act as documentation. for things like 
that, we should use the jira, which is more accessible as a public record.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs

2017-03-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6473
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5127: Add history max option

2017-03-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5127: Add history_max option
..


Patch Set 2:

(1 comment)

Thanks, this will be very handy.

What do you think about also bumping up the default value? I suspect most 
people won't think to change the config but might benefit from a longer history.

We should also make sure to document this in 
docs/topics/impala_shell_options.xml.

http://gerrit.cloudera.org:8080/#/c/6335/2/shell/impala_shell.py
File shell/impala_shell.py:

Line 183: self.readline.set_history_length(int(options.history_max))
It would be nice to produce a human-friendly error message. I get the 
following, which hints at the problem but has a lot of noise:

tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh 
Starting Impala Shell without Kerberos authentication
Traceback (most recent call last):
  File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 
1382, in 
shell = ImpalaShell(options)
  File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 
183, in __init__
self.readline.set_history_length(int(options.history_max))
ValueError: invalid literal for int() with base 10: '100k'


-- 
To view, visit http://gerrit.cloudera.org:8080/6335
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5128: BE Test Infrastructure

2017-03-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5128: BE Test Infrastructure
..


Patch Set 2: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6477/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2093:   if (FLAGS_run_exhaustive)
nit: missing { }


http://gerrit.cloudera.org:8080/#/c/6477/2/be/src/testutil/gtest-util.h
File be/src/testutil/gtest-util.h:

PS2, Line 25: DECLARE_bool(run_exhaustive);
This can use a one line comment.


http://gerrit.cloudera.org:8080/#/c/6477/2/bin/run-backend-tests.sh
File bin/run-backend-tests.sh:

Line 30: # parse command line options
Not your change but it helps to add a comment that -E and -R are ctest options 
to exclude and include tests matching the given regular expression


-- 
To view, visit http://gerrit.cloudera.org:8080/6477
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71fe6de39dff21b21d322daf0232b0578bdff297
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Increase the robustness of pip download.py

2017-03-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Increase the robustness of pip_download.py
..


Patch Set 1:

Is there a JIRA for this?

I'm not opposed to this as a workaround but this seems like a stopgap and it 
would be nice to have a more robust solution.

It would be nice if we could remove the dependence on the main python.org. The 
information that we're getting in JSON format is available in a HTML format on 
all mirrors (https://www.python.org/dev/peps/pep-0503/). E.g. 
https://pypi.python.org/simple/kudu-python/

-- 
To view, visit http://gerrit.cloudera.org:8080/6518
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2d923e4b061a4197be69e3259aad881aca7b519
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

2017-03-30 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA 
reports
..


Patch Set 1:

(3 comments)

I checked impala_fixed_issues down to "Issues Fixed in Impala 2.1.1" and found 
a couple problems. Other reviewers should build the docs and check the links 
for the rest of that page.

http://gerrit.cloudera.org:8080/#/c/6515/1/docs/topics/impala_fixed_issues.xml
File docs/topics/impala_fixed_issues.xml:

PS1, Line 59: https://issues.apache.org/jira/issues/?jql=
This works, but why not make it jira_list_280?


PS1, Line 573: 
This didn't work for me. It seems as if there is no fixed_issues_232 anchor.


http://gerrit.cloudera.org:8080/#/c/6515/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

PS1, Line 96: https://issues.apache.org/jira/IMPALA-2093?filter=
Doesn't work.

What about this?

  
https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20priority%20in%20(blocker%2C%20critical)%20AND%20status%20in%20(open%2C%20Reopened)%20AND%20labels%20%3D%20correctness%20ORDER%20BY%20priority%20DESC

If Gerrit muddles this, let me know and I'll email it to you.


-- 
To view, visit http://gerrit.cloudera.org:8080/6515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Import kudu util library from kudu@a1bfd7b

2017-03-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Import kudu_util library from kudu@a1bfd7b
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5714
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d89384730b60354b5fae2b1472183d2a561d170
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 4:

(7 comments)

Thank you for the reviews. Please see my comments and PS5.

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 9: SORT BY (...) clauses to CREATE
   : TABLE and ALTER TABLE statements
> It would be helpful to have the formal specification of the new syntax here
Done


PS3, Line 13:  TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
: PARTITIONS 8 SORT BY(i) STORED AS KUDU;
: CREATE TABLE t SORT BY 
> Can you expand the comment a little bit on the semantics of this? A few hig
Done


http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 382: Optiona
> Optional?
Done


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS3, Line 1132: // This extra production is necessary since without it the 
parser will not be able to
  :   // parse "CREATE TABLE A LIKE B".
> I don't understand this comment, since the sort_col_cols here is optional.
At first I tried adding sort_by_cols to the production above, but then the 
parser would not pick it up for queries like "CREATE TABLE A LIKE B" and 
instead try to parse it only as a "CREATE TABLE LIKE FILE" query. I could not 
figure out why it does that, but copying the rule fixes it. I agree that it 
should work. I'll ask Alex for help.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 452: 
> I think this only makes sense in the context of testing? If so, note that h
Done


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS3, Line 539: s' property and the query has a 'noclu
> I think that you need a '&& !hasNoClusteredHint()' here.
Done, also added a comment to explain that the hint will be ignored if there 
are sort by columns.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" +
> Is 'sort.by.columns'='' handled?
Yes, added a test.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#5).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 856 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#4).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 856 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] Increase the robustness of pip download.py

2017-03-30 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6518

Change subject: Increase the robustness of pip_download.py
..

Increase the robustness of pip_download.py

There were some build failures due to a failure to download a JSON file
for some Python package. In this patch, we increase the number of
download attempts and randomly vary the amount of time between each
download attempt.

Change-Id: Ie2d923e4b061a4197be69e3259aad881aca7b519
---
M infra/python/deps/pip_download.py
1 file changed, 6 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6518/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6518
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie2d923e4b061a4197be69e3259aad881aca7b519
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

2017-03-30 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA 
reports
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(6 comments)

Just a heads up, this is going to have some conflicts with IMPALA-3742 
(partition and sort Kudu inserts) which I'll have a patch out for soon, but 
this will probably go in first so I guess that's my problem.

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 9: SORT BY (...) clauses to CREATE
   : TABLE and ALTER TABLE statements
It would be helpful to have the formal specification of the new syntax here, to 
show where the clause would go, e.g.:
ALTER TABLE name SORT BY (col_spec[, col_spec...])
etc.


http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 382: Options
Optional?


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS3, Line 1132: // This extra production is necessary since without it the 
parser will not be able to
  :   // parse "CREATE TABLE A LIKE B".
I don't understand this comment, since the sort_col_cols here is optional.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 452:   public void setNumClusteringCols(int n) { numClusteringCols_ = n; }
I think this only makes sense in the context of testing? If so, note that here.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS3, Line 539: !insertStmt.getSortByExprs().isEmpty()
I think that you need a '&& !hasNoClusteredHint()' here.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalysisError("alter table functional.alltypes set 
tblproperties(" +
Is 'sort.by.columns'='' handled?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4226, IMPALA-4227: Bump Breakpad version

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4226, IMPALA-4227: Bump Breakpad version
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4226, IMPALA-4227: Bump Breakpad version

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4226, IMPALA-4227: Bump Breakpad version
..


IMPALA-4226, IMPALA-4227: Bump Breakpad version

Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0
Reviewed-on: http://gerrit.cloudera.org:8080/6513
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Lars Volker: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

2017-03-30 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6515

Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA 
reports
..

IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

In addition to switching URLs for individual JIRAs
to point to issues.apache.org, also construct alternative
URLs for JIRA reports (usually 1 per release, summarizing
the issues fixed in that release).

There is some inconsistency in which releases have associated
JIRA reports. For releases where we never published a link to
a JIRA report, I added a  tag that could be used to
construct a report in future, but didn't fill in any URL.

Some releases do have a link to a JIRA report in the release
notes, however the report is empty. Possibly there was some
strangeness around the release process for those, or there
could be some missing info in the JIRA system. I just
transcribed the links in those cases and didn't try to
reconstruct the history to debug the empty reports.

Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_fixed_issues.xml
M docs/topics/impala_known_issues.xml
3 files changed, 212 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/6515/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs

2017-03-30 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6473/1/be/src/rpc/rpc.h
File be/src/rpc/rpc.h:

Line 169: int idx = -1;
> what's undefined about it?
Hm, looks like compilers won't warn in this case (I thought they'd be more 
conservative), but turns out:

  void t(int* p) { }
  int foo() {
int i;
t(&i);
if (i) { }
  }

doesn't give a warning even though i is uninitialized (whereas if you omit the 
call to t(), it does). 

Long story short, I removed the initialization.


Line 179: RETURN_IF_ERROR(DeserializeThriftMsg(sidecar.data(), &len, true, 
resp));
> how long is sidecar.data() valid?
As long as the controller_ object is valid, so as long as the Rpc is valid 
(which is usually not very long after this method returns).


-- 
To view, visit http://gerrit.cloudera.org:8080/6473
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs

2017-03-30 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#3).

Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs
..

IMPALA-4889: Use client sidecars for Thrift RPCs

This patch changes the way Thrift structures are serialized with
KRPC. Instead of serializing them to a byte stream, and then writing
that stream to a Protobuf object which is serialized again en route to
the wire, the Thrift objects are serialized to byte streams which are
then directly written to the wire as a 'sidecar'. This saves a copy and
serialization step both on the sender and receiver sides of the RPC.

Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95
---
M be/src/rpc/common.proto
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc.h
M be/src/rpc/thrift-util.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
7 files changed, 125 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/6473/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6473
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-03-30 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6512/4/docs/README.md
File docs/README.md:

Line 6: * Open a terminal window and run the following commands to get the 
Impala documentation source files from Git:
Could yo wrap long lines to help with the gerrit display to help ease reviewing?


-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

Lars, a few high level questions before I start the next review iteration. 
Thanks

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
Can you expand the comment a little bit on the semantics of this? A few high 
level questions are:
1. What is the connection between this and the sortby() hints? Can I use both? 
What if there is a conflict? Same for clustered hint.
2. Why does sort by enables clustering as well? 
3. Can I specify any table columns (even the partitioning columns) in the SORT 
BY clause or are there some restrictions? 
4. I suppose this works for all table types (including Kudu)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-30 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6261/11/shell/impala_shell.py
File shell/impala_shell.py:

Line 49: HISTORY_LENGTH = 1
> revert that
Whoops, this was supposed to be a local only change.


-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-30 Thread Zach Amsden (Code Review)
Hello Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6261

to look at the new patch set (#12).

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..

IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT

Create external table can be used for two different operations,
creating a new table by selecting data from another, and importing
an existing unmanaged external table.  Previously, for Kudu tables,
it was only possible to issue CTAS for managed tables, and external
tables were assumed to be imports of already existing data.  This
change allows the creation of new, unmanaged (external) kudu tables
from SELECT statements.

In the catalog executor, we detect if a table creation is new by
inferring that from the presence of column definitions, not by whether
it is external / managed.  Front-end code is changed to allow this
through the parser.  This means any attempts to specify table metadata
for Kudu tables are now interpreted as a table creation.

Testing: Manual testing, expanded ParserTest and AnalyzeDDLTest
and added test cases to query_test to validate Kudu support.

Ran the folowing query:

create external table test2_name primary key(id) partition by hash(id)
partitions 10 stored as kudu as select * from test_name;

Which successfully created and inserted 3 rows of data.  Tried many
other things like changing primary keys and forms of partitioning.

Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M tests/query_test/test_kudu.py
7 files changed, 91 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/6261/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6261/11/shell/impala_shell.py
File shell/impala_shell.py:

Line 49: HISTORY_LENGTH = 1
revert that


-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3203: Part 2: per-core free lists in buffer pool

2017-03-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool
..


Patch Set 15:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6414/14/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

Line 347: lock_guard pl(page->buffer_lock);
> 'client_lock' would guard against the page being moved to a different arena
Yup, realized that after writing this comment. oops.


http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

PS15, Line 59: .
should be more explicit that it might return less memory than 'target_bytes'. 
Maybe say "Free as much memory as possible up to 'target_bytes' from this area 
back to the system allocator.


PS15, Line 64: returns the total amount freed 
hmm, it's kinda confusing that the return value has this subtle difference in 
meaning depending on claim_memory. 

It seems the most intuitive return value would be "Returns the number of bytes 
of system memory claimed for the caller. Memory returned to the system in 
excess of that amount is added into 'system_bytes_remaining_'.

Can the claim_memory=false path accommodate something like that? If we can't 
reconcile it to have a single return value, then it seems we really have two 
different abstractions bunched into one call.


PS15, Line 70: FreeMemory
do you think it makes sense to call this FreeSystemMemory() to make it clearer 
that this is about pushing memory back to the system?


PS15, Line 254: ffinally
typo


Line 261:   if (delta == len) break;
since we can't get rid of the other DecreaseSystemBytesRemaining() probably 
better to move this back to where it was.


PS15, Line 263: lock_arenas
thought this would become 'final_attempt'


Line 304:   vector> arena_locks;
let's add a comment explaining all of this. Something like:

There are two strategies for scavenging buffers:
1) Fast, opportunistic: Each arena is searched in succession. Although 
reservations guarantee that the memory we need is available somewhere, this can 
fail because we can race with another thread that returned buffers to an arena 
that we've already searched and took the buffers from an arena we haven't yet 
searched.

2) Slow, guaranteed to succeed: In order to ensure that we can find the memory 
in a single pass, we hold on to the locks for arenas we've already examined. 
That way, if another thread can't take the memory that we need from an arena 
that we haven't yet examined (or from 'system_bytes_available_') because in 
order to do so, it would have had to return the equivalent amount of memory to 
an earlier arena or added it back into 'systems_bytes_reamining_'. The former 
can't happen since we're still holding those locks, and the latter is solved by 
trying DecreaseSystemBytesRemaining() at the end.


PS15, Line 317:   // It's possible that 'system_bytes_remaining_' was 
incremented since we last checked
  :   // it.
maybe this sentence would now be explained in the proposed comment above.


PS15, Line 319: another thread between when that thread released memory to the 
system and
  :   // incremented 'system_bytes_remaining_'.
I think it'd be good to explicitly say:
... since that happens while holding one of the arena locks.
or similar. though the exception is when AllocateInternal() we add it back in 
without holding a lock, which isn't a correctness problem since that memory 
couldn't be needed by another client, but makes the statement a little 
confusing.


PS15, Line 322: false
why false instead of true? I guess it doesn't matter since it should succeed to 
get all of it, but 'true' seems more intuitive to me.


http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.h
File be/src/runtime/bufferpool/buffer-allocator.h:

Line 47: /// required memory from the OS).
I think explaining a couple of more things would help:

1) The fact that BufferAllocator relies on the reservations (which are managed 
by the BufferPool & client) and the BufferPool's eviction policy, for 
correctness, to ensure that a client that has unused reservation can call 
Allocate() and the BufferAllocator() will always be able to find the memory to 
back the buffer somewhere.

2) Some implementation notes explaining that available memory is sort of kept 
in three forms: buffers in the free lists, buffers in the clean pages lists, 
and available memory that can be called up from the system. This information is 
kind of in the above paragraphs, but I think explaining it in that way will 
help understand the implementation. Probably paragraph two above can be moved 
into this section since the class abstraction itself doesn't expose arenas, 
right?


PS15, Line 91: void
does the caller care if it couldn't release 'bytes_to_free' bytes?


PS15, Line 133: 

[Impala-ASF-CR] IMPALA-4226, IMPALA-4227: Bump Breakpad version

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4226, IMPALA-4227: Bump Breakpad version
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/423/

-- 
To view, visit http://gerrit.cloudera.org:8080/6513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db
..


Patch Set 7:

Thanks for fixing this! Could you close the Jira?

-- 
To view, visit http://gerrit.cloudera.org:8080/6503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db

2017-03-30 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db
..


Patch Set 7:

While I did forget to remove the std:: on vector, but I think enough effort has 
gone into this.  Back to bug-hunting.

-- 
To view, visit http://gerrit.cloudera.org:8080/6503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db
..


IMPALA-5123: Fix ASAN use after free in timezone_db

The issue is that the string temporary returned by .string goes
out of scope immediately after being created.  Also, the API
to mkstemp is unclear on whether it modifies the string in place.
Just strdup() the c_str() to be safe - this is not performance
critical code.

Testing: ASAN build, running expr-test be test; ASAN fails before,
passes after this change.

Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d
Reviewed-on: http://gerrit.cloudera.org:8080/6503
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/timezone_db.cc
1 file changed, 15 insertions(+), 11 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Lars Volker: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zach Amsden 


[native-toolchain-CR] Ported native-toolchain to work on ppc64le

2017-03-30 Thread Valencia Edna Serrao (Code Review)
Valencia Edna Serrao has posted comments on this change.

Change subject: Ported native-toolchain to work on ppc64le
..


Patch Set 1:

(1 comment)

I'm ready with almost all required changes, except, for crcutil. Before I push 
them on gerrit, I need to know your views on patching mechanism usage issue 
w.r.t crcutil below.

http://gerrit.cloudera.org:8080/#/c/6468/1/source/crcutil/build.sh
File source/crcutil/build.sh:

Line 42:   wrap autoreconf -i;
> The patch is applied on Makefile.am which is generated only after autogen.s
Patching mechanism usage for 2nd patch file. Please let me know your views on 
this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Valencia Edna Serrao 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Valencia Edna Serrao 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db

2017-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db
..


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/422/

-- 
To view, visit http://gerrit.cloudera.org:8080/6503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No