[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#21).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s533ms1s535ms   28.80M  28.80M  532.28 MB  
  1.88 

[Impala-ASF-CR] Pass build type to Impala LZO.

2017-03-20 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6446

Change subject: Pass build type to Impala LZO.
..

Pass build type to Impala LZO.

Before, the build type used for Impala LZO was always debug.
Now, the build type is passed from the Impala CMakeLists.txt.

This patch needs corresponding changes to Impala LZO.

Testing: I tested locally with these build types:
DEBUG, RELEASE, and ADDRESS_SANITIZER.

Change-Id: Ia83e594409ad5938662ca210c810d5d31b8637b0
---
M CMakeLists.txt
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/6446/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia83e594409ad5938662ca210c810d5d31b8637b0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5088: Fix heap buffer overflow
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/401/

-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 263:   (!HasMorePassthrough() && !HasMoreMaterialized() && 
!HasMoreConst(state));
> Unfortunately that wouldn't work.
We could make it work but would require changing how row-batches work a little. 
I agree it's not worth it, so let's leave this alone.


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 129: return first_materialized_child_idx_ <= child_idx_ && child_idx_ 
< children_.size();
> Reordered it as you suggested. (By the way, in Python you can actually writ
The condition you have here doesn't tell "if there are still rows to be 
returned from children than need materialization".  Your condition tells you 
whether we are currently processing children that need passthrough.

The condition that makes sense for this function is the one I wrote earlier, 
read the comment to see why:

// We have children that need materialization and haven't processed them all 
yet.
first_materizlied_child_idx_ != children_.size() && child_idx_ < 
children_.size()


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Reviewed-on: http://gerrit.cloudera.org:8080/6436
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#20).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s533ms1s535ms   28.80M  28.80M  532.28 MB  
  1.88 

[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


IMPALA-5072: Fix test_recover_partitions on S3

On S3, directory creation is a nop, so we have to actually
create values to materialize the partitions.

Was able to merge this against my fork and successfully ran
the metadata/test_recover_partitions test on S3.  Also ran
locally to verify there is no slowdown for HDFS.

Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Reviewed-on: http://gerrit.cloudera.org:8080/6408
Reviewed-by: Michael Brown 
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M tests/metadata/test_recover_partitions.py
1 file changed, 7 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Brown: Looks good to me, approved
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5816

to look at the new patch set (#20).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
  

[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 19:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 111:   DCHECK_LT(child_idx_, children_.size());
> this DCHECK should come before any use of child_idx_ since if it's violated
Done


Line 114: 
DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
> what do these dchecks have to do with child_eos_? i.e. shouldn't they true 
Yes, They should be true. I think the idea was to call this once per child. I 
actually think it's safer to call this every time because currently it does not 
get called for the first child (which gets opened in Open()).

Changed it so that it gets called every time.


PS19, Line 132: child_idx_++
> ++child_idx_ per our style.
Done


PS19, Line 147: child_idx_ < children_.size()
> Would make more sense as HasMoreMaterialized()
Done. Even though this does an extra unnecessary check 
(first_materialized_child_idx_ <= child_idx_), this is cleaner.


PS19, Line 150: children
> children that need materialization
Done


PS19, Line 153: Row
> Child row batch
Done


Line 168:   // There are only 3 ways of getting out of this loop:
> it would be helpful to concisely say what this loop is responsible for give
Done


Line 193: COUNTER_SET(rows_returned_counter_, num_rows_returned_);
> should we do: child_batch_.reset() here? we shouldn't ever reference it aga
I see. So the invariant should be if child_batch references something (it's not 
reset), then it corresponds to an open materialized child.

Added reset() here. By the way, reset() will be called twice, once here and 
once in Close().


PS19, Line 200: // We end up here iff one of the following is true (or 
both).
  : // 1. We are done consuming all batches from the current 
child and we need to move on
  : //to the next child.
  : // 2. The output row batch is at capacity.
> this could be summarized for a quicker read:
Changed the structure along the lines of what you suggested, and cleaned this 
up.


PS19, Line 204:  In other words, the only way to not end up here if we entered 
the outer loop is if
  : // the limit is reached.
> rather than say that in a comment, how about:
Done


Line 263:   (!HasMorePassthrough() && !HasMoreMaterialized() && 
!HasMoreConst(state));
> shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret
Unfortunately that wouldn't work.

For passthrough, we simply forward as they are. Even if they are partially 
filled, we just forward them. That's kind of the point of this patch.

Maybe we could put materialized and const in a while loop, but that would make 
the code messier and the benefit would be minimal (up to 1 fewer partially 
filled row batch per union node per entire cluster).


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:

PS19, Line 115: child_idx
> 'child_idx'
Done


Line 129: return child_idx_ >= first_materialized_child_idx_ && child_idx_ 
< children_.size();
> this suggestion is okay to ignore, but i find these kind of conditions easi
Reordered it as you suggested. (By the way, in Python you can actually write "w 
< x <= y < z")

first_materialized_child_idx_ points to the first materialized child. So 
everything after it is materialized. that means if child_idx_ is greater or 
equal to materialized_child_idx_ then it's materialized.

It's the exact opposite of passthrough (where we check if child_idx_ is less 
than).


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:

Line 94: DCHECK(false);
> this is confusing. either it's an invariant or it's not. it's even more con
Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


IMPALA-3203: Part 1: Free list implementation

We will have a single free list per size class. Free buffers
are stored in a heap, ordered by the memory address of the
buffer, to help reduce address space fragmentation.

A follow-up patch will use the free lists in BufferPool.

Currently TCMalloc has thread-local caches with somewhat similar
purpose. However, these are not suitable for our purposes for
several reasons:
* They are designed for caching small allocations - large allocations
  like most buffers are served from a global page heap protected
  by a global lock.
* We intend to move away from using TCMalloc for buffers: IMPALA-5073
* Thread caches are ineffective for the producer-consumer pattern where
  one thread allocates memory and another thread frees it.
* TCMalloc gives us limited control over how and when memory is
  actually released to the OS.

Testing:
Added unit tests for sanity checks and verification of behaviour
that is trickier to check in integration or system tests.
The cost will be exercised more thoroughly via BufferPool
in Part 2.

Performance:
Includes a benchmark that demonstrates the scalability of
the free lists under concurrency. When measuring pure throughput
of free list operations, having a free list per core is
significantly faster than a shared free list, or allocating
directly from TCMalloc. On 8 cores, if the memory allocated is
actually touched then for 64KB+ buffers, memory access is the
bottleneck rather than lock contention.

The benchmark also showed that non-inlined constructors and move
operators of BufferHandle were taking significant CPU cycles, so
I inlined those.

This suggests that having a free list per core is more than sufficient
(however, we will need to validate this with system concurrency
benchmarks once we switch to using this during query execution).

Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Reviewed-on: http://gerrit.cloudera.org:8080/6410
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/free-lists-benchmark.cc
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/runtime/bufferpool/free-list-test.cc
A be/src/runtime/bufferpool/free-list.h
M be/src/util/benchmark-test.cc
M be/src/util/benchmark.cc
M be/src/util/benchmark.h
12 files changed, 813 insertions(+), 64 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5816

to look at the new patch set (#20).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
  

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 263:   (!HasMorePassthrough() && !HasMoreMaterialized() && 
!HasMoreConst(state));
> shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret
I guess we can't really make this work easily when there are multiple 
passthrough children, so let's not worry about doing this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..


Patch Set 3: Code-Review+1

Builds cleanly and renders well in both html & pdf.

-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Genericize references to ODBC driver versions

2017-03-20 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: [DOCS] Genericize references to ODBC driver versions
..


Patch Set 1: Code-Review+1

Builds cleanly and changes render well in both HTML & PDF. I agree with Jim 
that it would read well to remove the phrase, "versions of" in both places 
identified by him.

-- 
To view, visit http://gerrit.cloudera.org:8080/6440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5088: Fix heap buffer overflow
..


Patch Set 2:

As if to prove the point about this not just being a hypothetical bug, I just 
hit this, in unison, on five nodes at once in a twenty node cluster :) Thanks 
for fixing it.

-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5088: Fix heap buffer overflow
..


Patch Set 2: Code-Review+2

Thanks for the review, Tim. Updated the commit message to mention the commit 
that introduced the bug (for backporting), forwarding the +2. I'll start the 
GVO after my private ASAN build passes.

-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-5088: Fix heap buffer overflow
..

IMPALA-5088: Fix heap buffer overflow

In a recent patch (IMPALA-4787), we introduced a change where we are
reading past the end of a buffer during a copy. Even though we would
never read the data that was copied from past the end of the buffer,
this could cause a crash if the allocation is sitting on the edge of an
unmapped area of memory. This also caused the ASAN build to fail.

The issue is fixed by never accessing memory that is past the end of the
buffer.

Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6441/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6441

to look at the new patch set (#2).

Change subject: IMPALA-5088: Fix heap buffer overflow
..

IMPALA-5088: Fix heap buffer overflow

In a recent patch (IMPALA-4787), we introduced a change where we are
reading past the end of a buffer during a copy. Even though we would
never read the data that was copied from past the end of the buffer,
this could cause a crash if the allocation is sitting on the edge of an
unmapped area of memory. This also caused the ASAN build to fail.

The issue is fixed by never accessing memory that is past the end of the
buffer.

Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6441/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 19:

(14 comments)

nice cleanup

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 111:   DCHECK_LT(child_idx_, children_.size());
this DCHECK should come before any use of child_idx_ since if it's violated, it 
doesn't make sense to call e.g. IsChildPassthrough() with chidl_idx_.


Line 114: 
DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
what do these dchecks have to do with child_eos_? i.e. shouldn't they true 
regardless of the value of child_eos_?


PS19, Line 132: child_idx_++
++child_idx_ per our style.


PS19, Line 147: child_idx_ < children_.size()
Would make more sense as HasMoreMaterialized()


PS19, Line 150: children
children that need materialization


PS19, Line 153: Row
Child row batch


Line 168:   // There are only 3 ways of getting out of this loop:
it would be helpful to concisely say what this loop is responsible for given 
it's complexity.  Something like:
This loop fetches row batches from a single child and materializes each output 
row, until one of these conditions:
1) ...


Line 193: COUNTER_SET(rows_returned_counter_, num_rows_returned_);
should we do: child_batch_.reset() here? we shouldn't ever reference it again, 
but seems cleaner to keep the invariant that child_batch_ always corresponds to 
the open child (otherwise its memory references aren't valid).


PS19, Line 200: // We end up here iff one of the following is true (or 
both).
  : // 1. We are done consuming all batches from the current 
child and we need to move on
  : //to the next child.
  : // 2. The output row batch is at capacity.
this could be summarized for a quicker read:

// Either we've finished the current child or the output batch is at capacity, 
or both.


PS19, Line 204:  In other words, the only way to not end up here if we entered 
the outer loop is if
  : // the limit is reached.
rather than say that in a comment, how about:

DCHECK(!ReachedLimit());


Line 263:   (!HasMorePassthrough() && !HasMoreMaterialized() && 
!HasMoreConst(state));
shouldn't we put lines 254-263 inside a do-while loop, so that we don't return 
partially filled row-batches when we still have output to produce?  Returning 
partially filled row-batches is legal but can be confusing to clients 
(admittedly it can happen for other reasons currently, but it'd be nice to 
avoid doing that unless there's a good reason).

do {
...
while (!*eos && !row_batch->AtCapacity());


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:

PS19, Line 115: child_idx
'child_idx'


Line 129: return child_idx_ >= first_materialized_child_idx_ && child_idx_ 
< children_.size();
this suggestion is okay to ignore, but i find these kind of conditions easier 
to read as:

x <= y && y < z

so that it looks more close to how you'd read it in math: x <= y < z.

That said, I don't see how the condition:

first_materialized_child_idx_ <= child_idx_ < children_.size() 

is correct for whether there are more materialized children. i.e. when 
child_idx_ < first_materialized_child_idx, we still have more materalized 
children, right?

So, shouldn't this be:

// We have children that need materialization and haven't processed them all 
yet.
first_materizlied_child_idx_ != children_.size() && child_idx_ < 
children_.size()


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:

Line 94: DCHECK(false);
this is confusing. either it's an invariant or it's not. it's even more 
confusing because now it makes this case and the one at line 79 different for 
no good reason, and future code readers won't know why.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5088: Fix heap buffer overflow
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6441

Change subject: IMPALA-5088: Fix heap buffer overflow
..

IMPALA-5088: Fix heap buffer overflow

In a recent patch, we introduced a change where we are incorrectly
copying a buffer. We are reading past the end of a buffer during the
copy. Even though we would never read the data that was copied from
past the end of the buffer, this could cause a crash if the allocation
is sitting on the edge of an unmapped area of memory. This also caused
the ASAN build to fail.

The issue is fixed by never accessing memory that is past the end of
the buffer

Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6441/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

2017-03-20 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
..

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.

Note: currently broken due to optimizing away column refs for HBase
nodes when FALSE conjuncts are present.  Working on a fix.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  More tests being added (for HBase and Kudu)

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/visa
13 files changed, 219 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] [DOCS] Genericize references to ODBC driver versions

2017-03-20 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Genericize references to ODBC driver versions
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6440/1/docs/topics/impala_ports.xml
File docs/topics/impala_ports.xml:

PS1, Line 94: versions of 
elide here and below


-- 
To view, visit http://gerrit.cloudera.org:8080/6440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


Patch Set 2: Code-Review+2

Seems reasonable so long as it doesn't conflict with other in-flight version 
bumps.

-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-4678: port backend exec to use buffer pool

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: PREVIEW: IMPALA-4678: port backend exec to use buffer pool
..

PREVIEW: IMPALA-4678: port backend exec to use buffer pool

Always create global BufferPool at startup using 80% of memory and
limit reservations to  80% of query memory (same as BufferedBlockMgr).

Each ExecNode has to declare its memory requirements at Prepare() time.

Convert HashTable to use the new BufferPool via a Suballocator.

Make PAGG memory consumption more efficient (avoid wasting buffers):
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.

Convert Sorter to use BufferPool.

TODO in this patch:
 * some of the DCHECKS may be too aggressive. With the current memory
   transfer model, operators that accumulate batches, i.e. NLJ, can
   "steal" reservation. We need a test to reproduce this problem. We
   can probably fix by having NLJ copy if it sees an attached buffer.
 * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size
 * We hit a delete_on_read_ DCHECK in PHJ. This is currently too strict.

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Remove the old hash join and aggregation nodes

Testing:
* Updated tests to reflect new memory requirements
* TODO: recalibrate limits in test_mem_usage_scaling
* TODO: more tests to exercise new code paths

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/buffered-block-mgr-test.cc
D be/src/runtime/buffered-block-mgr.cc
D be/src/runtime/buffered-block-mgr.h
D be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
D be/src/runtime/buffered-tuple-stream.cc
D be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.h
M be/src/util/memory-metrics.h
M be/src/util/static-asserts.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
D tests/custom_cluster/test_spilling.py
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_sort.py
A tests/query_test/test_spilling.py
61 files changed, 1,669 insertions(+), 7,647 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/5801/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] PREVIEW: IMPALA-4678: port backend exec to use buffer pool

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: PREVIEW: IMPALA-4678: port backend exec to use buffer pool
..

PREVIEW: IMPALA-4678: port backend exec to use buffer pool

Always create global BufferPool at startup using 80% of memory and
limit reservations to  80% of query memory (same as BufferedBlockMgr).

Each ExecNode has to declare its memory requirements at Prepare() time.

Convert HashTable to use the new BufferPool via a Suballocator.

Make PAGG memory consumption more efficient (avoid wasting buffers):
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.

Convert Sorter to use BufferPool.

TODO in this patch:
 * some of the DCHECKS may be too aggressive. With the current memory
   transfer model, operators that accumulate batches, i.e. NLJ, can
   "steal" reservation. We need a test to reproduce this problem. We
   can probably fix by having NLJ copy if it sees an attached buffer.
 * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size
 * We hit a delete_on_read_ DCHECK in PHJ. This is currently too strict.

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Remove the old hash join and aggregation nodes

Testing:
* Updated tests to reflect new memory requirements
* TODO: recalibrate limits in test_mem_usage_scaling
* TODO: more tests to exercise new code paths

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/buffered-block-mgr-test.cc
D be/src/runtime/buffered-block-mgr.cc
D be/src/runtime/buffered-block-mgr.h
D be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
D be/src/runtime/buffered-tuple-stream.cc
D be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.h
M be/src/util/memory-metrics.h
M be/src/util/static-asserts.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
D tests/custom_cluster/test_spilling.py
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_sort.py
A tests/query_test/test_spilling.py
61 files changed, 1,669 insertions(+), 7,722 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/5801/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

2017-03-20 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Generic constant propagation in planner
..


Patch Set 3:

(12 comments)

I'll post a WIP update to this diff as there was some refactoring that turned 
out to be necessary, and some new revelations about HBase key filtering have 
come up.

http://gerrit.cloudera.org:8080/#/c/6389/5//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5003: [DRAFT] Generic constant propagation in planner
> Instead of "Generic constant propagation in planner" I'd propose something 
Done


Line 9: Rather than specialize the constant propagation framework to be specific
> Maybe rephrase to what the new code does and where it is applied, e.g.,
Done


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 115
> Looks wrong. We cannot just drop the 'false' predicate. I think we should e
No, you missed the ! - we certainly can't drop the false predicate, but we do 
want to drop constant true predicates.  I can probably (e.isConstant() && 
Expr.IS_TRUE_LITERAL) to make this more clear - this skips dropping constants 
that don't evaluate to bools, but there shouldn't be any such conjuncts anyway.


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217: ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
> Did it not come out cleaner?
It did, but required some refactoring.


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217: ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
> Have you tried running a few queries that end up with a constant 'false' pr
In practice this is more complicated.  In many places we've already created a 
ScanNode of some sort and end up adding impossible false filters on to it.  For 
HBase key filtering, this is actually made worse by optimizing (key = 'a' && 
false) => (false), and then the key = 'a' conjunct is not picked up by key 
range optimization and we end up with an unbounded key range scan across all 
data on all nodes :(

I'm trying to rework so that this converts to an EmptySetNode, but this is 
actually more difficult than it should be.  Ideally we would pull the conjunct 
optimization to figure out impossible scans first and have a generic way to 
create an EmptySetNode similar across all scan types, but some more refactoring 
would be needed to do that cleanly (turns out this already works for inline 
views).


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   public enum RuleSet {
> This enum seems to add an unnecessary indirection. It's the same as "enable
Done


http://gerrit.cloudera.org:8080/#/c/6389/5/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 16
> Add a similar test for Kudu in one of the Kudu .test files.
Will do.

Also, this word 'propagation' is my nemesis of all words in the English 
language.


Line 18
> flip the lhs/rhs of some predicates to test the expr normalization within t
Done


Line 27
> whitespace
I noticed this too after the diff was sent


Line 43
> What's the impossibility?
copypasta mistake


Line 56
> Can we move this into an HBase .test file? It's sometimes better to keep th
Done


Line 67
> Add negative test cases with arbitrary exprs, e.g. slot refs with explicit 
I think we also want to test with analytic expressions and functions returning 
bool as conjuncts


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/400/

-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
..


IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

The test does not work as intended and would likely not provide
very good coverage of the different spilling paths, because it
only runs a single simple query. The stress test
(tests/stress/concurrent_select.py) provides much better coverage.
test_mem_usage_scaling.py probably also provides better coverage
that TestSpillStress in its current form.

Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Reviewed-on: http://gerrit.cloudera.org:8080/6437
Reviewed-by: Michael Brown 
Tested-by: Impala Public Jenkins
---
M tests/custom_cluster/test_spilling.py
1 file changed, 0 insertions(+), 75 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Brown: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Henry Robinson (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6436

to look at the new patch set (#2).

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..

IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
---
M bin/impala-config.sh
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/6436/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


IMPALA-4846: Upgrade Snappy to 1.1.4

Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Reviewed-on: http://gerrit.cloudera.org:8080/6428
Reviewed-by: Tim Armstrong 
Reviewed-by: Henry Robinson 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Henry Robinson: Looks good to me, approved
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Bump Kudu client version to 16dd6e4

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Bump Kudu client version to 16dd6e4
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9a8074640a6b93b5a65a1e0e2f22c7fe78754ad1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/399/

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Sorry for noise. Somehow two Gerrit jobs started.

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/398/

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/397/

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/397/

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/398/

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Yes, Michael, if you could start a build, that should unbreak S3 tests. :)

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] Allow BlockingQueue and ThreadPool to accept rvalue args

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6442

Change subject: Allow BlockingQueue and ThreadPool to accept rvalue args
..

Allow BlockingQueue and ThreadPool to accept rvalue args

Previously the BlockingQueue and ThreadPool APIs only accepted const
lvalue references, so the argument was always copied into the
queue. Very often we create a thin wrapper for each work item we submit
to a thread pool, and will not want to use that object again, so moving
it into the pool rather than copying makes the most sense.

Note the introduction of an extra template parameter into Offer() and
BlockingPut*(). To enable perfect-forwarding (i.e. allow the methods to
accept rvalue or lvalues and pass them through), we need to use a)
rvalue references (V&&) and b) do so in a 'type-deducing
context' [1]. Having the enclosing class be template-parameterized does not
count as type-deducing, so we add the dummy V parameter and the compiler
will ensure that V is properly compatible with the original T type.

[1] 
http://eli.thegreenplace.net/2014/perfect-forwarding-and-universal-references-in-c/

Change-Id: I1791870576cb269e86495034f92555de48f92f10
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/util/blocking-queue.h
M be/src/util/thread-pool.h
3 files changed, 15 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6442/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1791870576cb269e86495034f92555de48f92f10
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 10: Code-Review+2

Missed updating benchmark-test to reflect the change in optional arguments

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-20 Thread Zach Amsden (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6261

to look at the new patch set (#8).

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..

IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT

Create external table can be used for two different operations,
creating a new managed table, and importing an existing unmanaged
external table.  Previously, it was only possible to create managed
external tables with CTAS for Kudu.  This change makes it possible
to create new, unmanaged external kudu tables from SELECT statements.
This is useful for importing data from one table to another persistent
table in Kudu.

In the catalog executor, we detect if a table creation is new by
inferring that from the presence of column definitions, not by whether
it is external / managed.  Front-end code is changed to allow this
through the parser.  This means any attempts to specify table metadata
for Kudu tables are interpreted as a table creation.

Testing: Manual testing, expanded ParserTest and AnalyzeDDLTest
and added test cases to query_test to validate Kudu support.

Ran the folowing query:

create external table test2_name primary key(id) partition by hash(id)
partitions 10 stored as kudu as select * from test_name;

Which successfully created and inserted 3 rows of data.  Tried many
other things like changing primary keys and forms of partitioning.

Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M shell/impala_shell.py
M tests/query_test/test_kudu.py
8 files changed, 92 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/6261/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/396/

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins, Jim Apple, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6410

to look at the new patch set (#10).

Change subject: IMPALA-3203: Part 1: Free list implementation
..

IMPALA-3203: Part 1: Free list implementation

We will have a single free list per size class. Free buffers
are stored in a heap, ordered by the memory address of the
buffer, to help reduce address space fragmentation.

A follow-up patch will use the free lists in BufferPool.

Currently TCMalloc has thread-local caches with somewhat similar
purpose. However, these are not suitable for our purposes for
several reasons:
* They are designed for caching small allocations - large allocations
  like most buffers are served from a global page heap protected
  by a global lock.
* We intend to move away from using TCMalloc for buffers: IMPALA-5073
* Thread caches are ineffective for the producer-consumer pattern where
  one thread allocates memory and another thread frees it.
* TCMalloc gives us limited control over how and when memory is
  actually released to the OS.

Testing:
Added unit tests for sanity checks and verification of behaviour
that is trickier to check in integration or system tests.
The cost will be exercised more thoroughly via BufferPool
in Part 2.

Performance:
Includes a benchmark that demonstrates the scalability of
the free lists under concurrency. When measuring pure throughput
of free list operations, having a free list per core is
significantly faster than a shared free list, or allocating
directly from TCMalloc. On 8 cores, if the memory allocated is
actually touched then for 64KB+ buffers, memory access is the
bottleneck rather than lock contention.

The benchmark also showed that non-inlined constructors and move
operators of BufferHandle were taking significant CPU cycles, so
I inlined those.

This suggests that having a free list per core is more than sufficient
(however, we will need to validate this with system concurrency
benchmarks once we switch to using this during query execution).

Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/free-lists-benchmark.cc
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/runtime/bufferpool/free-list-test.cc
A be/src/runtime/bufferpool/free-list.h
M be/src/util/benchmark-test.cc
M be/src/util/benchmark.cc
M be/src/util/benchmark.h
12 files changed, 813 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6410/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


Patch Set 1:

All the Kudu tests failed, not clear why one of the table servers failed to 
start. Unfortunately, looks hard to get on that machine to get at the tablet 
logs. 

As expected, one sub-build failed because Snappy 1.1.3 is no longer available 
in the toolchain. Will wait for that build to finish before retrying this.

-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


Patch Set 1: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/392/

-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-20 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6261/7//COMMIT_MSG
Commit Message:

Line 10: new by inferring that from the presence of column definitions, not
> what does 'if a table creation is new' mean?
New rather that existing table.  Terminology here is confusing, as managed != 
external, yet in many places of code and documentation, that is implied.  Will 
rephrase.


http://gerrit.cloudera.org:8080/#/c/6261/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 2264: "tblproperties('kudu.table_name'='t')", "Table partitioning 
must be " +
> why is that? wouldn't the kudu table/metadata be able to provide that?
Specifying any part of the metadata (x int) now implies a table creation, as we 
use the presence of column definitions to infer that this is a table creation.  
Since the assumption is that the table is being created, we need to know the 
partitioning scheme to create it.  I tried to make the error message suggest 
this.  I'll add a positive test case to illustrate the required syntax.


http://gerrit.cloudera.org:8080/#/c/6261/6/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS6, Line 360: kudu_tbl_name = 
KuduTestSuite.to_kudu_table_name(unique_database, "foo2")
 : assert kudu_client.table_exists(kudu_tbl_name)
 : cursor.execute("SELECT * FROM %s.foo2" % (unique_database))
 : assert len(cursor.fetchall()) == 1
 : cursor.execute("DROP TABLE %s.foo2" % (unique_database))
 : assert kudu_client.table_exists(kudu_tbl_name)
> foo2 in kudu won't be deleted when you drop the database in impala though, 
Good point.


-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Genericize references to ODBC driver versions

2017-03-20 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6440

Change subject: [DOCS] Genericize references to ODBC driver versions
..

[DOCS] Genericize references to ODBC driver versions

Make the statement about who uses which ports for
Impala less specific w.r.t. the Cloudera ODBC driver.

Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71
---
M docs/topics/impala_ports.xml
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6440/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 9: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/395/

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Zach, let me know if you would like a build started for this patch, and I can 
do it.

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] Rewrite github URL for timezone source file
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: [DOCS] Rewrite github URL for timezone source file
..


[DOCS] Rewrite github URL for timezone source file

Use the corresponding Apache github URL instead of
the Cloudera one.

Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Reviewed-on: http://gerrit.cloudera.org:8080/6438
Reviewed-by: Jim Apple 
Reviewed-by: Laurel Hale 
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_fixed_issues.xml
2 files changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Laurel Hale: Looks good to me, but someone else must approve
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3:

Yes, updated diff attached.

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Zach Amsden (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6408

to look at the new patch set (#3).

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..

IMPALA-5072: Fix test_recover_partitions on S3

On S3, directory creation is a nop, so we have to actually
create values to materialize the partitions.

Was able to merge this against my fork and successfully ran
the metadata/test_recover_partitions test on S3.  Also ran
locally to verify there is no slowdown for HDFS.

Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
---
M tests/metadata/test_recover_partitions.py
1 file changed, 7 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6408/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] Rewrite github URL for timezone source file
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/103/

-- 
To view, visit http://gerrit.cloudera.org:8080/6438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 19: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-20 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 228:   if (FLAGS_use_statestore && statestore_port > 0) {
> This is a good idea - but please leave it for now, as one of the KRPC commi
let's please not file a jira for something this small, the time involved in 
dealing with the jira will exceed the time spent on the code. please leave a 
todo so that henry can do this as part of the krpc changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 228:   if (FLAGS_use_statestore && statestore_port > 0) {
> please move all of the duplicated code, incl. initializer list, into a shar
This is a good idea - but please leave it for now, as one of the KRPC commits 
would conflict significantly with this. Maybe file a JIRA to clean this up 
(pretty sure we can remove the second c'tor).


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-20 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 183:   request_pool_service_.get(), metrics_.get(), 
backend_address_));
no need to pass in backend_address_, you can get that with 
ExecEnv::GetInstance()->backend_address().

same for scheduler c'tor.


Line 228:   if (FLAGS_use_statestore && statestore_port > 0) {
please move all of the duplicated code, incl. initializer list, into a shared 
c'tor or something like that, there's too much copy & paste at this point.


Line 349:   if (admission_controller_ != NULL) 
RETURN_IF_ERROR(admission_controller_->Init());
replace all NULL w/ nullptr in this file (and any other file you touch where 
that hasn't happened yet)


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

Line 38: DECLARE_bool(is_coordinator);
unused


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

Line 210:   /// Subscription manager
owned?


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 341:   bind(mem_fn(&ImpalaServer::CatalogUpdateCallback), 
this, _1, _2);
use lambda, unless impractical. you can also inline that in the call.


Line 1484: // Check if the local backend descriptor is in the list of known
this function is already too long. break this up.

where did this block come from? was it moved?


Line 1903:   // Only for coordinators, initialize the HS2 and Beeswax services.
rephrase to fix grammar


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 955: /// Impala server is a coordinator (dictated from the is_coordinator 
flag).
"indicated by the .. flag"


http://gerrit.cloudera.org:8080/#/c/6344/2/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 42:   client = worker.service.create_beeswax_client()
same for hs2?


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6439/2/docs/topics/impala_txtfile.xml
File docs/topics/impala_txtfile.xml:

PS2, Line 562: 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
IN particular this section and one at the top seem likely to be universal, no 
matter which LZO library is used


-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..


Patch Set 2:

I thought the plan was to leave in as much as possible about LZO?

-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..


Patch Set 3:

Patch set 2 = fix a stray 'cdh' inside the same impala_txtfile page.
Patch set 3 = fix a reference to the old package name including -cdh4 that I 
had genericized elsewhere.

-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#3).

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..

[DOCS] Replace LZO setup instructions with placeholder

An "under construction" banner since non-Cloudera-specific
URLs are not available at this time, and the information
itself may be out of date / inapplicable for Apache Impala
project users.

Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
---
M docs/topics/impala_incompatible_changes.xml
M docs/topics/impala_txtfile.xml
2 files changed, 4 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/6439/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..

[DOCS] Replace LZO setup instructions with placeholder

An "under construction" banner since non-Cloudera-specific
URLs are not available at this time, and the information
itself may be out of date / inapplicable for Apache Impala
project users.

Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
---
M docs/topics/impala_txtfile.xml
1 file changed, 3 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/6439/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6439/1/docs/topics/impala_txtfile.xml
File docs/topics/impala_txtfile.xml:

PS1, Line 538: cdh4-0.4.15
Replace this 'cdh' portion of the filename with a generic placeholder.


-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder

2017-03-20 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6439

Change subject: [DOCS] Replace LZO setup instructions with placeholder
..

[DOCS] Replace LZO setup instructions with placeholder

An "under construction" banner since non-Cloudera-specific
URLs are not available at this time, and the information
itself may be out of date / inapplicable for Apache Impala
project users.

Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
---
M docs/topics/impala_txtfile.xml
1 file changed, 1 insertion(+), 118 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/6439/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file

2017-03-20 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: [DOCS] Rewrite github URL for timezone source file
..


Patch Set 1: Code-Review+1

Builds cleanly, displays perfectly in both html & pdf.

-- 
To view, visit http://gerrit.cloudera.org:8080/6438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#19).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s533ms1s535ms   28.80M  28.80M  532.28 MB  
  1.88 

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5816

to look at the new patch set (#19).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
  

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 18:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5816/18/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 61:   /// Index of the first non-passthrough child; i.e. a child that 
needs materializing and
> i.e. a child that needs materialization (remove the "and evaluating its exp
Done


Line 62:   /// evaluating its exprs. When all children are materialized, this 
should be zero. When
> Shrink like this:
Done


http://gerrit.cloudera.org:8080/#/c/5816/18/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 180:* children come before the children that need to be materialized 
and evaluated. Also
> remove "and evaluated"
Done


Line 181:* reorders 'resultExprLists_'. This is done in order to simplify 
the implementation in
> Instead of 'This' be explicit since the reference could be misunderstood. S
Done


http://gerrit.cloudera.org:8080/#/c/5816/18/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

Line 80: query_string = ("select count(c) from ( "
> Why not add this at the end of union.test? Seems odd to have a single naked
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5031: Suppress noisy UBSAN errors from Thrift.

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5031: Suppress noisy UBSAN errors from Thrift.
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I74ece2157048e8cd24c2536c0a292d9c21f719b9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5031: Suppress noisy UBSAN errors from Thrift.

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5031: Suppress noisy UBSAN errors from Thrift.
..


IMPALA-5031: Suppress noisy UBSAN errors from Thrift.

This suppresses a Thrift undefined behavior error in which a negative
value is left-shifted. See THRIFT-2026 for tracking.

One example backtrace from the expr-test backend test:

/thrift/protocol/TCompactProtocol.tcc:375:13: runtime error: left shift of 
negative value -1
#0 0x2b02fe247996 in 
apache::thrift::protocol::TCompactProtocolT::i64ToZigzag(long)
 /thrift/protocol/TCompactProtocol.tcc:375:13
#1 0x2b02fe247674 in 
apache::thrift::protocol::TCompactProtocolT::writeI64(long)
 /thrift/protocol/TCompactProtocol.tcc:242:24
#2 0x2b02fe239504 in 
apache::thrift::protocol::TVirtualProtocol,
 apache::thrift::protocol::TProtocolDefaults>::writeI64_virt(long) 
/thrift/protocol/TVirtualProtocol.h:409:12
#3 0x2b03014a3a06 in apache::thrift::protocol::TProtocol::writeI64(long) 
/thrift/protocol/TProtocol.h:453:12
#4 0x2b0301d55c02 in 
impala::TRuntimeProfileNode::write(apache::thrift::protocol::TProtocol*) const 
/generated-sources/gen-cpp/RuntimeProfile_types.cpp:827:11
#5 0x2b0301d59e34 in 
impala::TRuntimeProfileTree::write(apache::thrift::protocol::TProtocol*) const 
/generated-sources/gen-cpp/RuntimeProfile_types.cpp:1017:15
#6 0x2b02f6f8c7be in impala::Status 
impala::ThriftSerializer::Serialize(impala::TRuntimeProfileTree*,
 unsigned int*, unsigned char**) /src/rpc/thrift-util.h:67:7
#7 0x2b02f6f0d23e in impala::Status 
impala::ThriftSerializer::Serialize(impala::TRuntimeProfileTree*,
 std::vector >*) 
/src/rpc/thrift-util.h:54:31
#8 0x2b02f6eec934 in 
impala::RuntimeProfile::SerializeToArchiveString(std::basic_stringstream, std::allocator >*) const 
/src/util/runtime-profile.cc:727:19
#9 0x2b02f6eec640 in impala::RuntimeProfile::SerializeToArchiveString() 
const /src/util/runtime-profile.cc:718:3
#10 0x2b02feda626c in 
impala::ImpalaServer::ArchiveQuery(impala::ImpalaServer::QueryExecState const&) 
/src/service/impala-server.cc:671:39
#11 0x2b02fedb57b0 in 
impala::ImpalaServer::UnregisterQuery(impala::TUniqueId const&, bool, 
impala::Status const*) /src/service/impala-server.cc:972:3
#12 0x2b02ff15b666 in impala::ImpalaServer::close(beeswax::QueryHandle 
const&) /src/service/impala-beeswax-server.cc:236:29
#13 0x2b030177dc14 in beeswax::BeeswaxServiceProcessor::process_close(int, 
apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, 
void*) /generated-sources/gen-cpp/BeeswaxService.cpp:3543:5
#14 0x2b0301763825 in 
beeswax::BeeswaxServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,
 apache::thrift::protocol::TProtocol*, std::string const&, int, void*) 
/generated-sources/gen-cpp/BeeswaxService.cpp:2952:3
#15 0x2b03016bc2d6 in 
impala::ImpalaServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,
 apache::thrift::protocol::TProtocol*, std::string const&, int, void*) 
/generated-sources/gen-cpp/ImpalaService.cpp:1673:12
#16 0x2b02f650138e in 
apache::thrift::TDispatchProcessor::process(boost::shared_ptr,
 boost::shared_ptr, void*) 
/thrift/TDispatchProcessor.h:121:12
#17 0x1d4080a in apache::thrift::server::TThreadPoolServer::Task::run() 
(/build/debug/exprs/expr-test+0x1d4080a)
#18 0x1d23e38 in apache::thrift::concurrency::ThreadManager::Worker::run() 
(/build/debug/exprs/expr-test+0x1d23e38)
#19 0x2b02fe2be520 in 
impala::ThriftThread::RunRunnable(boost::shared_ptr,
 impala::Promise*) /src/rpc/thrift-thread.cc:64:3
#20 0x2b02fe2c4c6b in boost::_mfi::mf2, 
impala::Promise*>::operator()(impala::ThriftThread*, 
boost::shared_ptr, 
impala::Promise*) const /boost/bind/mem_fn_template.hpp:280:16
#21 0x2b02fe2c498a in void 
boost::_bi::list3, 
boost::_bi::value >, 
boost::_bi::value*> 
>::operator(), 
impala::Promise*>, boost::_bi::list0>(boost::_bi::type, 
boost::_mfi::mf2, 
impala::Promise*>&, boost::_bi::list0&, int) 
/boost/bind/bind.hpp:392:9
#22 0x2b02fe2c444b in boost::_bi::bind_t, 
impala::Promise*>, 
boost::_bi::list3, 
boost::_bi::value >, 
boost::_bi::value*> > >::operator()() 
/boost/bind/bind_template.hpp:20:16
#23 0x2b02fe2c3709 in 
boost::detail::function::void_function_obj_invoker0, 
impala::Promise*>, 
boost::_bi::list3, 
boost::_bi::value >, 
boost::_bi::value*> > >, 
void>::invoke(boost::detail::function::function_buffer&) 
/boost/function/function_template.hpp:153:11
#24 0x2b02f70085d4 in boost::function0::operator()() const 
/boost/function/function_template.hpp:766:14
#25 0x2b02f6ff9710 in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::Promise*) 
/src/util/thread.cc:325:3
#26 0x2b02f7021783 in void 
boost::_bi::list4, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value*> >::operator(), impala::Promise*), 
b

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5816

to look at the new patch set (#19).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
  

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-20 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5816

to look at the new patch set (#19).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  166.241us  166.241us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   71.695us   71.695us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s971ms3s809ms3   13.08 MB  
 10.00 MB
00:UNION   3  207.956ms  222.846ms  288.01M 288.01M  0  
  

[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file

2017-03-20 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Rewrite github URL for timezone source file
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file

2017-03-20 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6438

Change subject: [DOCS] Rewrite github URL for timezone source file
..

[DOCS] Rewrite github URL for timezone source file

Use the corresponding Apache github URL instead of
the Cloudera one.

Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_fixed_issues.xml
2 files changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6438/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/395/

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 9: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Tim Armstrong (Code Review)
Hello Jim Apple, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6410

to look at the new patch set (#9).

Change subject: IMPALA-3203: Part 1: Free list implementation
..

IMPALA-3203: Part 1: Free list implementation

We will have a single free list per size class. Free buffers
are stored in a heap, ordered by the memory address of the
buffer, to help reduce address space fragmentation.

A follow-up patch will use the free lists in BufferPool.

Currently TCMalloc has thread-local caches with somewhat similar
purpose. However, these are not suitable for our purposes for
several reasons:
* They are designed for caching small allocations - large allocations
  like most buffers are served from a global page heap protected
  by a global lock.
* We intend to move away from using TCMalloc for buffers: IMPALA-5073
* Thread caches are ineffective for the producer-consumer pattern where
  one thread allocates memory and another thread frees it.
* TCMalloc gives us limited control over how and when memory is
  actually released to the OS.

Testing:
Added unit tests for sanity checks and verification of behaviour
that is trickier to check in integration or system tests.
The cost will be exercised more thoroughly via BufferPool
in Part 2.

Performance:
Includes a benchmark that demonstrates the scalability of
the free lists under concurrency. When measuring pure throughput
of free list operations, having a free list per core is
significantly faster than a shared free list, or allocating
directly from TCMalloc. On 8 cores, if the memory allocated is
actually touched then for 64KB+ buffers, memory access is the
bottleneck rather than lock contention.

The benchmark also showed that non-inlined constructors and move
operators of BufferHandle were taking significant CPU cycles, so
I inlined those.

This suggests that having a free list per core is more than sufficient
(however, we will need to validate this with system concurrency
benchmarks once we switch to using this during query execution).

Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/free-lists-benchmark.cc
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/runtime/bufferpool/free-list-test.cc
A be/src/runtime/bufferpool/free-list.h
M be/src/util/benchmark.cc
M be/src/util/benchmark.h
11 files changed, 812 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6410/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/394/

-- 
To view, visit http://gerrit.cloudera.org:8080/6437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/393/

-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1:

With Snappy 1.1.3, that query against a 96 million row table took on average 
6.1s. 

With Snappy 1.1.4, the query took 5.8s. 

Confirmed that these were release builds, same query options etc.

-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation

2017-03-20 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

2017-03-20 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
..


Patch Set 2: Code-Review+2

Thanks for the commit message explaining better test alternatives.

-- 
To view, visit http://gerrit.cloudera.org:8080/6437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
..

IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

The test does not work as intended and would likely not provide
very good coverage of the different spilling paths, because it
only runs a single simple query. The stress test
(tests/stress/concurrent_select.py) provides much better coverage.
test_mem_usage_scaling.py probably also provides better coverage
that TestSpillStress in its current form.

Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
---
M tests/custom_cluster/test_spilling.py
1 file changed, 0 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6437/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

2017-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6437

Change subject: IMPALA-IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
..

IMPALA-IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

The test does not work as intended and would likely not provide
very good coverage of the different spilling paths, because it
only runs a single simple query. The stress test
(tests/stress/concurrent_select.py) provides much better coverage.
test_mem_usage_scaling.py probably also provides better coverage
that TestSpillStress in its current form.

Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
---
M tests/custom_cluster/test_spilling.py
1 file changed, 0 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6437/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1:

I'll try and do this now. Because of the way the toolchain works, we can't get 
any other version changes in until this one is committed (all subsequent builds 
build snappy 1.1.4 and not 1.1.3). We could do a historical build, but better 
to make forward progress.

-- 
To view, visit http://gerrit.cloudera.org:8080/6428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/392/

-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

2017-03-20 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6436

Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
..

IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)

Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
---
M bin/impala-config.sh
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/6436/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 654 insertions(+), 301 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
> Any benchmark numbers? Just curious. Also I think it is good to add them he
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
> Trailing white spaces at multiple places in this file.
Done


Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
> Can this be moved outside the loop like JAVA_FE_ARGS
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
> extraneous white spaces.
Done


PS1, Line 54: file_name_prefix_idx
> Didn't understand what this prefix means till I read HdfsTable.fileNamePref
Done


Line 70: root_type FbFileDesc;
> What is the significance of this? I'm reading about FlatBuffers for the fir
Yeah, it is mostly used if you read from JSON. Removed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83: storageIdGenerator.put(host, new Short(diskId));
> Drive-by comment: using 'new' here is pessimising memory consumption. If yo
Good point thanks. Done


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS1, Line 318: REPLICA_HOST_IDX_MASK
> Looks like it used for UNMASKing, rename may be?
It's a bit mask, so I'd rather keep the name as such :)


Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
> Document these?
Done


PS1, Line 357: getHostIdx
> how about just making hostIdx_ a short rather than typecasting everytime?
Done


PS1, Line 360: REPLICA_HOST_CACHE_MASK
> Shouldn't this be short rather than int? May be it works fine this way, jus
For convenience. Java short is signed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315: Preconditions.checkNotNull(fd);
> I don't think this FileDescriptor.create() can ever return null? If yes, wo
Done


Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' 
and returns a
> I like the idea but I'm wondering if this kind of optimization is a little 
That's not entirely true. Multiple filenames from a single write do have a 
common prefix.


Line 420:   prefixCompressFileName(currentFileName, prevFileName);
> Also I think it may not be optimal to construct the prefixes this way by co
In principal you're right, this is not the best way to compress strings. 
However, due to the way these file names are generated (random) this is not 
meant to be a generic file name compressor. Here we take advantage of the 
pattern that file names of a single write exhibit to avoid storing the common 
prefix multiple times.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet SUPPORTED_TABLE_TYPES = 
EnumSet.of(
> May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by
Not sure how this ended up here. Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


  1   2   >