[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-16 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.
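
The growth policy described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: SampleBuffer,
kInitCapacity, kMaxCapacity and num_allocs are hypothetical names, the
element type is fixed to int, and capping the final capacity at exactly
20,000 slots is an assumption. Replacing samples once the reservoir is full
is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constants; the values 16 and 20,000 come from the commit message.
constexpr std::size_t kInitCapacity = 16;
constexpr std::size_t kMaxCapacity = 20000;

// Hypothetical per-group sample buffer that grows by doubling.
struct SampleBuffer {
  int* data = nullptr;
  std::size_t size = 0;
  std::size_t capacity = 0;
  int num_allocs = 0;  // number of buffers allocated so far

  SampleBuffer() = default;
  SampleBuffer(const SampleBuffer&) = delete;
  SampleBuffer& operator=(const SampleBuffer&) = delete;
  ~SampleBuffer() { std::free(data); }

  // Appends v. Returns false if the buffer is at its maximum capacity
  // or if the allocation fails.
  bool Append(int v) {
    if (size == capacity && !Grow()) return false;
    assert(size < capacity);
    data[size++] = v;
    return true;
  }

 private:
  // Allocates a buffer with double the capacity (capped at kMaxCapacity),
  // copies the existing samples into it and frees the old buffer.
  bool Grow() {
    if (capacity >= kMaxCapacity) return false;
    const std::size_t new_capacity =
        capacity == 0 ? kInitCapacity : std::min(capacity * 2, kMaxCapacity);
    int* new_data = static_cast<int*>(std::malloc(new_capacity * sizeof(int)));
    if (new_data == nullptr) return false;
    if (size > 0) std::memcpy(new_data, data, size * sizeof(int));
    std::free(data);
    data = new_data;
    capacity = new_capacity;
    ++num_allocs;
    return true;
  }
};
```

Under these assumptions, a group that sees about 12 values never grows past
its first 16-slot allocation, while a group that fills the reservoir goes
through 12 allocations (16, 32, ..., 16384, 20000).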

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchmark (about 35,000 elements per bucket):

SELECT MAX(a) FROM (
  SELECT c1, APPX_MEDIAN(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  124.726us  124.726us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   29.544us   29.544us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   86.406us  120.372us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s840ms    2s824ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3    1s163ms    1s989ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    3s356ms    3s416ms    6.00K          -1   1.95 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   64.962ms   65.490ms   65.54M          -1  25.97 MB       64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   73.961us   73.961us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.101us   18.101us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   75.795us   83.969us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s608ms    2s683ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  826.683ms    1s322ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    2s457ms    2s672ms    6.00K          -1   3.14 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   81.514ms   89.056ms   65.54M          -1  25.94 MB       64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  114.686us  114.686us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.214us   18.214us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3  147.055us  165.464us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    2s043ms    2s147ms   14.82K          -1   4.94 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  840.528ms  943.254ms   15.61K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    1s769ms    1s869ms   15.61K          -1   5.32 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   17.941ms   37.109ms  183.59K          -1   1.94 MB       16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  125.915us  125.915us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   72.179us   72.179us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   79.054us   83.385us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    6.559ms    7.669ms   14.82K          -1  17.32 MB      128.00 MB  FINALIZE
03:EXCHANGE        3   67.370us   85.068us   15.60K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   19.245ms   24.472ms   15.60K          -1   9.48 MB      128.00 MB  STREAMING
00:SCAN HDFS       3   53.173ms   55.844ms  183.59K          -1   1.18 MB       16.00 MB  tpcds_parquet.store_sales
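
For reference, the memory benchmark figures above work out to the following
ratios. The conversion 1 GB = 1024 MB is an assumption about how the profile
reports units, and the results are rounded.

```latex
\begin{align*}
\text{04:AGGREGATE peak memory:}\quad
  & \frac{4.94 \times 1024\ \text{MB}}{17.32\ \text{MB}} \approx 292 \\
\text{01:AGGREGATE peak memory:}\quad
  & \frac{5.32 \times 1024\ \text{MB}}{9.48\ \text{MB}} \approx 575 \\
\text{total query time:}\quad
  & \frac{7477\ \text{ms}}{434\ \text{ms}} \approx 17.2
\end{align*}
```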

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-16 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 17: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 17:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/381/

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 17: Code-Review+2

Forgot to update the alloc test. Made the tiny change to the test. Forwarding 
the +2.

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#17).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)

Hello Marcel Kornacker, Impala Public Jenkins, Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#17).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/378/

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16:

Thanks for all the reviews, Alex, Jim, MJ, Mostafa, Marcel!

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/378/

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-15 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 16: Code-Review+2

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6025/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 979: // array and reallocating when the array is full. When this object is 
serialized, the
> "When this object is serialized into an output buffer, the samples array is
Done


Line 1042:   // memory containing this object and the samples array is freed. The serialized object
> remove the implicit Delete() call and make that explicit at the call site.
The problem is that we have to modify the object before serializing it. For
example, we set inline to true on line 1066, so Delete() will no longer work.

One option is to set those variables, copy the object, and then restore the
original values. I considered doing that earlier, but I thought it might be
confusing. What do you think is the right thing to do here?
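
The "set, copy, restore" option could look roughly like the sketch below.
Everything in it is hypothetical: State, its fields, Serialize() and Delete()
are illustrative names and a simplified layout, not the actual code in
aggregate-functions-ir.cc. The point is only that the source object is back
in its original state after serialization, so the caller can free it
explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified state object.
struct State {
  bool inlined = false;         // true only in the serialized copy
  std::size_t num_samples = 0;
  int* samples = nullptr;       // separate allocation while not inlined
};

// Copies the header and the samples into one contiguous buffer.
// The header is copied with the "serialized" field values, then the
// original values are restored so that *s is left unchanged.
std::vector<unsigned char> Serialize(State* s) {
  std::vector<unsigned char> out(sizeof(State) + s->num_samples * sizeof(int));

  const bool saved_inlined = s->inlined;
  int* const saved_samples = s->samples;
  s->inlined = true;     // set
  s->samples = nullptr;
  std::memcpy(out.data(), s, sizeof(State));  // copy
  s->inlined = saved_inlined;                 // restore
  s->samples = saved_samples;

  if (s->num_samples > 0) {
    std::memcpy(out.data() + sizeof(State), s->samples,
                s->num_samples * sizeof(int));
  }
  return out;
}

// Explicit cleanup, called by the caller after Serialize().
void Delete(State* s) {
  delete[] s->samples;
  delete s;
}
```

With this shape the call site reads as Serialize(s) followed by Delete(s),
at the cost of temporarily mutating the object during the copy.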


Line 1162:   // Size of the samples array.
> be specific by referencing the member explicitly.
Done


Line 1180:   // Reallocates the samples array increasing it's capacity to 
"new_capacity" rounded up
> fix punctuation.
Rephrased the comment.


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#16).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#16).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6025/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 979: // array and reallocating when the array is full. When this object is 
serialized, the
"When this object is serialized into an output buffer, the samples array is 
inlined into the output buffer as well." or something like that.


Line 1042:   // memory containing this object and the samples array is freed. 
The serialized object
remove the implicit Delete() call and make that explicit at the call site. 
member functions should do 'delete this'.


Line 1162:   // Size of the samples array.
be specific by referencing the member explicitly.


Line 1180:   // Reallocates the samples array increasing it's capacity to 
"new_capacity" rounded up
fix punctuation.


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#15).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.
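
The growth policy described above can be sketched roughly as follows. This is only an illustration and not the patch's code: SampleBuffer, Grow() and Append() are hypothetical names, plain malloc/free stands in for whatever allocator the real aggregate function uses, and only the values 16 and 20,000 are taken from the commit message.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Values from the commit message; the constant names are assumptions.
constexpr int kInitCapacity = 16;
constexpr int kMaxCapacity = 20000;

// Hypothetical stand-in for the per-grouping-key sample state.
struct SampleBuffer {
  double* samples = nullptr;
  int num_samples = 0;
  int capacity = 0;
};

// Allocates 16 slots on first use, then doubles, never exceeding 20,000.
// Returns false if the buffer is already at the maximum or allocation fails.
bool Grow(SampleBuffer* buf) {
  if (buf->capacity >= kMaxCapacity) return false;
  int new_capacity = buf->capacity == 0 ? kInitCapacity : buf->capacity * 2;
  if (new_capacity > kMaxCapacity) new_capacity = kMaxCapacity;
  double* new_samples =
      static_cast<double*>(std::malloc(sizeof(double) * new_capacity));
  if (new_samples == nullptr) return false;
  // Copy the original buffer into the new one, then release the old one.
  if (buf->num_samples > 0) {
    std::memcpy(new_samples, buf->samples, sizeof(double) * buf->num_samples);
  }
  std::free(buf->samples);
  buf->samples = new_samples;
  buf->capacity = new_capacity;
  return true;
}

// Appends a value, growing when full. Returns false once the buffer is full
// at the maximum capacity; a reservoir sampler would replace an existing
// sample at that point instead of appending.
bool Append(SampleBuffer* buf, double value) {
  if (buf->num_samples == buf->capacity && !Grow(buf)) return false;
  buf->samples[buf->num_samples++] = value;
  return true;
}
```

With this policy a grouping key that only ever sees about a dozen values keeps a 16-slot buffer, which is the case the memory benchmark in this message exercises.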

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------
06:AGGREGATE       1  124.726us  124.726us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   29.544us   29.544us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   86.406us  120.372us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s840ms    2s824ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3    1s163ms    1s989ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    3s356ms    3s416ms    6.00K          -1   1.95 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   64.962ms   65.490ms   65.54M          -1  25.97 MB       64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------
06:AGGREGATE       1   73.961us   73.961us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.101us   18.101us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   75.795us   83.969us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s608ms    2s683ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  826.683ms    1s322ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    2s457ms    2s672ms    6.00K          -1   3.14 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   81.514ms   89.056ms   65.54M          -1  25.94 MB       64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------
06:AGGREGATE       1  114.686us  114.686us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.214us   18.214us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3  147.055us  165.464us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    2s043ms    2s147ms   14.82K          -1   4.94 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  840.528ms  943.254ms   15.61K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    1s769ms    1s869ms   15.61K          -1   5.32 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   17.941ms   37.109ms  183.59K          -1   1.94 MB       16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------
06:AGGREGATE       1  125.915us  125.915us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   72.179us   72.179us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   79.054us   83.385us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    6.559ms    7.669ms   14.82K          -1  17.32 MB      128.00 MB  FINALIZE
03:EXCHANGE        3   67.370us   85.068us   15.60K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   19.245ms   24.472ms   15.60K          -1   9.48 MB      128.00 MB  STREAMING
00:SCAN HDFS       3   53.173ms   55.844ms  183.59K          -1   1.18 MB       16.00 MB  tpcds_parquet.store_sales

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#15).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 14:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6025/14/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 908: const static int INIT_CAPACITY = 16;
> move those constants into the classes that use them
Done


Line 981: // stored in a dynamically sized array. See the ".h" file for more 
details.
> briefly describe memory mgmt, in particular the switch to the inlined repre
The description of the reservoir sample algorithm is there. Removed.


Line 1040:   // contains. Frees the memory of the current object. Requires a 
call Deserialize() call
> fix grammar
Done


Line 1062: DCHECK(!sample_array_inline_);
> dcheck this at the top, makes it easier to pick up on it.
Done


Line 1085:   // reservoir sampling algorithm.
> point out all side effects (increase of capacity; when that fails)
Done


Line 1100: // First, fill up the dst samples if they don't already exist. 
The samples are now
> separate logically separate blocks with blank lines
Done


Line 1124: if (num_samples_ == 0) {
> single line
Done


Line 1138:   void Free(FunctionContext* ctx) {
> call this Delete, because that's what it basically implements (free() only 
Done


Line 1151:   // resize occurs, this needs to be updated.
> i'm assuming this is the size of the array. comment.
Done


Line 1169:   // Increases the size of the samples array to a rounded up to a 
power of two
> "array to new_capacity, rounded up to a power of two."
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-13 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#16).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-10 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 14:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6025/14/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 908: const static int INIT_CAPACITY = 16;
move those constants into the classes that use them


Line 981: // stored in a dynamically sized array. See the ".h" file for more 
details.
briefly describe memory mgmt, in particular the switch to the inlined 
representation.

i didn't get the "see .h for more details part", is there anything relevant to 
this in there?


Line 1040:   // contains. Frees the memory of the current object. Requires a 
call Deserialize() call
fix grammar

point out that the array must not be inlined


Line 1062: DCHECK(!sample_array_inline_);
dcheck this at the top, makes it easier to pick up on it.


Line 1085:   // reservoir sampling algorithm.
point out all side effects (increase of capacity; when that fails)


Line 1100: // First, fill up the dst samples if they don't already exist. 
The samples are now
separate logically separate blocks with blank lines


Line 1124: if (num_samples_ == 0) {
single line


Line 1138:   void Free(FunctionContext* ctx) {
call this Delete, because that's what it basically implements (free() only 
frees the memory directly allocated for that pointer)


Line 1151:   // resize occurs, this needs to be updated.
i'm assuming this is the size of the array. comment.


Line 1169:   // Increases the size of the samples array to a rounded up to a 
power of two
"array to new_capacity, rounded up to a power of two."


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-07 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#14).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-07 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6025/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS13, Line 1075: capacity_ = num_samples_;
   : sample_array_inline_ = true;
> as we discussed in person, let's move this to serialize which is when this 
Done


PS13, Line 1091: int target_capacity = INIT_CAPACITY;
   : while (target_capacity < necessary_capacity) 
target_capacity *= 2;
> BitUtil::RoundUpToPowerOfTwo
Done


PS13, Line 1093: if (target_capacity > MAX_CAPACITY) target_capacity = 
MAX_CAPACITY;
   : bool result = IncreaseCapacity(ctx, target_capacity);
> this is simplified if IncreaseCapacity takes the max and handles the ceilin
Done. IncreaseCapacity() handles the ceiling.


PS13, Line 1163: may
> may be
Done


PS13, Line 1168:   // if it hasn't been allocated yet. Returns false if the 
operation fails.
   :   bool IncreaseCapacity(FunctionContext* ctx, int 
new_capacity) {
> If you change this to take max_capacity (which always should pass the const
I made some changes. I am not sure if it makes sense to pass max_capacity as a 
parameter, but the ceiling is calculated here as you suggested. Let me know if 
my change makes sense to you.
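
For readers following the exchange above, the capacity computation under discussion (round the required capacity up to a power of two, starting from the initial capacity, then clamp to the maximum) might look like the sketch below. RoundUpToPowerOfTwo here is a local stand-in written for this example, not Impala's BitUtil implementation, and TargetCapacity is a hypothetical helper name. The value 16 matches the INIT_CAPACITY line quoted elsewhere in this thread, and 20,000 is assumed for MAX_CAPACITY based on the commit message.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int kInitCapacity = 16;    // INIT_CAPACITY in the reviewed code
constexpr int kMaxCapacity = 20000;  // assumed value of MAX_CAPACITY

// Local stand-in: smallest power of two that is >= v, for v >= 1.
int64_t RoundUpToPowerOfTwo(int64_t v) {
  --v;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  return v + 1;
}

// Hypothetical helper: the capacity to allocate so that at least
// necessary_capacity samples fit, never below the initial capacity and
// never above the maximum (the "ceiling" mentioned in the review).
int TargetCapacity(int necessary_capacity) {
  int64_t rounded =
      RoundUpToPowerOfTwo(std::max(necessary_capacity, kInitCapacity));
  return static_cast<int>(std::min<int64_t>(rounded, kMaxCapacity));
}
```

For inputs of at least 1 this gives the same result as the doubling loop quoted from patch set 13 followed by the MAX_CAPACITY clamp, and keeping the clamp inside one function means callers do not have to repeat it.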


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-07 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#14).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-07 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#14).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchrmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1  124.726us  124.726us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   29.544us   29.544us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   86.406us  120.372us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE31s840ms2s824ms   2.00K  -1   1.02 GB  
128.00 MB  FINALIZE
03:EXCHANGE 31s163ms1s989ms   6.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE33s356ms3s416ms   6.00K  -1   1.95 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   64.962ms   65.490ms  65.54M  -1  25.97 MB   
64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator   #Hosts   Avg Time  Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail

06:AGGREGATE1   73.961us  73.961us   1   1  28.00 KB
-1.00 B  FINALIZE
05:EXCHANGE 1   18.101us  18.101us   3   1 0
-1.00 B  UNPARTITIONED
02:AGGREGATE3   75.795us  83.969us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE31s608ms   2s683ms   2.00K  -1   1.02 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  826.683ms   1s322ms   6.00K  -1 0
  0  HASH(c1)
01:AGGREGATE32s457ms   2s672ms   6.00K  -1   3.14 GB  
128.00 MB  STREAMING
00:SCAN HDFS3   81.514ms  89.056ms  65.54M  -1  25.94 MB   
64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  114.686us  114.686us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.214us   18.214us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3  147.055us  165.464us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    2s043ms    2s147ms   14.82K          -1   4.94 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  840.528ms  943.254ms   15.61K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    1s769ms    1s869ms   15.61K          -1   5.32 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   17.941ms   37.109ms  183.59K          -1   1.94 MB       16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  125.915us  125.915us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   72.179us   72.179us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   79.054us   83.385us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    6.559ms    7.669ms   14.82K          -1  17.32 MB      128.00 MB  FINALIZE
03:EXCHANGE        3   67.370us   85.068us   15.60K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   19.245ms   24.472ms   15.60K          -1   9.48 MB      128.00 MB  STREAMING
00:SCAN

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-06 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#14).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-06 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6025/13/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS13, Line 1075: capacity_ = num_samples_;
   : sample_array_inline_ = true;
as we discussed in person, let's move this to serialize which is when this 
happens


PS13, Line 1091: int target_capacity = INIT_CAPACITY;
   : while (target_capacity < necessary_capacity) target_capacity *= 2;
BitUtil::RoundUpToPowerOfTwo
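
To illustrate why a round-up-to-power-of-two helper can stand in for the quoted loop, here is a small self-contained comparison. It is only a sketch: `RoundUpToPowerOfTwo` below is a standalone stand-in written for this example and is not claimed to match the signature or implementation of Impala's `BitUtil::RoundUpToPowerOfTwo`, and `TargetCapacityLoop`, `TargetCapacityRounded`, and `kInitCapacity` are hypothetical names. The equivalence relies on the initial capacity (16) itself being a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int64_t kInitCapacity = 16;  // a power of two, per the commit message

// Loop form quoted in the review: double until the target is large enough.
int64_t TargetCapacityLoop(int64_t necessary_capacity) {
  int64_t target = kInitCapacity;
  while (target < necessary_capacity) target *= 2;
  return target;
}

// Stand-in helper: smallest power of two >= v, for 1 <= v <= 2^62.
// Uses the classic bit-smearing trick: after decrementing, copy the highest
// set bit into every lower position, then add one.
int64_t RoundUpToPowerOfTwo(int64_t v) {
  assert(v >= 1);
  --v;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  return v + 1;
}

// Loop-free form: round up, but never go below the initial capacity.
int64_t TargetCapacityRounded(int64_t necessary_capacity) {
  int64_t clamped = std::max<int64_t>(necessary_capacity, 1);
  return std::max<int64_t>(kInitCapacity, RoundUpToPowerOfTwo(clamped));
}
```

Both forms leave the result uncapped; applying the 20,000-element ceiling is a separate step.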


PS13, Line 1093: if (target_capacity > MAX_CAPACITY) target_capacity = MAX_CAPACITY;
   : bool result = IncreaseCapacity(ctx, target_capacity);
this is simplified if IncreaseCapacity takes the max and handles the ceiling


PS13, Line 1163: may
may be


PS13, Line 1168:   // if it hasn't been allocated yet. Returns false if the operation fails.
   :   bool IncreaseCapacity(FunctionContext* ctx, int new_capacity) {
If you change this to take max_capacity (callers should always pass the
constant MAX_CAPACITY, but it makes the interface to this fn reasonable) and do
the ceiling in here, then you don't really need DoubleCapacity, and you can
simplify some logic in Merge as well.
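
One way to read this suggestion is sketched below. It is an assumption-laden illustration rather than the patch's code: the `Samples` struct, the three-argument signature, and the use of `std::realloc` in place of a `FunctionContext`-based allocation are all hypothetical. The point is only that once the function clamps the request itself, "double the capacity" and "grow to what a merge needs" become two calls to the same function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical container for the sample array.
struct Samples {
  double* data = nullptr;
  int size = 0;
  int capacity = 0;
};

// Ensures room for at least requested_capacity elements, clamped to
// max_capacity. Returns true if the buffer is already large enough or was
// grown; returns false on allocation failure, leaving *s unchanged.
bool IncreaseCapacity(Samples* s, int requested_capacity, int max_capacity) {
  assert(max_capacity > 0);
  // The ceiling is applied here so callers do not have to repeat it.
  int new_capacity = std::min(requested_capacity, max_capacity);
  if (new_capacity <= s->capacity) return true;
  void* p = std::realloc(s->data, sizeof(double) * new_capacity);
  if (p == nullptr) return false;
  s->data = static_cast<double*>(p);
  s->capacity = new_capacity;
  return true;
}
```

Under these assumptions, doubling would be `IncreaseCapacity(&s, s.capacity * 2, kMax)` and a merge could request its full target in one call, with `kMax` standing for the 20,000-element limit.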


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#13).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#13).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  124.726us  124.726us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   29.544us   29.544us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   86.406us  120.372us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s840ms    2s824ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3    1s163ms    1s989ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    3s356ms    3s416ms    6.00K          -1   1.95 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   64.962ms   65.490ms   65.54M          -1  25.97 MB       64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   73.961us   73.961us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.101us   18.101us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   75.795us   83.969us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s608ms    2s683ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  826.683ms    1s322ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    2s457ms    2s672ms    6.00K          -1   3.14 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   81.514ms   89.056ms   65.54M          -1  25.94 MB       64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  114.686us  114.686us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.214us   18.214us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3  147.055us  165.464us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    2s043ms    2s147ms   14.82K          -1   4.94 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  840.528ms  943.254ms   15.61K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    1s769ms    1s869ms   15.61K          -1   5.32 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   17.941ms   37.109ms  183.59K          -1   1.94 MB       16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  125.915us  125.915us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   72.179us   72.179us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   79.054us   83.385us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    6.559ms    7.669ms   14.82K          -1  17.32 MB      128.00 MB  FINALIZE
03:EXCHANGE        3   67.370us   85.068us   15.60K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   19.245ms   24.472ms   15.60K          -1   9.48 MB      128.00 MB  STREAMING
00:SCAN HDFS       3   53.173ms   55.844ms  183.59K          -1   1.18 MB       16.00 MB  tpcds_parquet.store_sales

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6025/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 985: 
> nit extra newline
Done


PS12, Line 988: capacity_(0),
> the logic around handling 0 seems more complicated than necessary. can you 
Done


PS12, Line 1061:   // Should be called before use after Serialize().
   :   void Deserialize() {
> comment is confusing, seems contradictory? I think we can remove this
Updated the comment. I don't think we can remove this, explained in the comment 
below.


PS12, Line 1063: samples_ = reinterpret_cast(this + 1);
   : capacity_ = num_samples_;
   : sample_array_separate_ = false;
> I don't see why this is necessary - can't you set the properties you need b
Serialize gets called, then the resulting buffer gets transferred to a 
different impalad in the cluster. The pointer no longer makes sense on the new 
machine because the location in memory is different. (Am I thinking about this 
correctly?)
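
The concern in this reply can be illustrated with a small sketch. It assumes a layout in which the serialized form is a fixed-size header immediately followed by the samples, which is what the quoted `this + 1` expression suggests; the `State` struct, its field names, and the use of `malloc`/`memcpy` are hypothetical and are not taken from the patch. The stored pointer value is copied along with the header, but it refers to the sender's memory, so the receiver has to recompute it from its own buffer address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical header; when serialized, the samples follow it directly.
struct State {
  int32_t num_samples;
  int32_t capacity;
  bool samples_inline;  // true when the samples sit right after the header
  int64_t* samples;     // separately allocated array, or points past header

  // Copies header and samples into one contiguous malloc'd buffer.
  // The caller releases it with std::free(). Returns nullptr on failure.
  uint8_t* Serialize(size_t* out_len) const {
    size_t len = sizeof(State) + sizeof(int64_t) * num_samples;
    uint8_t* buf = static_cast<uint8_t*>(std::malloc(len));
    if (buf == nullptr) return nullptr;
    std::memcpy(buf, this, sizeof(State));  // includes the stale pointer
    if (num_samples > 0) {
      std::memcpy(buf + sizeof(State), samples, sizeof(int64_t) * num_samples);
    }
    *out_len = len;
    return buf;
  }

  // Call on a received buffer before use: re-point samples at the bytes
  // that follow this header in the receiver's own address space.
  void Deserialize() {
    samples = reinterpret_cast<int64_t*>(this + 1);
    capacity = num_samples;
    samples_inline = true;
  }
};

static_assert(std::is_trivially_copyable<State>::value, "copied with memcpy");
static_assert(sizeof(State) % alignof(int64_t) == 0, "samples stay aligned");
```

In this sketch the capacity and the inline flag could just as well be set on the copy during serialization; only the pointer fix-up has to wait until the buffer has reached its final address.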


PS12, Line 1138: 
   :   // True if the array of samples is in a separate memory allocation. This object is
   :   // responsible for freeing it if true.
   :   bool sample_array_separate_;
> how about calling this samples_inline_ ?
Done


Line 1150:   // Array of ReservoirSamples.
> comment that this may be inline or a pointer to a separately allocated arra
Done


PS12, Line 1153: <
> nit: we don't wrap args with braces in comments
Done



[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6025/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 985: 
nit extra newline


PS12, Line 988: capacity_(0),
the logic around handling 0 seems more complicated than necessary. can you just 
use INIT_CAPACITY and do the allocation in the constructor?


PS12, Line 1061:   // Should be called before use after Serialize().
   :   void Deserialize() {
comment is confusing, seems contradictory? I think we can remove this


PS12, Line 1063: samples_ = reinterpret_cast(this + 1);
   : capacity_ = num_samples_;
   : sample_array_separate_ = false;
I don't see why this is necessary - can't you set the properties you need 
before returning a copy in Serialize?


PS12, Line 1138: 
   :   // True if the array of samples is in a separate memory allocation. This object is
   :   // responsible for freeing it if true.
   :   bool sample_array_separate_;
how about calling this samples_inline_ ?

also can you put this right above samples_ to keep parameters grouped together.


Line 1150:   // Array of ReservoirSamples.
comment that this may be inline or a pointer to a separately allocated array


PS12, Line 1153: <
nit: we don't wrap args with braces in comments



[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 12:

Added skipping doublings in the merge case.


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#12).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#12).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-03 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 11:

> After talking to Alex, I completely rewrote the patch and reran the
 > benchmarks. The state class no longer has a trailing var-len array.
 > The samples array is stored in a separate memory allocation. I
 > think this approach is much more straightforward and is simpler to
 > understand.

Just took a quick look but I agree this looks more straightforward. I'll do a 
closer pass later.

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-02 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 11:

After talking to Alex, I completely rewrote the patch and reran the benchmarks. 
The state class no longer has a trailing var-len array. The samples array is 
stored in a separate memory allocation. I think this approach is much more 
straightforward and is simpler to understand.
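
The layout described above, a fixed-size state object that points at a separately allocated samples array, can be sketched roughly as follows. This is only an illustration under assumptions: the names `Sample` and `SampleStateSketch` are hypothetical, and `std::malloc`/`std::realloc` stand in for whatever allocator the patch actually uses. The point it shows is that growing the samples array does not move the state object itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical sample type, for illustration only.
struct Sample {
  int64_t val;
  double key;
};

// Hypothetical state class: a fixed-size header plus a pointer to a samples
// array that lives in its own allocation (no trailing var-len array).
class SampleStateSketch {
 public:
  explicit SampleStateSketch(int init_capacity)
      : samples_(nullptr), num_samples_(0), capacity_(0) {
    assert(init_capacity > 0);
    samples_ = static_cast<Sample*>(std::malloc(sizeof(Sample) * init_capacity));
    if (samples_ != nullptr) capacity_ = init_capacity;
  }
  ~SampleStateSketch() { std::free(samples_); }
  SampleStateSketch(const SampleStateSketch&) = delete;
  SampleStateSketch& operator=(const SampleStateSketch&) = delete;

  // Grows only the samples allocation; the state object never moves.
  bool Resize(int new_capacity) {
    if (new_capacity <= capacity_) return true;
    void* p = std::realloc(samples_, sizeof(Sample) * new_capacity);
    if (p == nullptr) return false;  // The old buffer is still valid.
    samples_ = static_cast<Sample*>(p);
    capacity_ = new_capacity;
    return true;
  }

  // Stores a sample if there is room; the caller decides when to Resize().
  bool Add(const Sample& s) {
    if (num_samples_ == capacity_) return false;
    samples_[num_samples_++] = s;
    return true;
  }

  const Sample& sample(int i) const {
    assert(i >= 0 && i < num_samples_);
    return samples_[i];
  }
  int num_samples() const { return num_samples_; }
  int capacity() const { return capacity_; }

 private:
  Sample* samples_;
  int num_samples_;
  int capacity_;
};
```

With a trailing array, a resize has to reallocate and copy the header together with the samples, so every pointer to the state must be updated; here only `samples_` changes.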


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-02 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#11).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.
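
As a rough illustration of the growth strategy just described (not the code in the patch), the sketch below starts at 16 elements, doubles on overflow, and stops at 20,000. The names `SampleBuffer`, `Init`, `Grow`, and `Append` are hypothetical, `std::malloc` stands in for the real allocator, and clamping the last step with `std::min` is an assumption about how the final step to 20,000 is taken.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Constants taken from the commit message; all names here are hypothetical.
constexpr int kInitCapacity = 16;
constexpr int kMaxCapacity = 20000;

struct SampleBuffer {
  int64_t* data = nullptr;
  int len = 0;
  int capacity = 0;
};

// Allocates the small initial buffer. Returns false on allocation failure.
inline bool Init(SampleBuffer* buf) {
  buf->data = static_cast<int64_t*>(std::malloc(sizeof(int64_t) * kInitCapacity));
  buf->len = 0;
  buf->capacity = buf->data == nullptr ? 0 : kInitCapacity;
  return buf->data != nullptr;
}

// Doubles the capacity (clamped to kMaxCapacity) and copies the old contents
// into the new buffer. Returns false if the buffer is uninitialized, already
// at its maximum size, or the allocation fails.
inline bool Grow(SampleBuffer* buf) {
  if (buf->data == nullptr || buf->capacity >= kMaxCapacity) return false;
  const int new_capacity = std::min(buf->capacity * 2, kMaxCapacity);
  auto* new_data =
      static_cast<int64_t*>(std::malloc(sizeof(int64_t) * new_capacity));
  if (new_data == nullptr) return false;
  std::memcpy(new_data, buf->data, sizeof(int64_t) * buf->len);
  std::free(buf->data);
  buf->data = new_data;
  buf->capacity = new_capacity;
  return true;
}

// Appends a value, growing first if the buffer is full. Returns false once
// the buffer is full at maximum capacity.
inline bool Append(SampleBuffer* buf, int64_t value) {
  assert(buf != nullptr);
  if (buf->len == buf->capacity && !Grow(buf)) return false;
  buf->data[buf->len++] = value;
  return true;
}
```

A grouping key that only ever sees a dozen values stays at the 16-element allocation, which is where the memory saving comes from. What happens once the buffer is full at 20,000 elements (replacing existing samples) is outside this sketch.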

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Perf Benchmark (about 35,000 elements per bucket):

SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

BEFORE: 11s067ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  124.726us  124.726us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   29.544us   29.544us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   86.406us  120.372us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s840ms    2s824ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3    1s163ms    1s989ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    3s356ms    3s416ms    6.00K          -1   1.95 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   64.962ms   65.490ms   65.54M          -1  25.97 MB       64.00 MB  tpcds_10_parquet.benchmark

AFTER: 9s465ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   73.961us   73.961us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.101us   18.101us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   75.795us   83.969us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    1s608ms    2s683ms    2.00K          -1   1.02 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  826.683ms    1s322ms    6.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    2s457ms    2s672ms    6.00K          -1   3.14 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   81.514ms   89.056ms   65.54M          -1  25.94 MB       64.00 MB  tpcds_10_parquet.benchmark

Memory Benchmark (about 12 elements per bucket):

SELECT MAX(a) FROM (
  SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a
  FROM tpcds_parquet.store_sales
  GROUP BY ss_customer_sk) t

BEFORE: 7s477ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  114.686us  114.686us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   18.214us   18.214us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3  147.055us  165.464us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    2s043ms    2s147ms   14.82K          -1   4.94 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  840.528ms  943.254ms   15.61K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    1s769ms    1s869ms   15.61K          -1   5.32 GB      128.00 MB  STREAMING
00:SCAN HDFS       3   17.941ms   37.109ms  183.59K          -1   1.94 MB       16.00 MB  tpcds_parquet.store_sales

AFTER: 434ms
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  125.915us  125.915us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   72.179us   72.179us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   79.054us   83.385us        3           1  28.00 KB       10.00 MB
04:AGGREGATE       3    6.559ms    7.669ms   14.82K          -1  17.32 MB      128.00 MB  FINALIZE
03:EXCHANGE        3   67.370us   85.068us   15.60K          -1         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   19.245ms   24.472ms   15.60K          -1   9.48 MB      128.00 MB  STREAMING
00:SCAN HDFS       3   53.173ms   55.844ms  183.59K          -1   1.18 MB       16.00 MB  tpcds_parquet.store_sales

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-02 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#11).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-03-02 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 957: // We allocate a contiguous chunk of memory for ReservoirSampleState and an array of
> the initial class comment should describe what it is. also, the array is re
Completely rewrote patch. I think it's clearer now and more encapsulated.


Line 1018:   void increment_source_size() { source_size_++; }
> bad formatting: these functions aren't getters/setters.
Removed these


Line 1040:   // Trailing var-len array of ReservoirSamples.
> "The actual size is ."
This is not a trailing var-len array any more
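
The encapsulation asked for in review, where the state class owns both its samples and its random number generator instead of having the update function reach into it, could look roughly like the sketch below. It implements textbook reservoir sampling ("Algorithm R"); the class name `ReservoirSketch`, its methods, and the use of `std::vector` and `std::mt19937_64` are assumptions for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical class: keeps a uniform random sample of at most max_samples
// values from a stream, with the RNG hidden behind the class interface.
class ReservoirSketch {
 public:
  ReservoirSketch(std::size_t max_samples, uint64_t seed)
      : max_samples_(max_samples), source_size_(0), rng_(seed) {
    assert(max_samples > 0);
  }

  // Offers one input value to the reservoir.
  void Update(int64_t value) {
    ++source_size_;
    if (samples_.size() < max_samples_) {
      samples_.push_back(value);  // Reservoir not full yet: always keep.
      return;
    }
    // Keep the new value with probability max_samples / source_size: draw a
    // slot in [0, source_size) and replace only if it lands in the reservoir.
    const uint64_t slot = NextIndex(source_size_);
    if (slot < max_samples_) samples_[slot] = value;
  }

  const std::vector<int64_t>& samples() const { return samples_; }
  uint64_t source_size() const { return source_size_; }

 private:
  // Uniform integer in [0, bound). Private, so callers never touch the RNG.
  uint64_t NextIndex(uint64_t bound) {
    return std::uniform_int_distribution<uint64_t>(0, bound - 1)(rng_);
  }

  std::size_t max_samples_;
  uint64_t source_size_;
  std::mt19937_64 rng_;
  std::vector<int64_t> samples_;
};
```

Callers only see `Update()` and the read-only accessors, so the counter and the generator cannot be modified from outside the class.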



[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-27 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 957: // We allocate a contiguous chunk of memory for ReservoirSampleState and an array of
the initial class comment should describe what it is. also, the array is really 
part of the abstraction of that class.

the way this class is being used also doesn't make sense anymore (such as 
ReservoirSampleUpdate() reaching into the random number generator). please 
restructure.


Line 1018:   void increment_source_size() { source_size_++; }
bad formatting: these functions aren't getters/setters.


Line 1040:   // Trailing var-len array of ReservoirSamples.
"The actual size is ."



[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-27 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#10).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
resize code path.

Performance benchmark:
I ran a performance benchmark locally on a release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   96.086us   96.086us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   26.629us   26.629us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   68.187us   87.887us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    2s851ms    5s362ms    3.00K          -1   1.95 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  119.540ms  220.191ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    5s876ms    6s254ms    9.00K          -1   2.93 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  127.834ms  146.842ms   98.30M          -1  19.80 MB       32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  101.101us  101.101us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   32.296us   32.296us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   83.284us  120.137us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    3s190ms    6s555ms    3.00K          -1   1.96 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  247.897ms  497.280ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    7s370ms    8s460ms    9.00K          -1   4.71 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  111.721ms  122.306ms   98.30M          -1  19.94 MB       32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1  288.125us  288.125us   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    9.358ms   10.982ms   14.82K      15.21K  19.82 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  129.832us  154.953us   15.62K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   11.086ms   13.102ms   15.62K      15.21K   9.49 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   40.154ms   50.220ms  183.59K     183.59K   2.94 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-27 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#10).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-23 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#9).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-23 Thread Taras Bobrovytsky (Code Review)

Hello Matthew Jacobs, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6025

to look at the new patch set (#9).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-23 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6025/8//COMMIT_MSG
Commit Message:

PS8, Line 22: No new tests were added,
> can update this now
Done


http://gerrit.cloudera.org:8080/#/c/6025/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 126: // TODO: this file should be cross compiled and then all of the builtin
> is this done?
This file's name ends with "-ir" and it's in be/src/codegen/impala-ir.cc, so this
must already be done. Removed the TODO.


Line 961: struct ReservoirSampleState {
> i'd say this really turned into a class
Done


Line 967:   // resize occurs, this needs to be updated from the outside.
> what does 'from the outside' mean?
I meant that whoever resizes the buffer and memcpys this struct over is also 
responsible for updating the capacity. It might be clearer if I remove that 
part.


Line 977:   ReservoirSampleState(int init_capacity) :
> use standard formatting
Done.


Line 1016: // The array of ReservoirSamples starts right after 
ReservoirSampleState, so we use
> that's often done by putting an array of size 1 at the end of the header st
Done. Also made some of the functions non-const because we don't want a 
function like GetSample() to return a const ReservoirSample*.


Line 1025:   int64_t GetNext64(int64_t max) {
> while you're at it, this deserves a comment
Done


Line 1033:   // Given a buffer that contains a ReservoirSampleState, resize the 
buffer so that it's
> its
nice catch


Line 1040:   if (new_capacity * 2 >= MAX_CAPACITY) new_capacity = MAX_CAPACITY;
> if state->capacity is 10 and max_capacity is 40, this line sets new_capacit
With the current constants, we would be resizing from about 8000 to 20,000. I 
think this is acceptable. Would it be better to resize to 16,000 then to 20,000?
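
To make the two options concrete, here is a minimal sketch that compares them. It assumes that new_capacity starts out as twice the current capacity before the quoted line runs (the surrounding code is not shown in this thread), and the function and constant names are hypothetical, not taken from the patch.

```cpp
#include <algorithm>
#include <vector>

// Values quoted in this thread: start at 16 samples, stop at 20,000.
constexpr int kInitCapacity = 16;
constexpr int kMaxCapacity = 20000;

// Policy as quoted on Line 1040, assuming new_capacity begins at 2 * capacity:
// jump straight to the maximum when one more doubling would reach or pass it.
int NextCapacityJump(int capacity, int max_capacity) {
  int new_capacity = capacity * 2;
  if (new_capacity * 2 >= max_capacity) new_capacity = max_capacity;
  return new_capacity;
}

// Alternative raised in the reply: keep doubling and clamp at the maximum.
int NextCapacityClamp(int capacity, int max_capacity) {
  return std::min(capacity * 2, max_capacity);
}

// Lists every capacity the buffer passes through on its way to kMaxCapacity.
std::vector<int> GrowthSequence(int (*next)(int, int)) {
  std::vector<int> caps{kInitCapacity};
  while (caps.back() < kMaxCapacity) {
    caps.push_back(next(caps.back(), kMaxCapacity));
  }
  return caps;
}
```

Under these assumptions the first policy goes 16, 32, ..., 8192, 20000, while the clamped policy adds one extra step at 16384 (plain doubling lands on 16,384 rather than 16,000) before reaching 20000.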


Line 1062:   // If the array gets filled due to updates or merges, we 
reallocate a larger buffer to
> you should put this (= a brief description of what you're doing) somewhere 
I moved this description higher up. I kept it mostly unmodified.


http://gerrit.cloudera.org:8080/#/c/6025/8/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

PS8, Line 1163: mediam
> spelling
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6025/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 126: // TODO: this file should be cross compiled and then all of the 
builtin
is this done?


Line 961: struct ReservoirSampleState {
i'd say this really turned into a class


Line 967:   // resize occurs, this needs to be updated from the outside.
what does 'from the outside' mean?


Line 977:   ReservoirSampleState(int init_capacity) :
use standard formatting


Line 1016: // The array of ReservoirSamples starts right after 
ReservoirSampleState, so we use
that's often done by putting an array of size 1 at the end of the header struct:

ReservoirSample samples[1];

and then you can do things like state.samples[5] = x;

it's convenient and makes it explicit that you have a trailing var-len array.
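
A minimal sketch of that layout, for illustration only. The Sample and SampleState types, their fields, and the helper names are hypothetical stand-ins, since the real ReservoirSample definition is not shown in this thread. Indexing past the declared bound of a trailing size-1 array is a widely used idiom that mainstream compilers tolerate, but it is not strictly sanctioned by the C++ standard.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sample type; stands in for the real ReservoirSample.
struct Sample {
  double val;
  double key;
};

// Header followed by a variable-length run of samples in the same allocation.
struct SampleState {
  int num_samples;
  int capacity;
  Sample samples[1];  // Trailing var-len array; really holds `capacity` entries.

  // Bytes needed for a state that can hold `capacity` samples.
  static size_t StateSize(int capacity) {
    return offsetof(SampleState, samples) + sizeof(Sample) * capacity;
  }
};

// Allocates a zero-filled state with room for `capacity` samples.
// Returns nullptr if the allocation fails. The caller releases it with free().
SampleState* AllocState(int capacity) {
  void* buf = std::calloc(1, SampleState::StateSize(capacity));
  if (buf == nullptr) return nullptr;
  SampleState* state = static_cast<SampleState*>(buf);
  state->num_samples = 0;
  state->capacity = capacity;
  return state;
}
```

With this shape, state->samples[5] = x reads naturally at the call site, and StateSize() gives the byte count to copy when the buffer is resized.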


Line 1025:   int64_t GetNext64(int64_t max) {
while you're at it, this deserves a comment


Line 1033:   // Given a buffer that contains a ReservoirSampleState, resize the 
buffer so that it's
its


Line 1040:   if (new_capacity * 2 >= MAX_CAPACITY) new_capacity = MAX_CAPACITY;
if state->capacity is 10 and max_capacity is 40, this line sets new_capacity to 
40.


Line 1062:   // If the array gets filled due to updates or merges, we 
reallocate a larger buffer to
you should put this (= a brief description of what you're doing) somewhere at 
the beginning of the reservoir sample-related code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 8: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   96.086us   96.086us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   26.629us   26.629us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   68.187us   87.887us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    2s851ms    5s362ms    3.00K          -1   1.95 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  119.540ms  220.191ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    5s876ms    6s254ms    9.00K          -1   2.93 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  127.834ms  146.842ms   98.30M          -1  19.80 MB       32.00 MB  tpcds_10_parquet.benchmark

After: 13.58s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  101.101us  101.101us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   32.296us   32.296us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   83.284us  120.137us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    3s190ms    6s555ms    3.00K          -1   1.96 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  247.897ms  497.280ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    7s370ms    8s460ms    9.00K          -1   4.71 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  111.721ms  122.306ms   98.30M          -1  19.94 MB       32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1  288.125us  288.125us   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    9.358ms   10.982ms   14.82K      15.21K  19.82 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  129.832us  154.953us   15.62K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   11.086ms   13.102ms   15.62K      15.21K   9.49 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   40.154ms   50.220ms  183.59K     183.59K   2.94 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 958: ReservoirSampleState
> Because this is not a vector or a list, it's state for reservoir sampling w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 958: ReservoirSampleState
> Why do you think a vector-like interface is not good? It seems simple and i
Because this is not a vector or a list, it's state for reservoir sampling which 
contains number of samples, the source size, a RNG, *and* a list of samples. 
Putting methods on this that make it look like a vector is confusing. This 
isn't even a struct anymore IMO.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#7).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   96.086us   96.086us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   26.629us   26.629us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   68.187us   87.887us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    2s851ms    5s362ms    3.00K          -1   1.95 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  119.540ms  220.191ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    5s876ms    6s254ms    9.00K          -1   2.93 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  127.834ms  146.842ms   98.30M          -1  19.80 MB       32.00 MB  tpcds_10_parquet.benchmark

After: 13.58s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  101.101us  101.101us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   32.296us   32.296us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   83.284us  120.137us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    3s190ms    6s555ms    3.00K          -1   1.96 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  247.897ms  497.280ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    7s370ms    8s460ms    9.00K          -1   4.71 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  111.721ms  122.306ms   98.30M          -1  19.94 MB       32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1  288.125us  288.125us   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    9.358ms   10.982ms   14.82K      15.21K  19.82 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  129.832us  154.953us   15.62K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   11.086ms   13.102ms   15.62K      15.21K   9.49 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   40.154ms   50.220ms  183.59K     183.59K   2.94 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 156 insertions(+), 37

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 918: const static int INIT_CAPACITY = 16;
 : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * 
NUM_SAMPLES_PER_BUCKET;
> Please add a comment. It's not clear why all of these are grouped together 
Done


PS6, Line 958: ReservoirSampleState
> after thinking more about the casing comments, I came to the conclusion tha
Why do you think a vector-like interface is not good? It seems simple and 
intuitive. Each operation does exactly what the vector equivalent does, with 
the exception of push_back(), which does not resize automatically. Should we 
get rid of begin() and end()? What should they be called?


PS6, Line 1285:   nth_element(src->begin(), mid_point, src->end(), 
SampleValLess);
> nice
thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(3 comments)

I'm still not convinced the new paths are necessarily covered, e.g. the case I 
mentioned previously where you need to merge and there are different sized 
inputs.

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 918: const static int INIT_CAPACITY = 16;
 : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * 
NUM_SAMPLES_PER_BUCKET;
Please add a comment. It's not clear why all of these are grouped together 
anymore. The first two are only relevant to histograms. These two are about 
capacity for anything using ResSampling. MAX_NUM_SAMPLES should probably also 
be a capacity now, e.g. MAX_CAPACITY.


PS6, Line 958: ReservoirSampleState
after thinking more about the casing comments, I came to the conclusion that I 
don't think trying to make this look like a std::vector is even the best 
interface.

I'd prefer if the methods were named non-std-vector-like names, e.g. 

GetSample(idx)
AddSample(s)
GetStateSize()


PS6, Line 1285:   nth_element(src->begin(), mid_point, src->end(), 
SampleValLess);
nice


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: if (state->at(i)->key >= 0) continue;
> There is a comment in the code about upgrading to mersenne twister, I think
I'm satisfied with this answer


PS5, Line 1122: te->num_samples;
> The point of the big comment above is that this is not exact, and we're app
I'm satisfied with this answer


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1   96.086us   96.086us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   26.629us   26.629us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   68.187us   87.887us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    2s851ms    5s362ms    3.00K          -1   1.95 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  119.540ms  220.191ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    5s876ms    6s254ms    9.00K          -1   2.93 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  127.834ms  146.842ms   98.30M          -1  19.80 MB       32.00 MB  tpcds_10_parquet.benchmark

After: 13.58s
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
06:AGGREGATE       1  101.101us  101.101us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE        1   32.296us   32.296us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE       3   83.284us  120.137us        3           1  44.00 KB       10.00 MB
04:AGGREGATE       3    3s190ms    6s555ms    3.00K          -1   1.96 GB      128.00 MB  FINALIZE
03:EXCHANGE        3  247.897ms  497.280ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE       3    7s370ms    8s460ms    9.00K          -1   4.71 GB      128.00 MB  STREAMING
00:SCAN HDFS       3  111.721ms  122.306ms   98.30M          -1  19.94 MB       32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:
Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------
04:EXCHANGE        1  288.125us  288.125us   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    9.358ms   10.982ms   14.82K      15.21K  19.82 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  129.832us  154.953us   15.62K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   11.086ms   13.102ms   15.62K      15.21K   9.49 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   40.154ms   50.220ms  183.59K     183.59K   2.94 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 153 insertions(+), 37

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: int r = rand() % state->num_samples;
> I know you didn't author this line, but could you switch to a better PRNG t
I think this should be addressed in a separate patch. I think we would have to 
benchmark the change, etc.
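
For reference, a minimal sketch of what the suggested switch could look like with the standard <random> facilities. This is not part of the patch (the change was deferred), the BoundedRng name is hypothetical, and whether it would be fast enough for this code path is exactly what the benchmark mentioned above would have to show.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: draws uniformly distributed integers in [0, max)
// from a Mersenne Twister engine instead of using rand() % max.
class BoundedRng {
 public:
  explicit BoundedRng(uint64_t seed) : gen_(seed) {}

  // Returns a value in [0, max). Requires max > 0.
  int64_t Next(int64_t max) {
    std::uniform_int_distribution<int64_t> dist(0, max - 1);
    return dist(gen_);
  }

 private:
  std::mt19937_64 gen_;
};
```

Unlike rand() % max, the distribution object avoids modulo bias when max does not evenly divide the generator's range.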


PS5, Line 1122: ((double) state->source_size - r) / state->source_size
> I'm not yet convinced this is correct. I'm not convinced it is correct in H
I don't think this is correct either. The distribution of keys will not be the 
same as what's described in the blog post: 
https://gregable.com/2007/10/reservoir-sampling.html
We should be generating a random number between 0 and 1 in update(), but we are 
"simulating" this here (see comment on line 1107), which gives us an incorrect 
distribution.

I don't think increasing the denominator by 1 as you suggested would fix the 
problem.

About your second point, in our case the weight k is always 1, so we don't need 
to raise r to some power.

I think the algorithm should be addressed in a separate patch.
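
For readers following the discussion, here is a minimal sketch of the key-based scheme in the unweighted case (k = 1), where each item's key is simply a uniform draw r and the n largest keys are kept. It illustrates the general technique only. It is not the code in aggregate-functions-ir.cc, and the function name and signature are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <random>
#include <utility>
#include <vector>

// Keeps the `n` items whose random keys are largest. When every key is an
// independent uniform draw from [0, 1), each size-n subset is equally likely.
std::vector<int> SampleByRandomKey(const std::vector<int>& source, size_t n,
                                   unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> key_dist(0.0, 1.0);
  // Min-heap ordered by key, so the smallest retained key is on top.
  using Keyed = std::pair<double, int>;
  std::priority_queue<Keyed, std::vector<Keyed>, std::greater<Keyed>> heap;
  for (int item : source) {
    double key = key_dist(gen);  // Weight 1, so the key is r^(1/1) = r.
    if (heap.size() < n) {
      heap.emplace(key, item);
    } else if (!heap.empty() && key > heap.top().first) {
      heap.pop();  // Evict the item with the smallest key.
      heap.emplace(key, item);
    }
  }
  std::vector<int> sample;
  while (!heap.empty()) {
    sample.push_back(heap.top().second);
    heap.pop();
  }
  return sample;
}
```

Because the keys are drawn at update time here, this differs from the "simulated" keys discussed above, which is the part in question.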


PS5, Line 1282: sort
> I know this also is present in HEAD, but it can be http://en.cppreference.c
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-19 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS5, Line 1282: sort
I know this also is present in HEAD, but it can be 
http://en.cppreference.com/w/cpp/algorithm/nth_element
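
A minimal sketch of the idea, using plain doubles rather than the ReservoirSample type from the patch. MedianByNthElement is a hypothetical name. std::nth_element only partially orders the range, so it runs in linear time on average instead of the O(n log n) of a full sort.

```cpp
#include <algorithm>
#include <vector>

// Returns the element that would sit at index size / 2 if `vals` were fully
// sorted (the upper median for even sizes). Requires a non-empty input.
// Takes the vector by value so the caller's data is left untouched.
double MedianByNthElement(std::vector<double> vals) {
  auto mid = vals.begin() + vals.size() / 2;
  std::nth_element(vals.begin(), mid, vals.end());
  return *mid;
}
```

For example, MedianByNthElement({5, 1, 4, 2, 3}) returns 3.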


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-18 Thread Mostafa Mokhtar (Code Review)

Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1024: new_size 
> 2x works for me but:
With 10x growth, adding a couple more rows to the table can take the memory 
requirements for a query from 3GB to 30GB, which might not be acceptable. 
Hash tables used for joins behave similarly: they start small and double in 
size as needed. 
If the goal is to improve performance of APPX_MEDIAN then we should be looking 
at more than just the memory curve.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS5, Line 1122: ((double) state->source_size - r) / state->source_size
I'm not yet convinced this is correct. I'm not convinced it is correct in HEAD.

First, a nit: I think the denominator should be increased by 1. (I think the 
expected value of the maximum of two independent random variables over the 
uniform distribution on (0,1) is 2/3, not 3/4)

Second, while I understand that this produces values in the given range, I 
don't think I understand how this implements the algorithm mentioned in the .h 
file, in which values of weight k are mapped to r^(1/k), where r is uniform 
random from (0,1).
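
The expected-value claim in the nit above can be checked with a short derivation. It also shows that, for an integer weight k, the key r^(1/k) has the same distribution as the maximum of k independent uniform draws. This is a worked calculation only; it makes no claim about what the expression on line 1122 computes.

```latex
% U_1, ..., U_n are independent and uniform on (0,1); M_n is their maximum.
\[
  P(M_n \le x) = x^n \quad (0 \le x \le 1),
  \qquad
  E[M_n] = \int_0^1 x \cdot n x^{n-1} \, dx = \frac{n}{n+1},
  \qquad
  E[M_2] = \frac{2}{3}.
\]
% Key r^{1/k} with r uniform on (0,1) and integer weight k >= 1.
\[
  P\bigl(r^{1/k} \le x\bigr) = P\bigl(r \le x^k\bigr) = x^k,
  \qquad
  E\bigl[r^{1/k}\bigr] = \frac{k}{k+1}.
\]
```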


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: int r = rand() % state->num_samples;
I know you didn't author this line, but could you switch to a better PRNG than 
rand, like 
http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine?
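
A minimal sketch of what replacing `rand() % n` with a Mersenne Twister engine could look like. The helper name RandomIndex is hypothetical, and where the engine would live and how it would be seeded in the aggregate function state are left open here.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns a uniformly distributed index in [0, n).
// std::uniform_int_distribution takes an inclusive range, hence n - 1.
int64_t RandomIndex(std::mt19937& rng, int64_t n) {
  assert(n > 0);
  std::uniform_int_distribution<int64_t> dist(0, n - 1);
  return dist(rng);
}
```

Besides the better generator, the distribution object avoids the modulo bias that `rand() % n` has whenever n does not evenly divide the generator's range.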


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.
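
The growth policy described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the patch: the constant names and the function NextCapacity are hypothetical, and only the values 16 and 20,000 come from this commit message.

```cpp
#include <cassert>

// Assumed constants, using the numbers quoted in the commit message.
constexpr int kInitialCapacity = 16;
constexpr int kMaxNumSamples = 20000;

// Hypothetical helper: returns the capacity to grow to once a buffer holding
// 'capacity' elements is full.
int NextCapacity(int capacity) {
  assert(capacity > 0 && capacity < kMaxNumSamples);
  int new_capacity = capacity * 2;
  // Clamp the last step so the buffer never exceeds the old fixed size.
  if (new_capacity > kMaxNumSamples) new_capacity = kMaxNumSamples;
  // Growth must always make progress.
  assert(capacity < new_capacity);
  return new_capacity;
}
```

Under these assumptions a group that fills its reservoir goes through 11 reallocations (16, 32, ..., 16384, then 20000), while a group with 16 or fewer values never reallocates.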

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added; the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on a release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE      1   96.086us   96.086us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE       1   26.629us   26.629us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE      3   68.187us   87.887us        3           1  44.00 KB       10.00 MB
04:AGGREGATE      3    2s851ms    5s362ms    3.00K          -1   1.95 GB      128.00 MB  FINALIZE
03:EXCHANGE       3  119.540ms  220.191ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE      3    5s876ms    6s254ms    9.00K          -1   2.93 GB      128.00 MB  STREAMING
00:SCAN HDFS      3  127.834ms  146.842ms   98.30M          -1  19.80 MB       32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE      1  101.101us  101.101us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE       1   32.296us   32.296us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE      3   83.284us  120.137us        3           1  44.00 KB       10.00 MB
04:AGGREGATE      3    3s190ms    6s555ms    3.00K          -1   1.96 GB      128.00 MB  FINALIZE
03:EXCHANGE       3  247.897ms  497.280ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE      3    7s370ms    8s460ms    9.00K          -1   4.71 GB      128.00 MB  STREAMING
00:SCAN HDFS      3  111.721ms  122.306ms   98.30M          -1  19.94 MB       32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
04:EXCHANGE       1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE      3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE       3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE      3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS      3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
04:EXCHANGE       1  288.125us  288.125us   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE      3    9.358ms   10.982ms   14.82K      15.21K  19.82 MB       10.00 MB  FINALIZE
02:EXCHANGE       3  129.832us  154.953us   15.62K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE      3   11.086ms   13.102ms   15.62K      15.21K   9.49 MB       10.00 MB  STREAMING
00:SCAN HDFS      3   40.154ms   50.220ms  183.59K     183.59K   2.94 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 147 insertions(+), 34

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6025/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1025:   // can fit 10 times as many samples, up to a maximum of 
MAX_NUM_SAMPLES. We do not
> update references to '10'
Done


Line 1026:   // allocate room for MAX_NUM_SAMPLES samples up front in order to 
conserve memory.
> in order to -> to
Done


Line 1031:   if (new_capacity * 2 >= MAX_NUM_SAMPLES) new_capacity = 
MAX_NUM_SAMPLES;
> DCHECK_LT(state->capacity, new_capacity);
Done


Line 1051:   // ReservoirSamples. Initially, the array has enough room for 
MAX_NUM_SAMPLES / 1000
> update to reflect the new initial size
Done


Line 1063:   *state = ReservoirSampleState();
> It would be clearer what's happening if we added a constructor for Reservoi
Done


Line 1076:   // If the container is full, increase it's size by 10x.
> update 10
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 4:

(6 comments)

I'm pretty happy with this patch.

http://gerrit.cloudera.org:8080/#/c/6025/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1025:   // can fit 10 times as many samples, up to a maximum of 
MAX_NUM_SAMPLES. We do not
update references to '10'


Line 1026:   // allocate room for MAX_NUM_SAMPLES samples up front in order to 
conserve memory.
in order to -> to


Line 1031:   if (new_capacity * 2 >= MAX_NUM_SAMPLES) new_capacity = 
MAX_NUM_SAMPLES;
DCHECK_LT(state->capacity, new_capacity);

in case someone decides to significantly muck with the sampling constants such 
that this can overflow


Line 1051:   // ReservoirSamples. Initially, the array has enough room for 
MAX_NUM_SAMPLES / 1000
update to reflect the new initial size


Line 1063:   *state = ReservoirSampleState();
It would be clearer what's happening if we added a constructor for 
ReservoirSampleState that initializes the struct fields. Right now, it appears 
that num_samples and source_size could be uninitialized.


Line 1076:   // If the container is full, increase it's size by 10x.
update 10
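
The constructor suggested for line 1063 above might look like the sketch below. It is a simplified stand-in, not the struct from the patch: only the field names num_samples, source_size and capacity appear in this review, and their types and the zero initial values are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the struct under review. The real struct carries
// more members, which are omitted here rather than guessed.
struct ReservoirSampleState {
  int num_samples;      // samples currently held
  int64_t source_size;  // values seen so far
  int capacity;         // samples the current buffer can hold

  // Every listed field is given an explicit value, so a reader of
  // `*state = ReservoirSampleState();` can see that nothing is left
  // uninitialized.
  ReservoirSampleState() : num_samples(0), source_size(0), capacity(0) {}
};
```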


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1024: new_size 
> Queries with a large number buckets will run out of memory much quicker tha
2x works for me but:

For buckets with many samples you spend more time reallocating, and the peak 
memory consumption can be higher (space for 1.5 * MAX_NUM_SAMPLES).

See the experiment Taras posted in the commit msg. I'm fine with the 2x 
behavior, just pointing out the tradeoffs.
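
The reallocation side of this tradeoff can be made concrete with a small simulation. It is a sketch under stated assumptions: the names GrowthCost and SimulateGrowth are hypothetical, each resize is assumed to copy the whole full buffer, and only copy counts are modeled, not peak memory.

```cpp
#include <cassert>

struct GrowthCost {
  int reallocations;     // number of times the buffer is reallocated
  long elements_copied;  // total elements copied across all reallocations
};

// Simulates filling a buffer up to max_capacity, growing by 'factor' each
// time it becomes full. All parameters are illustrative.
GrowthCost SimulateGrowth(int initial_capacity, int factor, int max_capacity) {
  assert(initial_capacity > 0 && factor > 1);
  GrowthCost cost{0, 0};
  long capacity = initial_capacity;
  while (capacity < max_capacity) {
    cost.elements_copied += capacity;  // the full buffer is copied on resize
    capacity *= factor;
    if (capacity > max_capacity) capacity = max_capacity;
    ++cost.reallocations;
  }
  return cost;
}
```

With the numbers from the commit messages, `SimulateGrowth(16, 2, 20000)` gives 11 reallocations and 32,752 copied elements, while `SimulateGrowth(20, 10, 20000)` gives 3 reallocations and 2,220 copied elements, for a group that fills its reservoir.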


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added; the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on a release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE      1   96.086us   96.086us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE       1   26.629us   26.629us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE      3   68.187us   87.887us        3           1  44.00 KB       10.00 MB
04:AGGREGATE      3    2s851ms    5s362ms    3.00K          -1   1.95 GB      128.00 MB  FINALIZE
03:EXCHANGE       3  119.540ms  220.191ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE      3    5s876ms    6s254ms    9.00K          -1   2.93 GB      128.00 MB  STREAMING
00:SCAN HDFS      3  127.834ms  146.842ms   98.30M          -1  19.80 MB       32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
06:AGGREGATE      1  101.101us  101.101us        1           1  28.00 KB        -1.00 B  FINALIZE
05:EXCHANGE       1   32.296us   32.296us        3           1         0        -1.00 B  UNPARTITIONED
02:AGGREGATE      3   83.284us  120.137us        3           1  44.00 KB       10.00 MB
04:AGGREGATE      3    3s190ms    6s555ms    3.00K          -1   1.96 GB      128.00 MB  FINALIZE
03:EXCHANGE       3  247.897ms  497.280ms    9.00K          -1         0              0  HASH(c1)
01:AGGREGATE      3    7s370ms    8s460ms    9.00K          -1   4.71 GB      128.00 MB  STREAMING
00:SCAN HDFS      3  111.721ms  122.306ms   98.30M          -1  19.94 MB       32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
04:EXCHANGE       1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE      3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE       3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE      3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS      3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:
Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
04:EXCHANGE       1  288.125us  288.125us   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE      3    9.358ms   10.982ms   14.82K      15.21K  19.82 MB       10.00 MB  FINALIZE
02:EXCHANGE       3  129.832us  154.953us   15.62K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE      3   11.086ms   13.102ms   15.62K      15.21K   9.49 MB       10.00 MB  STREAMING
00:SCAN HDFS      3   40.154ms   50.220ms  183.59K     183.59K   2.94 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added; the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
04:EXCHANGE       1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE      3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE       3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE      3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS      3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:

Operator     #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------
04:EXCHANGE       1    1.869ms    1.869ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE      3   34.225ms   37.978ms   14.82K      15.21K  20.67 MB       10.00 MB  FINALIZE
02:EXCHANGE       3  860.854us    1.024ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE      3   74.048ms   93.605ms   15.60K      15.21K   9.99 MB       10.00 MB  STREAMING
00:SCAN HDFS      3   68.162ms   78.181ms  183.59K     183.59K   3.84 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1036:   state->max_num_samples = new_size;
> The previous state gets copied in ReallocBuffer. The only thing we need to 
Ah, didn't remember the details of reallocate. Comment will be helpful, thanks.


PS2, Line 1167: (dst->num_samples < MAX_NUM_SAMPLES
> If DST does not have enough capacity for MAX_NUM_SAMPLES, it will be resize
I see, I missed that call before. Thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:

Operator   #Hosts   Avg Time  Max Time#Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
04:EXCHANGE 11.869ms   1.869ms   14.82K  15.21K 0   
 -1.00 B  UNPARTITIONED
03:AGGREGATE3   34.225ms  37.978ms   14.82K  15.21K  20.67 MB   
10.00 MB  FINALIZE
02:EXCHANGE 3  860.854us   1.024ms   15.60K  15.21K 0   
   0  HASH(ss_customer_sk)
01:AGGREGATE3   74.048ms  93.605ms   15.60K  15.21K   9.99 MB   
10.00 MB  STREAMING
00:SCAN HDFS3   68.162ms  78.181ms  183.59K 183.59K   3.84 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:

Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------
04:EXCHANGE        1    1.869ms    1.869ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3   34.225ms   37.978ms   14.82K      15.21K  20.67 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  860.854us    1.024ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   74.048ms   93.605ms   15.60K      15.21K   9.99 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   68.162ms   78.181ms  183.59K     183.59K   3.84 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 141 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

(17 comments)

Marcel, I will run the targeted perf query later today.

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 963: max_num_samples
> confusing to have MAX_NUM_SAMPLES as well. maybe this should be capacity
Done


PS2, Line 973: { return begin() + idx; }
> it'd be good to add bounds checks w/ dchecks
Done


PS2, Line 984: max_num_samples
> capacity
Done


Line 996:   bool IsFull() const {
> dcheck that num_samples <= capacity, i.e. that it is NOT over
Done


PS2, Line 990:   size_t byte_size() const {
             :     return byte_size(max_num_samples);
             :   }
             : 
             :   // Returns true if the array of ReservoirSamples that follows this struct in memory is
             :   // full, and no more elements can be pushed back without resizing.
             :   bool IsFull() const {
             :     return num_samples == max_num_samples;
             :   }
> i think both of these can be one line
One lined one of them, added a DCHECK to the other.
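
The bounds and invariant checks requested above can be sketched as follows. This is a minimal illustration, not the code from the patch: SampleArray, kCapacity, and the use of the standard assert() in place of Impala's DCHECK macros are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the state discussed in the review:
// a fixed array of doubles with a sample count and a capacity.
struct SampleArray {
  static constexpr int kCapacity = 8;  // assumed value, for illustration only
  int num_samples = 0;
  double data[kCapacity] = {};

  // Debug-only bounds check: idx must refer to an element already stored.
  double* at(int idx) {
    assert(idx >= 0 && idx < num_samples);
    return data + idx;
  }

  // Debug-only invariant check: the count must never exceed the capacity.
  bool IsFull() const {
    assert(num_samples <= kCapacity);
    return num_samples == kCapacity;
  }

  // Appends one value; the caller is expected to have checked IsFull() first.
  void push_back(double v) {
    assert(!IsFull());
    data[num_samples++] = v;
  }
};
```

As with DCHECK, these assertions are compiled out of release builds (here, when NDEBUG is defined), so they document and enforce the invariants in debug builds without adding cost to production code.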


PS2, Line 1001: // make this const
> ?
Done


PS2, Line 1018: resizeReservoirSampleState
> nit: inconsistent casing in fn names
Made this one start with a capital letter, but other ones like begin() should 
probably start with a lowercase letter.


PS2, Line 1021: N_
> NUM
Done


PS2, Line 1024: new_size 
> new_capacity to avoid conflating with memory sizes
Done


PS2, Line 1024: new_size 
> Queries with a large number buckets will run out of memory much quicker tha
Done


PS2, Line 1024:   int new_size = state->max_num_samples * 10;
  :   DCHECK_EQ(state->max_num_samples, state->num_samples);
  :   DCHECK_LE(new_size, MAX_NUM_SAMPLES);
> Looks like it'll dcheck when capacity reaches MAX_NUM_SAMPLES. Shouldn't th
In this implementation it couldn't DCHECK, because we would start with capacity 
20, then go to 200, 2000, and finally 20,000. This is now changed because 
Mostafa suggested doubling instead.
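
The difference between the two growth schedules can be sketched as below. This is a minimal illustration under stated assumptions: NextCapacity, ResizesToMax, and kMaxNumSamples are hypothetical names, and only the 20,000-element maximum and the 10x and 2x factors come from the discussion. Starting at 20, multiplying by 10 lands exactly on 20,000, whereas doubling reaches 10,240 and then overshoots to 20,480, so a doubling schedule needs the clamp shown here to stay within the maximum.

```cpp
#include <algorithm>
#include <cassert>

// Maximum number of samples per state, as stated in the change description.
constexpr int kMaxNumSamples = 20000;

// Returns the capacity to use after a full buffer of 'capacity' elements is
// grown by 'growth_factor', clamped so it never exceeds kMaxNumSamples.
inline int NextCapacity(int capacity, int growth_factor) {
  assert(capacity > 0 && capacity < kMaxNumSamples);
  assert(growth_factor > 1);
  return std::min(capacity * growth_factor, kMaxNumSamples);
}

// Counts the resizes needed to grow from 'initial' to kMaxNumSamples.
inline int ResizesToMax(int initial, int growth_factor) {
  int resizes = 0;
  for (int c = initial; c < kMaxNumSamples; c = NextCapacity(c, growth_factor)) {
    ++resizes;
  }
  return resizes;
}
```

With an initial capacity of 20, ResizesToMax(20, 10) is 3 and ResizesToMax(20, 2) is 10, which is the trade-off between fewer reallocations and less over-allocation per grouping key that the reviewers are weighing.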


PS2, Line 1036:   state->max_num_samples = new_size;
> why don't any other fields need to be initialized? e.g. the rng will be gar
The previous state gets copied in ReallocBuffer. The only thing we need to 
change is the capacity. Added comment.
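
Why only the capacity needs updating after a reallocation can be sketched as follows. This is a simplified illustration, not the patch itself: State, Sample, AllocState, and GrowState are hypothetical names, and std::realloc stands in for the ReallocBuffer() helper mentioned above, on the assumption that it preserves the old contents in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for the types discussed in the review.
struct Sample {
  double val;
  double key;
};

// Header that is followed in the same allocation by 'capacity' Samples.
struct State {
  int num_samples;
  int capacity;
  int64_t source_size;

  Sample* begin() { return reinterpret_cast<Sample*>(this + 1); }

  static std::size_t ByteSize(int capacity) {
    return sizeof(State) + sizeof(Sample) * static_cast<std::size_t>(capacity);
  }
};

// Allocates an empty state with room for 'capacity' samples.
// Returns nullptr if the allocation fails.
inline State* AllocState(int capacity) {
  void* mem = std::malloc(State::ByteSize(capacity));
  if (mem == nullptr) return nullptr;
  State* s = static_cast<State*>(mem);
  s->num_samples = 0;
  s->capacity = capacity;
  s->source_size = 0;
  return s;
}

// Grows the allocation. realloc copies the old bytes (header and samples), so
// the only field that must change is 'capacity'. On failure the original block
// is left untouched and nullptr is returned.
inline State* GrowState(State* s, int new_capacity) {
  assert(new_capacity > s->capacity);
  void* mem = std::realloc(s, State::ByteSize(new_capacity));
  if (mem == nullptr) return nullptr;
  State* grown = static_cast<State*>(mem);
  grown->capacity = new_capacity;
  return grown;
}
```

Note that the caller must keep using the pointer returned by GrowState(), since the block may have moved; this is the same reason the StringVal pointer has to be updated after a resize in the real code.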


PS2, Line 1048: size
> again I'd vote for capacity
I got rid of this variable.


PS2, Line 1058:   *state = ReservoirSampleState();
> this looks like it'll initialize the memory properly. I think we'd probably
We don't need to do that on line 1035 because everything gets copied over in 
ReallocBuffer()


PS2, Line 1084:   ++state->source_size;
> I think this gets lost when you allocate a new State.
As mentioned above, everything is copied in realloc


PS2, Line 1152: ReservoirSampleMerge
> I think this algorithm will require some changes to handle varying length S
The capacity shouldn't matter in this algorithm. This algorithm only cares 
about num samples in src and dst states.


PS2, Line 1167: (dst->num_samples < MAX_NUM_SAMPLES
> I don't think you can rely on this anymore, dst may not have capacity MAX_N
If DST does not have enough capacity for MAX_NUM_SAMPLES, it will be resized on 
line 1170
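
The point that merging depends on the number of samples rather than on the destination's capacity can be sketched as below. This is a deliberately partial illustration: AppendUntilMax is a hypothetical helper, std::vector stands in for the hand-managed buffer, and the random-replacement step that a reservoir merge applies once the destination is full is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Maximum number of samples per state, as stated in the change description.
constexpr std::size_t kMaxNumSamples = 20000;

// Copies samples from 'src' into 'dst' while 'dst' holds fewer than
// kMaxNumSamples, growing 'dst' (doubling, clamped to the maximum) whenever it
// is full. Returns the index of the first 'src' sample that was not appended;
// a full reservoir merge would handle the remaining samples by random
// replacement, which this sketch leaves out.
inline std::size_t AppendUntilMax(std::vector<double>* dst,
                                  const std::vector<double>& src) {
  std::size_t i = 0;
  while (i < src.size() && dst->size() < kMaxNumSamples) {
    if (dst->size() == dst->capacity()) {
      std::size_t grown = std::max<std::size_t>(dst->capacity() * 2, 16);
      dst->reserve(std::min(grown, kMaxNumSamples));
    }
    dst->push_back(src[i++]);
  }
  return i;
}
```

The loop condition only looks at dst->size(), so a destination that starts with a small capacity behaves the same as one pre-allocated at the maximum; the capacity affects only when a resize happens.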


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-17 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 2:

Taras, why don't you run a targeted perf query to measure runtime overhead of 
2x vs. 10x resizing.

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:

Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------
04:EXCHANGE        1    1.869ms    1.869ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3   34.225ms   37.978ms   14.82K      15.21K  20.67 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  860.854us    1.024ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   74.048ms   93.605ms   15.60K      15.21K   9.99 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   68.162ms   78.181ms  183.59K     183.59K   3.84 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 136 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB.

Summary before:

Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------
04:EXCHANGE        1    5.856ms    5.856ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3    3s721ms    3s789ms   14.82K      15.21K   4.94 GB       10.00 MB  FINALIZE
02:EXCHANGE        3  139.276ms  157.753ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3    2s851ms    3s026ms   15.60K      15.21K   5.29 GB       10.00 MB  STREAMING
00:SCAN HDFS       3   24.245ms   35.727ms  183.59K     183.59K   4.60 MB      384.00 MB  tpcds.store_sales

Summary after:

Operator      #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------
04:EXCHANGE        1    1.869ms    1.869ms   14.82K      15.21K         0        -1.00 B  UNPARTITIONED
03:AGGREGATE       3   34.225ms   37.978ms   14.82K      15.21K  20.67 MB       10.00 MB  FINALIZE
02:EXCHANGE        3  860.854us    1.024ms   15.60K      15.21K         0              0  HASH(ss_customer_sk)
01:AGGREGATE       3   74.048ms   93.605ms   15.60K      15.21K   9.99 MB       10.00 MB  STREAMING
00:SCAN HDFS       3   68.162ms   78.181ms  183.59K     183.59K   3.84 MB      384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 136 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 136 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 146: // We assume that the dst buffer has already been allocated earlier. 
This increases
> can simplify with:
Done


Line 150:   uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len);
> DCHECK(dst->ptr != NULL);
Done


Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;
> NUM_SAMPLES -> MAX_NUM_SAMPLES
Done


Line 974:   ReservoirSample* at(int64_t idx) {
> single line, const function
Done


Line 979:   void push_back(ReservoirSample s) {
> use const&, otherwise 's' is copied twice
Done


Line 982: ReservoirSample* arr = reinterpret_cast<ReservoirSample*>(begin());
> no need to reinterpret_cast
Done


Line 987:   ReservoirSample* begin() {
> const function (and elsewhere)
Done


Line 994:   ReservoirSample* end() {
> single line
Done


Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* 
buffer) {
> Why not move this function into ReservoirSampleState and make ReservoirSamp
Unfortunately it's not possible to encapsulate because we also need to update 
the pointer in the StringVal buffer. It would be more complicated and error 
prone to pass it as one of parameters into this function.


Line 1007:   // can fit 10 times as many samples, up to a maximum of 20,000. We 
do not allocate
> instead of 20,000 use the constant MAX_NUM_SAMPLES
Done


Line 1033:   // If the array gets filled due to updates or merges, we will 
reallocate a larger
> suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_S
Done


Line 1035:   int init_size = (NUM_SAMPLES / 1000);
> It would be nice to have this encapsulated in the ReservoirSampleState as w
As mentioned above, it's not possible to encapsulate cleanly.


Line 1061:   resizeReservoirSampleState(ctx, dst);
> if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may 
I just added a check here, because it's not possible to move the logic as you 
suggested.


Line 1147:   DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState) +
> you can add a helper function to ReservoirSampleState to do this size compu
Done. Also added IsFull().


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 1:

(14 comments)

Did you verify the memory savings with a simple experiment?

http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 146: // We assume that the dst buffer has already been allocated earlier. 
This increases
can simplify with:

Same as AllocBuffer() above but for re-allocating 'dst->ptr'.


Line 150:   uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len);
DCHECK(dst->ptr != NULL);


Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;
NUM_SAMPLES -> MAX_NUM_SAMPLES


Line 974:   ReservoirSample* at(int64_t idx) {
single line, const function


Line 979:   void push_back(ReservoirSample s) {
use const&, otherwise 's' is copied twice
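
The double copy mentioned here can be made visible with a small sketch. Counted, Slots, and g_copies are hypothetical names introduced only to count copies; they are not part of the patch.

```cpp
#include <cassert>

int g_copies = 0;  // counts copy constructions and copy assignments

// Hypothetical element type that records every copy made of it.
struct Counted {
  int v = 0;
  Counted() = default;
  Counted(const Counted& o) : v(o.v) { ++g_copies; }
  Counted& operator=(const Counted& o) {
    v = o.v;
    ++g_copies;
    return *this;
  }
};

// Hypothetical fixed-size container with both push_back signatures.
struct Slots {
  Counted arr[4];
  int n = 0;

  // Pass by value: one copy into the parameter, one into the array.
  void PushByValue(Counted s) {
    assert(n < 4);
    arr[n++] = s;
  }

  // Pass by const reference: only the copy into the array remains.
  void PushByRef(const Counted& s) {
    assert(n < 4);
    arr[n++] = s;
  }
};
```

Calling PushByValue() with an existing object increments g_copies twice, while PushByRef() increments it once. For a small trivially copyable struct a compiler may well optimize the difference away, so this is mainly a matter of idiom.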


Line 982: ReservoirSample* arr = reinterpret_cast<ReservoirSample*>(begin());
no need to reinterpret_cast


Line 987:   ReservoirSample* begin() {
const function (and elsewhere)


Line 994:   ReservoirSample* end() {
single line


Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* 
buffer) {
Why not move this function into ReservoirSampleState and make 
ReservoirSampleState auto-resize itself during push_back()? You can add ctx as 
a param to push_back.

That way all the logic is encapsulated inside ReservoirSampleState and not 
throughout various agg fn implementations


Line 1007:   // can fit 10 times as many samples, up to a maximum of 20,000. We 
do not allocate
instead of 20,000 use the constant MAX_NUM_SAMPLES


Line 1033:   // If the array gets filled due to updates or merges, we will 
reallocate a larger
suggest "we reallocate a larger buffer to hold up to a maximum of 
MAX_NUM_SAMPLES ReservoirSamples."


Line 1035:   int init_size = (NUM_SAMPLES / 1000);
It would be nice to have this encapsulated in the ReservoirSampleState as well.


Line 1061:   resizeReservoirSampleState(ctx, dst);
if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may 
cause a crash

I suggest moving the reallocation logic into the push_back() of 
ReservoirSampleState. You might use the return code of push_back() to signal 
success/failure, so you can handle the failure case here.
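
The suggestion above, a push_back() that reallocates internally and reports failure through its return value, could look roughly like this. It is a sketch of the reviewer's idea only, not what the patch ended up doing: Buffer is a hypothetical stand-in for the StringVal that owns the memory, ReallocFn is a hypothetical hook used to simulate a failed allocation, and an int array replaces the real sample state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a caller-owned buffer handle.
struct Buffer {
  unsigned char* ptr = nullptr;
  std::size_t len = 0;  // size in bytes
};

// Hypothetical allocator hook so a failed reallocation can be simulated.
using ReallocFn = void* (*)(void*, std::size_t);

inline void* DefaultRealloc(void* p, std::size_t n) { return std::realloc(p, n); }
inline void* FailingRealloc(void*, std::size_t) { return nullptr; }

// Appends 'v' to the int array stored in 'buf', doubling the buffer when it is
// full. Returns false, leaving 'buf' and '*num' unchanged, if the reallocation
// fails, so the caller can handle the error instead of using a bad pointer.
inline bool PushBack(Buffer* buf, std::size_t* num, int v,
                     ReallocFn re = DefaultRealloc) {
  std::size_t cap = buf->len / sizeof(int);
  if (*num == cap) {
    std::size_t new_cap = (cap == 0) ? 4 : cap * 2;
    void* p = re(buf->ptr, new_cap * sizeof(int));
    if (p == nullptr) return false;  // old block is still valid; keep buf->ptr
    buf->ptr = static_cast<unsigned char*>(p);
    buf->len = new_cap * sizeof(int);
  }
  reinterpret_cast<int*>(buf->ptr)[(*num)++] = v;
  return true;
}
```

Because the handle is passed in, the function can update buf->ptr itself after a successful resize, and the caller only needs to check the boolean result.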


Line 1147:   DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState) +
you can add a helper function to ReservoirSampleState to do this size 
computation


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 1:

I'll do a CR later tonight or tmr morning.

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-15 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6025

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 20 elements.
Once the buffer becomes full, we allocate room for 200 elements and
copy the original buffer into the new one. We continue increasing the
buffer size this way until the buffer has room for 20,000 elements as
before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 124 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky
