[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 18:

> Uploaded patch set 20.

 > Uploaded patch set 21.

rebased and fixed merge conflict

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#21).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+--+
| cast(0.5 as int) |
+--+
| 0|  1|
+--+

+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5  | 0.6  |
+-+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.64s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-+

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#20).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+--+
| cast(0.5 as int) |
+--+
| 0|  1|
+--+

+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5  | 0.6  |
+-+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.64s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-+

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 18:

(4 comments)

Nope, doesn't solve the DCHECK issue.  I added a DCHECK that the static cast 
back to the target type from double equals the actual value of the target type, 
and it failed.  Issue is real, see the updated comment in 
be/src/runtime/decimal-value.inline.h

Floating point is not so fun, and the conversion really does lose precision.  
Issue is also pre-existing so I'm not trying to fix it now.

http://gerrit.cloudera.org:8080/#/c/5951/18/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1395:  { true, 0, 38, 38 }}}, // This is overflowing, but wasn't 
noticed by the test before
> let's just say IMPALA-4964 rather than refer to what the old code did, and 
Done


PS18, Line 1397: related to this
> related to what?
Done


PS18, Line 1398: IMAPALA
> IMPALA. also combine to one line
Done


Line 1635: TestIsNotNull(c.expr, type);
> mentioned in the other change -- 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#19).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+--+
| cast(0.5 as int) |
+--+
| 0|  1|
+--+

+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5  | 0.6  |
+-+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.64s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-+

[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS2, Line 83: abs
> I tried that and it actually turned out to be a bug.  The places where std:
I'm confused: how is std::abs wrong here for integers? int64_t d is never and 
int on x86-64 linux, right? Also, isn't C's global-namespace abs UB on INT_MIN?


-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 18:

(4 comments)

Does this change solve the DCHECK you were hitting running expr-test?

http://gerrit.cloudera.org:8080/#/c/5951/18/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1395:  { true, 0, 38, 38 }}}, // This is overflowing, but wasn't 
noticed by the test before
let's just say IMPALA-4964 rather than refer to what the old code did, and why 
not combine both lines (we have the same bug in both v1 and v2)?


PS18, Line 1397: related to this
related to what?


PS18, Line 1398: IMAPALA
IMPALA. also combine to one line


Line 1635: TestIsNotNull(c.expr, type);
mentioned in the other change -- 
thanks for finding that. we should probably add 
EXPECT_EQ(result, StringParser::PARSE_SUCCESS);
to the end of TestDecimalValue(). What's happening is the decimal string parser 
is failing (when parsing "NULL"), but when it does that it happens to return a 
DecimalValue with value()=0.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..


Patch Set 2:

since you folded this in to the other change, please abandon this change so 
people don't review it more.

-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The differences with DECIMAL_V1 are:

1. The output type has a minimum scale of 6. This is similar
to MS SQL's behavior which takes the max of 6 and the input
type's scale. We deviate from MS SQL in the output's precision
which is always set to 38. We use the smallest precision which
can store the output. A key insight is that the output of AVG()
is no wider than the inputs. Precision only needs to be adjusted
when the scale is augmented. Using a smaller precision avoids
potential loss of precision in subsequent decimal operations
(e.g. division) if AVG() is a subexpression. Please note that
the output type is different from SUM()/COUNT() as the latter
can have a much larger scale.

2. Due to a minimum of 6 decimal places for the output,
AVG() for decimal values whose whole number part exceeds 32
decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will
always overflow as the scale is augmented to 6. Certain
decimal types which work with AVG() in DECIMAL_V1 no longer
work in DECIMAL_V2.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Reviewed-on: http://gerrit.cloudera.org:8080/6038
Reviewed-by: Dan Hecht 
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 235 insertions(+), 76 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved
  Dan Hecht: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS2, Line 1397: true
> I am missing something too... was the test broken before?
yes, the test was broken before. we do return NULL in these cases.  
Technically, we should return 0.998 in both but due to how we implement MOD, we 
overflow during the calculation. That's a similar problem to IMPALA-4939, but I 
think less important and severe.


Line 1635: TestIsNotNull(c.expr, type);
> I added this because I noticed the non-NULL test cases were passing even wh
thanks for finding that. we should probably add 

EXPECT_EQ(result, StringParser::PARSE_SUCCESS);

to the end of TestDecimalValue(). What's happening is the decimal string parser 
is failing (when parsing "NULL"), but when it does that it happens to return a 
DecimalValue with value()=0.


-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS2, Line 1397: true
> I am missing something too... was the test broken before?
I filed JIRA-4964 to investigate.  Test was broken before and seems to have 
uncovered a bug with modulus.


http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS2, Line 83: abs
> I think we should be consistent and use std::abs() at least in this entire 
I tried that and it actually turned out to be a bug.  The places where std::abs 
and abs are correct turned out to be a lot more subtle than expected.  std::abs 
selects the correct version for floating point but the wrong version for 
integers as abs(INT_MIN) is undefined.


-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#18).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+--+
| cast(0.5 as int) |
+--+
| 0|  1|
+--+

+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5  | 0.6  |
+-+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.64s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-+

[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
..


Patch Set 2:

(3 comments)

I'm folding this into the cast change anyway.

http://gerrit.cloudera.org:8080/#/c/6068/1//COMMIT_MSG
Commit Message:

Line 22: 
> I think this probably should be fixed in this commit.
This was a real problem, found it.


http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS2, Line 1397: true
> Sorry, I may be missing something. Why is this NULL ?
I am missing something too... was the test broken before?


Line 1635: TestIsNotNull(c.expr, type);
I added this because I noticed the non-NULL test cases were passing even when 
the value happened to match, which was the case for some zero values.  Looks 
like something is broken with modulus.


-- 
To view, visit http://gerrit.cloudera.org:8080/6068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs.

2017-02-21 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new patch set (#2).

Change subject: IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from 
upstream docs.
..

IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream
docs.

Hid instances of CM and rewrote for upstream docs when necessary.
This still leaves occurences of CM in the XML, but not in the rendered
documentation. A later project will remove all occurrences of CM from
the XML. This patch set includes fixes made in response to John
Russell's review.

Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d
---
M docs/topics/impala_impala_shell.xml
M docs/topics/impala_proxy.xml
M docs/topics/impala_resource_management.xml
M docs/topics/impala_timeouts.xml
M docs/topics/impala_udf.xml
5 files changed, 116 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/6067/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/17/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS17, Line 1395: N.B. - with full decimal V2 this should evaluate to 1
> How so? We've already added the new V2 result type formula for %, and you g
Old comment, I mistook 38, 38 for 38, 0


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


IMPALA-4934: Disable Kudu OpenSSL initialization

Bumps the Kudu version to include the change to the client
that allows Impala to disable SSL initialization.

In authentication.cc, after Impala initializes OpenSSL,
Impala then disables Kudu's OpenSSL init.

Fixed a python test case that started failing after bumping
the Kudu client version.

Change-Id: I3f13f3af512c6d771979638da593685524c73086
Reviewed-on: http://gerrit.cloudera.org:8080/6056
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M be/src/rpc/authentication.cc
M bin/impala-config.sh
M infra/python/deps/download_requirements
M infra/python/deps/requirements.txt
M tests/query_test/test_kudu.py
5 files changed, 13 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/17/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS17, Line 1395: N.B. - with full decimal V2 this should evaluate to 1
How so? We've already added the new V2 result type formula for %, and you get 
decimal(38, 38) in this case which is consistent with other engines.  what type 
would be a better choice?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 382:   /// Min/max statistics contexts, owned by instances of this class.
they are not owned by the class instances themselves, they're allocated in 
state_->obj_pool()


Line 386:   /// 'min_max_conjunct_ctxs_', which are used when evaluating 
parquet::Statistics.
somewhat incomprehensible comment.

this is used only for a specific function, and then only to avoid the dynamic 
allocation overhead. point this out, right now it seems very mysterious.


Line 476:   Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, 
bool* skip_row_group);
you're not evaluating stats, you're evaluating predicates on the stats


http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: enum class StatsField { MIN, MAX };
move into ColumnStatsBase so it's clear what stats this is referring to.

also, add enums for all fields in Statistics.


Line 114:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats, 
const StatsField& stats_field,
move function bodies into parquet-column-stats-inl.h


http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 297: collectionConjuncts_.clear();
> My reasoning was that any constant expression could be evaluated against th
true, but those should be identical, given that we rewrite constants as 
literals.


http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 85:  * conjuncts, that are passed to the backend and will be evaluated 
against the
"conjuncts that"


Line 154:   private List minMaxPlanConjuncts_ = Lists.newArrayList();
"plan conjuncts" doesn't really mean anything. original conjuncts?


Line 302:* Builds a predicate to evaluate parquet::Statistics by copying 
'inputSlot' into
"to evaluate against"


Line 347:   // Only constant exprs can be evaluated against the 
parquet::Statistics. This
drop "the"


Line 349:   if (!constExpr.isConstant()) continue;
as pointed out before, you shouldn't see non-literal constants here


http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 1047:runtime filters: RF000 -> c_custkey
> I changed the code to track the original predicates and display them instea
extended is fine.

yes, "parquet statistics" is more concrete.


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 1: # Test HDFS scan node.
run this file in extended mode so you see the stats predicates.


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 206: # Test that without expr rewrite and thus without constant folding, 
constant exprs still
what's the point of those tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#17).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+--+
| cast(0.5 as int) |
+--+
| 0|  1|
+--+

+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5  | 0.6  |
+-+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.64s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-+

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/291/

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#7).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 156 insertions(+), 37 del

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#7).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 156 insertions(+), 37 del

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 918: const static int INIT_CAPACITY = 16;
 : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * 
NUM_SAMPLES_PER_BUCKET;
> Please add a comment. It's not clear why all of these are grouped together 
Done


PS6, Line 958: ReservoirSampleState
> after thinking more about the casing comments, I came to the conclusion tha
Why do you think a vector-like interface is not good? It seems simple and 
intuitive. Each operation does exactly what the vector equivalent does, with 
the exception of push_back(), which does not resize automatically. Should we 
get rid of begin() and end()? What should they be called?


PS6, Line 1285:   nth_element(src->begin(), mid_point, src->end(), 
SampleValLess);
> nice
thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] kudu: fix uninitialized variable usage warning

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: kudu: fix uninitialized variable usage warning
..


kudu: fix uninitialized variable usage warning

This fixes a trivial warning of a potential usage of an uninitialized variable
in release builds. If for some reason Kudu sends us an invalid log severity
level, we'll now fall back to INFO instead of reading an uninitialized
severity value.

Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422
Reviewed-on: http://gerrit.cloudera.org:8080/6098
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M be/src/exec/kudu-util.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] kudu: fix uninitialized variable usage warning

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: kudu: fix uninitialized variable usage warning
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
..


Patch Set 2: Code-Review+2

Thank you!

-- 
To view, visit http://gerrit.cloudera.org:8080/6049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] build: don't look in system paths for kudu client

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: build: don't look in system paths for kudu client
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] build: don't look in system paths for kudu client

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: build: don't look in system paths for kudu client
..


build: don't look in system paths for kudu client

On my system which had the Kudu client library installed in /usr,
Impala was picking up the wrong version of the client. This instructs
CMake to only look in the specified toolchain directory to find Kudu.

Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Reviewed-on: http://gerrit.cloudera.org:8080/6097
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M CMakeLists.txt
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/290/

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 6: Code-Review+2

fixed a python test that failed, I had missed it locally because my local 
didn't have a completely fresh environment yet, so the newer python client 
wasn't being used yet.

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+

2017-02-21 Thread Dan Hecht (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6044

to look at the new patch set (#3).

Change subject: IMPALA-4877: fix precedence of unary -/+
..

IMPALA-4877: fix precedence of unary -/+

Currently, expressions such as "-2 / 3" parse as "-(2 / 3)". In
practice this usually doesn't cause differences for most common
expressions. However, it does interact with a heuristic that changes
decimal expression types to double when one operand is non-decimal
(see JIRA). For example, before this fix,

  typeof(2.0/3.0) = DECIMAL
  typeof(-2.0/3.0) = DOUBLE

because the latter expression parsed as "-1 * (2.0 / 3.0)".
With this fix, both expressions result in a DECIMAL.

Technically this is a compatibility breaking change but it seems
like no one would expect the current behavior so I think we should
fix it. Let me know if you disagree.

Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1
---
M be/src/exprs/expr-test.cc
M fe/src/main/cup/sql-parser.cup
2 files changed, 21 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6044/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4877: fix precedence of unary -/+
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6044/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 517: precedence left UMINUS, UPLUS;
> actually, since they're identical, why not just have a single 'unarysign' o
good point. done.


-- 
To view, visit http://gerrit.cloudera.org:8080/6044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Matthew Jacobs (Code Review)
Hello Impala Public Jenkins, Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6056

to look at the new patch set (#6).

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..

IMPALA-4934: Disable Kudu OpenSSL initialization

Bumps the Kudu version to include the change to the client
that allows Impala to disable SSL initialization.

In authentication.cc, after Impala initializes OpenSSL,
Impala then disables Kudu's OpenSSL init.

Fixed a python test case that started failing after bumping
the Kudu client version.

Change-Id: I3f13f3af512c6d771979638da593685524c73086
---
M be/src/rpc/authentication.cc
M bin/impala-config.sh
M infra/python/deps/download_requirements
M infra/python/deps/requirements.txt
M tests/query_test/test_kudu.py
5 files changed, 13 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/6056/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


Patch Set 3: Code-Review+2

ah, i forgot you could steal someone's CR by just posting w/ the same commit ID

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401 [DOCS] Removing 'Cloudera Manager' from upstream docs.

2017-02-21 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new patch set (#2).

Change subject: IMPALA-3401 [DOCS] Removing 'Cloudera Manager' from upstream 
docs.
..

IMPALA-3401 [DOCS] Removing 'Cloudera Manager' from upstream docs.

Used DITA attribute 'audience=hidden' when needed to preserve
information for a future clean-up step. When it made sense,
'Cloudera Manager' was removed and sections were rewritten.
This clean-up task does not completely remove 'Cloudera Manager'
from the upstream docs source XML, but it is now
hidden in these files so it won't show up in the rendered
version. A subsequent cleanup project will remove references
in the XML source.

Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0
---
M docs/topics/impala_admission.xml
M docs/topics/impala_config_performance.xml
M docs/topics/impala_noncm_installation.xml
M docs/topics/impala_prereqs.xml
M docs/topics/impala_tutorial.xml
5 files changed, 72 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/6064/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-21 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 2:  QUERY
> Can you add coverage for filtering on the root level for nested data?
Can you elaborate on how to do this? Wouldn't it only make sense to apply 
filtering on the leave level, since objects at the root level would be complex 
ones? Can you post a sample query and the result you would expect? Thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-21 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#6).

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..

IMPALA-2328: Read support for min/max Parquet statistics

This change adds support for skipping row groups based on Parquet row
group statistics. With this change we only support reading statistics
from Parquet files for numerical types (bool, integer, floating point)
and for simple predicates of the formsor
  , where  is LT, LE, GE, GT, and EQ.

Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exprs/expr.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
29 files changed, 756 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-21 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 4:

(26 comments)

Thanks for the review. Please see PS6.

http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 472:   ScopedBuffer tuple_buffer(scan_node_->mem_tracker());
> no need to allocate this and go through the setup for every single row grou
Done


Line 486:   conjuncts_with_stats.reserve(min_max_conjuncts_ctxs_.size());
> there's too much memory allocation going on for what are essentially static
Done


Line 496: unique_ptr stats(
> there's no need for memory allocation here
Done


Line 501: if (e->function_name() == "lt" || e->function_name() == "le") {
> introduce named constants (in the appropriate class)
Other code that compares function_name follows the same pattern. As discussed 
in person, let's keep this for now and see if we find a nicer abstraction to 
determine when to use min/max values, e.g. monotonically bound predicates.


http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 378:   /// Min/max statistics contexts.
> who owns them?
Done


http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 182:   if (min_max_tuple_id_ > -1) {
> ">" or < don't make sense for tuple ids
Done


http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 40: /// writing parquet files. It provides an interface to populate a 
parquet::Statistics
> this description isn't true anymore, it's now also an intermediate staging 
I removed the expanded abstraction and added a sentence to the comment 
explaining that it can also be used to decode stats.


Line 70:   /// 'nullptr' for unsupported types. The caller is responsible for 
freeing the memory.
> this should be created in an objectpool
Code has been removed.


Line 87:   virtual void WriteMinToBuffer(void* buffer) const = 0;
> remove "tobuffer", that's clear from the parameters. what's unclear is the 
Done


Line 126: max_value_(max_value) {
> formatting
Done


http://gerrit.cloudera.org:8080/#/c/6032/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 208:   // Conjuncts that can be evaluated against the statistics of 
parquet files. Each
> refer to thrift structs explicitly, where appropriate.
Done


Line 209:   // conjunct references the slot in 'min_max_tuple_id' with the same 
index.
> i didn't understand the last sentence.
Changed it.


http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 143:   // List of conjuncts for min/max values, that are used to skip data 
when scanning
> "min/max values" is vague, refer to thrift structs explicitly.
Done


Line 145:   private List minMaxConjunts_ = Lists.newArrayList();
> conjuncts
Done


Line 147:   // Tuple that is used to materialize statistics when scanning 
Parquet files.
> describe what's in it.
Done


Line 290:* Builds a predicate to evaluate statistical values by combining 
'inputSlot',
> make 'statistical values' concrete
Done


Line 291:* 'inputPred' and 'op' into a new predicate, and adding it to 
'minMaxConjunts_'.
> missing other side effects
Done


Line 297: Preconditions.checkState(constExpr.isConstant());
> why not a literal?
My reasoning was that any constant expression could be evaluated against the 
statistics. I added a test to make sure this does indeed work as expected.


Line 303: // Make a new slot from the slot descriptor.
> what do you mean by 'slot'? (slot/slot descriptor are usually synonyms)
I meant slotRef, but the comment looked superfluous and I removed it altogether.


Line 315:* TODO: This method currently doesn't de-dup slots that are 
referenced multiple times.
> resolve
As discussed, lets keep this for now and address it in a subsequent change by 
simplifying redundant exprs. I created IMPALA-4958 to track this.


Line 328:   // We only support slot refs on the left hand side of the 
predicate, a rewriting
> this overlaps with what kuduscannode does, and there's already a similar co
I had talked to Alex about this last week and he suggested to add a general 
rewrite rule in the FE to normalize binary exprs. I removed the branches from 
BinaryPredicate.java that are no longer needed after the normalization. I also 
cleaned up code in KuduScanNode.java that is no longer needed.


Line 343:   {
> { on preceding line
Done


Line 658:   
msg.hdfs_scan_node.setMin_max_tuple_id(minMaxTuple_.getId().asInt());
> check that it's not invalid
How can I check this? The tupleId needs to come from the TupleId generator a

[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-21 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#5).

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..

IMPALA-2328: Read support for min/max Parquet statistics

This change adds support for skipping row groups based on Parquet row
group statistics. With this change we only support reading statistics
from Parquet files for numerical types (bool, integer, floating point)
and for simple predicates of the formsor
  , where  is LT, LE, GE, GT, and EQ.

Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exprs/expr.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
29 files changed, 757 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 5: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/287/

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

2017-02-21 Thread Laurel Hale (Code Review)
Hello Jim Apple,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6049

to look at the new patch set (#2).

Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
..

IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

Rewrote sections to eliminate 'Cloudera Manager' from topics.
Look for subsequent phases to remove remaining instances of CM.

Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
---
M docs/shared/impala_common.xml
M docs/topics/impala_config_options.xml
M docs/topics/impala_upgrading.xml
3 files changed, 6 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6049/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering

2017-02-21 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5904/8/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

Line 1: 
Please add coverage for nested data filtering as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-21 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 2:  QUERY
Can you add coverage for filtering on the root level for nested data?


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 16:

(1 comment)

After fighting the compiler for over an hour to figure out why our operator<< 
was not being called, I found this little gem in glog/logging.h and since 
abandoned getting DCHECK_EQ / GT to work.

/ Build the error message string.
template
std::string* MakeCheckOpString(const t1& v1, const t2& v2, const char* names) {
  // It means that we cannot use stl_logging if compiler doesn't
  // support using expression for operator.
  // TODO(hamaji): Figure out a way to fix.
#if 1
  using ::operator<<;
#endif
  std::strstream ss;
  ss << names << " (" << v1 << " vs. " << v2 << ")";
  return new std::string(ss.str(), ss.pcount());
}

http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG
Commit Message:

Line 64: Query: select sum(cast(l_extendedprice as bigint)) from
> please pare down the commit msg, in particular get rid of the verbatim shel
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG
Commit Message:

Line 64: Query: select sum(cast(l_extendedprice as bigint)) from
please pare down the commit msg, in particular get rid of the verbatim shell 
output (a table with the numbers is sufficient to get the point across).


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+

2017-02-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4877: fix precedence of unary -/+
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6044/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 517: precedence left UMINUS, UPLUS;
actually, since they're identical, why not just have a single 'unarysign' or 
something like that?


-- 
To view, visit http://gerrit.cloudera.org:8080/6044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+

2017-02-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4877: fix precedence of unary -/+
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6044
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

2017-02-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
..


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 64:   batch_start_ptr_(NULL),
use nullptr throughout


Line 273:   MemPool* pool = row_batch->tuple_data_pool();
please consider getting rid of this variable and referring to the rowbatch pool 
directly. there are too many mempools flying around in this code, it's nice to 
have a reminder which one is being used.


Line 440:   eos_ = true;
advance scan_state_?

also, i don't see anything in hdfs-scanner.h that says you can't call getnext() 
when eos() returns true.


Line 445: << "FE should have generated SNAPPY_BLOCKED instead.";
why not check this when creating the stream?


Line 446: 
RETURN_IF_ERROR(UpdateDecompressor(stream_->file_desc()->file_compression));
why is this needed, given that we just started (ie, why not move it into 
initialization)?


Line 459: eos_ = true;
scan_state_ transition?


Line 469: Status HdfsTextScanner::CommitRows(RowBatch* row_batch, int num_rows) 
{
> This loosely corresponds to the previous HdfsScanNode::CommitRows()
how about coalescing that with hdfsscannode::commitrows? at the very least move 
this into hdfsscannode.


Line 485: ExprContext::FreeLocalAllocations(entry.second);
why call this here?


http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 119:   /// regardless if whether it passed the conjuncts.
regardless of whether


Line 124:   /// Only valid to call in scan state FIRST_TUPLE_FOUND or 
PAST_SCAN_RANGE.
thanks for introducing that state variable, it's much clearer than a bunch of 
bools that get passed around.


Line 135:   Status CommitRows(RowBatch* row_batch, int num_rows);
in/out params go last


Line 140:   /// otherwise it will just read num_bytes. If we are reading 
compressed text, num_bytes
if we are reading compressed text, should this even be called (or why is there 
a 'compressed stream' variant)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG
Commit Message:

PS4, Line 25: types
> Done.
I'm not sure how many people deal with numbers larger than 10^32... seems like 
a reasonable limitation.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the u

2017-02-21 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new patch set (#2).

Change subject: IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most 
of these fixes entailed hiding the paragraph where there are mentions of CM and 
writing a replacement paragraph for the upstream docs. This removes the CM 
references from the rendered docs. A 
..

IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager"
removal. Most of these fixes entailed hiding the
paragraph where there are mentions of CM and
writing a replacement paragraph for the upstream
docs. This removes the CM references from the
rendered docs. A subsequent cleanup project will
remove occurrences of CM from the XML. This patch
includes fixes to files in response to John
Russell's first set of review comments.

Change-Id: I4967fae275a8822274aece14a15107237445aba5
---
M docs/topics/impala_isilon.xml
M docs/topics/impala_logging.xml
M docs/topics/impala_s3.xml
M docs/topics/impala_webui.xml
4 files changed, 37 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6070/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4967fae275a8822274aece14a15107237445aba5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 10:

(1 comment)

I am trying to get the DCHECK_EQ to work with int128_t, for which we do provide 
a definition, but gcc keeps complaining about an ambiguous overload.

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
 : // Decimal V2 behavior
 : 
 : // Multiply the double by the scale.
 : //
 : // TODO: this could be made more precise by doing the 
computation in
 : // 64 bit with 128 bit immediate result instead of doing an 
additional
 : // multiply in 53-bit double precision floating point
 : 
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : d = std::round(d);
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Return the rounded integer part.
 : return DecimalValue(static_cast(d));
 :   } else {
 : // TODO: IMPALA-4924: remove DECIMAL V1 code
 : 
 : // Check overflow.
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision - scale);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Multiply the double by the scale.
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : 
 : // Truncate and just take the integer part.
 : return DecimalValue(static_cast(d));
 :   }
> If it won't break DECIMAL_V1, may as well merge the two paths and add the t
It looks like it will work and it doesn't change the speed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


Patch Set 3: Code-Review+1

I updated the commit msg. Looks good to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-21 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..

IMPALA-4959: Avoid picking up the system's boost cmake module

In some systems with an old boost installed system-wide the impala build
would fail with something like:

CMake Error at /usr/lib64/boost/BoostConfig.cmake:64 (get_target_property):
  get_target_property() called with non-existent target
  "boost_thread-shared".
Call Stack (most recent call first):
  toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 
(find_package)
  CMakeLists.txt:116 (find_package)

CMake Error at /usr/lib64/boost/BoostConfig.cmake:72 (get_target_property):
  get_target_property() called with non-existent target
  "boost_thread-shared-debug".
Call Stack (most recent call first):
  toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 
(find_package)
  CMakeLists.txt:116 (find_package)

This because, if it exists, cmake's FindBoost.cmake will look for and
use that module, even though boost's cmake build hasn't been maintained
in years and the impala build is actually configured to not use the
systems boost.

This patch sets the cmake flag Boost_NO_BOOST_CMAKE to ON, making sure
the old cmake module is not picked up.

Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
---
M CMakeLists.txt
1 file changed, 4 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/5994/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 6: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(3 comments)

I'm still not convinced the new paths are necessarily, e.g. the case I 
mentioned previously where you need to merge and there are different sized 
inputs.

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 918: const static int INIT_CAPACITY = 16;
 : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * 
NUM_SAMPLES_PER_BUCKET;
Please add a comment. It's not clear why all of these are grouped together 
anymore. The first two are only relevant to histograms. These two are about 
capacity for anything using ResSampling. MAX_NUM_SAMPLES should probably also 
be a capacity now, e.g. MAX_CAPACITY.


PS6, Line 958: ReservoirSampleState
after thinking more about the casing comments, I came to the conclusion that I 
don't think trying to make this look like a std::vector is even the best 
interface.

I'd prefer if the methods were named non-std-vector-like names, e.g. 

GetSample(idx)
AddSample(s)
GetStateSize()


PS6, Line 1285:   nth_element(src->begin(), mid_point, src->end(), 
SampleValLess);
nice


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-21 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..


Patch Set 2:

Somehow there was a conflicting change from a gerrit that I had abandoned that 
was getting in the way every time. Once I rebased against asf-gerrit/master and 
deleted that commit:

   d 367a690 IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs

I was able to push the new patch set.

-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-21 Thread John Russell (Code Review)
Hello Jim Apple,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6003

to look at the new patch set (#2).

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..

IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

Almost all of the original Impala FAQ material was
Cloudera-themed or commercially oriented. Lots of
answers about the QuickStart VM, Cloudera discussion
forums, CDH-based recommendations, etc. IMO it is
not worth trying to adapt each FAQ entry to be generic.
Better to start over from the ground up.

Phase 1 of making an Apache-friendly FAQ is to strip
the original page "down to the studs" so new FAQ
entries can be added with more of a developer theme,
based on questions people have in the community.

Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
---
M docs/topics/impala_faq.xml
1 file changed, 6 insertions(+), 1,852 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/6003/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: if (state->at(i)->key >= 0) continue;
> There is a comment in the code about upgrading to mersenne twister, I think
I'm satisfied with this answer


PS5, Line 1122: te->num_samples;
> The point of the big comment above is that this is not exact, and we're app
I'm satisfied with this answer


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the

2017-02-21 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from 
upstream docs. Hid instances of CM and rewrote for upstream docs when 
necessary. This still leaves occurences of CM in the XML, but not in the 
rendered documentation. A later project will remove
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6067/1/docs/topics/impala_impala_shell.xml
File docs/topics/impala_impala_shell.xml:

PS1, Line 96: 
Watch out for trailing spaces and/or tabs left behind in new blank lines or 
newly split lines.

In vi, you can find such things by doing

/[ \t][ \t]*$

which will leave the cursor at the start of whichever sequence of spaces and 
tabs comes at the end of a line. At that point, you can press D to remove from 
the cursor position to the end of the line -- i.e. all trailing spaces and tabs.


http://gerrit.cloudera.org:8080/#/c/6067/1/docs/topics/impala_proxy.xml
File docs/topics/impala_proxy.xml:

PS1, Line 336: 
Isn't this too blunt an instrument to hide the whole ? Everything under 
here (except little things like the CM-related ) is appropriate for 
generic Apache Impala usage. The , , etc. are all inside this 
 and will be hidden too.


http://gerrit.cloudera.org:8080/#/c/6067/1/docs/topics/impala_timeouts.xml
File docs/topics/impala_timeouts.xml:

PS1, Line 188: Various client applications
We may as well preserve part of this sentence, at least the example of:

pressing ^C in
impala-shell

since that's applicable to generic Apache Impala.


-- 
To view, visit http://gerrit.cloudera.org:8080/6067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1018: Returns a pointer to one p
> Make this one start with a capital letter. But other ones like begin() shou
I see that you're matching some casing with c++ style iterators, but we do 
follow the C++ style guide [1], which states you'd use camel case for functions 
unless it's a simple accessor/mutator, which can have the lower-cased names.

1: https://google.github.io/styleguide/cppguide.html#Function_Names


http://gerrit.cloudera.org:8080/#/c/6025/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS3, Line 1025: 
2 now


PS3, Line 1076: 
2


http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: int r = rand() % state->num_samples;
> I know you didn't author this line, but could you switch to a better PRNG t
There is a comment in the code about upgrading to mersenne twister, I think we 
can if it's now baked into c++11. However, what is the state/struct? Right now 
we keep the 'ranlux64_3 rng' state on the ResSampState, and we'd need to do the 
same for mt19937_64. We'd need to check that is reasonably sized. It might be 
worth doing that in a separate patch anyway.


PS5, Line 1122: ((double) state->source_size - r) / state->source_size
> I'm not yet convinced this is correct. I'm not convinced it is correct in H
The point of the big comment above is that this is not exact, and we're 
approximating the proper sampling in order to avoid maintaining a heap. The 
point is that this is picking some number proportional to the size of the 
source input. I'm sure you can find mathematical flaws in this approximation, 
and It's not clear to me how bad the error is, but the original use of this 
function was to generate coarse histograms so we really wanted to optimize for 
that use case, especially knowing that planner heuristics are only heuristics 
to begin with. Over time we've come to want more from this code, so it's 
probably worth reevaluating whether or not this approach still makes sense. 
There's a reason we even call the median function "APPX_MEDIAN". I don't think 
we give any guarantees beyond 'it is approximate': 
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_appx_median.html


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 153 insertions(+), 37 del

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1121: int r = rand() % state->num_samples;
> I know you didn't author this line, but could you switch to a better PRNG t
I think this should be addressed in a separate patch. I think we would have to 
benchmark the change, etc.


PS5, Line 1122: ((double) state->source_size - r) / state->source_size
> I'm not yet convinced this is correct. I'm not convinced it is correct in H
I don't think this is correct either. The distribution of keys will not be the 
same as what's described in the blog post: 
https://gregable.com/2007/10/reservoir-sampling.html
We should be generating a random number between 0 and 1 in update(), but we are 
"simulating" this here (see comment on line 1107), which gives us an incorrect 
distribution.

I don't think increasing the denominator by 1 as you suggested would fix the 
problem.

About your second point, in our case the weight k is always 1, so we don't need 
to raise r to some power.

I think algorithm should be addressed in a separate patch.


PS5, Line 1282: sort
> I know this also is present in HEAD, but it can be http://en.cppreference.c
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
2 files changed, 153 insertions(+), 37 del

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
 : // Decimal V2 behavior
 : 
 : // Multiply the double by the scale.
 : //
 : // TODO: this could be made more precise by doing the 
computation in
 : // 64 bit with 128 bit immediate result instead of doing an 
additional
 : // multiply in 53-bit double precision floating point
 : 
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : d = std::round(d);
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Return the rounded integer part.
 : return DecimalValue(static_cast(d));
 :   } else {
 : // TODO: IMPALA-4924: remove DECIMAL V1 code
 : 
 : // Check overflow.
 : const T max_value = 
DecimalUtil::GetScaleMultiplier(precision - scale);
 : if (abs(d) >= max_value) {
 :   *overflow = true;
 :   return DecimalValue();
 : }
 : 
 : // Multiply the double by the scale.
 : d *= DecimalUtil::GetScaleMultiplier(scale);
 : 
 : // Truncate and just take the integer part.
 : return DecimalValue(static_cast(d));
 :   }
> None I can come up with right now, but this was before I noticed the bit pa
If it won't break DECIMAL_V1, may as well merge the two paths and add the test 
cases such as 10^37 to verify things are still working as expected.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function

2017-02-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4848: Add WIDHT_BUCKET() function
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 3456:   TestIsNull("width_bucket(NULL, 2, 14, 4)", TYPE_INT);
Test NULL for all args.

Add tests with extreme values to trigger edge cases. Also see my comments on 
the code.


http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 427:   if (expr.is_null) return IntVal::null();
do all null checks in one if


Line 429:   if (min_range == max_range) {
use min_range.val directly because the == operator will check for null again


Line 430: ctx->SetError("lower bound cannot be equal to upper bound");
Lower...


Line 434: ctx->SetError("Number of buckets should be greater than zero");
should -> must

also print which value was given


Line 437:   if (expr.val >= max_range.val) return num_buckets.val + 1;
Comment that these cases go into the special under/overflow buckets.

We need to be mindful of the case where num_buckets is already MAX_INT, adding 
+1 here will overflow it. We should return null in that case. Also add a test.


Line 439:   double num_elems = (max_range.val - min_range.val) / 
num_buckets.val;
bucket_width?

This could become 0 in extreme cases, and then we'd get infinity in the next 
line. Casting from infinity to int results in undefined behavior, so I think we 
should handle this case, and add a test that triggers it.


http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions.h
File be/src/exprs/math-functions.h:

Line 128:   static IntVal WidthBucket(FunctionContext* ctx, const DoubleVal& 
expr,
Move in the public section like the other functions.


-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] kudu: fix uninitialized variable usage warning

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: kudu: fix uninitialized variable usage warning
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/289/

-- 
To view, visit http://gerrit.cloudera.org:8080/6098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 6: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

PS5, Line 228:  }
> The function does not end here. It ends on line 245.
Oops, sorry about that!


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

PS5, Line 228:  }
> This function can fall off the bottom without returning anything, which I t
The function does not end here. It ends on line 245.


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

2017-02-21 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-2020: Add rounding for decimal casts
..

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 0|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5 |
+-+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.5 AS int);
+--+
| cast(0.5 as int) |
+--+
| 1|
+--+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.6 |
+-+
Fetched 1 row(s) in 0.01s

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:31:32 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=bf4d06517e3093ca:b053953b
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:06:05 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=e54f6fc2c21d63d0:da7b6baa
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] >  select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:04:45 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=444a15a709f0672:74197af
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitte

[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

PS5, Line 228:  }
This function can fall off the bottom without returning anything, which I think 
is technically http://catb.org/jargon/html/N/nasal-demons.html


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Make sure impala doesn't pickup the system's boost cmake module

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Make sure impala doesn't pickup the system's boost cmake module
..


Patch Set 2:

I filed https://issues.cloudera.org/browse/IMPALA-4959

Please just add that to the commit message

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has submitted this change and it was merged.

Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
..


IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

This patch allows any test suites whose workload is "targeted-stress" to
be run in so-called "exhaustive" mode. Before this patch, only suites
whose workload was "functional-query" would be run exhaustively. More on
this flaw is in IMPALA-3947.

The net effects are:

1. We fix IMPALA-4904, which allows test_ddl_stress to start running
   again.
2. We also improve the situation in IMPALA-4914 by getting
   TestSpillStress to run, though we don't fix its
   not-running-concurrently problem.

The other mini-cluster stress tests were disabled in this commit:
  IMPALA-2605: Omit the sort and mini stress tests
so they are not directly affected here.

I also tried to clarify what "exhaustive" means in some of our shell
scripts, via help text and comments.

An exhaustive build+test run showed test_ddl_stress and TestSpillStress
now get run and passed. This adds roughly 12 minutes to a build that's
on the order of 13-14 hours.

Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Reviewed-on: http://gerrit.cloudera.org:8080/6002
Tested-by: Impala Public Jenkins
Reviewed-by: Jim Apple 
---
M bin/run-all-tests.sh
M buildall.sh
M tests/custom_cluster/test_spilling.py
M tests/stress/test_ddl_stress.py
4 files changed, 85 insertions(+), 39 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

2017-02-21 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
..


Patch Set 8:

Sorry Jim. This needs a +2 after the rebase, and a submit.

-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] build: don't look in system paths for kudu client

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: build: don't look in system paths for kudu client
..


Patch Set 1:

I submit it, it doesn't get kicked off automatically

-- 
To view, visit http://gerrit.cloudera.org:8080/6097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] build: don't look in system paths for kudu client

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: build: don't look in system paths for kudu client
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/288/

-- 
To view, visit http://gerrit.cloudera.org:8080/6097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/287/

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..


Patch Set 5: Code-Review+2

had to bump the client version one more time to get another change in Kudu

-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization

2017-02-21 Thread Matthew Jacobs (Code Review)
Hello Impala Public Jenkins, Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6056

to look at the new patch set (#5).

Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization
..

IMPALA-4934: Disable Kudu OpenSSL initialization

Bumps the Kudu version to include the change to the client
that allows Impala to disable SSL initialization.

In authentication.cc, after Impala initializes OpenSSL,
Impala then disables Kudu's OpenSSL init.

Change-Id: I3f13f3af512c6d771979638da593685524c73086
---
M be/src/rpc/authentication.cc
M bin/impala-config.sh
M infra/python/deps/download_requirements
M infra/python/deps/requirements.txt
4 files changed, 12 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/6056/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests

2017-02-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
..


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Release note updates for Impala 2.8

2017-02-21 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Release note updates for Impala 2.8
..


Patch Set 9:

2.8.0 shipped a month ago. The delay on this patch is going to start preventing 
developers from updating impala_new_features.xml with their changes that they 
think will go in 2.9.0, thus preventing an incremental update method that would 
ensure that file stays up-to-date and can be released along with the actual 
release.

-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal

2017-02-21 Thread Laurel Hale (Code Review)
Laurel Hale has abandoned this change.

Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal
..


Abandoned

Made a mistake trying to create a second patch set.

-- 
To view, visit http://gerrit.cloudera.org:8080/6099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Icfe3a7bd7d2a086fa5dcbc0c06ebeee1f52c5519
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph s

2017-02-21 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new patch set (#2).

Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most 
of these fixes involved hiding the paragraphs with the DITA attribute 
'audience="hidden"' and then inserting a paragraph suitable for upstream 
documentation. This hides the mention of Cloudera 
..

IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal.
Most of these fixes involved hiding the paragraphs with
the DITA attribute 'audience="hidden"' and then inserting
a paragraph suitable for upstream documentation. This
hides the mention of Cloudera Manager in the rendered
documentation. In a subsequent cleanup project, the
"Cloudera Manager" mentions will be removed from the
XML. This patch includes a fix to impala_common.xml
requested by John Russell.

Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1
---
M docs/shared/impala_common.xml
M docs/topics/impala_perf_resources.xml
M docs/topics/impala_perf_skew.xml
M docs/topics/impala_perf_testing.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_txtfile.xml
6 files changed, 56 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/6069/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal

2017-02-21 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6099

Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal
..

IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal

Fixed impala_common.xml as requested.

Change-Id: Icfe3a7bd7d2a086fa5dcbc0c06ebeee1f52c5519
---
M docs/shared/impala_common.xml
1 file changed, 8 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/6099/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icfe3a7bd7d2a086fa5dcbc0c06ebeee1f52c5519
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG
Commit Message:

PS4, Line 25: types
> values
Done.

This may seem like a concern in the long run, isn't it ?


http://gerrit.cloudera.org:8080/#/c/6038/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

PS4, Line 338: Our implementation is similar to MS SQL which takes the max of 
the input's scale
 :   // and MIN_ADJUSTED_SCALE.
> How about explaining why we deviate from SQL Server and from SUM()/COUNT():
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

2017-02-21 Thread Michael Ho (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6038

to look at the new patch set (#5).

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
..

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The differences with DECIMAL_V1 are:

1. The output type has a minimum scale of 6. This is similar
to MS SQL's behavior which takes the max of 6 and the input
type's scale. We deviate from MS SQL in the output's precision
which is always set to 38. We use the smallest precision which
can store the output. A key insight is that the output of AVG()
is no wider than the inputs. Precision only needs to be adjusted
when the scale is augmented. Using a smaller precision avoids
potential loss of precision in subsequent decimal operations
(e.g. division) if AVG() is a subexpression. Please note that
the output type is different from SUM()/COUNT() as the latter
can have a much larger scale.

2. Due to a minimum of 6 decimal places for the output,
AVG() for decimal values whose whole number part exceeds 32
decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will
always overflow as the scale is augmented to 6. Certain
decimal types which work with AVG() in DECIMAL_V1 no longer
work in DECIMAL_V2.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 235 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..

IMPALA-4546: Fix Moscow timezone conversion after 2014

In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4
with no DST. A special case has been added to timestamp functions to
handle this.

Testing:
Added BE Expr tests.

Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
4 files changed, 114 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..

IMPALA-4546: Fix Moscow timezone conversion after 2014

In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4
with no DST. A special case has been added to timestamp functions to
handle this.

Testing:
Added BE Expr tests.

Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.cc
M be/src/exprs/timezone_db.h
4 files changed, 114 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014

2017-02-21 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 215: int msk_transition_day = 2456956;
> AND_CAPS
Done


Line 215: int msk_transition_day = 2456956;
> const for these three vars
Done


Line 226: tv.time().hours() < (msk_transition_hour_utc + 
msk_utc_offset) % 24)) {
> Is this correct? Before, we had tv.time().hours() < 1. Now we have 22 + 4 %
Yes, this is correct. The old (pre 2014) timezone is returned also for times 
between 1 and 2 now (before it was returned only for times before 1. We handle 
this case property by returning null on line 129.


-- 
To view, visit http://gerrit.cloudera.org:8080/5969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] kudu: fix uninitialized variable usage warning

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: kudu: fix uninitialized variable usage warning
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] kudu: fix uninitialized variable usage warning

2017-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: kudu: fix uninitialized variable usage warning
..


Patch Set 1:

I doubt we'd see this in real life, since as far as I know glog always sets the 
severity to one of the ones covered in the case statement. Otherwise, DEBUG 
builds would be failing with the DCHECK here. But, the warning being emitted 
during compilation was bugging me.

-- 
To view, visit http://gerrit.cloudera.org:8080/6098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] build: don't look in system paths for kudu client

2017-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: build: don't look in system paths for kudu client
..


Patch Set 1:

I'm not sure I can trigger a GVM. Does your +2 automatically trigger it? LMK if 
there's anything I need to do to push this forward, thanks.

-- 
To view, visit http://gerrit.cloudera.org:8080/6097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] Avoid including Boost cmake support from system path

2017-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has abandoned this change.

Change subject: Avoid including Boost cmake support from system path
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/6096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I9767952aada11d4d44da99bc0011d6d0829fad4c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] Avoid including Boost cmake support from system path

2017-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Avoid including Boost cmake support from system path
..


Patch Set 1:

Oh, interesting. He and I didn't coordinate, but I'm guessing we both tried 
building on the same dev box (the Kudu team shares a beefy box we often use for 
builds). His has a better commit message than mine, so I'll abandon this one.

-- 
To view, visit http://gerrit.cloudera.org:8080/6096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9767952aada11d4d44da99bc0011d6d0829fad4c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] build: don't look in system paths for kudu client

2017-02-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: build: don't look in system paths for kudu client
..


Patch Set 1: Code-Review+2

assuming this passes real builds

-- 
To view, visit http://gerrit.cloudera.org:8080/6097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


  1   2   >