[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 18: > Uploaded patch set 20. > Uploaded patch set 21. rebased and fixed merge conflict -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#21). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +--+ | cast(0.5 as int) | +--+ | 0| 1| +--+ +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | 0.6 | +-+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.64s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.73s New version, decimal_v2=1; +-+
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#20). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +--+ | cast(0.5 as int) | +--+ | 0| 1| +--+ +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | 0.6 | +-+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.64s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.73s New version, decimal_v2=1; +-+
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 18: (4 comments) Nope, doesn't solve the DCHECK issue. I added a DCHECK that the static cast back to the target type from double equals the actual value of the target type, and it failed. Issue is real, see the updated comment in be/src/runtime/decimal-value.inline.h Floating point is not so fun, and the conversion really does lose precision. Issue is also pre-existing so I'm not trying to fix it now. http://gerrit.cloudera.org:8080/#/c/5951/18/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1395: { true, 0, 38, 38 }}}, // This is overflowing, but wasn't noticed by the test before > let's just say IMPALA-4964 rather than refer to what the old code did, and Done PS18, Line 1397: related to this > related to what? Done PS18, Line 1398: IMAPALA > IMPALA. also combine to one line Done Line 1635: TestIsNotNull(c.expr, type); > mentioned in the other change -- Done -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#19). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +--+ | cast(0.5 as int) | +--+ | 0| 1| +--+ +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | 0.6 | +-+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.64s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.73s New version, decimal_v2=1; +-+
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Jim Apple has posted comments on this change. Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS2, Line 83: abs > I tried that and it actually turned out to be a bug. The places where std: I'm confused: how is std::abs wrong here for integers? int64_t d is never and int on x86-64 linux, right? Also, isn't C's global-namespace abs UB on INT_MIN? -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 18: (4 comments) Does this change solve the DCHECK you were hitting running expr-test? http://gerrit.cloudera.org:8080/#/c/5951/18/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1395: { true, 0, 38, 38 }}}, // This is overflowing, but wasn't noticed by the test before let's just say IMPALA-4964 rather than refer to what the old code did, and why not combine both lines (we have the same bug in both v1 and v2)? PS18, Line 1397: related to this related to what? PS18, Line 1398: IMAPALA IMPALA. also combine to one line Line 1635: TestIsNotNull(c.expr, type); mentioned in the other change -- thanks for finding that. we should probably add EXPECT_EQ(result, StringParser::PARSE_SUCCESS); to the end of TestDecimalValue(). What's happening is the decimal string parser is failing (when parsing "NULL"), but when it does that it happens to return a DecimalValue with value()=0. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Dan Hecht has posted comments on this change. Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. Patch Set 2: since you folded this in to the other change, please abandon this change so people don't review it more. -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. IMPALA-4821: Update AVG() for DECIMAL_V2 This change implements the DECIMAL_V2's behavior for AVG(). The differences with DECIMAL_V1 are: 1. The output type has a minimum scale of 6. This is similar to MS SQL's behavior which takes the max of 6 and the input type's scale. We deviate from MS SQL in the output's precision which is always set to 38. We use the smallest precision which can store the output. A key insight is that the output of AVG() is no wider than the inputs. Precision only needs to be adjusted when the scale is augmented. Using a smaller precision avoids potential loss of precision in subsequent decimal operations (e.g. division) if AVG() is a subexpression. Please note that the output type is different from SUM()/COUNT() as the latter can have a much larger scale. 2. Due to a minimum of 6 decimal places for the output, AVG() for decimal values whose whole number part exceeds 32 decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will always overflow as the scale is augmented to 6. Certain decimal types which work with AVG() in DECIMAL_V1 no longer work in DECIMAL_V2. Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Reviewed-on: http://gerrit.cloudera.org:8080/6038 Reviewed-by: Dan Hecht Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/expr-test.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test M testdata/workloads/functional-query/queries/QueryTest/decimal.test 5 files changed, 235 insertions(+), 76 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved Dan Hecht: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Dan Hecht has posted comments on this change. Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS2, Line 1397: true > I am missing something too... was the test broken before? yes, the test was broken before. we do return NULL in these cases. Technically, we should return 0.998 in both but due to how we implement MOD, we overflow during the calculation. That's a similar problem to IMPALA-4939, but I think less important and severe. Line 1635: TestIsNotNull(c.expr, type); > I added this because I noticed the non-NULL test cases were passing even wh thanks for finding that. we should probably add EXPECT_EQ(result, StringParser::PARSE_SUCCESS); to the end of TestDecimalValue(). What's happening is the decimal string parser is failing (when parsing "NULL"), but when it does that it happens to return a DecimalValue with value()=0. -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Zach Amsden has posted comments on this change. Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS2, Line 1397: true > I am missing something too... was the test broken before? I filed JIRA-4964 to investigate. Test was broken before and seems to have uncovered a bug with modulus. http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS2, Line 83: abs > I think we should be consistent and use std::abs() at least in this entire I tried that and it actually turned out to be a bug. The places where std::abs and abs are correct turned out to be a lot more subtle than expected. std::abs selects the correct version for floating point but the wrong version for integers as abs(INT_MIN) is undefined. -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#18). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +--+ | cast(0.5 as int) | +--+ | 0| 1| +--+ +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | 0.6 | +-+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.64s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.73s New version, decimal_v2=1; +-+
[Impala-ASF-CR] IMPALA-4936 and IMPALA-4915: Fix decimal overflow test
Zach Amsden has posted comments on this change. Change subject: IMPALA-4936 and IMPALA-4915: Fix decimal overflow test .. Patch Set 2: (3 comments) I'm folding this into the cast change anyway. http://gerrit.cloudera.org:8080/#/c/6068/1//COMMIT_MSG Commit Message: Line 22: > I think this probably should be fixed in this commit. This was a real problem, found it. http://gerrit.cloudera.org:8080/#/c/6068/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS2, Line 1397: true > Sorry, I may be missing something. Why is this NULL ? I am missing something too... was the test broken before? Line 1635: TestIsNotNull(c.expr, type); I added this because I noticed the non-NULL test cases were passing even when the value happened to match, which was the case for some zero values. Looks like something is broken with modulus. -- To view, visit http://gerrit.cloudera.org:8080/6068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8de7440a585c1d3d937fcbb435b9ead77e7b5a63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs.
Laurel Hale has uploaded a new patch set (#2). Change subject: IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. .. IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the XML, but not in the rendered documentation. A later project will remove all occurrences of CM from the XML. This patch set includes fixes made in response to John Russell's review. Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d --- M docs/topics/impala_impala_shell.xml M docs/topics/impala_proxy.xml M docs/topics/impala_resource_management.xml M docs/topics/impala_timeouts.xml M docs/topics/impala_udf.xml 5 files changed, 116 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/6067/2 -- To view, visit http://gerrit.cloudera.org:8080/6067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/17/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS17, Line 1395: N.B. - with full decimal V2 this should evaluate to 1 > How so? We've already added the new V2 result type formula for %, and you g Old comment, I mistook 38, 38 for 38, 0 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. IMPALA-4934: Disable Kudu OpenSSL initialization Bumps the Kudu version to include the change to the client that allows Impala to disable SSL initialization. In authentication.cc, after Impala initializes OpenSSL, Impala then disables Kudu's OpenSSL init. Fixed a python test case that started failing after bumping the Kudu client version. Change-Id: I3f13f3af512c6d771979638da593685524c73086 Reviewed-on: http://gerrit.cloudera.org:8080/6056 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M be/src/rpc/authentication.cc M bin/impala-config.sh M infra/python/deps/download_requirements M infra/python/deps/requirements.txt M tests/query_test/test_kudu.py 5 files changed, 13 insertions(+), 6 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/17/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS17, Line 1395: N.B. - with full decimal V2 this should evaluate to 1 How so? We've already added the new V2 result type formula for %, and you get decimal(38, 38) in this case which is consistent with other engines. what type would be a better choice? -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 382: /// Min/max statistics contexts, owned by instances of this class. they are not owned by the class instances themselves, they're allocated in state_->obj_pool() Line 386: /// 'min_max_conjunct_ctxs_', which are used when evaluating parquet::Statistics. somewhat incomprehensible comment. this is used only for a specific function, and then only to avoid the dynamic allocation overhead. point this out, right now it seems very mysterious. Line 476: Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, bool* skip_row_group); you're not evaluating stats, you're evaluating predicates on the stats http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 31: enum class StatsField { MIN, MAX }; move into ColumnStatsBase so it's clear what stats this is referring to. also, add enums for all fields in Statistics. Line 114: static bool ReadFromThrift(const parquet::Statistics& thrift_stats, const StatsField& stats_field, move function bodies into parquet-column-stats-inl.h http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 297: collectionConjuncts_.clear(); > My reasoning was that any constant expression could be evaluated against th true, but those should be identical, given that we rewrite constants as literals. http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 85: * conjuncts, that are passed to the backend and will be evaluated against the "conjuncts that" Line 154: private List minMaxPlanConjuncts_ = Lists.newArrayList(); "plan conjuncts" doesn't really mean anything. original conjuncts? Line 302:* Builds a predicate to evaluate parquet::Statistics by copying 'inputSlot' into "to evaluate against" Line 347: // Only constant exprs can be evaluated against the parquet::Statistics. This drop "the" Line 349: if (!constExpr.isConstant()) continue; as pointed out before, you shouldn't see non-literal constants here http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 1047:runtime filters: RF000 -> c_custkey > I changed the code to track the original predicates and display them instea extended is fine. yes, "parquet statistics" is more concrete. http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: Line 1: # Test HDFS scan node. run this file in extended mode so you see the stats predicates. http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 206: # Test that without expr rewrite and thus without constant folding, constant exprs still what's the point of those tests? -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#17). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +--+ | cast(0.5 as int) | +--+ | 0| 1| +--+ +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | 0.6 | +-+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.64s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.73s New version, decimal_v2=1; +-+
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 5: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/291/ -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#7). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 156 insertions(+), 37 del
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#7). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 156 insertions(+), 37 del
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS6, Line 918: const static int INIT_CAPACITY = 16; : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET; > Please add a comment. It's not clear why all of these are grouped together Done PS6, Line 958: ReservoirSampleState > after thinking more about the casing comments, I came to the conclusion tha Why do you think a vector-like interface is not good? It seems simple and intuitive. Each operation does exactly what the vector equivalent does, with the exception of push_back(), which does not resize automatically. Should we get rid of begin() and end()? What should they be called? PS6, Line 1285: nth_element(src->begin(), mid_point, src->end(), SampleValLess); > nice thanks! -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] kudu: fix uninitialized variable usage warning
Impala Public Jenkins has submitted this change and it was merged. Change subject: kudu: fix uninitialized variable usage warning .. kudu: fix uninitialized variable usage warning This fixes a trivial warning of a potential usage of an uninitialized variable in release builds. If for some reason Kudu sends us an invalid log severity level, we'll now fall back to INFO instead of reading an uninitialized severity value. Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422 Reviewed-on: http://gerrit.cloudera.org:8080/6098 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M be/src/exec/kudu-util.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] kudu: fix uninitialized variable usage warning
Impala Public Jenkins has posted comments on this change. Change subject: kudu: fix uninitialized variable usage warning .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
Jim Apple has posted comments on this change. Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' .. Patch Set 2: Code-Review+2 Thank you! -- To view, visit http://gerrit.cloudera.org:8080/6049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] build: don't look in system paths for kudu client
Impala Public Jenkins has posted comments on this change. Change subject: build: don't look in system paths for kudu client .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] build: don't look in system paths for kudu client
Impala Public Jenkins has submitted this change and it was merged. Change subject: build: don't look in system paths for kudu client .. build: don't look in system paths for kudu client On my system which had the Kudu client library installed in /usr, Impala was picking up the wrong version of the client. This instructs CMake to only look in the specified toolchain directory to find Kudu. Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Reviewed-on: http://gerrit.cloudera.org:8080/6097 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M CMakeLists.txt 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/290/ -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 6: Code-Review+2 fixed a python test that failed, I had missed it locally because my local didn't have a completely fresh environment yet, so the newer python client wasn't being used yet. -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6044 to look at the new patch set (#3). Change subject: IMPALA-4877: fix precedence of unary -/+ .. IMPALA-4877: fix precedence of unary -/+ Currently, expressions such as "-2 / 3" parse as "-(2 / 3)". In practice this usually doesn't cause differences for most common expressions. However, it does interact with a heuristic that changes decimal expression types to double when one operand is non-decimal (see JIRA). For example, before this fix, typeof(2.0/3.0) = DECIMAL typeof(-2.0/3.0) = DOUBLE because the latter expression parsed as "-1 * (2.0 / 3.0)". With this fix, both expressions result in a DECIMAL. Technically this is a compatibility breaking change but it seems like no one would expect the current behavior so I think we should fix it. Let me know if you disagree. Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1 --- M be/src/exprs/expr-test.cc M fe/src/main/cup/sql-parser.cup 2 files changed, 21 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6044/3 -- To view, visit http://gerrit.cloudera.org:8080/6044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+
Dan Hecht has posted comments on this change. Change subject: IMPALA-4877: fix precedence of unary -/+ .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6044/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 517: precedence left UMINUS, UPLUS; > actually, since they're identical, why not just have a single 'unarysign' o good point. done. -- To view, visit http://gerrit.cloudera.org:8080/6044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Hello Impala Public Jenkins, Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6056 to look at the new patch set (#6). Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. IMPALA-4934: Disable Kudu OpenSSL initialization Bumps the Kudu version to include the change to the client that allows Impala to disable SSL initialization. In authentication.cc, after Impala initializes OpenSSL, Impala then disables Kudu's OpenSSL init. Fixed a python test case that started failing after bumping the Kudu client version. Change-Id: I3f13f3af512c6d771979638da593685524c73086 --- M be/src/rpc/authentication.cc M bin/impala-config.sh M infra/python/deps/download_requirements M infra/python/deps/requirements.txt M tests/query_test/test_kudu.py 5 files changed, 13 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/6056/6 -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. Patch Set 3: Code-Review+2 ah, i forgot you could steal someone's CR by just posting w/ the same commit ID -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401 [DOCS] Removing 'Cloudera Manager' from upstream docs.
Laurel Hale has uploaded a new patch set (#2). Change subject: IMPALA-3401 [DOCS] Removing 'Cloudera Manager' from upstream docs. .. IMPALA-3401 [DOCS] Removing 'Cloudera Manager' from upstream docs. Used DITA attribute 'audience=hidden' when needed to preserve information for a future clean-up step. When it made sense, 'Cloudera Manager' was removed and sections were rewritten. This clean-up task does not completely remove 'Cloudera Manager' from the upstream docs source XML, but it is now hidden in these files so it won't show up in the rendered version. A subsequent cleanup project will remove references in the XML source. Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0 --- M docs/topics/impala_admission.xml M docs/topics/impala_config_performance.xml M docs/topics/impala_noncm_installation.xml M docs/topics/impala_prereqs.xml M docs/topics/impala_tutorial.xml 5 files changed, 72 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/6064/2 -- To view, visit http://gerrit.cloudera.org:8080/6064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I76c9b53f587bc85c5c21e195f0a771183d4ef3a0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 2: QUERY > Can you add coverage for filtering on the root level for nested data? Can you elaborate on how to do this? Wouldn't it only make sense to apply filtering on the leave level, since objects at the root level would be complex ones? Can you post a sample query and the result you would expect? Thanks. -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has uploaded a new patch set (#6). Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 29 files changed, 756 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/6 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 4: (26 comments) Thanks for the review. Please see PS6. http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 472: ScopedBuffer tuple_buffer(scan_node_->mem_tracker()); > no need to allocate this and go through the setup for every single row grou Done Line 486: conjuncts_with_stats.reserve(min_max_conjuncts_ctxs_.size()); > there's too much memory allocation going on for what are essentially static Done Line 496: unique_ptr stats( > there's no need for memory allocation here Done Line 501: if (e->function_name() == "lt" || e->function_name() == "le") { > introduce named constants (in the appropriate class) Other code that compares function_name follows the same pattern. As discussed in person, let's keep this for now and see if we find a nicer abstraction to determine when to use min/max values, e.g. monotonically bound predicates. http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 378: /// Min/max statistics contexts. > who owns them? Done http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 182: if (min_max_tuple_id_ > -1) { > ">" or < don't make sense for tuple ids Done http://gerrit.cloudera.org:8080/#/c/6032/4/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 40: /// writing parquet files. It provides an interface to populate a parquet::Statistics > this description isn't true anymore, it's now also an intermediate staging I removed the expanded abstraction and added a sentence to the comment explaining that it can also be used to decode stats. Line 70: /// 'nullptr' for unsupported types. The caller is responsible for freeing the memory. > this should be created in an objectpool Code has been removed. Line 87: virtual void WriteMinToBuffer(void* buffer) const = 0; > remove "tobuffer", that's clear from the parameters. what's unclear is the Done Line 126: max_value_(max_value) { > formatting Done http://gerrit.cloudera.org:8080/#/c/6032/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 208: // Conjuncts that can be evaluated against the statistics of parquet files. Each > refer to thrift structs explicitly, where appropriate. Done Line 209: // conjunct references the slot in 'min_max_tuple_id' with the same index. > i didn't understand the last sentence. Changed it. http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 143: // List of conjuncts for min/max values, that are used to skip data when scanning > "min/max values" is vague, refer to thrift structs explicitly. Done Line 145: private List minMaxConjunts_ = Lists.newArrayList(); > conjuncts Done Line 147: // Tuple that is used to materialize statistics when scanning Parquet files. > describe what's in it. Done Line 290:* Builds a predicate to evaluate statistical values by combining 'inputSlot', > make 'statistical values' concrete Done Line 291:* 'inputPred' and 'op' into a new predicate, and adding it to 'minMaxConjunts_'. > missing other side effects Done Line 297: Preconditions.checkState(constExpr.isConstant()); > why not a literal? My reasoning was that any constant expression could be evaluated against the statistics. I added a test to make sure this does indeed work as expected. Line 303: // Make a new slot from the slot descriptor. > what do you mean by 'slot'? (slot/slot descriptor are usually synonyms) I meant slotRef, but the comment looked superfluous and I removed it altogether. Line 315:* TODO: This method currently doesn't de-dup slots that are referenced multiple times. > resolve As discussed, lets keep this for now and address it in a subsequent change by simplifying redundant exprs. I created IMPALA-4958 to track this. Line 328: // We only support slot refs on the left hand side of the predicate, a rewriting > this overlaps with what kuduscannode does, and there's already a similar co I had talked to Alex about this last week and he suggested to add a general rewrite rule in the FE to normalize binary exprs. I removed the branches from BinaryPredicate.java that are no longer needed after the normalization. I also cleaned up code in KuduScanNode.java that is no longer needed. Line 343: { > { on preceding line Done Line 658: msg.hdfs_scan_node.setMin_max_tuple_id(minMaxTuple_.getId().asInt()); > check that it's not invalid How can I check this? The tupleId needs to come from the TupleId generator a
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has uploaded a new patch set (#5). Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 29 files changed, 757 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/5 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 5: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/287/ -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6049 to look at the new patch set (#2). Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' .. IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for subsequent phases to remove remaining instances of CM. Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 --- M docs/shared/impala_common.xml M docs/topics/impala_config_options.xml M docs/topics/impala_upgrading.xml 3 files changed, 6 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6049/2 -- To view, visit http://gerrit.cloudera.org:8080/6049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/5904/8/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: Line 1: Please add coverage for nested data filtering as well. -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 2: QUERY Can you add coverage for filtering on the root level for nested data? -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 16: (1 comment) After fighting the compiler for over an hour to figure out why our operator<< was not being called, I found this little gem in glog/logging.h and since abandoned getting DCHECK_EQ / GT to work. / Build the error message string. template std::string* MakeCheckOpString(const t1& v1, const t2& v2, const char* names) { // It means that we cannot use stl_logging if compiler doesn't // support using expression for operator. // TODO(hamaji): Figure out a way to fix. #if 1 using ::operator<<; #endif std::strstream ss; ss << names << " (" << v1 << " vs. " << v2 << ")"; return new std::string(ss.str(), ss.pcount()); } http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG Commit Message: Line 64: Query: select sum(cast(l_extendedprice as bigint)) from > please pare down the commit msg, in particular get rid of the verbatim shel Done -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG Commit Message: Line 64: Query: select sum(cast(l_extendedprice as bigint)) from please pare down the commit msg, in particular get rid of the verbatim shell output (a table with the numbers is sufficient to get the point across). -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4877: fix precedence of unary -/+ .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6044/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 517: precedence left UMINUS, UPLUS; actually, since they're identical, why not just have a single 'unarysign' or something like that? -- To view, visit http://gerrit.cloudera.org:8080/6044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4877: fix precedence of unary -/+
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4877: fix precedence of unary -/+ .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6044 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39cf388ae6e37e1b1e12ddea5fd3878c9c2620d1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans. .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 64: batch_start_ptr_(NULL), use nullptr throughout Line 273: MemPool* pool = row_batch->tuple_data_pool(); please consider getting rid of this variable and referring to the rowbatch pool directly. there are too many mempools flying around in this code, it's nice to have a reminder which one is being used. Line 440: eos_ = true; advance scan_state_? also, i don't see anything in hdfs-scanner.h that says you can't call getnext() when eos() returns true. Line 445: << "FE should have generated SNAPPY_BLOCKED instead."; why not check this when creating the stream? Line 446: RETURN_IF_ERROR(UpdateDecompressor(stream_->file_desc()->file_compression)); why is this needed, given that we just started (ie, why not move it into initialization)? Line 459: eos_ = true; scan_state_ transition? Line 469: Status HdfsTextScanner::CommitRows(RowBatch* row_batch, int num_rows) { > This loosely corresponds to the previous HdfsScanNode::CommitRows() how about coalescing that with hdfsscannode::commitrows? at the very least move this into hdfsscannode. Line 485: ExprContext::FreeLocalAllocations(entry.second); why call this here? http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 119: /// regardless if whether it passed the conjuncts. regardless of whether Line 124: /// Only valid to call in scan state FIRST_TUPLE_FOUND or PAST_SCAN_RANGE. thanks for introducing that state variable, it's much clearer than a bunch of bools that get passed around. Line 135: Status CommitRows(RowBatch* row_batch, int num_rows); in/out params go last Line 140: /// otherwise it will just read num_bytes. If we are reading compressed text, num_bytes if we are reading compressed text, should this even be called (or why is there a 'compressed stream' variant)? -- To view, visit http://gerrit.cloudera.org:8080/6000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Alex Behm has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Dan Hecht has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Dan Hecht has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG Commit Message: PS4, Line 25: types > Done. I'm not sure how many people deal with numbers larger than 10^32... seems like a reasonable limitation. -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the u
Laurel Hale has uploaded a new patch set (#2). Change subject: IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the upstream docs. This removes the CM references from the rendered docs. A .. IMPALA-3401 [DOCS] Phase 5 of "Cloudera Manager" removal. Most of these fixes entailed hiding the paragraph where there are mentions of CM and writing a replacement paragraph for the upstream docs. This removes the CM references from the rendered docs. A subsequent cleanup project will remove occurrences of CM from the XML. This patch includes fixes to files in response to John Russell's first set of review comments. Change-Id: I4967fae275a8822274aece14a15107237445aba5 --- M docs/topics/impala_isilon.xml M docs/topics/impala_logging.xml M docs/topics/impala_s3.xml M docs/topics/impala_webui.xml 4 files changed, 37 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6070/2 -- To view, visit http://gerrit.cloudera.org:8080/6070 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4967fae275a8822274aece14a15107237445aba5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 10: (1 comment) I am trying to get the DCHECK_EQ to work with int128_t, for which we do provide a definition, but gcc keeps complaining about an ambiguous overload. http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 38: if (round) { : // Decimal V2 behavior : : // Multiply the double by the scale. : // : // TODO: this could be made more precise by doing the computation in : // 64 bit with 128 bit immediate result instead of doing an additional : // multiply in 53-bit double precision floating point : : d *= DecimalUtil::GetScaleMultiplier(scale); : d = std::round(d); : const T max_value = DecimalUtil::GetScaleMultiplier(precision); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Return the rounded integer part. : return DecimalValue(static_cast(d)); : } else { : // TODO: IMPALA-4924: remove DECIMAL V1 code : : // Check overflow. : const T max_value = DecimalUtil::GetScaleMultiplier(precision - scale); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Multiply the double by the scale. : d *= DecimalUtil::GetScaleMultiplier(scale); : : // Truncate and just take the integer part. : return DecimalValue(static_cast(d)); : } > If it won't break DECIMAL_V1, may as well merge the two paths and add the t It looks like it will work and it doesn't change the speed. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Alex Behm has posted comments on this change. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. Patch Set 3: Code-Review+1 I updated the commit msg. Looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Alex Behm has uploaded a new patch set (#3). Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. IMPALA-4959: Avoid picking up the system's boost cmake module In some systems with an old boost installed system-wide the impala build would fail with something like: CMake Error at /usr/lib64/boost/BoostConfig.cmake:64 (get_target_property): get_target_property() called with non-existent target "boost_thread-shared". Call Stack (most recent call first): toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 (find_package) CMakeLists.txt:116 (find_package) CMake Error at /usr/lib64/boost/BoostConfig.cmake:72 (get_target_property): get_target_property() called with non-existent target "boost_thread-shared-debug". Call Stack (most recent call first): toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 (find_package) CMakeLists.txt:116 (find_package) This because, if it exists, cmake's FindBoost.cmake will look for and use that module, even though boost's cmake build hasn't been maintained in years and the impala build is actually configured to not use the systems boost. This patch sets the cmake flag Boost_NO_BOOST_CMAKE to ON, making sure the old cmake module is not picked up. Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 --- M CMakeLists.txt 1 file changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/5994/3 -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Alex Behm has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 6: (3 comments) I'm still not convinced the new paths are necessarily, e.g. the case I mentioned previously where you need to merge and there are different sized inputs. http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS6, Line 918: const static int INIT_CAPACITY = 16; : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET; Please add a comment. It's not clear why all of these are grouped together anymore. The first two are only relevant to histograms. These two are about capacity for anything using ResSampling. MAX_NUM_SAMPLES should probably also be a capacity now, e.g. MAX_CAPACITY. PS6, Line 958: ReservoirSampleState after thinking more about the casing comments, I came to the conclusion that I don't think trying to make this look like a std::vector is even the best interface. I'd prefer if the methods were named non-std-vector-like names, e.g. GetSample(idx) AddSample(s) GetStateSize() PS6, Line 1285: nth_element(src->begin(), mid_point, src->end(), SampleValLess); nice -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
Jim Apple has posted comments on this change. Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
John Russell has posted comments on this change. Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. Patch Set 2: Somehow there was a conflicting change from a gerrit that I had abandoned that was getting in the way every time. Once I rebased against asf-gerrit/master and deleted that commit: d 367a690 IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs I was able to push the new patch set. -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6003 to look at the new patch set (#2). Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. IMPALA-3406: [DOCS] Empty the original Cloudera FAQ Almost all of the original Impala FAQ material was Cloudera-themed or commercially oriented. Lots of answers about the QuickStart VM, Cloudera discussion forums, CDH-based recommendations, etc. IMO it is not worth trying to adapt each FAQ entry to be generic. Better to start over from the ground up. Phase 1 of making an Apache-friendly FAQ is to strip the original page "down to the studs" so new FAQ entries can be added with more of a developer theme, based on questions people have in the community. Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f --- M docs/topics/impala_faq.xml 1 file changed, 6 insertions(+), 1,852 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/6003/2 -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Jim Apple has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1121: if (state->at(i)->key >= 0) continue; > There is a comment in the code about upgrading to mersenne twister, I think I'm satisfied with this answer PS5, Line 1122: te->num_samples; > The point of the big comment above is that this is not exact, and we're app I'm satisfied with this answer -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the
John Russell has posted comments on this change. Change subject: IMPALA-3401 [DOCS] Phase 3 of removing Cloudera Manager from upstream docs. Hid instances of CM and rewrote for upstream docs when necessary. This still leaves occurences of CM in the XML, but not in the rendered documentation. A later project will remove .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/6067/1/docs/topics/impala_impala_shell.xml File docs/topics/impala_impala_shell.xml: PS1, Line 96: Watch out for trailing spaces and/or tabs left behind in new blank lines or newly split lines. In vi, you can find such things by doing /[ \t][ \t]*$ which will leave the cursor at the start of whichever sequence of spaces and tabs comes at the end of a line. At that point, you can press D to remove from the cursor position to the end of the line -- i.e. all trailing spaces and tabs. http://gerrit.cloudera.org:8080/#/c/6067/1/docs/topics/impala_proxy.xml File docs/topics/impala_proxy.xml: PS1, Line 336: Isn't this too blunt an instrument to hide the whole ? Everything under here (except little things like the CM-related ) is appropriate for generic Apache Impala usage. The , , etc. are all inside this and will be hidden too. http://gerrit.cloudera.org:8080/#/c/6067/1/docs/topics/impala_timeouts.xml File docs/topics/impala_timeouts.xml: PS1, Line 188: Various client applications We may as well preserve part of this sentence, at least the example of: pressing ^C in impala-shell since that's applicable to generic Apache Impala. -- To view, visit http://gerrit.cloudera.org:8080/6067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4748300edc43b7071afc50e7cc7ddd64120c0d8d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/6025/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS2, Line 1018: Returns a pointer to one p > Make this one start with a capital letter. But other ones like begin() shou I see that you're matching some casing with c++ style iterators, but we do follow the C++ style guide [1], which states you'd use camel case for functions unless it's a simple accessor/mutator, which can have the lower-cased names. 1: https://google.github.io/styleguide/cppguide.html#Function_Names http://gerrit.cloudera.org:8080/#/c/6025/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS3, Line 1025: 2 now PS3, Line 1076: 2 http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1121: int r = rand() % state->num_samples; > I know you didn't author this line, but could you switch to a better PRNG t There is a comment in the code about upgrading to mersenne twister, I think we can if it's now baked into c++11. However, what is the state/struct? Right now we keep the 'ranlux64_3 rng' state on the ResSampState, and we'd need to do the same for mt19937_64. We'd need to check that is reasonably sized. It might be worth doing that in a separate patch anyway. PS5, Line 1122: ((double) state->source_size - r) / state->source_size > I'm not yet convinced this is correct. I'm not convinced it is correct in H The point of the big comment above is that this is not exact, and we're approximating the proper sampling in order to avoid maintaining a heap. The point is that this is picking some number proportional to the size of the source input. I'm sure you can find mathematical flaws in this approximation, and It's not clear to me how bad the error is, but the original use of this function was to generate coarse histograms so we really wanted to optimize for that use case, especially knowing that planner heuristics are only heuristics to begin with. Over time we've come to want more from this code, so it's probably worth reevaluating whether or not this approach still makes sense. There's a reason we even call the median function "APPX_MEDIAN". I don't think we give any guarantees beyond 'it is approximate': https://www.cloudera.com/documentation/enterprise/latest/topics/impala_appx_median.html -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 153 insertions(+), 37 del
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/6025/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1121: int r = rand() % state->num_samples; > I know you didn't author this line, but could you switch to a better PRNG t I think this should be addressed in a separate patch. I think we would have to benchmark the change, etc. PS5, Line 1122: ((double) state->source_size - r) / state->source_size > I'm not yet convinced this is correct. I'm not convinced it is correct in H I don't think this is correct either. The distribution of keys will not be the same as what's described in the blog post: https://gregable.com/2007/10/reservoir-sampling.html We should be generating a random number between 0 and 1 in update(), but we are "simulating" this here (see comment on line 1107), which gives us an incorrect distribution. I don't think increasing the denominator by 1 as you suggested would fix the problem. About your second point, in our case the weight k is always 1, so we don't need to raise r to some power. I think algorithm should be addressed in a separate patch. PS5, Line 1282: sort > I know this also is present in HEAD, but it can be http://en.cppreference.c Done -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 153 insertions(+), 37 del
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Michael Ho has posted comments on this change. Change subject: IMPALA-2020: Add rounding for decimal casts .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS10, Line 38: if (round) { : // Decimal V2 behavior : : // Multiply the double by the scale. : // : // TODO: this could be made more precise by doing the computation in : // 64 bit with 128 bit immediate result instead of doing an additional : // multiply in 53-bit double precision floating point : : d *= DecimalUtil::GetScaleMultiplier(scale); : d = std::round(d); : const T max_value = DecimalUtil::GetScaleMultiplier(precision); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Return the rounded integer part. : return DecimalValue(static_cast(d)); : } else { : // TODO: IMPALA-4924: remove DECIMAL V1 code : : // Check overflow. : const T max_value = DecimalUtil::GetScaleMultiplier(precision - scale); : if (abs(d) >= max_value) { : *overflow = true; : return DecimalValue(); : } : : // Multiply the double by the scale. : d *= DecimalUtil::GetScaleMultiplier(scale); : : // Truncate and just take the integer part. : return DecimalValue(static_cast(d)); : } > None I can come up with right now, but this was before I noticed the bit pa If it won't break DECIMAL_V1, may as well merge the two paths and add the test cases such as 10^37 to verify things are still working as expected. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function
Alex Behm has posted comments on this change. Change subject: IMPALA-4848: Add WIDHT_BUCKET() function .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 3456: TestIsNull("width_bucket(NULL, 2, 14, 4)", TYPE_INT); Test NULL for all args. Add tests with extreme values to trigger edge cases. Also see my comments on the code. http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 427: if (expr.is_null) return IntVal::null(); do all null checks in one if Line 429: if (min_range == max_range) { use min_range.val directly because the == operator will check for null again Line 430: ctx->SetError("lower bound cannot be equal to upper bound"); Lower... Line 434: ctx->SetError("Number of buckets should be greater than zero"); should -> must also print which value was given Line 437: if (expr.val >= max_range.val) return num_buckets.val + 1; Comment that these cases go into the special under/overflow buckets. We need to be mindful of the case where num_buckets is already MAX_INT, adding +1 here will overflow it. We should return null in that case. Also add a test. Line 439: double num_elems = (max_range.val - min_range.val) / num_buckets.val; bucket_width? This could become 0 in extreme cases, and then we'd get infinity in the next line. Casting from infinity to int results in undefined behavior, so I think we should handle this case, and add a test that triggers it. http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions.h File be/src/exprs/math-functions.h: Line 128: static IntVal WidthBucket(FunctionContext* ctx, const DoubleVal& expr, Move in the public section like the other functions. -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] kudu: fix uninitialized variable usage warning
Impala Public Jenkins has posted comments on this change. Change subject: kudu: fix uninitialized variable usage warning .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/289/ -- To view, visit http://gerrit.cloudera.org:8080/6098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Jim Apple has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: PS5, Line 228: } > The function does not end here. It ends on line 245. Oops, sorry about that! -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: PS5, Line 228: } > This function can fall off the bottom without returning anything, which I t The function does not end here. It ends on line 245. -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Zach Amsden has uploaded a new patch set (#16). Change subject: IMPALA-2020: Add rounding for decimal casts .. IMPALA-2020: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. I have decided to wait on expanding the python test until the bugs with overflow are fixed, which will make it easier to test sane behavior. [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 0| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | +-+ Fetched 1 row(s) in 0.01s [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select cast(0.5 AS int); +--+ | cast(0.5 as int) | +--+ | 1| +--+ Fetched 1 row(s) in 0.01s [localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1)); +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.6 | +-+ Fetched 1 row(s) in 0.01s Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-21 19:31:32 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=bf4d06517e3093ca:b053953b +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-21 19:06:05 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=e54f6fc2c21d63d0:da7b6baa +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. [localhost:21000] > set decimal_v2=1; DECIMAL_V2 set to 1 [localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem Query submitted at: 2017-02-21 21:04:45 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=444a15a709f0672:74197af +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: [localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem; Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem Query submitte
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Jim Apple has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: PS5, Line 228: } This function can fall off the bottom without returning anything, which I think is technically http://catb.org/jargon/html/N/nasal-demons.html -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Make sure impala doesn't pickup the system's boost cmake module
Matthew Jacobs has posted comments on this change. Change subject: Make sure impala doesn't pickup the system's boost cmake module .. Patch Set 2: I filed https://issues.cloudera.org/browse/IMPALA-4959 Please just add that to the commit message -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
Jim Apple has submitted this change and it was merged. Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests .. IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests This patch allows any test suites whose workload is "targeted-stress" to be run in so-called "exhaustive" mode. Before this patch, only suites whose workload was "functional-query" would be run exhaustively. More on this flaw is in IMPALA-3947. The net effects are: 1. We fix IMPALA-4904, which allows test_ddl_stress to start running again. 2. We also improve the situation in IMPALA-4914 by getting TestSpillStress to run, though we don't fix its not-running-concurrently problem. The other mini-cluster stress tests were disabled in this commit: IMPALA-2605: Omit the sort and mini stress tests so they are not directly affected here. I also tried to clarify what "exhaustive" means in some of our shell scripts, via help text and comments. An exhaustive build+test run showed test_ddl_stress and TestSpillStress now get run and passed. This adds roughly 12 minutes to a build that's on the order of 13-14 hours. Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Reviewed-on: http://gerrit.cloudera.org:8080/6002 Tested-by: Impala Public Jenkins Reviewed-by: Jim Apple --- M bin/run-all-tests.sh M buildall.sh M tests/custom_cluster/test_spilling.py M tests/stress/test_ddl_stress.py 4 files changed, 85 insertions(+), 39 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
Jim Apple has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
Michael Brown has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests .. Patch Set 8: Sorry Jim. This needs a +2 after the rebase, and a submit. -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] build: don't look in system paths for kudu client
Matthew Jacobs has posted comments on this change. Change subject: build: don't look in system paths for kudu client .. Patch Set 1: I submit it, it doesn't get kicked off automatically -- To view, visit http://gerrit.cloudera.org:8080/6097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] build: don't look in system paths for kudu client
Impala Public Jenkins has posted comments on this change. Change subject: build: don't look in system paths for kudu client .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/288/ -- To view, visit http://gerrit.cloudera.org:8080/6097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 5: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/287/ -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. Patch Set 5: Code-Review+2 had to bump the client version one more time to get another change in Kudu -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4934: Disable Kudu OpenSSL initialization
Hello Impala Public Jenkins, Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6056 to look at the new patch set (#5). Change subject: IMPALA-4934: Disable Kudu OpenSSL initialization .. IMPALA-4934: Disable Kudu OpenSSL initialization Bumps the Kudu version to include the change to the client that allows Impala to disable SSL initialization. In authentication.cc, after Impala initializes OpenSSL, Impala then disables Kudu's OpenSSL init. Change-Id: I3f13f3af512c6d771979638da593685524c73086 --- M be/src/rpc/authentication.cc M bin/impala-config.sh M infra/python/deps/download_requirements M infra/python/deps/requirements.txt 4 files changed, 12 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/6056/5 -- To view, visit http://gerrit.cloudera.org:8080/6056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f13f3af512c6d771979638da593685524c73086 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Release note updates for Impala 2.8
Jim Apple has posted comments on this change. Change subject: [DOCS] Release note updates for Impala 2.8 .. Patch Set 9: 2.8.0 shipped a month ago. The delay on this patch is going to start preventing developers from updating impala_new_features.xml with their changes that they think will go in 2.9.0, thus preventing an incremental update method that would ensure that file stays up-to-date and can be released along with the actual release. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal
Laurel Hale has abandoned this change. Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal .. Abandoned Made a mistake trying to create a second patch set. -- To view, visit http://gerrit.cloudera.org:8080/6099 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Icfe3a7bd7d2a086fa5dcbc0c06ebeee1f52c5519 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph s
Laurel Hale has uploaded a new patch set (#2). Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph suitable for upstream documentation. This hides the mention of Cloudera .. IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal. Most of these fixes involved hiding the paragraphs with the DITA attribute 'audience="hidden"' and then inserting a paragraph suitable for upstream documentation. This hides the mention of Cloudera Manager in the rendered documentation. In a subsequent cleanup project, the "Cloudera Manager" mentions will be removed from the XML. This patch includes a fix to impala_common.xml requested by John Russell. Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1 --- M docs/shared/impala_common.xml M docs/topics/impala_perf_resources.xml M docs/topics/impala_perf_skew.xml M docs/topics/impala_perf_testing.xml M docs/topics/impala_scalability.xml M docs/topics/impala_txtfile.xml 6 files changed, 56 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/6069/2 -- To view, visit http://gerrit.cloudera.org:8080/6069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c3c2177e0b9c4c81f1541820013c66a59c0c7b1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6099 Change subject: IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal .. IMPALA-3401 [DOCS] Part 4 of "Cloudera Manager" removal Fixed impala_common.xml as requested. Change-Id: Icfe3a7bd7d2a086fa5dcbc0c06ebeee1f52c5519 --- M docs/shared/impala_common.xml 1 file changed, 8 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/6099/1 -- To view, visit http://gerrit.cloudera.org:8080/6099 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icfe3a7bd7d2a086fa5dcbc0c06ebeee1f52c5519 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Michael Ho has posted comments on this change. Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG Commit Message: PS4, Line 25: types > values Done. This may seem like a concern in the long run, isn't it ? http://gerrit.cloudera.org:8080/#/c/6038/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: PS4, Line 338: Our implementation is similar to MS SQL which takes the max of the input's scale : // and MIN_ADJUSTED_SCALE. > How about explaining why we deviate from SQL Server and from SUM()/COUNT(): Done -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6038 to look at the new patch set (#5). Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2 .. IMPALA-4821: Update AVG() for DECIMAL_V2 This change implements the DECIMAL_V2's behavior for AVG(). The differences with DECIMAL_V1 are: 1. The output type has a minimum scale of 6. This is similar to MS SQL's behavior which takes the max of 6 and the input type's scale. We deviate from MS SQL in the output's precision which is always set to 38. We use the smallest precision which can store the output. A key insight is that the output of AVG() is no wider than the inputs. Precision only needs to be adjusted when the scale is augmented. Using a smaller precision avoids potential loss of precision in subsequent decimal operations (e.g. division) if AVG() is a subexpression. Please note that the output type is different from SUM()/COUNT() as the latter can have a much larger scale. 2. Due to a minimum of 6 decimal places for the output, AVG() for decimal values whose whole number part exceeds 32 decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will always overflow as the scale is augmented to 6. Certain decimal types which work with AVG() in DECIMAL_V1 no longer work in DECIMAL_V2. Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/expr-test.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test M testdata/workloads/functional-query/queries/QueryTest/decimal.test 5 files changed, 235 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/5 -- To view, visit http://gerrit.cloudera.org:8080/6038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. IMPALA-4546: Fix Moscow timezone conversion after 2014 In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4 with no DST. A special case has been added to timestamp functions to handle this. Testing: Added BE Expr tests. Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h 4 files changed, 114 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/6 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. IMPALA-4546: Fix Moscow timezone conversion after 2014 In 2014 Moscow timezone rules changed from UTC+3 with no DST to UTC+4 with no DST. A special case has been added to timestamp functions to handle this. Testing: Added BE Expr tests. Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h 4 files changed, 114 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/5969/6 -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4546: Fix Moscow timezone conversion after 2014
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4546: Fix Moscow timezone conversion after 2014 .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/5969/5/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: Line 215: int msk_transition_day = 2456956; > AND_CAPS Done Line 215: int msk_transition_day = 2456956; > const for these three vars Done Line 226: tv.time().hours() < (msk_transition_hour_utc + msk_utc_offset) % 24)) { > Is this correct? Before, we had tv.time().hours() < 1. Now we have 22 + 4 % Yes, this is correct. The old (pre 2014) timezone is returned also for times between 1 and 2 now (before it was returned only for times before 1. We handle this case property by returning null on line 129. -- To view, visit http://gerrit.cloudera.org:8080/5969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6e3f2c9f6ba29749a26bc1087e664637bc02528 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] kudu: fix uninitialized variable usage warning
Matthew Jacobs has posted comments on this change. Change subject: kudu: fix uninitialized variable usage warning .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] kudu: fix uninitialized variable usage warning
Todd Lipcon has posted comments on this change. Change subject: kudu: fix uninitialized variable usage warning .. Patch Set 1: I doubt we'd see this in real life, since as far as I know glog always sets the severity to one of the ones covered in the case statement. Otherwise, DEBUG builds would be failing with the DCHECK here. But, the warning being emitted during compilation was bugging me. -- To view, visit http://gerrit.cloudera.org:8080/6098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f87c39cf9259bec8ba9a84a2543fb1f2e4a3422 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] build: don't look in system paths for kudu client
Todd Lipcon has posted comments on this change. Change subject: build: don't look in system paths for kudu client .. Patch Set 1: I'm not sure I can trigger a GVM. Does your +2 automatically trigger it? LMK if there's anything I need to do to push this forward, thanks. -- To view, visit http://gerrit.cloudera.org:8080/6097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] Avoid including Boost cmake support from system path
Todd Lipcon has abandoned this change. Change subject: Avoid including Boost cmake support from system path .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/6096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I9767952aada11d4d44da99bc0011d6d0829fad4c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Avoid including Boost cmake support from system path
Todd Lipcon has posted comments on this change. Change subject: Avoid including Boost cmake support from system path .. Patch Set 1: Oh, interesting. He and I didn't coordinate, but I'm guessing we both tried building on the same dev box (the Kudu team shares a beefy box we often use for builds). His has a better commit message than mine, so I'll abandon this one. -- To view, visit http://gerrit.cloudera.org:8080/6096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9767952aada11d4d44da99bc0011d6d0829fad4c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] build: don't look in system paths for kudu client
Matthew Jacobs has posted comments on this change. Change subject: build: don't look in system paths for kudu client .. Patch Set 1: Code-Review+2 assuming this passes real builds -- To view, visit http://gerrit.cloudera.org:8080/6097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8dad3d92e0bd07c27c3a58f6964d4da88218049c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No