[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#13). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 16 files changed, 653 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/13 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 13 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#12). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 16 files changed, 653 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/12 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 12 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1091 PS11, Line 1091: private Map> PrintDictionaryConjuncts( > * return void made it consistent. -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 11 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 06:37:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767: [DOCS] Document new Boolean operators
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8942 ) Change subject: IMPALA-1767: [DOCS] Document new Boolean operators .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb Gerrit-Change-Number: 8942 Gerrit-PatchSet: 5 Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 06:19:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6388: Fix the Union node number of hosts estimation
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9017 ) Change subject: IMPALA-6388: Fix the Union node number of hosts estimation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/9017/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test File testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test: http://gerrit.cloudera.org:8080/#/c/9017/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@3561 PS1, Line 3561: # IMPALA-6388: Verify that the order of the union operands does not impact the move these into union.test you should be able to use the recently added QUERYOPTIONS section to use a higher explain level for only those tests see runtime-filter-query-options.test for an example of using the new QUERYOPTIONS section http://gerrit.cloudera.org:8080/#/c/9017/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@3563 PS1, Line 3563: select coalesce(int_col, int_col) from functional.alltypes why coalesce()? -- To view, visit http://gerrit.cloudera.org:8080/9017 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0 Gerrit-Change-Number: 9017 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Fri, 12 Jan 2018 06:15:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 11: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1091 PS11, Line 1091: private Map> PrintDictionaryConjuncts( * return void * the typical pattern we use for explain helper functions is: String getFooBarExplainString(String prefix); see examples like getTableStatsExplainString(), getNodeExplainString() would be good to change this helper to e consistent -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 11 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 06:09:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#11). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 16 files changed, 654 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/11 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 11 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 10: tests found a bad dep, next patch fixes it. -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 10 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 05:35:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6388: Fix the Union node number of hosts estimation
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9017 Change subject: IMPALA-6388: Fix the Union node number of hosts estimation .. IMPALA-6388: Fix the Union node number of hosts estimation Before this patch, we would estimate the number of hosts for the union node by looking only at the first union operand. This is obviously incorrect and lead us to underestimate the value. We fix the problem by setting the estimate to be the maximum of its children. Testing: - Added a planner test that reproduces the issue Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0 --- M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test 2 files changed, 95 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/9017/1 -- To view, visit http://gerrit.cloudera.org:8080/9017 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0 Gerrit-Change-Number: 9017 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Hello Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8991 to look at the new patch set (#5). Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC KRPC in general tends to put more pressure on the thread caches due to allocations of more small objects (i.e. <1MB). While some of them are being addressed in KUDU-1865, it's shown that the following TCMalloc workarounds will provide reasonable performance with KRPC: - TCMALLOC_TRANSFER_NUM_OBJ: - maximum number of object per classe type to transfer between thread and central caches. - the default value of 512 in 2.5.2 seems to cause the spin lock in the central cache to be held for too long with KRPC. 2.6.0 and latter reverts this value to 32 by default. - TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES - total amount of memory allocated to all thread caches in bytes - the default value is 32MB. We need to bump it to 1GB which is the internal cap in TCMalloc. This change upgrades GPerfTools/TCMalloc to 2.6.3 to pick up the change of the default value of TCMALLOC_TRANSFER_NUM_OBJ. In addition, when KRPC is enabled and FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES has the default value of 0, we will automatically bump the thread cache sizes to 1GB. Without these workarounds, stress test with KRPC will grind to a halt due to contention for the spinlock in TCMalloc's central cache. With these workarounds, the stress test completes within the same ballpark as thrift. Also did a perf run with Thrift. The regression in TPCH-Q2 is mostly due to sensitivity in runtime filter timing and the avg can be dragged up due to a bad run when filters arrive late. No regression as measured in targeted-perf. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 18.93 | -0.84% | 10.08 | +1.45% | ++---+-++++ ++--+---++-++++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | ++--+---++-++++-+---+ | TPCH(_300) | TPCH-Q2 | parquet / none / none | 6.28 | 3.25| R +93.41% | * 49.77% * | * 12.47% * | 1 | 3 | | TPCH(_300) | TPCH-Q4 | parquet / none / none | 5.00 | 4.77| +4.83% | 0.41%| 0.03%| 1 | 3 | | TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.29 | 20.69 | +2.90% | 0.55%| 0.37%| 1 | 3 | | TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.73 | 1.71| +0.94% | 1.69%| 2.85%| 1 | 3 | | TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.03 | 5.99| +0.76% | 0.00%| 0.95%| 1 | 3 | | TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.97 | 6.93| +0.58% | 0.74%| 0.73%| 1 | 3 | | TPCH(_300) | TPCH-Q3 | parquet / none / none | 29.15 | 29.03 | +0.40% | 1.63%| 1.39%| 1 | 3 | | TPCH(_300) | TPCH-Q1 | parquet / none / none | 14.01 | 13.96 | +0.34% | 1.28%| 0.51%| 1 | 3 | | TPCH(_300) | TPCH-Q6 | parquet / none / none | 1.27 | 1.27| -0.03% | 3.69%| 0.07%| 1 | 3 | | TPCH(_300) | TPCH-Q9 | parquet / none / none | 30.99 | 31.13 | -0.45% | 0.54%| 0.19%| 1 | 3 | | TPCH(_300) | TPCH-Q5 | parquet / none / none | 48.03 | 48.33 | -0.63% | 4.72%| 0.11%| 1 | 3 | | TPCH(_300) | TPCH-Q7 | parquet / none / none | 46.85 | 47.41 | -1.18% | 1.59%| 0.46%| 1 | 3 | | TPCH(_300) | TPCH-Q8 | parquet / none / none | 7.92 | 8.03| -1.39% | 3.67%| 5.63%| 1 | 3 | | TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.98 | 31.51 | -1.67% | 1.33%| 0.82%| 1 | 3 | | TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.55 | 34.13 | -1.71% | 1.15%| 1.46%| 1 | 3 | | TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.46 | 9.64| -1.82% | 0.63%| 0.
[Impala-ASF-CR] IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char length() from length()
John Russell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9014 Change subject: IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char_length() from length() .. IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char_length() from length() Modify both char_length() and length() usage notes to say when they return the same or different results. Include the same example, showing both STRING and CHAR types, under both functions. Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625 --- M docs/shared/impala_common.xml M docs/topics/impala_string_functions.xml 2 files changed, 61 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/9014/1 -- To view, visit http://gerrit.cloudera.org:8080/9014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625 Gerrit-Change-Number: 9014 Gerrit-PatchSet: 1 Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8991 ) Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. Patch Set 4: PS4 moves the initialization to ExecEnv::InitBufferPool() instead. Fixed up a couple of BE tests which didn't initialize this TCMalloc setting as part of setup. -- To view, visit http://gerrit.cloudera.org:8080/8991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e Gerrit-Change-Number: 8991 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 12 Jan 2018 02:32:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Hello Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8991 to look at the new patch set (#4). Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC KRPC in general tends to put more pressure on the thread caches due to allocations of more small objects (i.e. <1MB). While some of them are being addressed in KUDU-1865, it's shown that the following TCMalloc workarounds will provide reasonable performance with KRPC: - TCMALLOC_TRANSFER_NUM_OBJ: - maximum number of object per classe type to transfer between thread and central caches. - the default value of 512 in 2.5.2 seems to cause the spin lock in the central cache to be held for too long with KRPC. 2.6.0 and latter reverts this value to 32 by default. - TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES - total amount of memory allocated to all thread caches in bytes - the default value is 32MB. We need to bump it to 1GB which is the internal cap in TCMalloc. This change upgrades GPerfTools/TCMalloc to 2.6.3 to pick up the change of the default value of TCMALLOC_TRANSFER_NUM_OBJ. In addition, when KRPC is enabled and FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES has the default value of 0, we will automatically bump the thread cache sizes to 1GB. Without these workarounds, stress test with KRPC will grind to a halt due to contention for the spinlock in TCMalloc's central cache. With these workarounds, the stress test completes within the same ballpark as thrift. Also did a perf run with Thrift. The regression in TPCH-Q2 is mostly due to sensitivity in runtime filter timing and the avg can be dragged up due to a bad run when filters arrive late. No regression as measured in targeted-perf. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 18.93 | -0.84% | 10.08 | +1.45% | ++---+-++++ ++--+---++-++++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | ++--+---++-++++-+---+ | TPCH(_300) | TPCH-Q2 | parquet / none / none | 6.28 | 3.25| R +93.41% | * 49.77% * | * 12.47% * | 1 | 3 | | TPCH(_300) | TPCH-Q4 | parquet / none / none | 5.00 | 4.77| +4.83% | 0.41%| 0.03%| 1 | 3 | | TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.29 | 20.69 | +2.90% | 0.55%| 0.37%| 1 | 3 | | TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.73 | 1.71| +0.94% | 1.69%| 2.85%| 1 | 3 | | TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.03 | 5.99| +0.76% | 0.00%| 0.95%| 1 | 3 | | TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.97 | 6.93| +0.58% | 0.74%| 0.73%| 1 | 3 | | TPCH(_300) | TPCH-Q3 | parquet / none / none | 29.15 | 29.03 | +0.40% | 1.63%| 1.39%| 1 | 3 | | TPCH(_300) | TPCH-Q1 | parquet / none / none | 14.01 | 13.96 | +0.34% | 1.28%| 0.51%| 1 | 3 | | TPCH(_300) | TPCH-Q6 | parquet / none / none | 1.27 | 1.27| -0.03% | 3.69%| 0.07%| 1 | 3 | | TPCH(_300) | TPCH-Q9 | parquet / none / none | 30.99 | 31.13 | -0.45% | 0.54%| 0.19%| 1 | 3 | | TPCH(_300) | TPCH-Q5 | parquet / none / none | 48.03 | 48.33 | -0.63% | 4.72%| 0.11%| 1 | 3 | | TPCH(_300) | TPCH-Q7 | parquet / none / none | 46.85 | 47.41 | -1.18% | 1.59%| 0.46%| 1 | 3 | | TPCH(_300) | TPCH-Q8 | parquet / none / none | 7.92 | 8.03| -1.39% | 3.67%| 5.63%| 1 | 3 | | TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.98 | 31.51 | -1.67% | 1.33%| 0.82%| 1 | 3 | | TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.55 | 34.13 | -1.71% | 1.15%| 1.46%| 1 | 3 | | TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.46 | 9.64| -1.82% | 0.63%| 0.
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#10). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of SlotId to conjuncts that are dictionary filterable. This mapping now includes SlotId's that reference nested tuples. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 16 files changed, 656 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/10 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 10 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 9: (9 comments) http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778 PS8, Line 778: } > Makes sense. The function comment incorrectly states they are added to 'non fixed the comment-- yes, that was inconsistent. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809 PS8, Line 809: if (!state_->query_options().parquet_dictionary_filtering || dict_filter_map_.empty()) { > My point was about whether child readers are included or not. In this code clarified in the comment for the members. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845 PS8, Line 845: // where some data pages are dictionary-encoded and others are plain > Agree. Would be good to add a Preonditions check to the FE SlotDescriptor c added to FE's SlotDescriptor. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222 PS8, Line 222: typedef std::map> > Makes sense. added to TupleDescriptor; its the one that currently has more of an overview. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548 PS8, Line 548: private boolean isCollectionNotEmpty(TupleDescriptor desc) { > I'm still in favor of removing. Reasons: makes sense. inlined it. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102 PS8, Line 1102: // in the same order as the normal conjuncts. Sort the indices so that the > Sorry this was my being vague. I meant creating a helper function for addin Done http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116 PS8, Line 1116: for (Integer idx : totalIdxList) { > The perTupleConjuncts already have a new list, so no member is modified wou yup, completely missed that. http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639 PS9, Line 639: List tupleIds = Lists.newArrayList(); > remove, Preconditions check in L646 must trivially be true after L645 Done http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test: http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330 PS8, Line 330:parquet statistics predicates: c_custkey > 0, o.o_orderkey > 0, l.l_partkey > 0 > that pattern makes more sense to me as well, so the oddity is really an exi added a jira to track it. -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 02:29:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Reviewed-on: http://gerrit.cloudera.org:8080/8814 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M be/src/runtime/io/disk-io-mgr.cc 10 files changed, 315 insertions(+), 209 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 12 Jan 2018 02:06:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9012 Change subject: IMPALA-2397: Use atomics for metrics .. IMPALA-2397: Use atomics for metrics This change removes the spinlock in IntGauge and IntCounter and uses AtomicInt64 instead. As shown in IMPALA-2397, multiple threads can be contending for the spinlocks of some global metrics under concurrent queries. This change also breaks up SimpleMetric into two subclasses: - SimpleProperty: - a value store for any primitive type (int,float,string etc). - atomic read and write via value() and set_value() respectively. - AtomicMetric: - the basis of IntGauge and IntCounter. Support atomic increment of the metric value via Increment() interface. - atomic read and write via value() and set_value() respectively. - only support int64_t type. With this change, DoubleGauge is removed from the code base as the existing use case doesn't make use of the Increment() interface so replacing it with DoubleProperty is sufficient. Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 --- M be/src/exec/external-data-source-executor.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/thrift-server.cc M be/src/runtime/client-cache.cc M be/src/runtime/data-stream-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/mem-tracker-test.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/tmp-file-mgr.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/util/impalad-metrics.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M be/src/util/metrics-test.cc M be/src/util/metrics.h M be/src/util/thread.cc M common/thrift/metrics.json 22 files changed, 267 insertions(+), 246 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/9012/1 -- To view, visit http://gerrit.cloudera.org:8080/9012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606 Gerrit-Change-Number: 9012 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8523 to look at the new patch set (#4). Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. IMPALA-5931: Generates scan ranges in planner for s3/adls Currently, for filesystems that do not include physical block information (e.g., block replica locations, caching), synthetic blocks are generated and stored in the catalog when metadata is loaded. Example file systems for which this is done includes S3, ADLS, and local fs. This change avoids generating these blocks when metadata is loaded. Instead, scan ranges are directly generated from such files by the backend coordinator. Previously, all scan ranges were produced by the planner in HDFSScanNode in the frontend. Now, those files without block information are sent to the coordinator represented by a split specification that determines how the coordinator will create scan ranges to send to executors. This change reduces the space needed in the catalog and reduces the scan range datastructures that are passed from the frontend to the backend when planning and coordinating a query. In addition a bug is avoided where non-splittable files were being split anyways to support the query parameter that places a limit on scan ranges. Testing: - local fs tests cover the code paths in this change - all core tests pass when configured with s3 - manually tried larger local filesystem tables (tpch) with multiple partitions and observed the same scan ranges. - TODO: adls testing Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 --- M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/client-request-state.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 9 files changed, 343 insertions(+), 178 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/4 -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 ) Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc@582 PS3, Line 582: ComputeScanRangeAssignment > This function now does both assignment and generation of scan ranges. Also, I moved it to QueryScheduler... that seems to be the place where data about scheduling is stored (including the request with scan ranges from the FE). Now, the idea is to first generate these ranges (done in client request), then schedule. If there is a better place to store such per request info, let me know and I can easily move it there. The only thing the scheduler knows about all of this right now is to stitch together computed and given ranges-- that's a bit gorpy (L258) and I plan to fix it if this flow/organization looks ok. http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift@195 PS3, Line 195: struct THdfsFileSplitSpec { : 1: required CatalogObjects.THdfsFileDesc file_desc : // -1 means not-splittable : 2: required i64 max_block_size : 3: required i64 partition_id : } > THdfsFileDesc and THdfsFileSplit have quite a bit of intersection wrt the i I did this for two reasons: 1) I expect this Spec to be the only thing that comes out of the FE for hdfs files (unexpanded) and THdfsFileSplit will only be present on the BE (expanded). Currently, I'm doing this just for s3/adls/local, but I'm thinking to apply it to all (not this change) hdfs files. 2) I preferred to use the type-checker to tell me which representation I have (expanded vs. unexpanded). Otherwise, I'd have to peer into THdfsFileSplit to determine this state, which seems more error-prone. http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@131 PS3, Line 131: new int[] {} > I believe you don't have to do that. You can probably pass null and then av Done http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@338 PS3, Line 338: its > typo: it's Done http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@666 PS3, Line 666: private void generateScanRangeSpecs > No intention to be picky but can you move this and tansformBlocksToScanRang Done http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@694 PS3, Line 694:* TScanRangeLocationLists are added scanRanges_. If disk ids are expected in blocks > to Done -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 3 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 01:19:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
Hello Lars Volker, Michael Brown, David Knupp, Csaba Ringhofer, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8771 to look at the new patch set (#4). Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. IMPALA-5736: [DOCS] Document --query_option for impala-shell Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b --- M docs/impala_keydefs.ditamap M docs/topics/impala_query_options.xml M docs/topics/impala_set.xml M docs/topics/impala_shell_options.xml 4 files changed, 49 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8771/4 -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8771 ) Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml File docs/topics/impala_shell_options.xml: http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml@627 PS3, Line 627: --config_file=path_to_config_file, to easily select between many > Somewhere in this section you should mention the new [impala.query_options] OK. It is mentioned alongside --query_option / -Q on lines 288-293 but I'll add a specific example. Hmm, if I put live_progress=true into the [impala.query_options] section, I get: Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 10a0212fef8829ffea0e2ec48eb19e894012c981) LIVE_PROGRESS is not supported for the impalad being connected to, ignoring. I never saw that before. I wonder if it's a limitation of [impala.query_options] that it doesn't work for the 2 special 'shell options'. I'll play it safe and just use a single query option, mem_limit, in my example. -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Fri, 12 Jan 2018 00:45:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Hello Philip Zeyliger, Zach Amsden, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9009 to look at the new patch set (#2). Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. IMPALA-6386: Invalidate metadata at table level for dataload Dataload currently executes bin/load-data.py for TPC-H, TPC-DS, and functional-query concurrently. One of the final steps for bin/load-data.py is to run a global "invalidate metadata". Global "invalidate metadata" commands are known to cause problem on concurrent systems. See IMPALA-5087. For dataload, if TPC-H executes "invalidate metadata" while TPC-DS is still creating tables and adding partitions, the TPC-DS executor might erroneously believe that a table does not exist. This changes dataload to invalidate metadata at an individual table level rather than globally. This prevents the concurrency issue. This also changes the names of some of the intermediate SQL files generated by generate-schema-statements.py and consumed by load-data.py to make them less confusing. Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f --- M bin/load-data.py M testdata/bin/generate-schema-statements.py 2 files changed, 16 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/9009/2 -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py File bin/load-data.py: http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318 PS1, Line 318: create_filename = 'load-%s-impala-generated' % load_file_substr > This isn't your fault, but it seems particularly fragile that we have to ge I agree. The SQL generation should also generate a dependency graph, which should drive execution. Arbitrary filenames are a bad interface. I'm fixing the filenames so that create statements are in a create-%s-impala-generated and load is in a load-%s-impala-generated. This makes it a bit less weird. -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 12 Jan 2018 00:43:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778 PS8, Line 778: } > Makes sense. The function comment incorrectly states they are added to 'non Please don't wait for my change! I intend to wait for this to get merged and then rebase. -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 8 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 12 Jan 2018 00:31:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 9: (9 comments) Change looks pretty good to me http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778 PS8, Line 778: } > no, I intended to omit the collection readers. this partitions all nested s Makes sense. The function comment incorrectly states they are added to 'non_filtering'. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809 PS8, Line 809: if (!state_->query_options().parquet_dictionary_filtering || dict_filter_map_.empty()) { > yup... I assumed that if its a mix of collection + scalars is ok since that My point was about whether child readers are included or not. In this code path we only add the top-level readers. In the other code path we recursively collect all nested readers as well. I think we should either be consistent or add a comment explaining that in some cases only the top-level readers are added. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845 PS8, Line 845: // where some data pages are dictionary-encoded and others are plain > that's correct, removed. I think I carried that conservative check over fro Agree. Would be good to add a Preonditions check to the FE SlotDescriptor c'tors. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222 PS8, Line 222: typedef std::map> > I'll comment here on the vs slotid decision here (covers Makes sense. Would be good to clarify this in TupleDescriptor and/or SlotDescriptor as you suggested. SlotIds and TupleIds are unique within a query. A SlotId belongs to exactly one TupleId. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548 PS8, Line 548: private boolean isCollectionNotEmpty(TupleDescriptor desc) { > its used in a couple of places and I wanted to keep the same checks in both I'm still in favor of removing. Reasons: The first Preconditions check only seems useful in the context of this function, and even so you'll get the expected "false" return value if we probe for null. The second Preconditions check is not necessary because the type of a tuple is always StructType (see type_ member in TupleDescriptor). http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102 PS8, Line 1102: // in the same order as the normal conjuncts. Sort the indices so that the > Done Sorry this was my being vague. I meant creating a helper function for adding the dictionary filtering part of the explain string. It's a good chunk of code that deals with that part. I think the grouping an be done in that function and doesn't need a separate helper. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116 PS8, Line 1116: for (Integer idx : totalIdxList) { > didn't want to mutate a class member for the purpose of printing diags. The perTupleConjuncts already have a new list, so no member is modified would be modified. http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639 PS9, Line 639: List tupleIds = Lists.newArrayList(); remove, Preconditions check in L646 must trivially be true after L645 http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test: http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330 PS8, Line 330:parquet statistics predicates: c_custkey > 0, o.o_orderkey > 0, l.l_partkey > 0 > hmm... actually I followed the pattern on L324-5 which separates the predic that pattern makes more sense to me as well, so the oddity is really an existing issue with the "parquet statistics predicates" -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py File bin/load-data.py: http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318 PS1, Line 318: create_filename = 'load-%s-impala-generated' % load_file_substr > The '%s-impala-generated' currently matches files like 'load-functional-que This isn't your fault, but it seems particularly fragile that we have to generate identical filenames in multiple pieces of python code. -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 12 Jan 2018 00:24:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py File bin/load-data.py: http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318 PS1, Line 318: create_filename = 'load-%s-impala-generated' % load_file_substr > create_filename = 'load' The '%s-impala-generated' currently matches files like 'load-functional-query-exhaustive-impala-generated-avro-def-block.sql'. It is matching by asking if create_filename is in the filename. So, this match works, but '%s-impala-generated' (filled in => 'functional-query-exhaustive-impala-generated') is not particularly specific. Right now, we don't have any other filenames of this form, but my new 'invalidate-%s-impala-generated.sql' files match this condition. So, to distinguish between the invalidate file and the preexisting files, I need to make the condition more exact. The file names are bad. I'll look into renaming some things. -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 12 Jan 2018 00:16:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
Hello Michael Ho, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8349 to look at the new patch set (#8). Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. .. IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. This patch generalizes ltrim()/rtrim() functions to accept a second argument that specifies the set of characters to be removed from the leading/trailing end of the target string: ltrim(string text[, characters text]) rtrim(string text[, characters text]) A common string trimming method has been added to StringFunctions, which is called from the general ltrim/rtrim/btrim functions. The functions also share prepare and close operations. New StringFunctions tests have been added to ExprTest for the new forms of ltrim() and rtrim(). New tests to cover handling of special characters have also been added. Note that our string handling functions only work with the ASCII character set. Handling of other character sets lies outside the scope of this patch. The existing ltrim()/rtrim()/trim() functions that take only one argument have been updated to use the more general methods. Testing: Queries like the following were run on a 1.5-billion row tpch_parquet.lineitem table, with the old and new implementations to ensure there is no performance regression: 1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ... 2. select count(*) from t where trim(l_shipinstruct) = '' and ... Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 175 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/8 -- To view, visit http://gerrit.cloudera.org:8080/8349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 Gerrit-Change-Number: 8349 Gerrit-PatchSet: 8 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 ) Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. .. Patch Set 7: (4 comments) Thanks for the comments. Please see patch set #8 http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@441 PS7, Line 441: IS_CONSTANT > We can possibly get rid of this after IMPALA-6380 is fixed. Agree. Will revisit this aspect of the code. http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@443 PS7, Line 443: chars_to_trim.is_null > As discussed offline, we cannot make any assumption about the length of Str As discussed, this seems like a bad DCHECK anyway - because '' and NULL are legit values of chars_to_trim, even when the arguments come from column elements. Removing the DCHECK, and instead check for zero-length or NULL chars_to_trim, as these do not modify the target string. Have added code to check for NULL chars_to_trim in TrimPrepare(). http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h File be/src/exprs/string-functions.h: http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@87 PS7, Line 87: static void TrimPrepare(FunctionContext*, FunctionContext::FunctionStateScope); > A quick comment on what this Prepare() function do would be useful. Done http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@149 PS7, Line 149: template > Please add a comment on what these template parameters stand for. Done -- To view, visit http://gerrit.cloudera.org:8080/8349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 Gerrit-Change-Number: 8349 Gerrit-PatchSet: 7 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 12 Jan 2018 00:14:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala
Hello Michael Ho, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8439 to look at the new patch set (#3). Change subject: IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala .. IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala KRPC has some flags that turn on TLS. This patch sets those to enable TLS communication. Tests are added to rpc-mgr-test. Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a --- M be/src/catalog/catalogd-main.cc M be/src/rpc/authentication-test.cc M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/runtime/exec-env.cc M be/src/service/impala-server.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/util/openssl-util.cc M be/src/util/openssl-util.h 15 files changed, 326 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8439/3 -- To view, visit http://gerrit.cloudera.org:8080/8439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a Gerrit-Change-Number: 8439 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala
Hello Michael Ho, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8439 to look at the new patch set (#2). Change subject: IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala .. IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala KRPC has some flags that turn on TLS. This patch sets those to enable TLS communication. Tests are added to rpc-mgr-test. Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a --- M be/src/catalog/catalogd-main.cc M be/src/rpc/authentication-test.cc M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/runtime/exec-env.cc M be/src/service/impala-server.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/util/openssl-util.cc M be/src/util/openssl-util.h 15 files changed, 327 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8439/2 -- To view, visit http://gerrit.cloudera.org:8080/8439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a Gerrit-Change-Number: 8439 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/9009 ) Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py File bin/load-data.py: http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318 PS1, Line 318: create_filename = 'load-%s-impala-generated' % load_file_substr create_filename = 'load' followed by load_filename = '... Seems strange. Also, I don't see a corresponding change to this anywhere else, seems like there should be one? -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 12 Jan 2018 00:03:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8771 ) Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. Patch Set 3: (1 comment) What is here now is fine, but I think one piece of functionality is missing. See the inline comment. http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml File docs/topics/impala_shell_options.xml: http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml@627 PS3, Line 627: Somewhere in this section you should mention the new [impala.query_options] setting in impalarc. See for example https://gerrit.cloudera.org/#/c/8038/19/shell/option_parser.py L22. -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Fri, 12 Jan 2018 00:00:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8771 ) Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell .. Patch Set 3: Tagging in Mike B for possible +2, since I see he reviewed the code gerrit. -- To view, visit http://gerrit.cloudera.org:8080/8771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b Gerrit-Change-Number: 8771 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 11 Jan 2018 23:40:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (6 comments) Did a pass over this, focusing on the backend implementation. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@31 PS1, Line 31: NULL Prefer nullptr in new code. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@32 PS1, Line 32: child_row_idx_(0), We've mostly been switching to initialising member variables to constants inline in the class definition. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56 PS1, Line 56: if (child_row_batch_->num_rows() == 0) { > Is it possible/ok for this condition to be false? See my comment in scalar- I think this needs to be a while loop that resets the batch after each iteration. It's valid for GetNext() to return a row batch with 0 rows even before eos. This is sometimes an implementation artefact and sometimes to flush memory resources attached to the batch. So if GetNext() returns 0 rows and would have returned 1 rows on the next call, I think this would incorrectly fail. Or if GetNext() returned 1 row && !eos, then 0 rows && eos. I think the two valid termination conditions for this node are if it hits eos or if it sees > 1 row. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78 PS1, Line 78: child_row_batch_->TransferResourceOwnership(row_batch); > Just an idea, not sure if a good one: it may be better to deep copy the row I think it's the start of a good idea. I originally suggested modelling this after SelectNode(), which is a streaming node that passes through its input rows. However, if we deep copy the output row, we could implement this as a blocking node that consumes its input in Open(), copies the single surviving row, then closes the input tree. Blocking nodes can reduce resource requirements for a plan because the planner knows that operators above and below the blocking node in the same fragment don't execute concurrently. There's a virtual function in the frontend ExecNode implementation that determines if it's blocking or not. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@735 PS1, Line 735:* @throws ImpalaException We don't generally have @throws declarations in Java comments http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839 PS1, Line 839: QUERY Maybe add a test that sets num_nodes=1 to make sure that the single-node versino of the plans are executable. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 23:32:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1709/ -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Jan 2018 23:29:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9009 Change subject: IMPALA-6386: Invalidate metadata at table level for dataload .. IMPALA-6386: Invalidate metadata at table level for dataload Dataload currently executes bin/load-data.py for TPC-H, TPC-DS, and functional-query concurrently. One of the final steps for bin/load-data.py is to run a global "invalidate metadata". Global "invalidate metadata" commands are known to cause problem on concurrent systems. See IMPALA-5087. For dataload, if TPC-H executes "invalidate metadata" while TPC-DS is still creating tables and adding partitions, the TPC-DS executor might erroneously believe that a table does not exist. This changes dataload to invalidate metadata at an individual table level rather than globally. This prevents the concurrency issue. Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f --- M bin/load-data.py M testdata/bin/generate-schema-statements.py 2 files changed, 13 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/9009/1 -- To view, visit http://gerrit.cloudera.org:8080/9009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f Gerrit-Change-Number: 9009 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8971 ) Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool .. Patch Set 2: (21 comments) I think this is looking good overall. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc@410 PS2, Line 410: return res.__isset.status ? Status(res.status) : Status::OK(); Why not always set TPublishFilterParams.status. This seems to be adding another way to represent Status::OK() http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1179 PS2, Line 1179: // Cancel query if the result of PublishFilter rpc returns a non-ok status. Nit: slightly redundant wording with result/returns. E.g. "if the PublishFilter RPC returns a non-ok status" http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1182 PS2, Line 1182: "This Error was encountered in Fragment $0 at $1:$2", fragment_idx, nit: don't need caps on Error/Fragment. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1184 PS2, Line 1184: Cancel(&status); If we're going to cancel the query there's no point sending out more filters so we could just break out of the loop. Alternatively, we could just log the error to LOG(ERROR) and continue, since successful delivery of a filter isn't required for correctness. I can see pros and cons for failing the query but don't jave a strong opinion about which behaviour is better. It may be simpler to just log the error to reduce the amount of logic on the error path. Or we could go even further and just log the error on the backend where it happens. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@111 PS2, Line 111: input output? http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@118 PS2, Line 118: *& Let's just use a double pointer for the output to be consistent with the rest of the code. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@165 PS2, Line 165: total_filter_mem_required_ I think total_bloom_filter_mem_required_ would be more consistent with the rest of the variables. http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc@189 PS1, Line 189: if (buffer_pool_client_.GetUnusedReservation() < required_space) { > yes its shouldn't fail, I just put it there as an additional check to make I don't see this change in PS2 http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a78 PS2, Line 78: Hmm so we were unnecessarily re-allocating filters if they were already registered, right? http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a80 PS2, Line 80: Was the lock removed accidentally? http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@71 PS2, Line 71: void Close(); Maybe mention that Close() is safe to call multiple times or if Init() wasn't called or Init() failed. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@111 PS2, Line 111: /// buffer pool allocation failed and the filter should be discarded. Is the change in this function's behaviour needed now? http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@30 PS2, Line 30: : always_false_(true), Maybe initialize the constant ones at their definition while we're here. I find that a bit more readable. http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@63 PS2, Line 63: directory_ directory_ != nullptr, to be consistent with the rest of the code http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@72 PS2, Line 72: if (directory_) { directory_ != nullptr, to be consistent with the rest of the code http://gerrit.cloudera.org:8080/#/c/8971/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/8971/2/common/thrift/ImpalaInternalService.thrift@833 PS2, Line 833: 1: opt
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9000 ) Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. IMPALA-6384: RequestPoolService should honor custom group mapping config Due to the way in which we instantiate fair scheduler allocation loader, we donot read the config overrides from the HDFS config files. This is an unexpected behavior from users' POV since we typically support overrides like custom user -> group mapping via HDFS config (for ex: LDAPGroupsMapping) that eventually affects the query -> pool assignment. Fix: This patch loads the hadoop default configuration so that the underlying QueuePlacementPolicy is based on user specified overrides. Testing (manual): Changed the core-site.xml to use LDAPGroupsMapping instead of the default ShellBasedUnixGroupsMapping and confirmed that the correct group mapping plugin is loaded, by adding additional logging. Also, modified TestRequestPoolService to assert that the core-site xml overrides are loaded. Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Reviewed-on: http://gerrit.cloudera.org:8080/9000 Reviewed-by: Bharath Vissapragada Tested-by: Impala Public Jenkins --- M common/yarn-extras/src/main/java/org/apache/impala/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java M fe/src/main/java/org/apache/impala/util/RequestPoolService.java M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl 4 files changed, 32 insertions(+), 2 deletions(-) Approvals: Bharath Vissapragada: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9000 ) Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 22:52:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.
Hello Michael Ho, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8930 to look at the new patch set (#4). Change subject: IMPALA-6307: CTAS statement fails with duplicate column exception. .. IMPALA-6307: CTAS statement fails with duplicate column exception. A CTAS statement with a 'partition by' clause causes the statement to fail with a duplicate column name exception. This is happening because on expression rewrite, the partition defs state is not reset. IMPALA-5796 added TableDef::reset(). This patch expands the method by adding calls to reset ColumnDefs and PartitionColumnDefs. Testing: * Regression test added to AnalyzeDDLTest. * Exhaustive Jenkins build and test. Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 5 files changed, 27 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8930/4 -- To view, visit http://gerrit.cloudera.org:8080/8930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 Gerrit-Change-Number: 8930 Gerrit-PatchSet: 4 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8930 ) Change subject: IMPALA-6307: CTAS statement fails with duplicate column exception. .. Patch Set 3: (1 comment) Thanks for the comment. Have added another test case, this one for Kudu. http://gerrit.cloudera.org:8080/#/c/8930/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java: http://gerrit.cloudera.org:8080/#/c/8930/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java@58 PS3, Line 58: kuduPartitionParams_.clear(); > This is new. Do we have any existing tests with a CTAS on Kudu where the se Added a new CTAS test for Kudu, which is the same repro test case tailored for Kudu. -- To view, visit http://gerrit.cloudera.org:8080/8930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 Gerrit-Change-Number: 8930 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 22:44:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1710/ -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 22:24:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 14: Code-Review+2 Sorry for the churn here - it looks like I changed the expected output for those tests based on a broken local data load -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 22:24:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Hello Michael Ho, Lars Volker, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#14). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M be/src/runtime/io/disk-io-mgr.cc 10 files changed, 315 insertions(+), 209 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/14 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Sailesh Mukil has removed a vote on this change. Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. Removed Code-Review+2 by Sailesh Mukil -- To view, visit http://gerrit.cloudera.org:8080/8991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e Gerrit-Change-Number: 8991 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8991 ) Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. Patch Set 3: > (1 comment) To avoid extra complication then, wouldn't it be easier to move the initialization somewhere else? -- To view, visit http://gerrit.cloudera.org:8080/8991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e Gerrit-Change-Number: 8991 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 22:21:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1707/ -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 22:16:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8991 ) Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8991/3/be/src/common/init.cc File be/src/common/init.cc: http://gerrit.cloudera.org:8080/#/c/8991/3/be/src/common/init.cc@262 PS3, Line 262: MallocExtension::instance()->SetNumericProperty( : "tcmalloc.aggressive_memory_decommit", 1); This actually causes a linking issue in libfesupport.so which isn't linked against the tcmalloc library. FE tests will fail due to unresolved symbols. Two possible ways of fixing it: - Move this initialization somewhere - Fix CMakeList.txt of libCommon.a to add a dependency on tcmallocstatic but this will make IMPALA_LINK_LIBS_DYNAMIC_TARGETS moot as it explicitly avoids linking against tcmalloc library due to the use of JVM but I cannot quite understand the reasoning of the comments as we statically link against tcmalloc library: # The above link list does not include tcmalloc. This is because the Impala JVM support # libraries (libfesupport, libloggingsupport) cannot use tcmalloc in all cases. When they # are started up by the FE (for tests) the jvm has already made allocations before # tcmalloc can be loaded. In all other binaries, we can use tcmalloc except the address # sanitizer build. Address sanitizer is incompatible with tcmalloc (they both intercept # malloc/free) -- To view, visit http://gerrit.cloudera.org:8080/8991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e Gerrit-Change-Number: 8991 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 22:12:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8971 Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool .. IMPALA-5519: Allocate Runtime filter memory from Buffer pool This patch adds changes to the planner to account for memory used by bloom filters at the fragment instance level. Also adds changes to allocate memory for the bloom filters from the buffer pool. Testing: - Modified Planner Tests and end to end tests to account for memory reservation for the runtime filters. - Modified backend tests and benchmarks to use the bufferpool for bloom filter allocation. - Add an end to end test. - Ran rest of the core tests. Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f --- M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/impala-internal-service.cc M be/src/util/backend-gflag-util.cc M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M common/thrift/BackendGflags.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test 43 files changed, 868 insertions(+), 559 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8971/2 -- To view, visit http://gerrit.cloudera.org:8080/8971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f Gerrit-Change-Number: 8971 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8986 ) Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Jan 2018 21:49:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8986 ) Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. IMPALA-4252: [DOCS] Document min/max filters for Kudu tables Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Reviewed-on: http://gerrit.cloudera.org:8080/8986 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M docs/shared/impala_common.xml M docs/topics/impala_disable_row_runtime_filtering.xml M docs/topics/impala_kudu.xml M docs/topics/impala_max_num_runtime_filters.xml M docs/topics/impala_runtime_bloom_filter_size.xml M docs/topics/impala_runtime_filter_max_size.xml M docs/topics/impala_runtime_filter_min_size.xml M docs/topics/impala_runtime_filtering.xml 8 files changed, 71 insertions(+), 6 deletions(-) Approvals: Thomas Tauber-Marshall: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8986 ) Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-docs-submit/185/ -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Jan 2018 21:39:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3193: Show table's comment on show tables
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8851 ) Change subject: IMPALA-3193: Show table's comment on show tables .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8851/5/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/8851/5/common/thrift/Frontend.thrift@70 PS5, Line 70: Arguments to getTableNamesAndComments I think we can clean up the API a bit. Instead of having getTableNames and getTableNamesAndComments, how about we create a single getTablesInfo call that takes a TGetTablesInfoParams (renamed TGetTablesParams) argument and returns a TGetTablesInfoResult (renamed TGetTablesResult). The latter returns a list, where TTableInfo has a required name and optional comment field. That way, we avoid the checks around the length of list of names and list of comments and it's easier to expand it if in the future we decide to add additional information such as table creation time, owner, etc. -- To view, visit http://gerrit.cloudera.org:8080/8851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I44f814af05db6f3c027718ade9f474f8b8153bcd Gerrit-Change-Number: 8851 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Kim Jin Chul Gerrit-Comment-Date: Thu, 11 Jan 2018 21:38:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3942: Fix wronly escaped string literal in front-end
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8818 ) Change subject: IMPALA-3942: Fix wronly escaped string literal in front-end .. Patch Set 6: (6 comments) Code looks good to me, thanks for the contribution! I have just one more round of comments, all about keeping your comments clear and concise. After that, we should be able to pull a +2-er in http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG@7 PS6, Line 7: wronly wrongly http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG@15 PS6, Line 15: and that's why we just choose one : to always normalize to remove this, covered by the paragraph below. http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG@19 PS6, Line 19: . ... to single quotes http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@41 PS6, Line 41: It is not always possible to determine if a string "should" be : // single or double quoted(e.g. concat('a', "b")) and that's why we just choose one : // to always normalize to. This second sentence is not necessary, this is explained better in the larger comment below anyways. http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@99 PS6, Line 99: . : // Here is a background why we should deploy single quotes as a default Its nice to keep comments short, so this can just be reduced to 'because:' http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@103 PS6, Line 103: It is easy to know if a given string is wrapped by single/double quotes, but Similarly, in the interest of being concise, this can be removed. -- To view, visit http://gerrit.cloudera.org:8080/8818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec Gerrit-Change-Number: 8818 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 21:14:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8986 ) Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Jan 2018 20:59:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8928 ) Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command .. Patch Set 8: (3 comments) Almost there http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java: http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72 PS8, Line 72: throw new AnalysisException("ALTER TABLE SET ROW FORMAT is only supported " + The new analyzer tests don't cover this scenario, please add a test. You can use functional.alltypesmixedformat. http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@73 PS8, Line 73: "on TEXT or SEQUENCE file formats: " + partition.getPartitionName()); Let's add a sentence "Conflicting partition: " + name and include the format of the partition otherwise it's not obvious what the partition name means http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@82 PS8, Line 82: "on TEXT or SEQUENCE file formats: " + tbl.getFullName()); let's include the format of the table -- To view, visit http://gerrit.cloudera.org:8080/8928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154 Gerrit-Change-Number: 8928 Gerrit-PatchSet: 8 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Comment-Date: Thu, 11 Jan 2018 20:53:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767: [DOCS] Document new Boolean operators
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8942 ) Change subject: IMPALA-1767: [DOCS] Document new Boolean operators .. Patch Set 5: Adding Alex, who reviewed the code gerrit. Looking for a +2. -- To view, visit http://gerrit.cloudera.org:8080/8942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb Gerrit-Change-Number: 8942 Gerrit-PatchSet: 5 Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 11 Jan 2018 20:25:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8986 ) Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml File docs/topics/impala_runtime_filtering.xml: http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@181 PS2, Line 181: ture representing a minimum and ma > This is the only part I see that doesn't make sense for min-max filters, as Done http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@203 PS2, Line 203: gher, the default for runtime filtering is the GLOBAL setting. : > I find this sentence confusing, as Kudu isn't identifying the matching rows Done -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Jan 2018 20:16:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Hello Thomas Tauber-Marshall, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8986 to look at the new patch set (#3). Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. IMPALA-4252: [DOCS] Document min/max filters for Kudu tables Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 --- M docs/shared/impala_common.xml M docs/topics/impala_disable_row_runtime_filtering.xml M docs/topics/impala_kudu.xml M docs/topics/impala_max_num_runtime_filters.xml M docs/topics/impala_runtime_bloom_filter_size.xml M docs/topics/impala_runtime_filter_max_size.xml M docs/topics/impala_runtime_filter_min_size.xml M docs/topics/impala_runtime_filtering.xml 8 files changed, 71 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/8986/3 -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Fix typo in test observability.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9003 ) Change subject: Fix typo in test_observability. .. Fix typo in test_observability. Should fix "NameError: global name 'dgb_str' is not defined". Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8 Reviewed-on: http://gerrit.cloudera.org:8080/9003 Reviewed-by: Alex Behm Reviewed-by: Zoram Thanga Tested-by: Impala Public Jenkins --- M tests/query_test/test_observability.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Alex Behm: Looks good to me, approved Zoram Thanga: Looks good to me, but someone else must approve Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8 Gerrit-Change-Number: 9003 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] Fix typo in test observability.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9003 ) Change subject: Fix typo in test_observability. .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8 Gerrit-Change-Number: 9003 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 20:10:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Hello Lars Volker, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8775 to look at the new patch set (#9). Change subject: IMPALA-4993: extend dictionary filtering to collections .. IMPALA-4993: extend dictionary filtering to collections Currently, top-level scalar columns in parquet files can be used at runtime to prune row-groups by evaluating certain conjuncts over the column's dictionary (if available). This change extends such pruning to scalar values that are stored in collection type columns. Currently, dictionary pruning works by finding eligible conjuncts for top-level slots. Since only top-level slots are supported, the slots are implicitly part of the scan node's tuple descriptor. With this change, we track eligible conjuncts by slot as well as the tuple that contains the slot (either top-level or nested collection). Since collection conjuncts are already managed by a map that associates tuple descriptors to a list of their conjuncts, this extension follows the existing representation. The frontend builds the mapping of to conjuncts that are dictionary filterable. The backend is adjusted to use the same representation. In addition, collection readers are decomposed into scalar filterable columns and other, non-dictionary filterable readers. When filtering a row group using a conjunct associated to a (possibly) nested collection type, an additional tuple buffer is allocated per tuple descriptor. Testing: - e2e test extended to illustrate row-groups that are pruned by nested collection dictionary filters. Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/collection-value-builder.h M be/src/runtime/scoped-buffer.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test 14 files changed, 650 insertions(+), 208 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/9 -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 ) Change subject: IMPALA-4993: extend dictionary filtering to collections .. Patch Set 8: (23 comments) http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.h@420 PS8, Line 420: /// dictionary filtering. The definition/repetition levels are are populated by the > are are Done http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778 PS8, Line 778: } > FWIW, I discovered today I needed to make some similar adjustments to the c will take a look at your change more closely. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778 PS8, Line 778: } > Did you intend to add the collection_reader to non_filtering? no, I intended to omit the collection readers. this partitions all nested scalar readers. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809 PS8, Line 809: if (!state_->query_options().parquet_dictionary_filtering || dict_filter_map_.empty()) { > Looks like non_dict_filterable_readers_ is inconsistently populated. With d yup... I assumed that if its a mix of collection + scalars is ok since that's what works today (perhaps modified by tim's change). And I don't see a problem with flattening down to just the scalar readers. http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845 PS8, Line 845: if (tuple_desc == nullptr) tuple_desc = scan_node_->tuple_desc(); > Why this check? I believe all SlotDescriptors require a parent. We even DHC that's correct, removed. I think I carried that conservative check over from the frontend where the parent is not required to be non-null in the ctor. should it be required? http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222 PS8, Line 222: typedef std::map, std::vector> > Why is the key a pair and not just SlotId or a SlotDescriptor*? I'll comment here on the vs slotid decision here (covers several related comments in the thrift file and hdfsscannode). I suspected that slotid is unique across tuples so with your other comments and various other implications baked into various apis, I've gone ahead and made this change. However, with a two-level naming scheme (tuple/slot), I was a bit unclear on this and did not find it stated explicitly. I would've expected this in frontend's TupleDescriptor-- should I add a clarifying blurb there? http://gerrit.cloudera.org:8080/#/c/8775/8/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8775/8/common/thrift/PlanNodes.thrift@245 PS8, Line 245: 9: optional list dictionary_filter_conjuncts > Not sure I understand why this needed to change. SlotIds are unique, so wha Done. See comment in hdfs-scanner for why I went down this route. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@115 PS8, Line 115: * each value present in the row group's dictionary > ... prune a row group by evaluating the conjuncts against the respective co Done http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@200 PS8, Line 200: private Map, List> dictionaryFilterConjuncts_ = > Unfortunate that we have to use a Pair as key since conceptually the pair h Done. See comment in hdfs-scanner for why I went down this route. http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@543 PS8, Line 543:* These filters are added by analysis (see: SelectStmt#registerIsNotEmptyPredicates). > Move this comment to notEmptyCollections_ Done http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548 PS8, Line 548: private boolean isCollectionNotEmpty(TupleDescriptor desc) { > Why do we need this function? Can't we just check notEmptyCollections_.cont its used in a couple of places and I wanted to keep the same checks in both (and not repeat the checks). http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@638 PS8, Line 638: private void addDictionaryFilter(Analyzer analyzer, Expr conjunct, int conjunct_idx) { > conjunctIdx Done http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apac
[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8890 ) Change subject: IMPALA-3526: update FE tests to pass on S3 .. Patch Set 5: alex, pls have a look and let me know if you have additional comments. -- To view, visit http://gerrit.cloudera.org:8080/8890 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141 Gerrit-Change-Number: 8890 Gerrit-PatchSet: 5 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 11 Jan 2018 20:01:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8890 ) Change subject: IMPALA-3526: update FE tests to pass on S3 .. Patch Set 5: Code-Review+2 I am fine with the change to unblock the tests. Feel free to check with Alex in case he has any comments/concerns. -- To view, visit http://gerrit.cloudera.org:8080/8890 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141 Gerrit-Change-Number: 8890 Gerrit-PatchSet: 5 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 11 Jan 2018 19:54:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1709/ -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Jan 2018 19:52:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Jan 2018 19:51:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 12 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Jan 2018 19:51:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6363: avoid cscope build races
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9007 ) Change subject: IMPALA-6363: avoid cscope build races .. Patch Set 1: Yeah, I figure we could go ahead with this and see if the problem stops reproducing. -- To view, visit http://gerrit.cloudera.org:8080/9007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195 Gerrit-Change-Number: 9007 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 19:46:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6363: avoid cscope build races
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/9007 ) Change subject: IMPALA-6363: avoid cscope build races .. Patch Set 1: Code-Review+1 It seems like this couldn't hurt even if IMPALA-6363 does not happen much. I had the luxury of checking a CentOS 5 host, and the -ignore_readdir_race flag does exist there. -- To view, visit http://gerrit.cloudera.org:8080/9007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195 Gerrit-Change-Number: 9007 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Thu, 11 Jan 2018 19:41:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6363: avoid cscope build races
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9007 Change subject: IMPALA-6363: avoid cscope build races .. IMPALA-6363: avoid cscope build races Use the -ignore_readdir_race flag for find so that find doesn't fail if a directory disappears under it. From what I could tell the flag has been in GNU find for a long time and is also available in other OS flavours like BSD and OS X. Make the step depend on gen-deps so that it can index thrift, protobuf, etc, output. Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195 --- M CMakeLists.txt M bin/gen-cscope.sh 2 files changed, 6 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/9007/1 -- To view, visit http://gerrit.cloudera.org:8080/9007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195 Gerrit-Change-Number: 9007 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9006 ) Change subject: IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt File CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt@310 PS2, Line 310: # Tests that run any security related tests need to link this in to override the > This is pretty opaque. Could you provide a comment here with what's going o It's pretty well explained in be/src/kudu/security/krb5_realm_override.cc, so I added a comment to refer that file. -- To view, visit http://gerrit.cloudera.org:8080/9006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 Gerrit-Change-Number: 9006 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 11 Jan 2018 19:29:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
Hello Philip Zeyliger, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9006 to look at the new patch set (#3). Change subject: IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing .. IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing On systems that have Kerberos 1.11 or earlier, service principals with IP addresses are not supported due to a bug: http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603 Since our BE tests use such principals, they fail on older platforms with the above mentioned kerberos versions. Kudu fixed this by adding a workaround which overrides krb5_realm_override. https://github.com/cloudera/kudu/commit/ba2ae3de4a7c43ff2f5873e822410e066ea99667 However, when we moved Kudu's security library into Impala, we did not add the appropriate build flags that allow it to be used. This patch fixes that. Testing: Verified that the failing test runs successfully on CentOs 6.4 with Kerberos 1.10.3 Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 --- M CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/rpc/rpc-mgr-test.cc 3 files changed, 9 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9006/3 -- To view, visit http://gerrit.cloudera.org:8080/9006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 Gerrit-Change-Number: 9006 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9006 ) Change subject: IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt File CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt@310 PS2, Line 310: set(KRB5_REALM_OVERRIDE -Wl,--undefined=krb5_realm_override_loaded krb5_realm_override) This is pretty opaque. Could you provide a comment here with what's going on? -- To view, visit http://gerrit.cloudera.org:8080/9006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 Gerrit-Change-Number: 9006 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6383: free memory after skipping parquet row groups
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9002 ) Change subject: IMPALA-6383: free memory after skipping parquet row groups .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/9002/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9002/1//COMMIT_MSG@13 PS1, Line 13: Testing: My feeling is that this testing is enough for now. With the reservation changes forthcoming, the multiple-row-group tests failed hard when it tried to allocate new buffers for the next row group when the buffers for the previous row group were released. We could add a mem_limit test tuned to reproduce it but those have a tendency to be flaky and it doesn't seem worth the effort when more robust test coverage will be forthcoming. http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.h@647 PS1, Line 647: /// were returned. and when returning the last batch. > nit: remove "."? Done http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.cc@623 PS1, Line 623: // We skipped the previous row group. Release resources before starting another. > This gets called for cases where no resources have been reserved (e.g. L655 I could also reorganise the code so that ReleaseSkippedRowGroupResources() gets called before "continue" so that it's more explicit. I was originally thinking that would be error-prone and easy to forget, but with the NumStreams() DCHECK maybe that won't be the case. http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.cc@738 PS1, Line 738: col_reader->Close(row_batch); > nit: while you're here you could reduce this to a single line. Done -- To view, visit http://gerrit.cloudera.org:8080/9002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b Gerrit-Change-Number: 9002 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9000 ) Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1708/ -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6383: free memory after skipping parquet row groups
Hello Michael Ho, Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9002 to look at the new patch set (#2). Change subject: IMPALA-6383: free memory after skipping parquet row groups .. IMPALA-6383: free memory after skipping parquet row groups Before this patch, resources were only flushed after breaking out of NextRowGroup(). This is a problem because resources can be allocated for skipped row groups (e.g. for reading dictionaries). Testing: Tested in conjunction with a prototype buffer pool patch that was DCHECKing before the change. Added DCHECKs to the current version to ensure the streams are cleared up as expected. Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/scanner-context.h 3 files changed, 26 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/9002/2 -- To view, visit http://gerrit.cloudera.org:8080/9002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b Gerrit-Change-Number: 9002 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/9000 ) Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. Patch Set 3: Code-Review+2 (1 comment) Thanks Zoram and Tim. Carrying +2. http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl: http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@116 PS2, Line 116: overridde > nit: overriden Done -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Hello Dimitris Tsirogiannis, Zoram Thanga, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9000 to look at the new patch set (#3). Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. IMPALA-6384: RequestPoolService should honor custom group mapping config Due to the way in which we instantiate fair scheduler allocation loader, we donot read the config overrides from the HDFS config files. This is an unexpected behavior from users' POV since we typically support overrides like custom user -> group mapping via HDFS config (for ex: LDAPGroupsMapping) that eventually affects the query -> pool assignment. Fix: This patch loads the hadoop default configuration so that the underlying QueuePlacementPolicy is based on user specified overrides. Testing (manual): Changed the core-site.xml to use LDAPGroupsMapping instead of the default ShellBasedUnixGroupsMapping and confirmed that the correct group mapping plugin is loaded, by adding additional logging. Also, modified TestRequestPoolService to assert that the core-site xml overrides are loaded. Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 --- M common/yarn-extras/src/main/java/org/apache/impala/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java M fe/src/main/java/org/apache/impala/util/RequestPoolService.java M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl 4 files changed, 32 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/9000/3 -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8986 ) Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8986/1/docs/topics/impala_runtime_filtering.xml File docs/topics/impala_runtime_filtering.xml: http://gerrit.cloudera.org:8080/#/c/8986/1/docs/topics/impala_runtime_filtering.xml@173 PS1, Line 173: : For HD > Done. Because this paragraph is followed by info that's only relevant for B Actually, the partitioned/broadcast and local/global discussion applies to min-max filters as well. I should also add that the long term plan is to have all filter types supported by all scan types, so no need to separate out min-max as being a really specifically Kudu thing (though of course it only applies to Kudu at the moment). http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml File docs/topics/impala_runtime_filtering.xml: http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@181 PS2, Line 181: a complete list of relevant values This is the only part I see that doesn't make sense for min-max filters, as they're not a 'list of values', but then a bloom filter isn't a 'list of values' either. Maybe rephrase it something like "A broadcast filter reflects the complete set of relevant values and can be immediately evaluated..." and "A partitioned filter reflects only the values processed by one host..." or perhaps "contains" instead of reflects http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@203 PS2, Line 203: These filters are used by Kudu to scan a range of values : for join columns when identifying matching rows within a join query. I find this sentence confusing, as Kudu isn't identifying the matching rows (Kudu doesn't even know we're doing a join, its just scanning values for us) Maybe say something like "These filters are passed to Kudu to reduce the number of rows returnrf to Impala when scanning the probe side of the join" -- To view, visit http://gerrit.cloudera.org:8080/8986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2 Gerrit-Change-Number: 8986 Gerrit-PatchSet: 2 Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Jan 2018 19:12:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8991 ) Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e Gerrit-Change-Number: 8991 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 19:12:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9000 ) Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. Patch Set 2: Code-Review+2 (1 comment) The test seems good since it's at least testing the root cause of the problem http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl: http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@116 PS2, Line 116: overriden nit: overriden -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 19:06:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
Sailesh Mukil has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/9006 ) Change subject: IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing .. IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing On systems that have Kerberos 1.11 or earlier, service principals with IP addresses are not supported due to a bug: http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603 Since our BE tests use such principals, they fail on older platforms with the above mentioned kerberos versions. Kudu fixed this by adding a workaround which overrides krb5_realm_override. https://github.com/cloudera/kudu/commit/ba2ae3de4a7c43ff2f5873e822410e066ea99667 However, when we moved Kudu's security library into Impala, we did not add the appropriate build flags that allow it to be used. This patch fixes that. Testing: Verified that the failing test runs successfully on CentOs 6.4 with Kerberos 1.10.3 Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 --- M CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/rpc/rpc-mgr-test.cc 3 files changed, 6 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9006/2 -- To view, visit http://gerrit.cloudera.org:8080/9006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 Gerrit-Change-Number: 9006 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/9000 ) Change subject: IMPALA-6384: RequestPoolService should honor custom group mapping config .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96 Gerrit-Change-Number: 9000 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 18:59:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
Hello Michael Ho, Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8950 to look at the new patch set (#3). Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr .. IMPALA-6346: Potential deadlock in KrpcDataStreamMgr In KrpcDataStreamMgr::CreateRecvr() we take the lock_ and then call recvr->TakeOverEarlySender() for all contexts. recvr->TakeOverEarlySender() then calls recvr_->mgr_->EnqueueDeserializeTask((), which can block if the deserialize pool queue is full. The next thread to become available in that queue will also have to acquire lock_, thus leading to a deadlock. We fix this by moving the EarlySendersList out of the EarlySendersMap and dropping the lock before taking any actions on the RPC contexts in the EarlySendersList. All functions called after dropping 'lock_' do not require the lock to protect them as they are thread safe. Additionally modified the BE test data-stream-test to work with KRPC as well. Testing: Added a new test to data-stream-test to verify that the deadlock does not happen. Also, I verified that this test hangs without the fix. Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7 --- M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-mgr.cc 2 files changed, 262 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8950/3 -- To view, visit http://gerrit.cloudera.org:8080/8950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7 Gerrit-Change-Number: 8950 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
Sailesh Mukil has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9006 Change subject: IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing .. IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing On systems that have Kerberos 1.11 or earlier, service principals with IP addresses are not supported due to a bug: http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603 Since our BE tests use such principals, they fail on older platforms with the above mentioned kerberos versions. Kudu fixed this by adding a workaround which overrides krb5_realm_override. https://github.com/cloudera/kudu/commit/ba2ae3de4a7c43ff2f5873e822410e066ea99667 However, when we moved Kudu's security library into Impala, we did not add the appropriate build flags that allow it to be used. This patch fixes that. Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 --- M CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/rpc/rpc-mgr-test.cc 3 files changed, 6 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9006/1 -- To view, visit http://gerrit.cloudera.org:8080/9006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221 Gerrit-Change-Number: 9006 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
Hello Michael Ho, Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8950 to look at the new patch set (#2). Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr .. IMPALA-6346: Potential deadlock in KrpcDataStreamMgr In KrpcDataStreamMgr::CreateRecvr() we take the lock_ and then call recvr->TakeOverEarlySender() for all contexts. recvr->TakeOverEarlySender() then calls recvr_->mgr_->EnqueueDeserializeTask((), which can block if the deserialize pool queue is full. The next thread to become available in that queue will also have to acquire lock_, thus leading to a deadlock. We fix this by moving the EarlySendersList out of the EarlySendersMap and dropping the lock before taking any actions on the RPC contexts in the EarlySendersList. All functions called after dropping 'lock_' do not require the lock to protect them as they are thread safe. Additionally modified the BE test data-stream-test to work with KRPC as well. Testing: Added a new test to data-stream-test to verify that the deadlock does not happen. Also, I verified that this test hangs without the fix. Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7 --- M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-mgr.cc 2 files changed, 263 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8950/2 -- To view, visit http://gerrit.cloudera.org:8080/8950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7 Gerrit-Change-Number: 8950 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 12: Release build failed with a link error because of missing includes of a .inline.h file. -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 18:24:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1707/ -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 18:24:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 18:24:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Hello Michael Ho, Lars Volker, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#13). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M be/src/runtime/io/disk-io-mgr.cc M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test 11 files changed, 321 insertions(+), 211 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/13 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command
Adam Holley has posted comments on this change. ( http://gerrit.cloudera.org:8080/8928 ) Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command .. Patch Set 8: (5 comments) Changes applied. http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java: http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72 PS4, Line 72: throw new AnalysisException("ALTER TABLE SET ROW FORMAT is only supported " + > I feel pretty strongly that table-level delimiter changes should not be pre Ok. I think I understand now. Done. http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@77 PS4, Line 77: HdfsFileFormat format = HdfsFileFormat.fromHdfsInputFormatClass( > We should include info about the conflicting partition so the user can debu Included partition name. Is there some better data to return? http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@460 PS4, Line 460: reloadFileMetadata = alterTableSetRowFormat(tbl, > Thanks for investigating. I agree it's better to clean this up in a separat I've opened JIRA https://issues.apache.org/jira/browse/IMPALA-6385. Can you comment on that one as to which pattern should be used going forward? http://gerrit.cloudera.org:8080/#/c/8928/7/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: http://gerrit.cloudera.org:8080/#/c/8928/7/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1404 PS7, Line 1404: # Test select after alter table to ensure no partition changes. > typo: taqble Done http://gerrit.cloudera.org:8080/#/c/8928/7/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1410 PS7, Line 1410: TYPES > We should also validate that new partitions are created based on the new ta Done -- To view, visit http://gerrit.cloudera.org:8080/8928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154 Gerrit-Change-Number: 8928 Gerrit-PatchSet: 8 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Comment-Date: Thu, 11 Jan 2018 18:22:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command
Adam Holley has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/8928 ) Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command .. IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command Examples of new command: ALTER TABLE t1 SET ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'; ALTER TABLE t1 SET ROW FORMAT DELIMITED LINES TERMINATED BY '\001'; Testing: Added parser tests and unit tests for alter statements including partition options. Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java M fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-query/queries/QueryTest/alter-table.test 8 files changed, 331 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/8928/8 -- To view, visit http://gerrit.cloudera.org:8080/8928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154 Gerrit-Change-Number: 8928 Gerrit-PatchSet: 8 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h File be/src/exec/scalar-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@47 PS1, Line 47: child_row_idx_ This member is not used anywhere. Probably it should be removed. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 17:40:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@932 PS1, Line 932: The case when the subquery returns 0 row could be tested too. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 16:45:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746 PS1, Line 746: selectNode scalarCheckNode? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 16:44:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h File be/src/exec/scalar-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@44 PS1, Line 44: boost::scoped_ptr child_row_batch_; If I understand correctly, having child_row_batch_ as member is usually needed because the row_batch in GetNext() can reach its capacity, so the remaining fetched rows will be copied in the next GetNext() call(s). Is this possible here? That would mean that the GetNext() was called with an already full row batch. If this is not possible, then this member could be replaced with a local variable in GetNext(). http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56 PS1, Line 56: if (child_row_batch_->num_rows() == 0) { Is it possible/ok for this condition to be false? See my comment in scalar-check-node.h http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@63 PS1, Line 63: one "zero or one" or "maximum one" would be more exact http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78 PS1, Line 78: child_row_batch_->TransferResourceOwnership(row_batch); Just an idea, not sure if a good one: it may be better to deep copy the row, which would make it possible to release the resources referenced by child_row_batch_ earlier. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@178 PS1, Line 178: // Runtime check needed to check if subquery returns scalar This is not just a check, but can be a transformation, as array types are replaced. This should be mentioned in the comment. +nit: missing . http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@179 PS1, Line 179: // Runtime check needed to check if subquery returns scalar This is not just a check, but can be a transformation, as array types are replaced. This should be mentioned in the comment. +nit: missing . http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@90 PS1, Line 90: // If true, a ScalarCheckNode is needed to check this statement's result at runtime nit: missing . http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java@32 PS1, Line 32: // LIMIT 2 is enough to check if the child returns a scalar value : // Also, an optimized plan is generated with LIMIT 2 nit: 2 missing . -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 16:42:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix typo in test observability.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9003 ) Change subject: Fix typo in test_observability. .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1706/ -- To view, visit http://gerrit.cloudera.org:8080/9003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8 Gerrit-Change-Number: 9003 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 11 Jan 2018 16:27:05 + Gerrit-HasComments: No