[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1497/ -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 9 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Sat, 18 Nov 2017 05:36:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 9 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Sat, 18 Nov 2017 05:35:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6109: xfail TestHdfsUnknownErrors::test hdfs safe mode error 255
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8590 ) Change subject: IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 .. IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 The test puts the HDFS name node into safe mode to trigger an "Unknown Error 255" and verifies that the error details can be obtained correctly via the libHDFS API. However, putting the name node into safe mode can trip up HBase (HBASE-18738), which causes sporadic failures of our other HBase tests. To prevent this, we xfail the test until the HBase issue has been addressed (or we find a better way to trigger a 255 error). IMPALA-6212 tracks re-enabling the test in the future. Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Reviewed-on: http://gerrit.cloudera.org:8080/8590 Reviewed-by: Sailesh Mukil Tested-by: Impala Public Jenkins --- M tests/data_errors/test_data_errors.py 1 file changed, 1 insertion(+), 0 deletions(-) Approvals: Sailesh Mukil: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Gerrit-Change-Number: 8590 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6109: xfail TestHdfsUnknownErrors::test hdfs safe mode error 255
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8590 ) Change subject: IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Gerrit-Change-Number: 8590 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 03:22:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 9: Added the outer join case in the comment. -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 9 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Sat, 18 Nov 2017 02:26:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Tianyi Wang has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. IMPALA-5976: Remove equivalence class computation in FE Equivalent class is used to get the equivalencies between slots. It is ill-defined and the current implementation is inefficient. This patch removes it and directly uses the information from the value transfer graph instead. Value transfer graph is reimplemented using Tarjan's strongly connected component algorithm and BFS with adjacency lists to speed up on both condensed and sparse graphs. Testing: It passes the existing tests. In planner tests the equivalence between SCC-condensed graph and uncondensed graph is checked. A test case is added for a helper class IntArrayList. An outer-join edge case is added in planner test. On a query with 1800 union operations, the equivalence class computation time is reduced from 7m57s to 65ms and the planning time is reduced from 8m5s to 13s. Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a --- M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/util/Graph.java A fe/src/main/java/org/apache/impala/util/IntArrayList.java A fe/src/main/java/org/apache/impala/util/IntIterator.java A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test 33 files changed, 1,176 insertions(+), 785 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/9 -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 9 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@406 PS6, Line 406: // For right anti and semi joins the lhs join slots does not appear in the output. > I don't understand. Why are the slots in the rhsJoinPartition not returned I got confused. You are right. http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test: http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1056 PS6, Line 1056: # around a check in refsNullableTupleId(). > - I removed refsNullableTupleId in the newest ps. Makes sense. Let me take a look at the newest ps -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Sat, 18 Nov 2017 01:22:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8500 ) Change subject: Pin gen_build_version's git handling to typical git dir. .. Pin gen_build_version's git handling to typical git dir. gen_build_version.py now specifies --git-dir explicitly, to avoid fetching the revision of a containing git directory that's not an Impala checkout. This came up when building Impala without a git directory, but within a build system that happens to have a git directory higher up in the directory tree. I tested this by running the script manually and observing it works identically in the normal case. Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Reviewed-on: http://gerrit.cloudera.org:8080/8500 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- M bin/gen_build_version.py 1 file changed, 5 insertions(+), 6 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8500 ) Change subject: Pin gen_build_version's git handling to typical git dir. .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 01:20:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Tianyi Wang has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. IMPALA-5976: Remove equivalence class computation in FE Equivalent class is used to get the equivalencies between slots. It is ill-defined and the current implementation is inefficient. This patch removes it and directly uses the information from the value transfer graph instead. Value transfer graph is reimplemented using Tarjan's strongly connected component algorithm and BFS with adjacency lists to speed up on both condensed and sparse graphs. Testing: It passes the existing tests. In planner tests the equivalence between SCC-condensed graph and uncondensed graph is checked. A test case is added for a helper class IntArrayList. An outer-join edge case is added in planner test. On a query with 1800 union operations, the equivalence class computation time is reduced from 7m57s to 65ms and the planning time is reduced from 8m5s to 13s. Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a --- M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/util/Graph.java A fe/src/main/java/org/apache/impala/util/IntArrayList.java A fe/src/main/java/org/apache/impala/util/IntIterator.java A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test 33 files changed, 1,119 insertions(+), 785 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/8 -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 8 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. Patch Set 11: (19 comments) http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.h@a222 PS11, Line 222: : : did you mean to just delete that? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.h@457 PS11, Line 457: queue queued? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc@333 PS11, Line 333: CANCELLED seems like the right thing to do but are you sure we weren't depending on propagating the error status through this path? how does the potential error get propagated now and are we careful to not overwrite it with this CANCELLED further up the stack / across threads? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc@581 PS11, Line 581: DCHECK(!buffer->is_cached()) << "HDFS cache reads don't go through this code path."; maybe put that dcheck upstream too (like ReadRange() or even earlier) to help clarify the paths)? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc@665 PS11, Line 665: DCHECK nit: _EQ http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.h@448 PS11, Line 448: threads disk threads? http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-context.h@27 PS9, Line 27: For most I/O manager clients it is an : /// opaque pointer, but some clients may need to include this heade is that still true? http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-context.h@93 PS9, Line 93: void Cancel(); how did you decide to make this a first class thing in the RequestContext, as opposed to all the other DiskIoMgr methods that take a RequestContext* as their first parameter? Should those also (eventually) just becomes methods on context? (And maybe context is really "DiskIoMgrClient" - though still unclear exactly what this abstraction should be)? e.g. Read(), GetNextRange(), AddScanRanges(), AddWriteRange() kinda seem like methods on the context (or client). http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc File be/src/runtime/io/request-context.cc: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@42 PS11, Line 42: << static_cast(range->external_buffer_tag_); oh, I guess DCHECK_EQ doesn't work with enum or something? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@75 PS11, Line 75: reader what about the writing case? is that relevant? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@76 PS11, Line 76: GetNext GetNext() http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@79 PS11, Line 79: it does this referring to the context or something else? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@90 PS11, Line 90: cancelled Status does that mean the CANCELLED status or the status that lead to cancellation (if cancelled for an error)? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@91 PS11, Line 91: the cancelled state. what does that mean? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@95 PS11, Line 95: decrements the number of disk queues the context : // is on. which code is this talking about? Oh, I guess that's now in DecrementDiskThread()? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-ranges.h@a391 PS11, Line 391: great to get rid of this. but what is it that allows us to no longer need it? http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-ranges.h@400 PS11, Line 400: to the I/O manager should that be reworded given where ReturnBuffer() now lives? One option is to just delete that. http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-ranges.h@226 PS9, Line 226: Read() I guess that's talking about DiskIoMgr::Read()? I wonder if we should get rid of Read() so that the lifecycl
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Tianyi Wang has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. IMPALA-5976: Remove equivalence class computation in FE Equivalent class is used to get the equivalencies between slots. It is ill-defined and the current implementation is inefficient. This patch removes it and directly uses the information from the value transfer graph instead. Value transfer graph is reimplemented using Tarjan's strongly connected component algorithm and BFS with adjacency lists to speed up on both condensed and sparse graphs. Testing: It passes the existing tests. In planner tests the equivalence between SCC-condensed graph and uncondensed graph is checked. A test case is added for a helper class IntArrayList. An outer-join edge case is added in planner test. On a query with 1800 union operations, the equivalence class computation time is reduced from 7m57s to 65ms and the planning time is reduced from 8m5s to 13s. Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a --- M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/util/Graph.java A fe/src/main/java/org/apache/impala/util/IntArrayList.java A fe/src/main/java/org/apache/impala/util/IntIterator.java A fe/src/test/java/org/apache/impala/util/IntArrayListTest.java M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test 33 files changed, 1,119 insertions(+), 785 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/7 -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h File be/src/util/process-state-info.h: http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h@61 PS5, Line 61: /// Below are some of the Process status information that will be read from /proc/self/status. nit: long line. Please have a look at how to configure your text editor to highlight these for you. Alternatively check out git hooks to check for changed lines that are too long. http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159 PS4, Line 159: ry > Done Much better! http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@36 PS5, Line 36: using std::to_string; nit: sort alphabetically? http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@157 PS5, Line 157: // readdir() is not thread-safe according to its man page, but in glibc Can you include a reference to the source (e.g. https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html) ? -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Sat, 18 Nov 2017 00:54:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. Patch Set 5: > The only way to guarantee that there is no race condition around > is_query_cancelled flag is to Introduce locking around that variable. It certainly does make sense to swallow cancellation errors, especially if they're "query doesn't exist." I'll note that if we're having trouble with the "reset", we need not: a query can only go from running to cancelled. It can't go from cancelled back to running. There's still a race, of course, if the query finishes and we try to cancel it, which is what I think you're saying. > Re: flaky tests If it's stable ~100 times, I think it's fine to push it in. If it turns out to be a bad idea, we'll change the test. -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 18 Nov 2017 00:52:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc@a2402 PS4, Line 2402: > the result of power(10, 3) - 0.45 was a double. Converted to decimal it loo Still not following. The old V2 expected value was 999.6 not 999.5. Stepping back, it seems the intent of this test case is to test rounding for CAST DOUBLE -> DECIMAL and using the double value of 999.55 for that. So the other question is whether this test is no longer testing what it meant to test. -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 00:50:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG@11 PS4, Line 11: api > nit: acronyms are usually all caps Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h File be/src/util/process-state-info.h: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h@77 PS4, Line 77: /// File Descriptors information will be read from /proc/pid/fd. > nit: /proc/self/fd Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@155 PS4, Line 155: does > nit: do not count. Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@156 PS4, Line 156: self > nit: either use getpid() here to be consistent with the rest of the file, o Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159 PS4, Line 159: dir > Can you try to think of better variable names? You have a DIR called d, and Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@160 PS4, Line 160: while ((dir = readdir(d)) != nullptr) ++fd_count; > Please add a brief comment why your usage of readdir is thread-safe here. Done http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@163 PS4, Line 163: std:: > You shouldn't need std:: here Done -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Sat, 18 Nov 2017 00:44:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Hello Lars Volker, Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8546 to look at the new patch set (#5). Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo Running shell commands from impalad can be problematic, because using popen() leads to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced with POSIX API calls. FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so it was unneccesery work to initialize it. It is removed, and only the number of file descriptors is computed. The automatic test for this function is only a sanity check, because there is no way to know the "expected value" in advance, and the number of file desciptors can change anytime. Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 --- M be/src/util/proc-info-test.cc M be/src/util/process-state-info.cc M be/src/util/process-state-info.h 3 files changed, 32 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/5 -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc@a2402 PS4, Line 2402: > But 1000 - 0.45 = 999.55. When casting that to DECIMAL(4,1), why doesn't th the result of power(10, 3) - 0.45 was a double. Converted to decimal it looked like this: 999.5500 (decimal(38,16)). Converted back to double it was something like 999.499 (due to IMPALA-6183), then converting to decimal made it 999.5. This is no longer the case. -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 00:41:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 4: Code-Review-1 The current plan is to go in a completely different direction. We want to implement the following rule for round() functions: The output type of a round() function must match the input type. This patch does the opposite. I will abandon this patch next week, after thinking about it some more. -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 4 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 18 Nov 2017 00:36:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc@a2402 PS4, Line 2402: > This went away after I fixed IMPALA-6183. But 1000 - 0.45 = 999.55. When casting that to DECIMAL(4,1), why doesn't that round to 999.6? What am I missing? -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 00:35:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. Patch Set 5: (4 comments) We discussed this issue with Tim from a different angle with the following outcome: - The only way to guarantee that there is no race condition around is_query_cancelled flag is to Introduce locking around that variable. - To make this work well the flag should be checked inside _do_rpc right before the rpc() call. In addition the rpc() call itself should be protected by the same lock block we use for is_query_cancelled check. - Unfortunately, as a result cancelling a query would have to wait until the actual rpc() call is finished. If it runs long then we've shot ourselves in the foot as the error wont be shown at the end but the query cancellation would lag for long. For a solution we considered a symptomatic treatment like approach to let the whole cancellation race condition throw it's exception and then where we catch that exception we can decide to suppress it or not. http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@25 PS2, Line 25: Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 > So, um, if you want a query that returns a lot of rows: Thanks for the nice example, Phil! I created a TC using your example and it seems to stably reproduce the issue. I wonder what is the probability of this test to become flaky. (I executed it like 100 times and seems to be still stable) http://gerrit.cloudera.org:8080/#/c/8549/3/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8549/3/shell/impala_client.py@319 PS3, Line 319: def wait_to_finish(self, last_query_handle, periodic_callback=None): > Do we also need to check is_cancelled here? In the case of a long-running D Indeed. http://gerrit.cloudera.org:8080/#/c/8549/3/shell/impala_client.py@335 PS3, Line 335: """Fetch all the results. > Why does this need to be set back to False? Seems confusing since the query This in fact not needed here. Done http://gerrit.cloudera.org:8080/#/c/8549/3/shell/impala_client.py@355 PS3, Line 355: if not result.has_more: > I think some of these other methods might also be subject to the same bug - Good point, thx. -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 18 Nov 2017 00:33:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 8: > Patch Set 8: > > > Patch Set 8: > > > > (1 comment) > > *wording Restated: I don't understand your request here, or for 77a nor 78. I adjusted the wording here to make it uniform across the files, and added explanations of the reasons. -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Wood Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Sat, 18 Nov 2017 00:31:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 8: > Patch Set 8: > > (1 comment) *wording -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Wood Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Sat, 18 Nov 2017 00:30:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 5: Code-Review+1 Carrying the +1 from Tim -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 5 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 00:27:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 5: Patch 5 is a rebase. -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 5 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 00:25:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. IMPALA-5019: Decimal V2 addition In this patch, we implement the new decimal return type rules for addition expressions. These rules become active when the query option DECIMAL_V2 is enabled. The algorithm for determining the type of the result is described in the JIRA. DECIMAL V1: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,38) | ++ DECIMAL V2: ++ | typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) | ++ | DECIMAL(38,6) | ++ This patch required backend changes. We implement an algorithm where we handle the whole and fractional parts separately, and then combine them to get the final result. This is more complex and slower. We try to avoid this by first checking if the result would fit into int128. Testing: - Added expr tests. - Tested locally on my machine with a script that generates random decimal numbers and checks that Impala adds them correctly. Performance: For the common case, performance remains the same. select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1) BEFORE: 4.74s AFTER: 4.73s In this case, we check if it is necessary to do the complex addition, and it turns out to be not necessary. We see a slowdown because the result needs to be scaled down by dividing. select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19) BEFORE: 1.63s AFTER: 13.57s In following case, we take the most complex path and see the most signification perfmance hit. select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37)) BEFORE: 1.63s AFTER: 20.57 Change-Id: I401049c56d910eb1546a178c909c923b01239336 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 5 files changed, 386 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/5 -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 5 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc@a2402 PS4, Line 2402: > why does this rounding no longer happen in decimal v2? This went away after I fixed IMPALA-6183. What was happening is that power(10, 3) - 0.45 returned a decimal in Decimal V2 and was being converted to a double, which was imprecise. After converting back to decimal, rounding didn't happen because the value happened to be a little smaller than necessary for rounding. I rebased the patch and fixed the conflicts. -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 18 Nov 2017 00:24:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Hello Lars Volker, Laszlo Gaal, Michael Brown, Zoltan Borok-Nagy, Philip Zeyliger, David Knupp, Attila Jeges, Tim Armstrong, Csaba Ringhofer, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8549 to look at the new patch set (#5). Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C Issue1: When query is cancelled via CTRL-C while being executed in Impala-shell then an exception is thrown from Impala backend saying 'Invalid query handle'. This is because one ImpalaClient was making RPC's while another ImpalaClient cancelled the query on the backend. As a result RPC handlers in ImpalaServer try to access a ClientRequestState that had been cleared from the backend. The issue is confidently reproducable both in wait_to_finish and in fetch states of the query. As a solution the query cancellation is indicated to ImpalaClient via a bool flag. Once a cancellation originated exception reaches Impala shell this flag is checked to decide whether to suppress the error or not. Issue2: Every time a query was cancelled a 'use db' command was issued automatically. This happened to historical reasons but is not needed anymore (see Jira for more details). Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 --- M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_commandline.py 3 files changed, 54 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8549/5 -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Hello Lars Volker, Laszlo Gaal, Michael Brown, Zoltan Borok-Nagy, Philip Zeyliger, David Knupp, Attila Jeges, Tim Armstrong, Csaba Ringhofer, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8549 to look at the new patch set (#4). Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C Issue1: When query is cancelled via CTRL-C while being executed in Impala-shell then an exception is thrown from Impala backend saying 'Invalid query handle'. This is because one ImpalaClient was making RPC's while another ImpalaClient cancelled the query on the backend. As a result RPC handlers in ImpalaServer try to access a ClientRequestState that had been cleared from the backend. The issue is confidently reproducable both in wait_to_finish and in fetch states of the query. As a solution the query cancellation is indicated to ImpalaClient via a bool flag. Once a cancellation originated exception reaches Impala shell this flag is checked to decide whether to suppress the error or not. Issue2: Every time a query was cancelled a 'use db' command was issued automatically. This happened to historical reasons but is not needed anymore (see Jira for more details). Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 --- M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_commandline.py 3 files changed, 56 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8549/4 -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6109: xfail TestHdfsUnknownErrors::test hdfs safe mode error 255
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8590 ) Change subject: IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1496/ -- To view, visit http://gerrit.cloudera.org:8080/8590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Gerrit-Change-Number: 8590 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 23:58:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG@11 PS4, Line 11: api nit: acronyms are usually all caps http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h File be/src/util/process-state-info.h: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h@77 PS4, Line 77: /// File Descriptors information will be read from /proc/pid/fd. nit: /proc/self/fd http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc File be/src/util/process-state-info.cc: http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@155 PS4, Line 155: does nit: do not count. http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@156 PS4, Line 156: self nit: either use getpid() here to be consistent with the rest of the file, or change the other occurrences to /proc/self/* http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159 PS4, Line 159: dir Can you try to think of better variable names? You have a DIR called d, and something else called dir. DIR is really a directory stream, and readdir returns directory entries. http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@160 PS4, Line 160: while ((dir = readdir(d)) != nullptr) ++fd_count; Please add a brief comment why your usage of readdir is thread-safe here. http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@163 PS4, Line 163: std:: You shouldn't need std:: here -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 23:55:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 3 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 23:46:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8372/8/testdata/workloads/tpcds/queries/tpcds-q61.test File testdata/workloads/tpcds/queries/tpcds-q61.test: http://gerrit.cloudera.org:8080/#/c/8372/8/testdata/workloads/tpcds/queries/tpcds-q61.test@4 PS8, Line 4: -- FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), USE ACTUAL RESULT AS EXPECTED RESULT > Similar to others can you please add expected Vs actual? I don't understand your request here, or for 77a nor 78. I adjusted the working here to make it uniform across the times and added explanations of the reasons. -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Wood Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Fri, 17 Nov 2017 23:42:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 4: Lars, can you take care of doing the +2 review for this one? -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 23:34:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6109: xfail TestHdfsUnknownErrors::test hdfs safe mode error 255
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8590 ) Change subject: IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Gerrit-Change-Number: 8590 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 23:30:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 4: Code-Review+1 Looks fine as far as I'm concerned. I learned something about readdir()! -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 23:29:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6109: xfail TestHdfsUnknownErrors::test hdfs safe mode error 255
Lars Volker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8590 Change subject: IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 .. IMPALA-6109: xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255 The test puts the HDFS name node into safe mode to trigger an "Unknown Error 255" and verifies that the error details can be obtained correctly via the libHDFS API. However, putting the name node into safe mode can trip up HBase (HBASE-18738), which causes sporadic failures of our other HBase tests. To prevent this, we xfail the test until the HBase issue has been addressed (or we find a better way to trigger a 255 error). IMPALA-6212 tracks re-enabling the test in the future. Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 --- M tests/data_errors/test_data_errors.py 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/8590/1 -- To view, visit http://gerrit.cloudera.org:8080/8590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I55979bed07147409949b798d4beb7a3b3b7ec5c3 Gerrit-Change-Number: 8590 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-6210: Add query id to lineage graph logging
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8589 ) Change subject: IMPALA-6210: Add query id to lineage graph logging .. Patch Set 1: (4 comments) Thanks! This looks largely good to me. Would you mind adding an assertion to tests/custom_cluster/test_lineage.py that the query id is populated, and is maybe the same as reported in the profile? It's trivial, but it'd be nice to convince ourselves that TUniqueIdUtil.java matches the C++ implementations, too. http://gerrit.cloudera.org:8080/#/c/8589/1/common/thrift/LineageGraph.thrift File common/thrift/LineageGraph.thrift: http://gerrit.cloudera.org:8080/#/c/8589/1/common/thrift/LineageGraph.thrift@55 PS1, Line 55: 3: required string hash It's poor form to renumber thrift structs. Typically, you just use the next available number. This makes serialized structs compatible over time. (In practice, these are internal RPCs and are unlikely to have been serialized, but I think it's best not to violate the rules.) See https://developers.google.com/protocol-buffers/docs/proto#assigning-tags about "unique numbered tags". Thrift and ProtoBuf are basically isomorphic in this respect. http://gerrit.cloudera.org:8080/#/c/8589/1/fe/src/main/java/org/apache/impala/util/TUniqueIdUtil.java File fe/src/main/java/org/apache/impala/util/TUniqueIdUtil.java: http://gerrit.cloudera.org:8080/#/c/8589/1/fe/src/main/java/org/apache/impala/util/TUniqueIdUtil.java@25 PS1, Line 25: public class TUniqueIdUtil { I didn't see any test code for this. It'd be good to unit test it, and confirm that you're getting the same results as the tests in be/src/util/debug-util-test.cc. Should be quick, and it'd be good to check that the Java and C++ versions of this code are identical. http://gerrit.cloudera.org:8080/#/c/8589/1/fe/src/main/java/org/apache/impala/util/TUniqueIdUtil.java@26 PS1, Line 26: public static String Print(TUniqueId id) { This is called PrintId() and ParseId() in be/src/util/debug-util.cc. Perhaps use the same name to make it more obvious? http://gerrit.cloudera.org:8080/#/c/8589/1/fe/src/main/java/org/apache/impala/util/TUniqueIdUtil.java@31 PS1, Line 31: String[] splitted = id.split(":"); nit: one space after =. -- To view, visit http://gerrit.cloudera.org:8080/8589 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4adbd02df37a234dbb79f58b7c46ca11a914229f Gerrit-Change-Number: 8589 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 23:25:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8464 ) Change subject: IMPALA-4591: Bound Kudu client error mem usage .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/8464/2/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8464/2/be/src/exec/kudu-table-sink.cc@350 PS2, Line 350: mem_tracker_->Release(client_tracked_bytes_); > The mutation buffer is cleared by KuduSession::Flush() and the error buffer When you say "cleared" do you mean zeroed or freed? Also, if CountPendingErrors() is 0, we skip GetPendingErrors() -- is it possible for the client to have a non-zero sized buffer still in that case? FlushFinal() isn't guaranteed to be called if fragment instance execution was terminated due to an error (see FragmentInstanceState::ExecInternal()). The session_ is a shared_ptr -- I wonder if the implicit client API is that we should be nulling it out to drop our reference to cause the session resources to be freed? The client.h header does say this: /// @return A new session object; caller is responsible for destroying it. sp::shared_ptr NewSession(); http://gerrit.cloudera.org:8080/#/c/8464/2/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8464/2/tests/custom_cluster/test_kudu.py@78 PS2, Line 78: assert "Error overflow in Kudu session." in str(e) > Yes, there are existing tests in kudu_insert.test that generate thousands o But what about when the limit is set to 1024? Is there a way to determine that we don't hit the limit too early? Anyway, you can decide whether the testing in that regard is sufficient already or not. -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 23:14:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 3: Even simpler, we don't need a new target at all. -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 3 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 23:09:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8581 ) Change subject: IMPALA-1575: part 2: yield admission control resources .. Patch Set 1: Code-Review+2 Walked through this and compared to the original patch. Everything matches up except for the statestore frequency change. Looks good. -- To view, visit http://gerrit.cloudera.org:8080/8581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1fae8dc1c4b0eca7bfa8fadae4a56ef2b37947a Gerrit-Change-Number: 8581 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 23:08:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Hello Michael Brown, Philip Zeyliger, David Knupp, Tim Armstrong, Joe McDonnell, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8580 to look at the new patch set (#3). Change subject: IMPALA-6206: Fix data load failure with -notests .. IMPALA-6206: Fix data load failure with -notests When tests are not built, specifically with -notests, instead of just -skiptests, the be-test target is omitted by cmake, and since nothing in impalad depends on uda/udf samples, they are not built. This causes data load to fail on a clean build. Just build them anyway under the target ImpalaUdf since they build in a few seconds. This shaves a few minutes off data load testing by avoiding the time spent linking tests. Note: only emperically, not scientifically measured. Testing: Run a clean build find . -name 'libud[af]sample.so' ./be/build/debug/udf_samples/libudasample.so ./be/build/debug/udf_samples/libudfsample.so Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb --- M bin/make_impala.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8580/3 -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 3 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Hello Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8464 to look at the new patch set (#3). Change subject: IMPALA-4591: Bound Kudu client error mem usage .. IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M tests/custom_cluster/test_kudu.py 3 files changed, 41 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/3 -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8464 ) Change subject: IMPALA-4591: Bound Kudu client error mem usage .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/8464/2/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8464/2/be/src/exec/kudu-table-sink.cc@193 PS2, Line 193: KUDU_RETURN_IF_ERROR(session_->SetErrorBufferSpace(error_buffer_size), > Stray debug log Done http://gerrit.cloudera.org:8080/#/c/8464/2/be/src/exec/kudu-table-sink.cc@350 PS2, Line 350: mem_tracker_->Release(client_tracked_bytes_); > when does the client actually free the buffers that this is accounting? it The mutation buffer is cleared by KuduSession::Flush() and the error buffer is cleared by KuduSession::GetPendingErrors(), both called in FlushFinal(). http://gerrit.cloudera.org:8080/#/c/8464/2/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8464/2/tests/custom_cluster/test_kudu.py@78 PS2, Line 78: assert "Error overflow in Kudu session." in str(e) > is there a test case that generates a good number of errors and verifies th Yes, there are existing tests in kudu_insert.test that generate thousands of errors. As mentioned elsewhere in the review, it takes millions of errors to hit the default limit, but I'm not sure that adding a test that gets close to the limit gives us coverage that's worth the extra testing time. Happy to add one if you feel differently. -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 23:02:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Hello Lars Volker, Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8546 to look at the new patch set (#4). Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo Running shell commands from impalad can be problematic, because using popen() leads to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced with Posix api calls. FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so it was unneccesery work to initialize it. It is removed, and only the number of file descriptors is computed. The automatic test for this function is only a sanity check, because there is no way to know the "expected value" in advance, and the number of file desciptors can change anytime. Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 --- M be/src/util/proc-info-test.cc M be/src/util/process-state-info.cc M be/src/util/process-state-info.h 3 files changed, 16 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/4 -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. IMPALA-4835 (prep only): create io subfolder and namespace Instead of using the DiskIoMgr class as a namespace, which prevents forward-declaration of inner classes, create an impala::io namespace and unnested the inner class. This is done in anticipation of DiskIoMgr depending on BufferPool. This helps avoid a circular dependency between DiskIoMgr, TmpFileMgr and BufferPool headers that could not be broken with forward declarations. Testing: Ran core tests. Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Reviewed-on: http://gerrit.cloudera.org:8080/8424 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/CMakeLists.txt M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/CMakeLists.txt D be/src/runtime/disk-io-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h A be/src/runtime/io/CMakeLists.txt R be/src/runtime/io/disk-io-mgr-internal.h R be/src/runtime/io/disk-io-mgr-stress-test.cc R be/src/runtime/io/disk-io-mgr-stress.cc R be/src/runtime/io/disk-io-mgr-stress.h R be/src/runtime/io/disk-io-mgr-test.cc R be/src/runtime/io/disk-io-mgr.cc A be/src/runtime/io/disk-io-mgr.h R be/src/runtime/io/handle-cache.h R be/src/runtime/io/handle-cache.inline.h R be/src/runtime/io/request-context.cc R be/src/runtime/io/request-context.h A be/src/runtime/io/request-ranges.h R be/src/runtime/io/scan-range.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 38 files changed, 1,417 insertions(+), 1,310 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 22:47:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6210: Add query id to lineage graph logging
Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8589 Change subject: IMPALA-6210: Add query id to lineage graph logging .. IMPALA-6210: Add query id to lineage graph logging Some tools use lineage graph logging to collect query metrics. Currently only query hash is present in this log. Adding query id into it makes such accounting easier. Change-Id: I4adbd02df37a234dbb79f58b7c46ca11a914229f --- M be/src/util/lineage-util.h M common/thrift/LineageGraph.thrift M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java A fe/src/main/java/org/apache/impala/util/TUniqueIdUtil.java M testdata/workloads/functional-planner/queries/PlannerTest/lineage.test 5 files changed, 106 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/8589/1 -- To view, visit http://gerrit.cloudera.org:8080/8589 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4adbd02df37a234dbb79f58b7c46ca11a914229f Gerrit-Change-Number: 8589 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-4927: Impala should be able to handle invalid input from Sentry
Pranay Singh has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8588 Change subject: IMPALA-4927: Impala should be able to handle invalid input from Sentry .. IMPALA-4927: Impala should be able to handle invalid input from Sentry Impala requests a list of roles from Sentry and then asks for privileges for each role. If Sentry returns a non existent role in the first step, then there will be a Java exception in Impala in the second step and the communication with Sentry is aborted. The issue is fixed by handling the exception if an invalid role is found and continue with getting permissions for the rest of the roles. Testing: --- Since invalid role could not be created through impala-shell/Hue interface the code was instrumented to have a invalid Role " " and see how the condition is handled. Change-Id: I781411018d580854d80a9cad81a1ded7ca16af8b --- M fe/src/main/java/org/apache/impala/util/SentryProxy.java 1 file changed, 29 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/8588/1 -- To view, visit http://gerrit.cloudera.org:8080/8588 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I781411018d580854d80a9cad81a1ded7ca16af8b Gerrit-Change-Number: 8588 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8500 ) Change subject: Pin gen_build_version's git handling to typical git dir. .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1495/ -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 21:46:09 + Gerrit-HasComments: No
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8500 ) Change subject: Pin gen_build_version's git handling to typical git dir. .. Patch Set 3: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 21:44:42 + Gerrit-HasComments: No
[Impala-ASF-CR] Removing testdata/bin/run-hive.sh.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8503 Change subject: Removing testdata/bin/run-hive.sh. .. Removing testdata/bin/run-hive.sh. I can't find any uses of it. Change-Id: Ibdb65f8390efec8559cea59da0a48584a383cb24 --- D testdata/bin/run-hive.sh 1 file changed, 0 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8503/1 -- To view, visit http://gerrit.cloudera.org:8080/8503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibdb65f8390efec8559cea59da0a48584a383cb24 Gerrit-Change-Number: 8503 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Reviewed-on: http://gerrit.cloudera.org:8080/7793 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 16: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 16 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 17 Nov 2017 21:33:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5053: [SECURITY] Make KRPC work with Kerberos
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8270 ) Change subject: IMPALA-5053: [SECURITY] Make KRPC work with Kerberos .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8270/4/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: http://gerrit.cloudera.org:8080/#/c/8270/4/be/src/rpc/authentication.cc@593 PS4, Line 593: !FLAGS_principal.empty() What do you think about wrapping this inside an inline function ? bool IsKerberosEnabled() { return !FLAGS_prinicipal.empty(); } http://gerrit.cloudera.org:8080/#/c/8270/3/be/src/util/auth-util.cc File be/src/util/auth-util.cc: http://gerrit.cloudera.org:8080/#/c/8270/3/be/src/util/auth-util.cc@90 PS3, Line 90: } > Yes, this function won't return an error for that. Should we do more in dep It seems better to at least make sure / appears before @. -- To view, visit http://gerrit.cloudera.org:8080/8270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8cec5cca5fdb4b1d46bab19e86cb1a8a3ad718fd Gerrit-Change-Number: 8270 Gerrit-PatchSet: 4 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 17 Nov 2017 20:07:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 ) Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs .. Patch Set 3: Code-Review+1 (3 comments) I'm fine with this as-is; there are very minor nits that you can tackle if you'd like. Please do answer the question about changing the FE tests to just use s3a so that we can remove the hackery altogether. Thanks! http://gerrit.cloudera.org:8080/#/c/8294/3/bin/check-s3-access.sh File bin/check-s3-access.sh: http://gerrit.cloudera.org:8080/#/c/8294/3/bin/check-s3-access.sh@56 PS3, Line 56: exit 0; you don't need the semicolon http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/admin File testdata/cluster/admin: http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/admin@289 PS3, Line 289: sed '//d' \ You could use "sed -i" to edit this in place, though I know Mac OS X sed behaves slightly differently than Linux sed, causing some trouble. We seem to have some limited "sed -i" usage already. I'm not insisting. http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl: http://gerrit.cloudera.org:8080/#/c/8294/3/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@59 PS3, Line 59: Is there any reason not to s/s3a/s3n/ in the tests so that we can get rid of this? -- To view, visit http://gerrit.cloudera.org:8080/8294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae Gerrit-Change-Number: 8294 Gerrit-PatchSet: 3 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 19:59:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1494/ -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 19:16:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8580/2/be/src/udf_samples/CMakeLists.txt File be/src/udf_samples/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8580/2/be/src/udf_samples/CMakeLists.txt@44 PS2, Line 44: # Data loading depends on building these targets > This feels a little weird since those targets aren't really part of the Imp I felt the same way but didn't want to clutter things with more targets, so I lazily re-used an existing one. I'll make a proper top-level target instead. -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 2 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 19:10:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 3: > Patch Set 3: > > > > Patch Set 3: > > > > > > > Dan, what's your take? > > > > > > I think the question is where does the structure that is returned > > come from? In the non-reentrant C lib functions, the problem is > > usually that they return a pointer to a static structure, so if you > > call the function again (or concurrently), the returned structure > > can be overwritten. > > > > I double checked the opendir() implementation in sysdeps/posix/opendir.c > > and it does not return any static structures but calls malloc() to > > get a new DIR*. > > I was talking about the return value of readdir() (dirent*). Ah, I see, sorry for the confusion. readdir() only accesses the DIR* struct that is passed as a parameter (I checked that). The DIR* struct gets created by opendir() and does not share data structures with other DIR*s. It contains a fd, which is different for every DIR*, too. readdir() eventually calls getdents(2) and copies the result into the DIR* struct. -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 18:45:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5936: operator '%' overflows on large decimals
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8574 ) Change subject: IMPALA-5936: operator '%' overflows on large decimals .. Patch Set 2: Code-Review+1 Taras, please do the +2 review on this one. -- To view, visit http://gerrit.cloudera.org:8080/8574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b06c8acd5aa490943e84013faf2eaac7c26ceb4 Gerrit-Change-Number: 8574 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 17 Nov 2017 18:19:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 3: > > Patch Set 3: > > > > > Dan, what's your take? > > > > I think the question is where does the structure that is returned > come from? In the non-reentrant C lib functions, the problem is > usually that they return a pointer to a static structure, so if you > call the function again (or concurrently), the returned structure > can be overwritten. > > I double checked the opendir() implementation in sysdeps/posix/opendir.c > and it does not return any static structures but calls malloc() to > get a new DIR*. I was talking about the return value of readdir() (dirent*). -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 18:08:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 16: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1493/ -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 16 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 17 Nov 2017 18:07:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 16: Code-Review+2 GVO failed due to minor clang tidy issue. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 16 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 17 Nov 2017 18:07:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#16). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M t
[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8398 ) Change subject: IMPALA-3436: Return a decimal when rounding a double .. Patch Set 3: Taras, can you summarize where we are at (should reviews continue?) Or even -1 it if the plan has changed (or isn't yet known). -- To view, visit http://gerrit.cloudera.org:8080/8398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6 Gerrit-Change-Number: 8398 Gerrit-PatchSet: 3 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 18:05:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8309/4/be/src/exprs/expr-test.cc@a2402 PS4, Line 2402: why does this rounding no longer happen in decimal v2? -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 17:59:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 3: > Patch Set 3: > > > Dan, what's your take? > > I think the question is where does the structure that is returned come from? > In the non-reentrant C lib functions, the problem is usually that they return > a pointer to a static structure, so if you call the function again (or > concurrently), the returned structure can be overwritten. I double checked the opendir() implementation in sysdeps/posix/opendir.c and it does not return any static structures but calls malloc() to get a new DIR*. -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 17:58:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 3: > > Dan, what's your take? > > I think the question is where does the structure that is returned > come from? In the non-reentrant C lib functions, the problem is > usually that they return a pointer to a static structure, so if you > call the function again (or concurrently), the returned structure > can be overwritten. On, just saw that comment about modern implementations being thread safe and readdir_r being deprecated. So, yeah seems like readdir() is the way to go. -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 17:49:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:47:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo .. Patch Set 3: > Dan, what's your take? I think the question is where does the structure that is returned come from? In the non-reentrant C lib functions, the problem is usually that they return a pointer to a static structure, so if you call the function again (or concurrently), the returned structure can be overwritten. -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 17:47:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 9: rebased and tweaked the #define guard to include _IO so that text scanner plugins like Impala-lzo can detect whether it is pre/post refactor. -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:47:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/8414/8/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/8414/8/be/src/runtime/io/request-context.h@171 PS8, Line 171: // boost doesn't let us dcheck that the reader lock is taken > Outdated comment Done http://gerrit.cloudera.org:8080/#/c/8414/8/be/src/runtime/io/request-context.h@294 PS8, Line 294: InternalQueue cached_ranges_; > It seems these queues don't need the spinlocks in them. But we don't need t Good point, we can fix that now - it makes it confusing about whether holding 'lock_' is actually required. We have an InternalList class that is the same thing without the locking. http://gerrit.cloudera.org:8080/#/c/8414/8/be/src/runtime/io/request-context.h@366 PS8, Line 366: void DecrementRequestThread(const boost::unique_lock& context_lock, > How about DecrementDiskThread? I don't see the term "request thread" elsewh Done http://gerrit.cloudera.org:8080/#/c/8414/8/be/src/runtime/io/request-context.h@378 PS8, Line 378: if (is_on_queue_.Load() == 0 && num_threads_in_op_.Load() == 0 && !done_) { > I see the correctness here. It's pretty subtle. Yeah it took me a while to figure it out. I think we could simplify it further but I stopped at documenting why it was correct. I think this patch is already large. http://gerrit.cloudera.org:8080/#/c/8414/8/be/src/runtime/io/request-context.h@386 PS8, Line 386: /// If done is true, it means that this request must not be on this disk's queue > Is this statement true? I think RequestContext::Cancel() can enqueue the co I'm pretty sure it's true. Validate() does check this. Cancel() doesn't actually set done or even wait for the disks to be done - instead it relies on the disk threads to see that the context is cancelled and set "done". I tried to improve the comment in Cancel() a little bit. -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:44:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8424 to look at the new patch set (#9). Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. IMPALA-4835 (prep only): create io subfolder and namespace Instead of using the DiskIoMgr class as a namespace, which prevents forward-declaration of inner classes, create an impala::io namespace and unnested the inner class. This is done in anticipation of DiskIoMgr depending on BufferPool. This helps avoid a circular dependency between DiskIoMgr, TmpFileMgr and BufferPool headers that could not be broken with forward declarations. Testing: Ran core tests. Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b --- M be/CMakeLists.txt M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/CMakeLists.txt D be/src/runtime/disk-io-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h A be/src/runtime/io/CMakeLists.txt R be/src/runtime/io/disk-io-mgr-internal.h R be/src/runtime/io/disk-io-mgr-stress-test.cc R be/src/runtime/io/disk-io-mgr-stress.cc R be/src/runtime/io/disk-io-mgr-stress.h R be/src/runtime/io/disk-io-mgr-test.cc R be/src/runtime/io/disk-io-mgr.cc A be/src/runtime/io/disk-io-mgr.h R be/src/runtime/io/handle-cache.h R be/src/runtime/io/handle-cache.inline.h R be/src/runtime/io/request-context.cc R be/src/runtime/io/request-context.h A be/src/runtime/io/request-ranges.h R be/src/runtime/io/scan-range.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 38 files changed, 1,417 insertions(+), 1,310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/8424/9 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Hello Tianyi Wang, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8414 to look at the new patch set (#10). Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation In preparation for switching the I/O mgr to the buffer pool, this removes and cleans up a lot of code so that the switchover patch starts from a cleaner slate. * Remove the free buffer cache (which will be replaced by buffer pool's own caching). * Make memory limit exceeded error checking synchronous (in anticipation of having to propagate buffer pool errors synchronously). * Simplify error propagation - remove the (ineffectual) code that enqueued BufferDescriptors containing error statuses. * Document locking scheme better in a few places, make it part of the function signature when it seemed reasonable. * Move ReturnBuffer() to ScanRange, because it is intrinsically connected with the lifecycle of a scan range. * Separate external ReturnBuffer() and internal CleanUpBuffer() interfaces - previously callers of ReturnBuffer() were fudging the num_buffers_in_reader accounting to make the external interface work. * Eliminate redundant state in ScanRange: 'eosr_returned_' and 'is_cancelled_'. * Clarify the logic around calling Close() for the last BufferDescriptor. -> There appeared to be an implicit assumption that buffers would be freed in the order they were returned from the scan range, so that the "eos" buffer was returned last. Instead just count the number of outstanding buffers to detect the last one. -> Touching the is_cancelled_ field without holding a lock was hard to reason about - violated locking rules and it was unclear that it was race-free. This will probably regress performance somewhat because of the cache removal, so my plan is to merge it around the same time as switching the I/O mgr to allocate from the buffer pool. I'm keeping the patches separate to make reviewing easier. Testing: * Ran exhaustive tests * Ran the disk-io-mgr-stress-test overnight Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/mem-tracker.h M be/src/runtime/test-env.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h 19 files changed, 552 insertions(+), 861 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/10 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Joe McDonnell has removed a vote on this change. Change subject: IMPALA-6206: Fix data load failure with -notests .. Removed Code-Review+2 by Joe McDonnell -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 2 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8580/2/be/src/udf_samples/CMakeLists.txt File be/src/udf_samples/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8580/2/be/src/udf_samples/CMakeLists.txt@44 PS2, Line 44: # Data loading depends on building these targets This feels a little weird since those targets aren't really part of the ImpalaUdf library (or in that directory). How about added a UdfSamples target and adding it to MAKE_TARGETS in bin/make_impala.sh. -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 2 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:15:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 2: Code-Review+2 This makes sense to me. -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 2 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 17:13:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 2: I don't have much cmake expertise to weigh in. This is a soft +1 from me. I'll add Joe as a reviewer to get his input. -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 2 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 17 Nov 2017 17:05:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8581 ) Change subject: IMPALA-1575: part 2: yield admission control resources .. Patch Set 1: Ran the test in a loop overnight and it didn't flake out, so that's good news. -- To view, visit http://gerrit.cloudera.org:8080/8581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1fae8dc1c4b0eca7bfa8fadae4a56ef2b37947a Gerrit-Change-Number: 8581 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 16:25:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 6: > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1491/ There seems to be some problem with the handling if char type, I will check it. -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 15:06:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/6023 ) Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/expr-test.cc@4247 PS7, Line 4247: TestStringValue("typeOf(4294967296)", "BIGINT"); > You should include a case like (1, 1, 6, 3) Do you mean to add a test case when expr > max_val of the width bucket? We have a similar case here - TestValue("width_bucket(22, 5, 20.1, 4)", TYPE_BIGINT, 5); http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@429 PS7, Line 429: onst FloatVa > Please ignore my last comment. Accidentally got posted when replying to Dan Done http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@431 PS7, Line 431: loatVal(fm > Please ignore my last comment. Accidentally got posted when replying to Dan Done http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@479 PS7, Line 479: > Please ignore my last comment. Accidentally got posted when replying to Dan if expr > max_val and num_buckets is the max value that can be represented by intVal. Adding 1 to num_buckets will cause an overflow. Hence, I am storing it in a BigIntVal and then adding 1 http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@516 PS7, Line 516: error_msg << "Overflow while evaluating the difference between expr: " << > Please ignore my last comment. Accidentally got posted when replying to Dan Switched to using int128_t. Using int256_t incase of overflows. Will investigate the binary search approach and post a follow up patch if there are perf improvements. -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-Change-Number: 6023 Gerrit-PatchSet: 9 Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 17 Nov 2017 09:11:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
Hello Taras Bobrovytsky, Michael Brown, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6023 to look at the new patch set (#9). Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. IMPALA-4848: Add WIDTH_BUCKET() function Syntax : width_bucket(expr decimal, min_val decimal, max_val decimal, num_buckets int) This function creates equiwidth histograms , where the histogram range is divided into num_buckets buckets having identical sizes. This function returns the bucket in which the expr value would fall. min_val and max_val are the minimum and maximum value of the histogram range respectively. -> This function returns NULL if expr is a NULL. -> This function returns 0 if expr < min_val -> This function returns num_buckets + 1 if expr > max_val E.g. [localhost:21000] > select width_bucket(8, 1, 20, 3); +---+ | width_bucket(8, 1, 20, 3) | +---+ | 2 | +---+ Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/Expr.java 5 files changed, 192 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/9 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-Change-Number: 6023 Gerrit-PatchSet: 9 Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
Hello Taras Bobrovytsky, Michael Brown, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6023 to look at the new patch set (#8). Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. IMPALA-4848: Add WIDTH_BUCKET() function Syntax : width_bucket (expr decimal, min_val decimal, max_val decimal, num_buckets int) This function creates equiwidth histograms , where the histogram range is divided into num_buckets buckets having identical sizes. This function returns the bucket in which the expr value would fall. min_val and max_val are the minimum and maximum value of the histogram range respectively. -> This function returns NULL if expr is a NULL. -> This function returns 0 if expr < min_val -> This function returns num_buckets + 1 if expr > max_val E.g. [localhost:21000] > select width_bucket(8, 1, 20, 3); +---+ | width_bucket(8, 1, 20, 3) | +---+ | 2 | +---+ Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/Expr.java 5 files changed, 192 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/8 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-Change-Number: 6023 Gerrit-PatchSet: 8 Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 6: (26 comments) Patch is looking good! http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: /** > My point is that in "select * from A union B", A.col and B.col doesn't appe Got it, good point! Now I see where your original formulation is coming from. How about this: Slot A has a value transfer to slot B if a predicate on A can be applied to B at some point in the plan. Determining the lowest correct placement of that predicate is subject to the conventional assignment rules. http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1987 PS6, Line 1987: Preconditions.checkState(false, "Condensed transitive closure doesn't equal to " + throw new IllegalStatsException() http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1988 PS6, Line 1988: "uncondensed transitive closure. Uncondensed graph:\n" + tc + Graph (capital) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989 PS6, Line 1989: "Condensed Graph:\n" + condensedTc); I think we need a \n before "Condensed" as well http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2095 PS6, Line 2095: Collections.sort(result); comment why we are sorting here, also adjust function comment accordingly http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@692 PS6, Line 692:* 1. The tree structures are the same and every pair of corresponding Expr nodes have The tree structures ignoring implicit casts are the same. (we don't explicitly check the return type, that's really part of condition 3 localEquals()) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@712 PS6, Line 712: return localEquals(that); Move this above the loop to detect mismatches faster? http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@716 PS6, Line 716:* Local eq comparator. It returns whether this expr is equal to 'that' ignoring Returns true if this expr is equal to 'that' ignoring children. http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@406 PS6, Line 406: // For right anti and semi joins the lhs join slots does not appear in the output. Right anti and semi joins do not produce NULLs so should the output partition not be lhsJoinPartition? Like you said, the slots in the rhsJoinPartition are not returned from such joins. http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@413 PS6, Line 413: // Otherwise it's good to use the lhs partition. Otherwise we're good to... http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java File fe/src/main/java/org/apache/impala/util/Graph.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@25 PS6, Line 25: import static java.lang.Math.addExact; remove (not needed and does not compile on Java7) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@62 PS6, Line 62: /** Return an iterator of vertex IDs with an edge from srcVid. */ remove (comment in super class is sufficient) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@176 PS6, Line 176: final int[] condensedAdjList = condensed_.adjList_[sccIds_[srcVid]]; members should be private http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@185 PS6, Line 185: adjListPos += 1; ++adjListPos http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@194 PS6, Line 194: memberPos += 1; ++memberPos http://gerrit.cloudera.org:8080/#/c/8317/6/fe/s