[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners IMPALA-3798 disabled per-scan filtering for sequence- based scanners due to a race between runtime filter arrival and header splits processing. This commit enables per-scan filtering again for the sequence based files. In HdfsScanNode::ProcessSplit() we check if the current range is the header of a sequence file. If so, and the filters reject the file, the whole file skipped. If it is not a sequence header, but the filters reject the partition, we call RangeComplete() on the current scan range. Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Reviewed-on: http://gerrit.cloudera.org:8080/8684 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner.cc M tests/custom_cluster/test_always_false_filter.py 6 files changed, 51 insertions(+), 33 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8783 ) Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8783/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8783/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1631 PS3, Line 1631: > nit: remove the space Done http://gerrit.cloudera.org:8080/#/c/8783/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/8783/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@265 PS3, Line 265: targer > nit: target Done -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 07 Dec 2017 06:54:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Hello Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8783 to look at the new patch set (#4). Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. IMPALA-6286: Remove invalid runtime filter targets. If the target expression of a runtime filter evaluates to a non-NULL value for outer-join non-matches, then assigning the filter below the nullable side of an outer join may lead to incorrect query results. See IMPALA-6286 for an example and explanation. This patch adds a conservative check that prevents the creation of runtime filters that could potentially have such incorrect targets. Some safe opportunities are deliberately missed to keep the code simple. See RuntimeFilterGenerator#getTargetSlots(). Testing: - added planner tests which passed locally Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 3 files changed, 128 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/8783/4 -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 4 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Hello Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8783 to look at the new patch set (#3). Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. IMPALA-6286: Remove invalid runtime filter targets. If the target expression of a runtime filter evaluates to a non-NULL value for outer-join non-matches, then assigning the filter below the nullable side of an outer join may lead to incorrect query results. See IMPALA-6286 for an example and explanation. This patch adds a conservative check that prevents the creation of runtime filters that could potentially have such incorrect targets. Some safe opportunities are deliberately missed to keep the code simple. See RuntimeFilterGenerator#getTargetSlots(). Testing: - added planner tests which passed locally Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 3 files changed, 128 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/8783/3 -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4664: Unexpected string conversion in Shell
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8639 ) Change subject: IMPALA-4664: Unexpected string conversion in Shell .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8639/4/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8639/4/shell/impala_shell.py@337 PS4, Line 337: def sanitise_input(self, args): > I feel like all this string manipulation is very hard to reason about, > leading to bugs like this. I agree. The current string manipulation logic has the following problems. 1. May have hidden bugs 2. Code maintenance is not easy because it has some special handling such as making lower-case command(IMPALA-2640) 3. Not to readable Currently I would like to fix the problem with a minimal change. You may feel there is inefficient code. We should need to clarify the logic on an another JIRA ticket. > It seems like if we're calling sqlparse.parse() to construct Statement > instances, we should return those Statements, which will be easier to operate > on, instead of converting it back to strings. Instead we're translating tokens back to strings, joining then, then resplitting them with sqlparse.split(). sqlparse.parse() can return multiple statements. (e.g. select 'foo'; select 'boo';) Actually, I would like to rely on sqlparse.parse because it is used widely and confirmed by many users than our own parser logic. By the way, I am not sure the parse can work for all the Impala syntax. As far as I guarantee, it can parse command appropriately. If we will adopt the parse function widely, I need to get an answer for the following curiosity. You know SQL syntax in DBMSes and sql-on-hadoop(e.g. Impala, Hive) are slightly different each other. The systems try to follow up SQL standard, but, in fact, there are own specialized syntax on each system. Here is my assumption. I guess we need to customize sqlparse to consume Impala syntax and sqlparse can recognize Impala's version to decide a set of available syntax. sqlparse can take some arguments a system name and a version to apply custom syntax. I am not sure this feature is already in sqlparse, but this is necessary to push upstream. -- To view, visit http://gerrit.cloudera.org:8080/8639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba6fa4e9ac570d22135ba51b844db8f9be965ca3 Gerrit-Change-Number: 8639 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 07 Dec 2017 02:11:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6225: Part 2: Query profile date-time strings should have ns precision.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8784 ) Change subject: IMPALA-6225: Part 2: Query profile date-time strings should have ns precision. .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8784/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/8784/1/tests/query_test/test_observability.py@119 PS1, Line 119: __ Why __ prefix ? http://gerrit.cloudera.org:8080/#/c/8784/1/tests/query_test/test_observability.py@136 PS1, Line 136: while (retries < 60): for i in xrange(MAX_RETRIES): http://gerrit.cloudera.org:8080/#/c/8784/1/tests/query_test/test_observability.py@139 PS1, Line 139: ["Start Time" Is this key always in info_strings[] ? Is there a chance this will result in key not found exception ? http://gerrit.cloudera.org:8080/#/c/8784/1/tests/query_test/test_observability.py@140 PS1, Line 140: ["End Time"] Same question for "End Time". -- To view, visit http://gerrit.cloudera.org:8080/8784 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109 Gerrit-Change-Number: 8784 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram ThangaGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 07 Dec 2017 01:59:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6285: Don't print stack trace on RPC errors.
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8788 Change subject: IMPALA-6285: Don't print stack trace on RPC errors. .. IMPALA-6285: Don't print stack trace on RPC errors. There is not much benefit in printing the stack trace when Thrift RPC hits an error. As long as we print enough info about the error and idetify the caller, that should be sufficient. In fact, it has been observed that stack crawl caused unnecessary CPU spikes in the past. This change replaces Status() with Status::Expected() in DoRpc(), RetryRpc() and RetryRpcRecv() to avoid unnecessary stack crawls. Testing done: private core build. Change-Id: Ia83294494442ef21f7934f92ba9112e80d81fa58 --- M be/src/exec/catalog-op-executor.cc M be/src/runtime/client-cache.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/query-state.cc M be/src/runtime/runtime-filter-bank.cc M be/src/service/client-request-state.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc 10 files changed, 65 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/8788/1 -- To view, visit http://gerrit.cloudera.org:8080/8788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia83294494442ef21f7934f92ba9112e80d81fa58 Gerrit-Change-Number: 8788 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-5929: Remove redundant explicit casts to string
Hello Thomas Tauber-Marshall, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8660 to look at the new patch set (#2). Change subject: IMPALA-5929: Remove redundant explicit casts to string .. IMPALA-5929: Remove redundant explicit casts to string This patch adds a query rewriter to remove redundant explicit casts to string from binary predicates of the form: "cast( to string) ". The cast is redundant if the predicate evaluation is the same even if the cast is removed and the constant is converted to the original type of the expression. For example: cast(int_col as string) = '123456' -> int_col = 123456 Performance: For the following query on a table having 6001215 records - select * from tpch.lineitem where cast(l_linenumber as string) = '0' +-+---++ | | Scan Time | +-+---++ | | Avg | St dev | | Without rewrite | 1s406ms | 44ms | | With rewrite| 1s099ms | 28ms | +-+---++ Testing: - Added unit tests to ExprRewriteRulesTest - Added functional test to expr.test - Current FE planner tests and BE expr-test run successfully with this change. Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M testdata/workloads/functional-query/queries/QueryTest/exprs.test 4 files changed, 167 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/8660/2 -- To view, visit http://gerrit.cloudera.org:8080/8660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b Gerrit-Change-Number: 8660 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5929: Remove redundant explicit casts to string
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8660 ) Change subject: IMPALA-5929: Remove redundant explicit casts to string .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8660/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8660/1//COMMIT_MSG@17 PS1, Line 17: Testing: > Not super important, since this is clearly a win, but do you have any perf Done http://gerrit.cloudera.org:8080/#/c/8660/1/fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java File fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java: http://gerrit.cloudera.org:8080/#/c/8660/1/fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java@55 PS1, Line 55: oolean isPotentiallyRedundantCast > Seems unnecessary to store this instead of just doing it inline in the 'if' I intentionally did this for improving code readability to avoid putting huge conditional statements inside if. Would you prefer i do it inline? http://gerrit.cloudera.org:8080/#/c/8660/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/8660/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@589 PS1, Line 589: RewritesOk("cast(tinyint_col as string) = '100'", rule, "tinyint_col = 100"); > In all of your cases here, the cast() is applied on a slotref, and the cons Done -- To view, visit http://gerrit.cloudera.org:8080/8660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91b7c6452d0693115f9b9ed9ba09f3ffe0f36b2b Gerrit-Change-Number: 8660 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 01:23:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8783 ) Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. Patch Set 1: (6 comments) Thanks for the quick review! http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@278 PS1, Line 278: if (LOG.isTraceEnabled()) { : LOG.trace("Generating runtime filter from predicate " + joinPredicate); : } > move this to L297 when you know that the filter will be generated. Good catch. Done http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@284 PS1, Line 284: incorrectly alter the query results > produce incorrect results. [more direct] Done http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@285 PS1, Line 285: should > nit: must Done http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@285 PS1, Line 285: the expr > the target expression Done http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@287 PS1, Line 287: Literal > not following what is the Literal part of this predicate. You are right, not literal at this point. Done. http://gerrit.cloudera.org:8080/#/c/8783/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test: http://gerrit.cloudera.org:8080/#/c/8783/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test@1483 PS1, Line 1483: s > nit: produced Done -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 1 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 07 Dec 2017 00:50:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Hello Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8783 to look at the new patch set (#2). Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. IMPALA-6286: Remove invalid runtime filter targets. If the target expression of a runtime filter evaluates to a non-NULL value for outer-join non-matches, then assigning the filter below the nullable side of an outer join may incorrectly alter the query results. See IMPALA-6286 for an example and explanation. Such runtime-filter targets are removed. Testing: - added planner tests which passed locally Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 2 files changed, 80 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/8783/2 -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 2 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8783 ) Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. Patch Set 1: (6 comments) still getting the hang of target/source, outer, nullable in the context of runtime filters, but here are some comments. http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@278 PS1, Line 278: if (LOG.isTraceEnabled()) { : LOG.trace("Generating runtime filter from predicate " + joinPredicate); : } move this to L297 when you know that the filter will be generated. http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@284 PS1, Line 284: incorrectly alter the query results produce incorrect results. [more direct] http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@285 PS1, Line 285: the expr the target expression http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@285 PS1, Line 285: should nit: must http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@287 PS1, Line 287: Literal not following what is the Literal part of this predicate. http://gerrit.cloudera.org:8080/#/c/8783/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test: http://gerrit.cloudera.org:8080/#/c/8783/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test@1483 PS1, Line 1483: s nit: produced -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 1 Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 07 Dec 2017 00:42:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5754: Improve randomness of rand()/random()
Hello Michael Ho, Jim Apple, Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8355 to look at the new patch set (#16). Change subject: IMPALA-5754: Improve randomness of rand()/random() .. IMPALA-5754: Improve randomness of rand()/random() Currently implementation of rand/random built-in functions use rand_r of C library. We recognized its randomness was poor. pcg32 of third party library shows better randomness than rand_r. Testing: Revise unit test in expr-test Add E2E test to random.test Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 --- M LICENSE.txt M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc A be/src/thirdparty/pcg-cpp-0.98/LICENSE.txt A be/src/thirdparty/pcg-cpp-0.98/README.md A be/src/thirdparty/pcg-cpp-0.98/include/pcg_extras.hpp A be/src/thirdparty/pcg-cpp-0.98/include/pcg_random.hpp A be/src/thirdparty/pcg-cpp-0.98/include/pcg_uint128.hpp M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test A testdata/workloads/functional-query/queries/QueryTest/random.test M tests/query_test/test_queries.py 11 files changed, 3,454 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8355/16 -- To view, visit http://gerrit.cloudera.org:8080/8355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Gerrit-Change-Number: 8355 Gerrit-PatchSet: 16 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5754: Improve randomness of rand()/random()
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8355 ) Change subject: IMPALA-5754: Improve randomness of rand()/random() .. Patch Set 15: The third party library is officially released recently: version 0.98 (http://www.pcg-random.org/download.html) I didn't include some unnecessary files such as sample and tests. Please look around PCG repo: https://github.com/imneme/pcg-cpp -- To view, visit http://gerrit.cloudera.org:8080/8355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Gerrit-Change-Number: 8355 Gerrit-PatchSet: 15 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 07 Dec 2017 00:36:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5754: Improve randomness of rand()/random()
Hello Michael Ho, Jim Apple, Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8355 to look at the new patch set (#15). Change subject: IMPALA-5754: Improve randomness of rand()/random() .. IMPALA-5754: Improve randomness of rand()/random() Currently implementation of rand/random built-in functions use rand_r of C library. We recognized its randomness was poor. pcg32 of third party library shows better randomness than rand_r. Testing: Revise unit test in expr-test Add E2E test to random.test Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 --- M LICENSE.txt M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc A be/src/thirdparty/pcg-cpp-0.98/LICENSE.txt A be/src/thirdparty/pcg-cpp-0.98/README.md A be/src/thirdparty/pcg-cpp-0.98/include/pcg_extras.hpp A be/src/thirdparty/pcg-cpp-0.98/include/pcg_random.hpp A be/src/thirdparty/pcg-cpp-0.98/include/pcg_uint128.hpp M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test A testdata/workloads/functional-query/queries/QueryTest/random.test M tests/query_test/test_queries.py 11 files changed, 3,454 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8355/15 -- To view, visit http://gerrit.cloudera.org:8080/8355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Gerrit-Change-Number: 8355 Gerrit-PatchSet: 15 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 17: Code-Review+1 Did you want to have another look Joe? -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 17 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 00:21:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6281: Fix use-after-free in InitAuth()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8777 ) Change subject: IMPALA-6281: Fix use-after-free in InitAuth() .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1582/ -- To view, visit http://gerrit.cloudera.org:8080/8777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f29c2396df114264dfc23726b8ba778f50e12e9 Gerrit-Change-Number: 8777 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 00:18:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4132: Use -fno-omit-frame-pointer
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8612 ) Change subject: IMPALA-4132: Use -fno-omit-frame-pointer .. Patch Set 2: I spoke to Gabor about this a few days ago. I agree it would be good to validate it but we we didn't plan for that - the JIRA didn't have a reproduction of a bad stacktrace scenario and the comments on the JIRA were confident that it would be beneficial. We could try building a release binary, stripping it with strip --strip-debug and inducing a crash. @Lars do you have any examples of crashes from minidumps where we got a "bad" stacktrace? -- To view, visit http://gerrit.cloudera.org:8080/8612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7d0d88ba015a847356ed0274fe91017b98cb941 Gerrit-Change-Number: 8612 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 00:18:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1581/ -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 3 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 00:15:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1580/ -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 07 Dec 2017 00:02:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6222: Add details to error msg on failure to get min reservation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8781 ) Change subject: IMPALA-6222: Add details to error msg on failure to get min reservation .. Patch Set 1: (10 comments) I like the overall approach and the tests. Comments are mainly about the invariants we can assume about where limits are placed in the hierarchy. http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker-test.cc File be/src/runtime/bufferpool/reservation-tracker-test.cc: http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker-test.cc@44 PS1, Line 44: NULL nullptr? http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.h@121 PS1, Line 121: /// 'error_status'. Maybe add something like (if 'error_status' is non-NULL) just to clarify that it's valid to pass in NULL? http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc@168 PS1, Line 168: memtracker nit: mem_tracker for consistency http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc@174 PS1, Line 174: *error_status = Status::Expected(Substitute( Seems like we should probably add this one to generate_error_codes.py (we don't have terribly consistent criteria for doing so but it's usually a good practice). http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc@185 PS1, Line 185: memtracker->LogTopNQueries(5 This is kind of assuming that only query mem trackers and above have reservation limits - this would produce a weird result if we called it on a lower one. The assumption is true for now but I'm wary of adding an undocumented assumption here since we might want to add constraints to other plan nodes, e.g. for partial sort. I guess we could document on this function that the error message only makes sense if you call this on a query-level reservation tracker (or above). http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc@185 PS1, Line 185: memtracker->LogTopNQueries(5))); I feel like we should probably document the lock order implied here since we're calling a function that acquires MemTracker::child_trackers_lock_ while holding ReservationTracker::lock_. Maybe document in the ReservationTracker::lock_ comment? http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc@199 PS1, Line 199: Substitute("Failed to increase reservation by $0 because it would exceed the " Also move to generate_error_codes.py? Or maybe we should call MemTracker::MemLimitExceeded() here instead for consistency? http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/bufferpool/reservation-tracker.cc@203 PS1, Line 203: mem_tracker_->LogUsage(0), mem_tracker_->LogTopNQueries(5))); We don't actually know that the limit hit was on mem_tracker_ itself, unfortunately, it could be further up. I think calling MemTracker::MemLimitExceeded() might solve this issue though. http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/mem-tracker.cc@336 PS1, Line 336: void MemTracker::GetTopNQueries(std::priority_queue, Returning the pointers to the MemTrackers isn't safe - they're only guaranteed to live as long as 'child_trackers_lock_' is held. I think we need to copy out any state from the child trackers that we need before dropping the lock. http://gerrit.cloudera.org:8080/#/c/8781/1/be/src/runtime/mem-tracker.cc@338 PS1, Line 338: { nit: curly braces don't seem necessary since the lock is held for the whole function. -- To view, visit http://gerrit.cloudera.org:8080/8781 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74 Gerrit-Change-Number: 8781 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 23:45:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6225: Part 2: Query profile date-time strings should have ns precision.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8784 ) Change subject: IMPALA-6225: Part 2: Query profile date-time strings should have ns precision. .. Patch Set 1: The earlier attempt parsed the debug webpage; this one gets the thrift profile and extracts the time stamps from the profile tree. Thanks PhilZ for sharing how to do that. -- To view, visit http://gerrit.cloudera.org:8080/8784 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109 Gerrit-Change-Number: 8784 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram ThangaGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 06 Dec 2017 23:30:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6225: Part 2: Query profile date-time strings should have ns precision.
Zoram Thanga has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8784 Change subject: IMPALA-6225: Part 2: Query profile date-time strings should have ns precision. .. IMPALA-6225: Part 2: Query profile date-time strings should have ns precision. This commit follows 16d8dd58. This patch adds a test case that inspects the thrift profile of a completed query, and verifies that the "Start Time" and "End Time" of the query have nanosecond precision. We chose to work with the thrift profile directly, rather than parse the debug web page, as it is the thrift profile which is consumed by management API clients of Impala. Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109 --- M tests/common/impala_service.py M tests/query_test/test_observability.py 2 files changed, 70 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/8784/1 -- To view, visit http://gerrit.cloudera.org:8080/8784 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109 Gerrit-Change-Number: 8784 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram Thanga
[Impala-ASF-CR] IMPALA-5848: Account for TCMalloc overhead in MemTracker
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8782 ) Change subject: IMPALA-5848: Account for TCMalloc overhead in MemTracker .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8782/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8782/1//COMMIT_MSG@13 PS1, Line 13: Can you include an example of how this looks? http://gerrit.cloudera.org:8080/#/c/8782/1/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8782/1/be/src/runtime/mem-tracker.cc@298 PS1, Line 298: tcmalloc_overhead = curr_consumption - TcmallocMetric::BYTES_IN_USE->value(); This is essentially assuming that the consumption metric is TcmallocMetric::PhysicalBytesMetric. This isn't entirely true - it is actually AggregateMemoryMetrics::TOTAL_USED, which contains non-TCMalloc memory if the experimental option --mmap_buffers=true is on. It would be nice to get the accounting right for that case, even though it is experimental. You could compute PHYSICAL_BYTES_RESERVED - BYTES_IN_USE here, but I think if you're doing that it make more sense to use the same approach as setting up "Buffer Pool: Free Buffers", etc, in runtime/exec-env.cc where we add an extra MemTracker pointing to a metric. -- To view, visit http://gerrit.cloudera.org:8080/8782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I602e9d5e8e8d7470dcfe4addde3265057c16263a Gerrit-Change-Number: 8782 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 23:24:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.
Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8783 Change subject: IMPALA-6286: Remove invalid runtime filter targets. .. IMPALA-6286: Remove invalid runtime filter targets. If the target expression of a runtime filter evaluates to a non-NULL value for outer-join non-matches, then assigning the filter below the nullable side of an outer join may incorrectly alter the query results. See IMPALA-6286 for an example and explanation. Such runtime-filter targets are removed. Testing: - added planner tests which passed locally Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 2 files changed, 80 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/8783/1 -- To view, visit http://gerrit.cloudera.org:8080/8783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298 Gerrit-Change-Number: 8783 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-5848: Account for TCMalloc overhead in MemTracker
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8782 Change subject: IMPALA-5848: Account for TCMalloc overhead in MemTracker .. IMPALA-5848: Account for TCMalloc overhead in MemTracker This patch adds a new field to the MemTracker dump called "TCMalloc Overhead" which accounts for different cache freelists maintained by TCMalloc. This added accounting also helps bring down the amount of untracked memory. Testing: Tested manually by checking the memz webpage. Change-Id: I602e9d5e8e8d7470dcfe4addde3265057c16263a --- M be/src/runtime/mem-tracker.cc 1 file changed, 14 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/8782/1 -- To view, visit http://gerrit.cloudera.org:8080/8782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I602e9d5e8e8d7470dcfe4addde3265057c16263a Gerrit-Change-Number: 8782 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-6281: Fix use-after-free in InitAuth()
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8777 ) Change subject: IMPALA-6281: Fix use-after-free in InitAuth() .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f29c2396df114264dfc23726b8ba778f50e12e9 Gerrit-Change-Number: 8777 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 22:41:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6222: Add details to error msg on failure to get min reservation
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8781 Change subject: IMPALA-6222: Add details to error msg on failure to get min reservation .. IMPALA-6222: Add details to error msg on failure to get min reservation This patch adds the following details to the error message encountered on failure to get minimum memory reservation: - which ReservationTracker or MemTracker hit its limit - top 5 admitted queries that are consuming the most memory under the tracker that hit its limit Testing: - added tests to reservation-tracker-test.cc that verify the error message returned for different cases. - tested "initial reservation failed" condition manually to verify the error message returned (Request suggestion for an automated test). Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74 --- M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/initial-reservations.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M common/thrift/generate_error_codes.py 7 files changed, 169 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/8781/1 -- To view, visit http://gerrit.cloudera.org:8080/8781 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74 Gerrit-Change-Number: 8781 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-6270: create Impala parent pom
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8753 ) Change subject: IMPALA-6270: create Impala parent pom .. Patch Set 1: Code-Review+2 (1 comment) I played around with this change locally to see how it affects my FE-heavy workflows. I'm happy with it. http://gerrit.cloudera.org:8080/#/c/8753/1/impala-parent/CMakeLists.txt File impala-parent/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8753/1/impala-parent/CMakeLists.txt@19 PS1, Line 19: COMMAND $ENV{IMPALA_HOME}/bin/mvn-quiet.sh install -DskipTests > This will "install" the impala-parent module into your local m2 cache. Roug Thanks for the explanation. -- To view, visit http://gerrit.cloudera.org:8080/8753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0 Gerrit-Change-Number: 8753 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 06 Dec 2017 21:13:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5310: Part 2: Add SAMPLED NDV() function.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8569 ) Change subject: IMPALA-5310: Part 2: Add SAMPLED_NDV() function. .. Patch Set 5: Before merging, I'll run the tests one more time and give the Apache legal team some time to respond regarding including the MpFit library. -- To view, visit http://gerrit.cloudera.org:8080/8569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 Gerrit-Change-Number: 8569 Gerrit-PatchSet: 5 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Wed, 06 Dec 2017 20:41:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5310: Part 2: Add SAMPLED NDV() function.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8569 ) Change subject: IMPALA-5310: Part 2: Add SAMPLED_NDV() function. .. Patch Set 5: Code-Review+2 That's clearer, thanks. -- To view, visit http://gerrit.cloudera.org:8080/8569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 Gerrit-Change-Number: 8569 Gerrit-PatchSet: 5 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Wed, 06 Dec 2017 20:35:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5310: Part 2: Add SAMPLED NDV() function.
Hello Bharath Vissapragada, Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8569 to look at the new patch set (#5). Change subject: IMPALA-5310: Part 2: Add SAMPLED_NDV() function. .. IMPALA-5310: Part 2: Add SAMPLED_NDV() function. Adds a new SAMPLED_NDV() aggregate function that is intended to be used in COMPUTE STATS TABLESAMPLE. This patch only adds the function itself. Integration with COMPUTE STATS will come in a separate patch. SAMPLED_NDV() estimates the number of distinct values (NDV) based on a sample of data and the corresponding sampling rate. The main idea is to collect several x/y data points where x is the number of rows and y is the corresponding NDV estimate. These data points are used to fit an objective function to the data such that the true NDV can be extrapolated. The aggregate function maintains a fixed number of HyperLogLog intermediates to compute the x/y points. Several objective functions are fit and the best-fit one is used for extrapolation. Adds the MPFIT C library to perform curve fitting: https://www.physics.wisc.edu/~craigm/idl/cmpfit.html The library is a C port from Fortran. Scipy uses the Fortran version of the library for curve fitting. Testing: - added functional tests - core/hdfs run passed Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h A be/src/thirdparty/mpfit/DISCLAIMER A be/src/thirdparty/mpfit/README A be/src/thirdparty/mpfit/mpfit.c A be/src/thirdparty/mpfit/mpfit.h M be/src/util/CMakeLists.txt A be/src/util/mpfit-util.cc A be/src/util/mpfit-util.h M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M tests/query_test/test_aggregation.py 15 files changed, 3,900 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8569/5 -- To view, visit http://gerrit.cloudera.org:8080/8569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 Gerrit-Change-Number: 8569 Gerrit-PatchSet: 5 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5310: Part 2: Add SAMPLED NDV() function.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8569 ) Change subject: IMPALA-5310: Part 2: Add SAMPLED_NDV() function. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8569/3/be/src/util/mpfit-util.cc File be/src/util/mpfit-util.cc: http://gerrit.cloudera.org:8080/#/c/8569/3/be/src/util/mpfit-util.cc@33 PS3, Line 33: function > I just meant to say "fitting function" in the comment where I highlighted, Clarified in comment as discussed. -- To view, visit http://gerrit.cloudera.org:8080/8569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 Gerrit-Change-Number: 8569 Gerrit-PatchSet: 3 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Wed, 06 Dec 2017 20:32:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6281: Fix use-after-free in InitAuth()
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8777 ) Change subject: IMPALA-6281: Fix use-after-free in InitAuth() .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8777/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8777/1//COMMIT_MSG@9 PS1, Line 9: Previously, we implicitly create a local string object created from nit: can you briefly mention why this was found now / when it was introduced. I saw that you explained it in the JIRA but it may be helpful to include it here, too. http://gerrit.cloudera.org:8080/#/c/8777/1/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/8777/1/be/src/rpc/rpc-mgr-test.cc@57 PS1, Line 57: static string current_executable_path; Should we make this a char* (and spell all uppercase)? http://gerrit.cloudera.org:8080/#/c/8777/1/be/src/rpc/thrift-server-test.cc File be/src/rpc/thrift-server-test.cc: http://gerrit.cloudera.org:8080/#/c/8777/1/be/src/rpc/thrift-server-test.cc@94 PS1, Line 94: static string current_executable_path; Change to char*, upper case? http://gerrit.cloudera.org:8080/#/c/8777/1/be/src/transport/TSasl.h File be/src/transport/TSasl.h: http://gerrit.cloudera.org:8080/#/c/8777/1/be/src/transport/TSasl.h@213 PS1, Line 213: life cycle nit: lifetime -- To view, visit http://gerrit.cloudera.org:8080/8777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f29c2396df114264dfc23726b8ba778f50e12e9 Gerrit-Change-Number: 8777 Gerrit-PatchSet: 1 Gerrit-Owner: Michael HoGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 06 Dec 2017 19:15:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6272: Update external Hadoop ecosystem versions
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8759 ) Change subject: IMPALA-6272: Update external Hadoop ecosystem versions .. IMPALA-6272: Update external Hadoop ecosystem versions Updates from 5.14.0-SNAPSHOT to 5.15.0-SNAPSHOT. I did this like so: $perl -pi -e 's/5.14.0-SNAP/5.15.0-SNAP/' $(git grep -l 5.14.0-SNAP) Change-Id: I28caf226de8fc6626a9482f6473fec539125b7d5 Reviewed-on: http://gerrit.cloudera.org:8080/8759 Reviewed-by: Michael BrownTested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 6 insertions(+), 6 deletions(-) Approvals: Michael Brown: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I28caf226de8fc6626a9482f6473fec539125b7d5 Gerrit-Change-Number: 8759 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6272: Update external Hadoop ecosystem versions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8759 ) Change subject: IMPALA-6272: Update external Hadoop ecosystem versions .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28caf226de8fc6626a9482f6473fec539125b7d5 Gerrit-Change-Number: 8759 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 06 Dec 2017 19:15:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6242: Reduce flakiness in TimerCounterTest
Tianyi Wang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8670 ) Change subject: IMPALA-6242: Reduce flakiness in TimerCounterTest .. IMPALA-6242: Reduce flakiness in TimerCounterTest The error threshold in TimerCounterTest is 15ms, which is still not enough in some rare cases. This patch increases it to 30ms. Change-Id: Ifc038908857060ccbabfe30c46e72fd93907f412 --- M be/src/util/runtime-profile-test.cc 1 file changed, 12 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8670/2 -- To view, visit http://gerrit.cloudera.org:8080/8670 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifc038908857060ccbabfe30c46e72fd93907f412 Gerrit-Change-Number: 8670 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1578/ -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 06 Dec 2017 18:52:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 06 Dec 2017 18:52:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8770/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8770/2/be/src/exec/hdfs-table-sink.cc@61 PS2, Line 61: GetName() Are you sure it's safe to call a virtual function during the constructor? It's also a bit dangerous to call this because you don't really know that GetName() doesn't depend on some other object state (luckily, it looks like all GetName implementations return a static string). Given that we're calling GetName() from the derived classes now, let's make GetName() not virtual to avoid the first problem. For the second problem, you may want to just use the static string at these places and get rid of GetName(), but I feel less strongly about that. -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 16:58:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs
Hello Lars Volker, Michael Brown, Jim Apple, Philip Zeyliger, Sailesh Mukil, David Knupp, Joe McDonnell, Tim Armstrong, Alex Behm, Zach Amsden, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8294 to look at the new patch set (#5). Change subject: IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs .. IMPALA-6067: Enable S3 access via IAM roles for EC2 VMs For some time Impala in a production environment has been able to access data stored in Amazon S3 buckets using credentials specified in a number of ways: - storing Amazon access keys in environment variables or in core-site.xml. - using proprietary management tools to store Amazon access keys securely - using Amazon IAM roles bound to VMs running in EC2. The development minicluster environment used the first approach, which risked leaking these keys. This change enables Impala builds to use IAM roles to access S3 buckets when running on an Amazon EC2 virtual machine. The changes mainly ensure that environment variables carrying the traditional AWS credentials do not conflict with credentials supplied by the IAM role attached to the VM instance. IAM role based credentials are accessible through the EC2 instance-property mechanism; for further details see Amazon's docs at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials The change also removes the remaining references to the s3n: provider. In the FE tests all URIs referring to s3n: are replaced with their s3a: equivalents, except for a single negative test in AnalyzeStmtsTest.java, which is removed. In addition to the code changes, the s3n: and s3a: credential properties are also removed from core-site.xml.tmpl. The s3a: provider can pick up AWS S3 credentials from environment variables or IAM properties bound to the VM instance, which is a more flexible approach. As environment variables have precedence over IAM roles, care must be taken when managing the canonical environment variables carrying AWS credentials. There are two requirements to be reconciled: 1. The FE tests have code that examines s3a: URIs; this code needs existing, but not necessarily valid AWS credentials. 2. When the Impala test suite is executed on an EC2 VM, AWS credentials can be supplied via IAM roles. These credentials can be used only if the AWS_* environment variables are unset (do not exist). The tradeoff is managed following these rules: 1. When AWS_* environment variables are set before invoking the Impala configuration scripts, their value is preserved and the config scripts ensure that the variables are exported. 2. If the AWS_* variables are missing or empty, they will be unset to ensure that credentials supplied by Amazon's IAM roles can be accessed, 3. except if the scripts are running outside of EC2 (so there can be no IAM roles) and TARGET_FILESYSTEM is not set "s3". This combination is most often the case on a developer's local workstation. In this case the AWS_* credential variables are forcibly set to dummy values to allow the FE tests to succeed. The removal of S3 credential parameters from core-site.xml[.tmpl] also allows users to set up their own credentials there, the config scripts will not change those settings. Environment variables carrying AWS security credentials will be set up according to the following table: Instance: Running outside EC2 || Running in EC2 | ++++++ TARGET_FILESYSTEM | S3 | not S3 || S3 | not S3 | ++++++ |||||| empty | unset | dummy || unset | unset | AWS_* |||||| env --++++++ var |||||| not empty | export | export || export | export | |||||| ++++++ Legend: unset: the variable is unset export: the variable is exported with its current value dummy: the variable is set to a preset dummy value and exported Running on an EC2 VM is indicated by setting RUNNING_IN_EC2 to "true" and exporting it before impala_config.sh is invoked. The change also moves the logic performing the S3 access checks into a separate script file: bin/check-s3-access.sh. This file now contains all the S3-specific logic and network access to check if the requested S3 bucket can be accessed. Testing: Performed local builds for HDFS as well as automated builds against HDFS and S3, using both IAM roles and explicit AWS_* credentials for authentication. Verified that FE tests that parse s3a: URLs are still
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8770/2/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: http://gerrit.cloudera.org:8080/#/c/8770/2/be/src/exec/data-sink.cc@49 PS2, Line 49: state-> > I guess you may assume state cannot be nullptr. If so, why don't you pass s We follow the google style guide on this point, which says to avoid non-const reference arguments: https://google.github.io/styleguide/cppguide.html#Reference_Arguments I agree that it's useful that reference argument document that the argument is non-null, but it can also make the code more confusing in some ways. -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 16:38:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6242: Reduce flakiness in TimerCounterTest
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8670 ) Change subject: IMPALA-6242: Reduce flakiness in TimerCounterTest .. Patch Set 1: Since we are measuring wall time here, I don't know any better. I think 30 ms for measurement error is still acceptable, but maybe you could increase the sleep times as well to keep the measurement error a small fraction of the measured runtime. -- To view, visit http://gerrit.cloudera.org:8080/8670 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc038908857060ccbabfe30c46e72fd93907f412 Gerrit-Change-Number: 8670 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 06 Dec 2017 16:37:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. Patch Set 13: Code-Review+2 carry -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 13 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 16:33:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation In preparation for switching the I/O mgr to the buffer pool, this removes and cleans up a lot of code so that the switchover patch starts from a cleaner slate. * Remove the free buffer cache (which will be replaced by buffer pool's own caching). * Make memory limit exceeded error checking synchronous (in anticipation of having to propagate buffer pool errors synchronously). * Simplify error propagation - remove the (ineffectual) code that enqueued BufferDescriptors containing error statuses. * Document locking scheme better in a few places, make it part of the function signature when it seemed reasonable. * Move ReturnBuffer() to ScanRange, because it is intrinsically connected with the lifecycle of a scan range. * Separate external ReturnBuffer() and internal CleanUpBuffer() interfaces - previously callers of ReturnBuffer() were fudging the num_buffers_in_reader accounting to make the external interface work. * Eliminate redundant state in ScanRange: 'eosr_returned_' and 'is_cancelled_'. * Clarify the logic around calling Close() for the last BufferDescriptor. -> There appeared to be an implicit assumption that buffers would be freed in the order they were returned from the scan range, so that the "eos" buffer was returned last. Instead just count the number of outstanding buffers to detect the last one. -> Touching the is_cancelled_ field without holding a lock was hard to reason about - violated locking rules and it was unclear that it was race-free. * Remove DiskIoMgr::Read() to simplify interface. It is trivial to inline at the callsites. This will probably regress performance somewhat because of the cache removal, so my plan is to merge it around the same time as switching the I/O mgr to allocate from the buffer pool. I'm keeping the patches separate to make reviewing easier. Testing: * Ran exhaustive tests * Ran the disk-io-mgr-stress-test overnight Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/mem-tracker.h M be/src/runtime/test-env.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h 19 files changed, 575 insertions(+), 904 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/13 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 13 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6272: Update external Hadoop ecosystem versions
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8759 ) Change subject: IMPALA-6272: Update external Hadoop ecosystem versions .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28caf226de8fc6626a9482f6473fec539125b7d5 Gerrit-Change-Number: 8759 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 06 Dec 2017 15:42:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8490 ) Change subject: IMPALA-2248: Make idle_session_timeout a query option .. Patch Set 11: (10 comments) Thanks! http://gerrit.cloudera.org:8080/#/c/8490/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8490/10//COMMIT_MSG@7 PS10, Line 7: IMPALA-2248: Make idle_session_timeout a query option > Sounds reasonable. We'll have to make sure to document the limitations clea I'll need some help in filing the Doc JIRA, but sure! http://gerrit.cloudera.org:8080/#/c/8490/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8490/11//COMMIT_MSG@20 PS11, Line 20: SET statement. > Quick question about the behavior of setting this query option: if the sess I updated the code not to expire anything with the SET statement. The last accessed time of the session is updated immediately, therefore the session won't expire for at least the newly set timeout. It shouldn't affect the queries either, they have their own timeouts. http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.cc@207 PS10, Line 207: if (iequals(key, "idle_session_timeout")) { > Right, but if the client provides an invalid value, I think it will just ge OK, I restructured my code. Now the validation mainly happens in SetQueryOption(), and the client will get an error if they want to set it to an invalid value. (An exception to this is HS2 OpenSession(), which used to ignore the errors related to query options, so I didn't change the behavior there.) http://gerrit.cloudera.org:8080/#/c/8490/11/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/8490/11/be/src/service/impala-hs2-server.cc@333 PS11, Line 333: } else if (iequals(v.first, "idle_session_timeout")) { > Can this fall through to SetQueryOption() below now that it's supported ? I modified the code structure, now some part of the validation is done in SetQueryTimeout(), but some special casing is still needed. http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java File fe/src/test/java/org/apache/impala/service/JdbcTest.java: http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@616 PS11, Line 616: ew Long( > nit: is the new Long() necessary? Seems like it should be automatically cas Right, done. http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@626 PS11, Line 626: int sleepPeriod = (int)(timeout * timeoutTolerance * 1000) + 500; > I think the wakeup timing could do with a brief explanation. Is the idea is Right, that's the idea. I implemented this test case based on test_concurrent_session_mixed_idle_timeout in tests/hs2/test_hs2.py. Maybe it's redundant, but it is a thorough test of the jdbc client regarding to timeouts. I added some description to the beginning of the loop. http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@634 PS11, Line 634: new Long > same here Done http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@636 PS11, Line 636: Boolean > bool? Not sure why we need to use the boxed type Done http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@639 PS11, Line 639: try(ResultSet rs = connection.createStatement().executeQuery("SELECT 1+2")) { > nit: whitespace Done http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/util/Metrics.java File fe/src/test/java/org/apache/impala/util/Metrics.java: http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/util/Metrics.java@43 PS11, Line 43: public Object getMetric(String metric) throws Exception { > A brief comment would be helpful, mainly to explain what it returns - when Added a javadoc comment to the method. -- To view, visit http://gerrit.cloudera.org:8080/8490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Gerrit-Change-Number: 8490 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option
Hello Michael Ho, Thomas Tauber-Marshall, Laszlo Gaal, Gabor Kaszab, Attila Jeges, Tim Armstrong, Csaba Ringhofer, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8490 to look at the new patch set (#12). Change subject: IMPALA-2248: Make idle_session_timeout a query option .. IMPALA-2248: Make idle_session_timeout a query option This commit makes idle_session_timeout a query option. idle_session_timeout currently can be set as a command line option, which will be the default timeout for sessions. HS2 sessions can override it with a smaller value by setting it in the configuration overlay of HS2 OpenSession(). However, we can't override idle_session_timeout for JDBC/ODBC connections, because we cannot put this in the connection string. This commit is a workaround for this problem, it allows JDBC/ODBC connections to set the session timeout as a query option with the SET statement. I created an automated test case in JdbcTest.java based on test_hs2.py::test_concurrent_session_mixed_idle_timeout. I also extended the test_session_expiration and test_set_and_unset test suites. Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/test/java/org/apache/impala/service/JdbcTest.java A fe/src/test/java/org/apache/impala/util/Metrics.java M tests/custom_cluster/test_session_expiration.py M tests/custom_cluster/test_set_and_unset.py 13 files changed, 375 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/8490/12 -- To view, visit http://gerrit.cloudera.org:8080/8490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Gerrit-Change-Number: 8490 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#6). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests for working and nonworking cases Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 98 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/6 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8770/2/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: http://gerrit.cloudera.org:8080/#/c/8770/2/be/src/exec/data-sink.cc@49 PS2, Line 49: state-> I guess you may assume state cannot be nullptr. If so, why don't you pass state using reference type instead of pointer? I could see several argument passing in this change. Reference type is more intuitive to guarantee the variable points an object. -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 14:30:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183 PS4, Line 183: /// dt_ctx -- date/time format context > Actually I don't want leave meaningless comments. I think the parameter nam I meant "the parameter names" are in new code in the recent patch set. Please feel free to tell me if you have a better name or more informative comment. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 11:44:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746 PS3, Line 5746: TestValue("unix_timestamp('\\'Epoch time: \\'1970|01|01 00|00|00',\ > Thanks for the explanation! I guess your question is that the result seems to be wrong even though the format is find and everything is okey to run. Timezone can affect epoch timestamp. Impala interprets date and time with UTC timezone by default due to avoidance of undesired results by local timezone. - Regarding default timezone: https://github.com/apache/impala/blob/master/docs/topics/impala_timestamp.xml#L89 - Regarding data and time interpretation at unix_timestamp: https://github.com/apache/impala/blob/master/docs/topics/impala_datetime_functions.xml#L2463 - Test case for epoch: https://github.com/apache/impala/blob/master/be/src/exprs/expr-test.cc#L5617 Therefore, I think the result should be zero. http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@182 PS4, Line 182: Reset(fmt, fmt_len); > 1: This comment doesn't add any further info as it is basically the functi Thanks for your kind comment. 1,2: I agree. Actually I thought I should have to add the meaningless comment, but I didn't imagine more meaningful comment and I think the function name looks self-descriptive. 3. More detailed explanation is added to return description instead of the word "parse". 4. I agree. Reader expects a string literal due to "Get", so the revised function will return a pointer of string literal. The function is split into two parts: GetStringLiteralBetweenQuotes and SaveDateTimeToken. I think SaveDateTimeToken will be used generally when refactoring ParseFormatTokens or adding a new code. I fully agree an unit function should have a minimal purpose and responsibility. http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183 PS4, Line 183: } > Again, this is simply writing the type of the parameter with spaces. Actually I don't want leave meaningless comments. I think the parameter names are self-explainable, but please leave a comment as a new reader if you think any additional information should be helpful. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 11:41:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Hello Gabor Kaszab, Attila Jeges, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8508 to look at the new patch set (#5). Change subject: IMPALA-5237: Support a quoted string in date/time format .. IMPALA-5237: Support a quoted string in date/time format Impala does not represent a quoted string at date/time format. For example, addtional quoted string between the patterns (e.g. HH\'H\' => 11H). Hive supports this feature, so user wants to get a same result from Impala. By the way, Impala returns a different result as below. * Hive > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08H 39M 24S * Impala > select from_unixtime(1492677564, 'HH\'H\' mm\'M\' ss\'S\''); 08'8' 39'4' 24'0' The change affects the format pattern for from_unixtime/from_timestamp/unix_timestamp. In unix_timestamp, user can also specify a quoted string like this. > select unix_timestamp('\'Epoch time: \'19700101', > '\'Epoch time: \'MMdd'); 0 By the way, the quoted strings between input and format should be exactly same and internally string comparison is case-sensitive. There is one intentional difference between Hive and Impala. In Impala, the format string should have any date or time patten as below. Throwing the error/warning is better if Impala cannot optimize the expression. User must rewrite the query and don't pay the function call. * Hive > select from_unixtime(0, '\'Hello world!\''); Hello world! * Impala > select from_unixtime(0, '\'Hello world!\''); Bad date/time conversion format: 'Hello world!' Testing: Add unit tests for working and nonworking cases Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 98 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8508/5 -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5237: Support a quoted string in date/time format
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8508 ) Change subject: IMPALA-5237: Support a quoted string in date/time format .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5746 PS3, Line 5746: TestIsNull("months_between('23:33:45','15:33:45')", TYPE_DOUBLE); > I guess you may expect "Epoch time: 1970|01|01 00|00|00" as a result, unix_ Thanks for the explanation! One more question: If we decide not to return Null here, then wouldn't it make sense to return the value as if that "Epoch time: " something wasn't there? I have a feeling that we should either return Null, or return the value indicated by the given parameters, so in this case I think this query should return the same as the following: unix_timestamp("1970|01|01 00|00|00","|MM|dd HH|mm|ss"); 28800 Could you please share your thoughts? http://gerrit.cloudera.org:8080/#/c/8508/3/be/src/exprs/expr-test.cc@5764 PS3, Line 5764: TestIsNull("dayofweek(cast('09:10:11.00' as timestamp))", TYPE_INT); > As I explained above, I think returning NULL for the mismatched case is eno Thanks again for the explanation! Using TestIsNull here sounds fine. http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@182 PS4, Line 182: /// Get string literal between quotes 1: This comment doesn't add any further info as it is basically the function name with spaces. 2: The comment should tell the reader if there is some precondition for this function. In this case I think it would be beneficial to mention that we assume that str is currently pointing to the opening quote. 3: For me it would also make sense to say explicitly what successful and unsuccessful parse mean. e.g. parsing is unsuccessful if there is no closing quote. 4: The 'Get' in the function name indicates that when we call this function it either returns the found string literal as a return value or through an out parameter. I think this is a bit misleading as in fact this function is for 2 different things (but not for returning a string literal): - Finding the closing quote and getting the string between the 2 quotes. - Adding a DateTimeFormatToken to dt_ctx. My preference here would be to introduce functions with a single area of responsibility: one for finding the string between the quotes and one for inserting the FormatToken to dt_ctx. What do you think? http://gerrit.cloudera.org:8080/#/c/8508/4/be/src/runtime/timestamp-parse-util.h@183 PS4, Line 183: /// dt_ctx -- date/time format context Again, this is simply writing the type of the parameter with spaces. -- To view, visit http://gerrit.cloudera.org:8080/8508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie34055ac695748bcfb110bfa6ed5308f469ea178 Gerrit-Change-Number: 8508 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 09:42:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. Patch Set 2: Code-Review+1 I'm fine with this patch. Please feel free to upgrade to a +2 if no one else is reviewing it. -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 09:27:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8770/1/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: http://gerrit.cloudera.org:8080/#/c/8770/1/be/src/exec/data-sink.cc@47 PS1, Line 47: DataSink::DataSink(const RowDescriptor* row_desc, const string& name, RuntimeState* state) > Does it make sense to just pass RuntimeState and 'name' here and create the That's a good idea. -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Dec 2017 09:03:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6262: Always initialize runtime profile for DataSink
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8770 ) Change subject: IMPALA-6262: Always initialize runtime profile for DataSink .. IMPALA-6262: Always initialize runtime profile for DataSink This change moves the creation of the runtime profile from DataSink::Prepare() to the ctor of DataSink derived classes. This makes sure that DataSink::Close() and other functions can access the profile even if the DataSink fails to initialize. Testing done: Added a test case which triggers failure in the initialization of output expressions in a HdfsTableSink. Impalad crashed consistently without the fix. Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h A testdata/workloads/functional-query/queries/QueryTest/insert_bad_expr.test M tests/query_test/test_insert.py 20 files changed, 79 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8770/2 -- To view, visit http://gerrit.cloudera.org:8080/8770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e Gerrit-Change-Number: 8770 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2640: Maintain the command given by an user
Hello John Russell, Andre Araujo, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8762 to look at the new patch set (#3). Change subject: IMPALA-2640: Maintain the command given by an user .. IMPALA-2640: Maintain the command given by an user IMPALA-2180 has forced the command to be lowercase. It is better to maintain the command given by an user. Testing: TestImpalaShell.test_var_substitution already has a testcase Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 --- M shell/impala_shell.py 1 file changed, 84 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/8762/3 -- To view, visit http://gerrit.cloudera.org:8080/8762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Gerrit-Change-Number: 8762 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Andre Araujo Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-2640: Maintain the command given by an user
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8762 ) Change subject: IMPALA-2640: Maintain the command given by an user .. Patch Set 2: Code-Review-1 I found that "ctrl + d" (EOF) does not work on my change. Let me take a look at this and then provide a new patch set. -- To view, visit http://gerrit.cloudera.org:8080/8762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Gerrit-Change-Number: 8762 Gerrit-PatchSet: 2 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Andre Araujo Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 06 Dec 2017 08:28:33 + Gerrit-HasComments: No