[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps
John Russell has posted comments on this change. Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps .. Patch Set 2: Tim and Michael, I added you as reviewers based on your frequent activity in be/src/exprs/. If someone else is more familiar with the compilation aspect w.r.t. TIMESTAMP, please suggest any other reviewer. -- To view, visit http://gerrit.cloudera.org:8080/7983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-docs-submit/153/ -- To view, visit http://gerrit.cloudera.org:8080/8031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4670: Introduces RpcMgr class
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4670: Introduces RpcMgr class .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/7901/6/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: PS6, Line 65: FLAGS_hostname Do we want this to be the loopback address since this is only a test? The thrift-server-test listens on the loopback address instead of the externally visible hostname, but I'm fine either way. -- To view, visit http://gerrit.cloudera.org:8080/7901 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8adb10ae375d7bf945394c38a520f12d29cf7b46 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Alex Behm has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 17: (7 comments) http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: Line 40: || size >= NativeByteArrayOutputStream.BUFFER_MAX_SIZE_LIMIT) { Our real allocator would not OOM if the size is big, at least not in this place, so let's remove the second check. Line 60: private void writeAndCheck(NativeByteArrayOutputStream nboas, I'd prefer to split this up into writeOk() and writeNotOk(). Otherwise, tests may go into the "wrong" code path but still succeed, e.g., because the allocation succeeded and updated the bytesWritten correctly, although we expected the allocation to fail. Line 78: NativeByteArrayOutputStream nboas = The first allocation happens here in the c'tor. We need to test that as well. Alternatively, you can change the nbaos implementation to not allocate in the c'tor (either way is fine with me) Line 90: nboas = new NativeByteArrayOutputStream(testAllocator); nit: nbaos Line 108: remove extraneous newines Line 114: testAllocator = new NativeTestAllocator(); Add a helper function for these smaller tests that creates a new allocator, nbaos, array, etc. http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java: Line 81: public void testBasicSerialization() We usually call these TestBasicSerialization (first letter is uppercase) -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 6: (22 comments) http://gerrit.cloudera.org:8080/#/c/8014/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS1, Line 2876: don't think you need that PS1, Line 2878: same here Line 2891: | UNMATCHED_STRING_LITERAL:l expr:e nit: extra spaces (see marked as red). http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 273: KW_UNBOUNDED, KW_UNCACHED, KW_UNION, KW_UPDATE, KW_UPDATE_FN, KW_UPSERT, KW_USE, nit: long line (see vertical separator, that's ok for .test files but for everything else, we try to stay < 90). http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS6, Line 337: rules.add(BoolTestToCompoundRule.INSTANCE); Maybe add a brief comment about this rule as well? PS6, Line 1093: it nit: them PS6, Line 1098: For all conjuncts other than : // the ones listed below, the method should be a no-op. remove. Similar comment above (L1082). http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS2, Line 26: predicate that tests a boolean value using IS. "a boolean test predicate."? I think you can also add the grammar spec here besides the example. PS2, Line 33: it to be executable, i.e., it is illegal to call For literals we usually make them static as well; also, remove "_" from the name. PS2, Line 38: ivate No need for "this". We use "_" in the name to indicate private fields. PS2, Line 41: super(); : this.isNegated_ = You can use the addChild() function here, same thing. PS2, Line 88: Will that work with something like "select 1 is true"? For instance, "select 1 = true" returns true. Type is not boolean but it can be casted into one. PS2, Line 94: new AnalysisException("Ex Similar comment as above. e.g. select 1 is 1; http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS2, Line 29: expressions nit re terminology: "compound predicate". Technically, it is an expression (subclass of Expr) but try to use the more specific term. http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS6, Line 29: to compound expressions "a compound predicate." Line 64: } Preconditions.checkNotNull(result); http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: PS6, Line 578: nit: extra space PS6, Line 580: TestBoolTestPredicate A few more test cases to consider: 1. bool test predicate in the select list 2. nested bool test predicates, e.g. ((bool_col is true) is true) 3. other bool exprs than bool_col, e.g. function returning bool, case expr 4. wrap the bool test in an istrue/isfalse function PS6, Line 586: null Also use "unknown" PS6, Line 591: where 1 is true" This doesn't seem to be consistent with how we handle for instance expr like 1 = true (we allow that). PS6, Line 591: AnalysisError Add a few more negative cases. Example: 1. subquery 2. literals that can't be cast to true/false, e.g. 10 is true http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS6, Line 168: // TODO: figure out how/if parens are printed to reflect expression tree. Did you figure that out? If so, remove the TODO. -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics
Alex Behm has posted comments on this change. Change subject: IMPALA-5250: Unify decompressor output_length semantics .. Patch Set 1: (18 comments) http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG Commit Message: Line 9: This patch makes the semantics output_length parameter in the semantics of the output_length Line 10: Codec::ProcessBlock to be the same. In existing code different Codec::ProcessBlock() to be the same across all codecs. In the existing code different decompressors treat output_length differently. Line 13:to actual decompressed length, but it does not set it to actual to the actual decompressed length Line 16:be exactly the same as actual decompressed length, otherwise as the actual decompressed length Line 19:actual decompressed length and will set it to actual decompressed the actual decompressed length Line 21: This inconsistency leads to a bug where the error message is This inconsistency lead to a bug where Line 24: decompressors can handle oversized output buffer correctly. can handle an oversized output buffer correctly. Line 25: Lz4Decompressor will use the "safe" version of decompression function will use the "safe" instead of the "fast" decompression function Line 26: instead of the "fast" version, for the latter is insecure with corrupted We use LZ4 for compressing exchange data, so we check the perf impact of these changes. I can help you with the details. Line 27: data and requires decompressed length to be known. and requires the decompressed length to be known http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc File be/src/util/decompress-test.cc: Line 153: // Test the behavior when the decompressor is given too little / too much space "." at end of sentence Line 155: // correct output size when the space is enough, and does not write beyond the output the correct output size Line 157: void DecompressOverUnderSizedOutputBuffer(Codec *compressor, Codec *decompressor, Out style is "Codec* compressor" Line 196: } I think we should test more directly the scenario mentioned in the JIRA where a corrupt file can lead to the output_len not being set to 0 properly. http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc File be/src/util/decompress.cc: Line 426: // If size_only is false, output buffer size must be at least output_len. *output_len is Mention what output_len is set to if a non-OK status is returned. Line 427: // also updated with actual output size. the actual output size Line 575: // LZ4_decompress_fast will cause a segmentation fault if passed a nullptr output. Comment still relevant? Line 579: if (ret < 0) { single line -- To view, visit http://gerrit.cloudera.org:8080/8030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Alex Behm has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 1: (13 comments) Thanks for fixing this http://gerrit.cloudera.org:8080/#/c/8036/1//COMMIT_MSG Commit Message: Line 9: This patch keeps test_row_availbility from random failure. In this test Please take a pass to correct wording/grammar. For example, it should be: "This patch keeps test_rows_availability from randomly failing." More such examples below. Line 10: the time interval between 'Rows available' event and the previous event the 'Rows available' timeline event Line 11: in runtime profile is measured in order to make sure that rows become in the runtime profile Line 12: available after a specific amount of time. This is not correct since Instead of 'This' try to rephrase/repeat what 'This' refers to for clarity, e.g. This measurement is not correct ... Line 13: the previous event is that the coordinator finishes sending query to finished sending the query to the backends Line 14: backends, which means the execution on backend might have already on some backends might have already started Line 15: started. This patch tracks another event "ready to start" as the "Read to start" Line 17: query to backends after this event so the time check should always pass. the query to backends http://gerrit.cloudera.org:8080/#/c/8036/1/tests/query_test/test_rows_availability.py File tests/query_test/test_rows_availability.py: Line 81: row_avail_time_ms = self.__parse_time_ms(self.__find_time(line)) rows_avail_time_ms (not just one row is available) Line 92: "'ready to start' event.\nExpected the event to be marked no earlier than "\ 'Ready to start' Line 93: "%sms after the 'ready to start' event.\nQuery: %s"\ 'Ready to start' Line 99: """Find event time point in a line from runtime timeline.""" from the runtime profile timeline Line 100: # Example line: "- Rows available: 3s311ms (2s300ms)" Example should include what this function returns -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3642: Adding backend addresses to error statuses for some scratch failures.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3642: Adding backend addresses to error statuses for some scratch failures.
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. .. IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. Adds GetBackendAddress() (which is host:port) to error messages stemming from SCRATCH_LIMIT_EXCEEDED, SCRATCH_READ_TRUNCATED, and SCRATCH_ALLOCATION_FAILED messages. Testing: * Unit tests assert the string is updated for SCRATCH_LIMIT_EXCEEDED and SCRATCH_ALLOCATION_FAILED. SCRATCH_READ_TRUNCATED doesn't have an existing test, and I didn't add a new one. * Manually testing a query that spills after "chmod 000 /tmp/impala-scratch": $ chmod 000 /tmp/impala-scratch $ impala-shell [dev:21000] > set mem_limit=100m; MEM_LIMIT set to 100m [dev:21000] > select count(*) from tpch_parquet.lineitem join tpch_parquet.orders on l_orderkey = o_orderkey; Query: select count(*) from tpch_parquet.lineitem join tpch_parquet.orders on l_orderkey = o_orderkey Query submitted at: 2017-09-11 11:07:06 (Coordinator: http://dev:25000) Query progress can be monitored at: http://dev:25000/query_plan?query_id=5c48ff8f4103c194:1b40a6c WARNINGS: Could not create files in any configured scratch directories (--scratch_dirs=/tmp/impala-scratch) on backend 'dev:22002'. See logs for previous errors that may have prevented creating or writing scratch files. Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Reviewed-on: http://gerrit.cloudera.org:8080/7816 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M common/thrift/generate_error_codes.py 4 files changed, 13 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch()
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch()
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() .. IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() This change codegen outs a branch in ProcessBuildBatch(). This branch never gets executed for most of the join types except NULL_AWARE_LEFT_ANTI_JOIN. The branch itself is not expensive to execute, but it will reduce codegen time by removing the dead code inside the branch for almost all join modes. Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Reviewed-on: http://gerrit.cloudera.org:8080/7849 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h 4 files changed, 20 insertions(+), 10 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint
Alex Behm has posted comments on this change. Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork
Sailesh Mukil has uploaded a new patch set (#4). Change subject: IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork .. IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork Impala currently kinits by forking off a child process. This has proved to be expensive in many cases since the subprocess tries to reserve as much memory as Impala is currently using which can be quite a lot. This patch adds a flag called 'use_krpc_kinit' that defaults to true. When it's true, it uses the Kudu security library's kinit code that programatically uses the krb5 library to kinit. When it's false, we run our current path which kicks off the kinit-thread and forks off a kinit process periodically to reacquire tickets based on FLAGS_kerberos_reinit_interval. Converted existing tests in thrift-server-test to run with and without kerberos. We now run this BE test with kerberos by using Kudu's MiniKdc utility. This introduces a new dependency on some kerberos binaries that are checked through FindKerberosPrograms.cmake. Note that this is only a test dependency and not a dependency for the impalad binaries and friends. Compilation will still succeed if the kerberos binaries for the MiniKdc are not found, however, the thrift-server-test will fail. Updated bin/bootstap_development.sh to install new sasl-gssapi modules and the kerberos binaries required for the MiniKdc. Testing: Verified with thrift-server-test and also manually on a live kerberized cluster. Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 --- M CMakeLists.txt M be/src/kudu/security/CMakeLists.txt M be/src/kudu/security/init.cc M be/src/kudu/security/test/mini_kdc.cc M be/src/rpc/CMakeLists.txt M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/testutil/gtest-util.h M bin/bootstrap_development.sh A cmake_modules/FindKerberosPrograms.cmake 11 files changed, 208 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7938/4 -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4082: Remove todo item in getRegionsInRange .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1213/ -- To view, visit http://gerrit.cloudera.org:8080/8018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange
Alex Behm has posted comments on this change. Change subject: IMPALA-4082: Remove todo item in getRegionsInRange .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange
Lars Volker has posted comments on this change. Change subject: IMPALA-4082: Remove todo item in getRegionsInRange .. Patch Set 1: Code-Review+1 Thank you for fixing this. -- To view, visit http://gerrit.cloudera.org:8080/8018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. IMPALA-5912: fix crash in trunc(..., "WW") in release build The bug is with the pattern below: const date& d = TruncMonth(orig_date).date(); The problem is that 'd' is a reference into the temporary returned from TruncMonth. But the temporary is only guaranteed to live until the expression has been evaluated. Thus if the compiler re-uses the register or stack slot holding the temporary, 'd' may end up pointing to a bogus value, causing a crash or incorrect results. The fix is to simply create a local date value with the required date, which avoids use of references to expression temporary and makes the logic more obviously correct. Also remove other uses of references to temporary that were correct but unnecessary given that the function returned a value and C++11 return value optimisation should kick in. Testing: Ran expr-test to completion with a release build. Before this fix it reliably crashed for me. Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Reviewed-on: http://gerrit.cloudera.org:8080/8015 Reviewed-by: Thomas Tauber-Marshall Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins --- M be/src/exprs/udf-builtins.cc 1 file changed, 6 insertions(+), 6 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, but someone else must approve Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Tianyi Wang has uploaded a new change for review. http://gerrit.cloudera.org:8080/8036 Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. IMPALA-4987: Fix flaky test test_row_availability.py This patch keeps test_row_availbility from random failure. In this test the time interval between 'Rows available' event and the previous event in runtime profile is measured in order to make sure that rows become available after a specific amount of time. This is not correct since the previous event is that the coordinator finishes sending query to backends, which means the execution on backend might have already started. This patch tracks another event "ready to start" as the beginning of time interval instead. The coordinator begins to send query to backends after this event so the time check should always pass. Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 --- M tests/query_test/test_rows_availability.py 1 file changed, 30 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8036/1 -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has uploaded a new patch set (#7). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we abort a scan range if it encounters an I/O error. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 7 files changed, 84 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/7 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/8011/6/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 456: status = Status(GetHdfsErrorMsg("Error reading from HDFS file: ", file_)); > Missed this one. Done Line 468: status = Status(ss.str()); > Missed this one. Done Line 476: status = Status(GetHdfsErrorMsg("Error reading from HDFS file: ", file_)); > Missed this one. Done http://gerrit.cloudera.org:8080/#/c/8011/6/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 544: return Status(Substitute("Invalid scan range. Bad disk id: $0", disk_id)); > We should treat these as non-recoverable too I think. Should we re-use DISK_IO_ERROR for those, even though they're technically not IO errors? Or should we create a new error code for them? Line 1167: ret_status = Status(ErrorMsg(TErrorCode::RUNTIME_ERROR, > Any reason not to change these too? Done -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] PREVIEW: IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Tim Armstrong has uploaded a new patch set (#4). Change subject: PREVIEW: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. PREVIEW: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion This changes the behaviour of applying an arithmetic operator to constant DECIMAL and non-DECIMAL arguments. In DECIMAL_V1, this caused an implicit conversion to floating point, which caused users a lot of confusion in some cases. In DECIMAL_V2 the typing rules are simplified: constant decimals are treated the same as any other decimals. Testing: Added some expression tests for different arithmetic operators and binary predicates (the two Expr subclasses that call convertNumericLiteralsFromDecimal()). Extended analyzer tests to test DECIMAL_V2 behaviour. Ran core tests. TODO: * add test cases from the JIRA including explicit casts * add tests for scientific notation to address coverage gap Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da --- M be/src/exprs/expr-test.cc M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 3 files changed, 130 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7916/4 -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] PREVIEW: IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Tim Armstrong has uploaded a new patch set (#3). Change subject: PREVIEW: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. PREVIEW: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion This changes the behaviour of applying an arithmetic operator to constant DECIMAL and non-DECIMAL arguments. In DECIMAL_V1, this caused an implicit conversion to floating point, which caused users a lot of confusion in some cases. In DECIMAL_V2 the typing rules are simplified: constant decimals are treated the same as any other decimals. Testing: Added some expression tests for different arithmetic operators and binary predicates (the two Expr subclasses that call convertNumericLiteralsFromDecimal()). Extended analyzer tests to test DECIMAL_V2 behaviour. Ran core tests. TODO: add tests for scientific notation to address coverage gap Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da --- M be/src/exprs/expr-test.cc M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 3 files changed, 130 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7916/3 -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-2636: HS2 GetTables() returns TABLE TYPE as TABLE for VIEW
Alex Behm has posted comments on this change. Change subject: IMPALA-2636: HS2 GetTables() returns TABLE_TYPE as TABLE for VIEW .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7353/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java File fe/src/main/java/org/apache/impala/service/MetadataOp.java: Line 62: private static final String TABLE_TYPE_VIEW = "VIEW"; > Please check if this is consistent with what Hive returns. Does Hive return In any case I think returning "VIEW" is better, but it would be good to document the Impala/Hive discrepancy in a comment somewhere if Hive returns something different -- To view, visit http://gerrit.cloudera.org:8080/7353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I90616388e6181cf342b3de389af940214ed46428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: sandeep akinapelli Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2636: HS2 GetTables() returns TABLE TYPE as TABLE for VIEW
Alex Behm has posted comments on this change. Change subject: IMPALA-2636: HS2 GetTables() returns TABLE_TYPE as TABLE for VIEW .. Patch Set 1: (10 comments) Nice, thanks for working on this one http://gerrit.cloudera.org:8080/#/c/7353/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java File fe/src/main/java/org/apache/impala/service/MetadataOp.java: Line 62: private static final String TABLE_TYPE_VIEW = "VIEW"; Please check if this is consistent with what Hive returns. Does Hive return "VIRTUAL_VIEW"? Line 63: private static final String TABLE_TYPE_INDEX_TABLE = "INDEX_TABLE"; Why not just "INDEX"? (Not that it really matters too much since Impala cannot load these anyway) Line 298: if (table.getMetaStoreTable() != null) { Why this null check? Loaded tables should really have this set. Line 321: if(typeStr == null) { single line and space after "if" Line 324: typeStr = typeStr.toUpperCase(); Is this really needed? If so, please Inline into L326. Line 328: case EXTERNAL_TABLE: indent case by 2 Line 338: } catch (Exception e) { move this up so it's clear the switch block cannot throw Line 495: // Impala catalog only contains TABLE. Returns an empty set if the search does not This should be fixed as well. http://gerrit.cloudera.org:8080/#/c/7353/1/fe/src/test/java/org/apache/impala/service/FrontendTest.java File fe/src/test/java/org/apache/impala/service/FrontendTest.java: Line 221: public void TestGetView() throws ImpalaException { We should test two cases: 1. View that is not loaded is shown as a table 2. View that is loaded is shown as a view For testing the first case you should be able to use addTestDb() and addTestView() For testing the second case, add a dummy AnalyzesOk("select * from functional.alltypes_view") to make sure the view is really loaded at this point - otherwise the test relies on the view already being loaded and might be flaky Line 226: req.get_tables_req.setSchemaName("functional"); No need to test tables here -- To view, visit http://gerrit.cloudera.org:8080/7353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I90616388e6181cf342b3de389af940214ed46428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: sandeep akinapelli Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Vuk Ercegovac has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS2, Line 233: 0, if they have then skip this step and use that data nit: perhaps easier to read? "...version 0. If so, then skip ..." PS2, Line 336: if (topic_deletions) { : VLOG(1) << "Publishing deletion: " << entry_key << "@" : << catalog_object.catalog_version; : } else { : VLOG(1) << "Publishing update: " << entry_key << "@" : << catalog_object.catalog_version; : } perhaps remove some redundancy this way: VLOG(1) << "Publishing " << (topic_deletions ? "deletion" : "update") << ": " << entry_key << "@" << catalog_object.catalog_version; http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS2, Line 1366: if (item.deleted) { : VLOG(3) << "Deleted item: " << item.key << " version: " : << catalog_object.catalog_version; : } else { : VLOG(3) << "Added item: " << item.key << " version: " : << catalog_object.catalog_version; : } same suggestion as in catalog_server to reduce redundancy. http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS2, Line 152: . the word "delta" implies that we're dealing with inserts, deletes, and possibly updates. any reason why this member is named more generally? why not call it deleteLog_? -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8015/1//COMMIT_MSG Commit Message: PS1, Line 23: Also remove other uses of references to temporary that were correct but : unnecessary given that the function returned a value and C++11 : return value optimisation should kick in. > Yeah, i wasn't trying to express an opinion. just noting that it's a common No worries, I didn't take it that way - I was just thinking that some people must like the pattern since it's been repeatedly used. -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/exec-node.h File be/src/exec/exec-node.h: PS1, Line 215: MemTracker* decoder_mem_tracker() { return decoder_mem_tracker_.get(); } The dictionary is only used for Parquet, so I think the MemPool should not be in the ExecNode, as that is used for a large number of other things. Look into moving this down to HdfsParquetScanner (or reuse dictionary_pool_). http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS1, Line 213: dict_decoder_(new MemPool(parent->scan_node_->decoder_mem_tracker())), It is important not to create objects on a per-column basis unless truly necessary. I think it makes more sense to have a single MemPool up at the HdfsParquetScanner level. Look at how dictionary_pool_ works. I think it might be better to reuse that pool. http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: PS1, Line 234: *val_ptr = *dict_[index]; dict_ needs to return T's directly rather than T*'s. This is a performance critical path, and an extra dereference is too expensive. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has uploaded a new patch set (#17). Change subject: IMPALA-5881: Use native allocation while building catalog updates .. IMPALA-5881: Use native allocation while building catalog updates This patch moves the allocation of thrift structures for serializing the output of GetAllCatalogObjects() call into the native side. This is done to prevent the Catalog from hitting JVM array limitations (2GB maximum size) at scale. Additionally this patch also extends the --compact_catalog_top=true to apply TCompactProtocol for the above serialization to reduce the in-memory footprint. This patch also caps the native allocations to not go beyond 4GB due to Thrift library limitations. (Thrift internally uses uint32_t datatype to represent the message size which limits the size to ~4.2GB). Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive. Deployed the patch on a 16 node cluster and tested it on a Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed). Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 --- M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/rpc/jni-thrift-util.h M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/service/JniCatalog.java A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java 12 files changed, 716 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/17 -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5920: Remove admission control dependency on YARN RM jar
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-5920: Remove admission control dependency on YARN RM jar .. IMPALA-5920: Remove admission control dependency on YARN RM jar Impala's admission controller relies on the YARN fair-scheduler.xml for configuration. That configuration is loaded using YARN directly (ie. as a library by the frontend). In Hadoop 3, a number of changes were made to the YARN resourcemanager which break Impala. While we eventually want to rethink the admission control configuration (IMPALA-4159), in the meantime we at least should avoid using unsupported YARN APIs. This patch removes the fe dependency on the YARN artifact 'hadoop-yarn-server-resourcemanager' which contains private APIs and isn't meant to be used as a library. A subset of the required code has been added in 'common/yarn-extras', taken from Hadoop 2.6.0. The goal the 'yarn-extras' is to make Impala's handling of the AC configuration self-sufficient, i.e. it shouldn't matter what version of Hadoop exists. As-is, this was tested and found to work when Hadoop 2.6 is installed. Because the yarn-extras/pom.xml still references hadoop-common, hadoop-yarn-common, and hadoop-yarn-api, additional testing will be required to ensure Impala using yarn-extras works when installed along side Hadoop 3. That testing for Hadoop 3 will be done later. Future changes will make any other changes required for existing code to work when Hadoop 3 is installed. Testing: * Ran existing tests on master. Change-Id: I7efdd8ebea298836ca2a82c0a4ae037ac9285bcf --- M CMakeLists.txt A common/yarn-extras/CMakeLists.txt A common/yarn-extras/README.txt A common/yarn-extras/pom.xml A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/resource/ResourceWeights.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfigurationException.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueType.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java M fe/CMakeLists.txt M fe/pom.xml M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java 17 files changed, 1,804 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8035/2 -- To view, visit http://gerrit.cloudera.org:8080/8035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7efdd8ebea298836ca2a82c0a4ae037ac9285bcf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5920: Remove admission control dependency on YARN resourcemanager
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/8035 Change subject: IMPALA-5920: Remove admission control dependency on YARN resourcemanager .. IMPALA-5920: Remove admission control dependency on YARN resourcemanager Impala's admission controller relies on the YARN fair-scheduler.xml for configuration. That configuration is loaded using YARN directly (ie. as a library by the frontend). In Hadoop 3, a number of changes were made to the YARN resourcemanager which break Impala. While we eventually want to rethink the admission control configuration (IMPALA-4159), in the meantime we at least should avoid using unsupported YARN APIs. This patch removes the fe dependency on the YARN artifact 'hadoop-yarn-server-resourcemanager' which contains private APIs and isn't meant to be used as a library. A subset of the required code has been added in 'common/yarn-extras', taken from Hadoop 2.6.0. The goal the 'yarn-extras' is to make Impala's handling of the AC configuration self-sufficient, i.e. it shouldn't matter what version of Hadoop exists. As-is, this was tested and found to work when Hadoop 2.6 is installed. Because the yarn-extras/pom.xml still references hadoop-common, hadoop-yarn-common, and hadoop-yarn-api, additional testing will be required to ensure Impala using yarn-extras works when installed along side Hadoop 3. That testing for Hadoop 3 will be done later. Future changes will make any other changes required for existing code to work when Hadoop 3 is installed. Testing: * Ran existing tests on master. Change-Id: I7efdd8ebea298836ca2a82c0a4ae037ac9285bcf --- M CMakeLists.txt A common/yarn-extras/CMakeLists.txt A common/yarn-extras/README.txt A common/yarn-extras/pom.xml A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/resource/ResourceWeights.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfigurationException.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueType.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java A common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java M fe/CMakeLists.txt M fe/pom.xml M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java 17 files changed, 1,804 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/8035/1 -- To view, visit http://gerrit.cloudera.org:8080/8035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7efdd8ebea298836ca2a82c0a4ae037ac9285bcf Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3642: Adding backend addresses to error statuses for some scratch failures.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1212/ -- To view, visit http://gerrit.cloudera.org:8080/7816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3642: Adding backend addresses to error statuses for some scratch failures.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3642: Adding backend addresses to error statuses for some scratch failures.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch()
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1211/ -- To view, visit http://gerrit.cloudera.org:8080/7849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch()
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
anujphadke has uploaded a new patch set (#7). Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. IMPALA-4848: Add WIDTH_BUCKET() function Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/Expr.java 5 files changed, 169 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/7 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Pranay Singh has uploaded a new change for review. http://gerrit.cloudera.org:8080/8034 Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder Currently DictDecoder class and DictEncoder class uses std::vector to store the tables mapping codeword to value and vice-versa. It is hard to detect the memory usage by these tables when they becomes very large, since this memory is not accounted by Impala's memory mangement infrastructure. This patch introduces a new MemTracker to track the memory used by std::vector in DictDecoder class, whereas an existing memory pool for DictEncoder class is used when allocating memory for std::vector in DictEncoder class. Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 --- M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/parquet-column-readers.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc 5 files changed, 52 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/1 -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/8011/6/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 456: status = Status(GetHdfsErrorMsg("Error reading from HDFS file: ", file_)); Missed this one. Line 468: status = Status(ss.str()); Missed this one. Line 476: status = Status(GetHdfsErrorMsg("Error reading from HDFS file: ", file_)); Missed this one. http://gerrit.cloudera.org:8080/#/c/8011/6/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 544: return Status(Substitute("Invalid scan range. Bad disk id: $0", disk_id)); We should treat these as non-recoverable too I think. Line 1167: ret_status = Status(ErrorMsg(TErrorCode::RUNTIME_ERROR, Any reason not to change these too? -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 16: (14 comments) http://gerrit.cloudera.org:8080/#/c/7955/16/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: Line 120: // Now that we deserialize the thrift objects into TGetAllCatalogObjectsResponse, > deserialized (past tense) Done http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: Line 84: // flag. > remove this line Done http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: Line 61: " maximum limit of " + BUFFER_MAX_SIZE_LIMIT + " bytes"; > remove first space in string Done Line 99:* - bufferPtr_>=0 > use consistent spacing, i.e. bufferPtr_ >= 0 Done Line 102: Preconditions.checkState(bufferPtr_ >= 0); > remove (documenting in comment is better here) Done http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: Line 28: // Custom NativeAllocator that accounts the allocated and freed bytes and randomly > the allocated/freed bytes and simulates allocation failures Done Line 39: // Throws an OOM on every alternate call to reallocate(). > on every 2nd call Done Line 51: allocatedBytes_ = 0; > single line Done Line 87: while (testAllocator_.getAllocationFailures() < 10) { > I still don't think this is a great approach because it does not reflect re Makes sense. Deterministic tests are always better and the test combinations are small too. Line 91: writeAndCheck(b, 1, 0); > These should be tested from a fresh Nbaos and not from the one that is alre yea I added a new nboas for all these set of tests. http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java: Line 58: // Deserialize the object at result.buffer_ptr and we confirm it is the same > remove "we" Done Line 119: Assert.assertTrue(nativeSerializer.getSerializedBytesSize() >= 3584 * 1024 * 1024); > Isn't there an assertGe() or something like that? Also print a message with Yea, I didn't find it. Line 137: e.printStackTrace(); > remove Done Line 166: testBasicSerialization(factory); > Let's make these separate top-level tests, if's fine to repeat the loop ove Done -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Alex Behm has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (12 comments) Change is looking pretty good to me, only minor comments left. http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 305: BuildTopicUpdatesHelper(catalog_objects.updated_objects, false); I like the previous version more where this is inlined into L292. A quick comment about the requirements/structure of a topic update would be nice, e.g. stating that the set of deletions and updates are disjoint and therefore the order in which updates/deletions are added is irrelevant. Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; Move this above L312? http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1340: LOG(INFO) << "Received large catalog update(>100mb): " Received a large catalog item? http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 86: 3: required bool deleted = false; Isn't it better to not have a default value to make sure this is always set? http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: * determine which catalog objects were deleted since the last catalog update topic. catalog topic update Line 57: * Once the catalog update topic is constructed, the old deleted catalog objects are catalog topic update Line 74: // Retrieve all the removed catalog objects with version > 'fromVersion'. /** style comment http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 347: // Identify the catalog objects that were updated (added/dropped) in the catalog with /** style comment Line 351: Set addedCatalogObjectKeys = Sets.newHashSet(); updatedCatalogObjects? Line 363: TCatalogObject catalogTbl = new TCatalogObject(TCatalogObjectType.TABLE, Why not just iterate over db.getTables()? Line 441: addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege)); Instead of having a duplicate add() in all places, how about creating the addedCatalogObjets set right before L448 by looping over the addedCatalogObjects? Seems only marginally less efficient, but less prone to forgetting an add() somewhere. Line 448: for (TCatalogObject removedObject: deltaLog_.retrieveObjects(fromVersion)) { Nice! Much clearer. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG Commit Message: Line 75: DECIMAL_V2 disabled: 4.25s > Running perf top showed that most of the time is spent in ScaleDownAndRound Yeah that seems likely - the faster everything else becomes the more this would be a bottleneck. I don't see an obvious way to speed this up (except maybe avoiding the redundant divide/mod calls?) and I guess it only makes a difference if a lot of expressions are overflowing. Maybe someone with deeper knowledge of decimal might see a way to improve it. http://gerrit.cloudera.org:8080/#/c/7438/4/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 204: if (round) { It's unfortunate that we're not computing the quotient and remainder at the same time. It looks like boost eventually calls divide_unsigned_helper in multiprecision/cpp_int/divide.hpp with the same arguments in both cases but throws away either the result or remainder. If avoiding two divide calls would help performance we could consider calling divide_unsigned_helper() directly. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch()
anujphadke has posted comments on this change. Change subject: IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7849/1/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: Line 280: /// Free local allocations made from expr evaluators during hash table construction. > Can you add a short description of the parameter? Done PS1, Line 477: > nit: parameter name doesn't match other functions Done -- To view, visit http://gerrit.cloudera.org:8080/7849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch()
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() .. IMPALA-3897 Codegen null-aware constant in PHJ::ProcessBuildBatch() This change codegen outs a branch in ProcessBuildBatch(). This branch never gets executed for most of the join types except NULL_AWARE_LEFT_ANTI_JOIN. The branch itself is not expensive to execute, but it will reduce codegen time by removing the dead code inside the branch for almost all join modes. Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h 4 files changed, 20 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7849/2 -- To view, visit http://gerrit.cloudera.org:8080/7849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06acbebc9d2d23bef4734b480a5d3ce41680ea70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
anujphadke has posted comments on this change. Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec .. Patch Set 7: I tested this locally on my machine with a cluster size = 1. I was able to recreate this locally on starting the impala minicluster with size=3. I have a fix for this which I am testing right now. Spoke to Tim about it. Sorry forgot to update it here. -- To view, visit http://gerrit.cloudera.org:8080/7245 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Dan Hecht has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8015/1//COMMIT_MSG Commit Message: PS1, Line 23: Also remove other uses of references to temporary that were correct but : unnecessary given that the function returned a value and C++11 : return value optimisation should kick in. > I guess there's an element of taste to it. It definitely gets confusing. Yeah, i wasn't trying to express an opinion. just noting that it's a common pattern. -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec
Dan Hecht has posted comments on this change. Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec .. Patch Set 7: Ping? -- To view, visit http://gerrit.cloudera.org:8080/7245 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Yeah I think there are some fundamental problems with the current parallel startup logic - I don't think it's actually possible to make all cases work with the current approach where all state is receiver-side and there's no additional communication (point-to-point or via the coordinator) to disambiguate hangs and delays. IMPALA-3990 -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: Line 129: /// In exceptional circumstances, the data stream manager will garbage-collect the closed There's a pre-existing flaw in the reasoning here that we should call out. "Exceptional circumstances" is vague and I think hides a distinction between an unhealthy cluster with extreme delays and the expected behaviour of certain long-running queries. I think the problem is an invalid assumption that the the receiver sends batches on a regular cadence with a bounded delay before the first batch is sent and when each subsequent batch is sent. That assumption is incorrect. I think we should call it out in this comment so that readers understand the current flaw. Here's an example where it's wrong. Consider a plan with three fragments. F1 (long-running) | V F2 (limit = 1 on exchange) | V F3 (long-running selective scan) 1. The fragments all start up. 2. Instance 1 of F3 immediately finds and returns a matching row, which is sent to F2. 3. This causes F2 to hit its limit, close its exchange and tear itself down. 4. Let's assume F1 also has a lot of work to do and won't finish for 20 minutes 5. Instance 2 of F3 is still churning away on the scan. After 10 minutes it finally find a matching row. 6. F3 tries to send the row, can't find the receiver after a timeout and returns an error to the coordinator 7. The coordinator cancels the query and returns an error There are two problems here: 1. The query failed when it shouldn't have 2. F3 wasn't cancelled when it was no longer needed and used lots of resources unnecessarily. The JIRA is IMPALA-3990. I believe that the main reason we haven't seen this in practice is that it can only occur when there's a limit without order in a subquery. Most queries with that property are non-deterministic and it doesn't really make a lot of sense to have a long-running query that returns non-deterministic results. But this actually blocked me from implementing early-close for joins with empty build sides, which is a nice optimisations. There may also be a slightly different invalid assumption that the time between the receiver closing the exchange and the sender sending its last batch is bounded. That seems possible to solve with sender-side state if the receiver notifies the sender that the receiver was not present and the sender can infer it was closed cleanly. -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Benchmarks: In the following benchmarks, we are selecting from lineitem_big, which has about 100 times as many rows as the normal lineitem. Query: select sum(l_quantity * l_tax) + sum(l_extendedprice * l_discount) from lineitem_big; Before: DECIMAL_V2 disabled: 2.24s DECIMAL_V2 enabled : 2.14 After: DECIMAL_V2 disabled: 2.66s DECIMAL_V2 enabled : 2.25s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 0))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 0))) from lineitem_big Before: DECIMAL_V2 disabled: 2.34s DECIMAL_V2 enabled : 2.36s After: DECIMAL_V2 disabled: 4.25s DECIMAL_V2 enabled : 4.15s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37))) from lineitem_big Before: DECIMAL_V2 disabled: 2.84s DECIMAL_V2 enabled : 8.26s After: DECIMAL_V2 disabled: 69.16s DECIMAL_V2 enabled : 66.13s Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 282 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/5 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4978: Impala should set the kerberos principal to the FQDN
Dan Hecht has posted comments on this change. Change subject: IMPALA-4978: Impala should set the kerberos principal to the FQDN .. Patch Set 1: Do we want to move forward with this? -- To view, visit http://gerrit.cloudera.org:8080/6260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1c94f36cb3493afdfcf84f8b31b9897404bd095f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption
Dan Hecht has posted comments on this change. Change subject: IMPALA-5667: Race in DataStremSender could cause TransmitData sidecar corruption .. Patch Set 1: Is this something we want to move forward? -- To view, visit http://gerrit.cloudera.org:8080/7436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7742551236205c49393a7351291eae6aefb4fa09 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4939: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7438/4//COMMIT_MSG Commit Message: Line 7: IMPALA-4939, IMPALA-4939: Decimal V2 multiplication > The same JIRA is listed twice. Done http://gerrit.cloudera.org:8080/#/c/7438/1/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 301: // The following is frought with apparent difficulty, as there is only 1 bit > It seems most efficient development-wise if we give this a try now while we Implemented checking the leading zeros optimization. It doesn't seem to have a big impact on the performance of the queries I tried. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute an int256 intermediate value, which we then attempt to scale down and round to int128. Benchmarks: In the following benchmarks, we are selecting from lineitem_big, which has about 100 times as many rows as the normal lineitem. Query: select sum(l_quantity * l_tax) + sum(l_extendedprice * l_discount) from lineitem_big; Before: DECIMAL_V2 disabled: 2.24s DECIMAL_V2 enabled : 2.14 After: DECIMAL_V2 disabled: 2.66s DECIMAL_V2 enabled : 2.25s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 0))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 0))) from lineitem_big Before: DECIMAL_V2 disabled: 2.34s DECIMAL_V2 enabled : 2.36s After: DECIMAL_V2 disabled: 4.25s DECIMAL_V2 enabled : 4.15s Query: select sum(l_quantity * l_tax * cast(1 as decimal(38, 37))) + sum(l_extendedprice * l_discount * cast(1 as decimal(38, 37))) from lineitem_big Before: DECIMAL_V2 disabled: 2.84s DECIMAL_V2 enabled : 8.26s After: DECIMAL_V2 disabled: 69.16s DECIMAL_V2 enabled : 66.13s Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 282 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/5 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8015/1//COMMIT_MSG Commit Message: PS1, Line 23: Also remove other uses of references to temporary that were correct but : unnecessary given that the function returned a value and C++11 : return value optimisation should kick in. > okay with me (if RVO is applying so no perf impact), but the const-ref-to-t I guess there's an element of taste to it. It definitely gets confusing. -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Dan Hecht has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Code-Review+1 Making these cases consistent makes sense to me. Really, I think we need to eventually rethink the whole parallel startup stuff, especially now that we have per-query execrpc, etc. It just seems too fragile. But in the mean time, I see no reason the eos and non-eos case inconsistent. -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: PS5, Line 177: if (status.IsCancelled() || status.IsMemLimitExceeded()) return status; : : // Log error from file format parsing. : state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, : stream_->filename(), stream_->file_offset(), : (stream_->eof() ? "(EOF)" : ""))); : : // Make sure errors specified in the status are logged as well : state_->LogError(status.msg()); : : // If abort on error then return, otherwise try to recover. : if (state_->abort_on_error()) return status; : : // Abort scan range for I/O related errors : if (status.code() == TErrorCode::SCANNER_CONTEXT_WRAPPED_IO_ERROR) { : eos_ = true; : return Status::OK(); : } > it does seem like this could be written in terms of RuntimeState::LogOrRetu It looks to me as if the query will be aborted if return a non-OK status here. ProcessSplit() will return that status and it will be handled in hdfs-scan-node.cc:441. This will call SetDoneInternal() and the scanners will stop. Setting eos_ here will make stop ProcessSplit() but make it return Status::OK(). if (!status.ok()) { unique_lock l(lock_); // If there was already an error, the main thread will do the cleanup if (!status_.ok()) break; if (status.IsCancelled() && done_) { // Scan node initiated scanner thread cancellation. No need to do anything. break; } // Set status_ before calling SetDone() (which shuts down the RowBatchQueue), // to ensure that GetNextInternal() notices the error status. status_ = status; SetDoneInternal(); break; } That being said, I changed the code to abort the query. http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS5, Line 194: Internal " > would be best to keep "Internal error" on the same line so that it's more e Done Line 284: io_buffer_bytes_left_ = 0; > Maybe add a short comment explaining why this is necessary? Done, thx for the suggestion. Line 371: Status wrapped_status(TErrorCode::SCANNER_CONTEXT_WRAPPED_IO_ERROR, status.msg().msg()); > Maybe reference IMPALA-4697 for context on why we're doing this instead of Removed the code altogether. http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: Line 277: Status WrapIoError(const Status& status); > Is there a simple thing we can start with for this patch (rather than intro Done -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1210/ -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has uploaded a new patch set (#6). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we abort a scan range if it encounters an I/O error. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 7 files changed, 66 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/6 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5597: Check predicate children types when building runtime filter plan
Tianyi Wang has abandoned this change. Change subject: IMPALA-5597: Check predicate children types when building runtime filter plan .. Abandoned Sorry. Not ready. -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Dan Hecht has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8015/1//COMMIT_MSG Commit Message: PS1, Line 23: Also remove other uses of references to temporary that were correct but : unnecessary given that the function returned a value and C++11 : return value optimisation should kick in. okay with me (if RVO is applying so no perf impact), but the const-ref-to-temporary is a pattern we use broadly especially for TimestampValue (for better or worse). -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5597: Check predicate children types when building runtime filter plan
Tianyi Wang has restored this change. Change subject: IMPALA-5597: Check predicate children types when building runtime filter plan .. Restored Reopen since a new fix is implemented. -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: restore Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has uploaded a new patch set (#5). Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. IMPALA-1767 Adds predicate to test boolean values true, false, unknown. Adds a new expression to represent the following boolean predicate: IS [NOT] (TRUE | FALSE | UNKNOWN) The expression is rewritten into compound expressions using negation, is null, and equality. As a result, there are no BE operators to eval this predicate. Testing: - Frontend: parser, analyzer, rewriter, tosql - EndToEnd query expressions Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java A fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/exprs.test 9 files changed, 337 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8014/5 -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 6: (12 comments) http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 2933: {: RESULT = p; :} > It's a little weird that "IS NULL" is handled this way and "IS TRUE" is han they're separate predicates in the standard (is null vs. is boolean)so I've kept them separate in the grammar as well. one option to make things more uniform here is to put the not handling for is null in a separate rule. open to other suggestions as well to make this clearer. for null vs unknown, I've kept them separate since the rule for boolean places more restrictions on type of the lhs. afaik, unknown expects that the lhs type is null or boolean but when checking null, the lhs can be any type or null. for example, if you test a boolean column with unknown or null in postgres, it works. switch the column's type to int, and is null type checks but is unknown does not. http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1583: > whitespace Done PS3, Line 3034: > whitespace, here and below Done http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS3, Line 26: Class describing a predicate that tests a boolean value using IS. : * The form of a bool test > You should explain more thoroughly what this expr actually does, eg. the ha more detailed comment was in the rewrite.. makes more sense here so moved it. PS3, Line 28: s a bool or null and it returns a bool : * (much like IS [NO > weird line wrapping here Done PS3, Line 33: l > We don't use trailing underscores on constant values. Done PS3, Line 94: thro > Preconditions.checkState Done http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: Line 29: * Rewrites a bool test predicate to compound expressions. > Please consider adding the rewrite rule (i.e., line 50 and 47) up here to t Done http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS3, Line 39: : public class BoolT > single line Done PS3, Line 64: } : > single line Done http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS3, Line 164: > whitespace Done http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 1257: // BoolTestPredicate. > modify one of these tests to use "NOT" added a test using not, but I'm not sure if I've captured what you're after. -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has uploaded a new patch set (#6). Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. IMPALA-1767 Adds predicate to test boolean values true, false, unknown. Adds a new expression to represent the following boolean predicate: IS [NOT] (TRUE | FALSE | UNKNOWN) The expression is rewritten into compound expressions using negation, is null, and equality. As a result, there are no BE operators to eval this predicate. Testing: - Frontend: parser, analyzer, rewriter, tosql - EndToEnd query expressions Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java A fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/exprs.test 9 files changed, 337 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8014/6 -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has uploaded a new patch set (#4). Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. IMPALA-1767 Adds predicate to test boolean values true, false, unknown. Adds a new expression to represent the following boolean predicate: IS [NOT] (TRUE | FALSE | UNKNOWN) The expression is rewritten into compound expressions using negation, is null, and equality. As a result, there are no BE operators to eval this predicate. Testing: - Frontend: parser, analyzer, rewriter, tosql - EndToEnd query expressions Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java A fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/exprs.test 9 files changed, 336 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8014/4 -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1583: whitespace PS3, Line 3034: whitespace, here and below http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS3, Line 26: Class describing a predicate that tests a boolean values using IS. : * Example: (1 < 1) IS TRUE You should explain more thoroughly what this expr actually does, eg. the handling of NULLs. PS3, Line 28: a : * CompoundPredicate weird line wrapping here PS3, Line 33: _ We don't use trailing underscores on constant values. PS3, Line 94: assert Preconditions.checkState http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS3, Line 39: if (!(expr instanceof BoolTestPredicate)) : return expr; single line PS3, Line 64: private BoolTestToCompoundRule() { : } single line http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS3, Line 164: whitespace http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 1257: // BoolTestPredicate. modify one of these tests to use "NOT" -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Philip Zeyliger has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 2933: {: RESULT = p; :} It's a little weird that "IS NULL" is handled this way and "IS TRUE" is handled in line 3034. (And even more weird that IS NULL and IS UNKOWN are handled differently.) http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: Line 29: * Rewrites a bool test predicate to compound expressions. The form of a bool Please consider adding the rewrite rule (i.e., line 50 and 47) up here to the documentation. -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 273: KW_UPDATE, > Treating "unknown" as a keyword conflicts with kudu specification: encoding Backed out the approach where "unknown" is a keyword. That lets us not require that "unknown" be reserved, and does not require any changes for the existing kudu spec. -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has uploaded a new patch set (#3). Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. IMPALA-1767 Adds predicate to test boolean values true, false, unknown. Adds a new expression to represent the following boolean predicate: IS [NOT] (TRUE | FALSE | UNKNOWN) The expression is rewritten into compound expressions using negation, is null, and equality. As a result, there are no BE operators to eval this predicate. Testing: - Frontend: parser, analyzer, rewriter, tosql - EndToEnd query expressions Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java A fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/exprs.test 9 files changed, 330 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8014/3 -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/8031 Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint .. IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint "immediately after the SELECT keyword" was mentioned in a few places for STRAIGHT_JOIN. I reworded all instances to mention that [DISTINCT | ALL] can also come before the hint name. Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a --- M docs/shared/impala_common.xml M docs/topics/impala_hints.xml M docs/topics/impala_perf_joins.xml 3 files changed, 12 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/8031/1 -- To view, visit http://gerrit.cloudera.org:8080/8031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Alex Behm has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 16: (14 comments) http://gerrit.cloudera.org:8080/#/c/7955/16/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: Line 120: // Now that we deserialize the thrift objects into TGetAllCatalogObjectsResponse, deserialized (past tense) http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: Line 84: // flag. remove this line http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: Line 61: " maximum limit of " + BUFFER_MAX_SIZE_LIMIT + " bytes"; remove first space in string Line 99:* - bufferPtr_>=0 use consistent spacing, i.e. bufferPtr_ >= 0 Line 102: Preconditions.checkState(bufferPtr_ >= 0); remove (documenting in comment is better here) http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: Line 28: // Custom NativeAllocator that accounts the allocated and freed bytes and randomly the allocated/freed bytes and simulates allocation failures Line 39: // Throws an OOM on every alternate call to reallocate(). on every 2nd call Line 51: allocatedBytes_ = 0; single line Line 87: while (testAllocator_.getAllocationFailures() < 10) { I still don't think this is a great approach because it does not reflect real-world usage of the NBAOS. I think it's worth exhaustively testing all states (there are not that many). You could do something like this instead: // Test first allocation failure testAllocator_.setFailRealloc(true); try { Nbaos n = new Nbaos(testAllocator_); } catch () { // Fill in } // Test allocating failure from all possible states, allocating 64M at a time. int reallocs = 1; byte[] b = new byte[64 * 1024 * 1024 * 1024]; while (true) { TestAllocator alloc = new TestAllocator(); Nbaos n = new Nbaos(alloc); while (alloc.getReallocs() == reallocs) { n.write(b, 0, len); } alloc.setFailRealloc(true); writeAndCheck(); reallocs = alloc.getReallocs(); // If nbaos is at max capacity break out of loop } Line 91: writeAndCheck(b, 1, 0); These should be tested from a fresh Nbaos and not from the one that is already 'polluted' from previous tests. http://gerrit.cloudera.org:8080/#/c/7955/16/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java: Line 58: // Deserialize the object at result.buffer_ptr and we confirm it is the same remove "we" Line 119: Assert.assertTrue(nativeSerializer.getSerializedBytesSize() >= 3584 * 1024 * 1024); Isn't there an assertGe() or something like that? Also print a message with the actual size for debugging Line 137: e.printStackTrace(); remove Line 166: testBasicSerialization(factory); Let's make these separate top-level tests, if's fine to repeat the loop over PROTOCOLS_TO_TEST. -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[native-toolchain-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has posted comments on this change. Change subject: Bump Kudu version to 3f49724 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has submitted this change and it was merged. Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matthew Jacobs: Looks good to me, approved Thomas Tauber-Marshall: Verified -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3642: Adding backend addresses to error statuses for some scratch failures.
Philip Zeyliger has uploaded a new patch set (#2). Change subject: IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. .. IMPALA-3642: Adding backend addresses to error statuses for some scratch failures. Adds GetBackendAddress() (which is host:port) to error messages stemming from SCRATCH_LIMIT_EXCEEDED, SCRATCH_READ_TRUNCATED, and SCRATCH_ALLOCATION_FAILED messages. Testing: * Unit tests assert the string is updated for SCRATCH_LIMIT_EXCEEDED and SCRATCH_ALLOCATION_FAILED. SCRATCH_READ_TRUNCATED doesn't have an existing test, and I didn't add a new one. * Manually testing a query that spills after "chmod 000 /tmp/impala-scratch": $ chmod 000 /tmp/impala-scratch $ impala-shell [dev:21000] > set mem_limit=100m; MEM_LIMIT set to 100m [dev:21000] > select count(*) from tpch_parquet.lineitem join tpch_parquet.orders on l_orderkey = o_orderkey; Query: select count(*) from tpch_parquet.lineitem join tpch_parquet.orders on l_orderkey = o_orderkey Query submitted at: 2017-09-11 11:07:06 (Coordinator: http://dev:25000) Query progress can be monitored at: http://dev:25000/query_plan?query_id=5c48ff8f4103c194:1b40a6c WARNINGS: Could not create files in any configured scratch directories (--scratch_dirs=/tmp/impala-scratch) on backend 'dev:22002'. See logs for previous errors that may have prevented creating or writing scratch files. Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied Opening '/tmp/impala-scratch/5c48ff8f4103c194:1b40a6c_08e8d63b-169d-4571-a0fe-c48fa08d73e6' for write failed with errno=13 description=Error(13): Permission denied Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 --- M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M common/thrift/generate_error_codes.py 4 files changed, 13 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7816/2 -- To view, visit http://gerrit.cloudera.org:8080/7816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If31a50fdf6031312d0348d48aeb8f9688274cac2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 5: > Yeah we could probably add an error code like DISK_IO_MGR_ERROR and > use it for the different I/O errors emanating from the DiskIoMgr > code. I'll try that out and will push a PS shortly. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5211: Simplifying ifnull conditional.
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7829 to look at the new patch set (#6). Change subject: IMPALA-5211: Simplifying ifnull conditional. .. IMPALA-5211: Simplifying ifnull conditional. Re-writes ifnull(x, y) into if(x IS DISTINCT FROM y, x, NULL), and simplifies x IS DISTINCT FROM y -> FALSE if x.equals(y). Removes backend implementation of ifnull. It's important and subtle that the re-write uses "x IS DISTINCT FROM y" rather than "x != y" so that the simplification can be made while handling null values correctly. ("x != x" may be either false or null, but x is distinct from x is always false.) The error messages are slightly altered by this change: Before: Query: select nullif("a", 1) ERROR: AnalysisException: No matching function with signature: nullif(STRING, TINYINT). Query: select nullif(1, 2, 3) ERROR: AnalysisException: No matching function with signature: nullif(TINYINT, TINYINT, TINYINT). After: ERROR: AnalysisException: operands of type STRING and TINYINT are not comparable: 'a' IS DISTINCT FROM 1 ERROR: AnalysisException: nullif() must be called with two arguments. This kind of error message is consistent with what you get for other re-writes. Testing: * Added new tests to ExprRewriteRulesTests for nullif and the if(x distinct from y, ...) simplification. * New test for the rewrite in ParserTest. * Adds an nvl2() test, incidentally. * Confirmed (using EclEmma, which uses jococo engine) that coverage is good. * Ran the tests. Change-Id: Id91ca968a0c0be44e1ec54ad8602f91a5cb2e0e5 --- M be/src/exprs/conditional-functions-ir.cc M be/src/exprs/conditional-functions.cc M be/src/exprs/conditional-functions.h M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java A fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java 12 files changed, 195 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7829/6 -- To view, visit http://gerrit.cloudera.org:8080/7829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id91ca968a0c0be44e1ec54ad8602f91a5cb2e0e5 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 5: Yeah we could probably add an error code like DISK_IO_MGR_ERROR and use it for the different I/O errors emanating from the DiskIoMgr code. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: Line 277: Status WrapIoError(const Status& status); > I agree, Lars filed IMPALA-5915 for this. I think there were some questions Is there a simple thing we can start with for this patch (rather than introduce the wrapping) and then refine it later as needed? Like use a single error code? -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: Line 284: io_buffer_bytes_left_ = 0; Maybe add a short comment explaining why this is necessary? // Make state consistent in case we return early with an error below. Line 371: Status wrapped_status(TErrorCode::SCANNER_CONTEXT_WRAPPED_IO_ERROR, status.msg().msg()); Maybe reference IMPALA-4697 for context on why we're doing this instead of MergeStatus(). http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: Line 277: Status WrapIoError(const Status& status); > rather than wrapping, is it not possible to make our IO code return a singl I agree, Lars filed IMPALA-5915 for this. I think there were some questions about how to design it - e.g. do we have a hierarchy of error codes, or some other way of categorising them to avoid having to enumerate all the possible errors in the error handling code. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: PS5, Line 177: if (status.IsCancelled() || status.IsMemLimitExceeded()) return status; : : // Log error from file format parsing. : state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, : stream_->filename(), stream_->file_offset(), : (stream_->eof() ? "(EOF)" : ""))); : : // Make sure errors specified in the status are logged as well : state_->LogError(status.msg()); : : // If abort on error then return, otherwise try to recover. : if (state_->abort_on_error()) return status; : : // Abort scan range for I/O related errors : if (status.code() == TErrorCode::SCANNER_CONTEXT_WRAPPED_IO_ERROR) { : eos_ = true; : return Status::OK(); : } it does seem like this could be written in terms of RuntimeState::LogOrReturnErrror(), right? I guess the one complication is that we're trying not to fail entirely, but instead mark eos_ in that case. But are we sure that returning error from GetNextInternal() won't do the same thing? I thought the scanners have a check higher up the callstack that would go ahead to the next scan range if abort_on_error==false and GetNext() returns (recoverable) error, but I could be mistaken. http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS5, Line 194: Internal " would be best to keep "Internal error" on the same line so that it's more easily grep-able. http://gerrit.cloudera.org:8080/#/c/8011/5/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: Line 277: Status WrapIoError(const Status& status); rather than wrapping, is it not possible to make our IO code return a single (or some set of) unrecoverable IO error codes directly, that we can then identify as IO errors? -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics
Tianyi Wang has uploaded a new change for review. http://gerrit.cloudera.org:8080/8030 Change subject: IMPALA-5250: Unify decompressor output_length semantics .. IMPALA-5250: Unify decompressor output_length semantics This patch makes the semantics output_length parameter in Codec::ProcessBlock to be the same. In existing code different decompressor treats output_length differently: 1. SnappyDecompressor needs output_length to be greater than or equal to actual decompressed length, but it does not set it to actual decompressed length after decompression. 2. SnappyBlockDecompressor and Lz4Decompressor require output_length to be exactly the same as actual decompressed length, otherwise decompression fails. 3. Other decompressors need output_length to be greater than or equal to actual decompressed length and will set it to actual decompressed length if oversized. This inconsistency leads to a bug where the error message is undeterministic when the compressed block is corrupted. This patch makes all decompressor behave as 3 and adds a testcase checking that decompressors can handle oversized output buffer correctly. Lz4Decompressor will use the "safe" version of decompression function instead of the "fast" version, for the latter is insecure with corrupted data and requires decompressed length to be known. Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 --- M be/src/util/decompress-test.cc M be/src/util/decompress.cc 2 files changed, 31 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8030/1 -- To view, visit http://gerrit.cloudera.org:8080/8030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 273: KW_UNKNOWN Treating "unknown" as a keyword conflicts with kudu specification: encoding and compression may both specify "unknown". I can work-around this, but in general, adding "unknown" as a keyword may break existing queries. Any precedence on adding keywords (I saw the handling of "default", for example)? -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/8029 Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), including skipping consideration of FilterContext's that don't have a local_bloom_filter, and eliminates the branch on type that happens in RawValue::GetHashValue(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 10 files changed, 232 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/1 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[native-toolchain-CR] Bump Kudu version to 3f49724
Matthew Jacobs has posted comments on this change. Change subject: Bump Kudu version to 3f49724 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/8028 Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/28/8028/1 -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall