[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Alex Behm has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 331: status = SerializeAndCompressCatalogObject(catalog_object, item); Thinking we should have a similar separation of serialization and compression like we do for the decompression. Line 366: compressed_buffer.resize(result_len); Remove since we're copying into item.value anyway Line 367: item.value = string(reinterpret_cast(compressed_buffer_ptr), use item.value.assign() http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 557: std::vector& data_buffer) WARN_UNUSED_RESULT; we use pointers for out params, i.e. use std::vector* http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/compress.h File be/src/util/compress.h: Line 113 revert http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/decompress.h File be/src/util/decompress.h: Line 86: revert http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py File tests/custom_cluster/test_compact_catalog_updates.py: Line 45: assert result.data[0] == "7300" > Added the result check. Out of curiosity, what's so special about the parqu Nothing special about Parquet, just another table. -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5554: sorter DCHECK on null column .. IMPALA-5554: sorter DCHECK on null column The bug was in the DCHECK. The DCHECK is intended to make sure that a tuple's string data didn't get split across blocks. The logic assumed that if the second-or-later string column was in the next-block, that the strings were split between blocks. However, that assumption is invalid if there are NULL strings, which do not belong in any block. The fix for the DCHECK (which is still useful) is to count the number of non-NULL strings and make sure that no non-NULL strings were split between blocks. Testing: Added a test that reproduces the crash. Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Reviewed-on: http://gerrit.cloudera.org:8080/7295 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/runtime/sorter.cc M testdata/workloads/functional-query/queries/QueryTest/single-node-large-sorts.test 2 files changed, 39 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 330: if (!status.ok()) { > take out of if/else since it's common to both cases Done Line 367: uint8_t* compressed_buffer_ptr = &compressed_buffer[0]; > comptessed_buffer.data() Done Line 370: // Base64 encode the compressed catalog object and store it in the topic item. > Is base64-encoding really necessary? Why don't the raw bytes work? Hm, not really. It works without it, I only used it because the topic items use strings for keys and values. Removed it. http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: Line 200: /// Serializes 'catalog_object' and compresses it using Snappy compression. > remove second "compression" Done http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1417: Status ImpalaServer::DecompressAndDeserializeCatalogObject(const TTopicItem& item, > Why not separate decompression from deserialization? Seem like separate con Done http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/util/compress.h File be/src/util/compress.h: Line 103: SnappyCompressor(MemPool* mem_pool = NULL, bool reuse_buffer = false); > Why these changes? Shouldn't these always be created using CreateCompressor oops forgot about these. I was playing around with directly creating the compressor. reverted. http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py File tests/custom_cluster/test_compact_catalog_updates.py: Line 45: "select count(*) from functional.alltypes") > Check result? Maybe also use another query on a parquet table. Added the result check. Out of curiosity, what's so special about the parquet table in this case? -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-5500: Reduce catalog update topic size .. IMPALA-5500: Reduce catalog update topic size Problem: IMPALA-4029 introduced the use of the flatbuffers serialization libary for storing file and block metadata. That change reduced the effectiveness of the Thrift compaction protocol (when --compact_catalog_topic is used), thereby causing a 2X increase in catalog update topic size when the compact protocol is used. Fix: Snappy compress the catalog topic updates before sent to the statestore when --compact_catalog_topic is set to true. Results: ~4X reduction in catalog update topic size Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/compress.h M be/src/util/decompress.h A tests/custom_cluster/test_compact_catalog_updates.py 7 files changed, 141 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/3 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/793/ -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7299 to look at the new patch set (#2). Change subject: IMPALA-5571: Fix authorization tests .. IMPALA-5571: Fix authorization tests Issue: With the fix for IMPALA-5549 the catalog cache of an impalad may end up with stale authorization metadata in the presence of GRANT/REVOKE statements that modify multiple privileges. Fix: Revert the behavior of GRANT/REVOKE privilege DDL statements. Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 1 file changed, 7 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7299/2 -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 2: Code-Review+2 Rebase. Keep Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. IMPALA-5431: Remove redundant path exists checks during table load There are multiple places that do an exists() check on a path and then perform some subsequent action on it. This pattern results in two RPCs to the NN (one for the exists() check and one for the subsequent action). We can avoid the exists() check in these cases since most HDFS methods on paths throw a FileNotFoundException if the path does not exist. This can save an RPC to NN and improve the metadata loading time. Testing: Enough tests already cover this code path. This patch passed core and exhaustive tests. Metadata benchmark shows decent increase in perf numbers, for ex: 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV -20.51% 80-PARTITIONS-250K-FILES-S3-03-RECOVER -20.58% 80-PARTITIONS-250K-FILES-11-DROP-PARTITION -22.13% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION -22.38% 80-PARTITIONS-250K-FILES-S3-12-DROP -23.69% 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION -23.91% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTITION -26.04% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH -26.38% 80-PARTITIONS-250K-FILES-S3-02-CREATE -36.47% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS -58.72% 80-PARTITIONS-250K-FILES-S3-01-DROP -95.33% 80-PARTITIONS-250K-FILES-01-DROP -95.93% Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Reviewed-on: http://gerrit.cloudera.org:8080/7095 Reviewed-by: Bharath Vissapragada Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 2 files changed, 64 insertions(+), 39 deletions(-) Approvals: Impala Public Jenkins: Verified Bharath Vissapragada: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5316: Adds last_day() function .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 279: void TestLastDayFunction() { > I'm fixing this. Filed IMPALA-5585 -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Vincent Tran has posted comments on this change. Change subject: IMPALA-5316: Adds last_day() function .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 279: void TestLastDayFunction() { > Good catch, I suspect the intention was to call this in the same place Test I'm fixing this. -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5316: Adds last_day() function .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 279: void TestLastDayFunction() { > Vincent, MJ, I don't really see how this code is executed, are you sure the Good catch, I suspect the intention was to call this in the same place TestNextDayFunction() is called. -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request Issue: GetTables metadata request may result in an IndexOutOfBoundsException while accessing table comments fields if one or more tables can't be retrieved from the catalog cache. Fix: Only add tables that can be retrieved from the catalog cache (i.e. no TableLoadingException is thrown) in the result of a GetTables request. Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Reviewed-on: http://gerrit.cloudera.org:8080/7296 Reviewed-by: Dimitris Tsirogiannis Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java 2 files changed, 11 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Dimitris Tsirogiannis: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Alex Behm has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7153/6/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test File testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test: Line 6 > As mentioned in the other comment the codegen hint isn't equivalent to the I agree that we should be able to distinguish between user-set and planner-set query options. Yes, you can pretty accurately guess that the single-node plan optimization kicked in by looking at the plan. But you might not be able to guess the effects of runtime_filter_mode=off, and the stats-dependent behavior of num_scanner_threads=1 and adjusting the batch_size. The setting of MT_DOP=4 for COMPUTE STATS is not so obvious and arguably belongs in this same list as the disable codegen hint. I'd need to check of there are other examples of significant setting of query options. Personally, I don't think the codegen hint even makes sense as it is today. In most cases I'd imagine users want the hint behavior, i.e., try to at least run the query. But that option is not directly accessible to users. I'd much more prefer a new value of disable_codegen like "force_disable" where queries that need codegen will fail. I'd imagine that option would only be used in very rare circumstances. I like your suggestion of "Codegen disabled by planner". Let's do that. We can consider extending that to also include the reason (query option and value), but I let's not add that in this patch. http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test File testdata/workloads/functional-query/queries/QueryTest/codegen.test: Line 2: QUERY > I think we have coverage that the query produces the expected results when Good point, we do have coverage there. Let's leave these tests, but please rename to disable-codegen.test for consistency with the corresponding planner test. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/792/ -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 1: Oh ok, Thanks for clarifying. -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 4: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/788/ -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Alex Behm has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 1: For others reading along, the original code from http://gerrit.cloudera.org:8080/7248 had a TODO that explicitly stated that updating multiple objects via direct-ddl does not work. -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change Change-Id: I65495748984268763b2690c193488a338409e13e Reviewed-on: http://gerrit.cloudera.org:8080/7236 Reviewed-by: John Russell Tested-by: Impala Public Jenkins --- M docs/topics/impala_known_issues.xml 1 file changed, 26 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified John Russell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Alex Behm has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 1: Code-Review+2 Let's unblock the builds. -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/7153/6/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test File testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test: Line 6: Planner Hints: Disable codegen > The term "hint" already has a different meaning as in "straight_join" kind As mentioned in the other comment the codegen hint isn't equivalent to the query option. It seems a bit questionable that with the other query options we can't distinguish between user-set options and planner-set options in the profile (although you can usually infer for single node plan). Disabling codegen does seem like a significant enough planner decision that it should be somewhat visible in the explain plan. It's a bit less of a problem for the single node optimisation since then there's usually a visible change to the plan shape. We can test it like testComputeStatsMtDop() but if there's nothing in the explain plan it's harder to understand what the planner did. "Planner Hint" might not be the best term if we do include it. Not sure what would be better. Just a plain string like "Codegen disabled by planner"? http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test File testdata/workloads/functional-query/queries/QueryTest/codegen.test: Line 2: QUERY > The hint is applied to all FE->BE expr evaluations, and the single-node opt I think we have coverage that the query produces the expected results when the hint is set, but we don't have any coverage that codegen was actually disabled. The distinction between the hint and the query option exists for a reason - there are some exprs that can only be evaluated with codegen enabled. In those cases the hint is overridden. But if the query option is set explicitly, then the option takes precedence and the query fails because the exprs can't be evaluated. b7eeb8bf85ef24b730c9ba2891f5a8075ba9605e added some coverage of that interaction - it runs test_udfs with exec_single_node_rows_threshold set to a couple of different values, which exercises that case. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Alex Behm has posted comments on this change. Change subject: IMPALA-5316: Adds last_day() function .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 279: void TestLastDayFunction() { Vincent, MJ, I don't really see how this code is executed, are you sure these tests are running? I don't see this code called from a GTest test. You should put these test cases into a GTest like this: TEST_F(ExprTest, LastDay) { // Code goes here } The ExprTest class should only contain helper functions and not test cases. -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 1: It's not clear to me yet. I am just posting this to unblock the builds. I need to investigate this more. I agree though, in theory that should work fine. -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/132/ -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
John Russell has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/7236/3/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: PS3, Line 572: known : as topics > remove. statestore topics are not only the catalog updates. Done PS3, Line 576: topic > "catalog update topic" Done Line 585: until this issue is resolved. > You may want to mention that 'false' is the default value. Done -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
John Russell has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 4: Code-Review+2 Carrying forward +2 from Dimitris. -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7236 to look at the new patch set (#4). Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change Change-Id: I65495748984268763b2690c193488a338409e13e --- M docs/topics/impala_known_issues.xml 1 file changed, 26 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7236/4 -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5571: Fix authorization tests .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7299/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 2919: } else if (updatedPrivs.size() > 1) { This is more of a question than a comment. I know this is the behavior prior to your patch(IMPALA-5549) but what exactly prevents the processing/applying of multiple privileges in the coordinator/fe? Had a quick look at the code and ProcessCatalogUpdateResult() and updateCatalog() calls seems to handle multiple updated objects fine? Is something else causing the problem? -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Alex Behm has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/7153/6/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test File testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test: Line 6: Planner Hints: Disable codegen The term "hint" already has a different meaning as in "straight_join" kind of hint, so I think we'd need a different term. To move forward with this approach, I think we would need at least a plan to extend this to capture all changes the FE makes to query options, and we should be clear that we are referring to query options. I understand that codegen is unfortunately special with it's "hint" mechanism, but I don't think we should expose those details to the user. Do you think we want to embark on that journey to extend this report any time soon? I'm concerned about the incompleteness of only capturing the codegen decision. Also, I'm concerned about potential inconsistency with the query profile. An alternative is to check the effective codegen hint in a planner test like we do for PlannerTest.testComputeStatsMtDop() http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test File testdata/workloads/functional-query/queries/QueryTest/codegen.test: Line 2: QUERY > I don't believe so. "git grep 'optimization hints'" doesn't show up any tes The hint is applied to all FE->BE expr evaluations, and the single-node optimization tests it, so I'm not sure we need these tests here specifically. Otoh, the interaction between the disable_codegen query option and the disable_codegen_hint does not seem well tested, so perhaps we should add targeted tests for that in a separate patch, or even better get rid of the codegen hint altogether. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/791/ -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 7: Code-Review+2 (3 comments) Carrying +2. http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS5, Line 302: RemoteIterator fileStatusIter = > nit: move it below L305. If we're going to return there is no need to get a Done PS5, Line 727: DataFile(fileStatus)) continue; > newFileDescs? Done. Rewrote a little. http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: Line 498: return fs.listStatus(p); > I am thinking maybe add a warn in the log that the path doesn't exist. Done -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7095 to look at the new patch set (#7). Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. IMPALA-5431: Remove redundant path exists checks during table load There are multiple places that do an exists() check on a path and then perform some subsequent action on it. This pattern results in two RPCs to the NN (one for the exists() check and one for the subsequent action). We can avoid the exists() check in these cases since most HDFS methods on paths throw a FileNotFoundException if the path does not exist. This can save an RPC to NN and improve the metadata loading time. Testing: Enough tests already cover this code path. This patch passed core and exhaustive tests. Metadata benchmark shows decent increase in perf numbers, for ex: 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV -20.51% 80-PARTITIONS-250K-FILES-S3-03-RECOVER -20.58% 80-PARTITIONS-250K-FILES-11-DROP-PARTITION -22.13% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION -22.38% 80-PARTITIONS-250K-FILES-S3-12-DROP -23.69% 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION -23.91% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTITION -26.04% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH -26.38% 80-PARTITIONS-250K-FILES-S3-02-CREATE -36.47% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS -58.72% 80-PARTITIONS-250K-FILES-S3-01-DROP -95.33% 80-PARTITIONS-250K-FILES-01-DROP -95.93% Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 2 files changed, 64 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/7 -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 3: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/7236/3/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: PS3, Line 572: known : as topics remove. statestore topics are not only the catalog updates. PS3, Line 576: topic "catalog update topic" Line 585: until this issue is resolved. You may want to mention that 'false' is the default value. -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 60: || (missingStats && !scan.hasLimit() && scan.getConjuncts().isEmpty())) { > I'm thinking about the case: Done http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test File testdata/workloads/functional-query/queries/QueryTest/codegen.test: Line 2: QUERY > Don't we have other tests that confirm the codegen_hint is honored? Validat I don't believe so. "git grep 'optimization hints'" doesn't show up any tests. I added a planner test to test the planner decision about whether to disable codegen. I think this test addresses a valid coverage gap aside from that. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-5483: Automatically disable codegen for small queries .. IMPALA-5483: Automatically disable codegen for small queries This is similar to the single-node execution optimisation, but applies to slightly larger queries that should run in a distributed manner but won't benefit from codegen. This adds a new query option disable_codegen_rows_threshold that defaults to 50,000. If fewer than this number of rows are processed by a plan node per impalad, the cost of codegen almost certainly outweighs the benefit. Using rows processed as a threshold is justified by a simple model that assumes the cost of codegen and execution per row for the same operation are proportional. E.g. if x is the complexity of the operation, n is the number of rows processed, C is a constant factor giving the cost of codegen and Ec/Ei are constant factor giving the cost of codegen'd and interpreted execution and d, then the cost of the codegen'd operator is C * x + Ec * x * n and the cost of the interpreted operator is Ei * x * n. Rearranging means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that (at least with the simplified model) it makes sense to choose interpretation or codegen based on a constant threshold. The model also implies that it is somewhat safer to choose codegen because the additional cost of codegen is O(1) but the additional cost of interpretation is O(n). I ran some experiments with TPC-H Q1, varying the input table size, to determine what the cut-over point where codegen was beneficial was. The cutover was around 150k rows per node for both text and parquet. At 50k rows per node disabling codegen was very beneficial - around 0.12s versus 0.24s. To be somewhat conservative I set the default threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the cutover tends to be higher because there are plan nodes that process many fewer than the max rows. Fix a couple of minor issues in the frontend - the numNodes_ calculation could return 0 for Kudu, and the single node optimization didn't handle the case where for a scan node with conjuncts, a limit and missing stats correctly (it considered the estimate still valid.) Testing: Updated e2e tests that set disable_codegen to set disable_codegen_rows_threshold to 0, so that those tests run both with and without codegen still. Added an e2e test to make sure that the optimisation is applied in the backend. Added planner tests for various cases where codegen should and shouldn't be disabled. Perf: Added a targeted perf test for a join+agg over a small input, which benefits from this change. Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-query/queries/QueryTest/codegen.test A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test M tests/common/impala_test_suite.py M tests/common/test_dimensions.py A tests/query_test/test_codegen.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_scanners_fuzz.py M tests/query_test/test_udfs.py 20 files changed, 455 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/7 -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails A recent change moved the initialization of output_tuple_desc_ to the constructor of AggregationNode. This changes the order in which tuple_pool_ and output_tuple_desc_ are initialized. The code in AggregationNode::Close() made the assumption that tuple_pool_ cannot be NULL (although without a DCHECK) if output_tuple_desc_ is not NULL. This no longer holds in the new code. This change makes AggregationNode::Close() checks tuple_pool_ to see if it's NULL before using it. Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Reviewed-on: http://gerrit.cloudera.org:8080/7254 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exec/aggregation-node.cc 1 file changed, 3 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option
Alex Behm has posted comments on this change. Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap File docs/impala.ditamap: Line 179: Why mention IMPALA-5583 also? http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml File docs/topics/impala_default_join_distribution_mode.xml: Line 40: This option determines the join strategy that Impala uses when any of the tables We deliberately did not use "join strategy" in the option name because strategy is too generic. Line 47: Hive ANALYZE TABLE statement. Sure you want to keep the ANALYZE TABLE part? In most situations we cannot effectively use what Hive produces. Line 48: By default, when a table involved in the join query does not have statistics, Accuracy could be improved. What if both tables do not have stats? Clarify that one table is going to be broadcast. Might even be worth explicitly listing what happens if one table has stats and the other doesn't (the one without stats will be broadcast) Line 58: might be missing statistics due to the overhead involved in calculating them, I wouldn't suppose a particular reason for not having stats. Line 61: of a table involved in a join query and only transmits a portion of the table Not very accurate, both tables are transferred across the network. Not sure if we need to explain the differences between broadcast+shuffle here, maybe provide a link to their explanation/definition? Line 67: recommended when setting up and deploying new clusters. This setting is We should mention why we recommend this. SHUFFLE is generally a safer option because the join build will be less prone to spilling and/or OOM. -- To view, visit http://gerrit.cloudera.org:8080/7300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-5483: Automatically disable codegen for small queries .. IMPALA-5483: Automatically disable codegen for small queries This is similar to the single-node execution optimisation, but applies to slightly larger queries that should run in a distributed manner but won't benefit from codegen. This adds a new query option disable_codegen_rows_threshold that defaults to 50,000. If fewer than this number of rows are processed by a plan node per impalad, the cost of codegen almost certainly outweighs the benefit. Using rows processed as a threshold is justified by a simple model that assumes the cost of codegen and execution per row for the same operation are proportional. E.g. if x is the complexity of the operation, n is the number of rows processed, C is a constant factor giving the cost of codegen and Ec/Ei are constant factor giving the cost of codegen'd and interpreted execution and d, then the cost of the codegen'd operator is C * x + Ec * x * n and the cost of the interpreted operator is Ei * x * n. Rearranging means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that (at least with the simplified model) it makes sense to choose interpretation or codegen based on a constant threshold. The model also implies that it is somewhat safer to choose codegen because the additional cost of codegen is O(1) but the additional cost of interpretation is O(n). I ran some experiments with TPC-H Q1, varying the input table size, to determine what the cut-over point where codegen was beneficial was. The cutover was around 150k rows per node for both text and parquet. At 50k rows per node disabling codegen was very beneficial - around 0.12s versus 0.24s. To be somewhat conservative I set the default threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the cutover tends to be higher because there are plan nodes that process many fewer than the max rows. Fix a couple of minor issues in the frontend - the numNodes_ calculation could return 0 for Kudu, and the single node optimization didn't handle the case where for a scan node with conjuncts, a limit and missing stats correctly (it considered the estimate still valid.) Testing: Updated e2e tests that set disable_codegen to set disable_codegen_rows_threshold to 0, so that those tests run both with and without codegen still. Added an e2e test to make sure that the optimisation is applied in the backend. Added planner tests for various cases where codegen should and shouldn't be disabled. Perf: Added a targeted perf test for a join+agg over a small input, which benefits from this change. Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-query/queries/QueryTest/codegen.test A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test M tests/common/impala_test_suite.py M tests/common/test_dimensions.py A tests/query_test/test_codegen.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_scanners_fuzz.py M tests/query_test/test_udfs.py 20 files changed, 455 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/6 -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
John Russell has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 2: (1 comment) Reworded to clarify that size increase is for the topic size, not the metadata itself. http://gerrit.cloudera.org:8080/#/c/7236/2/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: PS2, Line 572: Impala catalog metadata (table files : and data blocks) > This is not correct. IMPALA-4029 causes an increase in catalog update topic Done -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
John Russell has uploaded a new patch set (#3). Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change Change-Id: I65495748984268763b2690c193488a338409e13e --- M docs/topics/impala_known_issues.xml 1 file changed, 25 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7236/3 -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 6: - Updated the commit message with benchmark numbers. - Also, looks like CTAS analysis is likely a non-issue (also discussed with Alex offline) since the "create" part of CTAS results in creation of base directory for table (by HMS) and we always inherit correct ACLs of parent and the case that the table directory doesn't exist is not possible. - Ran the following test to confirm that. ➜ Impala git:(i5431) ✗ hadoop fs -setfacl -m user:bharath:rw- /tmp/foo ➜ Impala git:(i5431) ✗ hadoop fs -getfacl /tmp/foo # file: /tmp/foo # owner: bharath # group: supergroup user::rwx user:bharath:rw- group::r-x mask::rwx other::r-x [localhost:21000] > create table bar location '/tmp/foo/bar' as select * from tpch_parquet.nation; Query: create table bar location '/tmp/foo/bar' as select * from tpch_parquet.nation Query submitted at: 2017-06-26 15:55:28 (Coordinator: http://optimus:25000) Query progress can be monitored at: http://optimus:25000/query_plan?query_id=8d4f60928bebcc9c:f7e2c92f ++ | summary| ++ | Inserted 25 row(s) | ++ [localhost:21000] > select count(*) from bar; Query: select count(*) from bar Query submitted at: 2017-06-26 15:55:44 (Coordinator: http://optimus:25000) Query progress can be monitored at: http://optimus:25000/query_plan?query_id=8e44ede23419d848:afac5116 +--+ | count(*) | +--+ | 25 | +--+ ➜ Impala git:(i5431) ✗ hadoop fs -getfacl /tmp/foo/bar # file: /tmp/foo/bar # owner: bharath # group: supergroup user::rwx user:bharath:rw- group::rwx mask::rwx other::r-x -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option
John Russell has posted comments on this change. Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option .. Patch Set 1: This gerrit is a "stake in the ground" with basic details about the query option. If there are other places where the option could be mentioned (under joins, compute stats, etc.), I'll handle those in a separate gerrit. -- To view, visit http://gerrit.cloudera.org:8080/7300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 6: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS5, Line 302: Reference numUnknownDiskIds = new Reference(Long.valueOf(0)); nit: move it below L305. If we're going to return there is no need to get a new Reference. PS5, Line 727: new ArrayList() newFileDescs? http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: Line 498: } catch (FileNotFoundException e) { I am thinking maybe add a warn in the log that the path doesn't exist. -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8
Alex Behm has posted comments on this change. Change subject: IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8 .. Patch Set 7: (3 comments) Let's move forward with this change. http://gerrit.cloudera.org:8080/#/c/7073/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS7, Line 206: p Why is this change needed? http://gerrit.cloudera.org:8080/#/c/7073/7/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java: Line 123: this.label_.equals(vertex.label_); Remove this to make compareTo() consistent with equals(). Line 173: return ImmutableSortedSet.copyOf(sources_); Why not return an ordered list? Should be smaller, faster, and generate fewer useless intermediate objects. -- To view, visit http://gerrit.cloudera.org:8080/7073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iad8e1dccec3a51293a109c420bd2b88b9d1e0625 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/7300 Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option .. IMPALA-5583: [DOCS] Document default_join_distribution_mode query option New page for the query option. Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84 --- M docs/impala.ditamap M docs/impala_keydefs.ditamap A docs/topics/impala_default_join_distribution_mode.xml 3 files changed, 132 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/1 -- To view, visit http://gerrit.cloudera.org:8080/7300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7095 to look at the new patch set (#6). Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. IMPALA-5431: Remove redundant path exists checks during table load There are multiple places that do an exists() check on a path and then perform some subsequent action on it. This pattern results in two RPCs to the NN (one for the exists() check and one for the subsequent action). We can avoid the exists() check in these cases since most HDFS methods on paths throw a FileNotFoundException if the path does not exist. This can save an RPC to NN and improve the metadata loading time. Testing: Enough tests already cover this code path. This patch passed core and exhaustive tests. Metadata benchmark shows decent increase in perf numbers, for ex: 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV -20.51% 80-PARTITIONS-250K-FILES-S3-03-RECOVER -20.58% 80-PARTITIONS-250K-FILES-11-DROP-PARTITION -22.13% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION -22.38% 80-PARTITIONS-250K-FILES-S3-12-DROP -23.69% 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION -23.91% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTITION -26.04% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH -26.38% 80-PARTITIONS-250K-FILES-S3-02-CREATE -36.47% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS -58.72% 80-PARTITIONS-250K-FILES-S3-01-DROP -95.33% 80-PARTITIONS-250K-FILES-01-DROP -95.93% Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 2 files changed, 51 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/6 -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/790/ -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. Patch Set 2: Code-Review+2 Rebase and keep Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5571: Fix authorization tests
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/7299 Change subject: IMPALA-5571: Fix authorization tests .. IMPALA-5571: Fix authorization tests Issue: With the fix for IMPALA-5549 the catalog cache of an impalad may end up with stale authorization metadata in the presence of GRANT/REVOKE statements that modify multiple privileges. Fix: Revert the behavior of GRANT/REVOKE privilege DDL statements. Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 1 file changed, 6 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7299/1 -- To view, visit http://gerrit.cloudera.org:8080/7299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); > Yes, I wasn't suggesting that we should convert those Allocate() to TryAllo Completely agree -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Code-Review+2 carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Michael Ho has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: Line 101: results_ = nullptr; > I don't feel strongly. I think the idea was to move these status checks mor It doesn't seem strictly necessary for this change. Feel free to do it later. http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); > I think we should do the work of removing Allocate() calls at some point bu Yes, I wasn't suggesting that we should convert those Allocate() to TryAllocate() in this change. That's just something we can do to get rid of CheckQueryState() altogether. -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Removed the preconditions check - it was too strict for the case when the hash join was in a subplan. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6963 to look at the new patch set (#12). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 10 files changed, 984 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/12 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 11: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/786/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: Line 101: results_ = nullptr; > At first glance this looks reasonable to put here, but is this necessary fo I don't feel strongly. I think the idea was to move these status checks more closely to the expr evaluations that could cause errors to be hit. http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); > I like CheckAsyncError() for GetQueryStatus(). If we can convert all of the I think we should do the work of removing Allocate() calls at some point but I think we should keep fix as targeted as possible. That might be an argument to defer doing the rename as well. -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/788/ -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 4: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7295/2/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 1031: // This must be the first column with var len data for the tuple. Var len data > first column -> first slot Done -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Hello Thomas Tauber-Marshall, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7295 to look at the new patch set (#3). Change subject: IMPALA-5554: sorter DCHECK on null column .. IMPALA-5554: sorter DCHECK on null column The bug was in the DCHECK. The DCHECK is intended to make sure that a tuple's string data didn't get split across blocks. The logic assumed that if the second-or-later string column was in the next-block, that the strings were split between blocks. However, that assumption is invalid if there are NULL strings, which do not belong in any block. The fix for the DCHECK (which is still useful) is to count the number of non-NULL strings and make sure that no non-NULL strings were split between blocks. Testing: Added a test that reproduces the crash. Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 --- M be/src/runtime/sorter.cc M testdata/workloads/functional-query/queries/QueryTest/single-node-large-sorts.test 2 files changed, 39 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7295/3 -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4687: Get Impala working against HBase 2.0 .. Patch Set 2: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.cc File be/src/exec/hbase-table-scanner.cc: PS1, Line 197: setStartRow > It turns out that setStartRow/setStopRow are newly deprecated. The new meth Ok, thanks. Mind adding a TODO with jira reference? PS1, Line 403: : Status HBaseTableScanner::HandleResultS > We need the actual jthrowable so that we can check whether it is an instanc Yea, I missed that part, thanks http://gerrit.cloudera.org:8080/#/c/7277/2/be/src/exec/hbase-table-scanner.cc File be/src/exec/hbase-table-scanner.cc: PS2, Line 411: error nit: Don't think error is the right term here. GetJniExceptionMsg() can just return Status:OK() as well. http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.h File be/src/exec/hbase-table-scanner.h: PS1, Line 65: HBASE-17809. : ///If ScannerTimeout nit: Same comment as in the .cc file. We don't always throw an error, we can just pass if there is no other exception. May be good to clear it up -- To view, visit http://gerrit.cloudera.org:8080/7277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: Line 101: RETURN_IF_ERROR(state->GetQueryStatus()); At first glance this looks reasonable to put here, but is this necessary for this change? Looks like the caller is FIS::ExecInternal and the error status gets propagated. I'm just asking because we'll start seeing errors sooner, which seems like a good thing but may result in some different runtime behavior. http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); > I wonder if most of the remaining callers can be converted to CheckQuerySta I like CheckAsyncError() for GetQueryStatus(). If we can convert all of the other callsites to CheckQueryStatus(), that'd clearly be ideal, but that seems like a bigger change to move Allocate -> TryAllocate. However, maybe we can split CheckQueryState() into a fn that just updates the status_ on mem limit exceeded, then the places where we know the memory may have been over consumed we call that before calling CheckAsyncError()? I guess CheckMemLimitOrAsyncError() is at least more clear for now. I'm OK with fixing this more localized issue now. It's funny, CheckQueryStatus() clearly suffers from the QueryMaintenance() problem. -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Alex Behm has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 4: (7 comments) As discussed, I'm checking the following, - Effect on CTAS when the path doesn't exist. - Running metadata benchmarks to see if there is some perf gain. Meanwhile, posting the rebased patch that addresses the comments. http://gerrit.cloudera.org:8080/#/c/7095/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 267: // No need to load blocks for empty partitions list. > Right, but saving an RPC seems worth it. Done (in PS4). Line 725: } > remove while you are here Done http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 266: FileSystem fs = dirPath.getFileSystem(CONF); > Move this closer to where it is used. getFileSystem() tends to be expensive Done Line 305: if (fileStatusIter == null) { > I think you can make this a single line and remove the comment. Done Line 371: if (fileStatusIter == null) { > I think you can make this a single line and remove the comment. Done Line 1709: if (statuses == null) { > I think you can make this a single line and remove the comment. Done http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: Line 491:* the path does not exist. This helps simplifies the caller code in cases where > This helps simplify the caller code... Ouch, done. -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7095 to look at the new patch set (#5). Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. IMPALA-5431: Remove redundant path exists checks during table load There are multiple places that do an exists() check on a path and then perform some subsequent action on it. This pattern results in two RPCs to the NN (one for the exists() check and one for the subsequent action). We can avoid the exists() check in these cases since most HDFS methods on paths throw a FileNotFoundException if the path does not exist. This can save an RPC to NN and improve the metadata loading time. Testing: Enough tests already cover this code path. This patch passed core and exhaustive tests. Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 2 files changed, 51 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/5 -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column
Alex Behm has posted comments on this change. Change subject: IMPALA-5554: sorter DCHECK on null column .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7295/2/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 1031: // This must be the first column with var len data for the tuple. Var len data first column -> first slot -- To view, visit http://gerrit.cloudera.org:8080/7295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-4687: Get Impala working against HBase 2.0 .. IMPALA-4687: Get Impala working against HBase 2.0 This changes Impala code to tolerate the API differences between HBase 1.0 and HBase 2.0. It also drops compatibility code for older HBase versions. Specific changes: 1. Tolerate return value of Scan for Scan.setCaching() and Scan.setCacheBlocks(). This has no impact on our code. 2. HBase 2.0 eliminates the ScannerTimeoutException. The case that previously generated the exception will now recreate the scanner, so it is not necessary for our code to recreate the scanner. Short-circuit HandleResultScannerTimeout on HBase 2.0. 3. HBase 2.0 eliminates the Put.add(), which has been replaced with Put.addColumn(). This API exists in HBase 1.0, so it is safe to switch this completely. This was tested by verifying that an HBase 2.0 cluster starts up. Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27 --- M be/src/exec/hbase-table-scanner.cc M be/src/exec/hbase-table-scanner.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/util/jni-util.cc M be/src/util/jni-util.h 6 files changed, 89 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/7277/2 -- To view, visit http://gerrit.cloudera.org:8080/7277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4687: Get Impala working against HBase 2.0 .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/7277/1//COMMIT_MSG Commit Message: PS1, Line 25: This was tested by verifying that an HBase 2.0 cluster : starts up. > I think we should run hbase stress test eventually to test the new ScannerT I think we can run stress tests with existing code. Since HBase stopped throwing this exception with HBASE-16266, I think this is already available in versions of HBase 1.2, which we already support. http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.cc File be/src/exec/hbase-table-scanner.cc: PS1, Line 238: : > May be retain this part. Done PS1, Line 197: setStartRow > You mentioned on the jira that these are deprecated too (setStartRow/setSto It turns out that setStartRow/setStopRow are newly deprecated. The new methods (with*Row) do not exist in HBase 1.2, so we can't replace it without having conditional code to detect the different APIs. I don't think it is worth switching yet. I will file a separate JIRA. PS1, Line 403: jthrowable exc = env->ExceptionOccurred(); : if (exc == NULL) return Status::OK(); > Unrelated to your change, but why are we calling this here? Shouldn't GetJn We need the actual jthrowable so that we can check whether it is an instance of ScannerTimeoutException. GetJniExceptionMsg only returns the message. Line 407: // necessary). For HBase 2.0, ScannerTimeoutException does not exist, so simple > typo: simply Done PS1, Line 407: simple > simply? Done Line 411: if (scanner_timeout_ex_cl_ == nullptr) return status; > Is there new exception that signals we need to retry? Is the retry not nece >From what I understand, HBase does a heartbeat now, and where it previously >would throw this exception, it now resets the scanner and retries. HBASE-16266 >is when this changed. Line 512: // Newer versions of HBase re-create the ResultScanner without throwing this > Is "re-create" the right term? I'm not sure how that would work. Do you mea Reworked this comment. Line 514: // code will simply return the error. > what is "the error"? Fixed this language. http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.h File be/src/exec/hbase-table-scanner.h: Line 61: /// 1. Scan.setCaching and Scan.setCacheBlocks tolerate the older void return value > use () for function/methods, e.g. Scan.setCaching() Done http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-writer.h File be/src/exec/hbase-table-writer.h: PS1, Line 108: add > addColumn Done http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/util/jni-util.cc File be/src/util/jni-util.cc: Line 61: env->GetMethodID(class_ref, method_str, method_signature); > Reading the documentation for "GetMethodID" Good point. I haven't looked at the return value, but it does throw NoSuchMethodException. Maybe it also returns null. I think either would work, but in either case, we would need to clear the exception. http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/util/jni-util.h File be/src/util/jni-util.h: PS1, Line 171: a method > ..non static method.. Done -- To view, visit http://gerrit.cloudera.org:8080/7277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Alex Behm has posted comments on this change. Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. Patch Set 4: Code-Review+1 (7 comments) http://gerrit.cloudera.org:8080/#/c/7095/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 267: // No need to load blocks for empty partitions list. > Okay. I felt this adds a lot of try{} catch{} to the code and hence didn't Right, but saving an RPC seems worth it. Line 725: } remove while you are here http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 266: FileSystem fs = dirPath.getFileSystem(CONF); Move this closer to where it is used. getFileSystem() tends to be expensive, so great if we can avoid it! Line 305: if (fileStatusIter == null) { I think you can make this a single line and remove the comment. Line 371: if (fileStatusIter == null) { I think you can make this a single line and remove the comment. Line 1709: if (statuses == null) { I think you can make this a single line and remove the comment. http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: Line 491:* the path does not exist. This helps simplifies the caller code in cases where This helps simplify the caller code... -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
Alex Behm has posted comments on this change. Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6023/5/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 452: Decimal16Value buckets = Decimal16Value::FromInt(input_precision, input_scale, It would be good to summarize the calculations and comment on the chosen types, e.g. why are we using Dec16 here? In general, please move definitions closer to where they are used. The buckets are only used a lot further down where it might become clearer why it should be Dec16. Adding those comments and doing some cleanup will make reviewing much easier. I'll still need to think more carefully about the logic. -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Alex Behm has posted comments on this change. Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Alex Behm has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 330: if (!status.ok()) { take out of if/else since it's common to both cases Line 367: uint8_t* compressed_buffer_ptr = &compressed_buffer[0]; comptessed_buffer.data() Line 370: // Base64 encode the compressed catalog object and store it in the topic item. Is base64-encoding really necessary? Why don't the raw bytes work? http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: Line 200: /// Serializes 'catalog_object' and compresses it using Snappy compression. remove second "compression" http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1417: Status ImpalaServer::DecompressAndDeserializeCatalogObject(const TTopicItem& item, Why not separate decompression from deserialization? Seem like separate concerns. Then the code would be something like: buffer = raw buffer if (compaction is on) buffer = decompress(); DeserializeThriftMsg() ... http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/util/compress.h File be/src/util/compress.h: Line 103: SnappyCompressor(MemPool* mem_pool = NULL, bool reuse_buffer = false); Why these changes? Shouldn't these always be created using CreateCompressor()? http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py File tests/custom_cluster/test_compact_catalog_updates.py: Line 45: "select count(*) from functional.alltypes") Check result? Maybe also use another query on a parquet table. -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Michael Ho has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 2: Code-Review+1 (1 comment) MJ may wanna do another pass. http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); > How about we rename them to be less vague. Maybe something like: I wonder if most of the remaining callers can be converted to CheckQueryStatus() given that we do mostly use TryAllocate() instead of Allocate(). Most of the remaining callers are DataSinks. May be if we fix hdfs-table-writer and friends to use TryAllocate(), things will be easier ? -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/787/ -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: Line 102: ScalarExprEvaluator::FreeLocalAllocations(output_expr_evals_); > Not your change but should we check for query status here too ? I can imagi Done PS1, Line 150: DCHECK( > DCHECK_GE() Done http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); > I think this method is confusing to begin with, which probably led us to mi How about we rename them to be less vague. Maybe something like: CheckQueryState() -> CheckMemLimitOrAsyncError() GetQueryStatus() -> CheckAsyncError() query_status_ -> async_error_status_ -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Hello Impala Public Jenkins, Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7275 to look at the new patch set (#2). Change subject: IMPALA-5567: race in fragment instance teardown .. IMPALA-5567: race in fragment instance teardown The bug is that PlanRootSink::GetNext() calls RuntimeState::CheckQueryState() was called concurrently with RuntimeState::ReleaseResources() and got a reference The other callsites of CheckQueryState() are safe because they are in two categories: * ExecNodes or DataSink methods executed by fragment instance execution threads, which must terminate before the runtime state resources are released. * In FeSupport where ReleaseResources() called after CheckQueryState() There is no need to asynchronously check for memory limit exceeded in PlanRootSink::GetNext() since that method does not allocate tracked memory that could push the query over the memory limit. Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 --- M be/src/exec/plan-root-sink.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h 3 files changed, 9 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7275/2 -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7254 to look at the new patch set (#2). Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails A recent change moved the initialization of output_tuple_desc_ to the constructor of AggregationNode. This changes the order in which tuple_pool_ and output_tuple_desc_ are initialized. The code in AggregationNode::Close() made the assumption that tuple_pool_ cannot be NULL (although without a DCHECK) if output_tuple_desc_ is not NULL. This no longer holds in the new code. This change makes AggregationNode::Close() checks tuple_pool_ to see if it's NULL before using it. Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 --- M be/src/exec/aggregation-node.cc 1 file changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7254/2 -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/7296 Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request .. IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request Issue: GetTables metadata request may result in an IndexOutOfBoundsException while accessing table comments fields if one or more tables can't be retrieved from the catalog cache. Fix: Only add tables that can be retrieved from the catalog cache (i.e. no TableLoadingException is thrown) in the result of a GetTables request. Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 --- M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java 2 files changed, 11 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/7296/1 -- To view, visit http://gerrit.cloudera.org:8080/7296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5031: Set DiskIoMgr::ScanRange::is cancelled before read
Michael Ho has posted comments on this change. Change subject: IMPALA-5031: Set DiskIoMgr::ScanRange::is_cancelled_ before read .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7294/2/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: PS2, Line 584: is_cancelled_ = false; Actually, shouldn't this be done in Reset() instead ? May also want to fix the constructor to initialize this flag or move the initialization of other bool flags here ? -- To view, visit http://gerrit.cloudera.org:8080/7294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I22898ec96ac44c4902f8072f27453cfc58358cae Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7236/2/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: PS2, Line 572: Impala catalog metadata (table files : and data blocks) This is not correct. IMPALA-4029 causes an increase in catalog update topic size when compact-catalog-topic is used. It doesn't increase catalog metadata in general, it actually has the opposite effect. -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization .. IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization Just putting an initial stake in the ground. If examples, details of Hive interoperability, or type-by-type details are needed, I prefer to handle those in followup gerrits. Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d Reviewed-on: http://gerrit.cloudera.org:8080/7068 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M docs/topics/impala_parquet.xml 1 file changed, 19 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2546: Fix handling of invalid timezones
Matthew Jacobs has abandoned this change. Change subject: IMPALA-2546: Fix handling of invalid timezones .. Abandoned Abandoning for now - Greg pointed out in the JIRA that the current behavior of these functions make sense. -- To view, visit http://gerrit.cloudera.org:8080/7216 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I587a16bb8e835a6eea02e1f87fd4033686e84d0f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has uploaded a new patch set (#5). Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. IMPALA-5240: Allow config of number of disk I/O threads per disk type Currently Impala defaults to 8 threads per flash disk and 1 thread per rotational disk. This can be overridden with --num_threads_per_disk, but that sets the thread count independent of disk type. This would allow control of the number of disk I/O threads spawned independently for solid state disks using "--num_io_threads_per_solid_state_disk" and for rotational disks using "--num_io_threads_per_rotational_disk" as startup parameters. Testing: Added backend tests that verify if "num_threads_per_disk", "num_io_threads_per_solid_state_disk" and "num_io_threads_per_rotational_disk" are used to spawn threads appropriately TODO (Request comments on this additional change): Additionally made some changes to fix a bug, where impala would crash if num_disks are set more than the number of logical disks on system and num_threads_per_disk is not set. This bug also lets the user create more disk queues than the num of logical disks which would eventually never be used. As a part of this effort, existing test cases in disk-io-mgr-test were identified that exploited this bug and hence have been fixed by explicitly setting num_disks=1 for now because the test code isn't yet aware of how many disks are actually on the test system. Test code will be updated to use multiple disks as a part of IMPALA-5574. Moreover, after this fix, if num_disks is set to more than the number of logical disks then it will default to max available disks and log a warning. Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c --- M be/src/runtime/disk-io-mgr-stress-test.cc M be/src/runtime/disk-io-mgr-stress.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/thread.cc M be/src/util/thread.h M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java 8 files changed, 410 insertions(+), 345 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/5 -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. Patch Set 4: (12 comments) http://gerrit.cloudera.org:8080/#/c/7232/4//COMMIT_MSG Commit Message: Line 34: > Comment that we are changing the tests to explicitly set num_disks=1 for no Done http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr-stress-test.cc File be/src/runtime/disk-io-mgr-stress-test.cc: Line 31: const int NUM_DISKS = 1; > comment why we are setting this to 1 for now, i.e. that the test code isn't I have added a TODO here as well as per your other comment in disk-io-mgr-test http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr-test.cc File be/src/runtime/disk-io-mgr-test.cc: Line 1142: const int num_io_threads_per_rotational_or_ssd = 2; > const Done PS3, Line 1165: > weird indentation, just put the operator on the previous line and use 4 spa Done http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr-test.cc File be/src/runtime/disk-io-mgr-test.cc: Line 212: const int num_disks = 1; > leave a TODO with the JIRA where we'll change this, same for the other case Done http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: PS4, Line 87: // Rotational disks should have 1 thread per disk to minimize seeks. Non-rotational : // don't have this penalty and benefit from multiple concurrent IO requests. : static const int THREADS_PER_ROTATIONAL_DISK = 1; : static const int THREADS_PER_FLASH_DISK = 8; > I don't think there's any reason to keep these separately and then check if We need the default value of the flags to be zero in order to figure out if they were set or not. That is needed to follow the priority order of : 1. FLAG_num_io_threads_per_rotational_disk 2. num_threads_per_disk 3. THREADS_PER_ROTATIONAL_DISK PS4, Line 263: num_io_threads_per_rotational_disk_(FLAGS_num_io_threads_per_rotational_disk), : num_io_threads_per_solid_state_disk_(FLAGS_num_io_threads_per_solid_state_disk), > I think we can configure num_io_threads_per_rotational_disk_ and num_io_thr Done Line 278: " disks"; > To be consistent we should also warn if the value is negative Done PS4, Line 367: } else if (num_io_threads_per_rotational_disk_ != 0 && DiskInfo::is_rotational(i)) { : num_threads_per_disk = num_io_threads_per_rotational_disk_; : } else if (num_io_threads_per_solid_state_disk_ != 0 && !DiskInfo::is_rotational(i)) { : num_threads_per_disk = num_io_threads_per_solid_state_disk_; : } else if (FLAGS_num_threads_per_disk != 0) { : num_threads_per_disk = FLAGS_num_threads_per_disk; : } else if (DiskInfo::is_rotational(i)) { : num_threads_per_disk = THREADS_PER_ROTATIONAL_DISK; : } else { : num_threads_per_disk = THREADS_PER_FLASH_DISK; : } > After my suggestion in the constructor this becomes greatly simplified. Done http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: PS3, Line 631: /// > comment this is for testing only Done PS3, Line 837: o the disk. > the naming of this should probably be num_io_threads_per_disk_ to be consis Went ahead with your latter suggestion and removed it altogether http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/util/thread.cc File be/src/util/thread.cc: PS3, Line 335: const > nit: space after const Done -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Michael Ho has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: Line 102: ScalarExprEvaluator::FreeLocalAllocations(output_expr_evals_); Not your change but should we check for query status here too ? I can imagine expression evaluations above can lead to MemLimitExceeded. Feel free to do it later though. PS1, Line 150: DCHECK( DCHECK_GE() -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes .. IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes Cf. IMPALA-3807 and IMPALA-5147. Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad Reviewed-on: http://gerrit.cloudera.org:8080/7237 Reviewed-by: John Russell Tested-by: Impala Public Jenkins --- M docs/impala_keydefs.ditamap M docs/topics/impala_components.xml M docs/topics/impala_new_features.xml M docs/topics/impala_scalability.xml 4 files changed, 129 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified John Russell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
John Russell has posted comments on this change. Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7236/1/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: PS1, Line 572: in memory usage for Impala catalog metadata (table files : and data blocks). By default, the memory overhead is about 5-7%. If the : compact_catalog_topic flag is used, the memory usage is : more substantial, approximately twice as much as in previous versions. > Increase in memory usage is not the correct description. The effect is incr Done PS1, Line 578: High > I'd say Medium. Done PS1, Line 583: A fix is in the pipeline > IMPALA-5500 is the jira for this fix. Done -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change
John Russell has uploaded a new patch set (#2). Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change .. IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change Change-Id: I65495748984268763b2690c193488a338409e13e --- M docs/topics/impala_known_issues.xml 1 file changed, 24 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7236/2 -- To view, visit http://gerrit.cloudera.org:8080/7236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5567: race in fragment instance teardown .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 294: /// Returns a non-OK status if query execution should stop (e.g., the query was : /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so : /// execution doesn't continue if the query terminates abnormally. This should not be : /// called after ReleaseResources(). : Status CheckQueryState(); I think this method is confusing to begin with, which probably led us to misuse it. We have a very overloaded concept of 'query state'. We don't necessarily have to address this now (e.g. the change in plan-fragment-root.cc seems good regardless), but we should probably rethink it. -- To view, visit http://gerrit.cloudera.org:8080/7275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes