[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 691 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/8 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 19: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 19: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/123/ -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 107: > But once we switch to using NEVER, will we still need WHEN_NEEDED? I pulled it out of this patch. I'll keep the code around, would be easy to add back in in future. http://gerrit.cloudera.org:8080/#/c/4715/15/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 50: } > missing space Done http://gerrit.cloudera.org:8080/#/c/4715/15/be/src/runtime/bufferpool/suballocator.h File be/src/runtime/bufferpool/suballocator.h: PS15, Line 128: bein > missing space Done -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Hello Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4715 to look at the new patch set (#18). Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. IMPALA-3200: Implement suballocator for splitting buffers This is useful for situations like hash tables, where we want to make multiple non-spillable allocations of variable size from buffer pool memory and not incur the overhead of interacting with the global buffer pool. The allocator subdivides buffers to service allocations and uses a buddy allocation algorithm to merge freed allocations into larger chunks. This helps avoid external fragmentation and is quite effective at reusing memory given the typical doubling allocation patterns of hash tables in partitioned aggs and joins. Testing: The allocator has fairly robust internal consistency checks via assertions and unique_ptrs. Includes a unit test that exercises various allocation patterns. I tested porting hash tables over to allocate memory using an earlier version of the suballocator, which worked well (was able to run a wide range of queries successfully with good performance). Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 --- M be/src/common/names.h M be/src/exec/hash-table.h M be/src/runtime/bufferpool/CMakeLists.txt A be/src/runtime/bufferpool/suballocator-test.cc A be/src/runtime/bufferpool/suballocator.cc A be/src/runtime/bufferpool/suballocator.h 6 files changed, 857 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/18 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Hello Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4715 to look at the new patch set (#17). Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. IMPALA-3200: Implement suballocator for splitting buffers This is useful for situations like hash tables, where we want to make multiple non-spillable allocations of variable size from buffer pool memory and not incur the overhead of interacting with the global buffer pool. The allocator subdivides buffers to service allocations and uses a buddy allocation algorithm to merge freed allocations into larger chunks. This helps avoid external fragmentation and is quite effective at reusing memory given the typical doubling allocation patterns of hash tables in partitioned aggs and joins. Testing: The allocator has fairly robust internal consistency checks via assertions and unique_ptrs. Includes a unit test that exercises various allocation patterns. I tested porting hash tables over to allocate memory using an earlier version of the suballocator, which worked well (was able to run a wide range of queries successfully with good performance). Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 --- M be/src/common/names.h M be/src/exec/hash-table.h M be/src/runtime/bufferpool/CMakeLists.txt A be/src/runtime/bufferpool/suballocator-test.cc A be/src/runtime/bufferpool/suballocator.cc A be/src/runtime/bufferpool/suballocator.h 6 files changed, 856 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/17 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4633: Change broken gflag default for Kudu client mem
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4633: Change broken gflag default for Kudu client mem .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4633: Change broken gflag default for Kudu client mem
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4633: Change broken gflag default for Kudu client mem .. IMPALA-4633: Change broken gflag default for Kudu client mem We discovered that the current Kudu client defaults in the KuduTableSink are causing a large number of queries to timeout, failing the query. The current default value of the 'mutation buffer size' is 100MB which results in higher write throughput than Kudu can currently handle on large clusters. By decreasing the value of this flag, more RPCs will be sent for the same amount of data, i.e. throttling the load on Kudu. We found tests to be more successful on 200 nodes with a 10MB buffer size than the previous 100MB value where most queries couldn't complete due to timeouts. These queries were not timing out with the 10MB value. This appears to work well on 9 node stress tests as well. Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Reviewed-on: http://gerrit.cloudera.org:8080/5503 Reviewed-by: Dan Hecht Tested-by: Internal Jenkins --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h 2 files changed, 5 insertions(+), 4 deletions(-) Approvals: Internal Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/4715/15/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 50: } missing space -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 15: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 107: > WHEN_NEEDED will be convenient for porting the existing exec nodes, and I t But once we switch to using NEVER, will we still need WHEN_NEEDED? In general, I think we should keep things simple until we have a need, to avoid adding dead code or over engineering things. But if you're confident both will need both, then okay, but otherwise, let's just add back the extra policy when we actually have a need for it. http://gerrit.cloudera.org:8080/#/c/4715/15/be/src/runtime/bufferpool/suballocator.h File be/src/runtime/bufferpool/suballocator.h: PS15, Line 128: bein missing space -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#8). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 691 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/8 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 15: Code-Review+1 carry +1 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 68: > extraneous blank Done Line 80: DCHECK(free_node != nullptr); > it looks like this dcheck can fire if SplitToSize() fails to allocate a nod Good point - SplitToSize() should return an error in that case. Fixed. Line 107: } > why do we need both policies? WHEN_NEEDED will be convenient for porting the existing exec nodes, and I think we'll need NEVER in future when exec nodes want to have better control over their reservations. If we want to remove NEVER for the time being that also works. Line 137: return Status::OK(); > Now that I read one of Dan's comments, I wonder if this should be an error Yep this was a mistake. Line 216: *ptr_from_prev = move(node->next_free_); > should this reset result->prev_free_ so we don't have a dangling reference, Good point - Done http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.h File be/src/runtime/bufferpool/suballocator.h: Line 31: /// allocation algorithm optimised for power-of-two allocations. Uses a binary buddy > does it require that the buffer pool buffers themselves are power of two? w Expanded to be clearer about how min_buffer_len plays into this and what is a power of two. We require that we can allocate power-of-two buffers from the buffer pool. Line 104: static constexpr int64_t MAX_ALLOCATION_BYTES = 1L << LOG_MAX_ALLOCATION_BYTES; > after reading the rest of the code I see it's used so to bound the depth of Done PS14, Line 125: 'node' should already be free and not in : /// any free list > what does this mean given that free nodes are in the free list? do you mean It's really referring to node->in_use so updated to reflect. PS14, Line 159: support > supported? Done -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4715 to look at the new patch set (#15). Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. IMPALA-3200: Implement suballocator for splitting buffers This is useful for situations like hash tables, where we want to make multiple non-spillable allocations of variable size from buffer pool memory and not incur the overhead of interacting with the global buffer pool. The allocator subdivides buffers to service allocations and uses a buddy allocation algorithm to merge freed allocations into larger chunks. This helps avoid external fragmentation and is quite effective at reusing memory given the typical doubling allocation patterns of hash tables in partitioned aggs and joins. Testing: The allocator has fairly robust internal consistency checks via assertions and unique_ptrs. Includes a unit test that exercises various allocation patterns. I tested porting hash tables over to allocate memory using an earlier version of the suballocator, which worked well (was able to run a wide range of queries successfully with good performance). Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 --- M be/src/common/names.h M be/src/exec/hash-table.h M be/src/runtime/bufferpool/CMakeLists.txt A be/src/runtime/bufferpool/suballocator-test.cc A be/src/runtime/bufferpool/suballocator.cc A be/src/runtime/bufferpool/suballocator.h 6 files changed, 941 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/15 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. As we're beginning to run Impala end-to-end tests on remote clusters, we're finding some tests that do not pass for infrastructure-related reasons (as opposed to product issues.) It would be useful to be able to xfail any tests that we know to be problematic within a given module, yet still run the others. This way, we can get passing test runs as we're ironing out those infrastructure issues. Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Reviewed-on: http://gerrit.cloudera.org:8080/5446 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M tests/conftest.py M tests/metadata/test_compute_stats.py M tests/query_test/test_mt_dop.py 3 files changed, 24 insertions(+), 0 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Jim Apple has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 137: return Status::OK(); Now that I read one of Dan's comments, I wonder if this should be an error returned here. Otherwise, it looks like this method never returns an error and could just have a void return type. -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3202,IMPALA-2298: rework scratch file I/O
Tim Armstrong has uploaded a new patch set (#15). Change subject: IMPALA-3202,IMPALA-2298: rework scratch file I/O .. IMPALA-3202,IMPALA-2298: rework scratch file I/O Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into TmpFileMgr, in anticipation of it being shared with BufferPool. TmpFileMgr now handles: * Scratch space allocation and recycling * Read and write I/O The interface is also greatly changed so that it is built around Write() and Read() calls, abstracting away the details of temporary file allocation from clients. This means the TmpFileMgr::File class can be hidden from clients. Write error recovery: Also implement write error recovery in TmpFileMgr. If an error occurs while writing to scratch and we have multiple scratch directories, we will try one of the other directories before cancelling the query. File-level blacklisting is used to prevent excessive repeated attempts to resize a scratch file during a single query. Device-level blacklisting is still disabled because it is problematic to permanently take a scratch directory out of use. To reduce the number of error paths, all I/O errors are now handled asynchronously. Previously errors creating or extending the file were returned synchronously from WriteUnpinnedBlock(). This required modifying DiskIoMgr to create the file if not present when opened. Future Work: * Support for recycling variable-length scratch file ranges. I omitted this to avoid making the patch even large. Testing: Updated BufferedBlockMgr unit test to reflect changes in behaviour: * Scratch space is no longer permanently associated with a block, and is remapped every time a new block is written to disk . * Files are now blacklisted - updated existing tests and enable the disable blacklisting test. Added some basic testing of recycling of scratch file ranges in the TmpFileMgr unit test. I also manually tested the code in two ways. First by removing permissions for /tmp/impala-scratch and ensuring that a spilling query fails cleanly. Second, by creating a tiny ramdisk (16M) and running with two scratch directories: one on /tmp and one on the tiny ramdisk. When spilling, an out of space error is encountered for the tiny ramdisk and impala spills the remaining data (72M) to /tmp. Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/runtime/query-state.cc A be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h A be/src/util/buffer-ref.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M common/thrift/ImpalaInternalService.thrift 15 files changed, 1,151 insertions(+), 506 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/5141/15 -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Michael Brown has posted comments on this change. Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. Patch Set 2: Code-Review+1 Thanks for adding the additional test cases. My +1 means I'm satisfied that you addressed my comments, but obviously someone should take a look again at your FE changes. -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.h File be/src/runtime/bufferpool/suballocator.h: Line 104: static constexpr int64_t MAX_ALLOCATION_BYTES = 1L << LOG_MAX_ALLOCATION_BYTES; > this is really large (entire virtual address space on current x64 implement after reading the rest of the code I see it's used so to bound the depth of the tree and number of free lists. Maybe comment thats what it's for. -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 14: (8 comments) http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.cc File be/src/runtime/bufferpool/suballocator.cc: Line 68: extraneous blank Line 80: DCHECK(free_node != nullptr); it looks like this dcheck can fire if SplitToSize() fails to allocate a node, no? Line 107: } why do we need both policies? Line 216: *ptr_from_prev = move(node->next_free_); should this reset result->prev_free_ so we don't have a dangling reference, for easier debugging? http://gerrit.cloudera.org:8080/#/c/4715/14/be/src/runtime/bufferpool/suballocator.h File be/src/runtime/bufferpool/suballocator.h: Line 31: /// allocation algorithm optimised for power-of-two allocations. Uses a binary buddy does it require that the buffer pool buffers themselves are power of two? what size buffer pool pages does it allocate? Line 104: static constexpr int64_t MAX_ALLOCATION_BYTES = 1L << LOG_MAX_ALLOCATION_BYTES; this is really large (entire virtual address space on current x64 implementations). what does it guard against? PS14, Line 125: 'node' should already be free and not in : /// any free list what does this mean given that free nodes are in the free list? do you mean it was in the free list but already removed or something else? PS14, Line 159: support supported? -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message: Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML > In order to calculate %rows, we don't actually need to know how many rows a Ahh yes, you are right. Thanks for clarifying. Not sure what I was thinking. I was clearly wrong. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#7). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/7 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#7). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/7 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message: Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML > I agree you can achieve the same thing with mod values in principle. Let me In order to calculate %rows, we don't actually need to know how many rows are in the table. %rows = 100 / mod_value. This is why I think it still makes sense to keep the mod_values. Another way to think of mod_value is if mod_value = N, then the DML statement should affect every Nth row. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4649: add a mechanism to pass flags into make
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4649: add a mechanism to pass flags into make .. Patch Set 2: Code-Review+1 Rebased, carry +1 -- To view, visit http://gerrit.cloudera.org:8080/5480 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I17b13cbaf395f962762d5cff3d650ffb077934a4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4649: add a mechanism to pass flags into make
Hello Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5480 to look at the new patch set (#2). Change subject: IMPALA-4649: add a mechanism to pass flags into make .. IMPALA-4649: add a mechanism to pass flags into make Testing: Tested that buildall.sh works as expected. Built locally with IMPALA_MAKE_FLAGS unset to confirm I didn't break anything. Built locally with IMPALA_MAKE_FLAGS=--load-average=$IMPALA_BUILD_THREADS and looked at "ps auxf" output to confirm it's passed through. Change-Id: I17b13cbaf395f962762d5cff3d650ffb077934a4 --- M bin/impala-config.sh M bin/make_impala.sh M testdata/bin/copy-udfs-udas.sh 3 files changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/5480/2 -- To view, visit http://gerrit.cloudera.org:8080/5480 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I17b13cbaf395f962762d5cff3d650ffb077934a4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4659: fuzz test fixes
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4659: fuzz test fixes .. IMPALA-4659: fuzz test fixes * Apply a 512m mem_limit to all fuzz tests. This limits aggregate memory consumption to ~5GB per daemon(assuming 10 concurrent tests). * Refactor the exec option handling to use the exec_option dimension. This avoids executing the test multiple times redundantly * Remove the xfails to reduce noise, since there is no immediate plan to fix the product bugs. Instead just pass the tests. Testing: Ran in a loop for ~1h to flush out flakiness. Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Reviewed-on: http://gerrit.cloudera.org:8080/5502 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 41 insertions(+), 25 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4659: fuzz test fixes
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4659: fuzz test fixes .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] CDH-48291: Fix flaky test TestRequestPoolService
Matthew Jacobs has posted comments on this change. Change subject: CDH-48291: Fix flaky test TestRequestPoolService .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5507/1//COMMIT_MSG Commit Message: PS1, Line 7: CDH-48291 is this kosher? http://gerrit.cloudera.org:8080/#/c/5507/1/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java File fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java: PS1, Line 204: Thread.sleep(1L + CHECK_INTERVAL_MS + : AllocationFileLoaderService.ALLOC_RELOAD_WAIT_MS); so the only issue is that this will always sleep for 16sec (assuming the last var is 5sec). I think to avoid adding a lot of time to tests, we should check every second (or so) up to this amount of time. This is kinda a pain here because checkModifiedConfigResults() doesn't lend itself to this model as it's written, but otherwise this is usually going to be a lot longer than needed. I haven't looked at the code in a wihle, but maybe it's possible to wait on something in AllocationFileLoaderService? That seems best. Or otherwise maybe you catch exceptions from checkModifiedResults and retry every second up to 15 or 30 seconds at which point we rethrow the exception? That is hackier but easy. -- To view, visit http://gerrit.cloudera.org:8080/5507 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71a036d5b1227508e2175808d7e92837ae247d99 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/6/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 1471: # TODO: IMPALA-: Add support for tables with multiple primary keys. > fill in real JIRA or just leave the TODO Yes, I'll fill in the Jira. Michael wants every TODO to have a Jira attached. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Alex Behm has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/6/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 1471: # TODO: IMPALA-: Add support for tables with multiple primary keys. fill in real JIRA or just leave the TODO -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/6 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5093 to look at the new patch set (#6). Change subject: IMPALA-4467: Add support for DML statements in stress test .. IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The orignal tables could be used to bring the non-original table to the inital state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 --- M infra/python/deps/requirements.txt M tests/stress/concurrent_select.py M tests/util/parse_util.py 3 files changed, 689 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/5093/6 -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 1481: "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b " > To me it's not really about validating the results, but more about predicta Done. Skipping creation of DML queries for tables with more than 1 primary key column. -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] CDH-48291: Fix flaky test TestRequestPoolService
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/5507 Change subject: CDH-48291: Fix flaky test TestRequestPoolService .. CDH-48291: Fix flaky test TestRequestPoolService This commit fixes a flaky test caused by tight timeouts. The fix is to bump up the timeouts in this test. Change-Id: I71a036d5b1227508e2175808d7e92837ae247d99 --- M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/5507/1 -- To view, visit http://gerrit.cloudera.org:8080/5507 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I71a036d5b1227508e2175808d7e92837ae247d99 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5505 to look at the new patch set (#2). Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates The KuduScanNode attempts to push IN list predicates to the Kudu scan, but NULL literals cannot be pushed. The code in KuduScanNode needed to check if the Literals in the InPredicate is a NullLiteral, in which case the entire IN list should not be pushed to Kudu. The same handling is already in place for binary predicate pushdown. Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 2 files changed, 20 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5505/2 -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5505/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: PS1, Line 384: prediates > Typo: predicates Done http://gerrit.cloudera.org:8080/#/c/5505/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: PS1, Line 319: select id from functional_kudu.alltypestiny where id in (1, null); > Is it necessary to add additional tests for other types? (string, bool, etc Sure -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Michael Brown has posted comments on this change. Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5505/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: PS1, Line 384: prediates Typo: predicates http://gerrit.cloudera.org:8080/#/c/5505/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: PS1, Line 319: select id from functional_kudu.alltypestiny where id in (1, null); Is it necessary to add additional tests for other types? (string, bool, etc.) -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5505/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: PS1, Line 388: Preconditions.checkNotNull(value == null); > Shouldn't this be checkNotNull(value)? It depends, if you want this Precondition to do anything then yes :) Good catch. Fixing this too. -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5505/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: PS1, Line 388: Preconditions.checkNotNull(value == null); Shouldn't this be checkNotNull(value)? -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4631: don't use floating point operations for time unit conversions
Jim Apple has posted comments on this change. Change subject: IMPALA-4631: don't use floating point operations for time unit conversions .. Patch Set 6: Verified+1 http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/121/ -- To view, visit http://gerrit.cloudera.org:8080/5434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7237f579b201f5bd3930f66e9c2c8d700c37ffeb Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/5505 Change subject: IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates .. IMPALA-4662: Fix NULL literal handling in Kudu IN list predicates The KuduScanNode attempts to push IN list predicates to the Kudu scan, but NULL literals cannot be pushed. The code in KuduScanNode needed to check if the Literals in the InPredicate is a NullLiteral, in which case the entire IN list should not be pushed to Kudu. The same handling is already in place for binary predicate pushdown. Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 2 files changed, 15 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5505/1 -- To view, visit http://gerrit.cloudera.org:8080/5505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iaf2c10a326373ad80aef51a85cec64071daefa7b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Impala Public Jenkins has submitted this change and it was merged. Change subject: Add all build targets to CMake and speed up builds .. Add all build targets to CMake and speed up builds Use CMake's dependency resolution always instead of serial execution of targets via shell scripts. This improves parallelism by building fe, be, and other targets at the same time and avoid some overhead from invoking "make" multiple times. This reduces the time taken for an incremental compilation of fe and be from 56s to 24s with this command: ./buildall.sh -debug -noclean -notests -skiptests -ninja Also use Impala-lzo's build script. This depends on the IMPALA-4277 fixes to the Impala-lzo build script. Log directory creation is also moved from impala-config.sh to buildall.sh. This means that impala-config.sh has no side-effects and can be run concurrently with no issues. Also make sure that "make" builds all the same artifacts as buildall.sh when run with no args. Testing: Ran a jenkins core job, also experimented locally. Ran a jenkins core job with distcc disabled - this exposed some concurrency bugs where impala-config.sh fails if run concurrently. Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Reviewed-on: http://gerrit.cloudera.org:8080/4790 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/service/CMakeLists.txt M bin/impala-config.sh M bin/make_impala.sh M buildall.sh M ext-data-source/CMakeLists.txt M fe/CMakeLists.txt 8 files changed, 85 insertions(+), 66 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Impala Public Jenkins has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4648: remove build thirdparty.sh
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4648: remove build_thirdparty.sh .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5477 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I36e11384877e5115430baf0170b31a508cb01ba3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4648: remove build thirdparty.sh
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4648: remove build_thirdparty.sh .. IMPALA-4648: remove build_thirdparty.sh It is not needed by any build processes. Change-Id: I36e11384877e5115430baf0170b31a508cb01ba3 Reviewed-on: http://gerrit.cloudera.org:8080/5477 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- D bin/build_thirdparty.sh 1 file changed, 0 insertions(+), 244 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5477 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I36e11384877e5115430baf0170b31a508cb01ba3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. IMPALA-4654: KuduScanner must return when ReachedLimit() Fixes a bug in the KuduScanner where the scan node's limit was not respected and thus the scanner thread would continue executing until the scan range was fully consumed. This could result in completed queries leaving fragments running and those threads could be using significant CPU and memory. For example, the query 'select * from tpch_kudu.lineitem limit 90' when running in the minicluster and lineitem is partitioned into 3 hash partitions would end up leaving a scanner thread running for ~60 seconds. In real world scenarios this can cause unexpected resource consumption. This could build up over time leading to query failures if these queries are submitted frequently. The fix is to ensure KuduScanner::GetNext() returns with eos=true when it finds ReachedLimit=true. An unnecessary and somewhat confusing flag 'batch_done' was being returned by a helper function DecodeRowsIntoRowBatch, which isn't necessary and was removed in order to make it more clear how the code in GetNext() should behave. Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Reviewed-on: http://gerrit.cloudera.org:8080/5493 Reviewed-by: Alex Behm Reviewed-by: Tim Armstrong Reviewed-by: Dan Hecht Tested-by: Internal Jenkins --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M tests/query_test/test_kudu.py 3 files changed, 19 insertions(+), 28 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, but someone else must approve Dan Hecht: Looks good to me, approved Tim Armstrong: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-4659: fuzz test fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4659: fuzz test fixes .. Patch Set 3: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4659: fuzz test fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4659: fuzz test fixes .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5502/2//COMMIT_MSG Commit Message: PS2, Line 7: IMPALA-4569 > wrong jira? Done -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4659: fuzz test fixes
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5502 to look at the new patch set (#3). Change subject: IMPALA-4659: fuzz test fixes .. IMPALA-4659: fuzz test fixes * Apply a 512m mem_limit to all fuzz tests. This limits aggregate memory consumption to ~5GB per daemon(assuming 10 concurrent tests). * Refactor the exec option handling to use the exec_option dimension. This avoids executing the test multiple times redundantly * Remove the xfails to reduce noise, since there is no immediate plan to fix the product bugs. Instead just pass the tests. Testing: Ran in a loop for ~1h to flush out flakiness. Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 41 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5502/3 -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Dan Hecht has posted comments on this change. Change subject: IMPALA-4569: fuzz test fixes .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5502/2//COMMIT_MSG Commit Message: PS2, Line 7: IMPALA-4569 wrong jira? -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4633: Change broken gflag default for Kudu client mem
Dan Hecht has posted comments on this change. Change subject: IMPALA-4633: Change broken gflag default for Kudu client mem .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5503/2//COMMIT_MSG Commit Message: PS2, Line 11: The current default value of the : 'mutation buffer size' is 100MB which results in higher : write throughput than Kudu can currently handle on large : clusters. is there a kudu jira for making kudu tolerant of writers that are "too fast"? if so, can you mention it (or link the impala jira to it)? -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4633: Change broken gflag default for Kudu client mem
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-4633: Change broken gflag default for Kudu client mem .. IMPALA-4633: Change broken gflag default for Kudu client mem We discovered that the current Kudu client defaults in the KuduTableSink are causing a large number of queries to timeout, failing the query. The current default value of the 'mutation buffer size' is 100MB which results in higher write throughput than Kudu can currently handle on large clusters. By decreasing the value of this flag, more RPCs will be sent for the same amount of data, i.e. throttling the load on Kudu. We found tests to be more successful on 200 nodes with a 10MB buffer size than the previous 100MB value where most queries couldn't complete due to timeouts. These queries were not timing out with the 10MB value. This appears to work well on 9 node stress tests as well. Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h 2 files changed, 5 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/5503/2 -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
Alex Behm has posted comments on this change. Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4630: Change broken gflag default for Kudu client mem
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4630: Change broken gflag default for Kudu client mem .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5503/1//COMMIT_MSG Commit Message: Line 7: IMPALA-4630: Change broken gflag default for Kudu client mem > Wrong JIRA? Done Line 11: fail. The current mutation buffer size is 100MB, which is > can you qualify "fail"? did they time out? Done Line 15: successful on 200 node tests than the previous 100MB value. > Can you briefly describe the effect that this decrease has? For the same am Done -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
Alex Behm has posted comments on this change. Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3977: TransmitData() should not block
Lars Volker has posted comments on this change. Change subject: IMPALA-3977: TransmitData() should not block .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS1, Line 226: FLAGS_datastream_sender_timeout_ms > rather than have the sleep period be a function of the total timeout, it ma Does it make sense to add some exponential back-off here, let's say up to a maximum value? That way, if the sender out-races the receiver we will try again sooner and bound delays. http://gerrit.cloudera.org:8080/#/c/5491/1/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: Line 233: ("DATASTREAM_SENDER_TIMEOUT", 72, "Sender timed out waiting for receiver fragment " Should we mark this as deprecated here? PS1, Line 314: DATASTREAM_RECEIVER_EAGAIN Would it make sense to name this more general, e.g. "RPC_TRY_AGAIN" so we can re-use it in similar situations? Line 315:"Datastream receiver is not yet ready"), nit: single line? -- To view, visit http://gerrit.cloudera.org:8080/5491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4569: fuzz test fixes .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/122/ -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
Hello Michael Brown, Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5446 to look at the new patch set (#4). Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. As we're beginning to run Impala end-to-end tests on remote clusters, we're finding some tests that do not pass for infrastructure-related reasons (as opposed to product issues.) It would be useful to be able to xfail any tests that we know to be problematic within a given module, yet still run the others. This way, we can get passing test runs as we're ironing out those infrastructure issues. Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f --- M tests/conftest.py M tests/metadata/test_compute_stats.py M tests/query_test/test_mt_dop.py 3 files changed, 24 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/5446/4 -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
David Knupp has posted comments on this change. Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/5446/3/tests/conftest.py File tests/conftest.py: Line 480: Validate that pytest command line options makes sense. > typo: make sense Done Line 483: if "localhost" in pytest.config.option.impalad: > Any address starting with 127.x.x.x is the local machine I think. Done Line 483: if "localhost" in pytest.config.option.impalad: > or "127.0.0.1" ? Done http://gerrit.cloudera.org:8080/#/c/5446/3/tests/metadata/test_compute_stats.py File tests/metadata/test_compute_stats.py: Line 135: reason=("Because of the way we pre-split the data, setting up HBase " > add JIRA to the reason, so we can easily fix all the places later if needed Done -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Alex Behm has posted comments on this change. Change subject: IMPALA-4569: fuzz test fixes .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4715 to look at the new patch set (#14). Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. IMPALA-3200: Implement suballocator for splitting buffers This is useful for situations like hash tables, where we want to make multiple non-spillable allocations of variable size from buffer pool memory and not incur the overhead of interacting with the global buffer pool. The allocator subdivides buffers to service allocations and uses a buddy allocation algorithm to merge freed allocations into larger chunks. This helps avoid external fragmentation and is quite effective at reusing memory given the typical doubling allocation patterns of hash tables in partitioned aggs and joins. Testing: The allocator has fairly robust internal consistency checks via assertions and unique_ptrs. Includes a unit test that exercises various allocation patterns. I tested porting hash tables over to allocate memory using an earlier version of the suballocator, which worked well (was able to run a wide range of queries successfully with good performance). Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 --- M be/src/common/names.h M be/src/exec/hash-table.h M be/src/runtime/bufferpool/CMakeLists.txt A be/src/runtime/bufferpool/suballocator-test.cc A be/src/runtime/bufferpool/suballocator.cc A be/src/runtime/bufferpool/suballocator.h 6 files changed, 938 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/14 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 14: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/4715/13/be/src/runtime/bufferpool/suballocator.h File be/src/runtime/bufferpool/suballocator.h: PS13, Line 173: Suballocation objects are owned by : /// either client code (if in use), the free list (if free) or its left child (if split). > I think this could use calling out even more explicitly - the client has to Tried making it more explicit. -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4569: fuzz test fixes .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5502/1/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 59: 'disable_codegen' : cls.DISABLE_CODEGEN_VALUES, 'mem_limit' : cls.MEM_LIMITS})) > Should we add BATCH_SIZES here as well? I can see how you'd want to run sev That's a good point. Changed it to do that. -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4569: fuzz test fixes .. IMPALA-4569: fuzz test fixes * Apply a 512m mem_limit to all fuzz tests. This limits aggregate memory consumption to ~5GB per daemon(assuming 10 concurrent tests). * Refactor the exec option handling to use the exec_option dimension. This avoids executing the test multiple times redundantly * Remove the xfails to reduce noise, since there is no immediate plan to fix the product bugs. Instead just pass the tests. Testing: Ran in a loop for ~1h to flush out flakiness. Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 41 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5502/2 -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4631: don't use floating point operations for time unit conversions
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4631: don't use floating point operations for time unit conversions .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/121/ -- To view, visit http://gerrit.cloudera.org:8080/5434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7237f579b201f5bd3930f66e9c2c8d700c37ffeb Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430: enable codegen for native UDAs
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-1430: enable codegen for native UDAs .. IMPALA-1430: enable codegen for native UDAs This uses the existing infrastructure for codegening builtin UDAs and for codegening calls to UDFs. GetUdf() is refactored to support both UDFs and UDAs. IR UDAs are still not allowed by the frontend. It's unclear if we want to enable them going forward because of the difficulties in testing and supporting IR UDFs/UDAs. Testing: test_udfs.py tests UDAs with codegen enabled and disabled Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h 11 files changed, 163 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/5161/7 -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Impala Public Jenkins has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/120/ -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Jim Apple has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 7: > Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/120/ Sorry, that was me :-( -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430: enable codegen for native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430: enable codegen for native UDAs .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/5161/6/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: PS6, Line 542: Expr::InlineConstants(AnyValUtil::ColumnTypeToTypeDesc(intermediate_type()), : AnyValUtil::ColumnTypesToTypeDescs(arg_types), codegen, *uda_fn); > I am not sure I understand the intention of this. This is preserving the pre-existing behaviour of the UDA interface, which treats the last (in-out) argument as the return type in the function context. Changing how the types are exposed could break UDAs that were written against the old interface. I updated the comment to hopefully make this clearer. I don't feel strongly about whether this is the right API (on one hand, it doesn't match the C++ function signature, on the other the intermediate type is logically the return type) but I don't think it's worth breaking compatibility for. The constants aren't inlined into the input expressions, only to the UDA function itself, so the second scenario isn't possible. http://gerrit.cloudera.org:8080/#/c/5161/6/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS6, Line 327: !udf->isDeclaration() > May help to add a comment on why this can be a declaration only (i.e. no fu Done PS6, Line 328: InlineConstants(AnyValUtil::ColumnTypeToTypeDesc(type_), : AnyValUtil::ColumnTypesToTypeDescs(arg_types), codegen, udf); > Please see comments in AggFnEvaluator. It seems that we should be able to k That was the initial path I took but it's not possible because UDAs treat the final in/out argument as the return type. I could pass in an additional argument to determine which mode it should use if you prefer. PS6, Line 449: DCHECK(has_varargs || arg_types.size() == num_fixed_args); : DCHECK(!has_varargs || arg_types.size() > num_fixed_args); > DCHECK(arg_types.size() == num_fixed_args || (has_var_args && arg_types.si We actually want to reject the case of passing 0 varargs into a varargs function. The frontend currently rejects those cases and I added a comment to this function that specifically mentions this case. It would be reasonable to pass 0 args into varargs functions, but a bunch of code here doesn't handle that correctly - it assumes that having 0 varargs means that the function signature doesn't have a varargs argument. PS6, Line 478: codegen->void_type() : : CodegenAnyVal::GetLoweredType(codegen, *return_type); > nit: one line ? clang-format seems to prefer it this way. I don't feel strongly http://gerrit.cloudera.org:8080/#/c/5161/6/be/src/exprs/scalar-fn-call.h File be/src/exprs/scalar-fn-call.h: PS6, Line 59: 'cache_entry' > May help to have a comment about what 'cache_entry' is. Something like the Done Line 60: /// updated to point to the library (or its use count is incremented if already loaded). > Please add a comment that the caller is expected to call FinalizeFunction() Done -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4630: Change broken gflag default for Kudu client mem
Alex Behm has posted comments on this change. Change subject: IMPALA-4630: Change broken gflag default for Kudu client mem .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5503/1//COMMIT_MSG Commit Message: Line 7: IMPALA-4630: Change broken gflag default for Kudu client mem Wrong JIRA? Line 11: fail. The current mutation buffer size is 100MB, which is can you qualify "fail"? did they time out? Line 15: successful on 200 node tests than the previous 100MB value. Can you briefly describe the effect that this decrease has? For the same amount of data will we do more RPCs? -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Alex Behm has posted comments on this change. Change subject: IMPALA-4569: fuzz test fixes .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5502/1/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 59: 'disable_codegen' : cls.DISABLE_CODEGEN_VALUES, 'mem_limit' : cls.MEM_LIMITS})) Should we add BATCH_SIZES here as well? I can see how you'd want to run several batch sizes on the same file, but the same would also apply to DISABLE_CODEGEN_VALUES. Not asking you to change anything, just curious what you think. -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4630: Change broken gflag default for Kudu client mem
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4630: Change broken gflag default for Kudu client mem .. Patch Set 1: In addition to stress testing and the 200 node tests from Mostafa, this also passed a private test job. -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4630: Change broken gflag default for Kudu client mem
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/5503 Change subject: IMPALA-4630: Change broken gflag default for Kudu client mem .. IMPALA-4630: Change broken gflag default for Kudu client mem We discovered that the current kudu client defaults in the kudu table sink are causing a large number of queries to fail. The current mutation buffer size is 100MB, which is results in the higher write throughput than Kudu can currently handle on large clusters. By decreasing the default value of this flag, we found tests to be more successful on 200 node tests than the previous 100MB value. This value appeared to work well on 9 node stress tests as well. Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h 2 files changed, 4 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/5503/1 -- To view, visit http://gerrit.cloudera.org:8080/5503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0b3544f9a93c82e347f6e97540d6b561c30d09fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Tim Armstrong has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 7: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Impala Public Jenkins has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/118/ -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Impala Public Jenkins has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/117/ -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4648: remove build thirdparty.sh
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4648: remove build_thirdparty.sh .. Patch Set 2: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/5477 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I36e11384877e5115430baf0170b31a508cb01ba3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4648: remove build thirdparty.sh
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4648: remove build_thirdparty.sh .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/116/ -- To view, visit http://gerrit.cloudera.org:8080/5477 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I36e11384877e5115430baf0170b31a508cb01ba3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4569: fuzz test fixes
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/5502 Change subject: IMPALA-4569: fuzz test fixes .. IMPALA-4569: fuzz test fixes * Apply a 512m mem_limit to all fuzz tests. This limits aggregate memory consumption to ~5GB per daemon(assuming 10 concurrent tests). * Refactor the exec option handling to use the exec_option dimension. This avoids executing the test multiple times redundantly * Remove the xfails to reduce noise, since there is no immediate plan to fix the product bugs. Instead just pass the tests. Testing: Ran in a loop for ~1h to flush out flakiness. Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 23 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5502/1 -- To view, visit http://gerrit.cloudera.org:8080/5502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie1942ceef252ec3e6171a0a54722b66a7d9abbd7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-3977: TransmitData() should not block
Henry Robinson has posted comments on this change. Change subject: IMPALA-3977: TransmitData() should not block .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-mgr.cc File be/src/runtime/data-stream-mgr.cc: PS1, Line 53: STREAM_EXPIRATION_TIME_MS Do you think we still need the stream cache? In the case where a receiver has completed before it receives an initial RPC, we should be able to rely on cancellation being detected on the sender side. The cost is that we might retry that initial RPC a couple of times before we receive cancellation from the coordinator. Can you think this through and work out whether there's a state where the stream cache is really worth the extra complexity? PS1, Line 99: if (acquire_lock) lock_.lock(); Can we just require now that the caller takes the lock? http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS1, Line 209: TTransmitDataResult res; Let's move this inside the loop so it's freshly initialized on each iteration. That avoids any bugs where one RPC returns EAGAIN and sets a field in the result that - for some reason - a subsequent successful RPC does not overwrite. PS1, Line 217: DoTransmitDataRpc Since a row batch can potentially be many megabytes large, I wonder if it makes sense to have an initial 'probe' RPC that has no payload, but just confirms that the receiver is ready. Then we can limit the retry logic to that probe only, and if a subsequent RPC returns EAGAIN, we know that the receiver is closed and don't need to retry. Line 224: if (res.status.status_code != TErrorCode::DATASTREAM_RECEIVER_EAGAIN) break; Is cancellation detected during this loop? That's part of the point of moving control back to the sender on EAGAIN, so that cancellation can be detected. Otherwise this blocks from the sender's perspective in the same way that it did before. PS1, Line 225: VLOG_RPC << "Datastream receiver not yet ready. Retrying"; Not very informative without the fragment ID / destination address etc. PS1, Line 226: FLAGS_datastream_sender_timeout_ms rather than have the sleep period be a function of the total timeout, it makes more sense to have a fixed retry interval and exit this loop if the timeout has been exceeded. For example, if someone sets the timeout to ten minutes (because they're being ultra conservative on a heavy cluster), this will retry every two minutes. Line 228: I think it's a bit confusing to have this method return EAGAIN upon failure. Can you replace the status with timeout if it times-out of the loop? -- To view, visit http://gerrit.cloudera.org:8080/5491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4057 and IMPALA-4050 Support starting webserver specified by hostname or "127.0.0.1"
Jim Apple has posted comments on this change. Change subject: IMPALA-4057 and IMPALA-4050 Support starting webserver specified by hostname or "127.0.0.1" .. Patch Set 1: > Why do this patch cannot be merged? Gerrit adds the bolded red "Cannot Merge" comment when a patch conflicts with some changes that have been made to the repository since the patch was first sent to Gerrit. Additionally, your patch has not been merged because you have not addressed the comments by Lars Volker. -- To view, visit http://gerrit.cloudera.org:8080/4553 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I54280a25c3709e2f316a17a6baf33dbbbae720c0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: hewenting Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: hewenting Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2605: Omit the sort and mini stress tests
Jim Apple has posted comments on this change. Change subject: IMPALA-2605: Omit the sort and mini stress tests .. Patch Set 2: > Maybe we should dump out the value of some of these metrics before > running the tests in tests/run-tests.py. That would help diagnose > in future. So you're suggesting we add logging for, say, impala.thrift-server.backend.connections-in-use, impala.thrift-server.beeswax-frontend.connections-in-use, and impala.thrift-server.hiveserver2-frontend.connections-in-use, abandon this patch, and see what we can diagnose the next time this happens? -- To view, visit http://gerrit.cloudera.org:8080/5401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd30abf8215415e0f2830b725e43b005daa2bb2d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4163: Add sortby() query hint
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4163: Add sortby() query hint .. Patch Set 3: (18 comments) http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 370: nonterminal List opt_plan_hints, plan_hint_list; do you really need both? why not have a non-existing hint = empty hint list? Line 2397: | /* empty */ what's the point of this? http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 127: // Contains the result exprs corresponding to the columns of the target table, that are what do you mean by result exprs? are these basetbl exprs? Line 774: private void analyzeSortByColumns(List columnNames) throws AnalysisException { use line breaks where it makes sense. also, describe the rules for sort-by columns somewhere. analyzeSortByHint is more general (and correct) Line 787: } else { you'd also take this branch for hbase http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/main/java/org/apache/impala/analysis/PlanHint.java File fe/src/main/java/org/apache/impala/analysis/PlanHint.java: Line 34: private final String hintName_; remove 'hint' from members, this class already contains 'hint' in its name. Line 62: public String toString() { toSql test missing http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java File fe/src/main/java/org/apache/impala/analysis/TableRef.java: Line 87: // ArrayList initialization each. yes, please do. http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS3, Line 511: clustered/noclustered plan :* hint and the sortby() "on the clustered/noclustered/sortby plan hint" is shorter Line 515:* will also sort by all columns specified in the sortby() hint. rewrite comment, instead of just tacking on http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/main/jflex/sql-scanner.flex File fe/src/main/jflex/sql-scanner.flex: Line 331: HintContent = [ ]* "+" [^\r\n*]* isn't [ ]* the same as " "*? Line 336: ContainsCommentEnd = [^]* "*/" [^]* what is [^]*? shouldn't 'anything' be simply ".*"? Line 342: TraditionalComment = "/*" !({HintContent}|{ContainsCommentEnd}) "*/" why do you need the ContainsCommentEnd here? wouldn't /* */ */ simply be a parse error? Line 434: yybegin(HINT); shouldn't this only apply to end-of-line hints? http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: Line 1776: "partition (year, month) %sshuffle,clustered,sortby(int_col,bool_col)%s " + mix up the order a bit http://gerrit.cloudera.org:8080/#/c/5051/3/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: Line 216: ParserError("select 1 /*+ sortby(() */"); also check that this is illegal: /* +sortby()\n and -- +sortby() */ \n Line 477: "insert into t %sSoRtBy( a , , ,,, b )%s select * from t", prefix, suffix)); is the capitalization necessary? http://gerrit.cloudera.org:8080/#/c/5051/3/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 758:/*+ clustered,sortby(int_col, bool_col) */ move clustered to end -- To view, visit http://gerrit.cloudera.org:8080/5051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I37a3ffab99aaa5d5a4fd1ac674b3e8b394a3c4c0 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2605: Omit the sort and mini stress tests
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2605: Omit the sort and mini stress tests .. Patch Set 2: I guess I'm uneasy that we don't fully understand the cause of the problem. It seems like it's probably a test bug and I don't think we have a path to solving that. I looked again at the backtraces in the ticket and just see a bunch of threads waiting to read from the client connection. My other thought is that, even if we disable these tests, will the same problem just happen with the other enabled stress tests. Maybe we should dump out the value of some of these metrics before running the tests in tests/run-tests.py. That would help diagnose in future. -- To view, visit http://gerrit.cloudera.org:8080/5401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd30abf8215415e0f2830b725e43b005daa2bb2d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4631: don't use floating point operations for time unit conversions
Dan Hecht has posted comments on this change. Change subject: IMPALA-4631: don't use floating point operations for time unit conversions .. Patch Set 6: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/5434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7237f579b201f5bd3930f66e9c2c8d700c37ffeb Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4640: Fix number of rows displayed by parquet-reader tool
Lars Volker has posted comments on this change. Change subject: IMPALA-4640: Fix number of rows displayed by parquet-reader tool .. Patch Set 1: (3 comments) Thanks for the review, please see PS2. http://gerrit.cloudera.org:8080/#/c/5453/1/be/src/util/parquet-reader.cc File be/src/util/parquet-reader.cc: PS1, Line 153: const uint8_t* page) { > This can be moved to the line above. Done, thanks for catching this. Line 255: vector column_rows; > A little confusing, maybe 'column_num_rows'? Done, renamed column_sizes to make it more clear, too. Line 299: assert(column_rows[0] == column_rows[c]); > Add a comment, eg. "Check that all cols have the same number of rows" Done -- To view, visit http://gerrit.cloudera.org:8080/5453 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I281a784a0aa2df4ed1852dfb864587a0c1aa4d9a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4640: Fix number of rows displayed by parquet-reader tool
Lars Volker has uploaded a new patch set (#2). Change subject: IMPALA-4640: Fix number of rows displayed by parquet-reader tool .. IMPALA-4640: Fix number of rows displayed by parquet-reader tool The variable just never got updated in the code. This change also adds verification that all columns contain the same number of rows. Change-Id: I281a784a0aa2df4ed1852dfb864587a0c1aa4d9a --- M be/src/util/parquet-reader.cc 1 file changed, 19 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/5453/2 -- To view, visit http://gerrit.cloudera.org:8080/5453 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I281a784a0aa2df4ed1852dfb864587a0c1aa4d9a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3977: TransmitData() should not block
Lars Volker has posted comments on this change. Change subject: IMPALA-3977: TransmitData() should not block .. Patch Set 1: I'll do the first round of reviews. -- To view, visit http://gerrit.cloudera.org:8080/5491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Dan Hecht has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] Clarify synchronization policy for 'done ' in KuduScanNode
Lars Volker has posted comments on this change. Change subject: Clarify synchronization policy for 'done_' in KuduScanNode .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5494/1//COMMIT_MSG Commit Message: Line 7: Clarify synchronization policy for 'done_' in KuduScanNode Is there a Jira? PS1, Line 9: calrify typo http://gerrit.cloudera.org:8080/#/c/5494/1/be/src/exec/kudu-scan-node.h File be/src/exec/kudu-scan-node.h: Line 99: /// contention. Does it make sense to add small comments at the particular places, explaining why it is not necessary to protect the reads? -- To view, visit http://gerrit.cloudera.org:8080/5494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I72c5f1fecf4bcf737d71a1fa13e59361604485f0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: > Change looks good to me. > > As discussed, I think there is still one remaining bug with plans > like scan+topn where the coordinator may not necessarily cancel > fragments containing scans. In that case Close() is called on the > plan tree during tear down, but unlike other scan nodes the Kudu > scan node does not use done_ to terminate the scanner threads. For > non-Kudu scan nodes setting done_ triggers teardown of all scanner > threads. The scanner threads detect this indirectly via > ScannerContext::cancelled() which is checked on a per-row-batch > basis. I can't repro this yet (topn doesn't do it) so I'm hesitant to make the change in this CR, but I agree we should do it separately. It's worth digging into the topn case and seeing why it doesn't repro, but I can't spend too much time on that right now. I'll make a separate change to check done_ and that way we don't have to take it into a release branch until it has more bake time / time to find a repro. This does deserve more attention but given I can't find an obvious bug I can't work on it right now. > > Todd has a good point. Our limit check is broken. Agree to deal > with that everywhere in s separate change. Filed https://issues.cloudera.org/browse/IMPALA-4658 -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4639: Add pytest option and xfail markers for tests that only run locally.
Lars Volker has posted comments on this change. Change subject: IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5446/3/tests/conftest.py File tests/conftest.py: Line 483: if "localhost" in pytest.config.option.impalad: > or "127.0.0.1" ? Any address starting with 127.x.x.x is the local machine I think. -- To view, visit http://gerrit.cloudera.org:8080/5446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2605: Omit the sort and mini stress tests
Jim Apple has posted comments on this change. Change subject: IMPALA-2605: Omit the sort and mini stress tests .. Patch Set 2: > I'm much more in favor of deleting code we don't need instead of > leaving it and disabling it without understanding what is > happening. Tim, what do you think about deleting this code? -- To view, visit http://gerrit.cloudera.org:8080/5401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd30abf8215415e0f2830b725e43b005daa2bb2d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Avoid updating Expr's states from ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Avoid updating Expr's states from ExprContext .. Patch Set 1: (14 comments) Looking pretty good, mainly minor comments. http://gerrit.cloudera.org:8080/#/c/5483/1/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: PS1, Line 80: CreateInputsExprTrees CreateInputExprTrees? http://gerrit.cloudera.org:8080/#/c/5483/1/be/src/exprs/expr-context.h File be/src/exprs/expr-context.h: Line 51: /// The newly created ExprContext is returned. Could probably omit the second sentence - should be obvious from the function signature and name. Doesn't really matter either way though. Line 129: /// TODO: is this still necessary? It looks like this is a bit of a nasty hack so that output floating-point values from Round() are converted to text in a way that "looks right". Should file a JIRA and reference it. http://gerrit.cloudera.org:8080/#/c/5483/1/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 184: static Status CreateExprTree(ObjectPool* pool, const TExpr& texpr, ExprContext** ctx); Are you going to change this as a later step? It seems like it would make sense to further separate creation of the Expr and creation of the initial ExprContexts. Maybe add a TODO if this is just an intermediate step - it's a little confusing otherwise Line 372: const int fn_context_index_; I think we need some more cleanup of the handling here. Based on the expr subclass, this should always be -1 or not, right? We're checking if it's -1 in a few places, e.g. ScalarFnCall and CaseExpr, where it shouldn't be necessary if I understood correctly. It would be good to also add DCHECK_GE(fn_context_index, 0) in the constructors of the appropriate classes. PS1, Line 378: may does PS1, Line 378: can will PS1, Line 391: Parses Is it parsing the tree or just walking it? PS1, Line 393: RegisterFnContext Maybe RegisterFnContexts? Since it may register multiple. PS1, Line 431: An Exp "Each expr in the tree" just to be clearer that it's recursive. PS1, Line 432: FunctionContex FunctionContext. http://gerrit.cloudera.org:8080/#/c/5483/1/be/src/exprs/hive-udf-call.h File be/src/exprs/hive-udf-call.h: Line 68: Extra line http://gerrit.cloudera.org:8080/#/c/5483/1/be/src/exprs/scalar-fn-call.h File be/src/exprs/scalar-fn-call.h: Line 66: virtual int ComputeVarArgsBufferSize() const; Add an override specifier. We don't use it universally yet since it was only added in c++11 but I think it's helpful. http://gerrit.cloudera.org:8080/#/c/5483/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 148: status = expr_ctx->GetFnContextError(); This is much cleaner. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/5493/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS2, Line 200: scan_node_->ReachedLimit() > Yeah this is a good point, I'll file a separate JIRA for us to address how Agree we should fix it. I think we won't run into any (serious) problems in practice, since there's a memory barrier when we acquire the queue locks to add each batch. The once-per-row check here may not behave as expected since it could be cached. In the HDFS scanners I believe it's only checked once per batch anyway. The scan node needs to truncate the batches correctly regardless. I don't think we need to check this every row for Kudu, so we should consider changing it. -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes