[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. IMPALA-4654: KuduScanner must return when ReachedLimit() Fixes a bug in the KuduScanner where the scan node's limit was not respected and thus the scanner thread would continue executing until the scan range was fully consumed. This could result in completed queries leaving fragments running and those threads could be using significant CPU and memory. For example, the query 'select * from tpch_kudu.lineitem limit 90' when running in the minicluster and lineitem is partitioned into 3 hash partitions would end up leaving a scanner thread running for ~60 seconds. In real world scenarios this can cause unexpected resource consumption. This could build up over time leading to query failures if these queries are submitted frequently. The fix is to ensure KuduScanner::GetNext() returns with eos=true when it finds ReachedLimit=true. An unnecessary and somewhat confusing flag 'batch_done' was being returned by a helper function DecodeRowsIntoRowBatch, which isn't necessary and was removed in order to make it more clear how the code in GetNext() should behave. Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Reviewed-on: http://gerrit.cloudera.org:8080/5493 Reviewed-by: Alex BehmReviewed-by: Tim Armstrong Reviewed-by: Dan Hecht Tested-by: Internal Jenkins --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M tests/query_test/test_kudu.py 3 files changed, 19 insertions(+), 28 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, but someone else must approve Dan Hecht: Looks good to me, approved Tim Armstrong: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: > Change looks good to me. > > As discussed, I think there is still one remaining bug with plans > like scan+topn where the coordinator may not necessarily cancel > fragments containing scans. In that case Close() is called on the > plan tree during tear down, but unlike other scan nodes the Kudu > scan node does not use done_ to terminate the scanner threads. For > non-Kudu scan nodes setting done_ triggers teardown of all scanner > threads. The scanner threads detect this indirectly via > ScannerContext::cancelled() which is checked on a per-row-batch > basis. I can't repro this yet (topn doesn't do it) so I'm hesitant to make the change in this CR, but I agree we should do it separately. It's worth digging into the topn case and seeing why it doesn't repro, but I can't spend too much time on that right now. I'll make a separate change to check done_ and that way we don't have to take it into a release branch until it has more bake time / time to find a repro. This does deserve more attention but given I can't find an obvious bug I can't work on it right now. > > Todd has a good point. Our limit check is broken. Agree to deal > with that everywhere in s separate change. Filed https://issues.cloudera.org/browse/IMPALA-4658 -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Alex Behm has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: Code-Review+1 Change looks good to me. As discussed, I think there is still one remaining bug with plans like scan+topn where the coordinator may not necessarily cancel fragments containing scans. In that case Close() is called on the plan tree during tear down, but unlike other scan nodes the Kudu scan node does not use done_ to terminate the scanner threads. For non-Kudu scan nodes setting done_ triggers teardown of all scanner threads. The scanner threads detect this indirectly via ScannerContext::cancelled() which is checked on a per-row-batch basis. Todd has a good point. Our limit check is broken. Agree to deal with that everywhere in s separate change. -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: BTW this passed a private jenkins run -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: (3 comments) Alex, per our conversation about other cases- are you OK with investigating other potential related cases separately? I'd like to try to get this patch in since we know this case exists now and is very bad. http://gerrit.cloudera.org:8080/#/c/5493/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 105: RETURN_IF_CANCELLED(state_); > Does this do anything? These should work when the query is actually sent the cancellation :) as we discussed in person, Sailesh found that the coordinator doesn't send the cancel rpc to fragments that have reported 'done' as they would in this case where the limit was reached. PS2, Line 200: scan_node_->ReachedLimit() > this isn't specific to Kudu, but it seems wrong that in this and other plac Yeah this is a good point, I'll file a separate JIRA for us to address how we're doing this everywhere. Doesn't seem to cause any issues here so I'd prefer we deal with it as a separate issue. http://gerrit.cloudera.org:8080/#/c/5493/2/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 627: v.wait_for_metric("impala-server.num-fragments-in-flight", 0, timeout=30) > How sure are you this won't be flaky? Time seems tight to me. On my box this happened now nearly immediately (e.g. I can set this to 0), I also found this was successful w/ 30 on a test run. I don't see why it would take longer once this change is in? I'm happy to make it longer to be safe if you prefer though. -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Alex Behm has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/5493/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 105: RETURN_IF_CANCELLED(state_); Does this do anything? Other scanners do: RETURN_IF_ERROR(state_->GetQueryStatus()); Looks like we also do that in DecodeRowsIntoRowBatch() Any idea why these checks were not sufficient to stop the thread from continuing to execute? http://gerrit.cloudera.org:8080/#/c/5493/2/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 627: v.wait_for_metric("impala-server.num-fragments-in-flight", 0, timeout=30) How sure are you this won't be flaky? Time seems tight to me. -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit() .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/5493/1/be/src/exec/kudu-scanner.h File be/src/exec/kudu-scanner.h: PS1, Line 70: current scanner Should this be "current Kudu scan batch" ? -- To view, visit http://gerrit.cloudera.org:8080/5493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes