[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-14 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-14 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


IMPALA-4654: KuduScanner must return when ReachedLimit()

Fixes a bug in the KuduScanner where the scan node's limit
was not respected and thus the scanner thread would
continue executing until the scan range was fully consumed.
This could result in completed queries leaving fragments
running and those threads could be using significant CPU and
memory.

For example, the query 'select * from tpch_kudu.lineitem
limit 90' when running in the minicluster and lineitem is
partitioned into 3 hash partitions would end up leaving a
scanner thread running for ~60 seconds. In real world
scenarios this can cause unexpected resource consumption.
This could build up over time leading to query failures if
these queries are submitted frequently.

The fix is to ensure KuduScanner::GetNext() returns with
eos=true when it finds ReachedLimit=true. An unnecessary and
somewhat confusing flag 'batch_done' was being returned by a
helper function DecodeRowsIntoRowBatch, which isn't
necessary and was removed in order to make it more clear how
the code in GetNext() should behave.

Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Reviewed-on: http://gerrit.cloudera.org:8080/5493
Reviewed-by: Alex Behm 
Reviewed-by: Tim Armstrong 
Reviewed-by: Dan Hecht 
Tested-by: Internal Jenkins
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M tests/query_test/test_kudu.py
3 files changed, 19 insertions(+), 28 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved
  Tim Armstrong: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 2:

> Change looks good to me.
 > 
 > As discussed, I think there is still one remaining bug with plans
 > like scan+topn where the coordinator may not necessarily cancel
 > fragments containing scans. In that case Close() is called on the
 > plan tree during tear down, but unlike other scan nodes the Kudu
 > scan node does not use done_ to terminate the scanner threads. For
 > non-Kudu scan nodes setting done_ triggers teardown of all scanner
 > threads. The scanner threads detect this indirectly via
 > ScannerContext::cancelled() which is checked on a per-row-batch
 > basis.

I can't repro this yet (topn doesn't do it) so I'm hesitant to make the change 
in this CR, but I agree we should do it separately. It's worth digging into the 
topn case and seeing why it doesn't repro, but I can't spend too much time on 
that right now. I'll make a separate change to check done_ and that way we 
don't have to take it into a release branch until it has more bake time / time 
to find a repro. This does deserve more attention but given I can't find an 
obvious bug I can't work on it right now.

 > 
 > Todd has a good point. Our limit check is broken. Agree to deal
 > with that everywhere in s separate change.

Filed https://issues.cloudera.org/browse/IMPALA-4658

-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 2: Code-Review+1

Change looks good to me.

As discussed, I think there is still one remaining bug with plans like 
scan+topn where the coordinator may not necessarily cancel fragments containing 
scans. In that case Close() is called on the plan tree during tear down, but 
unlike other scan nodes the Kudu scan node does not use done_ to terminate the 
scanner threads. For non-Kudu scan nodes setting done_ triggers teardown of all 
scanner threads. The scanner threads detect this indirectly via 
ScannerContext::cancelled() which is checked on a per-row-batch basis.

Todd has a good point. Our limit check is broken. Agree to deal with that 
everywhere in s separate change.

-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-13 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 2:

BTW this passed a private jenkins run

-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-13 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 2:

(3 comments)

Alex, per our conversation about other cases- are you OK with investigating 
other potential related cases separately? I'd like to try to get this patch in 
since we know this case exists now and is very bad.

http://gerrit.cloudera.org:8080/#/c/5493/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 105: RETURN_IF_CANCELLED(state_);
> Does this do anything?
These should work when the query is actually sent the cancellation :) as we 
discussed in person, Sailesh found that the coordinator doesn't send the cancel 
rpc to fragments that have reported 'done' as they would in this case where the 
limit was reached.


PS2, Line 200: scan_node_->ReachedLimit()
> this isn't specific to Kudu, but it seems wrong that in this and other plac
Yeah this is a good point, I'll file a separate JIRA for us to address how 
we're doing this everywhere. Doesn't seem to cause any issues here so I'd 
prefer we deal with it as a separate issue.


http://gerrit.cloudera.org:8080/#/c/5493/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 627:   v.wait_for_metric("impala-server.num-fragments-in-flight", 0, 
timeout=30)
> How sure are you this won't be flaky? Time seems tight to me.
On my box this happened now nearly immediately (e.g. I can set this to 0), I 
also found this was successful w/ 30 on a test run. I don't see why it would 
take longer once this change is in? I'm happy to make it longer to be safe if 
you prefer though.


-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5493/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 105: RETURN_IF_CANCELLED(state_);
Does this do anything?

Other scanners do:

RETURN_IF_ERROR(state_->GetQueryStatus());

Looks like we also do that in

DecodeRowsIntoRowBatch()

Any idea why these checks were not sufficient to stop the thread from 
continuing to execute?


http://gerrit.cloudera.org:8080/#/c/5493/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 627:   v.wait_for_metric("impala-server.num-fragments-in-flight", 0, 
timeout=30)
How sure are you this won't be flaky? Time seems tight to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4654: KuduScanner must return when ReachedLimit()

2016-12-13 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4654: KuduScanner must return when ReachedLimit()
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5493/1/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

PS1, Line 70: current scanner
Should this be "current Kudu scan batch" ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaddd5a1b2647995d68e6d37d0500b3a322de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes