[ 
https://issues.apache.org/jira/browse/IMPALA-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107601#comment-17107601
 ] 

Tim Armstrong commented on IMPALA-2797:
---------------------------------------

We probably won't do any work on this, since this queue is no longer used 
when mt_dop > 0.

> scanner threads can act like a thundering herd
> ----------------------------------------------
>
>                 Key: IMPALA-2797
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2797
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: trace_Tue_Dec_22_2015_2.27.03_PM.json.gz
>
>
> I noticed this issue with the Kudu scan node implementation, but I imagine it 
> could happen with HDFS as well:
> - on a big box, we started up 48 scanner threads for a 'SELECT COUNT(*)' query
> - the underlying batches that are read from Kudu return a few million rows 
> each (because it's trying to do large IOs to amortize round-trips, and the 
> projection is empty for COUNT(*))
> -- the scanner thread chops these Kudu batches into RowBatches of 1000 rows 
> each and pushes those onto the RowBatchQueue
> Because each backend IO (scan RPC to Kudu) results in thousands of Impala 
> RowBatches, we end up with the main thread pulling "round robin" from all of 
> the scanner threads, rather than exhausting one Kudu batch before moving to 
> the next. The issue here is that we see the following:
> - when the query starts, 48 threads hammer the Kudu server with Scan RPCs
> - the Kudu server is then completely quiet for ~30 seconds while they drain 
> their buffers
> - all of the buffers "empty" at basically the same time, and we get another 
> herd of IO on the Kudu side.
> It would be preferable to make the RowBatchQueue "unfair" in some way, such 
> that the main thread exhausts entire IO buffers at a time, rather than 
> pulling little bits from each of the threads in a round-robin fashion.
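
To make the difference between the two draining orders concrete, here is a toy Python model. It is only a sketch and not Impala or Kudu code: the names simulate and max_gap, the policy strings, and the "one tick per RowBatch pulled" clock are all assumptions made for illustration, refills are treated as instantaneous, and the initial burst of scan RPCs at query start is not modeled. Each scanner holds a buffer of RowBatches from one scan RPC and issues a new RPC when that buffer runs dry.

```python
def simulate(num_scanners, batches_per_rpc, total_pulls, policy):
    """Return the ticks at which a scanner issues a refill scan RPC.

    policy is "round_robin" (take one batch from each scanner in turn)
    or "drain" (exhaust one scanner's buffer before moving to the next).
    Both names are hypothetical labels for this toy model.
    """
    buffers = [batches_per_rpc] * num_scanners  # batches left per scanner
    rpc_ticks = []
    current = 0
    for tick in range(total_pulls):
        buffers[current] -= 1  # the main thread pulls one RowBatch
        if buffers[current] == 0:
            # Buffer exhausted: this scanner goes back to the server.
            rpc_ticks.append(tick)
            buffers[current] = batches_per_rpc
            current = (current + 1) % num_scanners
        elif policy == "round_robin":
            current = (current + 1) % num_scanners
    return rpc_ticks


def max_gap(ticks):
    """Longest quiet period (in ticks) between consecutive refill RPCs."""
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    fair = simulate(4, 3, 24, "round_robin")
    unfair = simulate(4, 3, 24, "drain")
    print(fair, max_gap(fair))      # [8, 9, 10, 11, 20, 21, 22, 23] 9
    print(unfair, max_gap(unfair))  # [2, 5, 8, 11, 14, 17, 20, 23] 3
```

In this model the round-robin order bunches the refill RPCs together with long quiet gaps in between, while the drain order spaces them evenly, which is the behavior the description asks for.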



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

