[ 
https://issues.apache.org/jira/browse/KYLIN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374191#comment-15374191
 ] 

Ma Gang commented on KYLIN-1872:
--------------------------------

Yes, the interruption only happens on the client side. Its only purpose is to 
protect the Kylin server from running out of memory (OOM) because of a large 
query, not to protect the region server.
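A minimal sketch of what a client-side guard of this kind could look like, assuming the client tracks the bytes returned by each region server RPC and holds a Future per outstanding RPC task. The class and method names (ResultSizeGuard, register, onChunk) are hypothetical and not Kylin's actual API. Note that cancelling the futures only stops the client threads; the coprocessor on the region server keeps scanning.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side guard; names are illustrative, not Kylin's API.
final class ResultSizeGuard {
    private final long limitBytes;
    private final AtomicLong received = new AtomicLong();
    private final List<Future<?>> rpcTasks = new CopyOnWriteArrayList<>();

    ResultSizeGuard(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Track an outstanding RPC task so it can be cancelled later. */
    void register(Future<?> task) {
        rpcTasks.add(task);
    }

    /** Called for each chunk a region server returns; false once the limit is exceeded. */
    boolean onChunk(int chunkBytes) {
        long total = received.addAndGet(chunkBytes);
        if (total > limitBytes) {
            // Interrupts client-side threads only; the region server is not told to stop.
            for (Future<?> task : rpcTasks) {
                task.cancel(true);
            }
            return false;
        }
        return true;
    }

    long received() {
        return received.get();
    }
}
```

In use, the query thread would treat a false return from onChunk as the signal to abort the query and report an error instead of buffering more rows.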

Totally agree that the final solution is to interrupt the coprocessor, but that's a 
long way to go. Do you have any ideas on how to accomplish this? The only solution 
that comes to my mind is: use another coprocessor interface to mark a specific 
query as stopped, store that state in a static map, and have the visitCube 
interface check the query state periodically and quit the procedure if it finds 
the query stopped. Alternatively, the state could be stored in external storage.
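The static-map idea could be sketched roughly as follows. StoppedQueries and CubeVisitLoop are hypothetical stand-ins: markStopped would be called from a second (also hypothetical) coprocessor endpoint, and scan plays the role of the row loop inside visitCube. Because a static set lives in a single region server JVM, the stop request would have to reach every region server scanning for that query, and entries need to be cleared when the query ends.

```java
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "static map" idea; not Kylin's actual coprocessor code.
final class StoppedQueries {
    // One set per region server JVM; a stop request must reach every server involved.
    private static final Set<String> STOPPED = ConcurrentHashMap.newKeySet();

    private StoppedQueries() {}

    /** Would be invoked by a second, hypothetical coprocessor endpoint. */
    static void markStopped(String queryId) {
        STOPPED.add(queryId);
    }

    static boolean isStopped(String queryId) {
        return STOPPED.contains(queryId);
    }

    /** Remove the entry when the query finishes so the set does not grow forever. */
    static void clear(String queryId) {
        STOPPED.remove(queryId);
    }
}

final class CubeVisitLoop {
    /** Stands in for the visitCube scan loop; checks the flag every checkInterval rows (> 0). */
    static int scan(String queryId, Iterator<byte[]> rows, int checkInterval) {
        int visited = 0;
        while (rows.hasNext()) {
            if (visited % checkInterval == 0 && StoppedQueries.isStopped(queryId)) {
                break; // quit the procedure early
            }
            rows.next();
            visited++;
        }
        return visited;
    }
}
```

Checking only every checkInterval rows keeps the cost of the lookup small on tight scan loops, at the price of reacting to a stop request a little later.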

For the scan count limit, we can just put the limit in the CubeVisitRequest, 
just like the timeout property, so that the coprocessor doesn't return too large 
a result.
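A rough sketch of how a row limit could be enforced alongside the timeout inside the coprocessor loop. The real CubeVisitRequest is a protobuf message, so VisitLimits, VisitResult and LimitedScan here are hypothetical plain-Java stand-ins with illustrative field names. The complete flag lets the client distinguish a partial result from a finished one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for limits carried in a CubeVisitRequest; field names are illustrative.
final class VisitLimits {
    final long timeoutMillis;
    final long rowLimit;

    VisitLimits(long timeoutMillis, long rowLimit) {
        this.timeoutMillis = timeoutMillis;
        this.rowLimit = rowLimit;
    }
}

final class VisitResult {
    final List<String> rows;
    final boolean complete; // false when a limit cut the scan short

    VisitResult(List<String> rows, boolean complete) {
        this.rows = rows;
        this.complete = complete;
    }
}

final class LimitedScan {
    /** Stands in for the coprocessor scan loop: stops at the row limit or the deadline. */
    static VisitResult visit(Iterator<String> source, VisitLimits limits) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(limits.timeoutMillis);
        List<String> out = new ArrayList<>();
        while (source.hasNext()) {
            if (out.size() >= limits.rowLimit || System.nanoTime() - deadline > 0) {
                return new VisitResult(out, false); // partial result; client can report it
            }
            out.add(source.next());
        }
        return new VisitResult(out, true);
    }
}
```

Comparing System.nanoTime() differences rather than absolute values keeps the deadline check correct even if the nanosecond counter wraps.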

> Make query visible and interruptible, improve server's stability
> -----------------------------------------------------------------
>
>                 Key: KYLIN-1872
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1872
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>            Reporter: Ma Gang
>            Assignee: Ma Gang
>         Attachments: query_visible_interruptable-1.4rc.patch
>
>
> Problem:
> 1. A large query result will break the Kylin server, for example: select * from 
> fact_table. Even when the properties "kylin.query.scan.threshold" and 
> "kylin.query.mem.budget" are set properly, OOM still happens, because the 
> HBase RPC thread is not interrupted and results keep flowing to the Kylin 
> server. The server will run out of memory quickly when there are multiple 
> such queries.
> 2. Too many slow queries will occupy all Tomcat threads and make the server 
> unresponsive.
> 3. There's no correlation id for a specific query, so it is hard to find the 
> RPC log for a given query when many queries are running concurrently.
> Solution:
> 1. Interrupt the RPC thread and the main query thread when the returned result 
> size exceeds the configured limit.
> 2. Make queries visible. Admins can view all running queries and the details 
> of each query. 
>    Split the query into the following steps:
>    1) sql parse
>    2) cube plan 
>    3) query cache
>    4) multiple cube segment query
>       a. each segment request has multiple endpoint range requests.
>       b. each endpoint range request has multiple coprocessor requests.
>       c. each coprocessor request has multiple region server RPCs.
>    Admins can view the startTime/endTime of each step, and the thread stack 
> trace if the step is still running.
> 3. Add the query id as a correlation id in the RPC log.
> 4. Admins can interrupt a running query to release its thread, memory, etc.
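A minimal sketch of a registry that would support points 2 to 4 of the proposed solution, assuming each query runs on one main thread that registers itself. QueryRegistry, RunningQuery and QueryStep are hypothetical names, not classes from the attached patch. Interrupting the main thread only helps if the query code checks the interrupt flag and also cancels its outstanding RPC tasks.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a "visible and interruptible query" registry.
final class QueryStep {
    final String name;
    final long startMillis = System.currentTimeMillis();
    volatile long endMillis = -1; // -1 while the step is still running

    QueryStep(String name) {
        this.name = name;
    }

    boolean isRunning() {
        return endMillis < 0;
    }
}

final class RunningQuery {
    final String queryId = UUID.randomUUID().toString(); // correlation id for RPC logs
    final String sql;
    final Thread thread;
    final List<QueryStep> steps = new CopyOnWriteArrayList<>();

    RunningQuery(String sql, Thread thread) {
        this.sql = sql;
        this.thread = thread;
    }

    QueryStep startStep(String name) {
        QueryStep step = new QueryStep(name);
        steps.add(step);
        return step;
    }

    void endStep(QueryStep step) {
        step.endMillis = System.currentTimeMillis();
    }

    /** Prefix for log lines so RPC logs can be matched to this query. */
    String logTag(String message) {
        return "[query " + queryId + "] " + message;
    }

    /** Stack trace of the query thread, for inspecting a step that is still running. */
    StackTraceElement[] currentStack() {
        return thread.getStackTrace();
    }
}

final class QueryRegistry {
    private final Map<String, RunningQuery> running = new ConcurrentHashMap<>();

    RunningQuery register(String sql) {
        RunningQuery query = new RunningQuery(sql, Thread.currentThread());
        running.put(query.queryId, query);
        return query;
    }

    void unregister(RunningQuery query) {
        running.remove(query.queryId);
    }

    Collection<RunningQuery> runningQueries() {
        return running.values();
    }

    /** Sets the interrupt flag on the query's main thread; false if the query is unknown. */
    boolean interrupt(String queryId) {
        RunningQuery query = running.get(queryId);
        if (query == null) {
            return false;
        }
        query.thread.interrupt();
        return true;
    }
}
```

An admin endpoint could list runningQueries(), show each step's start and end times, call currentStack() for steps that are still running, and call interrupt(queryId) to stop a query.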



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
