[jira] [Comment Edited] (JENA-2328) Query timeouts failing when plan phase is long

Andy Seaborne (Jira) Wed, 08 Jun 2022 00:36:04 -0700


    [ https://issues.apache.org/jira/browse/JENA-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551440#comment-17551440 ]


Andy Seaborne edited comment on JENA-2328 at 6/8/22 7:35 AM:
-------------------------------------------------------------

[~der] I'd be grateful if you or colleagues could try this out in your 
environment.

Your test case worked for me. The new mechanism is covered by existing tests.

(The test isn't included directly because it would be unstable on a loaded CI 
machine, where a thread can pause for several seconds. That makes it tricky to 
detect why a timeout occurred, and the JUnit timeout may also go off first: the 
timeouts don't necessarily go off "in order".)

 


was (Author: andy.seaborne):
[~der] I'd be grateful if you or colleagues could try this out in your 
environment.

Your test case worked for me. The new mechanism is covered by existing tests.

(The test included directly because it will be unstable on loaded CI where 
pauses of several seconds can occur making detecting why a timeout occurred 
somewhat tricky because the JUnit timeout may go off first.)


> Query timeouts failing when plan phase is long
> ----------------------------------------------
>
>                 Key: JENA-2328
>                 URL: https://issues.apache.org/jira/browse/JENA-2328
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>            Reporter: Dave Reynolds
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.6.0
>
>         Attachments: TestQueryExecutionTimeout3.java
>
>
> In a production service with a large TDB store (around 500MT) we find that 
> some complex queries evade the query timeouts (set to 90s first result, 120s 
> total) and then run for hours soaking up all available CPU cores. While the 
> queries show no clear pattern, and it has been hard to replicate in a controlled 
> setting, we do now have one example which is expressible as a test case. See 
> attached.
> The behaviour is that the abort() call from the alarm timeout is received by 
> QueryExecDataset before there is an iterator to cancel - the QueryExecDataset 
> instance is deep in getPlan() which itself executes part of the query. In the 
> specific example it's OpSlice which is iterating through the offset while 
> still in the planning phase. Not all queries that cause this sort of 
> behaviour use offsets, though.
> Sorry, but I have no PR to offer at this stage. I have looked at whether it's 
> possible to have getPlan() return a future or deferrable plan so that the 
> top-level exec has a handle on something that it can abort. However, the 
> changes look far-reaching and I don't yet have a satisfactory approach to 
> offer.
>  
>  
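The following minimal Java sketch illustrates the race described in the report. It does not reproduce Jena's classes or the mechanism adopted for JENA-2328. NaiveExec, FlagExec, RowIterator and CancelledException are hypothetical names.

The sketch contrasts two ways of handling abort():

- NaiveExec: abort() can only reach an iterator. A request that arrives while planning is still running (here, eagerly skipping OFFSET rows, as with OpSlice) is therefore lost.
- FlagExec: abort() records the request in a flag that the planning loop polls.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, NOT Jena's API or the JENA-2328 fix: every class and
// method name here is illustrative only.
public class AbortDuringPlanSketch {

    /** Thrown when a cancellation request is noticed. */
    static class CancelledException extends RuntimeException {
        CancelledException(String msg) { super(msg); }
    }

    /** Stand-in for the row iterator that only exists once planning is done. */
    static class RowIterator {
        volatile boolean cancelled = false;
        void cancel() { cancelled = true; }
    }

    /** Failure mode: abort() only reaches an iterator, so an early abort is lost. */
    static class NaiveExec {
        private volatile RowIterator iterator; // null until plan() finishes

        void abort() {
            RowIterator it = iterator;
            if (it != null) it.cancel(); // no iterator yet: request silently dropped
        }

        /** Planning that eagerly skips `offset` rows before any iterator exists. */
        long plan(long offset) {
            long skipped = 0;
            for (long i = 0; i < offset; i++) skipped++;
            iterator = new RowIterator();
            return skipped;
        }
    }

    /** Alternative: remember the request in a flag that planning also polls. */
    static class FlagExec {
        private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

        // Recorded even if no iterator has been created yet.
        void abort() { cancelRequested.set(true); }

        long plan(long offset) {
            long skipped = 0;
            for (long i = 0; i < offset; i++) {
                if (cancelRequested.get())
                    throw new CancelledException("aborted during planning after " + skipped + " rows");
                skipped++;
            }
            return skipped;
        }
    }

    public static void main(String[] args) {
        // Naive: an abort issued before/during planning has no effect.
        NaiveExec naive = new NaiveExec();
        naive.abort();
        System.out.println("naive skipped " + naive.plan(1_000_000));

        // Flag-based: a timer-driven abort stops an effectively unbounded plan phase.
        FlagExec exec = new FlagExec();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.schedule(exec::abort, 100, TimeUnit.MILLISECONDS);
        try {
            exec.plan(Long.MAX_VALUE);
        } catch (CancelledException e) {
            System.out.println("flag exec: " + e.getMessage());
        } finally {
            timer.shutdownNow();
        }
    }
}
```

A polled flag only helps if every long-running step inside planning actually checks it. The reporter's alternative is to have getPlan() return a deferred plan that the top-level exec can abort. That is a different design and is not shown here.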



--
This message was sent by Atlassian Jira
(v8.20.7#820007)