[jira] [Commented] (AMBARI-10050) Querying For Requests By Task Status Has Poor Performance

Hadoop QA (JIRA) Thu, 12 Mar 2015 13:13:54 -0700

    [ 
https://issues.apache.org/jira/browse/AMBARI-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359297#comment-14359297
 ]


Hadoop QA commented on AMBARI-10050:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12704221/AMBARI-10050.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The test build failed in ambari-server 

Test results: 
https://builds.apache.org/job/Ambari-trunk-test-patch/2015//testReport/
Console output: 
https://builds.apache.org/job/Ambari-trunk-test-patch/2015//console

This message is automatically generated.

> Querying For Requests By Task Status Has Poor Performance
> ---------------------------------------------------------
>
>                 Key: AMBARI-10050
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10050
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-10050.patch
>
>
> When querying for the requests that are either IN_PROGRESS, FAILED or 
> COMPLETED, the query being used is inefficient and can cause a wait of up to 
> 10 minutes in a cluster where there is a large number of stages and tasks.
> {{HostRoleCommandDAO.getRequestsByTaskStatus(...)}}
> This SQL seems overly complex for what it is. Removing the nested SELECT 
> seems like a great way to reduce the query time:
> {code}
> SELECT DISTINCT task.request_id as request_id
>   FROM host_role_command task WHERE task.status IN ( 'COMPLETED', 'FAILED', 
> 'TIMEDOUT', 'ABORTED')
>   ORDER BY task.request_id ASC;
> {code}
> ... or if we want to keep the NOT IN
> {code}
> SELECT DISTINCT task.request_id as task_id
>   FROM host_role_command task WHERE task.status NOT IN ( 'QUEUED', 
> 'IN_PROGRESS', 
>                                               'PENDING', 'HOLDING', 
>                                               'HOLDING_FAILED', 
>                                               'HOLDING_TIMEOUT' )
>   ORDER BY task.request_id ASC
>   LIMIT 1000
> {code}
> But to be honest, my suggestion is to rewrite this as a simple query that 
> matches from an {{EnumSet}} found in {{HostRoleStatus}}.
> Essentially, the problem here is that {{getRequestsByStatus}} is trying to do 
> the calculation work to determine the request status from task status all in 
> SQL. This method should be broken out into 2 SQL queries:
> - My above query for IN_PROGRESS or FAILED requests
> - A new query for COMPLETED that looks for any requests where all tasks have 
> completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (AMBARI-10050) Querying For Requests By Task Status Has Poor Performance

Reply via email to