[ 
https://issues.apache.org/jira/browse/TEZ-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737061#comment-14737061
 ] 

Hitesh Shah edited comment on TEZ-2792 at 9/9/15 3:49 PM:
----------------------------------------------------------

Comment on this particular section: 

{code}
  else if (vertexMinIDs.size() > 0) {
    for (Integer vertexMinID : vertexMinIDs) {
      Vertex vertex = getVertexFromIndex(dag, vertexMinID);
      List<Task> vertexTasks = new ArrayList<>(vertex.getTasks().values());
      tasks.addAll(vertexTasks.subList(0, Math.min(vertexTasks.size(), limit - tasks.size())));

      if (tasks.size() >= limit) {
        break;
      }
    }
  }
  else {
    Collection<Vertex> vertices = dag.getVertices().values();
    for (Vertex vertex : vertices) {
      List<Task> vertexTasks = new ArrayList<>(vertex.getTasks().values());
      tasks.addAll(vertexTasks.subList(0, Math.min(vertexTasks.size(), limit - tasks.size())));

      if (tasks.size() >= limit) {
        break;
      }
    }
  }
}
{code}

Is there a reason why all task objects are first copied into an ArrayList and 
only then a subset is pulled out? 

Could a different approach be taken? For example, if the ask is minTaskId = 501 
and limit/max = 100, then just look up each task by ID (i.e. 501 to 600) and 
add those to the list, instead of getting all 10000 task objects and then 
slicing the list. This might require some changes to first check 
vertex::numTasks so that the requested range stays within the vertex's task 
count. 
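
To make the suggestion concrete, here is a rough, self-contained sketch of the range lookup. It is not Tez code: tasksInRange, the numTasks argument, and the taskByIndex lookup are hypothetical stand-ins for whatever the Vertex interface actually offers, and the String "tasks" are placeholders for real Task objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class TaskRangeLookup {

  /**
   * Collects at most `limit` tasks starting at `minTaskIndex`, looking each
   * one up by index instead of copying the whole task collection first.
   *
   * numTasks    - total task count of the vertex (assumed to be available)
   * taskByIndex - hypothetical per-index lookup on the vertex
   */
  static <T> List<T> tasksInRange(int numTasks, IntFunction<T> taskByIndex,
                                  int minTaskIndex, int limit) {
    List<T> result = new ArrayList<>();
    if (limit <= 0 || minTaskIndex >= numTasks) {
      return result; // nothing to return for an empty or out-of-range request
    }
    int start = Math.max(0, minTaskIndex);
    // Clamp the end of the range to the number of tasks the vertex has.
    int end = (int) Math.min((long) numTasks, (long) start + limit);
    for (int i = start; i < end; i++) {
      result.add(taskByIndex.apply(i));
    }
    return result;
  }

  public static void main(String[] args) {
    // A vertex with 10000 tasks; only indices 501..600 are ever looked up.
    List<String> page = tasksInRange(10000, i -> "task_" + i, 501, 100);
    System.out.println(page.size() + " tasks, first=" + page.get(0)
        + ", last=" + page.get(page.size() - 1));
  }
}
```

With this shape the work done per request is bounded by limit rather than by the number of tasks in the vertex, and the numTasks check keeps a request near the end of the vertex from running past the last task.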







> Add AM web service API for tasks.
> ---------------------------------
>
>                 Key: TEZ-2792
>                 URL: https://issues.apache.org/jira/browse/TEZ-2792
>             Project: Apache Tez
>          Issue Type: Sub-task
>          Components: UI
>            Reporter: Sreenath Somarajapuram
>            Assignee: Sreenath Somarajapuram
>         Attachments: TEZ-2792.1.patch
>
>
> Add AM API for fetching realtime tasks info:
> - API endpoint: /ws/v2/tez/tasksInfo
> - Query params:
> -- dagMinID: dagMinID = dagIndex (mandatory).
> -- vertexMinID: A comma-separated list. vertexMinID = vertexIndex.
> -- taskMinID: A comma-separated list. taskMinID = vertexIndex_taskIndex.
> -- limit: Maximum number of items to be returned (defaults to 100).
> - If taskMinID is passed: All the specified tasks (capped by limit) will be 
> returned. vertexMinID, if present, won't be considered.
> - If vertexMinID is passed: All tasks under the vertices (capped by limit) 
> will be returned.
> - If just dagMinID is passed: All tasks under the DAG (capped by limit) will 
> be returned.
> - Data returned: complete task ID, progress, status
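
The parameter precedence described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code in the attached patch: the class name, the Scope enum, and the helper methods are all hypothetical, and only the parameter names, the vertexIndex_taskIndex format, and the default limit of 100 come from the description.

```java
import java.util.ArrayList;
import java.util.List;

public class TasksInfoParams {
  // Default taken from the issue description.
  static final int DEFAULT_LIMIT = 100;

  // Hypothetical names for the three cases in the description.
  enum Scope { TASKS, VERTICES, DAG }

  /** taskMinID wins over vertexMinID; with neither, the whole DAG is used. */
  static Scope resolveScope(String vertexMinID, String taskMinID) {
    if (taskMinID != null && !taskMinID.trim().isEmpty()) {
      return Scope.TASKS;
    }
    if (vertexMinID != null && !vertexMinID.trim().isEmpty()) {
      return Scope.VERTICES;
    }
    return Scope.DAG;
  }

  /** Parses a comma-separated list of vertexIndex_taskIndex pairs. */
  static List<int[]> parseTaskMinIDs(String taskMinID) {
    List<int[]> ids = new ArrayList<>();
    for (String part : taskMinID.split(",")) {
      String[] pair = part.trim().split("_");
      if (pair.length != 2) {
        throw new IllegalArgumentException(
            "Expected vertexIndex_taskIndex, got: " + part);
      }
      ids.add(new int[] { Integer.parseInt(pair[0]), Integer.parseInt(pair[1]) });
    }
    return ids;
  }

  /** Falls back to the default when no limit is supplied. */
  static int resolveLimit(String limit) {
    return (limit == null || limit.trim().isEmpty())
        ? DEFAULT_LIMIT
        : Integer.parseInt(limit.trim());
  }

  public static void main(String[] args) {
    // Both list parameters present: taskMinID takes precedence.
    System.out.println(resolveScope("0,1", "0_3,1_7"));
    System.out.println(resolveLimit(null));
  }
}
```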



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
