[ https://issues.apache.org/jira/browse/TEZ-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737061#comment-14737061 ]
Hitesh Shah edited comment on TEZ-2792 at 9/9/15 3:49 PM:
----------------------------------------------------------

Comment on this particular section:

{code}
else if(vertexMinIDs.size() > 0) {
  for (Integer vertexMinID : vertexMinIDs) {
    Vertex vertex = getVertexFromIndex(dag, vertexMinID);
    List<Task> vertexTasks = new ArrayList<>(vertex.getTasks().values());
    tasks.addAll(vertexTasks.subList(0, Math.min(vertexTasks.size(), limit - tasks.size())));

    if(tasks.size() >= limit) {
      break;
    }
  }
}
else {
  Collection<Vertex> vertices = dag.getVertices().values();
  for (Vertex vertex : vertices) {
    List<Task> vertexTasks = new ArrayList<>(vertex.getTasks().values());
    tasks.addAll(vertexTasks.subList(0, Math.min(vertexTasks.size(), limit - tasks.size())));

    if(tasks.size() >= limit) {
      break;
    }
  }
}
}
{code}

Is there a reason why all objects are first copied into an array list and only then a subset is pulled out? Could a different approach be taken? For example, if the request is minTaskId = 501 and limit/max = 100, just look up each task by ID (i.e., 501 to 600) and put those into an array, instead of fetching all 10000 task objects and then splitting the array. This might require some changes to first check vertex::numTasks.
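To make the suggestion above concrete, here is a minimal sketch of a range lookup by task index. It does not use the Tez classes: the Map of task names stands in for a vertex's tasks, and the names tasksInRange, tasksByIndex, numTasks, minTaskIndex and limit are hypothetical, chosen only for illustration. It also assumes task indices run from 0 to numTasks - 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TaskRangeLookup {

  /**
   * Returns at most `limit` tasks starting at `minTaskIndex`, looking each
   * one up by index instead of copying every task into a list first.
   * The Map is a hypothetical stand-in for a vertex's task collection.
   */
  static List<String> tasksInRange(Map<Integer, String> tasksByIndex,
                                   int numTasks, int minTaskIndex, int limit) {
    List<String> result = new ArrayList<>();
    // Check the task count first so an out-of-range request does no lookups.
    if (limit <= 0 || minTaskIndex >= numTasks) {
      return result;
    }
    int start = Math.max(0, minTaskIndex);
    // Exclusive upper bound, clamped to the number of tasks (long math avoids overflow).
    int end = (int) Math.min((long) numTasks, (long) start + limit);
    for (int i = start; i < end; i++) {
      String task = tasksByIndex.get(i);
      if (task != null) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Build 10000 placeholder tasks, as in the example from the comment.
    Map<Integer, String> tasks = new HashMap<>();
    for (int i = 0; i < 10000; i++) {
      tasks.put(i, "task_" + i);
    }
    List<String> page = tasksInRange(tasks, 10000, 501, 100);
    System.out.println(page.size() + " tasks, from " + page.get(0)
        + " to " + page.get(page.size() - 1));
  }
}
```

With this shape, the work done per request is proportional to limit rather than to the total number of tasks in the vertex.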
> Add AM web service API for tasks.
> ---------------------------------
>
>                 Key: TEZ-2792
>                 URL: https://issues.apache.org/jira/browse/TEZ-2792
>             Project: Apache Tez
>          Issue Type: Sub-task
>          Components: UI
>            Reporter: Sreenath Somarajapuram
>            Assignee: Sreenath Somarajapuram
>         Attachments: TEZ-2792.1.patch
>
>
> Add AM API for fetching realtime tasks info:
> - API endpoint: /ws/v2/tez/tasksInfo
> - Query Params:
> -- dagMinID: dagMinID = dagIndex (mandatory).
> -- vertexMinID: A comma-separated list. vertexMinID = vertexIndex.
> -- taskMinID: A comma-separated list. taskMinID = vertexIndex_taskIndex.
> -- limit: Maximum number of items to be returned (defaults to 100).
> - If taskMinID is passed: All of the specified tasks (capped by limit) will be returned. vertexMinID, if present, won't be considered.
> - If vertexMinID is passed: All tasks under the vertices (capped by limit) will be returned.
> - If just dagID is passed: All tasks under the DAG (capped by limit) will be returned.
> - Data returned: complete task id, progress, status

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)