[ 
https://issues.apache.org/jira/browse/SPARK-26363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-26363:
-----------------------------------
    Description: 
In the method `taskList` (since https://github.com/apache/spark/pull/21688), 
the executor log value is queried from the KV store for every task (in the 
method `constructTaskData`).
We can use a hash map to reduce duplicated KV store lookups in this method.



  was:
In https://github.com/apache/spark/pull/21688, a new field `executorLogs` was 
added to `TaskData` in `api.scala`:
1. The field should not belong to `TaskData` (judging by what the name means).
2. It is redundant with `ExecutorSummary`.
3. For each row in the task table, the executor log value is looked up in the 
KV store every time, which can be avoided for better performance at large scale.

This PR proposes to reuse the executor details from the "/allexecutors" 
request, so that we have a cleaner API data structure and avoid redundant 
KV store queries.




>  Avoid duplicated KV store lookups for task table
> -------------------------------------------------
>
>                 Key: SPARK-26363
>                 URL: https://issues.apache.org/jira/browse/SPARK-26363
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> In the method `taskList` (since https://github.com/apache/spark/pull/21688), 
> the executor log value is queried from the KV store for every task (in the 
> method `constructTaskData`).
> We can use a hash map to reduce duplicated KV store lookups in this method.
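
The hash map idea in the description can be sketched roughly as follows. This is only an illustration of per-request memoization, not the actual Spark change: `Task`, `TaskRow`, `CountingStore` and `TaskTableSketch.buildRows` are hypothetical stand-ins for the real `TaskData`, KV store and `taskList` code, and the assumption is that many tasks share a small number of executors.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; these are NOT Spark's real classes.
final case class Task(taskId: Long, executorId: String)
final case class TaskRow(taskId: Long, executorId: String, executorLogs: Map[String, String])

// Hypothetical KV store wrapper that counts how many lookups reach it.
final class CountingStore(logsByExecutor: Map[String, Map[String, String]]) {
  var lookups: Int = 0

  def executorLogs(executorId: String): Map[String, String] = {
    lookups += 1
    logsByExecutor.getOrElse(executorId, Map.empty[String, String])
  }
}

object TaskTableSketch {
  // Builds one row per task, querying the store at most once per distinct executor.
  def buildRows(tasks: Seq[Task], store: CountingStore): Seq[TaskRow] = {
    // The cache lives only for this call, so the next request reads fresh data.
    val cache = mutable.HashMap.empty[String, Map[String, String]]
    tasks.map { t =>
      // The second argument is by-name: the store is hit only on a cache miss.
      val logs = cache.getOrElseUpdate(t.executorId, store.executorLogs(t.executorId))
      TaskRow(t.taskId, t.executorId, logs)
    }
  }
}
```

With three tasks spread over two executors, `buildRows` would reach the store twice rather than three times. Scoping the map to a single call is a deliberate choice in this sketch: it avoids serving stale log URLs across requests, at the cost of repeating the lookups on each request.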



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
