[ https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087235#comment-14087235 ]

Carlos Fuertes commented on SPARK-2016:
---------------------------------------

I have run some very simple benchmarks against the current master, and the UI 
is still unresponsive with big tables (a high number of blocks), even after the 
change in SPARK-2316. However, if you switch to a solution where the server 
serves the table data as JSON and the HTML table is built with JavaScript, the 
UI remains responsive.
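To make the JSON approach concrete, here is a minimal server-side sketch. It assumes a hypothetical `BlockRow` record per row of the blocks table (not Spark's actual storage-page model) and builds the JSON by hand only to stay dependency-free; the point is that the server emits a compact array of rows and leaves building the HTML table to JavaScript in the browser.

```scala
// Hypothetical row type for the RDD blocks table; the field names are
// illustrative assumptions, not Spark's actual storage-page model.
final case class BlockRow(
    blockId: String,
    storageLevel: String,
    memSizeBytes: Long,
    diskSizeBytes: Long,
    executors: Seq[String])

object BlockTableJson {
  // Encode a string as a JSON string literal, escaping quotes,
  // backslashes and control characters.
  def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append("\\").append("u").append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"').toString
  }

  // Serialize one table row as a JSON object.
  def rowToJson(r: BlockRow): String =
    Seq(
      quote("blockId") + ":" + quote(r.blockId),
      quote("storageLevel") + ":" + quote(r.storageLevel),
      quote("memSize") + ":" + r.memSizeBytes,
      quote("diskSize") + ":" + r.diskSizeBytes,
      quote("executors") + ":" + r.executors.map(quote).mkString("[", ",", "]")
    ).mkString("{", ",", "}")

  // The whole table is one JSON array; the browser turns each element into a <tr>.
  def tableToJson(rows: Seq[BlockRow]): String =
    rows.map(rowToJson).mkString("[", ",", "]")
}
```

In the browser, a small script would fetch this array (for example from a separate, hypothetical endpoint next to '/storage/rdd/?id=0') and append one row per element, which is what lets the rest of the page render before the blocks table is ready.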

Here is a rough benchmark, run on an old MacBook laptop in local mode with 
Chrome rendering the UI; I gathered the stats using Chrome's built-in developer 
tools:

{code}
sc.parallelize(1 to 1000000, 50000).count()
{code}

Time to load ‘/storage/rdd/?id=0’:

- Current master takes ~11 secs to load, but once the page finishes loading it 
is completely unusable, since scrolling up or down takes forever. The page size 
is 14.4MB.

- With the modified CSS style, the page loads a couple of seconds faster, but it 
remains unresponsive after loading. This corresponds to running my pull request 
with “spark.ui.jsRenderingEnabled false”.

- With the JSON solution, the page (without the blocks table) appears instantly, 
and the blocks table takes ~15 secs to load. After that, however, the page is 
totally responsive.

From my limited tests, I would say that using JavaScript with JSON to render the 
page is a win, since the page remains responsive and usable after loading big 
tables.


> rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-2016
>                 URL: https://issues.apache.org/jira/browse/SPARK-2016
>             Project: Spark
>          Issue Type: Sub-task
>            Reporter: Reynold Xin
>              Labels: starter
>
> Try running
> {code}
> sc.parallelize(1 to 100, 1000000).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors
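The three alternatives listed in the issue can be sketched as plain collection operations. This assumes a hypothetical, simplified `Block` record (one row per cached partition, not Spark's real data model), and option 0 simply keeps the first rows in the order given, which is only one possible reading of "top 1000".

```scala
// Hypothetical, simplified block record: one row per cached RDD partition.
final case class Block(id: String, executor: String, memSizeBytes: Long)

object BlockTableViews {
  // Alternative 0: render only the first `limit` rows, in the order given.
  def topN(blocks: Seq[Block], limit: Int = 1000): Seq[Block] =
    blocks.take(limit)

  // Alternative 1: pagination. `page` is 0-based; returns the rows for that
  // page together with the total number of pages (at least 1).
  def page(blocks: Seq[Block], page: Int, pageSize: Int): (Seq[Block], Int) = {
    require(pageSize > 0, "pageSize must be positive")
    require(page >= 0, "page must be non-negative")
    val totalPages = math.max(1, (blocks.size + pageSize - 1) / pageSize)
    (blocks.slice(page * pageSize, (page + 1) * pageSize), totalPages)
  }

  // Alternative 2: one row per executor instead of one row per block.
  final case class ExecutorSummary(executor: String, numBlocks: Int, memSizeBytes: Long)

  def byExecutor(blocks: Seq[Block]): Seq[ExecutorSummary] =
    blocks
      .groupBy(_.executor)
      .map { case (exec, bs) => ExecutorSummary(exec, bs.size, bs.map(_.memSizeBytes).sum) }
      .toSeq
      .sortBy(_.executor)
}
```

Grouping by executor bounds the table size by the number of executors rather than the number of partitions, while pagination keeps per-block detail but caps how many rows the browser renders at once.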



--
This message was sent by Atlassian JIRA
(v6.2#6252)
