[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621196#comment-14621196 ]

Andrew Or commented on SPARK-2017:

Partially resolved by https://github.com/apache/spark/pull/7296, but I'm going to leave this one open in case we want to do pagination.

web ui stage page becomes unresponsive when the number of tasks is large

Key: SPARK-2017
URL: https://issues.apache.org/jira/browse/SPARK-2017
Project: Spark
Issue Type: Sub-task
Components: Web UI
Reporter: Reynold Xin
Labels: starter

{code}
sc.parallelize(1 to 1000000, 1000000).count()
{code}

The above code creates one million tasks to be executed. The stage detail web ui page takes forever to load (if it ever completes).

There are again a few different alternatives:
0. Limit the number of tasks we show.
1. Pagination
2. By default only show the aggregate metrics and failed tasks, and hide the successful ones.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
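Alternative 2 from the issue description (aggregate metrics plus failed tasks by default) might be sketched as follows. This is only an illustration under stated assumptions: `TaskResult`, `StageOverview` and `StageDefaultView` are hypothetical names, not Spark classes, and the real stage page tracks many more fields per task.

```scala
// Hypothetical, simplified task record; not a Spark class.
final case class TaskResult(index: Int, succeeded: Boolean, durationMs: Long)

// Hypothetical summary that a default view could render instead of every row.
final case class StageOverview(
    total: Int,
    failed: Seq[TaskResult],
    minMs: Long,
    medianMs: Long,
    maxMs: Long)

object StageDefaultView {
  /** Summarise a stage: duration statistics plus only the failed tasks. */
  def overview(tasks: Seq[TaskResult]): Option[StageOverview] =
    if (tasks.isEmpty) None
    else {
      val sorted = tasks.map(_.durationMs).sorted
      Some(StageOverview(
        total = tasks.size,
        failed = tasks.filterNot(_.succeeded), // successful rows stay hidden
        minMs = sorted.head,
        medianMs = sorted(sorted.size / 2),    // upper median for even sizes
        maxMs = sorted.last))
    }
}
```

The page would then render a handful of numbers and the (usually short) list of failures, with the full task list available on request.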
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619188#comment-14619188 ]

Apache Spark commented on SPARK-2017:

User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/7296
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604723#comment-14604723 ]

Apache Spark commented on SPARK-2017:

User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/7071
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162404#comment-14162404 ]

Apache Spark commented on SPARK-2017:

User 'carlosfuertes' has created a pull request for this issue: https://github.com/apache/spark/pull/1682
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101567#comment-14101567 ]

Reynold Xin commented on SPARK-2017:

[~carlosfuertes] in your pull request (#1682), did you address the rendering slowness problem? I didn't actually find the token f9f9f9 in the diff.
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101810#comment-14101810 ]

Carlos Fuertes commented on SPARK-2017:

Hi,

I addressed the table CSS rendering slowness by creating a different CSS class, spark-simple-table, under core/main/resources/ui.static/spark.css, and using it when rendering those tables, in particular by calling the listingTable method with the optional parameter simpleTable set to true. Otherwise you get the bootstrap CSS table class that was used before. But according to the simple tests I have done, what you gain from that is marginal (see also what I posted at SPARK-2016).

The real issue is the responsiveness of the page after it has loaded. To really improve that, the best solution came from using ajax, js and JSON to load the data asynchronously. That way the base html page is much smaller, loads instantly, and the web browser remains responsive all the time. As I also described in SPARK-2016, no matter what CSS table class you use, for big table sizes (I have tested with data sizes up to 15MB, which is roughly the table size you generate with 50000 tasks in the examples above) the browser becomes completely unresponsive after you load pages with big tables. However, if you load the data using an ajax call, the page remains perfectly browsable.

In pull request #1682 the tables are rendered with ajax and js by default. I created a config variable spark.ui.jsRenderingEnabled, which is true by default. If you set it to false in your properties, you go back to the original way of creating one big html page with all the data embedded in it.

Of course all this is without using pagination to show the data; that could also be done. But from what I am seeing, using JSON to serve the data gives you much more flexibility going forward, for other uses and extensions, and increases the overall responsiveness of pages no matter how you finally render them.
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082909#comment-14082909 ]

Carlos Fuertes commented on SPARK-2017:

I have been digging into why rendering the tables performs so badly. As it happens, the bottleneck is in the CSS that is currently used for rendering the tables, in particular bootstrap.css and this type of definition:

.table-striped tbody>tr:nth-child(odd)>td,.table-striped tbody>tr:nth-child(odd)>th{background-color:#f9f9f9;}

The call to nth-child(odd) with large tables slows everything down to the point that, for a big table, the whole rendering stalls.

I have made a change in the pull request [1682] where I use a very simple custom CSS table styling (respecting the same overall look and feel but with no nth-child call). I have not changed the sortable option of the tables.

Now if you run, for example,

sc.parallelize(1 to 1000000, 50000).count()

loading the whole page /stages/stage/?id=0 takes ~11 secs. Of those, 2.10 s are spent loading the JSON from the driver (a total of 16.7MB) and the rest in rendering the table. Since the JSON request is async, you can nonetheless see the rest of the page immediately.

I think this would solve the responsiveness problem for a reasonably large number of tasks as a first pass. I have also applied the same solution to all tables under Storage, where the same thing was happening [SPARK-2016].
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080539#comment-14080539 ]

Carlos Fuertes commented on SPARK-2017:

Hi,

I have implemented under https://github.com/apache/spark/pull/1682 the solution where you serve the data for the tables as JSON, for tasks under 'stages' and also for 'storage' (this is issue SPARK-2016, which boils down to the same underlying problem).

The main addition is exposing paths with the JSON data as:

/stages/stage/tasks/json/?id=nnn
/storage/json
/storage/rdd/workers/json?id=nnn
/storage/rdd/blocks/json?id=nnn

and using javascript to build the tables from an ajax request for that JSON.

This partially solves the responsiveness issue, since the data is served asynchronously to the loading of the page. However, since the driver sends all the data again on every refresh, with a very large number of tasks it takes longer and longer to send all the data as they progress. But at least the Summary table loads much faster, with no need to wait for the whole task table to complete.

A better solution would be to stream the data in chunks as they are ready, or to keep a cache of the previous results. I have not explored the latter yet, but the above could be a start to build on.
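The "cache of the previous results" idea could, under some assumptions, look roughly like the sketch below: the server returns only rows that changed since the version the client last saw, and the client merges them into its cached table. Everything here is hypothetical (`TaskSnapshot`, the `version` counter, `TaskDelta`); pull request #1682 as described above resends the full data on every refresh.

```scala
// Hypothetical row with a monotonically increasing version; not a Spark class.
final case class TaskSnapshot(index: Int, status: String, version: Long)

object TaskDelta {
  /** Server side: only rows updated after the version the client last saw. */
  def changedSince(all: Seq[TaskSnapshot], sinceVersion: Long): Seq[TaskSnapshot] =
    all.filter(_.version > sinceVersion)

  /** Client side: merge a delta into the cached rows, keyed by task index. */
  def merge(
      cache: Map[Int, TaskSnapshot],
      delta: Seq[TaskSnapshot]): Map[Int, TaskSnapshot] =
    delta.foldLeft(cache)((acc, row) => acc.updated(row.index, row))
}
```

With such a scheme the payload per refresh would grow with the number of tasks that changed, not with the total number of tasks in the stage.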
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081598#comment-14081598 ]

Carlos Fuertes commented on SPARK-2017:

I have done some tests with the solution where you use JSON to send the data.

If you run with 50k tasks

sc.parallelize(1 to 1000000, 50000).count()

the JSON [/stages/stage/tasks/json/?id=0] that represents the tasks table takes ~15MB if you download it. You can get the JSON in a few seconds, but the UI [/stages/stage/?id=0] will still take forever to render it (the summary still shows up nonetheless).

I did not change the way we are rendering, that is, I did not move to pagination or anything else, and I am still using sorttable to allow sorting of the table.

Maybe just converting to JSON is too simple, and you still have to stream the data if you want to handle around 50k tasks and more while keeping the browser responsive. And/or incorporate pagination directly, with a global index for the tasks on the back end.
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081605#comment-14081605 ]

Carlos Fuertes commented on SPARK-2017:

I did not realize that the tasks all have their own index already so implementing the pagination on top of it should be simple. I'll give it a try.
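Since each task already carries an index, pagination on top of it could be as small as the following sketch. The names (`TaskRow`, `TaskPagination`) and the 1-based page numbering are assumptions made for illustration; this is not the code from any of the pull requests.

```scala
// Hypothetical row type; the real UI tracks many more fields per task.
final case class TaskRow(index: Int, status: String, durationMs: Long)

object TaskPagination {
  /** Number of pages needed to show `total` rows, `pageSize` rows at a time. */
  def pageCount(total: Int, pageSize: Int): Int = {
    require(pageSize > 0, "pageSize must be positive")
    (total + pageSize - 1) / pageSize // ceiling division
  }

  /** Rows for a 1-based page number, ordered by task index. */
  def page(rows: Seq[TaskRow], pageNumber: Int, pageSize: Int): Seq[TaskRow] = {
    require(pageNumber >= 1, "pageNumber is 1-based")
    require(pageSize > 0, "pageSize must be positive")
    val from = (pageNumber - 1) * pageSize
    rows.sortBy(_.index).slice(from, from + pageSize)
  }
}
```

For 50k tasks and a page size of 100, the browser would only ever render 100 rows at a time, and a page past the end simply comes back empty.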
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055207#comment-14055207 ]

Masayoshi TSUZUKI commented on SPARK-2017:

I wonder if the browser can handle (including sort, search, etc.) a very large JSON payload in a realistic time. If rendering is indeed the bottleneck and JSON processing on the browser side is fast enough, your suggested approach looks like a good idea.
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054178#comment-14054178 ]

Masayoshi TSUZUKI commented on SPARK-2017:

Pagination seems to be better, because with aggregated metrics:
1. we can't identify the skew of tasks between the executors.
2. the same problem will appear again when many tasks fail in a certain stage.

In addition, when errors or problems occur in a production environment, we would like to see the status of the tasks around that time even if those tasks mostly succeeded. Although the status of every task is written to the log file, the web ui is very useful in the operation phase.
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052289#comment-14052289 ]

Mridul Muralidharan commented on SPARK-2017:

With aggregated metrics, we lose the ability to check for gc time (which is actually what I use that UI for, other than to dig up exceptions on failed tasks).
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052647#comment-14052647 ]

Reynold Xin commented on SPARK-2017:

You can still see the list of successful tasks by clicking a link. We can also just add the aggregated GC task time by executor to the aggregated metrics side.
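Aggregating GC time by executor, as suggested above, might look like this minimal sketch. `TaskGc`, `ExecutorGcSummary` and `GcByExecutor` are hypothetical names, and the choice of summary fields (total, maximum, and GC as a fraction of run time) is an assumption about what would be useful to display.

```scala
// Hypothetical, simplified task record for illustration only.
final case class TaskGc(executorId: String, durationMs: Long, gcTimeMs: Long)

// Hypothetical per-executor summary row for the aggregated metrics table.
final case class ExecutorGcSummary(
    tasks: Int,
    totalGcMs: Long,
    maxGcMs: Long,
    gcFraction: Double)

object GcByExecutor {
  /** Group tasks by executor and summarise their GC time. */
  def summarize(tasks: Seq[TaskGc]): Map[String, ExecutorGcSummary] =
    tasks.groupBy(_.executorId).map { case (executorId, ts) =>
      val totalGc = ts.map(_.gcTimeMs).sum
      val totalRun = ts.map(_.durationMs).sum
      // Guard against division by zero when all durations are 0.
      val fraction = if (totalRun == 0L) 0.0 else totalGc.toDouble / totalRun
      executorId -> ExecutorGcSummary(ts.size, totalGc, ts.map(_.gcTimeMs).max, fraction)
    }
}
```

Keeping the per-executor maximum next to the total is one way to keep a single GC-heavy task visible even when the successful task rows are hidden.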
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052679#comment-14052679 ]

Mridul Muralidharan commented on SPARK-2017:

Sounds great, ability to get to currently running tasks (to check current state), ability to get to task failures (to debug usually), some aggregate stats (gc, stats per executor, etc) and having some way to get to the full details (which is what is seen currently) in an advanced or full view. Anything else would be a bonus :-)
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051977#comment-14051977 ]

Reynold Xin commented on SPARK-2017:

It is definitely browser specific (but for all browsers!!!). That's why I think just having the aggregated metrics by default and the list of tasks that failed is probably a good idea. Thoughts?
[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019394#comment-14019394 ]

Mridul Muralidharan commented on SPARK-2017:

Currently, for our jobs, I run with spark.ui.retainedStages=3 (so that there is some visibility into past stages): this is to prevent OOMs in the master when the number of tasks per stage is not low (50k, for example, is not very high imo).

The stage details UI becomes very sluggish to pretty much unresponsive for our jobs where tasks > 30k ... though that might also be a browser issue (firefox/chrome)?