[ https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434970#comment-17434970 ]
ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------

Commit bd5f72e0ed03b614d49b8125160bf555d11dcf9c in kudu's branch refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=bd5f72e ]

KUDU-1959 - Implement server startup progress page for tablet and master servers

This patch adjusts the server startup sequence to start the web server and load a startup page that shows the progress of the startup. The progress is broken down into the following steps:
- Initializing
- Reading Filesystem
- Reading instance metadata files
- Opening container files
- Bootstrapping and opening the tablets
- Starting RPC server

A sample screenshot of the page can be found here: https://i.imgur.com/J4NJD5i.png

Each of the steps above has a progress status of either 0 or 100, except for two: “Bootstrapping and opening the tablets”, which tracks the processing of each tablet, and “Opening container files”. For the log block manager, the latter step shows the progress of opening the log block containers. For the file block manager, it is renamed to “Reporting Filesystem” and has only a 0 or 100 progress status. All of the steps show the time elapsed in seconds.

For the master startup page, the step “Bootstrapping and opening the tablets” is replaced by “Initializing master catalog”.

Along with the startup page, the web server will also display the pages /home, /config, /logs, /mem-trackers, /memz, /threadz, and /varz. During initialization, since we expect the user's primary reason for opening the web UI to be checking the startup progress, /home is redirected to /startup until the startup is completed. The usual homepage contents and the remaining web pages become available only once the RPC server is started, because they depend on it.

Manually validated the startup when security is enabled and the filesystem is not present.

The footer of the WebUI has the UUID by default.
In the case of new instances, since the web server starts before the UUID is generated, the footer initially doesn’t contain the UUID; it is set later, when the RPC server is started.

The next steps for this patch include writing tests for the startup page. So far all the testing has been manual.

Change-Id: I1db1fcf16261d4ced1b3657a697766a5335271b4
Reviewed-on: http://gerrit.cloudera.org:8080/17730
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>

> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
> Key: KUDU-1959
> URL: https://issues.apache.org/jira/browse/KUDU-1959
> Project: Kudu
> Issue Type: Improvement
> Components: ops-tooling
> Reporter: Jean-Daniel Cryans
> Assignee: Abhishek
> Priority: Major
> Labels: roadmap-candidate, usability
>
> Restarting a cluster that has a good amount of data, it's hard to tell when
> it's "done". Right now the things I do:
> - Run ksck, wait until most tablets are not in "unavailable" or
> "bootstrapping" state.
> - Watch the metrics and see when the data under management is close to where
> it was before restarting (it grows as tablets are getting bootstrapped).
> - Look at the tablet server web UIs for tablets, compare how many are done
> bootstrapping VS in the process of VS not started.
> Ideas on how to improve this:
> - In the master's web UI for tablet servers, show how many tablets are
> running VS not running (I wouldn't add anything about tombstoned tablets)
> - Add metrics for tablets in different states.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)