[ 
https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434970#comment-17434970
 ] 

ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------

Commit bd5f72e0ed03b614d49b8125160bf555d11dcf9c in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=bd5f72e ]

KUDU-1959 - Implement server startup progress page for tablet and master servers

This patch adjusts the server startup sequence so that the web server
starts early and serves a startup page showing the progress of the startup.
The progress is broken down into the following steps:

Initializing
Reading Filesystem
  Reading instance metadata files
  Opening container files
Bootstrapping and opening the tablets
Starting RPC server

A sample screenshot of the page can be found here:
https://i.imgur.com/J4NJD5i.png

Each of the steps above has a progress status of either 0 or 100, with
two exceptions: the step “Bootstrapping and opening the tablets”, which
tracks the processing of each tablet, and the step “Opening container
files”. For the log block manager, the latter step shows the progress of
opening the log block containers. For the file block manager, it is
renamed to “Reporting Filesystem” and has only a 0 or 100 progress
status. All of the steps show the time elapsed in seconds.
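
The per-step bookkeeping described above can be sketched roughly as
follows. This is only an illustration in Python, not the code from the
patch: the StartupStep class, its fields, and the explicit "now"
arguments are all hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartupStep:
    """One row of a startup page (hypothetical model, not Kudu's classes)."""
    name: str
    total: int = 1  # units of work; 1 makes the step all-or-nothing (0 or 100)
    done: int = 0
    started_at: Optional[float] = None
    finished_at: Optional[float] = None

    def start(self, now: Optional[float] = None) -> None:
        self.started_at = time.monotonic() if now is None else now

    def advance(self, units: int = 1, now: Optional[float] = None) -> None:
        # Record finished work; stamp the finish time once all units are done.
        self.done = min(self.total, self.done + units)
        if self.done == self.total:
            self.finished_at = time.monotonic() if now is None else now

    def percent(self) -> int:
        if self.total == 0:
            # Nothing to process (e.g. no tablets): complete once finished.
            return 100 if self.finished_at is not None else 0
        return 100 * self.done // self.total

    def elapsed_seconds(self, now: Optional[float] = None) -> float:
        if self.started_at is None:
            return 0.0
        end = self.finished_at
        if end is None:
            end = time.monotonic() if now is None else now
        return end - self.started_at


# Explicit timestamps keep the example deterministic.
init = StartupStep("Initializing")
init.start(now=0.0)
init.advance(now=2.0)

tablets = StartupStep("Bootstrapping and opening the tablets", total=4)
tablets.start(now=2.0)
tablets.advance(now=3.0)  # one of four tablets processed
```

In this sketch an all-or-nothing step is just a step with a single unit
of work, so the same structure covers both kinds of step.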

For the master startup page, the step “Bootstrapping and opening the
tablets” is replaced by “Initializing master catalog”.

Along with the startup page, the web server also serves the following
pages during startup: /home, /config, /logs, /mem-trackers, /memz,
/threadz, /varz. Since we expect the user's primary reason for opening
the web UI during initialization to be checking the startup progress,
/home is redirected to /startup until the startup is completed. The usual
homepage contents and the remaining web pages become available once the
RPC server is started, since they depend on it. The startup was manually
validated with security enabled and with no filesystem present.
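
The routing behavior described above can be sketched roughly as follows.
This is an illustration only, not the patch's code: the route function,
the EARLY_PAGES set, the 302 redirect status, and the 503 response for
pages that are not yet available are all assumptions made for the
example.

```python
from typing import Tuple

# Pages the commit message lists as served while startup is in progress,
# plus the startup page itself.
EARLY_PAGES = {
    "/startup", "/config", "/logs", "/mem-trackers",
    "/memz", "/threadz", "/varz",
}


def route(path: str, startup_complete: bool) -> Tuple[int, str]:
    """Return a (status, target path) pair for a request (hypothetical helper)."""
    if startup_complete:
        return (200, path)       # everything is served normally
    if path == "/home":
        return (302, "/startup")  # assumed status: send users to the progress page
    if path in EARLY_PAGES:
        return (200, path)
    return (503, path)           # assumed response for pages not yet available
```

For example, route("/home", False) yields a redirect to /startup, while
route("/home", True) serves the homepage itself.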

The footer of the web UI shows the server UUID by default. For new
instances, the web server starts before the UUID is generated, so the
footer initially does not contain the UUID; it is set later, when the
RPC server is started.

The next steps for this patch include writing tests for the startup
page. So far all the testing has been manual.

Change-Id: I1db1fcf16261d4ced1b3657a697766a5335271b4
Reviewed-on: http://gerrit.cloudera.org:8080/17730
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
>                 Key: KUDU-1959
>                 URL: https://issues.apache.org/jira/browse/KUDU-1959
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ops-tooling
>            Reporter: Jean-Daniel Cryans
>            Assignee: Abhishek
>            Priority: Major
>              Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell
> when it's "done". Right now the things I do:
>  - Run ksck, wait until most tablets are not in "unavailable" or
> "bootstrapping" state.
>  - Watch the metrics and see when the data under management is close to where 
> it was before restarting (it grows as tablets are getting bootstrapped).
>  - Look at the tablet server web UIs for tablets, compare how many are done 
> bootstrapping VS in the process of VS not started.
> Ideas on how to improve this:
>  - In the master's web UI for tablet servers, show how many tablets are 
> running VS not running (I wouldn't add anything about tombstoned tablets)
>  - Add metrics for tablets in different states.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
