[GitHub] storm issue #2710: [WIP] STORM-3099: Extend metrics on supervisor and worker...

zd-project Mon, 25 Jun 2018 14:14:29 -0700

Github user zd-project commented on the issue:

    https://github.com/apache/storm/pull/2710
  
    New supervisor level metrics: 
    
    - [ ] Worker Kill/Restart Statistics
        - [x] Kill Count by Category - assignment change/HB too old/Heap space (memory limit?)
                - [x] blob change?
        - [ ] Worker Suicide Cnt - category: internal error or Assignment Change
                - [x] Implemented based on the running status of the container's main process. Does not actually reflect the suicide count because it counts normal exits as well.
        - [x] Worker idle period
                - The metric records the duration machines spent in each state (as a histogram) and how many times they transition into or out of a given state.
        - [x] Time to Actually Kill worker (from the supervisor identifying the need to the actual change in the state of the worker) - (This is only an estimate; accuracy is affected by SleepTime)
        - [x] Time to start worker for topology from reading assignment for the first time.
        - [x] Worker cleanup time
    - [x] Supervisor Level Metrics:
        - [x] Supervisor restart Count
                - Simply reports every time it restarts.
        - [x] Blobstore (Request to download time)
                - [x] download time individual blob (inside localizer): from the localizer getting the request to the actual HDFS download request finishing
                        - I assume this to be [the complete process] from initiating the download to committing it to the local blob cache and informing the relevant workers
                - [x] download rate individual blob (inside localizer)
                        - This tracks the actual download rate of a blob retrieval, in MB/s
                - [x] supervisor localizer thread blob download - how long (outside localizer)
                        - I put this inside the async localizer as it turns out to be better suited for the purpose. This tracks the time for a topology blob download request to be completely processed.
                - [x] Blob update is also considered.
        - [x] Blobstore Update due to Version change Cnts
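
To illustrate the "Worker idle period" item above, here is a minimal sketch of tracking time spent per state and how often each state is entered. Everything in it is an assumption for illustration: the `WorkerState` values, the `StateDurationTracker` class, and its methods are hypothetical and are not taken from the Storm code base. It also accumulates plain totals, where the real metric would presumably feed each duration into a histogram.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical states; the supervisor's real state machine may differ.
enum WorkerState { EMPTY, RUNNING, KILL }

// Hypothetical helper, not part of Storm: accumulates the time spent in each
// state and counts how many times each state has been entered.
class StateDurationTracker {
    private final LongSupplier clockMs; // injected clock, in milliseconds
    private final Map<WorkerState, Long> durationMs = new EnumMap<>(WorkerState.class);
    private final Map<WorkerState, Integer> entryCount = new EnumMap<>(WorkerState.class);
    private WorkerState current;
    private long enteredAtMs;

    StateDurationTracker(WorkerState initial, LongSupplier clockMs) {
        this.clockMs = clockMs;
        this.current = initial;
        this.enteredAtMs = clockMs.getAsLong();
        entryCount.merge(initial, 1, Integer::sum); // the initial state counts as entered once
    }

    // Close out the time spent in the current state, then enter the next one.
    void transitionTo(WorkerState next) {
        long now = clockMs.getAsLong();
        durationMs.merge(current, now - enteredAtMs, Long::sum);
        entryCount.merge(next, 1, Integer::sum);
        current = next;
        enteredAtMs = now;
    }

    // Total completed time in the given state; time in the current state is
    // only added once the tracker leaves it.
    long totalMs(WorkerState state) {
        return durationMs.getOrDefault(state, 0L);
    }

    int entries(WorkerState state) {
        return entryCount.getOrDefault(state, 0);
    }
}
```

Passing the clock in as a `LongSupplier` (for example `System::currentTimeMillis`) keeps the sketch testable with a fake clock. As with the kill-time estimate above, measured durations can only be as precise as the interval at which state changes are observed.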
