[ 
https://issues.apache.org/jira/browse/RANGER-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kumar updated RANGER-4047:
--------------------------------
    Description: 
Ranger KMS should collect important system-level and application-level
health metrics.

System metrics: JVM/CPU/memory related metrics

Application metrics: KMS API execution metrics, such as the number of times the
DECRYPT operation is invoked and the time taken to complete each request.

There should also be an API to consume these stats, preferably a REST API. Any
metrics tool should be able to retrieve the metrics through that REST API.

This will help make KMS highly observable. An alerting system can be configured
to consume and process the metrics and generate alerts.

There could be many other use cases.

*Approach:*
The solution depends on the common "ranger-metrics" module for both the JSON and
Prometheus sinks.

KMS adds only KMS-specific application metrics: generally, one COUNT metric and
a corresponding elapsed-time gauge metric for each REST endpoint.

By default, metric collection is not thread-safe, but it can be made thread-safe
by adding the following property to kms-site.xml:

Property: hadoop.kms.metric.collection.threadsafe=true   // possible values:
true/false
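
To illustrate the trade-off behind the thread-safety property above, here is a
minimal sketch of a counter collector with an optional lock. It is not the
"ranger-metrics" implementation: the MetricCollector class, its threadsafe
flag, and the hammer helper are hypothetical names used only for illustration.

```python
import threading


class MetricCollector:
    """Hypothetical collector; `threadsafe` mirrors the idea of the property."""

    def __init__(self, threadsafe=False):
        self._counts = {}
        # Only pay for a lock when thread-safe collection is requested.
        self._lock = threading.Lock() if threadsafe else None

    def increment(self, name):
        if self._lock is None:
            # Unsynchronized read-modify-write: cheaper, but concurrent
            # callers may overwrite each other's update.
            self._counts[name] = self._counts.get(name, 0) + 1
        else:
            with self._lock:
                self._counts[name] = self._counts.get(name, 0) + 1

    def get(self, name):
        return self._counts.get(name, 0)


def hammer(collector, name, threads=8, per_thread=10_000):
    """Increment one metric from several threads and return the final count."""

    def work():
        for _ in range(per_thread):
            collector.increment(name)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return collector.get(name)


if __name__ == "__main__":
    print(hammer(MetricCollector(threadsafe=True), "EEK_DECRYPT_COUNT"))
```

With the lock enabled, the final count always equals threads * per_thread.
Without it the count may come up short under contention, which is the kind of
inaccuracy a non-thread-safe default accepts in exchange for lower overhead.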

 

===========Sample request and response listing the collected metrics===========

curl -ivk -H "Content-Type: application/json" -X GET \
  "http://localhost:9292/kms/metrics/json?user.name=vikas"

Sample response:

{
  "KMS": {
    "GET_CURRENT_KEY_COUNT": 0,
    "DELETE_KEY_ELAPSED_TIME": 0,
    "EEK_DECRYPT_ELAPSED_TIME": 0,
    "GET_KEYS_METADATA_ELAPSED_TIME": 0,
    "EEK_GENERATE_ELAPSED_TIME": 0,
    "GET_CURRENT_KEY_ELAPSED_TIME": 0,
    "EEK_REENCRYPT_ELAPSED_TIME": 0,
    "KEY_CREATE_COUNT": 1,
    "UNAUTHORIZED_CALLS_COUNT": 0,
    "KEY_CREATE_ELAPSED_TIME": 81,
    "GET_KEY_VERSION_COUNT": 0,
    "ROLL_NEW_VERSION_ELAPSED_TIME": 0,
    "REENCRYPT_EEK_BATCH_COUNT": 0,
    "REENCRYPT_EEK_BATCH_ELAPSED_TIME": 0,
    "GET_KEYS_METADATA_COUNT": 0,
    "GET_KEY_VERSIONS_COUNT": 0,
    "GET_KEY_VERSIONS_ELAPSED_TIME": 0,
    "GET_KEYS_COUNT": 2,
    "EEK_GENERATE_COUNT": 0,
    "INVALIDATE_CACHE_COUNT": 0,
    "GET_METADATA_COUNT": 3,
    "REENCRYPT_EEK_BATCH_KEYS_COUNT": 0,
    "EEK_REENCRYPT_COUNT": 0,
    "UNAUTHENTICATED_CALLS_COUNT": 0,
    "GET_KEY_VERSION_ELAPSED_TIME": 0,
    "INVALIDATE_CACHE_ELAPSED_TIME": 0,
    "ROLL_NEW_VERSION_COUNT": 0,
    "EEK_DECRYPT_COUNT": 0,
    "GET_KEYS_METADATA_KEYNAMES_COUNT": 0,
    "DELETE_KEY_COUNT": 0,
    "GET_KEYS_ELAPSED_TIME": 72,
    "GET_METADATA_ELAPSED_TIME": 14,
    "TOTAL_CALL_COUNT": 7
  },
  "RangerJvm": {
    "GcTimeTotal": 339,
    "SystemLoadAvg": 1.47,
    "ThreadsBusy": 5,
    "GcCountTotal": 9,
    "MemoryMax": 1005584384,
    "MemoryCurrent": 221646760,
    "ThreadsWaiting": 20,
    "ProcessorsAvailable": 2,
    "GcTimeMax": 339,
    "ThreadsBlocked": 0,
    "ThreadsRemaining": 9
  }
}
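
A consumer of this response could pair each COUNT metric with its matching
ELAPSED_TIME metric. The sketch below does that for a trimmed copy of the
sample response. The function names are hypothetical, and the unit of the
elapsed-time values is not stated in this issue, so the values are reported
as-is.

```python
import json

# Trimmed copy of the sample response above; a real client would fetch the
# document from the /kms/metrics/json endpoint instead.
SAMPLE = """
{
  "KMS": {
    "KEY_CREATE_COUNT": 1,
    "KEY_CREATE_ELAPSED_TIME": 81,
    "GET_KEYS_COUNT": 2,
    "GET_KEYS_ELAPSED_TIME": 72,
    "GET_METADATA_COUNT": 3,
    "GET_METADATA_ELAPSED_TIME": 14,
    "EEK_DECRYPT_COUNT": 0,
    "EEK_DECRYPT_ELAPSED_TIME": 0,
    "UNAUTHORIZED_CALLS_COUNT": 0,
    "TOTAL_CALL_COUNT": 7
  }
}
"""


def pair_metrics(kms):
    """Map operation -> (count, elapsed) for each X_COUNT with an X_ELAPSED_TIME."""
    pairs = {}
    for name, value in kms.items():
        if not name.endswith("_COUNT"):
            continue
        operation = name[: -len("_COUNT")]
        elapsed_key = operation + "_ELAPSED_TIME"
        # Counters without an elapsed-time partner (for example
        # UNAUTHORIZED_CALLS_COUNT) are skipped.
        if elapsed_key in kms:
            pairs[operation] = (value, kms[elapsed_key])
    return pairs


def active_operations(kms):
    """Keep only the operations that were invoked at least once."""
    return {op: pair for op, pair in pair_metrics(kms).items() if pair[0] > 0}


if __name__ == "__main__":
    kms = json.loads(SAMPLE)["KMS"]
    for op, (count, elapsed) in sorted(active_operations(kms).items()):
        print(f"{op}: count={count}, elapsed={elapsed}")
```

The pairing relies only on the naming pattern visible in the sample
(X_COUNT alongside X_ELAPSED_TIME), so metrics that do not follow the pattern
are left out rather than guessed at.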

  was:
Ranger KMS should collect the important System as well as application level 
health metrics.

System metrics: JVM/CPU/memory related metrics

Application metrics: KMS API execution metrics. Like, number of time DECRYPT 
operation invoked and time taken to complete the request etc.

There should also be API to consume these stats, preferably REST API. Any 
metric tools should be able to get these metrics through that REST API.

This will help making KMS highly observable. Alerts system can be configured to 
consume, process and generate alerts.

There could be many other use cases.

*Approach:*
The solution depends on "ranger-metrics" common module for both Json and 
Prometheus sink. 

Kms has added only KMS specific application metrics. Generally, one COUNT 
metric and corresponding elapsed time gauge metric for each REST end points.

By default, metric collection is not thread-safe but the by adding following 
property in kms-site.xml it can be made thread-safe:

Prop name: hadoop.kms.metric.collection.threadsafe=true.  // possible values 
true/false

 

===========Sample response to list down metrics collected============

curl -ivk  -H "Content-Type: application/json" -H  -X GET 
http://localhost:9292/kms/metrics/json?user.name=vikas

sample response:

{
  "KMS" : {
    "GET_CURRENT_KEY_COUNT" : 0,
    "DELETE_KEY_ELAPSED_TIME" : 0,
    "EEK_DECRYPT_ELAPSED_TIME" : 0,
    "GET_KEYS_METADATA_ELAPSED_TIME" : 0,
    "EEK_GENERATE_ELAPSED_TIME" : 0,
    "GET_CURRENT_KEY_ELAPSED_TIME" : 0,
    "EEK_REENCRYPT_ELAPSED_TIME" : 0,
    "KEY_CREATE_COUNT" : 1,
    "UNAUTHORIZED_CALLS_COUNT" : 0,
    "KEY_CREATE_ELAPSED_TIME" : 81,
    "GET_KEY_VERSION_COUNT" : 0,
    "ROLL_NEW_VERSION_ELAPSED_TIME" : 0,
    "REENCRYPT_EEK_BATCH_COUNT" : 0,
    "REENCRYPT_EEK_BATCH_ELAPSED_TIME" : 0,
    "GET_KEYS_METADATA_COUNT" : 0,
    "GET_KEY_VERSIONS_COUNT" : 0,
    "GET_KEY_VERSIONS_ELAPSED_TIME" : 0,
    "GET_KEYS_COUNT" : 2,
    "EEK_GENERATE_COUNT" : 0,
    "INVALIDATE_CACHE_COUNT" : 0,
    "GET_METADATA_COUNT" : 3,
    "REENCRYPT_EEK_BATCH_KEYS_COUNT" : 0,
    "EEK_REENCRYPT_COUNT" : 0,
    "UNAUTHENTICATED_CALLS_COUNT" : 0,
    "GET_KEY_VERSION_ELAPSED_TIME" : 0,
    "INVALIDATE_CACHE_ELAPSED_TIME" : 0,
    "ROLL_NEW_VERSION_COUNT" : 0,
    "EEK_DECRYPT_COUNT" : 0,
    "GET_KEYS_METADATA_KEYNAMES_COUNT" : 0,
    "DELETE_KEY_COUNT" : 0,
    "GET_KEYS_ELAPSED_TIME" : 72,
    "GET_METADATA_ELAPSED_TIME" : 14,
    "TOTAL_CALL_COUNT" : 7
  },
  "RangerJvm" : {
    "GcTimeTotal" : 339,
    "SystemLoadAvg" : 1.47,
    "ThreadsBusy" : 5,
    "GcCountTotal" : 9,
    "MemoryMax" : 1005584384,
    "MemoryCurrent" : 221646760,
    "ThreadsWaiting" : 20,
    "ProcessorsAvailable" : 2,
    "GcTimeMax" : 339,
    "ThreadsBlocked" : 0,
    "ThreadsRemaining" : 9
  }


> Ranger KMS health metrics 
> --------------------------
>
>                 Key: RANGER-4047
>                 URL: https://issues.apache.org/jira/browse/RANGER-4047
>             Project: Ranger
>          Issue Type: New Feature
>          Components: kms
>            Reporter: Vikas Kumar
>            Assignee: Vikas Kumar
>            Priority: Major
>
> Ranger KMS should collect important system-level and application-level
> health metrics.
> System metrics: JVM/CPU/memory related metrics
> Application metrics: KMS API execution metrics, such as the number of times
> the DECRYPT operation is invoked and the time taken to complete each request.
> There should also be an API to consume these stats, preferably a REST API. Any
> metrics tool should be able to retrieve the metrics through that REST API.
> This will help make KMS highly observable. An alerting system can be
> configured to consume and process the metrics and generate alerts.
> There could be many other use cases.
> *Approach:*
> The solution depends on the common "ranger-metrics" module for both the JSON
> and Prometheus sinks.
> KMS adds only KMS-specific application metrics: generally, one COUNT metric
> and a corresponding elapsed-time gauge metric for each REST endpoint.
> By default, metric collection is not thread-safe, but it can be made
> thread-safe by adding the following property to kms-site.xml:
> Property: hadoop.kms.metric.collection.threadsafe=true   // possible values:
> true/false
>  
> ===========Sample request and response listing the collected metrics===========
> curl -ivk -H "Content-Type: application/json" -X GET \
>   "http://localhost:9292/kms/metrics/json?user.name=vikas"
> Sample response:
> {
>   "KMS": {
>     "GET_CURRENT_KEY_COUNT": 0,
>     "DELETE_KEY_ELAPSED_TIME": 0,
>     "EEK_DECRYPT_ELAPSED_TIME": 0,
>     "GET_KEYS_METADATA_ELAPSED_TIME": 0,
>     "EEK_GENERATE_ELAPSED_TIME": 0,
>     "GET_CURRENT_KEY_ELAPSED_TIME": 0,
>     "EEK_REENCRYPT_ELAPSED_TIME": 0,
>     "KEY_CREATE_COUNT": 1,
>     "UNAUTHORIZED_CALLS_COUNT": 0,
>     "KEY_CREATE_ELAPSED_TIME": 81,
>     "GET_KEY_VERSION_COUNT": 0,
>     "ROLL_NEW_VERSION_ELAPSED_TIME": 0,
>     "REENCRYPT_EEK_BATCH_COUNT": 0,
>     "REENCRYPT_EEK_BATCH_ELAPSED_TIME": 0,
>     "GET_KEYS_METADATA_COUNT": 0,
>     "GET_KEY_VERSIONS_COUNT": 0,
>     "GET_KEY_VERSIONS_ELAPSED_TIME": 0,
>     "GET_KEYS_COUNT": 2,
>     "EEK_GENERATE_COUNT": 0,
>     "INVALIDATE_CACHE_COUNT": 0,
>     "GET_METADATA_COUNT": 3,
>     "REENCRYPT_EEK_BATCH_KEYS_COUNT": 0,
>     "EEK_REENCRYPT_COUNT": 0,
>     "UNAUTHENTICATED_CALLS_COUNT": 0,
>     "GET_KEY_VERSION_ELAPSED_TIME": 0,
>     "INVALIDATE_CACHE_ELAPSED_TIME": 0,
>     "ROLL_NEW_VERSION_COUNT": 0,
>     "EEK_DECRYPT_COUNT": 0,
>     "GET_KEYS_METADATA_KEYNAMES_COUNT": 0,
>     "DELETE_KEY_COUNT": 0,
>     "GET_KEYS_ELAPSED_TIME": 72,
>     "GET_METADATA_ELAPSED_TIME": 14,
>     "TOTAL_CALL_COUNT": 7
>   },
>   "RangerJvm": {
>     "GcTimeTotal": 339,
>     "SystemLoadAvg": 1.47,
>     "ThreadsBusy": 5,
>     "GcCountTotal": 9,
>     "MemoryMax": 1005584384,
>     "MemoryCurrent": 221646760,
>     "ThreadsWaiting": 20,
>     "ProcessorsAvailable": 2,
>     "GcTimeMax": 339,
>     "ThreadsBlocked": 0,
>     "ThreadsRemaining": 9
>   }
> }



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
