wu-sheng commented on a change in pull request #6711: URL: https://github.com/apache/skywalking/pull/6711#discussion_r609456251
########## File path: docs/en/setup/backend/backend-infrastructure-monitoring.md ##########
@@ -0,0 +1,115 @@
+# VMs monitoring
+SkyWalking leverages Prometheus node-exporter to collect metrics data from the VMs, and leverages OpenTelemetry Collector to transfer the metrics to the
+[OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver) and into the [Meter System](./../../concepts-and-designs/meter.md).
+We define the VM entity as a `Service` in OAP, using `vm::` as the identifying prefix.
+
+## Data flow
+1. Prometheus node-exporter collects metrics data from the VMs.
+2. OpenTelemetry Collector fetches metrics from node-exporter via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
+3. The SkyWalking OAP Server parses the expressions with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate the metrics, and stores the results.
+
+## Setup
+1. Set up [Prometheus node-exporter](https://prometheus.io/docs/guides/node-exporter/).
+2. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/). For an example OpenTelemetry Collector configuration, see [otel-collector-config.yaml](../../../../test/e2e/e2e-test/docker/promOtelVM/otel-collector-config.yaml).
+3. Configure the SkyWalking [OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver).
+
+## Supported Metrics
+| Monitoring Panel | Unit | Metric Name | Description | Data Source |
+|-----|-----|-----|-----|-----|
+| CPU Usage | % | cpu_total_percentage | Total CPU usage across all cores; with 2 cores, the maximum is 200% | Prometheus node-exporter |
+| Memory RAM Usage | MB | meter_vm_memory_used | Total RAM used | Prometheus node-exporter |
+| Memory Swap Usage | % | meter_vm_memory_swap_percentage | Percentage of swap memory used | Prometheus node-exporter |
+| CPU Average Used | % | meter_vm_cpu_average_used | Average CPU usage percentage in each mode | Prometheus node-exporter |
+| CPU Load | | meter_vm_cpu_load1<br />meter_vm_cpu_load5<br />meter_vm_cpu_load15 | CPU load averaged over 1m / 5m / 15m | Prometheus node-exporter |
+| Memory RAM | MB | meter_vm_memory_total<br />meter_vm_memory_available<br />meter_vm_memory_used | RAM statistics: Total / Available / Used | Prometheus node-exporter |
+| Memory Swap | MB | meter_vm_memory_swap_free<br />meter_vm_memory_swap_total | Swap memory statistics: Free / Total | Prometheus node-exporter |
+| File System Mountpoint Usage | % | meter_vm_filesystem_percentage | Percentage of the file system used at each mount point | Prometheus node-exporter |
+| Disk R/W | KB/s | meter_vm_disk_read<br />meter_vm_disk_written | Disk read and write throughput | Prometheus node-exporter |
+| Network Bandwidth Usage | KB/s | meter_vm_network_receive<br />meter_vm_network_transmit | Network receive and transmit throughput | Prometheus node-exporter |
+| Network Status | | meter_vm_tcp_curr_estab<br />meter_vm_tcp_tw<br />meter_vm_tcp_alloc<br />meter_vm_sockets_used<br />meter_vm_udp_inuse | Number of TCP established / TCP time-wait / TCP allocated / sockets in use / UDP in use | Prometheus node-exporter |
+| Filefd Allocated | | meter_vm_filefd_allocated | Number of file descriptors allocated | Prometheus node-exporter |
+
+## Customizing
+You can customize your own metrics/expressions/dashboard panels.
+The metrics definitions and expression rules are in `/config/otel-oc-rules/vm.yaml`.
+The dashboard panel configurations are in `/config/ui-initialized-templates/vm.yml`.
+
+## Blog
+For more details, see the related blog post: [SkyWalking 8.4 provides infrastructure monitoring](https://skywalking.apache.org/blog/2021-02-07-infrastructure-monitoring/)
+
+# K8s monitoring
+SkyWalking leverages K8s kube-state-metrics and cAdvisor to collect metrics data from K8s, and leverages OpenTelemetry Collector to transfer the metrics to the
+[OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver) and into the [Meter System](./../../concepts-and-designs/meter.md). This feature requires authorizing the OAP Server to access the K8s `API Server`.
+We define the k8s-cluster as a `Service` in OAP, using `k8s-cluster::` as the identifying prefix.
+We define the k8s-node as an `Instance` in OAP, named after the k8s `node name`.
+We define the k8s-service as an `Endpoint` in OAP, named `$serviceName.$namespace`.
+
+## Data flow
+1. K8s kube-state-metrics and cAdvisor collect metrics data from K8s.
+2. OpenTelemetry Collector fetches metrics from kube-state-metrics and cAdvisor via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
+3. The SkyWalking OAP Server accesses the K8s `API Server` to get metadata, then parses the expressions with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate the metrics and stores the results.
+
+## Setup
+1. Set up [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics#kubernetes-deployment).
+2. cAdvisor is integrated into `kubelet` by default.
+3. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#kubernetes). To configure the Prometheus Receiver in OpenTelemetry Collector for K8s, refer to [this example](https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml). For a quick start, we provide a full example OpenTelemetry Collector configuration: [otel-collector-config.yaml](../backend/otel-collector-config.yaml).

Review comment:
> [otel-collector-config.yaml](../backend/otel-collector-config.yaml)

Isn't this in the same folder?

-- This is an automated message from the Apache Git Service.
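The reviewer's point can be checked mechanically. The sketch below assumes that relative Markdown links are resolved against the directory of the file that contains them, which is how GitHub renders repository docs. It uses only Python's standard `posixpath.join` and `posixpath.normpath`. The `resolve_link` helper and the `DOC_DIR` constant are illustrative names, not part of SkyWalking.

```python
import posixpath

# Directory of the Markdown file under review (taken from the diff's "File path").
DOC_DIR = "docs/en/setup/backend"


def resolve_link(doc_dir: str, link: str) -> str:
    """Resolve a relative Markdown link target against the document's directory.

    Assumes the renderer resolves relative links from the Markdown file's own
    directory (as GitHub does). URL fragments such as '#section' are not handled.
    """
    return posixpath.normpath(posixpath.join(doc_dir, link))


if __name__ == "__main__":
    # The link the reviewer questioned, and the shorter same-folder form.
    print(resolve_link(DOC_DIR, "../backend/otel-collector-config.yaml"))
    # -> docs/en/setup/backend/otel-collector-config.yaml
    print(resolve_link(DOC_DIR, "otel-collector-config.yaml"))
    # -> docs/en/setup/backend/otel-collector-config.yaml
    # The VM section's example link climbs back up to the repository root.
    print(resolve_link(DOC_DIR, "../../../../test/e2e/e2e-test/docker/promOtelVM/otel-collector-config.yaml"))
    # -> test/e2e/e2e-test/docker/promOtelVM/otel-collector-config.yaml
```

Both forms of the link point to the same file. This supports linking `otel-collector-config.yaml` directly, without the `../backend/` detour.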
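The CPU Usage row in the diff says that with 2 cores the maximum is 200%. The sketch below is one way to see why. It assumes the value is the non-idle CPU time summed over all cores, divided by elapsed wall-clock time and scaled to a percentage. The actual MAL rule in `/config/otel-oc-rules/vm.yaml` may compute it differently. The input shape mirrors node-exporter's `node_cpu_seconds_total` counter, which carries `cpu` and `mode` labels. The `Snapshot` type and the `cpu_total_percentage` function are hypothetical names chosen for illustration.

```python
from typing import Dict

# Hypothetical snapshot of cumulative CPU seconds: cpu id -> mode -> seconds,
# shaped like node-exporter's node_cpu_seconds_total{cpu="0", mode="user"} samples.
Snapshot = Dict[str, Dict[str, float]]


def cpu_total_percentage(before: Snapshot, after: Snapshot, elapsed_s: float) -> float:
    """Non-idle CPU time summed over all cores, as a percentage of elapsed time.

    Assumption: every mode except "idle" counts as busy. Each fully busy core
    contributes up to 100%, so N cores can reach N * 100%.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    busy = 0.0
    for cpu, modes in after.items():
        for mode, seconds in modes.items():
            if mode == "idle":
                continue
            # Counters are cumulative, so usage over the window is the delta.
            busy += seconds - before[cpu][mode]
    return busy / elapsed_s * 100


if __name__ == "__main__":
    t0 = {"0": {"user": 100.0, "system": 20.0, "idle": 500.0},
          "1": {"user": 80.0, "system": 10.0, "idle": 530.0}}
    # 10 s later both cores were busy the whole time, so 2 cores give 200%.
    t1 = {"0": {"user": 108.0, "system": 22.0, "idle": 500.0},
          "1": {"user": 87.0, "system": 13.0, "idle": 530.0}}
    print(cpu_total_percentage(t0, t1, 10.0))  # 200.0
```

Summing across cores, rather than averaging, is what lets the value exceed 100%. A per-core average would instead cap it at 100%.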
