This is an automated email from the ASF dual-hosted git repository.
wusheng pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git
The following commit(s) were added to refs/heads/main by this push:
new 452668a2 docs: KTM documentation update(workload semantics and metric
interpretation) (#903)
452668a2 is described below
commit 452668a23f7bde3ba09b1d64a0f50b615ce53e59
Author: Lihan Zhou <[email protected]>
AuthorDate: Sat Dec 20 18:13:15 2025 -0800
docs: KTM documentation update(workload semantics and metric
interpretation) (#903)
---
docs/design/ktm.md | 69 ++++----
docs/menu.yml | 2 +
docs/operation/fodc/ktm_metrics.md | 327 +++++++++++++++++++++++++++++++++++++
3 files changed, 370 insertions(+), 28 deletions(-)
diff --git a/docs/design/ktm.md b/docs/design/ktm.md
index acf678b6..047c6de4 100644
--- a/docs/design/ktm.md
+++ b/docs/design/ktm.md
@@ -3,7 +3,7 @@
## Overview
-Kernel Telemetry Module (KTM) is an optional, modular kernel observability
component embedded inside the BanyanDB First Occurrence Data Collection (FODC)
sidecar. The first built-in module is an eBPF-based I/O monitor ("iomonitor")
that focuses on page cache behavior, fadvise() effectiveness, and memory
pressure signals and their impact on BanyanDB performance. KTM is not a
standalone agent or network-facing service; it runs as a sub-component of the
FODC sidecar ("black box") and expose [...]
+Kernel Telemetry Module (KTM) is an optional, modular kernel observability
component embedded inside the BanyanDB First Occurrence Data Collection (FODC)
sidecar. The first built-in module is an eBPF-based I/O monitor ("iomonitor")
that focuses on page cache behavior, fadvise() effectiveness, and memory
pressure signals and their impact on BanyanDB performance. KTM is not a
standalone agent or network-facing service; it runs as a sub-component of the
FODC sidecar ("black box") and expose [...]
## Architecture
@@ -50,60 +50,63 @@ Notes:
- Focus: page cache add/delete, fadvise() calls, I/O counters, and memory
reclaim signals.
- Attachment points: stable tracepoints where possible; fentry/fexit preferred
on newer kernels.
- Data path: kernel events -> BPF maps (monotonic counters) -> userspace
collector -> exporters.
-- Scoping: Fixed to the single, co-located BanyanDB process within the same
container/pod.
+- Scoping: Fixed to the single, co-located BanyanDB process within the same
container/pod, using cgroup membership first and a configurable comm-prefix
fallback (default `banyand`).
## Metrics Model and Collection Strategy
-- Counters in BPF maps are monotonic and are not cleared by the userspace
collector (NoCleanup).
+- Counters in BPF maps are monotonic and are not cleared by the userspace
collector.
- Collection and push interval: 10 seconds by default.
- KTM periodically pushes collected metrics into the FODC Flight Recorder
through a Go-native interface at the configured interval (default 10s). The
push interval is exported through the `collector.interval` configuration
option. The Flight Recorder is responsible for any subsequent export,
persistence, or diagnostics workflows.
-- Downstream systems (for example, FODC Discovery Proxy or higher-level
exporters) should derive rates using `rate()`/`irate()` or equivalents; we
avoid windowed counters and map resets to preserve counter semantics.
+- Downstream systems derive rates (for example, Prometheus/PromQL
`rate()`/`irate()`); FODC/KTM only provides raw counters and does not compute
rates internally (see the sketch after this list). We avoid windowed counters
and map resets to preserve counter semantics.
- int64 overflow is not a practical concern for our use cases; we accept
long-lived monotonic growth.
+- KTM exports only raw counters; any ratios/percentages are derived upstream
(see FODC operations/overview for exporter behavior).
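As a hedged illustration of the rate-derivation bullet above, the sketch below shows how a downstream consumer could turn two samples of a monotonic KTM counter into a per-second rate, treating a decrease as a counter reset (for example, after a sidecar restart). The `sample` type and the function are hypothetical, not part of FODC/KTM.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical snapshot of one monotonic KTM counter.
type sample struct {
	value uint64    // e.g. ktm_cache_misses_total
	at    time.Time // scrape timestamp
}

// ratePerSecond derives a per-second rate from two consecutive samples.
// A decrease is treated as a counter reset (new counter lifecycle),
// comparable to how Prometheus rate()/irate() handle restarts.
func ratePerSecond(prev, curr sample) float64 {
	elapsed := curr.at.Sub(prev.at).Seconds()
	if elapsed <= 0 {
		return 0
	}
	if curr.value < prev.value { // reset: sidecar restarted, BPF maps recreated
		return float64(curr.value) / elapsed
	}
	return float64(curr.value-prev.value) / elapsed
}

func main() {
	prev := sample{value: 1200, at: time.Now().Add(-10 * time.Second)}
	curr := sample{value: 1450, at: time.Now()}
	fmt.Printf("cache misses/sec: %.1f\n", ratePerSecond(prev, curr)) // 25.0
}
```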
Configuration surface (current):
- `collector.interval`: Controls the periodic push interval for metrics to
Flight Recorder. Defaults to 10s.
-- `collector.enable_cgroup_filter`, `collector.enable_mntns_filter`: default
on when in sidecar mode; can be toggled.
-- `collector.target_pid`/`collector.target_comm`: optional helpers for
discovering scoping targets.
-- `collector.target_comm_regex`: process matcher regular expression used
during target discovery (matches `/proc/<pid>/comm` and/or executable
basename). Defaults to `banyand`.
-- Cleanup strategy is effectively `no_cleanup` by design intent;
clear-after-read logic is deprecated for production metrics.
+- `collector.ebpf.cgroup_path` (optional): absolute or
`/sys/fs/cgroup`-relative path to the BanyanDB cgroup v2; if unset, KTM
autodetects by scanning `/proc/*/comm` for the configured prefix (default
`banyand`).
+- `collector.ebpf.comm_prefix` (optional): comm name prefix used for process
discovery and fallback filtering; defaults to `banyand`.
+- Target discovery heuristic: match `/proc/<pid>/comm` against the configured
prefix (default `banyand`) to locate BanyanDB and derive its cgroup; this also
serves as the runtime fallback if the cgroup filter is unset.
+- Cleanup strategy: monotonic counters only, with rates derived downstream; KTM
does not clear BPF maps during collection.
- Configuration is applied via the FODC sidecar; KTM does not define its own
standalone process-level configuration surface.
## Scoping and Filtering
- Scoping is not optional; KTM is designed exclusively to monitor the single
BanyanDB process it is co-located with in a sidecar deployment.
-- The target process is identified at startup, and eBPF programs are
instructed to filter events to only that process.
-- Primary filtering mechanism: cgroup v2. This ensures all events originate
from the correct container. PID and mount namespace filters are used as
supplementary checks.
+- The target container is identified at startup; eBPF programs filter events
by cgroup membership first. If the cgroup filter is absent or misses, a
configurable comm-prefix match (default `banyand`) is used as a narrow fallback.
- The design intentionally avoids multi-process or node-level (DaemonSet)
monitoring to keep the implementation simple and overhead minimal.
### Target Process Discovery (Pod / VM)
-KTM needs to resolve the single “target” BanyanDB process before enabling
filters and attaching eBPF programs. In both Kubernetes pods and VM/bare-metal
deployments, KTM uses a **process matcher** driven by a configurable regular
expression (`collector.target_comm_regex`, default `banyand`).
+KTM needs to resolve the single “target” BanyanDB process before enabling
filters and attaching eBPF programs. In both Kubernetes pods and VM/bare-metal
deployments, KTM uses a **process matcher** based on a configurable comm-prefix
match (default `banyand`).
#### Kubernetes Pod (sidecar)
Preconditions:
-- The pod should be configured with `shareProcessNamespace: true` so the
monitor sidecar can see the target container’s `/proc` entries.
-- The monitor container should have cgroup v2 mounted (typically at
`/sys/fs/cgroup`).
+- The pod must be configured with `shareProcessNamespace: true` so the monitor
sidecar can see the target container’s `/proc` entries.
+- cgroup v2 mounted (typically at `/sys/fs/cgroup`) to enable the primary
cgroup filter.
+- If the target process cannot be discovered (for example,
`shareProcessNamespace` is off), KTM logs a warning and continues to probe
periodically; once the target process appears, KTM enables the module automatically.
Discovery flow (high level):
- Scan `/proc` for candidate processes.
-- For each PID, read `/proc/<pid>/comm` (and/or the executable basename) and
match it against `collector.target_comm_regex`.
-- Once matched, read `/proc/<pid>/cgroup` to obtain the target’s cgroup
path/identity, then enable cgroup filtering so only events from that
container/process are counted.
+- For each PID, read `/proc/<pid>/comm` and match it against the configured
comm prefix (default `banyand`), unless an explicit cgroup path is already
provided (see the sketch after this list).
+- Once matched, derive the target cgroup from `/proc/<pid>/cgroup` (cgroup v2)
and program the eBPF cgroup filter. The comm match remains as runtime fallback
if the cgroup check does not fire.
+- If no matching process is found at startup, KTM continues periodic probing
and activates once it is found.
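A minimal sketch of this discovery heuristic, assuming a plain `/proc` scan on a cgroup v2 host; the helper name and error handling are illustrative only and not the FODC implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findTarget scans /proc for a process whose comm starts with the given
// prefix (default "banyand") and returns its PID and cgroup v2 path.
func findTarget(commPrefix string) (pid string, cgroupPath string, err error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return "", "", err
	}
	for _, e := range entries {
		if !e.IsDir() || strings.Trim(e.Name(), "0123456789") != "" {
			continue // not a numeric PID directory
		}
		comm, err := os.ReadFile(filepath.Join("/proc", e.Name(), "comm"))
		if err != nil {
			continue // process may have exited; keep scanning
		}
		if !strings.HasPrefix(strings.TrimSpace(string(comm)), commPrefix) {
			continue
		}
		// cgroup v2 exposes a single "0::<path>" line; derive the filter path from it.
		cg, err := os.ReadFile(filepath.Join("/proc", e.Name(), "cgroup"))
		if err != nil {
			continue
		}
		for _, line := range strings.Split(string(cg), "\n") {
			if rel, ok := strings.CutPrefix(line, "0::"); ok {
				return e.Name(), filepath.Join("/sys/fs/cgroup", rel), nil
			}
		}
	}
	return "", "", fmt.Errorf("no process with comm prefix %q found; will retry", commPrefix)
}

func main() {
	pid, cg, err := findTarget("banyand")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("target pid:", pid, "cgroup:", cg)
}
```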
#### VM / bare metal
Discovery flow (high level):
- Scan `/proc` for candidate processes.
-- Match `/proc/<pid>/comm` (and/or executable basename) against
`collector.target_comm_regex` (default `banyand`).
-- Use PID (and optionally cgroup/mount namespace filters if available) to
scope kernel events to the selected process.
+- Match `/proc/<pid>/comm` against the configured prefix (default `banyand`).
+- Use the discovered PID to derive cgroup v2 path and program the filter; keep
the comm match as runtime fallback if cgroup filtering is unavailable.
+- If no matching process is found at startup, KTM continues periodic probing
and activates once it is found.
### Scoping Semantics
- The BPF maps use a single-slot structure (e.g., a BPF array map with a
single entry) to store global monotonic counters for the target process.
- This approach eliminates the need for per-pid hash maps, key eviction logic,
and complexities related to tracking multiple processes.
-- All kernel events are filtered by the target process's identity (via its
cgroup ID and PID) before any counters are updated in the BPF map.
+- All kernel events are filtered by the target container’s cgroup ID when
available; if the cgroup filter misses (for example, map not populated), a
comm-prefix match (configurable; default `banyand`) is used before any counters
are updated.
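As a sketch of how the userspace collector might read such a single-slot map, assuming the cilium/ebpf Go library; the pin path and the counter struct layout below are illustrative, not the actual iomonitor definitions.

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf"
)

// ioCounters mirrors a hypothetical single-slot value layout; the real
// struct is defined by the iomonitor BPF program.
type ioCounters struct {
	CacheReadAttempts uint64
	CacheMisses       uint64
	PageCacheAdds     uint64
	FadviseCalls      uint64
}

func main() {
	// Load the pinned counters map (path is illustrative).
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/ktm/iomonitor_counters", nil)
	if err != nil {
		panic(err)
	}
	defer m.Close()

	// Single-slot BPF array map: always read key 0; no per-PID keys to evict.
	var c ioCounters
	if err := m.Lookup(uint32(0), &c); err != nil {
		panic(err)
	}
	fmt.Printf("misses=%d adds=%d fadvise=%d\n", c.CacheMisses, c.PageCacheAdds, c.FadviseCalls)
}
```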
Example (YAML):
```yaml
@@ -111,8 +114,11 @@ collector:
interval: 10s
modules:
- iomonitor
- enable_cgroup_filter: true
- enable_mntns_filter: true
+ ebpf:
+ # Optional: absolute or /sys/fs/cgroup-relative path to the BanyanDB
cgroup v2.
+ # cgroup_path: /sys/fs/cgroup/<banyandb-cgroup>
+ # Optional: comm name prefix used for process discovery and fallback
filtering.
+ # comm_prefix: banyand
```
@@ -141,6 +147,9 @@ Prefix: metrics are currently emitted under the `ktm_`
namespace to reflect thei
- `ktm_cache_read_attempts_total`
- `ktm_cache_misses_total`
- `ktm_page_cache_adds_total`
+- I/O latency (syscall-level)
+ - `ktm_sys_read_latency_seconds` (histogram family: exposes `_bucket`,
`_count`, `_sum`)
+ - `ktm_sys_pread_latency_seconds` (histogram family: exposes `_bucket`,
`_count`, `_sum`)
- fadvise()
- `ktm_fadvise_calls_total`
- `ktm_fadvise_advice_total{advice="..."}`
@@ -148,11 +157,9 @@ Prefix: metrics are currently emitted under the `ktm_`
namespace to reflect thei
- Memory
- `ktm_memory_lru_pages_scanned_total`
- `ktm_memory_lru_pages_reclaimed_total`
- - `ktm_memory_reclaim_efficiency_percent`
- `ktm_memory_direct_reclaim_processes`
-Semantics: all counters are monotonic; use Prometheus functions for
rates/derivatives; no map clearing between scrapes.
-
+Semantics: all counters are monotonic; latency metrics are exported as
Prometheus histograms (`_bucket`, `_count`, `_sum`); use Prometheus functions
for rates/derivatives; no map clearing between scrapes. KTM does not emit
ratio/percentage metrics; derive them upstream.
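For readers less familiar with the `_bucket`/`_count`/`_sum` family, the hedged sketch below shows how such a latency histogram could be exposed with the Prometheus Go client; only the metric name mirrors the list above, while the bucket layout and exporter wiring are assumptions.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// A histogram named ktm_sys_read_latency_seconds expands into
	// _bucket{le="..."} series plus _count and _sum on export.
	readLatency := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "ktm_sys_read_latency_seconds",
		Help:    "Latency of read() syscalls issued by the target BanyanDB process.",
		Buckets: prometheus.ExponentialBuckets(1e-6, 2, 20), // 1µs .. ~0.5s, illustrative
	})
	prometheus.MustRegister(readLatency)

	// The collector would Observe() one value per measured syscall;
	// export/serving is handled elsewhere (Flight Recorder / exporters).
	readLatency.Observe((250 * time.Microsecond).Seconds())
}
```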
## Safety & Overhead Boundary
@@ -164,6 +171,8 @@ Loading and managing eBPF programs requires elevated
privileges. The FODC sideca
- `CAP_BPF`: Allows loading, attaching, and managing eBPF programs and maps.
This is the preferred, more restrictive capability.
- `CAP_SYS_ADMIN`: A broader capability that also grants permission to perform
eBPF operations. It may be required on older kernels where `CAP_BPF` is not
fully supported.
+Operational prerequisites and observability: see
`docs/operation/fodc/ktm_metrics.md`.
+
The sidecar should be configured with the minimal set of capabilities required
for its operation to adhere to the principle of least privilege.
## Failure Modes
@@ -179,15 +188,19 @@ This approach ensures that a failure within the
observability module does not im
## Restart Semantics
-On sidecar restart, BPF maps are recreated and all counters reset to zero.
Downstream systems (e.g., Prometheus via FODC integrations) should treat this
as a new counter lifecycle and continue deriving rates/derivatives normally.
+- On sidecar restart, BPF maps are recreated and all counters reset to zero.
Downstream systems (e.g., Prometheus via FODC integrations) should treat this
as a new counter lifecycle and continue deriving rates/derivatives normally.
+- If BanyanDB restarts (PID changes), the cgroup filter continues to match as
long as the container does not change; the comm fallback still matches the
configured prefix (default `banyand`).
+- If the pod/container is recreated (cgroup path changes), KTM re-runs target
discovery, re-programs the cgroup filter, and starts counters from zero;
metrics from the old container are discarded without reconciliation.
+- KTM performs a lightweight health check during collection to ensure the
cgroup filter is still populated; if it is missing (for example, container
crash/restart), KTM re-detects and re-programs the filter automatically.
## Kernel Attachment Points (Current)
-- `ksys_fadvise64_64` → fentry/fexit (preferred) or syscall tracepoints with
kprobe fallback.
-- Page cache add/remove → `filemap_get_read_batch` and
`mm_filemap_add_to_page_cache` tracepoints, with kprobe fallbacks.
-- Memory reclaim → `mm_vmscan_lru_shrink_inactive` and
`mm_vmscan_direct_reclaim_begin` tracepoints.
+- `sys_enter_read`, `sys_exit_read`, `sys_enter_pread64`, `sys_exit_pread64`
(syscall-level I/O latency).
+- `mm_filemap_add_to_page_cache`, `filemap_get_read_batch` (page cache
add/churn).
+- `ksys_fadvise64_64` (fadvise policy actions; fentry/fexit preferred).
+- `mm_vmscan_lru_shrink_inactive`, `mm_vmscan_direct_reclaim_begin` (memory
reclaim/pressure).
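A hedged sketch of attaching to the syscall tracepoints listed above with the cilium/ebpf library; the object file and program names are assumptions, not the actual iomonitor artifact.

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Object file and program names are illustrative placeholders.
	coll, err := ebpf.LoadCollection("iomonitor.bpf.o")
	if err != nil {
		log.Fatalf("load collection: %v", err)
	}
	defer coll.Close()

	// Syscall-level read latency: pair the enter/exit tracepoints.
	enter, err := link.Tracepoint("syscalls", "sys_enter_read", coll.Programs["handle_sys_enter_read"], nil)
	if err != nil {
		log.Fatalf("attach sys_enter_read: %v", err)
	}
	defer enter.Close()

	exit, err := link.Tracepoint("syscalls", "sys_exit_read", coll.Programs["handle_sys_exit_read"], nil)
	if err != nil {
		log.Fatalf("attach sys_exit_read: %v", err)
	}
	defer exit.Close()

	select {} // keep the links alive while collecting
}
```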
## Limitations
- Page cache–only perspective: direct I/O that bypasses the cache is not
observed.
-- Kernel-only visibility: no userspace spans, SQL parsing, or CPU profiling.
\ No newline at end of file
+- Kernel-only visibility: no userspace spans, SQL parsing, or CPU profiling.
diff --git a/docs/menu.yml b/docs/menu.yml
index d3e13a48..ee95f8ea 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -153,6 +153,8 @@ catalog:
catalog:
- name: "Overview"
path: "/operation/fodc/overview"
+ - name: "KTM Metrics"
+ path: "/operation/fodc/ktm_metrics"
- name: "Property Background Repair"
path: "/concept/property-repair"
- name: "Benchmark"
diff --git a/docs/operation/fodc/ktm_metrics.md
b/docs/operation/fodc/ktm_metrics.md
new file mode 100644
index 00000000..fc1117ab
--- /dev/null
+++ b/docs/operation/fodc/ktm_metrics.md
@@ -0,0 +1,327 @@
+# KTM Metrics — Semantics & Workload Interpretation
+
+This document defines the **semantic meaning** of kernel-level metrics
collected by the
+Kernel Telemetry Module (KTM) under different BanyanDB workloads.
+
+It serves as the **authoritative interpretation guide** for:
+- First Occurrence Data Capture (FODC)
+- Automated analysis and reporting by LLM agents
+- Self-healing and tuning recommendations
+
+This document does **not** describe kernel attachment points or implementation
details.
+Those are covered separately in the KTM design document.
+
+---
+
+## 1. Scope and Non-Goals
+
+### In Scope
+- Interpreting kernel metrics in the context of **LSM-style read + compaction
workloads**
+- Distinguishing **benign background activity** from **user-visible read-path
impact**
+- Providing **actionable, explainable signals** for automated analysis
+
+### Out of Scope
+- Device-level I/O profiling or per-disk attribution
+- SLA-grade performance accounting
+- Precise block-layer root cause isolation
+
+SLA-grade performance accounting is explicitly out of scope because
+eBPF-based sampling and histogram bucketing introduce statistical
+approximation, and kernel-level telemetry cannot capture application-
+or network-level queuing delays.
+
+KTM focuses on **user-visible impact first**, followed by kernel-side
explanations.
+
+---
+
+## 2. Core Metrics Overview
+
+### 2.1 Read / Pread Syscall Latency (Histogram)
+
+**Metric Type**
+- Histogram (bucketed latency)
+- Collected at syscall entry/exit for `read` and `pread64`
+
+**Semantic Meaning**
+This metric represents the **time BanyanDB threads spend blocked in the
read/pread syscall path**.
+
+It is the **primary impact signal** in KTM.
+
+**Key Rule**
+> If syscall-level read/pread latency does **not** increase, the situation is
**not considered an incident**, regardless of background cache or reclaim
activity.
+
+**Why Histogram**
+- Captures long-tail latency (p95 / p99) reliably
+- More representative of user experience than averages
+- Suitable for LLM-based reasoning and reporting
+
+---
+
+### 2.2 fadvise Policy Actions
+
+**Metric Type**
+- Counter
+
+**Semantic Meaning**
+Records **explicit page cache eviction hints** issued by BanyanDB.
+
+This metric represents **policy intent**, not impact.
+
+**Interpretation Notes**
+- fadvise activity alone is not an anomaly
+- Must be correlated with read/pread latency to assess impact
+
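As an illustration only (not BanyanDB's actual code), the kind of hint this counter tracks looks like the following `FADV_DONTNEED` call via `golang.org/x/sys/unix`; the file path is hypothetical.

```go
package main

import (
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("/var/lib/banyandb/some-part-file") // illustrative path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Hint the kernel that cached pages for this file are no longer needed.
	// Each such call is what ktm_fadvise_calls_total / ktm_fadvise_advice_total
	// count; the hint itself is policy intent, not evidence of read-path impact.
	if err := unix.Fadvise(int(f.Fd()), 0, 0, unix.FADV_DONTNEED); err != nil {
		panic(err)
	}
}
```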
+---
+
+### 2.3 Page Cache Add / Fill Activity
+
+**Metric Type**
+- Counter
+
+**Semantic Meaning**
+Represents pages being added to the OS page cache due to:
+- Read misses
+- Sequential scans
+- Compaction activity
+
+High page cache add rates are **expected** under LSM workloads.
+
+**Note**
+Page cache add activity does not necessarily imply disk I/O or cache miss.
+It may increase due to readahead, sequential scans, or compaction reads,
+and should be treated as a **correlated signal**, not a causal indicator,
+unless accompanied by read/pread latency degradation.
+
+---
+
+### 2.4 Memory Reclaim and Pressure Signals
+
+**Metrics**
+- LRU shrink activity
+- Direct reclaim entry events
+
+**Semantic Meaning**
+Indicates **kernel memory pressure** that may destabilize page cache residency.
+
+These metrics act as **root-cause hints**, not incident triggers.
+
+---
+
+## 3. Interpretation Principles
+
+### 3.1 Impact-First Gating
+
+All incident detection and analysis is gated on:
+
+> **Syscall-level read/pread latency histogram**
+
+This refers to the combined read/pread syscall latency histograms.
+
+Other metrics are used **only to explain why latency increased**, not to
decide whether an incident occurred.
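As an illustrative approximation of that gate (not the FODC detection logic), a p99 estimate can be derived from the cumulative bucket counts of the read/pread histograms and compared against a baseline:

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket: count of observations <= le seconds.
type bucket struct {
	le    float64
	count uint64
}

// approxQuantile estimates a latency quantile (for example 0.99) from
// cumulative buckets, interpolating linearly inside the matching bucket.
func approxQuantile(q float64, buckets []bucket, total uint64) float64 {
	if total == 0 || len(buckets) == 0 {
		return 0
	}
	rank := q * float64(total)
	prevLe, prevCount := 0.0, uint64(0)
	for _, b := range buckets {
		if float64(b.count) >= rank {
			span := float64(b.count - prevCount)
			if span == 0 {
				return b.le
			}
			return prevLe + (b.le-prevLe)*(rank-float64(prevCount))/span
		}
		prevLe, prevCount = b.le, b.count
	}
	return buckets[len(buckets)-1].le // rank beyond the last finite bucket
}

func main() {
	buckets := []bucket{{0.0005, 900}, {0.002, 980}, {0.008, 995}, {0.032, 1000}}
	p99 := approxQuantile(0.99, buckets, 1000)
	// Gate: compare p99 against a baseline before evaluating any other signal.
	fmt.Printf("approx p99: %.4fs\n", p99)
}
```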
+
+---
+
+### 3.2 Cache Churn Is Not an Incident
+
+High values of:
+- page cache add
+- reclaim
+- background scans
+
+are **normal** under LSM-style workloads and **must not** be treated as
incidents unless they result in read/pread latency degradation.
+
+---
+
+## 4. Workload Semantics
+
+This section defines canonical workload patterns and how KTM metrics should be
interpreted.
+
+---
+
+> **Global Rule — Latency-Gated Evaluation**
+>
+> All workload patterns below are evaluated **only after syscall-level
+> read/pread latency degradation has been detected** (e.g., p95/p99 bucket
shift).
+> Kernel signals such as page cache activity, reclaim, or fadvise **must not**
+> be interpreted as incident triggers on their own.
+
+---
+
+### Workload 1 — Sequential Read / Background Compaction (Benign)
+
+**Typical Signals**
+- `page_cache_add ↑`
+- `lru_shrink ↑` (optional)
+- `read/pread syscall latency stable`
+
+**Interpretation**
+Sequential scans and compaction naturally introduce cache churn.
+As long as read/pread latency remains stable, this workload is benign.
+
+**Operational Decision**
+- Do not trigger FODC
+- No self-healing action required
+
+---
+
+### Workload 2 — High Page Cache Pressure, Foreground Sustained
+
+**Typical Signals**
+- `page_cache_add ↑`
+- `lru_shrink ↑`
+- occasional `direct_reclaim`
+- `read/pread syscall latency stable`
+
+**Interpretation**
+System memory pressure exists, but foreground reads are not impacted.
+This indicates a tight but stable operating point.
+
+**Operational Decision**
+- No incident
+- Monitor trends only
+
+---
+
+### Workload 3 — Aggressive Cache Eviction or Reclaim Impact
+
+**Typical Signals**
+- `fadvise_calls ↑` or early reclaim activity
+- `page_cache_add ↑` (repeated refills)
+- `read/pread syscall latency ↑` (long-tail buckets appear)
+
+**Interpretation**
+Hot pages are evicted too aggressively, causing read amplification.
+Foreground reads are directly impacted.
+
+**Operational Decision**
+- Trigger FODC
+- Recommend tuning eviction thresholds or rate-limiting background activity
+
+**Discriminator**
+Eviction-driven degradation is typically characterized by:
+- Elevated `fadvise` activity
+- Repeated page cache refills
+- Read latency degradation **without sustained compaction throughput
+ or disk I/O saturation**
+
+- **Query pattern signal** (optional): continuously scanning an extensive time
range.
+
+This pattern indicates policy-induced cache churn rather than workload
contention.
+These discriminator signals are typically sourced from DB-level or system-level
+metrics outside KTM.
+
+---
+
+### Workload 4 — I/O Contention or Cold Data Access
+
+**Typical Signals**
+- `page_cache_add ↑` (due to compaction OR new data reads)
+- `read/pread syscall latency ↑`
+- reclaim may or may not be present
+
+**Interpretation**
+Latency degradation is caused by:
+1. **Resource Contention**: Compaction threads competing with foreground reads
for disk I/O.
+2. **Cold Data Access**: The active working set exceeds resident memory,
forcing frequent OS page cache misses (synchronous disk reads).
+
+**Operational Decision**
+- Trigger FODC
+- Suggest reducing compaction concurrency
+- If compaction is idle but latency remains high, consider scaling up memory
(Capacity Planning).
+
+**Discriminator**
+This pattern is characterized by elevated read/pread syscall latency
**without** the explicit eviction signals of W3 (fadvise) or the system-wide
pressure of W5 (reclaim).
+It indicates that the system is physically bound by I/O limits due to
contention or capacity cache misses.
+
+---
+
+### Workload 5 — OS Memory Pressure–Driven Cache Drop
+
+**Typical Signals**
+- `direct_reclaim ↑`
+- `lru_shrink ↑`
+- `read/pread syscall latency ↑`
+- `fadvise` may be absent
+
+**Interpretation**
+Cache eviction is driven by OS memory pressure rather than DB policy.
+Foreground reads stall due to synchronous reclaim.
+
+**Operational Decision**
+- Trigger FODC
+- Recommend adjusting memory limits or reducing background memory usage
+
+---
+
+## 5. Excluded Signals and Rationale
+
+### 5.1 Page Fault Metrics
+
+BanyanDB primarily uses `read()` with page cache access rather than mmap-based
I/O.
+Major and minor page faults do not reliably represent read-path stalls and are
therefore excluded from impact detection.
+
+### 5.2 Block Layer Latency
+
+Block-layer completion context does not reliably map to BanyanDB threads in
containerized environments.
+Syscall-level latency already captures user-visible impact and is used as the
primary signal.
+
+Block-layer metrics may be added later as an optional enhancement.
+
+---
+
+## 6. Summary
+
+KTM identifies read-path incidents by:
+1. Gating on **syscall-level read/pread latency histograms**
+2. Explaining impact using:
+ - eviction policy actions (fadvise)
+ - page cache behavior
+ - memory pressure signals
+
+This separation ensures:
+- Low false positives
+- Clear causality
+- Actionable and explainable self-healing decisions
+
+## 7. Decision Flow Overview
+```mermaid
+graph TD
+ Start([Start: Metric Analysis]) --> CheckLat{Read/Pread Syscall\nLatency
Increased?}
+
+ %% Primary Gating Rule
+ CheckLat -- No --> Benign[Benign State\nNo User Impact]
+ CheckLat -- Yes --> Incident[Incident Detected\nTrigger FODC]
+
+ %% Benign Analysis
+ Benign --> CheckPressure{Pressure Signals\nPresent?}
+ CheckPressure -- Yes --> W2[W2: Stable State]
+ CheckPressure -- No --> W1[W1: Background Scan/Compaction]
+
+ %% Incident Analysis (Root Cause)
+ Incident --> CheckFadvise{High fadvise\ncalls?}
+
+ %% Branch: Policy
+ CheckFadvise -- Yes --> W3[W3: Policy-Driven Eviction\nAssociated with aggressive DONTNEED policy signal]
+
+ %% Branch: Kernel/OS
+ CheckFadvise -- No --> CheckReclaim{Direct Reclaim / \nLRU Shrink?}
+
+ %% Branch: Pressure
+ CheckReclaim -- Yes --> W5[W5: OS Memory Pressure\nCause: Sync Reclaim]
+
+ %% Branch: Contention
+ CheckReclaim -- No --> W4[W4: I/O Contention / Cold Read\nCause:
Compaction or Working Set > RAM]
+
+ %% Styling
+ style CheckLat fill:#f9f,stroke:#333,stroke-width:2px
+ style Incident fill:#f00,stroke:#333,stroke-width:2px,color:#fff
+ style Benign fill:#9f9,stroke:#333,stroke-width:2px
+```
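For automated analysis, the flow above can be reduced to a small classification function. The sketch below is illustrative only; the signal struct, thresholds, and returned labels are assumptions rather than FODC APIs.

```go
package main

import "fmt"

// signals is a hypothetical, already-thresholded view of KTM metrics over a window.
type signals struct {
	latencyDegraded bool // read/pread p95/p99 bucket shift detected (the gate)
	pressure        bool // lru_shrink / reclaim activity present
	highFadvise     bool // elevated fadvise call rate
	directReclaim   bool // direct reclaim or LRU shrink spikes
}

// classify mirrors the decision flow: gate on latency first, then explain.
func classify(s signals) string {
	if !s.latencyDegraded {
		if s.pressure {
			return "W2: high pressure, foreground stable (monitor only)"
		}
		return "W1: background scan/compaction (benign)"
	}
	// Latency degraded: incident detected, trigger FODC, then identify root cause.
	switch {
	case s.highFadvise:
		return "W3: policy-driven eviction (tune eviction / rate-limit background work)"
	case s.directReclaim:
		return "W5: OS memory pressure (adjust memory limits)"
	default:
		return "W4: I/O contention or cold data access (reduce compaction concurrency or add memory)"
	}
}

func main() {
	fmt.Println(classify(signals{latencyDegraded: true, directReclaim: true}))
}
```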
+
+---
+
+## 8. Operational Prerequisites and Observability
+
+- BTF availability and a mounted bpffs are expected for fentry/fexit
attachment and map pinning where used.
+- Kernel versions must support the chosen tracepoints/fentry paths; kprobe
fallbacks apply otherwise.
+- On failure to load/attach, KTM logs an error and disables itself (see
Failure Modes in the design document).
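A hedged sketch of how an operator or init step might verify these prerequisites; the paths are the conventional locations (`/sys/kernel/btf/vmlinux`, `/sys/fs/bpf`) and the check itself is illustrative, not part of KTM.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// checkPrereqs reports whether kernel BTF metadata and a bpffs mount are
// visible, which KTM relies on for fentry/fexit attachment and map pinning.
func checkPrereqs() []string {
	var problems []string

	// Kernel BTF is conventionally exposed at /sys/kernel/btf/vmlinux.
	if _, err := os.Stat("/sys/kernel/btf/vmlinux"); err != nil {
		problems = append(problems, "kernel BTF not found (fentry/fexit unavailable; kprobe fallback only)")
	}

	// bpffs is conventionally mounted at /sys/fs/bpf; check /proc/mounts.
	mounts, err := os.ReadFile("/proc/mounts")
	if err != nil || !strings.Contains(string(mounts), " /sys/fs/bpf bpf ") {
		problems = append(problems, "bpffs not mounted at /sys/fs/bpf (map pinning unavailable)")
	}
	return problems
}

func main() {
	if problems := checkPrereqs(); len(problems) > 0 {
		for _, p := range problems {
			fmt.Println("warning:", p)
		}
		return
	}
	fmt.Println("KTM prerequisites look satisfied")
}
```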