zqr10159 opened a new issue, #3892:
URL: https://github.com/apache/hertzbeat/issues/3892
### Feature Request
Native Lightweight Analysis Engine
### Is your feature request related to a problem? Please describe
1. **Static Thresholds:** hertzbeat-alerter relies heavily on static
thresholds (e.g., CPU > 80%). This leads to alert fatigue in cyclic business
scenarios (e.g., daily traffic peaks).
2. **Existing AI Module:** The current hertzbeat-analysis (or hertzbeat-ai)
module focuses on **LLM integration** (ChatBot/Agent). It relies on external
APIs and is not suitable for high-frequency, low-latency, and cost-effective
numerical anomaly detection.
3. **Lack of Native Intelligence:** HertzBeat needs a built-in mathematical
engine to understand "Trend" and "Seasonality" without requiring users to
deploy heavy external Python/AI environments.
### Describe the solution you'd like
I propose introducing a new module **hertzbeat-analysis** (or extending the
existing AI module with a "Native" engine).
This module serves as a **"System 1" (Fast & Cheap)** intelligence layer
using pure Java mathematics (commons-math3), focusing on **Time-Series
Analysis** and **Dynamic Baseline Prediction**.
#### **1. Architecture Design**
* **Location:** New Maven module hertzbeat-analysis.
* **Role:**
  * **Consumer:** Subscribes to the metrics data stream (side-by-side with
    alerter or warehouse).
  * **Trainer:** Periodically queries historical data from
    hertzbeat-warehouse to update model parameters (coefficients).
  * **Provider:** Provides an API for hertzbeat-alerter to check if a value
    is "Anomalous" based on the model.
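To make the Provider role concrete, here is a minimal sketch of what the contract between the analysis module and hertzbeat-alerter might look like. Every name in it (AnomalyDetector, AnomalyResult, Baseline, InMemoryAnomalyDetector) is hypothetical and not part of HertzBeat today, and the in-memory map merely stands in for whatever storage the Trainer would write to. It assumes Java 17 for records.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical result type: the model's expectation, the allowed deviation, and the verdict. */
record AnomalyResult(double expected, double tolerance, boolean anomalous) {}

/** Hypothetical Provider contract that hertzbeat-alerter could call per metric sample. */
interface AnomalyDetector {
    AnomalyResult check(String metricKey, long timestampMs, double actual);
}

/** Hypothetical per-metric baseline; a real model would hold trend and seasonality coefficients. */
record Baseline(double expected, double sigma) {}

/** Toy in-memory Provider: the Trainer writes baselines, the alerter reads verdicts. */
class InMemoryAnomalyDetector implements AnomalyDetector {
    private final Map<String, Baseline> baselines = new ConcurrentHashMap<>();
    private final double sigmaMultiplier;

    InMemoryAnomalyDetector(double sigmaMultiplier) {
        this.sigmaMultiplier = sigmaMultiplier;
    }

    /** Called by the Trainer role after each periodic refit. */
    void updateBaseline(String metricKey, Baseline baseline) {
        baselines.put(metricKey, baseline);
    }

    /** timestampMs is ignored by this toy; a fitted model would evaluate the baseline at that time. */
    @Override
    public AnomalyResult check(String metricKey, long timestampMs, double actual) {
        Baseline b = baselines.get(metricKey);
        if (b == null) {
            // No model trained yet: never flag, so the alerter can fall back to static thresholds.
            return new AnomalyResult(Double.NaN, Double.NaN, false);
        }
        double tolerance = sigmaMultiplier * b.sigma();
        boolean anomalous = Math.abs(actual - b.expected()) > tolerance;
        return new AnomalyResult(b.expected(), tolerance, anomalous);
    }
}
```

Returning "not anomalous" for an untrained metric is one possible design choice; it keeps existing static-threshold rules working until a model exists.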
#### **2. Core Algorithms (Java Native)**
We will implement "TinyProphet", a lightweight decomposition algorithm:
* **Trend:** Linear Regression fitted via OLS (Ordinary Least Squares), or
Ridge Regression.
* **Seasonality:** Fourier Series features fitted via OLS.
* **Stack:** org.apache.commons:commons-math3. **No Python/JNI required.**
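One way to read the decomposition is as a single linear model, $y(t) = \beta_0 + \beta_1 t + \sum_{k=1}^{K} [a_k \cos(2\pi k t / P) + b_k \sin(2\pi k t / P)]$, where the period $P$ and the number of harmonics $K$ are assumptions chosen per metric. The sketch below fits that model by solving the normal equations in plain Java so that it runs without dependencies. In the proposed module, the OLSMultipleLinearRegression class from commons-math3 would presumably replace the hand-rolled solver. The class and method names are hypothetical.

```java
/** Sketch of a TinyProphet-style fit: linear trend plus Fourier seasonality, solved by OLS. */
final class TinyProphetSketch {

    /** Feature row for time t: [1, t, cos(w*1*t), sin(w*1*t), ..., cos(w*K*t), sin(w*K*t)], w = 2*pi/period. */
    static double[] features(double t, double period, int harmonics) {
        double[] row = new double[2 + 2 * harmonics];
        row[0] = 1.0; // intercept
        row[1] = t;   // linear trend
        for (int k = 1; k <= harmonics; k++) {
            double angle = 2.0 * Math.PI * k * t / period;
            row[2 * k] = Math.cos(angle);
            row[2 * k + 1] = Math.sin(angle);
        }
        return row;
    }

    /** Fits beta by solving the normal equations (X^T X) beta = X^T y with Gaussian elimination. */
    static double[] fit(double[] t, double[] y, double period, int harmonics) {
        int p = 2 + 2 * harmonics;
        double[][] a = new double[p][p + 1]; // augmented matrix [X^T X | X^T y]
        for (int i = 0; i < t.length; i++) {
            double[] row = features(t[i], period, harmonics);
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) {
                    a[r][c] += row[r] * row[c];
                }
                a[r][p] += row[r] * y[i];
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("Singular design matrix; more samples are needed");
            }
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double sum = a[r][p];
            for (int c = r + 1; c < p; c++) {
                sum -= a[r][c] * beta[c];
            }
            beta[r] = sum / a[r][r];
        }
        return beta;
    }

    /** Expected value at time t under the fitted coefficients. */
    static double predict(double[] beta, double t, double period, int harmonics) {
        double[] row = features(t, period, harmonics);
        double sum = 0.0;
        for (int i = 0; i < row.length; i++) {
            sum += beta[i] * row[i];
        }
        return sum;
    }
}
```

For hourly samples with a daily cycle, a call such as TinyProphetSketch.fit(t, y, 24.0, 1) returns four coefficients: intercept, slope, and one cosine/sine pair. Adding a small penalty to the diagonal of X^T X before elimination would turn the same solver into the Ridge variant.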
#### **3. Workflow**
1. **Data Preprocessing (TimeSeriesPreprocessor):**
   * Handle missing data (NaN) from MetricsData with linear interpolation.
   * Align timestamps to standard windows (e.g., 1-minute buckets).
2. **Model Training (Async Job):**
   * Fetch the last 1 to 7 days of data from hertzbeat-warehouse.
   * Calculate regression coefficients ($\beta$) for Trend and Seasonality.
   * Store lightweight coefficients (JSON) in the database
     (hzb_analysis_model).
3. **Real-time Inference:**
   * Calculate expected_value using the stored coefficients.
   * Compare |actual - expected| against a dynamic tolerance (e.g., 3-sigma).
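The preprocessing and inference steps above can be sketched with a few small helpers. This is only an illustration under stated assumptions: the class and method names are hypothetical, leading and trailing gaps are filled with the nearest valid value (the proposal does not say how edges should be handled), and sigma is taken to be the spread of the training residuals.

```java
/** Hypothetical helpers for the preprocessing and inference steps. */
final class WorkflowSketch {

    /** Start of the fixed-width bucket containing timestampMs (bucketMs = 60_000 for 1-minute buckets). */
    static long bucketStart(long timestampMs, long bucketMs) {
        return Math.floorDiv(timestampMs, bucketMs) * bucketMs;
    }

    /** Returns a copy with interior NaN runs filled linearly; edge gaps take the nearest valid value. */
    static double[] interpolate(double[] values) {
        double[] out = values.clone();
        int prev = -1; // index of the last valid sample seen so far
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                continue;
            }
            if (prev < 0) {
                for (int j = 0; j < i; j++) {
                    out[j] = out[i]; // leading gap
                }
            } else {
                for (int j = prev + 1; j < i; j++) {
                    double w = (double) (j - prev) / (i - prev);
                    out[j] = out[prev] + w * (out[i] - out[prev]);
                }
            }
            prev = i;
        }
        if (prev >= 0) {
            for (int j = prev + 1; j < out.length; j++) {
                out[j] = out[prev]; // trailing gap
            }
        }
        return out;
    }

    /**
     * Residual spread sqrt(sum r^2 / (n - 1)). With an intercept in the model the
     * residual mean is zero, so this equals the residuals' sample standard deviation.
     */
    static double residualSigma(double[] actual, double[] fitted) {
        double sumSq = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double r = actual[i] - fitted[i];
            sumSq += r * r;
        }
        return Math.sqrt(sumSq / (actual.length - 1));
    }

    /** k-sigma rule: flag when |actual - expected| exceeds k * sigma. */
    static boolean isAnomalous(double actual, double expected, double sigma, double k) {
        return Math.abs(actual - expected) > k * sigma;
    }
}
```

With k = 3 and a residual sigma of 5, a reading of 70 against an expected value of 50 would be flagged, while 60 would not. Storing sigma alongside the coefficients would let inference run without touching historical data.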
### Describe alternatives you've considered
* **Prometheus predict_linear:** Stateless and cannot handle complex
seasonality.
* **External Python Agents:** Breaks the "Out-of-the-box" experience.
* **Using LLM for everything:** Too expensive and slow for real-time metric
stream processing.
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail:
[email protected]
For queries about this service, please contact Infrastructure at:
[email protected]