This is an automated email from the ASF dual-hosted git repository.
yaozhq pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/geaflow-website.git
The following commit(s) were added to refs/heads/main by this push:
new b9c2c63 Fix website blogs content (#9)
b9c2c63 is described below
commit b9c2c63f3c0c325ff23e8b9214f5a76537790624
Author: Leomrlin <[email protected]>
AuthorDate: Mon Oct 20 17:36:06 2025 +0800
Fix website blogs content (#9)
* update contacts
* remove repo
* fix blog content
* remove tugraph in blogs
* fix graph in en version
---
blog/27.md | 16 ++++++-------
blog/28.md | 19 ++++------------
blog/29.md | 16 ++++++-------
blog/30.md | 21 +++--------------
blog/31.md | 30 ++++++++++++-------------
blog/32.md | 12 +++++-----
i18n/en-US/code.json | 2 +-
i18n/zh-CN/docusaurus-plugin-content-blog/27.md | 28 +++++++++++------------
i18n/zh-CN/docusaurus-plugin-content-blog/28.md | 10 ++++-----
i18n/zh-CN/docusaurus-plugin-content-blog/29.md | 28 +++++++++++------------
i18n/zh-CN/docusaurus-plugin-content-blog/30.md | 2 +-
i18n/zh-CN/docusaurus-plugin-content-blog/31.md | 30 ++++++++++++-------------
i18n/zh-CN/docusaurus-plugin-content-blog/32.md | 8 +++----
13 files changed, 93 insertions(+), 129 deletions(-)
diff --git a/blog/27.md b/blog/27.md
index f694dbf..5ae7dd9 100644
--- a/blog/27.md
+++ b/blog/27.md
@@ -3,15 +3,13 @@ title: "Stream4Graph: Incremental Computation on Dynamic Graphs"
date: "2025-3-11"
---
-
-
> Author: Zhang Qi
It's well known that when we need to perform correlation analysis on data, we typically use SQL join operations. However, Cartesian product calculations during SQL joins require maintaining a large number of intermediate results, which significantly impacts overall data analysis performance. In contrast, graph-based approaches maintain data correlations, transforming correlation analysis into graph traversal operations and greatly reducing the cost of data analysis.
However, with the continuous growth in data scale and increasing demand for real-time processing, efficiently solving real-time computation problems on large-scale graph data has become increasingly urgent. Traditional computing engines such as Spark and Flink are gradually falling short of meeting the growing business demands for graph data processing. Therefore, designing a real-time processing engine tailored for large-scale graph data will bring significant advancements to big data p [...]
-Stream graph computing engine [GeaFlow](https://github.com/TuGraph-family/tugraph-analytics), which combines the technical advantages of graph processing and stream processing. It implements incremental computation capabilities on dynamic graphs, enhancing real-time performance in high-performance correlation analysis. In the following sections, we will introduce the characteristics of graph computing technology, how the industry addresses large-scale real-time graph computing challenges [...]
+Stream graph computing engine [Apache GeaFlow (Incubating)](https://github.com/apache/geaflow), which combines the technical advantages of graph processing and stream processing. It implements incremental computation capabilities on dynamic graphs, enhancing real-time performance in high-performance correlation analysis. In the following sections, we will introduce the characteristics of graph computing technology, how the industry addresses large-scale real-time graph computing challeng [...]
<!-- truncate -->
@@ -96,19 +94,19 @@ However, this approach has a significant drawback: it involves redundant computa
<font style="color:rgb(64, 64, 64);"></font>
-## 4. Incremental Dynamic Graph Computing: GeaFlow
+## 4. Incremental Dynamic Graph Computing: Apache GeaFlow (Incubating)
-We know that in traditional stream computing engines like Flink, the processing model allows the system to handle continuously incoming data events. When processing each event, Flink can evaluate changes and execute computations only on the changed parts. This means that in incremental computing, Flink focuses on the latest incoming data rather than the entire dataset. Inspired by Flink’s incremental computing, we developed the incremental graph computing system GeaFlow (also known as th [...]
+We know that in traditional stream computing engines like Flink, the processing model allows the system to handle continuously incoming data events. When processing each event, Flink can evaluate changes and execute computations only on the changed parts. This means that in incremental computing, Flink focuses on the latest incoming data rather than the entire dataset. Inspired by Flink’s incremental computing, we developed the incremental graph computing system Apache GeaFlow (Incubatin [...]
<font style="color:rgb(64, 64, 64);"></font>
-How does GeaFlow implement incremental graph computing? First, real-time data is input into GeaFlow through connectors. GeaFlow generates internal node-edge structure data based on the real-time data and inserts this data into the underlying graph. Nodes involved in the real-time data within the current window are activated, triggering graph iterative computation.
+How does Apache GeaFlow (Incubating) implement incremental graph computing? First, real-time data is input into GeaFlow through connectors. GeaFlow generates internal node-edge structure data based on the real-time data and inserts this data into the underlying graph. Nodes involved in the real-time data within the current window are activated, triggering graph iterative computation.
Using the WCC algorithm as an example, for the connected components algorithm, in a time window, each edge’s src id and tar id vertices are activated. In the first iteration, their id information is sent to neighboring nodes. If a neighboring node receives the message and finds that it needs to update its information, it continues to notify its neighbors; otherwise, its iteration terminates.

-## 5. GeaFlow Architecture Overview
+## 5. Apache GeaFlow (Incubating) Architecture Overview
The GeaFlow engine consists of three main parts: DSL, Framework, and State. It also provides users with Stream API, Static Graph API, and Dynamic Graph API. The DSL layer is responsible for parsing and optimizing graph query languages like SQL+ISO/GQL, as well as schema inference. It also supports various Connectors such as Hive, Hudi, Kafka, and ODPS. The Framework layer handles runtime scheduling, fault tolerance, shuffle, and coordination of components. The State layer is responsible [...]
@@ -218,6 +216,6 @@ The GeaFlow project is fully open-sourced. We have built some of the foundationa
## References
-1. GeaFlow Project:[https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics)
+1. Apache GeaFlow (Incubating) Project:[https://github.com/apache/geaflow](https://github.com/apache/geaflow)
2. soc-Livejournal Dataset:[https://snap.stanford.edu/data/soc-LiveJournal1.html](https://snap.stanford.edu/data/soc-LiveJournal1.html)
-3. GeaFlow Issues:[https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues)
+3. Apache GeaFlow (Incubating) Issues:[https://github.com/apache/geaflow/issues](https://github.com/apache/geaflow/issues)
diff --git a/blog/28.md b/blog/28.md
index b0ce2f5..b1453f7 100644
--- a/blog/28.md
+++ b/blog/28.md
@@ -3,23 +3,15 @@ title: Principles and Applications of Incremental Match in Streaming Graph Compu
date: 2025-6-3
---
-
-
## Problem Background
In streaming computing, data rarely arrives all at once but is continuously input and processed. Similarly, in graph computing/graph querying scenarios, vertices and edges are constantly read from data sources to construct graphs incrementally. In incremental graph queries, the graph evolves continuously, leading to different query results across graph versions. When new vertices/edges form an updated graph version, recomputing through the entire graph incurs high overhead and duplicates [...]
<!-- truncate -->
-<font style="color:rgb(51, 51, 51);">GQL (Graph Query Language)</font> <font style="color:rgb(0, 0, 0);">is an international standard developed by ISO for graph query languages,</font> <font style="color:rgb(51, 51, 51);">used to execute queries on graphs. Geaflow is an open-source streaming graph engine by Ant Group’s graph computing team, specializing in dynamically changing graph data and supporting large-scale, high-concurrency real-time graph computing scenarios.</font> This article [...]
-
-
+<font style="color:rgb(51, 51, 51);">GQL (Graph Query Language)</font> <font style="color:rgb(0, 0, 0);">is an international standard developed by ISO for graph query languages,</font> <font style="color:rgb(51, 51, 51);">used to execute queries on graphs. Apache GeaFlow (Incubating) is an open-source streaming graph engine, specializing in dynamically changing graph data and supporting large-scale, high-concurrency real-time graph computing scenarios.</font> This article introduces GeaF [...]
## Current Challenges
-<font style="color:rgb(0, 0, 0);">The Geaflow engine adopts a vertex-centric framework, where each vertex sends messages iteratively. Vertices process received messages in subsequent iterations.</font> For GQL queries, traversal starts from initial vertices for pattern matching (e.g., from node `A` to `B` to `C`). In dynamic graphs, if only newly added vertices/edges trigger computation, results may be incomplete, as illustrated below:
-
-<div style="text-align: center;">
-<img src="https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/23857192/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg" alt='画板' height="281" width="486">
-</div>
+<font style="color:rgb(0, 0, 0);">The Apache GeaFlow (Incubating) engine adopts a vertex-centric framework, where each vertex sends messages iteratively. Vertices process received messages in subsequent iterations.</font> For GQL queries, traversal starts from initial vertices for pattern matching (e.g., from node `A` to `B` to `C`). In dynamic graphs, if only newly added vertices/edges trigger computation, results may be incomplete, as illustrated below:
The key issue is that **Vertex A1 cannot trigger computation if only the delta is considered**, yet it belongs to the incremental results. To resolve this, we propose a subgraph expansion method from new vertices. The query is divided into two phases:
1. **Evolve Phase**: Propagate `EvolveMessage` from new vertices to neighbors, adding recipients to the `EvolveVertices` set.
@@ -72,9 +64,6 @@ public void compute(Object vertexId, Iterator<MessageBox> messageIterator) {
}
```
-**Visualization:**
-
-
**Evolve Conditions:**
- Query iterations `>2` (no Evolve needed for ≤2 hops).
- Query iterations `≤ Threshold`.
@@ -82,7 +71,7 @@ public void compute(Object vertexId, Iterator<MessageBox> messageIterator) {
- No starting vertex filter in GQL (e.g., `Match(a:person where a.id=1)` excludes Evolve).
## Demo
-In Geaflow, configure incremental graphs via `windowSize` for vertex/edge tables:
+In Apache GeaFlow (Incubating), configure incremental graphs via `windowSize` for vertex/edge tables:
```sql
CREATE GRAPH modern (
@@ -160,4 +149,4 @@ In this demo, vertex window size is 20, and edge window size is 3, meaning each
## Conclusion and Outlook
-In dynamic/streaming graph scenarios, graph nodes and edges change in real time. When querying such graphs, we can often trigger computation only on the incremental part using historical information, avoiding full graph traversal. Geaflow uses a subgraph expansion-based incremental match method, applied within a vertex-centric distributed graph computing framework, to support incremental querying in dynamic graph scenarios. In the future, we aim to implement more complex incremental matc [...]
\ No newline at end of file
+In dynamic/streaming graph scenarios, graph nodes and edges change in real time. When querying such graphs, we can often trigger computation only on the incremental part using historical information, avoiding full graph traversal. Apache GeaFlow (Incubating) uses a subgraph expansion-based incremental match method, applied within a vertex-centric distributed graph computing framework, to support incremental querying in dynamic graph scenarios. In the future, we aim to implement more comp [...]
\ No newline at end of file
diff --git a/blog/29.md b/blog/29.md
index b537fce..120fb71 100644
--- a/blog/29.md
+++ b/blog/29.md
@@ -1,5 +1,5 @@
---
-title: "Exploring GeaFlow's Temporal Capabilities — Breathing New Life into Time-Series Data!"
+title: "Exploring Apache GeaFlow (Incubating)'s Temporal Capabilities — Breathing New Life into Time-Series Data!"
date: 2025-6-25
---
@@ -24,9 +24,9 @@ In today's digital era, data has become a core resource driving decisions and in
- **Lack of Flexibility**
  Many tools support only one type of analysis and cannot concurrently process real-time streams and historical data.
- To address these issues, GeaFlow innovatively introduces temporal graph computing. As a distributed stream-graph engine designed for dynamic data, GeaFlow efficiently tackles challenges posed by evolving datasets. For dynamically changing graph structures, users can seamlessly perform operations like graph traversal, pattern matching, and computations—meeting complex analytical needs. By integrating temporal dimensions with dynamic graph processing, GeaFlow offers [...]
+ To address these issues, Apache GeaFlow (Incubating) innovatively introduces temporal graph computing. As a distributed stream-graph engine designed for dynamic data, GeaFlow efficiently tackles challenges posed by evolving datasets. For dynamically changing graph structures, users can seamlessly perform operations like graph traversal, pattern matching, and computations—meeting complex analytical needs. By integrating temporal dimensions with dynamic graph proces [...]
-## What Is GeaFlow?
+## What Is Apache GeaFlow (Incubating)?
GeaFlow is a powerful distributed computing platform that combines graph computing and stream processing to handle dynamic graphs and temporal data efficiently. It supports complex graph algorithms and real-time analytics, making it ideal for dynamic scenarios. Key features include:
@@ -66,8 +66,8 @@ They complement each other:
- **Temporal Graphs Enhance Stream Analysis**
Timestamps enable complex operations like trend prediction and window-based analytics.
-### **4. GeaFlow’s Implementation**
-GeaFlow unifies stream and temporal graphs through:
+### **4. Apache GeaFlow (Incubating)’s Implementation**
+Apache GeaFlow (Incubating) unifies stream and temporal graphs through:
- **Timestamp Assignment**
Assigns *processing time* or *event time* to all data.
- **Dynamic Updates & Historical Retention**
@@ -176,10 +176,10 @@ a_id | e1_ts | b_id | e2_ts | c_id
- **Flexible**: SQL-like syntax lowers development barriers.
- **Scalable**: Handles massive dynamic graphs via incremental computation.
-## Core Highlights of GeaFlow’s Temporal Capabilities
+## Core Highlights of Apache GeaFlow (Incubating)’s Temporal Capabilities
### 1. Time-Aware Data Processing
-Timestamps enable precision. GeaFlow supports:
+Timestamps enable precision. Apache GeaFlow (Incubating) supports:
- **5-Minute Trend Analysis**: Track real-time interaction frequency shifts.
- **24-Hour Dynamic Patterns**: Identify long-term trends (e.g., user purchase behavior).
@@ -203,7 +203,7 @@ Optimized temporal algorithms:
Dynamic data holds immense value, and GeaFlow’s temporal capabilities unlock it. Whether you’re a novice or an expert, GeaFlow empowers you to harness time-series data.
-**Download GeaFlow today and explore the power of temporal analytics!**
+**Download Apache GeaFlow (Incubating) today and explore the power of temporal analytics!**
---
diff --git a/blog/30.md b/blog/30.md
index 5e6f58e..ab91ec2 100644
--- a/blog/30.md
+++ b/blog/30.md
@@ -16,10 +16,6 @@ date: 2025-5-15
<!-- truncate -->
-
-
-**Figure 1: Performance Difference Between SQL Join and GQL Graph Hop Queries**
-
### 2. Data Constraints
**Efficiency Constraint**: When association levels exceed 3 hops, the time complexity of traditional JOIN operations grows exponentially. Analytical models centered around multi-table JOINs gradually lose their advantage and become a "shackle" to efficiency.
@@ -28,9 +24,6 @@ date: 2025-5-15
**Innovation Constraint**: Business analysts often abandon graph technology stacks due to the need to learn GQL (Graph Query Language). The fragmented toolchain keeps graph analytics confined to technical departments, failing to empower front-line business teams.
-
-
-**Figure 2: JOIN vs GQL Expression Examples**
### 3. Breakthrough Strategy: Core Value of the Graph Data Warehouse
@@ -62,23 +55,18 @@ The Graph Data Warehouse Schema Converter automatically transforms the ER model
**<font style="color:rgba(0, 0, 0, 0.88);">Stage 3: Graph Assembly.</font>**<font style="color:rgba(0, 0, 0, 0.88);">All vertices are merged, and edges bound to start nodes are naturally merged. Endpoint binding is optional. For two different graph conversion schemes, a difference vector can be calculated — representing how all tables map to entity changes.</font>
-
-
-
-
-**Figure 3: ER to Graph Schema Conversion Example Series**
<font style="color:rgba(0, 0, 0, 0.88);">Through algorithmic analysis of inter-table associations and automatic graph construction, this provides a basis for migrating data from its original storage location to the graph data warehouse. It also significantly reduces manual data modeling and DSL scripting efforts, enabling fast migration of traditional warehouse data to a graph warehouse with no manual intervention and immediate analysis readiness.</font>
### 2. Data Pipeline: Materialized Data Interaction Capabilities
-Similar to traditional data warehouses, the graph data warehouse leverages GeaFlow engine capabilities and TuMaker’s mature business platform to provide data <font style="color:rgb(0, 0, 0);">task orchestration capabilities — organizing multiple data processing tasks (like data extraction, transformation, and loading) in a logical sequence and executing them automatically. Key features include visual interfaces, task scheduling, event triggers, error handling, monitoring and logging, ver [...]
+Similar to traditional data warehouses, the graph data warehouse leverages Apache GeaFlow (Incubating) engine capabilities and TuMaker’s mature business platform to provide data <font style="color:rgb(0, 0, 0);">task orchestration capabilities — organizing multiple data processing tasks (like data extraction, transformation, and loading) in a logical sequence and executing them automatically. Key features include visual interfaces, task scheduling, event triggers, error handling, monitor [...]
<font style="color:rgb(0, 0, 0);">With the help of the Schema Converter, a materialization plan from table storage to graph storage can be generated, building a data pipeline between traditional and graph data warehouses. Based on the table-to-graph materialization plan, the system can automatically generate data sync task orchestrations according to actual business configurations like acceleration tables, relationships, fields, and permissions. These are then scheduled via the graph war [...]
<font style="color:rgba(0, 0, 0, 0.88);">The data pipeline integrates deeply with mainstream big data ecosystems like ODPS/Hive/Paimon. It achieves full lifecycle data management through a three-tier architecture: at the data access layer, it automatically captures table changes, generates materialization plans, and syncs incremental mappings from tables to graph entities, currently managing graph data at the 10TB scale; at the conversion engine layer, it fully automates DSL task orchest [...]
-
+
**Figure 4: Open-Source Technical Architecture Overview**
@@ -90,9 +78,6 @@ Similar to traditional data warehouses, the graph data warehouse leverages GeaFl
Compared to traditional SQL queries<font style="color:rgb(64, 64, 64);">that may take minutes to analyze user relationships through three table joins, graph path queries can complete the same task in seconds.</font><font style="color:rgba(0, 0, 0, 0.88);">This engine has been validated in typical business scenarios like short video analysis, membership growth, and customer rights services. In the future, it will expand to support complex subqueries and expression operations, allowing mor [...]
-
-
-**Figure 5: SQL AST to GQL Structure Translation Difference Example**
## 3. Technical Advantages and Application Scenarios
@@ -114,7 +99,7 @@ Compared to traditional SQL queries<font style="color:rgb(64, 64, 64);">that may
## 4. Future Outlook
-<font style="color:rgba(0, 0, 0, 0.88);">As a core carrier of next-generation data infrastructure, we plan to gradually open-source core capabilities like the graph storage engine, graph computing framework engine, and SQL-GQL translation module to build a developer-driven technical ecosystem. In 2023, we first open-sourced the streaming graph computing engine GeaFlow. In Q3 2025, we will release a standardized graph data analysis platform, high-performance graph computing engine, and su [...]
+<font style="color:rgba(0, 0, 0, 0.88);">As a core carrier of next-generation data infrastructure, we plan to gradually open-source core capabilities like the graph storage engine, graph computing framework engine, and SQL-GQL translation module to build a developer-driven technical ecosystem. In 2023, we first open-sourced the streaming graph computing engine Apache GeaFlow (Incubating). In Q3 2025, we will release a standardized graph data analysis platform, high-performance graph comp [...]
<font style="color:rgba(0, 0, 0, 0.88);">On the technical evolution front, the next-generation engine will break through dynamic streaming graph computing bottlenecks to support trillion-edge incremental updates. By integrating vectorized computing engines, it can jointly query property graphs and vector graphs to meet AIGC-era multimodal analysis needs and enable revolutionary experiences like generating graph queries directly from natural language. Industry applications are rapidly exp [...]
diff --git a/blog/31.md b/blog/31.md
index c8ba7bf..cbb2663 100644
--- a/blog/31.md
+++ b/blog/31.md
@@ -3,11 +3,9 @@ title: "Graph4Stream: Accelerating Stream Computing with Graph-Based Approaches"
date: 2025-3-25
---
-
-
> Author: Kunyu; Reviewer: Dongshuo.
-In a previous article ["Stream4Graph: Incremental Computation on Dynamic Graphs"](https://zhuanlan.zhihu.com/p/27618053733), we introduced how introducing incremental computation into graph computing—essentially combining "graphs + streams"—allowed GeaFlow to significantly outperform Spark GraphX in terms of performance. Now, the question arises: when we introduce graph computing capabilities into stream computing—combining "streams + graphs"—how does GeaFlow compare to Flink's associati [...]
+In a previous article ["Stream4Graph: Incremental Computation on Dynamic Graphs"](https://zhuanlan.zhihu.com/p/27618053733), we introduced how introducing incremental computation into graph computing—essentially combining "graphs + streams"—allowed Apache GeaFlow (Incubating) to significantly outperform Spark GraphX in terms of performance. Now, the question arises: when we introduce graph computing capabilities into stream computing—combining "streams + graphs"—how does GeaFlow compare [...]
In today’s era, data is being generated at an unprecedented speed and scale, and real-time processing of massive datasets has wide applications in various fields such as anomaly detection, search recommendations, and financial transactions. As one of the core technologies for real-time data processing, **stream computing** has become increasingly important.
@@ -15,7 +13,7 @@ In today’s era, data is being generated at an unprecedented speed and scale, a
Unlike batch processing, which waits for all data to arrive before computation, stream computing partitions continuously generated data streams into micro-batches and performs incremental computations on each batch. This computational characteristic gives stream computing high throughput and low latency. Common stream computing engines include Flink and Spark Streaming, both of which process data using tabular representations. However, as stream computing applications deepen, more and mo [...]
-GeaFlow, an open-source stream graph computing engine developed by Ant Group's graph computing team, combines graph and stream computing to provide an efficient framework for stream graph processing, significantly improving computational performance. Below, we will introduce the limitations of traditional stream computing engines in relational computation, explain the principles behind GeaFlow's efficiency, and present performance comparisons.
+Apache GeaFlow (Incubating), an open-source stream graph computing engine, combines graph and stream computing to provide an efficient framework for stream graph processing, significantly improving computational performance. Below, we will introduce the limitations of traditional stream computing engines in relational computation, explain the principles behind GeaFlow's efficiency, and present performance comparisons.
## Stream Computing Engine: Flink
@@ -78,7 +76,7 @@ The main performance bottleneck lies in scanning RightStateView. LeftStateView a
Flink Join Operator Implementation
-## Stream Graph Computing Engine: GeaFlow
+## Stream Graph Computing Engine: Apache GeaFlow (Incubating)
### Graph Computing & Stream Graphs
@@ -90,17 +88,17 @@ Table Modeling vs. Graph Modeling
A **stream graph** is the application of graph computing to streaming scenarios. It divides the graph into historical and incremental components based on data stream updates. For example, if the first two rows have been processed and we are now handling the third row, the historical graph is built from the first two rows, and the incremental graph is formed by the third row. Together, they constitute the full graph. Applying incremental graph algorithms on stream graphs enables efficient [...]
-### GeaFlow Architecture
+### Apache GeaFlow (Incubating) Architecture
The GeaFlow engine’s computation flow consists of stream data input, distributed incremental graph computation, and incremental result output. Like traditional stream engines, real-time data is sliced into micro-batches by window. For each batch, the data is parsed into vertices and edges to form an incremental graph. This incremental graph and the historical graph (built from previous data) together form the complete stream graph. The computation framework applies incremental graph algo [...]

-GeaFlow Incremental Computation
+Apache GeaFlow (Incubating) Incremental Computation
The GeaFlow computation framework is a vertex-centric iterative model. It starts with vertices in the incremental graph. In each iteration, each vertex maintains its own state and performs computation based on its associated historical and incremental graph data. The result is then passed to neighboring vertices via message passing to trigger the next iteration.
-Taking k-Hop as an example, the incremental algorithm works as follows: In the first iteration, all edges in the incremental graph are identified and treated as initial incoming and outgoing paths, which are sent to their start and end vertices. In subsequent iterations, these paths are extended. Once the desired hop count is reached, the paths are sent back to the starting vertex, where they are combined into final results. Detailed implementation can be found in the open-source reposit [...]
+Taking k-Hop as an example, the incremental algorithm works as follows: In the first iteration, all edges in the incremental graph are identified and treated as initial incoming and outgoing paths, which are sent to their start and end vertices. In subsequent iterations, these paths are extended. Once the desired hop count is reached, the paths are sent back to the starting vertex, where they are combined into final results. Detailed implementation can be found in the open-source reposit [...]
The diagram below illustrates the two-hop case. In the first iteration, the edge B->C creates incoming and outgoing paths, sent to B and C, respectively. In the second iteration, B receives an incoming path, adds its own incoming edges, and forms a 2-hop incoming path, which it sends to itself. Similarly, C forms a 2-hop outgoing path and sends it to B. In the final iteration, B combines the incoming and outgoing paths to produce the new paths. Unlike Flink, which must scan all historica [...]
@@ -169,7 +167,7 @@ RETURN ret
;
```
-## GeaFlow Performance Test
+## Apache GeaFlow (Incubating) Performance Test
To evaluate GeaFlow’s performance in stream graph computing, we designed a comparative experiment using the k-Hop algorithm. We used the public dataset [web-Google.txt](https://snap.stanford.edu/data/web-Google.html) as input and measured the time required to complete the computation across one-hop to four-hop scenarios. The experiment ran on 16 servers, each with 8 cores and 16GB memory.
@@ -181,19 +179,19 @@ k-Hop Computation Performance Comparison
## Conclusion and Future Work
-Traditional stream engines like Flink use join operators for relationship computation, which requires scanning all historical data, resulting in poor performance in large-scale associative scenarios. GeaFlow addresses this by introducing graph computing into stream processing through a stream graph framework, significantly boosting performance with incremental graph algorithms.
+Traditional stream engines like Flink use join operators for relationship computation, which requires scanning all historical data, resulting in poor performance in large-scale associative scenarios. Apache GeaFlow (Incubating) addresses this by introducing graph computing into stream processing through a stream graph framework, significantly boosting performance with incremental graph algorithms.
-GeaFlow is now open-source. We aim to build a unified lakehouse engine for graph data to support diverse associative analytics. We are also preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem. If you're interested in graph technology, we welcome you to join the community.
+Apache GeaFlow (Incubating) is now open-source. We aim to build a unified lakehouse engine for graph data to support diverse associative analytics. We are also preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem. If you're interested in graph technology, we welcome you to join the community.
There are many exciting tasks to explore. You can start with these beginner-friendly issues:
-- Support incremental k-Core algorithm ([Issue 466](https://github.com/TuGraph-family/tugraph-analytics/issues/466))
-- Support incremental Minimum Spanning Tree algorithm ([Issue 465](https://github.com/TuGraph-family/tugraph-analytics/issues/465))
+- Support incremental k-Core algorithm ([Issue 466](https://github.com/apache/geaflow/issues/466))
+- Support incremental Minimum Spanning Tree algorithm ([Issue 465](https://github.com/apache/geaflow/issues/465))
- ...
## References
-1. GeaFlow Project: [https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics)
+1. Apache GeaFlow (Incubating) Project: [https://github.com/apache/geaflow](https://github.com/apache/geaflow)
2. web-Google Dataset: [https://snap.stanford.edu/data/web-Google.html](https://snap.stanford.edu/data/web-Google.html)
-3. GeaFlow Issues: [https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues)
-4. Incremental k-Hop Source Code: [https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)
+3. Apache GeaFlow (Incubating) Issues: [https://github.com/apache/geaflow/issues](https://github.com/apache/geaflow/issues)
+4. Incremental k-Hop Source Code: [https://github.com/apache/geaflow/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/apache/geaflow/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)
diff --git a/blog/32.md b/blog/32.md
index 9794c4e..cb50849 100644
--- a/blog/32.md
+++ b/blog/32.md
@@ -1,9 +1,9 @@
---
-title: "Streaming Graph Computing Engine GeaFlow v0.6.4 Released: Supports Relational Access to Graph Data, Incremental Matching Optimizes Real-Time Processing"
+title: "Streaming Graph Computing Engine Apache GeaFlow (Incubating) v0.6.4 Released: Supports Relational Access to Graph Data, Incremental Matching Optimizes Real-Time Processing"
date: April 3, 2025
---
-In **March 2025**, the streaming graph computing engine GeaFlow v0.6.4 was released. This version brings several significant feature updates, including:
+In **March 2025**, the streaming graph computing engine Apache GeaFlow (Incubating) v0.6.4 was released. This version brings several significant feature updates, including:
- 🍀 Experimental support for storing GeaFlow graph data in Paimon data lake
- 🍀 Enhanced graph data warehouse capabilities: Supports relational access to
graph entities
@@ -15,7 +15,7 @@ date: April 3, 2025
## ✨ New Features
-### 🍀 GeaFlow Graph Storage Extended to Support Paimon Data Lake (Experimental)
+### 🍀 Apache GeaFlow (Incubating) Graph Storage Extended to Support Paimon Data Lake (Experimental)
To improve the scalability, real-time processing capabilities, and cost efficiency of GeaFlow's data storage system, this update adds support for **Apache Paimon**. As a next-generation streaming data lake storage format, Paimon differs significantly in design philosophy and features from RocksDB, which GeaFlow used previously:
@@ -28,12 +28,12 @@ In this update, GeaFlow adds support for Paimon storage
(currently **experimenta
- **Current limitations**: Only supports local filesystem as Paimon backend;
recoverability not yet supported; dynamic graph data storage not yet supported.
- Configure the storage path via the parameter
`geaflow.store.paimon.options.warehouse` (default: `"file:///tmp/paimon/"`).
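The configuration rule above can be illustrated with a small sketch that resolves the warehouse path from a plain key-value map and falls back to the documented default. Only the key `geaflow.store.paimon.options.warehouse` and the default `file:///tmp/paimon/` come from the release notes; the class and method names are hypothetical, and the `file://` check is an assumption that mirrors the stated local-filesystem-only limitation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a GeaFlow API): resolves the Paimon warehouse
// location from a plain key-value job configuration.
final class PaimonWarehouseConfig {
    // Key and default value as documented in the v0.6.4 release notes.
    static final String WAREHOUSE_KEY = "geaflow.store.paimon.options.warehouse";
    static final String DEFAULT_WAREHOUSE = "file:///tmp/paimon/";

    private PaimonWarehouseConfig() {}

    /** Returns the configured warehouse, or the default when unset or blank. */
    static String resolveWarehouse(Map<String, String> jobConfig) {
        String value = jobConfig.get(WAREHOUSE_KEY);
        return (value == null || value.trim().isEmpty()) ? DEFAULT_WAREHOUSE : value.trim();
    }

    /** Assumed check: only the local filesystem is supported as a backend in this release. */
    static boolean isLocalFileSystem(String warehouse) {
        return warehouse.startsWith("file://");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveWarehouse(conf)); // file:///tmp/paimon/
        conf.put(WAREHOUSE_KEY, "file:///data/geaflow/paimon/");
        String warehouse = resolveWarehouse(conf);
        System.out.println(warehouse + " local=" + isLocalFileSystem(warehouse));
    }
}
```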
-The current GeaFlow storage architecture is shown below:
+The current Apache GeaFlow (Incubating) storage architecture is shown below:

### 🍀 Graph Data Warehouse Capability Expansion: Supports Relational Access to Graph Entities
-In traditional relational databases, multi-table JOIN queries often require complex SQL statements, which hurts development efficiency and makes ad-hoc analysis of massive interconnected data slow. To address this pain point, GeaFlow introduces SQL support that automatically translates complex SQL JOIN statements into graph path queries, with **no Graph Query Language (GQL) needed**. This version offers two SQL syntax features:
+In traditional relational databases, multi-table JOIN queries often require complex SQL statements, which hurts development efficiency and makes ad-hoc analysis of massive interconnected data slow. To address this pain point, Apache GeaFlow (Incubating) introduces SQL support that automatically translates complex SQL JOIN statements into graph path queries, with **no Graph Query Language (GQL) needed**. This version offers two SQL syntax features:
1. **Querying Vertices/Edges as Source Tables:**
- The `TableScanToGraphRule` identifies vertices/edges within SQL
statements, enabling users to query graph entities like standard SQL table
scans.
@@ -58,7 +58,7 @@ In traditional relational databases, multi-table JOIN queries
often require comp
### 🍀 Unified Memory Manager Support
-Previously, GeaFlow lacked centralized memory management. Apart from RocksDB, which used off-heap memory, all memory was on-heap, causing significant GC pressure under heavy load. Network shuffling also involved multiple data copies, reducing efficiency.
+Previously, Apache GeaFlow (Incubating) lacked centralized memory management. Apart from RocksDB, which used off-heap memory, all memory was on-heap, causing significant GC pressure under heavy load. Network shuffling also involved multiple data copies, reducing efficiency.
The new **Unified Memory Manager** governs memory allocation, release, and
monitoring across modules (shuffle, state, framework) for both on-heap and
off-heap memory. Key capabilities include:
- **Unified On-heap/Off-heap Management:** Abstracts memory access via
`MemoryView`, shielding users from the underlying type. Off-heap chunks are
pre-allocated (default chunk size: 30% of `-Xmx`, configurable via
`off.heap.memory.chunkSize.MB`) and support dynamic resizing.
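The default chunk-size rule amounts to a simple calculation. The sketch below assumes that `off.heap.memory.chunkSize.MB` holds a whole number of megabytes that replaces the 30% default outright; `OffHeapChunkSizing` is a hypothetical name and does not reflect how the Unified Memory Manager is actually implemented.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch (not GeaFlow's memory manager): derives the off-heap
// chunk size in MB, defaulting to 30% of the maximum heap (-Xmx) unless
// off.heap.memory.chunkSize.MB is set.
final class OffHeapChunkSizing {
    static final String CHUNK_SIZE_KEY = "off.heap.memory.chunkSize.MB";
    static final long DEFAULT_PERCENT_OF_HEAP = 30;

    private OffHeapChunkSizing() {}

    static long chunkSizeMb(Map<String, String> conf, long maxHeapBytes) {
        String override = conf.get(CHUNK_SIZE_KEY);
        if (override != null) {
            return Long.parseLong(override.trim()); // an explicit setting wins
        }
        long maxHeapMb = maxHeapBytes / (1024L * 1024L);
        return maxHeapMb * DEFAULT_PERCENT_OF_HEAP / 100; // integer math avoids rounding surprises
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reports the heap limit, which usually tracks -Xmx.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("default chunk MB: " + chunkSizeMb(Collections.<String, String>emptyMap(), maxHeap));
    }
}
```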
diff --git a/i18n/en-US/code.json b/i18n/en-US/code.json
index 7760b2e..fcff1bf 100644
--- a/i18n/en-US/code.json
+++ b/i18n/en-US/code.json
@@ -12,7 +12,7 @@
"message": "OVERVIEW"
},
"product.intro.desc": {
- "message": "Apache GeaFlow (Incubating) is a distributed unified
stream-batch graph computing product that supports core capabilities including
mixed table-graph processing, real-time graph computing, and interactive graph
analysis. It provides high availability and one-stop cloud-native development
and deployment capabilities. Based on Ant Group's self-developed trillion-scale
graph computing practices, GeaFlow is currently widely applied in scenarios
such as data warehouse acce [...]
+ "message": "Apache GeaFlow (Incubating) is a distributed unified
stream-batch graph computing product that supports core capabilities including
mixed table-graph processing, real-time graph computing, and interactive graph
analysis. It provides high availability and one-stop cloud-native development
and deployment capabilities. Based on self-developed trillion-scale graph
computing practices, GeaFlow is currently widely applied in scenarios such as
data warehouse acceleration, fi [...]
},
"product.repo": {
"message": "TuGraph Family"
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/27.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/27.md
index 8b0d9b9..8db9e51 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/27.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/27.md
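@@ -3,15 +3,13 @@ title: "Stream4Graph: Incremental Computation on Dynamic Graphs"
date: "2025-3-11"
---
-
-
> Author: Zhang Qi
It is well known that when we need to perform correlation analysis on data, we typically use SQL joins. However, the Cartesian product computed during a SQL join requires maintaining a large number of intermediate results, which severely hurts overall analysis performance. A graph-based approach instead maintains the correlations in the data itself, turning correlation analysis into graph traversal and greatly reducing the cost of data analysis.
Yet as data volumes keep growing and data processing must become ever more real-time, efficiently solving real-time computation on large-scale graph data has become increasingly urgent. Traditional compute engines such as Spark and Flink can no longer keep up with growing business demands for graph data processing, so a real-time processing engine designed for large-scale graph data would greatly advance big data processing technology.
-[GeaFlow](https://github.com/TuGraph-family/tugraph-analytics), the streaming graph computing engine open-sourced by Ant Group's graph computing team, combines the strengths of graph processing and stream processing to enable incremental computation on dynamic graphs, further improving the timeliness of graph computing on top of high-performance correlation analysis. Below we introduce the characteristics of graph computing, how the industry tackles large-scale real-time graph computing, and GeaFlow's computing performance on dynamic graphs.
+[Apache GeaFlow (Incubating)](https://github.com/apache/geaflow), an open-source streaming graph computing engine, combines the strengths of graph processing and stream processing to enable incremental computation on dynamic graphs, further improving the timeliness of graph computing on top of high-performance correlation analysis. Below we introduce the characteristics of graph computing, how the industry tackles large-scale real-time graph computing, and the computing performance of Apache GeaFlow (Incubating) on dynamic graphs.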
<!-- truncate -->
@@ -96,25 +94,25 @@ date: "2025-3-11"
<font style="color:rgb(64, 64, 64);"></font>
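-## 4. Incremental Computation on Dynamic Graphs: GeaFlow
+## 4. Incremental Computation on Dynamic Graphs: Apache GeaFlow (Incubating)
-<font style="color:rgb(64, 64, 64);">As we know, in traditional stream computing engines such as Flink, </font><font style="color:rgb(51, 51, 51);">the processing model lets the system handle a continuous flow of incoming data events. When processing each event, Flink can evaluate the change and compute only over the changed part. In other words, during incremental computation Flink focuses on the newly arrived data rather than on the entire dataset. Inspired by Flink's incremental computation, </font><font style="color:rgb(64, 64, 64);">we built our own incremental graph computing system, GeaFlow (also called a streaming graph computing engine), which supports incremental iterative graph computation well.</font>
+<font style="color:rgb(64, 64, 64);">As we know, in traditional stream computing engines such as Flink, </font><font style="color:rgb(51, 51, 51);">the processing model lets the system handle a continuous flow of incoming data events. When processing each event, Flink can evaluate the change and compute only over the changed part. In other words, during incremental computation Flink focuses on the newly arrived data rather than on the entire dataset. Inspired by Flink's incremental computation, </font><font style="color:rgb(64, 64, 64);">we built our own incremental graph computing system, Apache GeaFlow (Incubating) (also called a streaming graph computing engine), which supports incremental iterative graph computation well.</font>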
<font style="color:rgb(64, 64, 64);"></font>
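-So how does GeaFlow implement incremental graph computation? First, real-time data enters GeaFlow through a connector message source. GeaFlow converts the real-time data into internal vertex and edge structures and inserts them into the base graph. Vertices touched by the current window's real-time data are activated and trigger iterative graph computation.
+So how does Apache GeaFlow (Incubating) implement incremental graph computation? First, real-time data enters GeaFlow through a connector message source. GeaFlow converts the real-time data into internal vertex and edge structures and inserts them into the base graph. Vertices touched by the current window's real-time data are activated and trigger iterative graph computation.
Take the WCC (weakly connected components) algorithm as an example. Within a time window, the vertices identified by each edge's src id and tar id are activated, and in the first iteration each of them sends its id to its neighbors. If a neighbor that receives a message finds that it needs to update its own state, it propagates the update to its own neighbors; if it does not need to update, it sends no further messages and its iteration terminates.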
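The WCC propagation rule described above can be sketched on a single machine as min-label propagation: every vertex keeps the smallest id seen in its component, only the endpoints of the window's new edges start the first iteration, and a vertex forwards a message only when its label actually shrinks. This is a simplified illustration under those assumptions, with hypothetical class and method names, not GeaFlow's distributed vertex-centric code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-process sketch of incremental WCC by min-label propagation.
// Each vertex's label is the smallest vertex id known in its component.
// Simplified illustration only; not GeaFlow's distributed implementation.
final class IncrementalWcc {
    private final Map<Long, Set<Long>> neighbors = new HashMap<>(); // edges treated as undirected
    private final Map<Long, Long> label = new HashMap<>();

    /** Inserts one window of edges, propagates labels, returns how many labels changed. */
    int applyWindow(List<long[]> windowEdges) {
        for (long[] e : windowEdges) {
            neighbors.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
            label.putIfAbsent(e[0], e[0]);
            label.putIfAbsent(e[1], e[1]);
        }
        // First iteration: both endpoints of every new edge are activated and
        // send their current label to each other.
        Map<Long, Long> inbox = new HashMap<>();
        for (long[] e : windowEdges) {
            send(inbox, e[1], label.get(e[0]));
            send(inbox, e[0], label.get(e[1]));
        }
        int changed = 0;
        while (!inbox.isEmpty()) {
            Map<Long, Long> next = new HashMap<>();
            for (Map.Entry<Long, Long> msg : inbox.entrySet()) {
                long vertex = msg.getKey();
                long candidate = msg.getValue();
                if (candidate < label.get(vertex)) {
                    // State changed: adopt the smaller label and notify neighbors.
                    label.put(vertex, candidate);
                    changed++;
                    for (long n : neighbors.get(vertex)) {
                        send(next, n, candidate);
                    }
                }
                // Otherwise the vertex sends nothing and its iteration ends here.
            }
            inbox = next;
        }
        return changed;
    }

    /** Keeps only the smallest label per target, like a message combiner. */
    private static void send(Map<Long, Long> box, long target, long value) {
        box.merge(target, value, Long::min);
    }

    long componentOf(long vertex) {
        return label.getOrDefault(vertex, vertex);
    }

    public static void main(String[] args) {
        IncrementalWcc wcc = new IncrementalWcc();
        wcc.applyWindow(List.of(new long[] {1, 2}, new long[] {3, 4})); // window 1
        int changed = wcc.applyWindow(List.of(new long[] {2, 3}));       // window 2 bridges the components
        System.out.println("changed=" + changed + ", component(4)=" + wcc.componentOf(4)); // changed=2, component(4)=1
    }
}
```

Because vertices whose label does not change stay silent, the second window in `main` updates only vertices 3 and 4 instead of recomputing the whole graph.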

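-## 5. A Brief Look at the GeaFlow Architecture
+## 5. A Brief Look at the Apache GeaFlow (Incubating) Architecture
-The GeaFlow engine consists of three main parts, DSL, Framework, and State, and exposes the Stream API, the static graph API, and the dynamic graph API to users. The DSL layer parses the SQL+ISO/GQL graph query language and optimizes execution plans; it also derives schemas and integrates a variety of external connectors such as hive, hudi, kafka, and odps. The Framework layer handles runtime scheduling and fault tolerance, shuffle, and the management and coordination of the components within the framework. The State layer stores and persists the underlying graph data and also carries out many performance optimizations such as indexing and pushdown.
+The Apache GeaFlow (Incubating) engine consists of three main parts, DSL, Framework, and State, and exposes the Stream API, the static graph API, and the dynamic graph API to users. The DSL layer parses the SQL+ISO/GQL graph query language and optimizes execution plans; it also derives schemas and integrates a variety of external connectors such as hive, hudi, kafka, and odps. The Framework layer handles runtime scheduling and fault tolerance, shuffle, and the management and coordination of the components within the framework. The State layer stores and persists the underlying graph data and also carries out many performance optimizations such as indexing and pushdown.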

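-## 6. GeaFlow Performance Testing
+## 6. Apache GeaFlow (Incubating) Performance Testing
To verify GeaFlow's incremental graph computing performance, we designed the following experiment. Data is fed into the compute engine in real time in fixed time windows, and Spark and GeaFlow each run the connected components algorithm over the whole graph so that their computation times can be compared. The experiment runs on 3 machines, each with 24 cores and 128 GB of memory, uses the public [soc-Livejournal](https://snap.stanford.edu/data/soc-LiveJournal1.html) dataset, and tests the weakly connected components algorithm. Each computation window holds 500,000 records: every time 500,000 records enter the engine, one graph computation is triggered.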
@@ -214,17 +212,17 @@ RETURN vid, component
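2. <font style="color:rgb(6, 6, 7);">GeaFlow avoids reprocessing the full dataset through incremental computation, so it computes more efficiently, finishes in less time, and its performance does not degrade noticeably.</font>
3. <font style="color:rgb(6, 6, 7);">GeaFlow supports a hybrid SQL+GQL processing language, which is better suited to developing complex graph data processing tasks.</font>
-The GeaFlow project code is fully open source. We have built part of the foundational capabilities of the streaming graph engine, and in the future we hope to build a unified lakehouse processing engine for graph data on GeaFlow to meet diverse big data correlation analysis needs. We are also actively preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem, so anyone with a strong interest in graph technology is very welcome to join the community and build it with us.
+The Apache GeaFlow (Incubating) project code is fully open source. We have built part of the foundational capabilities of the streaming graph engine, and in the future we hope to build a unified lakehouse processing engine for graph data on GeaFlow to meet diverse big data correlation analysis needs. We are also actively preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem, so anyone with a strong interest in graph technology is very welcome to join the community and build it with us.
There is plenty of interesting work left to do in the community. You can start with the following simple "Good First Issue" tasks; we look forward to having you on board.
-- Support a Paimon Connector plugin to connect to the data lake ecosystem. ([Issue 361](https://github.com/TuGraph-family/tugraph-analytics/issues/361))
-- Optimize the performance of GQL match statements. ([Issue 363](https://github.com/TuGraph-family/tugraph-analytics/issues/363))
-- Add ISO/GQL syntax support for the same predicate. ([Issue 368](https://github.com/TuGraph-family/tugraph-analytics/issues/368))
+- Support a Paimon Connector plugin to connect to the data lake ecosystem. ([Issue 361](https://github.com/apache/geaflow/issues/361))
+- Optimize the performance of GQL match statements. ([Issue 363](https://github.com/apache/geaflow/issues/363))
+- Add ISO/GQL syntax support for the same predicate. ([Issue 368](https://github.com/apache/geaflow/issues/368))
- ...
## References
-1. GeaFlow project: [https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics)
+1. Apache GeaFlow (Incubating) project: [https://github.com/apache/geaflow](https://github.com/apache/geaflow)
2. soc-Livejournal dataset: [https://snap.stanford.edu/data/soc-LiveJournal1.html](https://snap.stanford.edu/data/soc-LiveJournal1.html)
-3. GeaFlow Issues: [https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues)
+3. Apache GeaFlow (Incubating) Issues: [https://github.com/apache/geaflow/issues](https://github.com/apache/geaflow/issues)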
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/28.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/28.md
index 86401fb..47cf295 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/28.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/28.md
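@@ -3,21 +3,19 @@ title: Principles and Applications of Incremental Match in Streaming Graph Computing
date: 2025-6-3
---
-
-
## Background
In stream computing, data usually does not arrive all at once; it is ingested and processed continuously. Graph computing and graph querying have similar scenarios: vertices and edges are continuously read from a data source and used to build the graph, forming an incremental graph. In incremental graph queries the graph is always changing, so the same query can return different results on different graph versions. Each batch of newly added vertices and edges produces a new version of the graph. Re-traversing the whole graph (that is, all current vertices and edges) is expensive and repeats work already done on historical data. Since the historical data has already been processed once, ideally only the part affected by the increment needs to be computed or queried, rather than re-querying the entire graph.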
<!-- truncate -->
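-<font style="color:rgb(51, 51, 51);">GQL (Graph Query Language)</font><font style="color:rgb(0, 0, 0);"> is a standard that the International Organization for Standardization (ISO) established to standardize graph query languages,</font><font style="color:rgb(51, 51, 51);"> that is, a language for executing queries on graphs. Geaflow is a streaming graph computing engine open-sourced by Ant Group's graph computing team; it focuses on dynamically changing graph data and supports large-scale, highly concurrent real-time graph computing scenarios.</font> This article describes how the Geaflow engine uses GQL to run an incremental Match on an incremental graph, querying only the incremental data as far as possible to avoid redundant full computation.
+<font style="color:rgb(51, 51, 51);">GQL (Graph Query Language)</font><font style="color:rgb(0, 0, 0);"> is a standard that the International Organization for Standardization (ISO) established to standardize graph query languages,</font><font style="color:rgb(51, 51, 51);"> that is, a language for executing queries on graphs. Apache GeaFlow (Incubating) is an open-source streaming graph computing engine that focuses on dynamically changing graph data and supports large-scale, highly concurrent real-time graph computing scenarios.</font> This article describes how the Apache GeaFlow (Incubating) engine uses GQL to run an incremental Match on an incremental graph, querying only the incremental data as far as possible to avoid redundant full computation.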

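## The Current Problem
-<font style="color:rgb(0, 0, 0);">The Geaflow engine is built on a vertex-centric framework and computes iteratively: in each iteration, every vertex sends messages to other vertices and processes and analyzes the messages it receives in the next iteration.</font> In the Geaflow framework, a GQL query traverses the graph from front to back: it starts from the start vertices, expands outward, and matches vertices and edges in order until the required query pattern is matched. In a dynamic graph, if computation is triggered only by the vertices and edges added in the current batch, some incremental results will be missed, as the following example shows.
+<font style="color:rgb(0, 0, 0);">The Apache GeaFlow (Incubating) engine is built on a vertex-centric framework and computes iteratively: in each iteration, every vertex sends messages to other vertices and processes and analyzes the messages it receives in the next iteration.</font> In the Geaflow framework, a GQL query traverses the graph from front to back: it starts from the start vertices, expands outward, and matches vertices and edges in order until the required query pattern is matched. In a dynamic graph, if computation is triggered only by the vertices and edges added in the current batch, some incremental results will be missed, as the following example shows.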
<div style="text-align: center;">
<img src="/graph/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg"
alt='画板' height="281" width="486"></div>
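The gap described above can be reproduced with a small single-machine sketch for the pattern (a)->(b)->(c). It assumes the current batch has already been added to the graph, compares a forward traversal started only from the sources of new edges with one whose start set is first expanded one hop backwards, and keeps only matches that use at least one new edge. The names are hypothetical, and this illustrates the idea rather than the subgraph-expansion method GeaFlow implements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified single-machine illustration (not GeaFlow's algorithm) for the
// pattern (a)->(b)->(c). Triggering a forward traversal only from the sources
// of new edges misses matches in which the new edge is the second hop;
// expanding the start set one hop backwards recovers them.
final class IncrementalTwoHopMatch {
    private final Map<Long, List<Long>> outEdges = new HashMap<>();
    private final Map<Long, List<Long>> inEdges = new HashMap<>();

    void addEdge(long src, long dst) {
        outEdges.computeIfAbsent(src, k -> new ArrayList<>()).add(dst);
        inEdges.computeIfAbsent(dst, k -> new ArrayList<>()).add(src);
    }

    /** Forward traversal a->b->c from each start vertex. */
    private Set<List<Long>> matchFrom(Set<Long> starts) {
        Set<List<Long>> result = new HashSet<>();
        for (long a : starts) {
            for (long b : outEdges.getOrDefault(a, List.of())) {
                for (long c : outEdges.getOrDefault(b, List.of())) {
                    result.add(List.of(a, b, c));
                }
            }
        }
        return result;
    }

    /** Naive trigger: start only from the sources of this batch's edges. */
    Set<List<Long>> naive(List<long[]> batch) {
        Set<Long> starts = new HashSet<>();
        for (long[] e : batch) {
            starts.add(e[0]);
        }
        return matchFrom(starts);
    }

    /** Expanded trigger: also start one hop upstream, then keep only matches that use a new edge. */
    Set<List<Long>> expanded(List<long[]> batch) {
        Set<Long> starts = new HashSet<>();
        Set<List<Long>> newEdges = new HashSet<>();
        for (long[] e : batch) {
            starts.add(e[0]);
            starts.addAll(inEdges.getOrDefault(e[0], List.of()));
            newEdges.add(List.of(e[0], e[1]));
        }
        Set<List<Long>> result = new HashSet<>();
        for (List<Long> path : matchFrom(starts)) {
            if (newEdges.contains(path.subList(0, 2)) || newEdges.contains(path.subList(1, 3))) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IncrementalTwoHopMatch graph = new IncrementalTwoHopMatch();
        graph.addEdge(1, 2); // history
        List<long[]> batch = List.of(new long[] {2, 3});
        for (long[] e : batch) {
            graph.addEdge(e[0], e[1]); // the batch is part of the graph before matching
        }
        System.out.println(graph.naive(batch));    // []
        System.out.println(graph.expanded(batch)); // [[1, 2, 3]]
    }
}
```

With history 1->2 and a new edge 2->3, the naive trigger finds nothing while the expanded one recovers 1->2->3. By the same reasoning, a pattern with k edges would need its start set expanded up to k-1 hops backwards.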
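@@ -85,7 +83,7 @@ public void compute(Object vertexId, Iterator<MessageBox> messageIterator) {
## Demo Example
-In Geaflow, incremental logic is implemented by default by setting the windowSize of the vertex table or edge table: each batch reads windowSize vertices or edges to build the incremental graph.
+In Apache GeaFlow (Incubating), incremental logic is implemented by default by setting the windowSize of the vertex table or edge table: each batch reads windowSize vertices or edges to build the incremental graph.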
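The windowSize rule above is essentially fixed-size batching of the input stream. The sketch below assumes records arrive as an ordered list and that the last batch may be smaller than windowSize; `WindowBatcher` is a hypothetical helper, not part of GeaFlow's connector code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of windowSize batching (not GeaFlow's source reader):
// consecutive records are grouped into batches of at most windowSize, and each
// batch would be loaded as one increment of the graph.
final class WindowBatcher {
    private WindowBatcher() {}

    static <T> List<List<T>> batches(List<T> records, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < records.size(); from += windowSize) {
            int to = Math.min(from + windowSize, records.size());
            result.add(new ArrayList<>(records.subList(from, to))); // copy so batches are independent
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(batches(List.of("e1", "e2", "e3", "e4", "e5"), 2)); // [[e1, e2], [e3, e4], [e5]]
    }
}
```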
```sql
CREATE GRAPH modern (
@@ -163,4 +161,4 @@ INSERT INTO tbl_result
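## Summary and Outlook
-<font style="color:rgb(0, 0, 0);">In dynamic graph and streaming graph scenarios, the graph's vertices and edges change in real time. When querying graphs built from different windows of data, we can often use historical information to trigger computation only on the incremental part, computing incrementally instead of traversing the whole graph. Geaflow uses an incremental match method based on subgraph expansion, applied to a vertex-centric distributed graph computing framework, to run incremental queries on dynamic graphs; in the future we hope to implement incremental matching for more, and more complex, scenarios.</font>
+<font style="color:rgb(0, 0, 0);">In dynamic graph and streaming graph scenarios, the graph's vertices and edges change in real time. When querying graphs built from different windows of data, we can often use historical information to trigger computation only on the incremental part, computing incrementally instead of traversing the whole graph. Apache GeaFlow (Incubating) uses an incremental match method based on subgraph expansion, applied to a vertex-centric distributed graph computing framework, to run incremental queries on dynamic graphs; in the future we hope to implement incremental matching for more, and more complex, scenarios.</font>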
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/29.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/29.md
index aeb5ed7..ba44ce7 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/29.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/29.md
@@ -1,5 +1,5 @@
---
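-title: Exploring the Temporal Capabilities of GeaFlow, Breathing New Life into Time Data!
+title: Exploring the Temporal Capabilities of Apache GeaFlow (Incubating), Breathing New Life into Time Data!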
date: 2025-6-25
---
@@ -28,15 +28,15 @@ date: 2025-6-25
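<font style="color:rgb(63, 63, 63);"> Many tools support only one type of data analysis and cannot handle real-time streaming data and historical data at the same time.</font>
-<font style="color:rgb(63, 63, 63);"> To solve these problems, GeaFlow introduces the concept of temporal graph computing. As a distributed streaming graph computing engine designed specifically for dynamic graph data, GeaFlow handles the challenges posed by dynamic data efficiently. On graph structures that change in real time, users can easily perform graph traversal, graph matching, graph computation, and other operations to meet the analysis needs of complex scenarios. By combining the time dimension with dynamic graph processing, GeaFlow offers a new solution for real-time data analysis and helps users extract value from dynamic data more precisely.</font>
+<font style="color:rgb(63, 63, 63);"> To solve these problems, Apache GeaFlow (Incubating) introduces the concept of temporal graph computing. As a distributed streaming graph computing engine designed specifically for dynamic graph data, GeaFlow handles the challenges posed by dynamic data efficiently. On graph structures that change in real time, users can easily perform graph traversal, graph matching, graph computation, and other operations to meet the analysis needs of complex scenarios. By combining the time dimension with dynamic graph processing, GeaFlow offers a new solution for real-time data analysis and helps users extract value from dynamic data more precisely.</font>
-## <font style="color:rgb(63, 63, 63);">What Is GeaFlow?</font><font style="color:rgb(63, 63, 63);"></font>
+## <font style="color:rgb(63, 63, 63);">What Is Apache GeaFlow (Incubating)?</font><font style="color:rgb(63, 63, 63);"></font>
-<font style="color:rgb(63, 63, 63);">GeaFlow is a powerful distributed computing platform that combines the strengths of graph computing and stream processing to efficiently process dynamic graphs and time-series data. It supports complex graph algorithms and also provides real-time analysis, making it suitable for a wide range of dynamic scenarios. Its main features include:</font>
+<font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) is a powerful distributed computing platform that combines the strengths of graph computing and stream processing to efficiently process dynamic graphs and time-series data. It supports complex graph algorithms and also provides real-time analysis, making it suitable for a wide range of dynamic scenarios. Its main features include:</font>
- <font style="color:rgb(63, 63, 63);">Distributed architecture</font>
-<font style="color:rgb(63, 63, 63);">GeaFlow is built on a distributed computing framework and can efficiently process very large dynamic graphs (for example, billions of vertices and edges). Through partitioning and replication, GeaFlow ensures high availability and scalability.</font>
+<font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) is built on a distributed computing framework and can efficiently process very large dynamic graphs (for example, billions of vertices and edges). Through partitioning and replication, GeaFlow ensures high availability and scalability.</font>
- <font style="color:rgb(63, 63, 63);">Seamless integration of streaming graphs and temporal graphs</font>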
@@ -92,7 +92,7 @@ date: 2025-6-25
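<font style="color:rgb(63, 63, 63);">By introducing timestamps, temporal graphs let streaming graphs support more sophisticated analysis, such as time-window analysis and trend prediction.</font>
-### **4. Implementation Details of GeaFlow**
+### **4. Implementation Details of Apache GeaFlow (Incubating)**
<font style="color:rgb(63, 63, 63);">GeaFlow seamlessly combines streaming graphs and temporal graphs through the following techniques:</font>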
@@ -432,26 +432,26 @@ a_id | e1_ts | b_id | e2_ts | c_id
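### **10. Technical Advantages**
-- **<font style="color:rgb(63, 63, 63);">Real-time</font>**<font style="color:rgba(0, 0, 0, 0.9);">: </font><font style="color:rgb(63, 63, 63);">GeaFlow supports millisecond-level stream processing, so the user relationship graph is always up to date.</font>
+- **<font style="color:rgb(63, 63, 63);">Real-time</font>**<font style="color:rgba(0, 0, 0, 0.9);">: </font><font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) supports millisecond-level stream processing, so the user relationship graph is always up to date.</font>
- **<font style="color:rgb(63, 63, 63);">Time sensitivity: </font>**<font style="color:rgb(63, 63, 63);">Timestamp fields precisely track the chronological order of friend relationships.</font>
- **<font style="color:rgb(63, 63, 63);">Flexibility: </font>**<font style="color:rgb(63, 63, 63);">The SQL-driven development model lowers the barrier to entry and improves development efficiency.</font>
- **<font style="color:rgb(63, 63, 63);">Scalability: </font>**<font style="color:rgb(63, 63, 63);">Incremental computation over large dynamic graphs easily handles the massive user data of social platforms.</font>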
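Time ordering of friend relationships can be illustrated with a two-hop query in which the second edge must be newer than the first. The sketch assumes edges carry a numeric timestamp and that each result row follows the `a_id | e1_ts | b_id | e2_ts | c_id` column layout shown in the hunk header above; it is an in-memory illustration with hypothetical names, not GeaFlow's DSL or execution engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch (not GeaFlow's query engine): two-hop paths
// a -e1-> b -e2-> c in which the second edge is newer than the first
// (e1_ts < e2_ts). Each result row is [a_id, e1_ts, b_id, e2_ts, c_id].
final class TemporalTwoHop {
    static final class Edge {
        final long src;
        final long dst;
        final long ts;

        Edge(long src, long dst, long ts) {
            this.src = src;
            this.dst = dst;
            this.ts = ts;
        }
    }

    private TemporalTwoHop() {}

    static List<List<Long>> timeOrderedPaths(List<Edge> edges) {
        Map<Long, List<Edge>> outgoing = new HashMap<>();
        for (Edge e : edges) {
            outgoing.computeIfAbsent(e.src, k -> new ArrayList<>()).add(e);
        }
        List<List<Long>> rows = new ArrayList<>();
        for (Edge e1 : edges) {
            for (Edge e2 : outgoing.getOrDefault(e1.dst, List.of())) {
                if (e1.ts < e2.ts) { // enforce chronological order along the path
                    rows.add(List.of(e1.src, e1.ts, e1.dst, e2.ts, e2.dst));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Edge> friendships = List.of(new Edge(1, 2, 100), new Edge(2, 3, 200), new Edge(2, 4, 50));
        System.out.println(timeOrderedPaths(friendships)); // [[1, 100, 2, 200, 3]]
    }
}
```

The path 1->2->4 is rejected because its second edge (ts 50) predates the first (ts 100).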
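-## <font style="color:rgb(63, 63, 63);">Core Highlights of the Temporal Capabilities of GeaFlow</font><font style="color:rgba(0, 0, 0, 0.9);"></font>
+## <font style="color:rgb(63, 63, 63);">Core Highlights of the Temporal Capabilities of Apache GeaFlow (Incubating)</font><font style="color:rgba(0, 0, 0, 0.9);"></font>
### **1. Time-Aware Data Processing**
-<font style="color:rgb(63, 63, 63);">Every record carries a timestamp that precisely records when the event occurred. GeaFlow supports time-window-based analysis, for example:</font>
+<font style="color:rgb(63, 63, 63);">Every record carries a timestamp that precisely records when the event occurred. Apache GeaFlow (Incubating) supports time-window-based analysis, for example:</font>
- **<font style="color:rgb(63, 63, 63);">Trend changes over the last 5 minutes</font>**<font style="color:rgba(0, 0, 0, 0.9);">
</font><font style="color:rgb(63, 63, 63);">By setting a time window, users can analyze how data has changed over the last 5 minutes, for example, how the frequency of user interactions changes in a social network.</font>**<font style="color:rgb(63, 63, 63);"></font>**
- **<font style="color:rgb(63, 63, 63);">Dynamic patterns over the past day</font>**<font style="color:rgba(0, 0, 0, 0.9);">
- </font><font style="color:rgb(63, 63, 63);">GeaFlow supports analysis over long time spans to help users discover long-term trends, for example, analyzing users' purchasing behavior over the past day in an e-commerce recommendation system.</font>
+ </font><font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) supports analysis over long time spans to help users discover long-term trends, for example, analyzing users' purchasing behavior over the past day in an e-commerce recommendation system.</font>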
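A "last 5 minutes" analysis boils down to filtering timestamped events against a window that ends at the current time. The sketch below counts interactions per user in such a window; the event shape, the half-open interval (start, now], and all names are assumptions for illustration, not a GeaFlow API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not a GeaFlow API): per-user interaction counts over a
// sliding time window ending at `now`, e.g. the last 5 minutes.
final class RecentWindow {
    static final class Event {
        final String user;
        final Instant ts;

        Event(String user, Instant ts) {
            this.user = user;
            this.ts = ts;
        }
    }

    private RecentWindow() {}

    static Map<String, Long> countsInWindow(List<Event> events, Instant now, Duration window) {
        Instant start = now.minus(window);
        Map<String, Long> counts = new TreeMap<>();
        for (Event e : events) {
            // Keep events in the half-open interval (start, now].
            if (e.ts.isAfter(start) && !e.ts.isAfter(now)) {
                counts.merge(e.user, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-06-25T12:00:00Z");
        List<Event> events = List.of(
                new Event("alice", Instant.parse("2025-06-25T11:58:00Z")),
                new Event("bob", Instant.parse("2025-06-25T11:50:00Z")),
                new Event("alice", Instant.parse("2025-06-25T11:59:30Z")));
        System.out.println(countsInWindow(events, now, Duration.ofMinutes(5))); // {alice=2}
    }
}
```

Widening `Duration.ofMinutes(5)` to `Duration.ofDays(1)` gives the "past day" variant from the same list without any other change.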
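### **2. Combining Dynamic Graphs with Time Series**
-<font style="color:rgb(63, 63, 63);">GeaFlow combines graph structure with the time dimension to capture how relationships in the graph evolve. For example:</font>
+<font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) combines graph structure with the time dimension to capture how relationships in the graph evolve. For example:</font>
- **<font style="color:rgb(63, 63, 63);">Changes in friend relationships in a social network</font>**<font style="color:rgba(0, 0, 0, 0.9);">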
@@ -460,7 +460,7 @@ a_id | e1_ts | b_id | e2_ts | c_id
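### **3. Seamless Fusion of Real-Time and Historical Data**
-<font style="color:rgb(63, 63, 63);">GeaFlow not only processes real-time streaming data but can also compare it against historical data. This capability is particularly suited to scenarios that need both long-term trend analysis and short-term real-time monitoring. For example:</font>
+<font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) not only processes real-time streaming data but can also compare it against historical data. This capability is particularly suited to scenarios that need both long-term trend analysis and short-term real-time monitoring. For example:</font>
- **<font style="color:rgb(63, 63, 63);">IoT device monitoring</font>**<font style="color:rgba(0, 0, 0, 0.9);">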
@@ -469,7 +469,7 @@ a_id | e1_ts | b_id | e2_ts | c_id
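### **4. Rich Built-in Algorithms**
-<font style="color:rgb(63, 63, 63);">GeaFlow provides algorithms optimized for time-series data, for example:</font>
+<font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) provides algorithms optimized for time-series data, for example:</font>
- <font style="color:rgb(63, 63, 63);">Shortest path</font>
- <font style="color:rgb(63, 63, 63);">Weakly connected components</font>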
@@ -485,4 +485,4 @@ a_id | e1_ts | b_id | e2_ts | c_id
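## <font style="color:rgb(63, 63, 63);">Glossary</font>**<font style="color:rgb(73, 80, 87);"></font>**
-**<font style="color:rgb(63, 63, 63);">DSL: </font>**<font style="color:rgb(63, 63, 63);">Domain-Specific Language. The fused DSL is the unified graph-and-table data analysis language provided by GeaFlow and supports standard SQL+ISO/GQL for graph and table analysis. With the fused DSL you can apply relational operations to table data, run graph matching and graph algorithms on graph data, and process graph and table data jointly.</font>
+**<font style="color:rgb(63, 63, 63);">DSL: </font>**<font style="color:rgb(63, 63, 63);">Domain-Specific Language. The fused DSL is the unified graph-and-table data analysis language provided by Apache GeaFlow (Incubating) and supports standard SQL+ISO/GQL for graph and table analysis. With the fused DSL you can apply relational operations to table data, run graph matching and graph algorithms on graph data, and process graph and table data jointly.</font>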
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/30.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/30.md
index 874a4b2..5675486 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/30.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/30.md
@@ -72,7 +72,7 @@ date: 2025-5-15
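### 2. Data Channel: Materialized Data Interaction
-Like a traditional data warehouse, the graph data warehouse builds on the capabilities of the GeaFlow engine and TuMaker's mature business platform to provide data <font style="color:rgb(0, 0, 0);">task orchestration, that is, organizing multiple data processing tasks (such as data extraction, transformation, and loading) in a logical order and executing them automatically. Key capabilities include a visual interface, a task scheduling mechanism, event-listener triggers, error handling, monitoring and logging, version control and rollback, and intelligent scheduling of cluster resources.</font>
+Like a traditional data warehouse, the graph data warehouse builds on the capabilities of the Apache GeaFlow (Incubating) engine and TuMaker's mature business platform to provide data <font style="color:rgb(0, 0, 0);">task orchestration, that is, organizing multiple data processing tasks (such as data extraction, transformation, and loading) in a logical order and executing them automatically. Key capabilities include a visual interface, a task scheduling mechanism, event-listener triggers, error handling, monitoring and logging, version control and rollback, and intelligent scheduling of cluster resources.</font>
<font style="color:rgb(0, 0, 0);">With the Schema converter, a materialization plan from table storage to graph storage can be derived; it </font><font style="color:rgba(0, 0, 0, 0.88);">builds a data channel connecting the traditional data warehouse with the graph data warehouse. Based on the table-to-graph materialization plan, the data synchronization task orchestration is generated fully automatically from the acceleration tables, acceleration relationships, fields, permissions, and other settings the business configures, and is then scheduled by the graph data warehouse platform, making data migration transparent end to end. Subsequent real-time updates and incremental synchronization achieve a latency on the order of ten minutes.</font>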
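Ordering tasks such as extraction, transformation, and loading so that each runs only after its dependencies is a topological sort of the task graph. The sketch below uses Kahn's algorithm as a generic illustration; it is not how TuMaker or the graph data warehouse platform schedules tasks, and the task names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Generic sketch of dependency-ordered task orchestration using Kahn's
// topological sort. Not the TuMaker or GeaFlow scheduler; task names are
// illustrative.
final class TaskOrchestrator {
    private TaskOrchestrator() {}

    /** deps maps each task to the tasks that must finish before it. */
    static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new TreeMap<>(); // unmet dependency count per task
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (String dep : entry.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String next : dependents.getOrDefault(task, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next); // all of its dependencies have now run
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("cycle detected in task dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "load", List.of("transform"),
                "transform", List.of("extract"),
                "extract", List.of());
        System.out.println(executionOrder(deps)); // [extract, transform, load]
    }
}
```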
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/31.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/31.md
index 2b7980b..f5d0ceb 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/31.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/31.md
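@@ -3,11 +3,9 @@ title: Graph4Stream: Graph-Based Acceleration of Stream Computing
date: 2025-3-25
---
-
-
> Author: Kunyu; Reviewer: Dongshuo.
-In the companion article [Stream4Graph: Incremental Computation on Dynamic Graphs](https://zhuanlan.zhihu.com/p/27618053733), we showed that bringing incremental computation into graph computing ("graph + stream") gives GeaFlow streaming graph computing a significant performance gain over Spark GraphX. So when graph computing is brought into stream computing ("stream + graph"), how does GeaFlow streaming graph computing compare with Flink on correlation computation?
+In the companion article [Stream4Graph: Incremental Computation on Dynamic Graphs](https://zhuanlan.zhihu.com/p/27618053733), we showed that bringing incremental computation into graph computing ("graph + stream") gives Apache GeaFlow (Incubating) streaming graph computing a significant performance gain over Spark GraphX. So when graph computing is brought into stream computing ("stream + graph"), how does GeaFlow streaming graph computing compare with Flink on correlation computation?
Today, <font style="color:rgb(51, 51, 51);">data is being generated at unprecedented speed and scale, and real-time processing of massive data is widely used in anomaly detection, search and recommendation, financial trading, and many other fields. Stream computing, </font><font style="color:rgb(0, 0, 0);">as the primary real-time data processing technology, is becoming increasingly important.</font>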
@@ -19,7 +17,7 @@ date: 2025-3-25
<font style="color:rgb(0, 0, 0);"></font>
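-<font style="color:rgb(0, 0, 0);">Open-sourced by Ant Group's graph computing team, the streaming graph computing engine </font>GeaFlow<font style="color:rgb(0, 0, 0);"> combines graph computing with stream computing to provide an efficient streaming graph processing framework, greatly improving computing performance. Below we introduce the limitations of traditional stream computing engines in relationship computation, why GeaFlow streaming graph computing is efficient, and how their performance compares.</font>
+<font style="color:rgb(0, 0, 0);">The open-source streaming graph computing engine </font>Apache GeaFlow (Incubating)<font style="color:rgb(0, 0, 0);"> combines graph computing with stream computing to provide an efficient streaming graph processing framework, greatly improving computing performance. Below we introduce the limitations of traditional stream computing engines in relationship computation, why GeaFlow streaming graph computing is efficient, and how their performance compares.</font>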
<font style="color:rgb(0, 0, 0);"></font>
@@ -92,7 +90,7 @@ ON `e`.`dst` = `v`.`vid`;
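Flink Join operator implementation
-## Streaming Graph Computing Engine: GeaFlow
+## Streaming Graph Computing Engine: Apache GeaFlow (Incubating)
### Graph Computing & Streaming Graphs
@@ -106,9 +104,9 @@ Flink Join operator implementation
<font style="color:rgb(0, 0, 0);"></font>
-### <font style="color:rgb(0, 0, 0);">GeaFlow Architecture</font>
+### <font style="color:rgb(0, 0, 0);">Apache GeaFlow (Incubating) Architecture</font>
-The GeaFlow engine's computation flow consists of three parts: streaming data input, distributed incremental graph computation, and incremental result output. As in traditional stream computing engines, incoming real-time data is split into micro-batches by window. The data of the current batch is first parsed into vertices and edges according to the modeling strategy, forming an incremental graph. The incremental graph, together with the historical graph built from earlier data, makes up the complete streaming graph. The computing framework applies an incremental graph algorithm to the streaming graph to produce incremental results, and finally adds the incremental graph to the historical graph.
+The Apache GeaFlow (Incubating) engine's computation flow consists of three parts: streaming data input, distributed incremental graph computation, and incremental result output. As in traditional stream computing engines, incoming real-time data is split into micro-batches by window. The data of the current batch is first parsed into vertices and edges according to the modeling strategy, forming an incremental graph. The incremental graph, together with the historical graph built from earlier data, makes up the complete streaming graph. The computing framework applies an incremental graph algorithm to the streaming graph to produce incremental results, and finally adds the incremental graph to the historical graph.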
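The per-window flow described above (build the incremental graph, compute on history plus increment, emit, then merge) can be written as a small loop skeleton. Edges are modeled as `long[]{src, dst}` pairs and the incremental algorithm is passed in as a function; both are simplifying assumptions, and the class is hypothetical rather than part of GeaFlow's runtime.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical skeleton (not GeaFlow's runtime) of the per-window flow.
// Edges are modeled as long[]{src, dst}; the incremental algorithm receives a
// read-only view of the historical graph plus the current increment.
final class StreamingGraphLoop<R> {
    private final List<long[]> history = new ArrayList<>();
    private final BiFunction<List<long[]>, List<long[]>, R> incrementalAlgorithm;

    StreamingGraphLoop(BiFunction<List<long[]>, List<long[]>, R> incrementalAlgorithm) {
        this.incrementalAlgorithm = incrementalAlgorithm;
    }

    /** Processes one micro-batch and returns its incremental result. */
    R onWindow(List<long[]> microBatch) {
        // 1. The micro-batch (already parsed into edges here) is the incremental graph.
        List<long[]> increment = new ArrayList<>(microBatch);
        // 2. Run the incremental algorithm on history + increment.
        R result = incrementalAlgorithm.apply(Collections.unmodifiableList(history), increment);
        // 3. Merge the increment into the historical graph for later windows.
        history.addAll(increment);
        return result;
    }

    int historySize() {
        return history.size();
    }

    public static void main(String[] args) {
        // Example algorithm: count new edges that extend an existing edge (a->b, then b->c).
        StreamingGraphLoop<Integer> loop = new StreamingGraphLoop<Integer>((hist, inc) -> {
            int extended = 0;
            for (long[] h : hist) {
                for (long[] e : inc) {
                    if (h[1] == e[0]) {
                        extended++;
                    }
                }
            }
            return extended;
        });
        System.out.println(loop.onWindow(List.of(new long[] {1, 2})));                     // 0
        System.out.println(loop.onWindow(List.of(new long[] {2, 3}, new long[] {2, 4}))); // 2
    }
}
```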

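@@ -116,7 +114,7 @@ The GeaFlow engine's computation flow consists of streaming data input, distributed incremental graph computation,
<font style="color:rgb(0, 0, 0);">The GeaFlow computing framework uses a vertex-centric iterative computation model. The vertices of the incremental graph are the starting points of the first iteration. In each iteration, every vertex maintains its own state independently, completes the computation for the current iteration based on the historical and incremental graphs associated with it, and finally passes its results to its neighbors via messages, starting the next iteration.</font>
-<font style="color:rgb(0, 0, 0);">Taking the k-Hop algorithm mentioned earlier as an example, the incremental algorithm works as follows. In the first iteration, we find all edges in the incremental graph and use them as the initial in-paths and out-paths, sending them to their start and end vertices respectively. In subsequent iterations, the in-paths and out-paths keep being extended. When the required number of hops is reached, the out-paths and in-paths are sent to the starting vertex and combined there into the final result. The detailed implementation is in the </font>[IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)<font style="color:rgb(0, 0, 0);"> file of the open-source repository.</font>
+<font style="color:rgb(0, 0, 0);">Taking the k-Hop algorithm mentioned earlier as an example, the incremental algorithm works as follows. In the first iteration, we find all edges in the incremental graph and use them as the initial in-paths and out-paths, sending them to their start and end vertices respectively. In subsequent iterations, the in-paths and out-paths keep being extended. When the required number of hops is reached, the out-paths and in-paths are sent to the starting vertex and combined there into the final result. The detailed implementation is in the </font>[IncKHopAlgorithm.java](https://github.com/apache/geaflow/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)<font style="color:rgb(0, 0, 0);"> file of the open-source repository.</font>
<font style="color:rgb(0, 0, 0);">The figure below illustrates the two-hop case. In the first iteration, the incremental edge B->C builds an in-path and an out-path and sends them to vertex B and vertex C respectively. In the second iteration, B receives the in-path, prepends its own in-edge to form a 2-hop in-path, and sends it to vertex B. Likewise, C receives the out-path, appends its own out-edge to form a 2-hop out-path, and sends it to vertex B. In the final iteration, vertex B combines the received out-paths and in-paths into the newly added paths. Unlike Flink, which must look up all historical relationships, GeaFlow uses an incremental graph algorithm on the streaming graph, so the amount of computation is proportional to the number of incremental paths in the graph.</font>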
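The two-hop walkthrough above can be mirrored on a single machine without message passing: for each new edge B->C, prepend every in-edge of B and append every out-edge of C. This is a simplified sketch with hypothetical names, not the code in IncKHopAlgorithm.java, and it builds the paths directly instead of collecting them at vertex B.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-machine sketch of the two-hop case (not the IncKHopAlgorithm source):
// for each new edge B->C, new paths are A->B->C (in-path extended with an
// in-edge of B) and B->C->D (out-path extended with an out-edge of C).
final class IncrementalTwoHopPaths {
    private final Map<Long, List<Long>> outEdges = new HashMap<>();
    private final Map<Long, List<Long>> inEdges = new HashMap<>();

    void addEdge(long src, long dst) {
        outEdges.computeIfAbsent(src, k -> new ArrayList<>()).add(dst);
        inEdges.computeIfAbsent(dst, k -> new ArrayList<>()).add(src);
    }

    /** Adds the batch to the graph and returns only the 2-hop paths that use a new edge. */
    Set<List<Long>> newTwoHopPaths(List<long[]> batch) {
        for (long[] e : batch) {
            addEdge(e[0], e[1]);
        }
        Set<List<Long>> paths = new LinkedHashSet<>(); // a path using two new edges is found twice
        for (long[] e : batch) {
            long b = e[0];
            long c = e[1];
            for (long a : inEdges.getOrDefault(b, List.of())) {
                paths.add(List.of(a, b, c)); // prepend an in-edge of B
            }
            for (long d : outEdges.getOrDefault(c, List.of())) {
                paths.add(List.of(b, c, d)); // append an out-edge of C
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        IncrementalTwoHopPaths graph = new IncrementalTwoHopPaths();
        graph.addEdge(1, 2); // history: 1->2, 3->4
        graph.addEdge(3, 4);
        System.out.println(graph.newTwoHopPaths(List.of(new long[] {2, 3}))); // [[1, 2, 3], [2, 3, 4]]
    }
}
```

Because the work is driven only by the batch's edges and their immediate neighbors, its cost grows with the number of new paths rather than with the size of the history, which is the property the text attributes to the incremental algorithm.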
@@ -187,9 +185,9 @@ RETURN ret
<font style="color:rgb(0, 0, 0);"></font>
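-## GeaFlow Performance Testing
+## Apache GeaFlow (Incubating) Performance Testing
-To verify the streaming graph computing performance of GeaFlow, we designed a comparison experiment against Flink using the <font style="color:rgb(0, 0, 0);">k-Hop</font> algorithm. We fed the specified data into the compute engine as the input source, ran the <font style="color:rgb(0, 0, 0);">k-Hop</font> algorithm, and measured the time taken to finish computing all data to compare system performance. We used the public [web-Google.txt](https://snap.stanford.edu/data/web-Google.html) dataset as input; the experiments ran on 16 servers, each with 8 cores and 16 GB of memory, and compared one-hop, two-hop, three-hop, and four-hop relationship computation.
+To verify the streaming graph computing performance of Apache GeaFlow (Incubating), we designed a comparison experiment against Flink using the <font style="color:rgb(0, 0, 0);">k-Hop</font> algorithm. We fed the specified data into the compute engine as the input source, ran the <font style="color:rgb(0, 0, 0);">k-Hop</font> algorithm, and measured the time taken to finish computing all data to compare system performance. We used the public [web-Google.txt](https://snap.stanford.edu/data/web-Google.html) dataset as input; the experiments ran on 16 servers, each with 8 cores and 16 GB of memory, and compared one-hop, two-hop, three-hop, and four-hop relationship computation.
The results are shown in the figure: the x-axis shows one-hop, two-hop, three-hop, and four-hop relationships, and the y-axis shows the time to process all data on a logarithmic scale. In the one-hop and two-hop cases Flink outperforms GeaFlow, because little data takes part in the join: the left and right tables the join must traverse are both small, so the traversal itself is quick, and Flink's computing framework can cache historical join results. In the three-hop and four-hop cases, however, the growing computational complexity makes the tables the join operator must traverse expand rapidly, causing a sharp drop in performance; the four-hop case did not finish even after more than a day. GeaFlow<font style="color:rgb(0, 0, 0);">, by contrast, uses an incremental graph algorithm on the streaming graph whose computation time depends only on the incremental paths and not on the results of historical relationship computation, so it clearly outperforms Flink.</font>
@@ -201,17 +199,17 @@ k-Hop computation performance comparison
Traditional stream computing engines such as Flink rely on the join operator to compute relationships, and the join operator must traverse the full historical data, which makes them perform poorly in big data correlation scenarios. By supporting a streaming graph computing framework, the GeaFlow engine brings graph computing into stream computing, and its incremental graph computation greatly improves real-time data processing performance.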
-The GeaFlow project code is now open source. We hope to build a unified lakehouse processing engine for graph data on GeaFlow to meet diverse big data correlation analysis needs. We are also actively preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem, so anyone with a strong interest in graph technology is very welcome to join the community and build it with us.
+The Apache GeaFlow (Incubating) project code is now open source. We hope to build a unified lakehouse processing engine for graph data on GeaFlow to meet diverse big data correlation analysis needs. We are also actively preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem, so anyone with a strong interest in graph technology is very welcome to join the community and build it with us.
There is plenty of interesting work left to do in the community. You can start with the following simple "Good First Issue" tasks; we look forward to having you on board.
-- Support the incremental k-Core algorithm. ([Issue 466](https://github.com/TuGraph-family/tugraph-analytics/issues/466))
-- Support the incremental minimum spanning tree algorithm. ([Issue 465](https://github.com/TuGraph-family/tugraph-analytics/issues/465))
+- Support the incremental k-Core algorithm. ([Issue 466](https://github.com/apache/geaflow/issues/466))
+- Support the incremental minimum spanning tree algorithm. ([Issue 465](https://github.com/apache/geaflow/issues/465))
- ...
## References
-1. GeaFlow project: [https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics)
+1. Apache GeaFlow (Incubating) project: [https://github.com/apache/geaflow](https://github.com/apache/geaflow)
2. web-Google dataset: [https://snap.stanford.edu/data/web-Google.html](https://snap.stanford.edu/data/web-Google.html)
-3. GeaFlow Issues: [https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues)
-4. Incremental k-Hop algorithm source code: [https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)
+3. Apache GeaFlow (Incubating) Issues: [https://github.com/apache/geaflow/issues](https://github.com/apache/geaflow/issues)
+4. Incremental k-Hop algorithm source code: [https://github.com/apache/geaflow/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/apache/geaflow/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/32.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/32.md
index 8cb1155..7813408 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/32.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/32.md
@@ -1,9 +1,9 @@
---
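-title: "Streaming Graph Computing Engine GeaFlow v0.6.4 Released: Supports Relational Access to Graph Data, Incremental Matching Optimizes Real-Time Processing"
+title: "Streaming Graph Computing Engine Apache GeaFlow (Incubating) v0.6.4 Released: Supports Relational Access to Graph Data, Incremental Matching Optimizes Real-Time Processing"
date: 2025-4-3
---
-<font style="color:rgb(63, 63, 63);">In March 2025, the streaming graph computing engine GeaFlow v0.6.4 was released. The new version delivers several important feature updates, including:</font>
+<font style="color:rgb(63, 63, 63);">In March 2025, the streaming graph computing engine Apache GeaFlow (Incubating) v0.6.4 was released. The new version delivers several important feature updates, including:</font>
- <font style="color:rgb(63, 63, 63);">🍀</font><font style="color:rgb(63, 63, 63);">GeaFlow graph storage extended to support the paimon data lake (experimental)</font>
- <font style="color:rgb(63, 63, 63);">🍀</font><font style="color:rgb(63, 63, 63);">Graph data warehouse capability expansion: relational access to entities in the graph</font>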
@@ -15,7 +15,7 @@ date: 2025-4-3
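## ✨ New Features
-### <font style="color:rgb(63, 63, 63);">🍀</font><font style="color:rgb(63, 63, 63);">GeaFlow graph storage extended to support the paimon data lake (experimental)</font>
+### <font style="color:rgb(63, 63, 63);">🍀</font><font style="color:rgb(63, 63, 63);">Apache GeaFlow (Incubating) graph storage extended to support the paimon data lake (experimental)</font>
<font style="color:rgb(63, 63, 63);">To improve the scalability, real-time data processing capability, and cost efficiency of GeaFlow's data storage system, this update adds support for Apache Paimon. As a next-generation streaming data lake storage format, Paimon differs in many ways, in both design philosophy and features, from RocksDB, which GeaFlow used previously:</font>
@@ -29,7 +29,7 @@ date: 2025-4-3
- <font style="color:rgb(63, 63, 63);">This is currently an experimental feature: only the local file system is supported as paimon's storage backend, recovery is not yet supported, and dynamic graph data storage is not yet supported.</font>
- <font style="color:rgb(63, 63, 63);">Specify the storage path with the `geaflow.store.paimon.options.warehouse` parameter; the default path is "file:///tmp/paimon/".</font>
-<font style="color:rgb(63, 63, 63);">The current GeaFlow storage architecture is shown below.</font>
+<font style="color:rgb(63, 63, 63);">The current Apache GeaFlow (Incubating) storage architecture is shown below.</font>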
