This is an automated email from the ASF dual-hosted git repository. jin pushed a commit to branch docs in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-computer.git
commit 5a6a5fe7374f831603ffdbe6efd76132b8a34da1 Author: imbajin <[email protected]> AuthorDate: Thu Jan 29 15:39:19 2026 +0800 docs: add AGENTS.md and update docs & gitignore Add AI-assistant guidance files (AGENTS.md) at repository root and under vermeer, and expand documentation across the project: significantly update top-level README.md, computer/README.md, and vermeer/README.md with architecture, quick-starts, build/test instructions, and examples. Also update CI badge link in README and add AI-assistant-specific ignore patterns to .gitignore and vermeer/.gitignore to avoid tracking assistant artifacts. --- .gitignore | 9 + AGENTS.md | 237 +++++++++++++++++++++++ README.md | 234 ++++++++++++++++++++--- computer/README.md | 523 ++++++++++++++++++++++++++++++++++++++++++++++++++- vermeer/.gitignore | 10 + vermeer/AGENTS.md | 207 +++++++++++++++++++++ vermeer/README.md | 536 +++++++++++++++++++++++++++++++++++++++++++++++------ 7 files changed, 1664 insertions(+), 92 deletions(-) diff --git a/.gitignore b/.gitignore index e909d27e..99283ffa 100644 --- a/.gitignore +++ b/.gitignore @@ -55,6 +55,15 @@ build/ *.log *.pyc +# AI assistant specific files (we only maintain AGENTS.md) +CLAUDE.md +GEMINI.md +CURSOR.md +COPILOT.md +.cursorrules +.cursor/ +.github/copilot-instructions.md + # maven ignore apache-hugegraph-*-incubating-*/ diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..b3afd40e --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,237 @@ +# AGENTS.md + +This file provides guidance to AI coding assistants when working with code in this repository. + +## Repository Overview + +This is the Apache HugeGraph-Computer repository containing two distinct graph computing systems: + +1. **computer** (Java/Maven): A distributed BSP/Pregel-style graph processing framework that runs on Kubernetes or YARN +2. **vermeer** (Go): A high-performance in-memory graph computing platform with master-worker architecture + +Both integrate with HugeGraph for graph data input/output. 
## Build & Test Commands

### Computer (Java)

**Prerequisites:**
- JDK 11 for building/running
- JDK 8 for HDFS dependencies
- Maven 3.5+
- For K8s module: run `mvn clean install` first to generate CRD classes under computer-k8s

**Build:**
```bash
cd computer
mvn clean compile -Dmaven.javadoc.skip=true
```

**Tests:**
```bash
# Unit tests
mvn test -P unit-test

# Integration tests
mvn test -P integrate-test
```

**Run single test:**
```bash
# Run specific test class
mvn test -P unit-test -Dtest=ClassName

# Run specific test method
mvn test -P unit-test -Dtest=ClassName#methodName
```

**License check:**
```bash
mvn apache-rat:check
```

**Package:**
```bash
mvn clean package -DskipTests
```

### Vermeer (Go)

**Prerequisites:**
- Go 1.23+
- `curl` and `unzip` (for downloading binary dependencies)

**First-time setup:**
```bash
cd vermeer
make init  # Downloads supervisord and protoc binaries, installs Go deps
```

**Build:**
```bash
make                    # Build for current platform
make build-linux-amd64
make build-linux-arm64
```

**Development build with hot-reload UI:**
```bash
go build -tags=dev
```

**Clean:**
```bash
make clean      # Remove built binaries and generated assets
make clean-all  # Also remove downloaded tools
```

**Run:**
```bash
# Using binary directly
./vermeer --env=master
./vermeer --env=worker

# Using script (configure in vermeer.sh)
./vermeer.sh start master
./vermeer.sh start worker
```

**Regenerate protobuf (if proto files changed):**
```bash
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
tools/protoc/osxm1/protoc *.proto --go-grpc_out=. --go_out=.
```

## Architecture

### Computer (Java) - BSP/Pregel Framework

**Module Structure:**
- `computer-api`: Public interfaces for graph processing (Computation, Vertex, Edge, Aggregator, Combiner, GraphFactory)
- `computer-core`: Runtime implementation (WorkerService, MasterService, messaging, BSP coordination, managers)
- `computer-algorithm`: Built-in algorithms (PageRank, LPA, WCC, SSSP, TriangleCount, etc.)
- `computer-driver`: Job submission and driver-side coordination
- `computer-k8s`: Kubernetes deployment integration
- `computer-yarn`: YARN deployment integration
- `computer-k8s-operator`: Kubernetes operator for job management
- `computer-dist`: Distribution packaging
- `computer-test`: Integration and unit tests

**Key Design Patterns:**

1. **API/Implementation Separation**: Algorithms depend only on `computer-api` interfaces; `computer-core` provides runtime implementation. Algorithms are dynamically loaded via config.

2. **Manager Pattern**: `WorkerService` composes multiple managers (MessageSendManager, MessageRecvManager, WorkerAggrManager, DataServerManager, SortManagers, SnapshotManager, etc.) with lifecycle hooks: `initAll()`, `beforeSuperstep()`, `afterSuperstep()`, `closeAll()`.

3. **BSP Coordination**: Explicit barrier synchronization via etcd (EtcdBspClient). Each superstep follows:
   - `workerStepPrepareDone` → `waitMasterStepPrepareDone`
   - Local compute (vertices process messages)
   - `workerStepComputeDone` → `waitMasterStepComputeDone`
   - Aggregators/snapshots
   - `workerStepDone` → `waitMasterStepDone` (master returns SuperstepStat)

4.
**Computation Contract**: Algorithms implement `Computation<M extends Value>`: + - `compute0(context, vertex)`: Initialize at superstep 0 + - `compute(context, vertex, messages)`: Process messages in subsequent supersteps + - Access to aggregators, combiners, and message sending via `ComputationContext` + +**Important Files:** +- Algorithm contract: `computer/computer-api/src/main/java/org/apache/hugegraph/computer/core/worker/Computation.java` +- Runtime orchestration: `computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/worker/WorkerService.java` +- BSP coordination: `computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/bsp/Bsp4Worker.java` +- Example algorithm: `computer/computer-algorithm/src/main/java/org/apache/hugegraph/computer/algorithm/centrality/pagerank/PageRank.java` + +### Vermeer (Go) - In-Memory Computing Engine + +**Directory Structure:** +- `algorithms/`: Go algorithm implementations (pagerank.go, sssp.go, louvain.go, etc.) +- `apps/`: + - `bsp/`: BSP coordination helpers + - `graphio/`: HugeGraph I/O adapters (reads via gRPC to store/pd, writes via HTTP REST) + - `master/`: Master scheduling, HTTP endpoints, worker management + - `compute/`: Worker-side compute logic + - `protos/`: Generated protobuf/gRPC definitions + - `common/`: Utilities, logging, metrics +- `client/`: Client libraries +- `tools/`: Binary dependencies (supervisord, protoc) +- `ui/`: Web UI assets + +**Key Patterns:** + +1. **Maker/Registry Pattern**: Graph loaders/writers register themselves via init() (e.g., `LoadMakers[LoadTypeHugegraph] = &HugegraphMaker{}`). Master selects loader by type. + +2. **HugeGraph Integration**: + - `hugegraph.go` implements HugegraphMaker, HugegraphLoader, HugegraphWriter + - Queries PD via gRPC for partition metadata + - Streams vertex/edge data via gRPC from store (ScanPartition) + - Writes results back via HugeGraph HTTP REST API + +3. **Master-Worker**: Master schedules LoadPartition tasks to workers, manages worker lifecycle via WorkerManager/WorkerClient, exposes HTTP admin endpoints. + +**Important Files:** +- HugeGraph integration: `vermeer/apps/graphio/hugegraph.go` +- Master scheduling: `vermeer/apps/master/tasks/tasks.go` +- Worker management: `vermeer/apps/master/workers/workers.go` +- HTTP endpoints: `vermeer/apps/master/services/http_master.go` + +## Integration with HugeGraph + +**Computer (Java):** +- `WorkerInputManager` reads vertices/edges from HugeGraph via `GraphFactory` abstraction +- Graph data is partitioned and distributed to workers via input splits + +**Vermeer (Go):** +- Directly queries HugeGraph PD (metadata service) for partition information +- Uses gRPC to stream graph data from HugeGraph store +- Writes computed results back via HugeGraph HTTP REST API (adds properties to vertices) + +## Development Workflow + +**Adding a New Algorithm (Computer):** +1. Create class in `computer-algorithm` implementing `Computation<MessageType>` +2. Implement `compute0()` for initialization and `compute()` for message processing +3. Use `context.sendMessage()` or `context.sendMessageToAllEdges()` for message passing +4. Register aggregators in `beforeSuperstep()`, read/write in `compute()` +5. 
Configure algorithm class name in job config + +**K8s-Operator Development:** +- CRD classes are auto-generated; run `mvn clean install` in `computer-k8s-operator` first +- Generated classes appear in `computer-k8s/target/generated-sources/` +- CRD generation script: `computer-k8s-operator/crd-generate/Makefile` + +**Vermeer Asset Updates:** +- Web UI assets must be regenerated after changes: `cd asset && go generate` +- Or use `make generate-assets` from vermeer root +- For dev mode with hot-reload: `go build -tags=dev` + +## Testing Notes + +**Computer:** +- Integration tests require etcd, HDFS, HugeGraph, and Kubernetes (see `.github/workflows/computer-ci.yml`) +- Test environment setup scripts in `computer-dist/src/assembly/travis/` +- Unit tests run in isolation without external dependencies + +**Vermeer:** +- Test scripts in `vermeer/test/` +- Configuration files in `vermeer/config/` (master.ini, worker.ini templates) + +## CI/CD + +CI pipeline (`.github/workflows/computer-ci.yml`) runs: +1. License check (Apache RAT) +2. Setup HDFS (Hadoop 3.3.2) +3. Setup Minikube/Kubernetes +4. Load test data into HugeGraph +5. Compile with Java 11 +6. Run integration tests (`-P integrate-test`) +7. Run unit tests (`-P unit-test`) +8. Upload coverage to Codecov + +## Important Notes + +- **Computer K8s module**: Must run `mvn clean install` before editing to generate CRD classes +- **Java version**: Build requires JDK 11; HDFS dependencies require JDK 8 +- **Vermeer binary deps**: First-time builds need `make init` to download supervisord/protoc +- **BSP coordination**: Computer uses etcd for barrier synchronization (configure via `BSP_ETCD_URL`) +- **Memory management**: Both systems auto-manage memory by spilling to disk when needed diff --git a/README.md b/README.md index f557fc63..eee4c05e 100644 --- a/README.md +++ b/README.md @@ -1,50 +1,234 @@ # Apache HugeGraph-Computer [](https://www.apache.org/licenses/LICENSE-2.0.html) -[](https://github.com/apache/hugegraph-computer/actions/workflows/ci.yml) +[](https://github.com/apache/hugegraph-computer/actions/workflows/computer-ci.yml) [](https://codecov.io/gh/apache/incubator-hugegraph-computer) [](https://hub.docker.com/repository/docker/hugegraph/hugegraph-computer) [](https://deepwiki.com/apache/hugegraph-computer) -The [hugegraph-computer](./computer/README.md) is a distributed graph processing system for hugegraph. 
-(Also, the in-memory computing engine(vermeer) is on the way 🚧)

Apache HugeGraph-Computer is a comprehensive graph computing solution providing two complementary systems for different deployment scenarios:

- **[Vermeer](./vermeer/README.md)** (Go): High-performance in-memory computing engine for single-machine deployments
- **[Computer](./computer/README.md)** (Java): Distributed BSP/Pregel framework for large-scale cluster computing

## Quick Comparison

| Feature | Vermeer (Go) | Computer (Java) |
|---------|--------------|-----------------|
| **Best for** | Single machine, quick start | Large-scale distributed computing |
| **Deployment** | Single binary | Kubernetes or YARN cluster |
| **Memory model** | In-memory first | Auto spill to disk |
| **Setup time** | Minutes | Hours (requires K8s/YARN) |
| **Algorithms** | 20+ algorithms | 45+ algorithms |
| **Architecture** | Master-Worker | BSP (Bulk Synchronous Parallel) |
| **API** | REST + gRPC | Java API |
| **Web UI** | Built-in dashboard | N/A |
| **Data sources** | HugeGraph, CSV, HDFS | HugeGraph, HDFS |

## Architecture Overview

```mermaid
graph TB
    subgraph HugeGraph-Computer
        subgraph Vermeer["Vermeer (Go) - In-Memory Engine"]
            VM[Master :6688] --> VW1[Worker 1 :6789]
            VM --> VW2[Worker 2 :6789]
            VM --> VW3[Worker N :6789]
        end
        subgraph Computer["Computer (Java) - Distributed BSP"]
            CM[Master Service] --> CW1[Worker Pod 1]
            CM --> CW2[Worker Pod 2]
            CM --> CW3[Worker Pod N]
        end
    end

    HG[(HugeGraph Server)] <--> Vermeer
    HG <--> Computer

    style Vermeer fill:#e1f5fe
    style Computer fill:#fff3e0
```

## HugeGraph Ecosystem Integration

```
┌─────────────────────────────────────────────────────────────┐
│                     HugeGraph Ecosystem                     │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────────┐    │
│  │   Hubble    │   │  Toolchain  │   │  HugeGraph-AI   │    │
│  │    (UI)     │   │   (Tools)   │   │   (LLM/RAG)     │    │
│  └──────┬──────┘   └──────┬──────┘   └────────┬────────┘    │
│         │                 │                   │             │
│         └─────────────────┼───────────────────┘             │
│                           │                                 │
│                   ┌───────▼───────┐                         │
│                   │   HugeGraph   │                         │
│                   │    Server     │                         │
│                   └───────┬───────┘                         │
│                           │                                 │
│         ┌─────────────────┼───────────────────┐             │
│         │                 │                   │             │
│  ┌──────▼──────┐   ┌──────▼──────┐   ┌────────▼───────┐     │
│  │   Vermeer   │   │  Computer   │   │     Store      │     │
│  │  (Memory)   │   │  (BSP/K8s)  │   │      (PD)      │     │
│  └─────────────┘   └─────────────┘   └────────────────┘     │
└─────────────────────────────────────────────────────────────┘
```

-## Learn More
## Getting Started with Vermeer (Recommended)

-The [project homepage](https://hugegraph.apache.org/docs/quickstart/hugegraph-computer/) contains more information about hugegraph-computer.
For quick start and single-machine deployments, we recommend **Vermeer**:

-And here are links of other repositories:
### Docker Quick Start

-1. [hugegraph](https://github.com/apache/hugegraph) (graph's core component - Graph server + PD + Store)
-2. [hugegraph-toolchain](https://github.com/apache/hugegraph-toolchain) (graph tools **[loader](https://github.com/apache/incubator-hugegraph-toolchain/tree/master/hugegraph-loader)/[dashboard](https://github.com/apache/incubator-hugegraph-toolchain/tree/master/hugegraph-hubble)/[tool](https://github.com/apache/incubator-hugegraph-toolchain/tree/master/hugegraph-tools)/[client](https://github.com/apache/incubator-hugegraph-toolchain/tree/master/hugegraph-client)**)
-3. [hugegraph-ai](https://github.com/apache/incubator-hugegraph-ai) (integrated **Graph AI/LLM/KG** system)
-4.
[hugegraph-website](https://github.com/apache/hugegraph-doc) (**doc & website** code) +```bash +# Pull the image +docker pull hugegraph/vermeer:latest +# Run with docker-compose +docker-compose up -d +``` -## Note +### Binary Quick Start -- If some classes under computer-k8s cannot be found, you need to execute `mvn clean install` in advance to generate corresponding classes. +```bash +# Download and extract (example for Linux AMD64) +wget https://github.com/apache/hugegraph-computer/releases/download/vX.X.X/vermeer-linux-amd64.tar.gz +tar -xzf vermeer-linux-amd64.tar.gz +cd vermeer + +# Run master and worker +./vermeer --env=master & +./vermeer --env=worker & +``` + +See the **[Vermeer README](./vermeer/README.md)** for detailed configuration and usage. + +## Getting Started with Computer (Distributed) + +For large-scale distributed graph processing across clusters: + +### Prerequisites + +- JDK 11 or later +- Maven 3.5+ +- Kubernetes cluster or YARN cluster + +### Build from Source + +```bash +cd computer +mvn clean package -DskipTests +``` + +### Deploy on Kubernetes + +```bash +# Configure your K8s cluster in computer-k8s module +# Submit a graph computing job +java -jar computer-driver.jar --config job-config.properties +``` + +See the **[Computer README](./computer/README.md)** for detailed deployment and development guide. + +## Supported Algorithms + +### Common Algorithms (Both Systems) + +| Category | Algorithms | +|----------|-----------| +| **Centrality** | PageRank, Personalized PageRank, Betweenness Centrality, Closeness Centrality, Degree Centrality | +| **Community Detection** | Louvain, LPA (Label Propagation), SLPA, WCC (Weakly Connected Components) | +| **Path Finding** | SSSP (Single Source Shortest Path), BFS (Breadth-First Search) | +| **Graph Structure** | Triangle Count, K-Core, Clustering Coefficient, Cycle Detection | +| **Similarity** | Jaccard Similarity | + +### Vermeer-Specific Features + +- In-memory optimized implementations +- Weighted Louvain variant +- REST API for algorithm execution + +### Computer-Specific Algorithms + +- Count Triangle (distributed implementation) +- Rings detection +- ClusteringCoefficient variations +- Custom algorithm development framework + +See individual README files for complete algorithm lists and usage examples. 
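To see how the two systems are driven in practice, here is a hedged sketch of launching PageRank on each, reusing commands that appear in the system READMEs (the graph name, ports, and properties file are illustrative placeholders):

```bash
# Vermeer: trigger PageRank over the REST API (assumes a graph already loaded as "my_graph")
curl -X POST http://localhost:6688/api/v1/compute \
  -H "Content-Type: application/json" \
  -d '{"graph_name": "my_graph", "algorithm": "pagerank", "params": {"max_iterations": 20}}'

# Computer: submit PageRank as a batch job via the driver
# (job-config.properties sets algorithm.class and cluster resources)
java -jar computer-driver.jar --config job-config.properties
```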
+ +## Performance Characteristics + +### Vermeer (In-Memory) + +- **Throughput**: Optimized for fast iteration on medium-sized graphs (millions of vertices/edges) +- **Latency**: Sub-second query response via REST API +- **Memory**: Requires graph to fit in total worker memory +- **Scalability**: Horizontal scaling by adding worker nodes + +### Computer (Distributed BSP) + +- **Throughput**: Handles billions of vertices/edges via distributed processing +- **Latency**: Batch-oriented with superstep barriers +- **Memory**: Auto spill to disk when memory is insufficient +- **Scalability**: Elastic scaling on K8s with pod autoscaling + +## Use Cases + +### When to Use Vermeer + +- Quick prototyping and experimentation +- Interactive graph analytics with Web UI +- Medium-scale graphs (up to hundreds of millions of edges) +- Single-machine or small cluster deployments +- REST API integration requirements + +### When to Use Computer + +- Large-scale batch processing (billions of vertices) +- Existing Kubernetes or YARN infrastructure +- Custom algorithm development with Java +- Memory-constrained environments (auto spill to disk) +- Integration with Hadoop ecosystem + +## Documentation + +- [Project Homepage](https://hugegraph.apache.org/docs/quickstart/hugegraph-computer/) +- [Vermeer Documentation](./vermeer/README.md) +- [Computer Documentation](./computer/README.md) +- [HugeGraph Documentation](https://hugegraph.apache.org/docs/) + +## Related Projects + +1. [hugegraph](https://github.com/apache/hugegraph) - Graph database core (Server + PD + Store) +2. [hugegraph-toolchain](https://github.com/apache/hugegraph-toolchain) - Graph tools (Loader/Hubble/Tools/Client) +3. [hugegraph-ai](https://github.com/apache/incubator-hugegraph-ai) - Graph AI/LLM/Knowledge Graph system +4. [hugegraph-website](https://github.com/apache/hugegraph-doc) - Documentation and website ## Contributing -- Welcome to contribute to HugeGraph, please see [How to Contribute](https://hugegraph.apache.org/docs/contribution-guidelines/contribute/) for more information. -- Note: It's recommended to use [GitHub Desktop](https://desktop.github.com/) to greatly simplify the PR and commit process. -- Thank you to all the people who already contributed to HugeGraph! +Welcome to contribute to HugeGraph-Computer! Please see: -[](https://github.com/apache/incubator-hugegraph-computer/graphs/contributors) +- [How to Contribute](https://hugegraph.apache.org/docs/contribution-guidelines/contribute/) for guidelines +- [GitHub Issues](https://github.com/apache/hugegraph-computer/issues) for bug reports and feature requests -## License +We recommend using [GitHub Desktop](https://desktop.github.com/) to simplify the PR process. + +Thank you to all contributors! -hugegraph-computer is licensed under [Apache 2.0](https://github.com/apache/incubator-hugegraph-computer/blob/master/LICENSE) License. +[](https://github.com/apache/incubator-hugegraph-computer/graphs/contributors) -### Contact Us +## License ---- +HugeGraph-Computer is licensed under [Apache 2.0 License](https://github.com/apache/incubator-hugegraph-computer/blob/master/LICENSE). 
- - [GitHub Issues](https://github.com/apache/incubator-hugegraph-computer/issues): Feedback on usage issues and functional requirements (quick response)
- Feedback Email: [dev@hugegraph.apache.org](mailto:dev@hugegraph.apache.org) ([subscriber](https://hugegraph.apache.org/docs/contribution-guidelines/subscribe/) only)
- Slack: [join the ASF HugeGraph channel](https://the-asf.slack.com/archives/C059UU2FJ23)
- WeChat public account: Apache HugeGraph, welcome to scan this QR code to follow us.
- <img src="https://github.com/apache/hugegraph-doc/blob/master/assets/images/wechat.png?raw=true" alt="QR png" width="350"/>

## Contact Us

- **GitHub Issues**: [Report bugs or request features](https://github.com/apache/incubator-hugegraph-computer/issues)
- **Email**: [dev@hugegraph.apache.org](mailto:dev@hugegraph.apache.org) ([subscribe first](https://hugegraph.apache.org/docs/contribution-guidelines/subscribe/))
- **Slack**: [Join ASF HugeGraph channel](https://the-asf.slack.com/archives/C059UU2FJ23)
- **WeChat**: Scan QR code to follow Apache HugeGraph official account

<img src="https://github.com/apache/hugegraph-doc/blob/master/assets/images/wechat.png?raw=true" alt="WeChat QR Code" width="350"/>

diff --git a/computer/README.md b/computer/README.md
index 7368f088..173c576e 100644
--- a/computer/README.md
+++ b/computer/README.md
@@ -1,12 +1,519 @@
-# Apache HugeGraph-Computer
# Apache HugeGraph-Computer (Java)

-The hugegraph-computer is a distributed graph processing system for hugegraph. It is an implementation of [Pregel](https://kowshik.github.io/JPregel/pregel_paper.pdf). It runs on Kubernetes or YARN framework.
HugeGraph-Computer is a distributed graph processing framework implementing the [Pregel](https://kowshik.github.io/JPregel/pregel_paper.pdf) model (BSP - Bulk Synchronous Parallel). It runs on Kubernetes or YARN clusters and integrates with HugeGraph for graph input/output.

## Features

-- Support distributed MPP graph computing, and integrates with HugeGraph as graph input/output storage.
-- Based on BSP (Bulk Synchronous Parallel) model, an algorithm performs computing through multiple parallel iterations, every iteration is a superstep.
-- Auto memory management. The framework will never be OOM(Out of Memory) since it will split some data to disk if it doesn't have enough memory to hold all the data.
-- The part of edges or the messages of supernode can be in memory, so you will never lose it.
-- You can load the data from HDFS or HugeGraph, output the results to HDFS or HugeGraph, or adapt any other systems manually as needed.
-- Easy to develop a new algorithm. You need to focus on a vertex only processing just like as in a single server, without worrying about message transfer and memory/storage management.
- **Distributed MPP Computing**: Massively parallel graph processing across cluster nodes
- **BSP Model**: Algorithm execution through iterative supersteps with global synchronization
- **Auto Memory Management**: Automatic spill to disk when memory is insufficient - never OOM
- **Flexible Data Sources**: Load from HugeGraph or HDFS, output to HugeGraph or HDFS
- **Easy Algorithm Development**: Focus on single-vertex logic without worrying about distribution
- **Production-Ready**: Battle-tested on billion-scale graphs with Kubernetes integration

## Architecture

### Module Structure

```
┌──────────────────────────────────────────────────────────────┐
│                      HugeGraph-Computer                      │
├──────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────┐   │
│  │                    computer-driver                    │   │
│  │             (Job Submission & Coordination)           │   │
│  └───────────────────────────┬───────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │             Deployment Layer (choose one)             │   │
│  │      ┌──────────────┐        ┌───────────────┐        │   │
│  │      │ computer-k8s │        │ computer-yarn │        │   │
│  │      └──────────────┘        └───────────────┘        │   │
│  └───────────────────────────┬───────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │                     computer-core                     │   │
│  │          (WorkerService, MasterService, BSP)          │   │
│  │  ┌─────────────────────────────────────────────────┐  │   │
│  │  │  Managers: Message, Aggregation, Snapshot...    │  │   │
│  │  └─────────────────────────────────────────────────┘  │   │
│  └───────────────────────────┬───────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐   │
│  │                  computer-algorithm                   │   │
│  │      (PageRank, LPA, WCC, SSSP, TriangleCount...)     │   │
│  └───────────────────────────┬───────────────────────────┘   │
│                              │                               │
│  ┌───────────────────────────┴───────────────────────────┐   │
│  │                     computer-api                      │   │
│  │     (Computation, Vertex, Edge, Aggregator, Value)    │   │
│  └───────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
```

### Module Descriptions

| Module | Description |
|--------|-------------|
| **computer-api** | Public interfaces for algorithm development (`Computation`, `Vertex`, `Edge`, `Aggregator`, `Combiner`) |
| **computer-core** | Runtime implementation (WorkerService, MasterService, messaging, BSP coordination, memory management) |
| **computer-algorithm** | Built-in graph algorithms (45+ implementations) |
| **computer-driver** | Job submission and driver-side coordination |
| **computer-k8s** | Kubernetes deployment integration |
| **computer-yarn** | YARN deployment integration |
| **computer-k8s-operator** | Kubernetes operator for job lifecycle management |
| **computer-dist** | Distribution packaging and assembly |
| **computer-test** | Integration tests and unit tests |

## Prerequisites

- **JDK 11** or later (for building and running)
- **Maven 3.5+** for building
- **Kubernetes cluster** or **YARN cluster** for deployment
- **etcd** for BSP coordination (configured via `BSP_ETCD_URL`)

**Note**: For K8s-operator module development, run `mvn clean install` in `computer-k8s-operator` first to generate CRD classes.
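The etcd prerequisite is easy to miss when running Computer outside CI. One hedged way to stand up a throwaway single-node etcd locally is the public Bitnami image (the image name and env flag are general Docker ecosystem knowledge, not part of this repo):

```bash
# Disposable single-node etcd for local experiments (no auth)
docker run -d --name etcd-bsp -p 2379:2379 \
  -e ALLOW_NONE_AUTHENTICATION=yes \
  bitnami/etcd:latest

# Then point the job at it via the BSP config, e.g.:
# bsp.etcd.url=http://127.0.0.1:2379
```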
+ +## Quick Start + +### Build from Source + +```bash +cd computer + +# Compile (skip javadoc for faster builds) +mvn clean compile -Dmaven.javadoc.skip=true + +# Package (skip tests for faster packaging) +mvn clean package -DskipTests +``` + +### Run Tests + +```bash +# Unit tests +mvn test -P unit-test + +# Integration tests (requires etcd, K8s, HugeGraph) +mvn test -P integrate-test + +# Run specific test class +mvn test -P unit-test -Dtest=ClassName + +# Run specific test method +mvn test -P unit-test -Dtest=ClassName#methodName +``` + +### License Check + +```bash +mvn apache-rat:check +``` + +### Deploy on Kubernetes + +#### 1. Configure Job + +Create `job-config.properties`: + +```properties +# Algorithm class +algorithm.class=org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRank + +# HugeGraph connection +hugegraph.url=http://hugegraph-server:8080 +hugegraph.graph=hugegraph + +# K8s configuration +k8s.namespace=default +k8s.image=hugegraph/hugegraph-computer:latest +k8s.master.cpu=2 +k8s.master.memory=4Gi +k8s.worker.replicas=3 +k8s.worker.cpu=4 +k8s.worker.memory=8Gi + +# BSP coordination (etcd) +bsp.etcd.url=http://etcd-cluster:2379 + +# Algorithm parameters (PageRank example) +pagerank.damping_factor=0.85 +pagerank.max_iterations=20 +pagerank.convergence_tolerance=0.0001 +``` + +#### 2. Submit Job + +```bash +java -jar computer-driver/target/computer-driver-${VERSION}.jar \ + --config job-config.properties +``` + +#### 3. Monitor Job + +```bash +# Check pod status +kubectl get pods -n default + +# View master logs +kubectl logs hugegraph-computer-master-xxx -n default + +# View worker logs +kubectl logs hugegraph-computer-worker-0 -n default +``` + +### Deploy on YARN + +**Note**: YARN deployment support is under development. Use Kubernetes for production deployments. 
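After the job finishes, results are written back to HugeGraph as vertex properties (see the Configuration Reference below). A hedged spot-check via HugeGraph Server's standard REST API, with the graph name and endpoint taken from general HugeGraph usage rather than this repo:

```bash
# List a few vertices and verify the output property (e.g. pagerank_value) is present
curl "http://hugegraph-server:8080/apis/graphs/hugegraph/graph/vertices?limit=3"
```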
## Available Algorithms

### Centrality Algorithms

| Algorithm | Class | Description |
|-----------|-------|-------------|
| PageRank | `algorithm.centrality.pagerank.PageRank` | Standard PageRank |
| Personalized PageRank | `algorithm.centrality.pagerank.PersonalizedPageRank` | Source-specific PageRank |
| Betweenness Centrality | `algorithm.centrality.betweenness.BetweennessCentrality` | Shortest-path-based centrality |
| Closeness Centrality | `algorithm.centrality.closeness.ClosenessCentrality` | Average distance centrality |
| Degree Centrality | `algorithm.centrality.degree.DegreeCentrality` | In/out degree counting |

### Community Detection

| Algorithm | Class | Description |
|-----------|-------|-------------|
| LPA | `algorithm.community.lpa.Lpa` | Label Propagation Algorithm |
| WCC | `algorithm.community.wcc.Wcc` | Weakly Connected Components |
| Louvain | `algorithm.community.louvain.Louvain` | Modularity-based community detection |
| K-Core | `algorithm.community.kcore.KCore` | K-core decomposition |

### Path Finding

| Algorithm | Class | Description |
|-----------|-------|-------------|
| SSSP | `algorithm.path.sssp.Sssp` | Single Source Shortest Path |
| BFS | `algorithm.traversal.bfs.Bfs` | Breadth-First Search |
| Rings | `algorithm.path.rings.Rings` | Cycle/ring detection |

### Graph Structure

| Algorithm | Class | Description |
|-----------|-------|-------------|
| Triangle Count | `algorithm.trianglecount.TriangleCount` | Count triangles |
| Clustering Coefficient | `algorithm.clusteringcoefficient.ClusteringCoefficient` | Local clustering measure |

**Full algorithm list**: See `computer-algorithm/src/main/java/org/apache/hugegraph/computer/algorithm/`

## Developing Custom Algorithms

### Algorithm Contract

Algorithms implement the `Computation` interface from `computer-api`:

```java
package org.apache.hugegraph.computer.core.worker;

public interface Computation<M extends Value> {

    /**
     * Initialization at superstep 0
     */
    void compute0(ComputationContext context, Vertex vertex);

    /**
     * Message processing in subsequent supersteps
     */
    void compute(ComputationContext context, Vertex vertex, Iterator<M> messages);
}
```

### Example: Simple PageRank

```java
package org.apache.hugegraph.computer.algorithm.centrality.pagerank;

import java.util.Iterator;

import org.apache.hugegraph.computer.core.config.Config;
import org.apache.hugegraph.computer.core.graph.value.DoubleValue;
import org.apache.hugegraph.computer.core.graph.vertex.Vertex;
import org.apache.hugegraph.computer.core.worker.Computation;
import org.apache.hugegraph.computer.core.worker.ComputationContext;

public class PageRank implements Computation<DoubleValue> {

    public static final String OPTION_ALPHA = "pagerank.alpha";
    public static final String OPTION_MAX_ITERATIONS = "pagerank.max_iterations";

    private double alpha;
    private int maxIterations;

    @Override
    public void init(Config config) {
        this.alpha = config.getDouble(OPTION_ALPHA, 0.85);
        this.maxIterations = config.getInt(OPTION_MAX_ITERATIONS, 20);
    }

    @Override
    public void compute0(ComputationContext context, Vertex vertex) {
        // Initialize: set initial PR value
        vertex.value(new DoubleValue(1.0));

        // Send PR to neighbors
        int edgeCount = vertex.numEdges();
        if (edgeCount > 0) {
            double contribution = 1.0 / edgeCount;
            context.sendMessageToAllEdges(vertex, new DoubleValue(contribution));
        }
    }

    @Override
    public void compute(ComputationContext context, Vertex vertex, Iterator<DoubleValue> messages) {
        // Sum incoming PR contributions
        double sum = 0.0;
        while (messages.hasNext()) {
            sum += messages.next().value();
        }

        // Calculate new PR value
        double newPR = (1.0 - alpha) + alpha * sum;
        vertex.value(new DoubleValue(newPR));

        // Send to neighbors if not converged
        if (context.superstep() < maxIterations) {
            int edgeCount = vertex.numEdges();
            if (edgeCount > 0) {
                double contribution = newPR / edgeCount;
                context.sendMessageToAllEdges(vertex, new DoubleValue(contribution));
            }
        } else {
            vertex.inactivate();
        }
    }
}
```

### Key Concepts

#### 1. Supersteps

- **Superstep 0**: Initialization via `compute0()`
- **Superstep 1+**: Message processing via `compute()`
- **Barrier Synchronization**: All workers complete superstep N before starting N+1

#### 2. Message Passing

```java
// Send to specific vertex
context.sendMessage(targetId, new DoubleValue(1.0));

// Send to all outgoing edges
context.sendMessageToAllEdges(vertex, new DoubleValue(1.0));
```

#### 3. Aggregators

Global state shared across all workers:

```java
// Register aggregator in compute0()
context.registerAggregator("sum", new DoubleValue(0.0), SumAggregator.class);

// Write to aggregator
context.aggregateValue("sum", new DoubleValue(vertex.value()));

// Read aggregator value (available in next superstep)
DoubleValue total = context.aggregatedValue("sum");
```

#### 4. Combiners

Reduce message volume by combining messages at sender:

```java
public class SumCombiner implements Combiner<DoubleValue> {
    @Override
    public void combine(DoubleValue v1, DoubleValue v2, DoubleValue result) {
        result.value(v1.value() + v2.value());
    }
}
```

### Algorithm Development Workflow

1. **Implement `Computation` interface** in `computer-algorithm`
2. **Add configuration options** with `OPTION_*` constants
3. **Implement `compute0()` for initialization**
4. **Implement `compute()` for message processing**
5. **Configure in job properties**:
   ```properties
   algorithm.class=com.example.MyAlgorithm
   myalgorithm.param1=value1
   ```
6. **Build and test**:
   ```bash
   mvn clean package -DskipTests
   ```

## BSP Coordination

HugeGraph-Computer uses etcd for BSP barrier synchronization:

### BSP Lifecycle (per superstep)

1. **Worker Prepare**: `workerStepPrepareDone` → `waitMasterStepPrepareDone`
2. **Compute Phase**: Workers process vertices and messages locally
3. **Worker Compute Done**: `workerStepComputeDone` → `waitMasterStepComputeDone`
4. **Aggregation**: Aggregators combine global state
5. **Worker Step Done**: `workerStepDone` → `waitMasterStepDone` (master returns `SuperstepStat`)

### Manager Pattern

`WorkerService` composes multiple managers with lifecycle hooks:

- `MessageSendManager`: Outgoing message buffering and sending
- `MessageRecvManager`: Incoming message receiving and sorting
- `WorkerAggrManager`: Aggregator value collection
- `DataServerManager`: Inter-worker data transfer
- `SortManagers`: Message and edge sorting
- `SnapshotManager`: Checkpoint creation

All managers implement:

- `initAll()`: Initialize before first superstep
- `beforeSuperstep()`: Prepare for superstep
- `afterSuperstep()`: Cleanup after superstep
- `closeAll()`: Shutdown cleanup
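To make the lifecycle concrete, here is a hypothetical manager skeleton. The real `Manager` interface in `computer-core` has more hooks and context parameters, so treat this as a shape sketch of the hooks listed above, not the project's actual API:

```java
// Hypothetical sketch only: a manager that times each superstep.
// Method names mirror the lifecycle hooks listed above; the real
// interface in computer-core passes config/context arguments.
public class TimingManager {

    private long superstepStart;

    public void initAll() {
        // One-time setup before superstep 0 (open files, buffers, etc.)
    }

    public void beforeSuperstep(int superstep) {
        this.superstepStart = System.currentTimeMillis();
    }

    public void afterSuperstep(int superstep) {
        long costMs = System.currentTimeMillis() - this.superstepStart;
        System.out.printf("superstep %d finished in %d ms%n", superstep, costMs);
    }

    public void closeAll() {
        // Release resources after the last superstep
    }
}
```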
## Configuration Reference

### Job Configuration

```properties
# === Algorithm ===
algorithm.class=<fully qualified class name>
algorithm.message_class=<message value class>
algorithm.result_class=<result value class>

# === HugeGraph Input ===
hugegraph.url=http://localhost:8080
hugegraph.graph=hugegraph
hugegraph.input.vertex_label=person
hugegraph.input.edge_label=knows
hugegraph.input.filter=<gremlin filter>

# === HugeGraph Output ===
hugegraph.output.vertex_property=pagerank_value
hugegraph.output.edge_property=<property name>

# === HDFS Input ===
input.hdfs.path=/graph/input
input.hdfs.format=json

# === HDFS Output ===
output.hdfs.path=/graph/output
output.hdfs.format=json

# === Worker Resources ===
worker.count=3
worker.memory=8Gi
worker.cpu=4
worker.thread_count=<cpu cores>

# === BSP Coordination ===
bsp.etcd.url=http://etcd:2379
bsp.max_superstep=100
bsp.log_interval=10

# === Memory Management ===
worker.data.dirs=/data1,/data2
worker.write_buffer_size=134217728
worker.max_spill_size=1073741824
```

## Memory Management

Computer auto-manages memory to prevent OOM:

1. **In-Memory Buffering**: Vertices, edges, messages buffered in memory
2. **Spill Threshold**: When memory usage exceeds threshold, spill to disk
3. **Disk Storage**: Configurable data directories (`worker.data.dirs`)
4. **Automatic Cleanup**: Spilled data cleaned after superstep completion

**Best Practice**: Allocate worker memory ≥ 2x graph size for optimal performance.

## Troubleshooting

### K8s CRD Classes Not Found

```bash
# Generate CRD classes first
cd computer-k8s-operator
mvn clean install
```

Generated classes appear in `computer-k8s/target/generated-sources/`.
+ +### etcd Connection Errors + +- Verify `bsp.etcd.url` is reachable from all pods +- Check etcd cluster health: `etcdctl endpoint health` +- Ensure firewall allows port 2379 + +### Out of Memory Errors + +- Increase `worker.memory` in job config +- Reduce `worker.write_buffer_size` to trigger earlier spilling +- Increase `worker.count` to distribute graph across more workers + +### Slow Convergence + +- Check algorithm parameters (e.g., `pagerank.convergence_tolerance`) +- Monitor superstep logs for progress +- Consider using combiners to reduce message volume + +## Important Files + +| File | Description | +|------|-------------| +| `computer-api/.../Computation.java` | Algorithm interface contract (computer/computer-api/src/main/java/org/apache/hugegraph/computer/core/worker/Computation.java:25) | +| `computer-core/.../WorkerService.java` | Worker runtime orchestration (computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/worker/WorkerService.java:1) | +| `computer-core/.../Bsp4Worker.java` | BSP coordination logic (computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/bsp/Bsp4Worker.java:1) | +| `computer-algorithm/.../PageRank.java` | Example algorithm implementation (computer/computer-algorithm/src/main/java/org/apache/hugegraph/computer/algorithm/centrality/pagerank/PageRank.java:1) | + +## Testing + +### CI/CD Pipeline + +The CI pipeline (`.github/workflows/computer-ci.yml`) runs: + +1. License check (Apache RAT) +2. Setup HDFS (Hadoop 3.3.2) +3. Setup Minikube/Kubernetes +4. Load test data into HugeGraph +5. Compile with JDK 11 +6. Run integration tests (`-P integrate-test`) +7. Run unit tests (`-P unit-test`) +8. Upload coverage to Codecov + +### Local Testing + +```bash +# Setup test environment (etcd, HDFS, K8s) +cd computer-dist/src/assembly/travis +./start-etcd.sh +./start-hdfs.sh +./start-minikube.sh + +# Run tests +cd ../../../../ +mvn test -P integrate-test +``` + +## Links + +- [Project Homepage](https://hugegraph.apache.org/docs/quickstart/hugegraph-computer/) +- [Main README](../README.md) +- [Vermeer (Go) README](../vermeer/README.md) +- [GitHub Issues](https://github.com/apache/hugegraph-computer/issues) + +## Contributing + +See the main [Contributing Guide](../README.md#contributing) for how to contribute. + +## License + +HugeGraph-Computer is licensed under [Apache 2.0 License](https://github.com/apache/incubator-hugegraph-computer/blob/master/LICENSE). diff --git a/vermeer/.gitignore b/vermeer/.gitignore index b9d714e5..8fb775a2 100644 --- a/vermeer/.gitignore +++ b/vermeer/.gitignore @@ -98,3 +98,13 @@ tools/protoc/*/include/ # Generated files (should be generated via go generate) # ###################### asset/assets_vfsdata.go + +# AI assistant specific files (we only maintain AGENTS.md) # +###################### +CLAUDE.md +GEMINI.md +CURSOR.md +COPILOT.md +.cursorrules +.cursor/ +.github/copilot-instructions.md diff --git a/vermeer/AGENTS.md b/vermeer/AGENTS.md new file mode 100644 index 00000000..2fb2df04 --- /dev/null +++ b/vermeer/AGENTS.md @@ -0,0 +1,207 @@ +# AGENTS.md + +This file provides guidance to AI coding assistants when working with code in this repository. + +## Repository Overview + +Vermeer is a high-performance in-memory graph computing platform written in Go. It features a single-binary deployment model with master-worker architecture, supporting 20+ graph algorithms and seamless HugeGraph integration. 
## Build & Test Commands

**Prerequisites:**
- Go 1.23+
- `curl` and `unzip` (for downloading binary dependencies)

**First-time setup:**
```bash
make init  # Downloads supervisord and protoc binaries, installs Go deps
```

**Build:**
```bash
make                    # Build for current platform
make build-linux-amd64
make build-linux-arm64
```

**Development build with hot-reload UI:**
```bash
go build -tags=dev
```

**Clean:**
```bash
make clean      # Remove built binaries and generated assets
make clean-all  # Also remove downloaded tools
```

**Run:**
```bash
# Using binary directly
./vermeer --env=master
./vermeer --env=worker

# Using script (configure in vermeer.sh)
./vermeer.sh start master
./vermeer.sh start worker
```

**Tests:**
```bash
# Run with build tag vermeer_test
go test -tags=vermeer_test -v

# Specific test modes
go test -tags=vermeer_test -v -mode=algorithms
go test -tags=vermeer_test -v -mode=function
go test -tags=vermeer_test -v -mode=scheduler
```

**Regenerate protobuf (if proto files changed):**
```bash
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
tools/protoc/osxm1/protoc *.proto --go-grpc_out=. --go_out=.
```

## Architecture

### Directory Structure

```
vermeer/
├── main.go              # Single binary entry point
├── algorithms/          # Algorithm implementations
│   ├── algorithms.go    # AlgorithmMaker registry
│   ├── pagerank.go
│   ├── louvain.go
│   └── ...
├── apps/
│   ├── master/          # Master service
│   │   ├── services/    # HTTP handlers
│   │   ├── workers/     # Worker management (WorkerManager, WorkerClient)
│   │   ├── tasks/       # Task scheduling
│   │   └── graphs/      # Graph metadata management
│   ├── worker/          # Worker service entry
│   ├── compute/         # Worker-side compute logic
│   │   ├── api.go       # Algorithm interface definition
│   │   ├── task.go      # Compute task execution
│   │   └── ...
│   ├── graphio/         # Graph I/O (HugeGraph, CSV, HDFS)
│   │   └── hugegraph.go # HugeGraph integration
│   ├── protos/          # gRPC definitions
│   ├── common/          # Utilities, logging, metrics
│   ├── structure/       # Graph data structures
│   ├── storage/         # Persistence layer
│   └── bsp/             # BSP coordination helpers
├── config/              # Configuration templates
├── tools/               # Binary dependencies (supervisord, protoc)
└── ui/                  # Web dashboard
```

### Key Design Patterns

**1. Maker/Registry Pattern**

Graph loaders and writers register themselves via `init()`:

```go
func init() {
    LoadMakers[LoadTypeHugegraph] = &HugegraphMaker{}
}
```

Master selects loader by type from the registry. Algorithms follow the same pattern in `algorithms/algorithms.go`.

**2. Master-Worker Architecture**

- **Master**: Schedules LoadPartition tasks to workers, manages worker lifecycle via WorkerManager/WorkerClient, exposes HTTP endpoints for graph/task management
- **Worker**: Executes compute tasks, reports status back to master via gRPC
- Communication: Master uses gRPC clients to workers (apps/master/workers/); workers connect to master on startup

**3. HugeGraph Integration**

Implementation in `apps/graphio/hugegraph.go`:

1. **Metadata Query**: Queries HugeGraph PD (metadata service) via gRPC for partition information
2. **Data Loading**: Streams vertices/edges from HugeGraph Store via gRPC (`ScanPartition`)
3.
**Result Writing**: Writes computed results back via HugeGraph HTTP REST API (adds vertex properties) + +The loader queries PD first (`QueryPartitions`), then creates LoadPartition tasks for each partition, which workers execute by calling `ScanPartition` on store nodes. + +**4. Algorithm Interface** + +Algorithms implement the interface defined in `apps/compute/api.go`. Each algorithm must register itself in `algorithms/algorithms.go` by appending to the `Algorithms` slice. + +**5. Single Binary Entry Point** + +`main.go` loads config from `config/{env}.ini`, then starts either master or worker based on `run_mode` parameter. The `--env` flag specifies which config file to use (e.g., `--env=master` loads `config/master.ini`). + +## Important Files + +- Entry point: `main.go` +- Algorithm interface: `apps/compute/api.go` +- Algorithm registry: `algorithms/algorithms.go` +- HugeGraph integration: `apps/graphio/hugegraph.go` +- Master scheduling: `apps/master/tasks/tasks.go` +- Worker management: `apps/master/workers/workers.go` +- HTTP endpoints: `apps/master/services/http_master.go` + +## Development Workflow + +**Adding a New Algorithm:** + +1. Create file in `algorithms/` implementing the interface from `apps/compute/api.go` +2. Register in `algorithms/algorithms.go` by appending to `Algorithms` slice +3. Implement required methods: `Init()`, `Compute()`, `Aggregate()`, `Terminate()` +4. Rebuild: `make` + +**Modifying Web UI:** + +1. Edit files in `ui/` +2. Regenerate assets: `cd asset && go generate` +3. Or use dev build: `go build -tags=dev` (hot-reload enabled) + +**Modifying Protobuf Definitions:** + +1. Edit `.proto` files in `apps/protos/` +2. Regenerate Go code using protoc (adjust path for platform): + ```bash + tools/protoc/osxm1/protoc apps/protos/*.proto --go-grpc_out=. --go_out=. + ``` + +## Configuration + +**Master (`config/master.ini`):** +- `http_peer`: Master HTTP listen address (default: 0.0.0.0:6688) +- `grpc_peer`: Master gRPC listen address (default: 0.0.0.0:6689) +- `run_mode`: Must be "master" +- `task_parallel_num`: Number of parallel tasks + +**Worker (`config/worker.ini`):** +- `http_peer`: Worker HTTP listen address (default: 0.0.0.0:6788) +- `grpc_peer`: Worker gRPC listen address (default: 0.0.0.0:6789) +- `master_peer`: Master gRPC address to connect (must match master's `grpc_peer`) +- `run_mode`: Must be "worker" + +## Memory Management + +Vermeer uses an in-memory-first approach. Graphs are distributed across workers and stored in memory. Ensure total worker memory exceeds graph size by 2-3x for algorithm workspace. + +## Testing Notes + +Tests require the build tag `vermeer_test`: + +```bash +go test -tags=vermeer_test -v +``` + +Test modes (set via `-mode` flag): +- `algorithms`: Algorithm correctness tests +- `function`: Functional integration tests +- `scheduler`: Scheduler behavior tests + +Test configuration via flags: +- `-master`: Master HTTP address +- `-worker01/02/03`: Worker HTTP addresses +- `-auth`: Authentication type diff --git a/vermeer/README.md b/vermeer/README.md index 2d9e8207..f1e71ab5 100644 --- a/vermeer/README.md +++ b/vermeer/README.md @@ -1,125 +1,543 @@ -# Vermeer Graph Compute Engine +# Vermeer - High-Performance In-Memory Graph Computing + +Vermeer is a high-performance in-memory graph computing platform with a single-binary deployment model. It provides 20+ graph algorithms, custom algorithm extensions, and seamless integration with HugeGraph. 
## Key Features

- **Single Binary Deployment**: Zero external dependencies, run anywhere
- **In-Memory Performance**: Optimized for fast iteration on medium to large graphs
- **Master-Worker Architecture**: Horizontal scalability by adding worker nodes
- **REST API + gRPC**: Easy integration with existing systems
- **Web UI Dashboard**: Built-in monitoring and job management
- **Multi-Source Support**: HugeGraph, local CSV, HDFS
- **20+ Graph Algorithms**: Production-ready implementations

## Architecture

```mermaid
graph TB
    subgraph Client["Client Layer"]
        API[REST API Client]
        UI[Web UI Dashboard]
    end

    subgraph Master["Master Node"]
        HTTP[HTTP Server :6688]
        GRPC_M[gRPC Server :6689]
        GM[Graph Manager]
        TM[Task Manager]
        WM[Worker Manager]
        SCH[Scheduler]
    end

    subgraph Workers["Worker Nodes"]
        W1[Worker 1 :6789]
        W2[Worker 2 :6789]
        W3[Worker N :6789]
    end

    subgraph DataSources["Data Sources"]
        HG[(HugeGraph)]
        CSV[Local CSV]
        HDFS[HDFS]
    end

    API --> HTTP
    UI --> HTTP
    HTTP --> GM
    HTTP --> TM
    GRPC_M <--> W1
    GRPC_M <--> W2
    GRPC_M <--> W3

    W1 <--> HG
    W2 <--> HG
    W3 <--> HG
    W1 <--> CSV
    W1 <--> HDFS

    style Master fill:#e1f5fe
    style Workers fill:#fff3e0
    style DataSources fill:#f1f8e9
```

### Directory Structure

-## Introduction
-Vermeer is a high-performance distributed graph computing platform based on memory, supporting more than 15 graph algorithms, custom algorithm extensions, and custom data source access.

```
vermeer/
├── main.go              # Single binary entry point
├── Makefile             # Build automation
├── algorithms/          # 20+ algorithm implementations
│   ├── pagerank.go
│   ├── louvain.go
│   ├── sssp.go
│   └── ...
├── apps/
│   ├── master/          # Master service
│   │   ├── services/    # HTTP handlers
│   │   ├── workers/     # Worker management
│   │   └── tasks/       # Task scheduling
│   ├── compute/         # Worker-side compute logic
│   ├── graphio/         # Graph I/O (HugeGraph, CSV, HDFS)
│   │   └── hugegraph.go # HugeGraph integration
│   ├── protos/          # gRPC definitions
│   └── common/          # Utilities, logging, metrics
├── config/              # Configuration templates
│   ├── master.ini
│   └── worker.ini
├── tools/               # Binary dependencies (supervisord, protoc)
└── ui/                  # Web dashboard
```

-## Run with Docker
## Quick Start

### Option 1: Docker (Recommended)

Pull the image:

```bash
docker pull hugegraph/vermeer:latest
```

-Create local configuration files, for example, `~/master.ini` and `~/worker.ini`.
Create local configuration files `~/master.ini` and `~/worker.ini` (see [Configuration](#configuration) section).

-Run with Docker. The `--env` flag specifies the file name.
Run with Docker:

-master: docker run -v ~/:/go/bin/config hugegraph/vermeer --env=master
-worker: docker run -v ~/:/go/bin/config hugegraph/vermeer --env=worker

```bash
# Master node
docker run -v ~/:/go/bin/config hugegraph/vermeer --env=master

# Worker node
docker run -v ~/:/go/bin/config hugegraph/vermeer --env=worker
```

-We've also provided a `docker-compose` file.
Once you've created `~/master.ini` and `~/worker.ini`, and updated the `master_peer` in `worker.ini` to `172.20.0.10:6689`, you can run it using the following command: +#### Docker Compose -``` +Update `master_peer` in `~/worker.ini` to `172.20.0.10:6689`, then: + +```bash docker-compose up -d ``` -## Start +### Option 2: Binary Download +```bash +# Download binary (replace version and platform) +wget https://github.com/apache/hugegraph-computer/releases/download/vX.X.X/vermeer-linux-amd64.tar.gz +tar -xzf vermeer-linux-amd64.tar.gz +cd vermeer + +# Run master and worker +./vermeer --env=master & +./vermeer --env=worker & ``` -master: ./vermeer --env=master -worker: ./vermeer --env=worker01 -``` -The parameter env specifies the name of the configuration file in the useconfig folder. -``` +The `--env` parameter specifies the configuration file name in the `config/` folder (e.g., `master.ini`, `worker.ini`). + +#### Using the Shell Script + +Configure parameters in `vermeer.sh`, then: + +```bash ./vermeer.sh start master ./vermeer.sh start worker ``` -Configuration items are specified in vermeer.sh -## supervisord -Can be used with supervisord to start and stop services, automatically start applications, rotate logs, and more; for the configuration file, refer to config/supervisor.conf; -Configuration file reference config/supervisor.conf +### Option 3: Build from Source -```` -# run as daemon -./supervisord -c supervisor.conf -d -```` +#### Prerequisites -## Build from Source +- Go 1.23 or later +- `curl` and `unzip` utilities (for downloading dependencies) +- Internet connection (for first-time setup) -### Requirements -* Go 1.23 or later -* `curl` and `unzip` utilities (for downloading dependencies) -* Internet connection (for first-time setup) +#### Build Steps -### Quick Start - -**Recommended**: Use Makefile for building: +**Recommended**: Use Makefile: ```bash -# First time setup (downloads binary dependencies) +# First-time setup (downloads supervisord and protoc binaries) make init -# Build vermeer +# Build for current platform make + +# Or build for specific platform +make build-linux-amd64 +make build-linux-arm64 ``` -**Alternative**: Use the build script: +**Alternative**: Use build script: ```bash -# For AMD64 -./build.sh amd64 +# Auto-detect platform +./build.sh -# For ARM64 +# Or specify architecture +./build.sh amd64 ./build.sh arm64 ``` -# The script will: -# - Auto-detect your OS and architecture if no parameter is provided -# - Download required tools if not present -# - Generate assets and build the binary -# - Exit with error message if any step fails +#### Development Build -### Build Targets +For development with hot-reload of web UI: ```bash -make build # Build for current platform -make build-linux-amd64 # Build for Linux AMD64 -make build-linux-arm64 # Build for Linux ARM64 -make clean # Clean generated files -make help # Show all available targets +go build -tags=dev ``` -### Development Build +#### Clean Build Artifacts -For development with hot-reload of web UI: +```bash +make clean # Remove binaries and generated assets +make clean-all # Also remove downloaded tools (supervisord, protoc) +``` + +## Configuration + +### Master Configuration (`master.ini`) + +```ini +[master] +# Master server listen address +listen_addr = :6688 + +# Master gRPC server address +grpc_addr = :6689 + +# Worker heartbeat timeout (seconds) +worker_timeout = 30 + +# Task execution timeout (seconds) +task_timeout = 3600 + +[hugegraph] +# HugeGraph PD address for metadata +pd_peers = 
127.0.0.1:8686 + +# HugeGraph HTTP endpoint for result writing +server = http://127.0.0.1:8080 + +# Graph space name +graph = hugegraph +``` + +### Worker Configuration (`worker.ini`) + +```ini +[worker] +# Worker listen address +listen_addr = :6789 + +# Master gRPC address to connect +master_peer = 127.0.0.1:6689 + +# Worker ID (unique) +worker_id = worker01 + +# Number of compute threads +compute_threads = 4 + +# Memory limit (GB) +memory_limit = 8 + +[storage] +# Local disk path for spilling +data_path = ./data +``` + +## Available Algorithms + +| Algorithm | Category | Description | +|-----------|----------|-------------| +| **PageRank** | Centrality | Measures vertex importance via link structure | +| **Personalized PageRank** | Centrality | PageRank from specific source vertices | +| **Betweenness Centrality** | Centrality | Measures vertex importance via shortest paths | +| **Closeness Centrality** | Centrality | Measures average distance to all other vertices | +| **Degree Centrality** | Centrality | Simple in/out degree calculation | +| **Louvain** | Community Detection | Modularity-based community detection | +| **Louvain (Weighted)** | Community Detection | Weighted variant for edge-weighted graphs | +| **LPA** | Community Detection | Label Propagation Algorithm | +| **SLPA** | Community Detection | Speaker-Listener Label Propagation | +| **WCC** | Community Detection | Weakly Connected Components | +| **SCC** | Community Detection | Strongly Connected Components | +| **SSSP** | Path Finding | Single Source Shortest Path (Dijkstra) | +| **Triangle Count** | Graph Structure | Counts triangles in the graph | +| **K-Core** | Graph Structure | Finds k-core subgraphs | +| **K-Out** | Graph Structure | K-degree filtering | +| **Clustering Coefficient** | Graph Structure | Measures local clustering | +| **Cycle Detection** | Graph Structure | Detects cycles in directed graphs | +| **Jaccard Similarity** | Similarity | Computes neighbor-based similarity | +| **Depth (BFS)** | Traversal | Breadth-First Search depth assignment | + +## API Overview + +Vermeer exposes a REST API on port `6688` (configurable in `master.ini`). + +### Key Endpoints + +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/api/v1/graphs` | POST | Load graph from data source | +| `/api/v1/graphs/{graph_id}` | GET | Get graph metadata | +| `/api/v1/graphs/{graph_id}` | DELETE | Unload graph from memory | +| `/api/v1/compute` | POST | Execute algorithm on loaded graph | +| `/api/v1/tasks/{task_id}` | GET | Get task status and results | +| `/api/v1/workers` | GET | List connected workers | +| `/ui/` | GET | Web UI dashboard | + +### Example: Run PageRank ```bash -go build -tags=dev +# 1. Load graph from HugeGraph +curl -X POST http://localhost:6688/api/v1/graphs \ + -H "Content-Type: application/json" \ + -d '{ + "graph_name": "my_graph", + "load_type": "hugegraph", + "hugegraph": { + "pd_peers": ["127.0.0.1:8686"], + "graph_name": "hugegraph" + } + }' + +# 2. Run PageRank +curl -X POST http://localhost:6688/api/v1/compute \ + -H "Content-Type: application/json" \ + -d '{ + "graph_name": "my_graph", + "algorithm": "pagerank", + "params": { + "max_iterations": 20, + "damping_factor": 0.85 + }, + "output": { + "type": "hugegraph", + "property_name": "pagerank_value" + } + }' + +# 3. 
Check task status
curl http://localhost:6688/api/v1/tasks/{task_id}
```

### OLAP vs OLTP Modes

- **OLAP Mode**: Load entire graph into memory, run multiple algorithms
- **OLTP Mode**: Query-driven, load subgraphs on demand (planned feature)

## Data Sources

### HugeGraph Integration

Vermeer integrates with HugeGraph via:

1. **Metadata Query**: Queries HugeGraph PD (metadata service) via gRPC for partition information
2. **Data Loading**: Streams vertices/edges from HugeGraph Store via gRPC (`ScanPartition`)
3. **Result Writing**: Writes computed results back via HugeGraph REST API (adds vertex properties)

Configuration in graph load request:

```json
{
  "load_type": "hugegraph",
  "hugegraph": {
    "pd_peers": ["127.0.0.1:8686"],
    "graph_name": "hugegraph",
    "vertex_label": "person",
    "edge_label": "knows"
  }
}
```

### Local CSV Files

Load graphs from local CSV files:

```json
{
  "load_type": "csv",
  "csv": {
    "vertex_file": "/path/to/vertices.csv",
    "edge_file": "/path/to/edges.csv",
    "delimiter": ","
  }
}
```

### HDFS

Load from Hadoop Distributed File System:

```json
{
  "load_type": "hdfs",
  "hdfs": {
    "namenode": "hdfs://namenode:9000",
    "vertex_path": "/graph/vertices",
    "edge_path": "/graph/edges"
  }
}
```

## Developing Custom Algorithms

Custom algorithms implement the algorithm interface defined in `apps/compute/api.go` and register themselves in `algorithms/algorithms.go`:

```go
type Algorithm interface {
    // Initialize the algorithm
    Init(params map[string]interface{}) error

    // Compute one iteration for a vertex
    Compute(vertex *Vertex, messages []Message) (halt bool, outMessages []Message)

    // Aggregate global state (optional)
    Aggregate() interface{}

    // Check termination condition
    Terminate(iteration int) bool
}
```

### Example: Simple Degree Count

```go
package algorithms

type DegreeCount struct {
    maxIter int
}

func (dc *DegreeCount) Init(params map[string]interface{}) error {
    dc.maxIter = params["max_iterations"].(int)
    return nil
}

func (dc *DegreeCount) Compute(vertex *Vertex, messages []Message) (bool, []Message) {
    // Store degree as vertex value
    vertex.SetValue(float64(len(vertex.OutEdges)))

    // Halt after first iteration
    return true, nil
}

func (dc *DegreeCount) Aggregate() interface{} {
    // No global state to aggregate for a per-vertex degree count
    return nil
}

func (dc *DegreeCount) Terminate(iteration int) bool {
    return iteration >= dc.maxIter
}
```

Register the algorithm in `algorithms/algorithms.go`:

```go
func init() {
    RegisterAlgorithm("degree_count", &DegreeCount{})
}
```

## Memory Management

Vermeer uses an in-memory-first approach:

1. **Graph Loading**: Vertices and edges are distributed across workers and stored in memory
2. **Automatic Partitioning**: Master assigns partitions to workers based on capacity
3. **Memory Monitoring**: Workers report memory usage to master
4. **Graceful Degradation**: If memory is insufficient, algorithms may fail (disk spilling not yet implemented)

**Best Practice**: Ensure total worker memory exceeds graph size by 2-3x for algorithm workspace.
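A rough sizing illustration; the per-element byte costs here are ballpark assumptions, not measured Vermeer figures:

```
 10M vertices x ~100 bytes ≈ 1 GB
100M edges    x ~40 bytes  ≈ 4 GB
in-memory graph            ≈ 5 GB
x 2-3 workspace factor     → plan for 10-15 GB of total worker memory
```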
## Supervisord Integration

Run Vermeer as a daemon with automatic restarts and log rotation:

```bash
# Configuration in config/supervisor.conf
./tools/supervisord -c config/supervisor.conf -d
```

Sample supervisor configuration:

```ini
[program:vermeer-master]
command=/path/to/vermeer --env=master
autostart=true
autorestart=true
stdout_logfile=/var/log/vermeer-master.log
```

## Protobuf Development

If you modify `.proto` files, regenerate Go code:

```bash
# Install protobuf Go plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

# Generate (adjust protoc path for your platform)
tools/protoc/osxm1/protoc *.proto --go-grpc_out=. --go_out=.
```

## Performance Tuning

### Worker Configuration

- **compute_threads**: Set to number of CPU cores for CPU-bound algorithms
- **memory_limit**: Set to 70-80% of available RAM
- **partition_count**: Increase for better parallelism (default: auto-calculated)

### Master Configuration

- **worker_timeout**: Increase for slow networks or heavily loaded workers
- **task_timeout**: Increase for long-running algorithms (e.g., Louvain on large graphs)

### Algorithm-Specific

- **PageRank**: Use `damping_factor=0.85`, `tolerance=0.0001` for faster convergence
- **Louvain**: Enable `weighted=true` only if edge weights are meaningful
- **SSSP**: Provide source vertex ID for single-source queries

## Monitoring

Access the Web UI dashboard at `http://master-ip:6688/ui/` for:

- Worker status and resource usage
- Active and completed tasks
- Graph metadata and statistics
- Real-time logs

## Troubleshooting

### Workers Not Connecting

- Verify `master_peer` in `worker.ini` matches master's gRPC address
- Check firewall rules for port `6689` (gRPC)
- Ensure master is running before starting workers

### Out of Memory Errors

- Reduce graph size or increase worker memory
- Distribute graph across more workers
- Use algorithms with lower memory footprint (e.g., degree centrality vs. betweenness)

### Slow Algorithm Execution

- Increase `compute_threads` in worker config
- Check network latency between master and workers
- Profile algorithm with built-in metrics (access via API)

## Links

- [Project Homepage](https://hugegraph.apache.org/docs/quickstart/hugegraph-computer/)
- [Main README](../README.md)
- [Computer (Java) README](../computer/README.md)
- [GitHub Issues](https://github.com/apache/hugegraph-computer/issues)
- [Docker Hub](https://hub.docker.com/r/hugegraph/vermeer)

## Contributing

See the main [Contributing Guide](../README.md#contributing) for how to contribute to Vermeer.

## License

Vermeer is part of Apache HugeGraph-Computer, licensed under [Apache 2.0 License](https://github.com/apache/incubator-hugegraph-computer/blob/master/LICENSE).
