This is an automated email from the ASF dual-hosted git repository. zhaocong pushed a commit to branch improve_doc in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-computer.git
commit b33e700b1966878088d5892692a3431f95c73a66 Author: coderzc <[email protected]> AuthorDate: Wed Jun 11 10:19:59 2025 +0800 feat: add API documentation and architectural design for Vermeer --- vermeer/README.md | 47 ++ vermeer/docs/api.md | 1430 ++++++++++++++++++++++++++++++++++++++++++ vermeer/docs/architecture.md | 83 +++ 3 files changed, 1560 insertions(+) diff --git a/vermeer/README.md b/vermeer/README.md new file mode 100644 index 00000000..b68d8b3b --- /dev/null +++ b/vermeer/README.md @@ -0,0 +1,47 @@ +## Vermeer +Vermeer is a high-performance graph computing framework implemented in Go. It is primarily memory-based, and the latest version also supports keeping part of the data on disk. + +## Architecture Design +[Vermeer Architecture](docs/architecture.md) + +## Build + +### gRPC / protobuf dependency installation +```` +go install google.golang.org/protobuf/cmd/[email protected] +go install google.golang.org/grpc/cmd/[email protected] +```` + +### Protobuf build + +```` +../../tools/protoc/osxm1/protoc *.proto --go-grpc_out=. --go_out=. +```` + +### Cross-compilation + +```` +linux: GOARCH=amd64 GOOS=linux go build +CC=x86_64-linux-musl-gcc CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -buildmode=plugin +```` + +## Run + +``` +master: ./vermeer --env=master +worker: ./vermeer --env=worker01 +# The parameter `env` specifies the configuration file name under the `config` folder. +``` +OR + +``` +./vermeer.sh start master +./vermeer.sh start worker +# The configuration items are specified in the vermeer.sh file. 
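+# Optional sanity check (an assumed step, not part of vermeer.sh): once a
+# master or worker is up, its /healthcheck endpoint returns {"code": 200}.
+# Port 6688 is the sample http_peer value from the config examples in docs/api.md.
+# curl http://127.0.0.1:6688/healthcheck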
+``` + + +## API Doc +[Vermeer API](docs/api.md) + + diff --git a/vermeer/docs/api.md b/vermeer/docs/api.md new file mode 100644 index 00000000..f9cc2030 --- /dev/null +++ b/vermeer/docs/api.md @@ -0,0 +1,1430 @@ +# Vermeer API 文档 + +## 一、安装部署 + +### 1.1 运行配置 + +计算框架有两种角色,worker和master,但只有一个可执行文件,通过run_mode设置(master/worker)。--env参数可以指定使用哪个配置文件,例如--env=master指定使用master.ini,配置文件放在config/目录下。master需要指定监听的端口号,worker需要指定监听端口号和master的ip:port。 + +|配置项|说明|示例| +|---|---|---| +|log_level|日志级别|debug/info/warn/error| +|debug_mode|debug模式|release| +|http_peer|http监听地址|0.0.0.0:6688| +|grpc_peer|grpc监听地址|0.0.0.0:6689| +|master_peer|master grpc地址|127.0.0.1:6689| +|run_mode|运行模式|master/worker| + +运行示例: +```bash +./vermeer --env=master +./vermeer --env=worker01 +``` + +### 1.2 鉴权访问 + +Vermeer自0.2.0版本开始,默认开启鉴权访问。所有访问Vermeer接口的http请求都需要携带有效token,未携带有效token的请求会拒绝访问服务。token由Vermeer开发侧给出,使用时在request的Headers中增加Authorization字段。 + +Postman示例: +``` +Host: <calculated when request is sent> +User-Agent: PostmanRuntime/7.29.4 +Accept: gzip,deflate,br +Accept-Encoding: gzip,deflate,br +Connection: keep-alive +Authorization: 2Bq2ToHwLnci5wAyvXbhhYwz1YJhb4QLt1P2ttmfD... 
+``` + +Go示例: +```go +func setAuth(request *http.Request, token string) { + request.Header.Set("Authorization", token) +} +``` + +## 二、任务创建类rest api + +### 2.1 总体说明 + +此类rest api提供所有创建任务的功能,包括读取图数据和多种计算功能。提供异步返回和同步返回两种接口,返回内容均包含所创建任务的信息。使用Vermeer的整体流程是先创建读取图的任务,待图读取完毕后创建计算任务执行计算。图不会自动被删除,在一个图上运行多个计算任务无需多次重复读取,如需删除可用删除图接口。对于计算类型任务,可通过3.9接口迭代批量获取计算结果。任务状态分为读取任务状态和计算任务状态,客户端通常仅需了解创建、任务中、任务结束和任务错误四种状态。图状态是图是否可用的判断依据,若图正在读取中或图状态错误,无法使用该图创建计算任务,图删除接口仅在loaded和error状态且该图无计算任务时可用。 + +异步返回接口:`POST http://master_ip:port/tasks/create`,仅返回任务创建是否成功,需主动查询任务状态判断是否完成。 +同步返回接口:`POST http://master_ip:port/tasks/create/sync`,在任务结束后返回。 + +|参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|task_type|是|string|-|load/compute|任务类型| +|graph|是|string|-|-|图名| +|params|否|json|{}|-|动态参数,具体参数项见相关任务| + +load任务状态: +```mermaid +stateDiagram + [*] --> load_vertex : 任务创建 + load_vertex --> load_vertex_ok : 顶点加载成功 + load_vertex_ok --> load_scatter : 继续下一步 + load_scatter --> load_scatter_ok : 散射加载成功 + load_scatter_ok --> load_edge : 加载边 + load_edge --> load_edge_ok : 边加载成功 + load_edge_ok --> load_degree : 计算度 + load_degree --> load_degree_ok : 度计算成功 + load_degree_ok --> loaded : 加载完成 + %% 任一步骤失败均会进入error状态 + load_vertex --> error : 发生错误 + load_edge --> error : 发生错误 +``` + +compute任务状态: +```mermaid +stateDiagram + [*] --> created : 任务创建 + created --> step_doing : 计算步骤进行中 + step_doing --> step_done : 计算步骤完成 + step_done --> step_doing : 未完成则继续迭代 + step_done --> complete : 计算完成 + %% 任一步骤失败均会进入error状态 + step_doing --> error : 发生错误 +``` + +图状态: +```mermaid +stateDiagram + [*] --> created + created --> loading : 图加载中 + loading --> loaded : 加载完成 + loading --> error : 出错 +``` + +### 2.2 加载图数据 +支持reload新特性,可对图进行重新加载以更新图数据。设置"task_type": "reload",不填写params参数时默认使用上次加载图任务的参数进行重新加载,填写参数则会覆盖已有参数。前置条件是图必须已存在,图顶点数据默认存储至磁盘database中,小数据量下(<1亿),若需更快计算速度,可更改"load.vertex_backend"为"mem"。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|task_type|是|string|-|load/reload|任务类型| +|graph|是|string|-|-|图名|
+|params|否|json|{}|-|动态参数,具体参数项见下表| + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|load.parallel|否|int|1|大于0|加载线程| +|load.type|否|string|local|local/hdfs/afs/hugegraph|数据加载方式| +|load.edges_per_vertex|否|int|10|大于等于0|每个顶点的平均边数估计值,影响加载速度| +|load.vertex_files|local/hdfs/afs方式必填|string|-|-|顶点文件名,local方式需指定worker ip,如{"10.81.116.77":"v_00"};hdfs/afs方式支持通配符,如"/data/twitter-2010.v_*"| +|load.edge_files|local/hdfs/afs方式必填|string|-|-|边文件名,local方式需指定worker ip,如{"10.81.116.77":"e_00"};hdfs/afs方式支持通配符,如"/data/twitter-2010.e_*"| +|load.use_property|否|int|0|0/1|是否使用属性| +|load.vertex_property|否(从hugegraph中读取数据,若load.use_property=1则必须指定property,此项必填)|json|-|-|顶点属性,格式为json,示例:{"label": "2,0", "name": "..." };hugegraph读取时为字符串,示例:"1,2,10001"| +|load.edge_property|否(从hugegraph中读取数据,若load.use_property=1则必须指定property,此项必填)|json|-|-|边属性,格式为json,示例:{"label": "2,0", "name": "..." };hugegraph读取时为字符串,示例:"1,2,10001"| +|load.use_outedge|否|int|0|0/1|是否生成顶点出边数据| +|load.use_out_degree|否|int|0|0/1|是否生成顶点出度数据| +|load.use_undirected|否|int|0|0/1|是否生成无向图数据| +|load.vertex_backend|否|string|db|db/mem|顶点底层存储方式,小数据量(<1亿)可设为mem| +|load.delimiter|否|string|空格|-|导入文件列分隔符| +|load.hdfs_conf_path|hdfs读取必填|string|-|-|hdfs配置文件路径| +|load.hdfs_namenode|hdfs读取必填|string|-|-|hdfs namenode| +|load.hdfs_use_krb|否|int|0|0/1|是否使用Kerberos认证| +|load.krb_name|否|string|-|-|Kerberos用户名| +|load.krb_realm|否|string|-|-|Kerberos realm| +|load.krb_conf_path|否|string|-|-|Kerberos配置文件路径| +|load.krb_keytab_path|否|string|-|-|Kerberos keytab路径| +|load.hg_pd_peers|hugegraph读取必填|string|-|-|导入数据的hugegraph pd的peers,示例:["10.14.139.69:8686"]| +|load.hugegraph_name|hugegraph读取必填|string|-|-|导入数据的hugegraph图名| +|load.hugegraph_username|hugegraph读取必填|string|-|-|导入数据的hugegraph用户名| +|load.hugegraph_password|hugegraph读取必填|string|-|-|导入数据的hugegraph密码| +|load.hugegraph_vertex_condition|否|string|-|-|导入顶点过滤语句,示例:"element.id > 100",字符串含义是boolean表达式| +|load.hugegraph_edge_condition|否|string|-|-|导入边过滤语句,示例:"element.weight > 0.5",字符串含义是boolean表达式| +|load.hg_partitions|否|string|-|-|导入hugestore分区,示例:"load.hg_partitions": "0,1,2",拉取id为0、1、2的分区;若不配置则拉取所有分区|
+|load.hugestore_batch_timeout|否|int|120|>=60|导入hugestore一个batch的超时时间| +|load.hugestore_batchsize|否|int|100000|>0|导入时拉取一个batch的大小| +|load.afs_uri|afs读取必填|string|-|-|导入数据的afs集群uri| +|load.afs_username|afs读取必填|string|-|-|导入数据的afs用户名| +|load.afs_password|afs读取必填|string|-|-|导入数据的afs密码| +|load.afs_config_path|否|string|-|-|afs-api配置文件路径,一般无需改动| + +request示例: +```json +// load.type: local +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "load", + "graph": "testdb", + "params": { + "load.parallel": "50", + "load.type": "local", + "load.vertex_files": "{\"10.81.116.77\":\"data/twitter-2010.v_[0,99]\"}", + "load.edge_files": "{\"10.81.116.77\":\"data/twitter-2010.e_[0,99]\"}", + "load.use_out_degree": "1", + "load.use_outedge": "1" + } +} +``` +```json +// load.type: hdfs +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "load", + "graph": "testdb", + "params": { + "load.parallel": "50", + "load.type": "hdfs", + "load.vertex_files": "/data/twitter-2010.v_*", + "load.edge_files": "/data/twitter-2010.e_*", + "load.hdfs_namenode": "hdfs://10.81.116.77:8020", + "load.hdfs_conf_path": "/home/hadoop/hadoop-3.3.1/etc/hadoop/", + "load.hdfs_use_krb": "0", + "load.krb_realm": "", + "load.krb_name": "", + "load.krb_keytab_path": "", + "load.krb_conf_path": "", + "load.use_out_degree": "1", + "load.use_outedge": "1" + } +} +``` +```json +// load.type: afs +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "load", + "graph": "testdb", + "params": { + "load.parallel": "50", + "load.type": "afs", + "load.vertex_files": "/user/vermeer_test/twitter-2010.v_*", + "load.edge_files": "/user/vermeer_test/twitter-2010.e_*", + "load.afs_username": "test", + "load.afs_password": "xxxxxx", + "load.afs_uri": "afs://wudang.afs.baidu.com:9902/", + "load.use_out_degree": "1", + "load.use_outedge": "1" + } +} +``` +```json +// load.type: hugegraph +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "load", + "graph": "testdb", + "params": { + "load.parallel": 
"50", + "load.type": "hugegraph", + "load.hg_pd_peers": "[\"10.14.139.69:8686\"]", + "load.hugegraph_name": "DEFAULT/hugegraph2/g", + "load.hugegraph_password":"xxxxx", + "load.use_out_degree": "1", + "load.use_outedge": "1" + } +} +``` +```json +// reload +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "reload", + "graph": "testdb" +} +``` + +### 2.3 输出计算结果说明 +所有计算任务(仅展示输出local方式的示例)均支持多种结果输出方式,可自定义输出方式:local、hdfs、afs或hugegraph。在发送请求时的params参数下加入对应参数即可生效。支持计算结果统计信息输出,需指定output.need_statistics为1,结果会写在接口任务信息内。统计模式算子目前支持 "count" 和 "modularity" ,但仅针对社区发现算法适用。 + +|动态参数|是否必填|类型|默认值|取值范围| +|---|---|---|---|---| +|output.parallel|否|int|1|大于0| +|output.delimiter|否|string|空格|-| +|output.file_path|local/hdfs/afs方式必填|string|-|-| +|output.type|是|string|local|local/hdfs/afs/hugegraph| +|output.need_query|否|int|0|0/1| +|output.hdfs_conf_path|hdfs方式必填|string|-|-| +|output.hdfs_namenode|hdfs方式必填|string|-|-| +|output.hdfs_use_krb|否|int|0|0/1| +|output.krb_name|否|string|-|-| +|output.krb_realm|否|string|-|-| +|output.krb_conf_path|否|string|-|-| +|output.krb_keytab_path|否|string|-|-| +|output.afs_uri|afs方式必填|string|-|-| +|output.afs_username|afs方式必填|string|-|-| +|output.afs_password|afs方式必填|string|-|-| +|output.afs_config_path|否|string|-|-| +|output.hg_pd_peers|hugegraph方式必填|string|-|-| +|output.hugegraph_name|hugegraph方式必填|string|-|-| +|output.hugegraph_username|hugegraph方式必填|string|-|-| +|output.hugegraph_password|hugegraph方式必填|string|-|-| +|output.hugegraph_property|hugegraph方式必填|string|-|-| +|output.hugegraph_write_type|否|string|OLAP_COMMON|OLAP_COMMON / OLAP_SECONDARY / OLAP_RANGE| +|output.need_statistics|否|int|0|0/1| +|output.statistics_mode|否|string|-|count/modularity| + +示例: +```json +// local +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "pagerank", + "compute.parallel":"10", + "compute.max_step":"10", + "output.type":"local", + "output.parallel":"1", + 
"output.file_path":"result/pagerank" + } +} +``` + +```json +// hdfs +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "pagerank", + "compute.parallel":"10", + "compute.max_step":"10", + "output.type":"hdfs", + "output.parallel":"1", + "output.file_path":"data/result/pagerank", + "output.hdfs_namenode": "hdfs://10.81.116.77:8020", + "output.hdfs_conf_path": "/home/hadoop/etc/hadoop", + "output.hdfs_use_krb": "0" + } +} +``` + +```json +// afs +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "pagerank", + "compute.parallel":"10", + "compute.max_step":"10", + "output.type":"afs", + "output.parallel":"1", + "output.file_path":"/user/vermeer_test/result/pagerank", + "output.afs_username": "test", + "output.afs_password": "xxxxxx", + "output.afs_uri": "afs://wudang.afs.baidu.com:9902/" + } +} +``` + +```json +// hugegraph +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "pagerank", + "compute.parallel":"10", + "compute.max_step":"10", + "output.type":"hugegraph", + "output.parallel":"10", + "output.hg_pd_peers": "[\"10.14.139.69:8686\"]", + "output.hugegraph_username":"admin", + "output.hugegraph_password":"xxxxx", + "output.hugegraph_name": "DEFAULT/hugegraph0/g", + "output.hugegraph_property": "pagerank" + } +} +``` + +### 2.4 PageRank +PageRank算法又称网页排名算法,通过网页(节点)间的超链接计算,体现网页(节点)的相关性和重要性。适用于网页排序、社交网络重点人物发掘等场景。一个网页被众多其他网页链接,其PageRank值较高;PageRank值高的网页链接到其他网页,会相应提高被链接网页的PageRank值。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|task_type|是|string|-|compute|任务类型| +|graph|是|string|-|-|图名| +|params|否|json|{}|-|动态参数,具体参数项见下表| + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| 
+|compute.algorithm|是|string|-|pagerank|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|否|int|10|大于0|最大迭代步数| +|pagerank.damping|否|float|0.85|0 - 1|阻尼系数,传导到下个点的百分比| +|pagerank.diff_threshold|否|float|0.00001|0 - 1|收敛精度,每次迭代各点变化绝对值累加和上限,小于此值时算法停止| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "pagerank", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/pagerank", + "compute.max_step":"10" + } +} +``` + +### 2.5 WCC(弱连通分量) +计算无向图中所有连通的子图,输出各顶点所属的弱连通子图id,用于表明各个点之间的连通性,区分不同的连通社区。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|wcc|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "wcc", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/wcc", + "compute.max_step":"10" + } +} +``` + +### 2.6 LPA(标签传播) +标签传播算法是一种图聚类算法,常用于社交网络中发现潜在的社区。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|lpa|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|否|int|10|大于0|最大迭代步数| +|lpa.vertex_weight_property|否|string|-|-|顶点权重属性名,必须为int或float| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "lpa", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/lpa", + "compute.max_step":"10" + } +} +``` + +### 2.7 Degree Centrality(度中心性) 
+该算法用于计算图中每个节点的度中心性值,支持无向图和有向图。在无向图中基于边信息统计节点出现次数计算度中心性;有向图中根据边的方向筛选,基于输入边或输出边信息统计节点出现次数,得到节点的入度值或出度值。节点与其他节点的边越多,度中心性值越大,在图中的重要性越高。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|degree|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|degree.direction|否|string|out|out/in/both|边的方向| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "degree", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/degree", + "degree.direction":"both" + } +} +``` + +### 2.8 Closeness Centrality(紧密中心性) +计算一个节点到所有其他可达节点的最短距离的倒数,累积后归一化的值。紧密中心度可衡量信息从该节点传输到其他节点的时间长短,节点的“Closeness Centrality”越大,在图中的位置越靠近中心,适用于社交网络中关键节点发掘等场景。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|closeness_centrality|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|closeness_centrality.sample_rate|否|float|1.0|0 ~ 1|边的采样率,该算法算力要求高,需根据需求设置合理采样率以获得近似结果| +|closeness_centrality.wf_improved|否|int|1|0/1|是否使用Wasserman and Faust中心性公式| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "closeness_centrality", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/closeness_centrality", + "closeness_centrality.sample_rate":"0.01" + } +} +``` + +### 2.9 Betweenness Centrality(中介中心性算法) +中介中心性算法用于判断一个节点是否具有“桥梁”节点的价值,值越大说明它作为图中两点间必经路径的可能性越大。典型例子包括社交网络中的共同关注的人,适用于衡量社群围绕某个节点的聚集程度。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST 
http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|betweenness_centrality|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|betweenness_centrality.sample_rate|否|float|1.0|0 ~ 1|边的采样率,算法算力要求高,需按需设置采样率获取近似结果| +|betweenness_centrality.use_endpoint|否|int|0|0/1|是否将路径端点计入计算| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "betweenness_centrality", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/betweenness_centrality", + "betweenness_centrality.sample_rate":"0.01" + } +} +``` + +### 2.10 Triangle Count(三角形计数) +用于计算通过每个顶点的三角形个数。在社交网络中,三角形表示有凝聚力的社区,有助于理解网络中个人或群体的聚类和相互联系;在金融网络或交易网络中,三角形的存在可能表示可疑或欺诈活动,可帮助识别需要进一步调查的交易模式。输出结果为每个顶点对应一个Triangle Count,即每个顶点所在三角形的个数,该算法为无向图算法,忽略边的方向。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|triangle_count|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "triangle_count", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/triangle_count" + } +} +``` + +### 2.11 K-Core +K-Core算法用于标记图中所有度数不小于K的顶点,适用于图的剪枝,查找图的核心部分。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|kcore|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|kcore.degree_k|否|int|3|大于0|最小度数阈值| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "kcore", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/kcore", + "kcore.degree_k":"5" + } +} +``` + +### 2.12 SSSP(单源最短路径) +单源最短路径算法,用于求一个点到其他所有点的最短距离。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|sssp|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|sssp.source|是|string|-|-|起始点ID| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "sssp", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/sssp", + "sssp.source":"tom" + } +} +``` + +### 2.13 Kout +以一个点为起点,获取该点第k层的邻居节点。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|kout|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|kout.source|是|string|-|-|起始点ID| +|compute.max_step|是|int|10|大于0|使用最大步数作为K值| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "kout", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/kout", + "kout.source":"tom", + "compute.max_step":"6" + } +} +``` + +### 2.14 Louvain +Louvain算法是一种基于模块度的社区发现算法。其基本思想是网络中节点尝试遍历所有邻居的社区标签,并选择最大化模块度增量的社区标签。在最大化模块度之后,每个社区看成一个新的节点,重复直到模块度不再增大。Vermeer上实现的分布式Louvain算法受节点顺序、并行计算等因素影响,且由于其遍历顺序的随机导致社区压缩也具有一定的随机性,重复多次执行可能存在不同结果,但整体趋势不会有大的变化。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| 
+|compute.algorithm|是|string|-|louvain|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|是|int|10|大于0|最大迭代步数(推荐设置到1000)| +|louvain.threshold|否|float|0.0000001|(0,1)|louvain算法模块度收敛精度,当前模块度计算差值小于精度阈值即停止| +|louvain.resolution|否|float|1.0|(0,1]|louvain的resolution参数| +|louvain.step|否|int|10|大于0|louvain的迭代轮次| + +注:compute.max_step并不与单机版本的louvain计算轮次对应,一个louvain计算轮次包含多个不确定的compute.max_step。twitter-2010数据实测需要300~400轮才可收敛。推荐将迭代轮次设置较大,调整阈值控制算法是否提前结束。可通过louvain.step控制louvain的迭代轮次,compute.max_step、louvain.step、louvain.threshold任一满足条件都会退出算法。 + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "louvain", + "compute.parallel":"10", + "compute.max_step":"1000", + "louvain.threshold":"0.0000001", + "louvain.resolution":"1.0", + "louvain.step":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/louvain" + } +} +``` + +### 2.15 Jaccard相似度系数 +Jaccard index,又称为Jaccard相似系数(Jaccard similarity coefficient),用于比较有限样本集之间的相似性与差异性。Jaccard系数值越大,样本相似度越高。该功能用于计算一个给定的源点,与图中其他所有点的Jaccard相似系数。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|jaccard|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|是|int|10|大于0|最大迭代步数(计算仅需2步就结束)| +|jaccard.source|是|string|-|-|给定的源点| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "jaccard", + "compute.parallel":"10", + "compute.max_step":"2", + "jaccard.source":"123", + "output.type":"local", + "output.file_path":"result/jaccard" + } +} +``` + +### 2.16 Personalized PageRank +个性化的pagerank目标是计算所有节点相对于用户u的相关度。从用户u对应的节点开始游走,每到一个节点都以1-d的概率停止游走并从u重新开始,或者以d的概率继续游走,从当前节点指向的节点中按照均匀分布随机选择一个节点往下游走。用于给定一个起点,计算此起点开始游走的个性化pagerank得分,适用于社交推荐等场景。注:由于计算需要使用出度,需要在读取图时设置"load.use_out_degree": "1"。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|ppr|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|是|int|10|大于0|最大迭代步数| +|ppr.source|是|string|-|-|给定的起始点| +|ppr.damping|否|float|0.85|0 - 1|阻尼系数,传导到下个点的百分比| +|ppr.diff_threshold|否|float|0.00001|0 - 1|收敛精度,每次迭代各点变化绝对值累加和上限,小于此值时算法停止| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "ppr", + "compute.parallel":"100", + "compute.max_step":"10", + "ppr.source":"123", + "ppr.damping":"0.85", + "ppr.diff_threshold":"0.00001", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/ppr" + } +} +``` + +### 2.17 全图Kout +计算图的所有节点的k度邻居(不包含自身以及1~(k-1)度的邻居),由于全图kout算法内存膨胀比较厉害,目前k限制在1和2。另外,全图kout算法支持过滤功能(参数如:"compute.filter":"risk_level==1"),在计算第k度的时候进行过滤条件的判断,符合过滤条件的进入最终结果集,算法最终输出是符合条件的邻居个数。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|kout_all|算法名| +|compute.max_step|是|int|10|大于0|使用最大步数作为K值| +|compute.filter|否|string|-|-|属性支持数值类型和文本类型的比较。属性为数值类型:“level==1”,支持所有比较运算符;属性为文本类型:“name==\"test\"”,文本类型只支持 == 和 != 运算符| + +备注:compute.filter参数如果没有则不进行过滤。 + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "kout_all", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"10", + "output.file_path":"result/kout", + "compute.max_step":"2", + "compute.filter":"risk_level==1" + } +} +``` + +### 2.18 Clustering Coefficient(集聚系数) 
+集聚系数表示一个图中节点聚集程度的系数。在现实网络中,节点总是趋向于建立一组严密的组织关系。集聚系数算法(Cluster Coefficient)用于计算图中节点的聚集程度,本算法为局部集聚系数,可测量图中每一个结点附近的集聚程度。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|clustering_coefficient|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|是|int|10|大于0|最大迭代步数(计算仅需2步)| + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "clustering_coefficient", + "compute.parallel":"100", + "compute.max_step":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/cc" + } +} +``` + +### 2.19 SCC(强连通分量) +在有向图的数学理论中,如果一个图的每一个顶点都可从该图其他任意一点到达,则称该图是强连通的。在任意有向图中能够实现强连通的部分我们称其为强连通分量,用于表明各个点之间的连通性,区分不同的连通社区。 + +异步返回接口:`POST http://master_ip:port/tasks/create` +同步返回接口:`POST http://master_ip:port/tasks/create/sync` + +|动态参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|compute.algorithm|是|string|-|scc|算法名| +|compute.parallel|否|int|1|大于0|worker计算线程数| +|compute.max_step|否|int|10|大于0|最大迭代步数| + +注:强连通分量算法通过多次的前向和后向传播找出强连通分量子图,需要较多轮次才能收敛,建议设置迭代轮次较大,算法会在收敛后自动结束。实测twitter数据集需120轮收敛。 + +```json +POST http://10.81.116.77:8688/tasks/create +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "scc", + "compute.parallel":"10", + "output.type":"local", + "output.parallel":"1", + "output.file_path":"result/scc", + "compute.max_step":"200" + } +} +``` + +## 三、其他rest api +### 3.1 获取graph列表 +获取数据库中的所有图列表。 +- **uri**:`GET http://master_ip:port/graphs` +- **params**:无 +- **response**: +```json +{ + "errcode": 0, + "graphs": [ + { + "name": "testdb", + "status": "loaded", + "create_time": "2022-11-01T17:08:43.100831+08:00", + "update_time": "2022-11-01T17:08:43.137682+08:00", + "vertex_count": 10, + "edge_count": 18, + "workers": [ + { + "Name": "1587370926564790272", + "VertexCount": 4, + 
"VertIdStart": 0, + "EdgeCount": 7, + "IsSelf": false, + "ScatterOffset": 0 + }, + { + "Name": "1587370993006116864", + "VertexCount": 6, + "VertIdStart": 4, + "EdgeCount": 11, + "IsSelf": false, + "ScatterOffset": 0 + } + ] + } + ] +} +``` + +### 3.2 获取指定graph信息 +获取数据库中的指定图数据。 +- **uri**:`GET http://master_ip:port/graphs/{$db_name}` +- **params**: +|参数名|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|db_name|是|string|-|-|要查询的图库名| +- **response**: +```json +{ + "errcode": 0, + "graph": { + "name": "testdb", + "status": "loaded", + "create_time": "2022-11-01T17:08:43.100831+08:00", + "update_time": "2022-11-01T17:08:43.137682+08:00", + "vertex_count": 10, + "edge_count": 18, + "workers": [ + { + "Name": "1587370926564790272", + "VertexCount": 4, + "IsSelf": false, + "ScatterOffset": 0 + }, + { + "Name": "1587370993006116864", + "VertexCount": 6, + "VertIdStart": 4, + "EdgeCount": 11, + "IsSelf": false, + "ScatterOffset": 0 + } + ], + "use_out_edges": true, + "use_out_degree": true + } +} +``` + +### 3.3 删除graph +删除图并释放空间,仅在图状态 "status" 为"loaded"或"error"时可用。 +- **uri**:`DELETE http://master_ip:port/graphs/{$db_name}` +- **params**: +|参数名|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|db_name|是|string|-|-|要删除的图库名| +- **response**: +```json +{ + "errcode": 0, + "deleted": "ok" +} +``` + +### 3.4 查询顶点的边信息 +不推荐业务依赖此接口查询,仅适合于debug。接口有效条件:1. 图的导入任务必须完成,图状态为loaded;2. 
接口有效的生命周期仅为本图进行计算任务时,非本图进行计算时,会对非计算任务的图进行落盘以释放内存空间,此后不可查询图的具体数据,直到下一次发起该图的计算任务时,该图会从磁盘中调起。 +- **uri**:`GET http://master_ip:port/graphs/{$db_name}/edges?vertex_id={$vertex_id}&direction={$direction}` +- **params**: +|参数名|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|db_name|是|string|-|-|要查询的图库名| +|vertex_id|是|string|-|-|图中的顶点| +|direction|是|string|-|in/out/both|入边、出边或全部| +- **response**: +```json +{ + "errcode": 0, + "in_edges": [ + "13426", + "13441", + "13450", + "700623" + ], + "out_edges": [ + "13441", + "13450", + "700623" + ] +} +``` + +### 3.5 查询worker信息 +- **简介**:查询worker的id、name、ip等信息。 +- **uri**:`GET http://master_ip:port/workers` +- **params**:无 +- **response**: +```json +{ + "workers": [ + { + "id": 1, + "name": "1590231886607126528", + "grpc_peer": "10.157.12.67:8397", + "ip_addr": "10.157.12.67", + "launch_time": "2022-12-26T11:27:33.59824+08:00" + }, + { + "id": 2, + "name": "1590232206384066560", + "grpc_peer": "10.157.12.68:8395", + "ip_addr": "10.157.12.68", + "state": "READY", + "version": "0.0.1", + "launch_time": "2022-12-26T11:27:33.59824+08:00" + } + ] +} +``` +- **示例**: +```json +GET http://10.81.116.77:8688/workers +``` + +### 3.6 查询master信息 +- **简介**:查询master的相关信息。 +- **uri**:`GET http://master_ip:port/master` +- **params**:无 +- **response**: +```json +{ + "master": { + "grpc_peer": "0.0.0.0:6689", + "ip_addr": "0.0.0.0:6688", + "debug_mod": "release", + "launch_time": "2022-12-26T11:27:30.031169+08:00" + } +} +``` +- **示例**: +```json +GET http://0.0.0.0:6688/master +``` + +### 3.7 获取所有任务列表 +- **简介**:查询已创建的所有任务。 +- **uri**:`GET http://master_ip:port/tasks?type={$type}` +- **params**: +|参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|type|是|string|all|all/todo|任务类型,all:所有任务,todo:待执行任务| +- **response**: +```json +{ + "errcode": 0, + "tasks": [ + { + "id": 1, + "status": "loaded", + "create_time": "2022-11-10T11:50:57.101218+08:00", + "update_time": "2022-11-10T11:51:14.445569+08:00", + "graph_name": "testGraph", + "task_type": 
"load", + "params": { + "load.delimiter": " ", + "load.type": "local", + "load.use_out_degree": "1", + "load.use_outedge": "1", + "load.use_property": "0", + "load.use_undirected": "0", + "load.vertex_files": "{\"127.0.0.1\":\"test_case/vertex/vertex_[0,29]\"}", + "load.edge_files": "{\"127.0.0.1\":\"test_case/edge/edge_[0,29]\"}" + }, + "workers": [ + { + "name": "1590552492724137984", + "status": "loaded" + }, + { + "name": "1590552530107645952", + "status": "loaded" + } + ] + }, + { + "id": 2, + "status": "complete", + "create_time": "2022-11-10T11:51:20.098327+08:00", + "update_time": "2022-11-10T11:51:22.95215+08:00", + "graph_name": "testGraph", + "task_type": "compute", + "params": { + "compute.algorithm": "pagerank", + "compute.max_step": "10", + "compute.parallel": "30", + "output.delimiter": ",", + "output.file_path": "./data/pagerank", + "output.parallel": "1", + "output.type": "local", + "pagerank.damping": "0.85" + }, + "workers": [ + { + "name": "1590552492724137984", + "status": "complete" + }, + { + "name": "1590552530107645952", + "status": "complete" + } + ] + } + ] +} +``` +- **示例**: +```json +GET http://0.0.0.0:6688/tasks +``` + +### 3.8 获取单个任务信息 +- **简介**:查询指定task_id的任务信息。 +- **uri**:`GET http://master_ip:port/task/{$task_id}` +- **params**: +|参数名|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|task_id|是|int|-|-|要查询的任务id| +- **response**: +```json +{ + "errcode": 0, + "task": { + "id": 1, + "status": "loaded", + "create_time": "2022-11-10T11:50:57.101218+08:00", + "update_time": "2022-11-10T11:51:14.445569+08:00", + "graph_name": "testGraph", + "task_type": "load", + "params": { + "load.delimiter": " ", + "load.type": "local", + "load.use_out_degree": "1", + "load.use_outedge": "1", + "load.use_property": "0", + "load.use_undirected": "0", + "load.vertex_files": "{\"127.0.0.1\":\"test_case/vertex/vertex_[0,29]\"}", + "load.edge_files": "{\"127.0.0.1\":\"test_case/edge/edge_[0,29]\"}" + }, + "workers": [ + { + "name": "1590552492724137984", + 
"status": "loaded" + }, + { + "name": "1590552530107645952", + "status": "loaded" + } + ] + } +} +``` +- **示例**: +```json +GET http://0.0.0.0:6688/task/1 +``` + +### 3.9 计算任务迭代查询结果 +- **简介**:在计算任务中,可在计算任务结束后,通过迭代器批量查询计算结果。计算结果查询功能在十分钟内若未接收到新的查询请求,将会自动释放内存。 +- **uri**:`POST http://master_ip:port/tasks/value/{$task_id}?cursor={$cursor}&limit={$limit}` +- **params**: +|参数|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|task_id|是|int|-|-|任务id| +|cursor|否|int|0|0 - 顶点总数|下标| +|limit|否|int|-|大于0|单次返回的最大结果数| +- **response**:见示例 +- **示例**: +创建计算任务并完成,在创建时需要传入参数output.need_query为1。 +```json +POST http://10.81.116.77:8688/tasks/create/sync +{ + "task_type": "compute", + "graph": "testdb", + "params": { + "compute.algorithm": "pagerank", + "compute.parallel":"10", + "output.type":"local", + "output.need_query":"1", + "output.parallel":"1", + "output.file_path":"result/pagerank", + "compute.max_step":"10" + } +} +``` +response示例: +```json +{ + "errcode": 0, + "task": { + "id": 3, + "status": "complete", + "create_type": "sync", + "create_time": "2022-12-05T16:42:07.669406+08:00", + "update_time": "2022-12-05T16:42:10.441145+08:00", + "graph_name": "testGraph", + "task_type": "compute", + "params": { + "compute.algorithm": "pagerank", + "compute.max_step": "10", + "compute.parallel": "100", + "output.delimiter": ",", + "output.parallel": "1", + "output.type": "local" + }, + "workers": [ + { + "name": "1599650248728154112", + "status": "complete" + }, + { + "name": "1599650253686800384", + "status": "complete" + }, + { + "name": "1599650242993369088", + "status": "complete" + } + ] + } +} +``` +cursor初始值为0,每一步迭代查询都会返回新的cursor值,迭代获取过程中传入上一步返回的新的cursor,即可迭代获取计算结果。计算任务结束后,将任务id传入下述uri,即可迭代查询计算结果。 +```json +POST http://master_ip:port/tasks/value/{$task_id}?cursor={$cursor}&limit={$limit} +``` +迭代获取计算结果response示例: +```json +{ + "errcode": 0, + "vertices": [ + { + "ID": "646354", + "Value": "6.712826E-07" + }, + { + "ID": "646357", + "Value": "4.7299116E-07" + }, + { + "ID": "646362", + "Value": 
"9.832202E-07" + }, + { + "ID": "646365", + "Value": "1.2438179E-06" + }, + { + "ID": "646368", + "Value": "4.4179902E-07" + } + ], + "cursor": 10 +} +``` + +### 3.10 中断任务接口 +- **简介**:根据任务id,中断正在进行的指定任务。可能中断失败的情况:任务已完成、任务不存在、任务状态为错误。 +- **uri**:`GET http://ip:port/task/cancel/{$task_id}` +- **params**: + +|参数名|是否必填|类型|默认值|取值范围|备注| +|---|---|---|---|---|---| +|task_id|是|int|-|-|要中断的任务id| + +- **response**:见示例 +- **示例**: +```json +GET http://10.81.116.77:8688/task/cancel/10 +//中断成功 +//200 OK +response: +{ + "errcode": 0, + "message": "cancel task: ok" +} +``` +```json +GET http://10.81.116.77:8688/task/cancel/10 +//中断失败 +//400 BadRequest +response: +{ + "errcode": -1, + "message": "task already complete" +} +``` + +### 3.11 healthcheck +- **简介**:查询master或worker程序是否正常。 +- **uri**:`GET http://ip:port/healthcheck` +- **params**:无 +- **response**: +```json +{ + "code": 200 +} +``` +- **示例**: +```json +GET http://10.81.116.77:8688/healthcheck +``` + +### 3.12 metrics +master或worker均可使用。除go自带指标外,增加的自定义指标有:http请求相关信息,图相关信息,任务相关信息等。详见下文示例。 +- **uri**:`GET http://ip:port/metrics` +- **params**:无 +- **response**: +```json +# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles. +# TYPE go_gc_duration_seconds summary +go_gc_duration_seconds{quantile="0"} 3.7791e-05 +go_gc_duration_seconds{quantile="0.25"} 5.2001e-05 +go_gc_duration_seconds{quantile="0.5"} 0.000126126 +go_gc_duration_seconds{quantile="0.75"} 0.000167708 +go_gc_duration_seconds{quantile="1"} 0.013796375 +go_gc_duration_seconds_sum 0.014740127 +go_gc_duration_seconds_count 9 +# HELP go_goroutines Number of goroutines that currently exist. +# TYPE go_goroutines gauge +go_goroutines 39 +# HELP go_info Information about the Go environment. +# TYPE go_info gauge +go_info{version="go1.18.8"} 1 +# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use. 
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.429872e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.696e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.461579e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 271491
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.4024e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 6.429872e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.1165696e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.658944e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 29651
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 5.414912e+06
```

### 3.13 Query Vertex Properties
Relying on this interface in business logic is not recommended; it is only suitable for debugging.
- **Preconditions**:
  - The graph's load task must have completed, and the graph status must be loaded.
  - The interface is only available while a compute task is running on this graph. While other graphs are being computed, graphs not involved in the computation are flushed to disk to free memory, and their data cannot be queried until the next compute task on this graph loads it back from disk.
- **Description**: Batch-query the property information of vertices.
- **uri**: `POST http://master_ip:port/graphs/{$graph_name}/vertices`
- **params**:

|Parameter|Required|Type|Default|Range|Notes|
|---|---|---|---|---|---|
|graph_name|Yes|string|-|-|graph name|
|vertices|Yes|array|-|-|set of vertex ids|

- **Example**:
```json
POST http://10.81.116.77:8688/graphs/{$graph_name}/vertices
request:
{
  "vertices": [
    "100",
    "101",
    "102",
    "103"
  ]
}
response:
{
  "errcode": 0,
  "vertices": [
    {
      "id": "100",
      "property": {
        "label": "user"
      }
    },
    {
      "id": "101",
      "property": {
        "label": "user"
      }
    },
    {
      "id": "102",
      "property": {
        "label": "user"
      }
    },
    {
      "id": "103",
      "property": {
        "label": "user"
      }
    }
  ]
}
```

## 4. Appendix: Parameter List for Task Creation

|Parameter|Type|Default|Range|Notes|
|---|---|---|---|---|
|load.parallel|int|1|> 0|number of data-loading threads|
|load.type|string|local|local/hdfs/afs|loading method|
|load.vertex_files|string|-|-|vertex file names; multiple files use the [n,m] format, where n is the starting number, e.g. v_[0,99] stands for v_00 through v_99; with local loading, only local files can be read, e.g. {"10.81.116.77":"v_[0,49]", "10.81.116.78":"v_[50,99]"}|
|load.edge_files|string|-|-|edge file names; multiple files use the [n,m] format, where n is the starting number, e.g. e_[0,99] stands for e_00 through e_99; with local loading, only local files can be read, e.g. {"10.81.116.77":"e_[0,49]", "10.81.116.78":"e_[50,99]"}|
|load.use_property|int|0|0/1|whether to use properties|
|load.vertex_property|json|-|-|vertex property list in json, keys are property names, e.g. {"label": "2,0", "name":...}; when reading from hugegraph, pass only a string listing the properties to read; if omitted, all properties are read, e.g. "1,2,10001"|
|load.edge_property|json|-|-|edge property list in json, keys are property names, e.g. {"label": "2,0", "name":...}; when reading from hugegraph, pass only a string listing the properties to read; if omitted, all properties are read, e.g. "1,2,10001"|
|load.use_outedge|int|0|0/1|whether to build vertex out-edge data|
|load.use_out_degree|int|0|0/1|whether to build vertex out-degree data|
|load.use_undirected|int|0|0/1|whether to build undirected-graph data|
|load.delimiter|string|space|-|column delimiter of input files|
|load.hdfs_conf_path|string|-|-|hdfs config file path|
|load.hdfs_namenode|string|-|-|hdfs namenode|
|load.hdfs_use_krb|int|0|0/1|whether to use Kerberos authentication|
|load.krb_name|string|-|-|Kerberos username|
|load.krb_realm|string|-|-|Kerberos realm|
|load.krb_conf_path|string|-|-|Kerberos config file path|
|load.krb_keytab_path|string|-|-|Kerberos keytab path|
|load.hg_pd_peers|string|-|-|peers of the hugegraph pd to load from, e.g. ["10.14.139.69:8686"]|
|load.hugegraph_name|string|-|-|name of the hugegraph graph to load|
|load.hugegraph_username|string|-|-|hugegraph username for loading|
|load.hugegraph_password|string|-|-|hugegraph password for loading|
|load.afs_uri|string|-|-|uri of the afs cluster to load from|
|load.afs_username|string|-|-|afs username for loading|
|load.afs_password|string|-|-|afs password for loading|
|load.afs_config_path|string|-|-|afs config file path for loading|
|output.parallel|int|1|> 0|number of data output threads|
|output.delimiter|string|space|-|column delimiter of result files|
|output.file_path|string|-|-|result file path|
|output.type|string|local|local/hdfs/afs|output method|
|output.need_query|int|0|0/1|whether results can be queried via the master http interface|
|output.hdfs_conf_path|string|-|-|hdfs config file path|
|output.hdfs_namenode|string|-|-|hdfs namenode|
|output.hdfs_use_krb|int|0|0/1|whether to use Kerberos authentication|
|output.krb_name|string|-|-|Kerberos username|
|output.krb_realm|string|-|-|Kerberos realm|
|output.krb_conf_path|string|-|-|Kerberos config file path|
|output.krb_keytab_path|string|-|-|Kerberos keytab path|
|output.afs_uri|string|-|-|uri of the afs cluster to export to|
|output.afs_username|string|-|-|afs username for exporting|
|output.afs_password|string|-|-|afs password for exporting|
|output.afs_config_path|string|-|-|afs config file path for exporting|
|output.hg_pd_peers|string|-|-|peers of the hugegraph pd to export to|
|output.hugegraph_name|string|-|-|name of the hugegraph graph to export to|
|output.hugegraph_username|string|-|-|hugegraph username for exporting|
|output.hugegraph_password|string|-|-|hugegraph password for exporting|
|output.hugegraph_property|string|-|-|property name for results exported to hugegraph|
|compute.max_step|int|10|> 0|maximum number of supersteps|
|compute.parallel|int|1|> 0|number of compute threads per worker|
|pagerank.damping|float|0.85|0 - 1|damping factor, the fraction propagated to neighboring vertices|
|pagerank.diff_threshold|float|0.00001|0 - 1|convergence threshold; the algorithm stops when the sum of absolute value changes of all vertices in an iteration falls below this value|
|kout.source|string|-|-|source vertex ID|
|degree.direction|string|out|in/out/both|edge direction|
|closeness_centrality.sample_rate|float|1.0|0 - 1|edge sampling rate; this algorithm has exponentially growing cost, so set a sampling rate appropriate to the business needs to obtain an approximate result|
|closeness_centrality.wf_improved|int|1|0/1|whether to use the Wasserman and Faust centrality formula|
|betweenness_centrality.sample_rate|float|1.0|0 - 1|edge sampling rate; this algorithm has exponentially growing cost, so set a sampling rate appropriate to the business needs to obtain an approximate result|
|betweenness_centrality.use_endpoint|int|0|0/1|whether to include the endpoints|
|sssp.source|string|-|-|source vertex ID|
|kcore.degree_k|int|3|> 0|minimum degree threshold|
\ No newline at end of file
diff --git a/vermeer/docs/architecture.md b/vermeer/docs/architecture.md
new file mode 100644
index 00000000..b68cc555
--- /dev/null
+++ b/vermeer/docs/architecture.md
@@ -0,0 +1,83 @@
# Vermeer Architectural Design

## Running architecture

Vermeer has two roles: master and worker. There is only one master, but there can be multiple workers. The master is responsible for communication, forwarding, and summarizing; it performs little computation and uses few resources. The workers are the computation nodes that store graph data and run computation tasks, consuming large amounts of memory and CPU. gRPC is used for internal communication, while REST is used for external calls.
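To make the external REST path concrete, here is a minimal, hedged sketch of how a client request to the master could be built. The host, port, and token are placeholder values (not real deployment settings); only the `/healthcheck` endpoint and the Authorization header documented in the API doc are assumed.

```python
import urllib.request

# Sketch of an external REST call to the master, matching the architecture
# described above: clients talk to the master over HTTP/REST, while the
# master and workers communicate internally over gRPC.
# "127.0.0.1:6688" and "<token>" are placeholders.
req = urllib.request.Request(
    "http://127.0.0.1:6688/healthcheck",
    headers={"Authorization": "<token>"},  # every Vermeer request needs a token
    method="GET",
)
print(req.full_url)  # http://127.0.0.1:6688/healthcheck
```

Internally, the master would fan such work out to workers over gRPC rather than HTTP.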
```mermaid
graph TD
    subgraph Master
        A[HTTP handler] --> B[Business logic]
        B --> C1[Task Manager]
        B --> C2[Graph Manager]
        B --> C3[Worker Manager]
        C1 --> D[gRPC handler]
        C2 --> D
        C3 --> D
    end
    subgraph Worker1
        E[gRPC handler] --> F[Business logic]
        F --> G1[Task Manager]
        F --> G2[Graph Manager]
        F --> G3[Peer Manager]
    end
    subgraph Worker2
        H[gRPC handler] --> I[Business logic]
        I --> J1[Task Manager]
        I --> J2[Graph Manager]
        I --> J3[Peer Manager]
    end
    request --> Master
    D --> E
    D --> H
```

## Load process

```mermaid
sequenceDiagram
    participant User
    participant master
    participant worker
    participant DataSource

    User->>master: User initiates import request
    master->>worker: Notify worker to start importing vertices
    worker->>master: Get LoadPartition (vertex)
    worker->>DataSource: Get data
    worker->>worker: Send to corresponding worker by vertex hash
    worker->>worker: scatter vertex
    worker->>worker: gather vertex
    master->>worker: Notify worker to start importing edges
    worker->>master: Get LoadPartition (edge)
    worker->>DataSource: Get data
    worker->>worker: Send to corresponding worker by vertex hash
```

## Computing process

```mermaid
sequenceDiagram
    participant User
    participant master
    participant worker1 as worker
    participant worker2 as worker

    User->>master: User initiates compute request
    master->>worker1: Notify worker to start computing
    master->>worker2: Notify worker to start computing
    worker1->>worker1: Init
    worker1->>worker1: BeforeStep
    worker1->>worker1: Compute
    worker1->>master: Scatter vertex values to mirror vertices
    worker2->>worker2: Init
    worker2->>worker2: BeforeStep
    worker2->>worker2: Compute
    worker2->>master: Scatter vertex values to mirror vertices
    worker1->>master: aggregate
    worker2->>master: aggregate
    master->>master: Master Compute determines whether to continue
    master->>worker1: next step or output
    master->>worker2: next
step or output +``` +
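The superstep loop in the diagram above can be sketched as a minimal single-process simulation. All names here are illustrative assumptions, not Vermeer's API: the real implementation is distributed Go code, and the placeholder "compute" step stands in for an actual algorithm such as PageRank.

```python
# Minimal single-process sketch of the BSP superstep loop shown above.
class Worker:
    def __init__(self, values):
        self.values = values  # vertex values held by this worker

    def compute(self, damping=0.85):
        # Placeholder per-step computation: decay every vertex value.
        # (Init/BeforeStep and scattering to mirror vertices are omitted.)
        self.values = {v: val * damping for v, val in self.values.items()}

    def aggregate(self):
        # Each worker reports an aggregate over its vertices to the master.
        return sum(self.values.values())


def run_compute(workers, max_step):
    """Master loop: run supersteps, then decide to continue or output."""
    step, total = 0, 0.0
    for step in range(1, max_step + 1):
        for w in workers:  # "Notify worker to start computing"
            w.compute()
        total = sum(w.aggregate() for w in workers)  # master-side aggregate
        # A real Master Compute could also stop early here on convergence.
    return step, total


workers = [Worker({"a": 1.0, "b": 1.0}), Worker({"c": 1.0})]
steps, total = run_compute(workers, max_step=3)
print(steps)  # 3
```

In the distributed system, the per-step value exchange happens worker-to-worker via mirror vertices, with the master only coordinating steps and aggregations.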
