This is an automated email from the ASF dual-hosted git repository.

jin pushed a commit to branch rag
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-ai.git

commit 78e1d18496439b175200f44dffa891af99f0db05
Author: imbajin <[email protected]>
AuthorDate: Mon Jul 1 18:44:52 2024 +0800

    refact: improve the graph rag experience
---
 README.md                                         |  2 +-
 hugegraph-llm/README.md                           | 23 ++++++++++++++++-------
 hugegraph-llm/requirements.txt                    |  2 +-
 hugegraph-llm/src/hugegraph_llm/config/config.ini |  8 +++-----
 4 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 9328561..705934d 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ in their projects.
 - [hugegraph-llm](./hugegraph-llm):The `hugegraph-llm` will house the implementation and research related to large
 language models. It will include runnable demos and can also be used as a third-party library, reducing the cost of
 using graph systems and the complexity of building knowledge graphs. Graph systems can help large models address challenges like timeliness
-and hallucination, while large models can assist graph systems with cost-related issues. Therefore, this module will
+and hallucination, while large models can help graph systems with cost-related issues. Therefore, this module will
 explore more applications and integration solutions for graph systems and large language models.
 - [hugegraph-ml](./hugegraph-ml): The `hugegraph-ml` will focus on integrating HugeGraph with graph machine learning,
 graph neural networks, and graph embeddings libraries. It will build an efficient and versatile intermediate layer
diff --git a/hugegraph-llm/README.md b/hugegraph-llm/README.md
index 3df3e97..183486e 100644
--- a/hugegraph-llm/README.md
+++ b/hugegraph-llm/README.md
@@ -6,7 +6,7 @@ The `hugegraph-llm` is a tool for the implementation and research related to lar
 This project includes runnable demos, it can also be used as a third-party library.
 
 As we know, graph systems can help large models address challenges like timeliness and hallucination,
-while large models can assist graph systems with cost-related issues. 
+while large models can help graph systems with cost-related issues. 
 
 With this project, we aim to reduce the cost of using graph systems, and decrease the complexity of building
 knowledge graphs. This project will offer more applications and integration solutions for
@@ -25,13 +25,21 @@ graph systems and large language models.
 - Start the HugeGraph database, you can do it via Docker. Refer to [docker-link](https://hub.docker.com/r/hugegraph/hugegraph) & [deploy-doc](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev) for guidance
 - Start the gradio interactive demo, you can start with the following command, and open http://127.0.0.1:8001 after starting
   ```bash
-  # ${PROJECT_ROOT_DIR} is the root directory of hugegraph-ai, which needs to be configured by yourself
+  # 0. clone the hugegraph-ai project & enter the root dir
+  # 1. configure the environment path
+  PROJECT_ROOT_DIR = "/path/to/hugegraph-ai" # root directory of hugegraph-ai
   export PYTHONPATH=${PROJECT_ROOT_DIR}/hugegraph-llm/src:${PROJECT_ROOT_DIR}/hugegraph-python-client/src
-  python3 ./hugegraph-llm/src/hugegraph_llm/utils/gradio_demo.py
+
+  # 2. install the required packages/deps (better to use virtualenv(venv) to manage the environment)
+  cd hugegraph-llm
+  pip install -r requirements.txt # ensure the python/pip version is satisfied
+  # 2.1 set basic configs in the hugegraph-llm/config/config.ini (Optional, you can also set it in gradio)
+
+  # 3. start the gradio server, wait for some time to initialize
+  python3 ./src/hugegraph_llm/utils/gradio_demo.py
   ```
-- Configure HugeGraph database connection information and LLM information, which can be configured in two ways:
-  1. Configure the `./hugegraph-llm/src/config/config.ini` file
-  2. In gradio, after completing the configurations for LLM and HugeGraph, click on `Initialize configs`, the complete and initialized configuration file will be outputted.
+- Configure HugeGraph database connection information & LLM information in the gradio interface,
+  click on `Initialize configs`, the complete and initialized configuration file will be overwritten.
 - offline download NLTK stopwords
   ```bash
   python3 ./hugegraph_llm/operators/common_op/nltk_helper.py
@@ -105,7 +113,8 @@ The methods of the `KgBuilder` class can be chained together to perform a sequen
 
 Run example like `python3 ./hugegraph-llm/examples/graph_rag_test.py`
 
-The `GraphRAG` class is used to integrate HugeGraph with large language models to provide retrieval-augmented generation capabilities. Here is a brief usage guide:
+The `GraphRAG` class is used to integrate HugeGraph with large language models to provide retrieval-augmented generation capabilities.
+Here is a brief usage guide:
 
 1. **Extract Keyword:**: Extract keywords and expand synonyms.
 
diff --git a/hugegraph-llm/requirements.txt b/hugegraph-llm/requirements.txt
index 47a2d23..9c48a65 100644
--- a/hugegraph-llm/requirements.txt
+++ b/hugegraph-llm/requirements.txt
@@ -1,5 +1,5 @@
 openai==0.28.1
 retry==0.9.2
-tiktoken==0.5.1
+tiktoken==0.7.0
 nltk==3.8.1
 gradio==4.19.2
diff --git a/hugegraph-llm/src/hugegraph_llm/config/config.ini b/hugegraph-llm/src/hugegraph_llm/config/config.ini
index d3ca7d3..5043fdd 100644
--- a/hugegraph-llm/src/hugegraph_llm/config/config.ini
+++ b/hugegraph-llm/src/hugegraph_llm/config/config.ini
@@ -28,26 +28,24 @@ graph = hugegraph
 # type = local_api
 # llm_url = http://localhost:7999/v1/chat/completions
 #
-## openai
+## OpenAI
 # type = openai
 # api_key = xxx
 # api_base = xxx
 # model_name = gpt-3.5-turbo-16k
 # max_token = 4000
 #
-## ernie
+## WenXin (ernie)
 # type = ernie
 # api_key = xxx
 # secret_key = xxx
-# llm_url = xxx
+# llm_url = https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/completions_pro?access_token=
 # model_name = ernie
 #
-# type = openai
 type = local_api
 api_key = xxx
 api_base = https://api.openai.com/v1
 secret_key = xxx
-# llm_url = https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/completions_pro?access_token=
 llm_url = http://localhost:7999/v1/chat/completions
 model_name = gpt-3.5-turbo-16k
 max_token = 4000
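Editor's note: the setup hunk above adds `PROJECT_ROOT_DIR = "/path/to/hugegraph-ai"`, but POSIX shells do not allow spaces around `=` in an assignment (bash would instead try to run a command named `PROJECT_ROOT_DIR`). A minimal corrected sketch of the path-configuration step, keeping `/path/to/hugegraph-ai` as a placeholder you must replace with your own clone's root:

```shell
# corrected form: no spaces around '=' in a shell assignment
# (the path is a placeholder, not a real directory)
PROJECT_ROOT_DIR="/path/to/hugegraph-ai"

# expose both sub-project source trees to Python, exactly as the hunk above does
export PYTHONPATH=${PROJECT_ROOT_DIR}/hugegraph-llm/src:${PROJECT_ROOT_DIR}/hugegraph-python-client/src

echo "$PYTHONPATH"
```

The remaining steps from the hunk (`cd hugegraph-llm`, `pip install -r requirements.txt`, `python3 ./src/hugegraph_llm/utils/gradio_demo.py`) are unaffected by this spacing fix.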

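After this commit, the uncommented defaults in `config.ini` select the `local_api` backend rather than `openai`. A quick sanity-check sketch: the scratch file below reproduces only the active (uncommented) keys from the diff, and `/tmp/llm_defaults.ini` is an illustrative path, not part of the repo:

```shell
# write the uncommented LLM keys from the new config.ini to a scratch file
cat > /tmp/llm_defaults.ini <<'EOF'
type = local_api
api_key = xxx
api_base = https://api.openai.com/v1
secret_key = xxx
llm_url = http://localhost:7999/v1/chat/completions
model_name = gpt-3.5-turbo-16k
max_token = 4000
EOF

# the first 'type' line tells you which backend the demo will use
grep -m1 '^type' /tmp/llm_defaults.ini
```

This prints `type = local_api`; switching backends means editing `type` (and the matching `api_key`/`llm_url` keys) or overriding them from the gradio interface.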