Re: [I] [Question] How to import edge data from a CSV file [incubator-hugegraph-toolchain]

via GitHub Sun, 24 Aug 2025 08:07:00 -0700


dosubot[bot] commented on issue #680:
URL: 
https://github.com/apache/incubator-hugegraph-toolchain/issues/680#issuecomment-3218174858


   hugegraph-loader can bulk-import edge data from a CSV file. You need three things: the edge.csv file, a schema.groovy that defines every property key and edge label, and a mapping file such as struct.json. The main steps are:
   
   1. **CSV file format**  
      Give edge.csv a header row, for example:  
      ```
      source_id,target_id,edge_type
      -201035082963479683,-1779678833482502384,属性
      4646408208538057683,-1861609733419239066,描述
      ...
      ```
   
   2. **schema.groovy snippet**  
      Make sure every field is defined, for example:
      ```groovy
      // Property keys must exist before any label references them.
      schema.propertyKey("id").asText().ifNotExist().create();
      schema.propertyKey("名称").asText().ifNotExist().create();
      schema.propertyKey("edge_type").asText().ifNotExist().create();
      // Vertices take their IDs from the data (customized string ID strategy).
      schema.vertexLabel("实体").properties("id", "名称").useCustomizeStringId().ifNotExist().create();
      // One edge label per value that appears in the edge_type column.
      schema.edgeLabel("属性").sourceLabel("实体").targetLabel("实体").properties("edge_type").ifNotExist().create();
      schema.edgeLabel("描述").sourceLabel("实体").targetLabel("实体").properties("edge_type").ifNotExist().create();
      schema.edgeLabel("标签").sourceLabel("实体").targetLabel("实体").properties("edge_type").ifNotExist().create();
      ```
   
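   3. **Example struct.json mapping**  
      Configure a separate edge mapping for each edge label. The loader applies an edge mapping to every row of its input, so in this example each label reads a file that holds only that label's rows; the per-label file names are placeholders, not files from the original question. Because the CSV already has a header row, `header` is left out of `input`: if it were also listed there, the loader would treat the first line of the file as data. The example uses the version 2.0 mapping layout:
      ```json
      {
        "version": "2.0",
        "structs": [
          {
            "id": "1",
            "input": {
              "type": "FILE",
              "path": "edge_属性.csv",
              "format": "CSV",
              "delimiter": ",",
              "charset": "UTF-8"
            },
            "vertices": [],
            "edges": [
              {
                "label": "属性",
                "source": ["source_id"],
                "target": ["target_id"],
                "field_mapping": {
                  "source_id": "id",
                  "target_id": "id",
                  "edge_type": "edge_type"
                }
              }
            ]
          },
          {
            "id": "2",
            "input": {
              "type": "FILE",
              "path": "edge_描述.csv",
              "format": "CSV",
              "delimiter": ",",
              "charset": "UTF-8"
            },
            "vertices": [],
            "edges": [
              {
                "label": "描述",
                "source": ["source_id"],
                "target": ["target_id"],
                "field_mapping": {
                  "source_id": "id",
                  "target_id": "id",
                  "edge_type": "edge_type"
                }
              }
            ]
          },
          {
            "id": "3",
            "input": {
              "type": "FILE",
              "path": "edge_标签.csv",
              "format": "CSV",
              "delimiter": ",",
              "charset": "UTF-8"
            },
            "vertices": [],
            "edges": [
              {
                "label": "标签",
                "source": ["source_id"],
                "target": ["target_id"],
                "field_mapping": {
                  "source_id": "id",
                  "target_id": "id",
                  "edge_type": "edge_type"
                }
              }
            ]
          }
        ]
      }
      ```
      The `label` of each edge mapping must match an edge label defined in schema.groovy.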
   
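   4. **Performance parameters**  
      For large imports, consider adding the following to the loader command line:
      ```
      --batch-insert-threads 8 --test-mode true
      ```
      Adjust the thread count to the machine's resources. Of the two flags, only `--batch-insert-threads` concerns throughput; `--test-mode` appears to be a debugging switch rather than a tuning option, so it can likely be left out of a production load.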
   
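   5. **Notes**  
      - The CSV header must match the column names used in the mapping file.
      - Every property key used in schema.groovy must be defined beforehand, otherwise the loader reports an error.
      - The mapping file and the schema must correspond exactly to the CSV structure and to the actual schema; mismatched fields cause the import to fail.
      - Import each edge label as a separate batch, which makes failures easier to locate and the load easier to tune.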
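
One way to prepare per-label batches is to split the mixed edge.csv by its `edge_type` column before running the loader. The Python sketch below is only an illustration and is not part of hugegraph-loader: the function name `split_edges_by_type` and the `edge_<type>.csv` output naming are assumptions made for this example, and it assumes a well-formed UTF-8 CSV with a header row.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def split_edges_by_type(src, out_dir, type_column="edge_type"):
    """Stream src and write one CSV per distinct value of type_column.

    Returns a dict mapping each edge type to the path of its output file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    writers = {}
    # ExitStack closes the input and every per-type output file on exit.
    with ExitStack() as stack:
        f = stack.enter_context(open(src, newline="", encoding="utf-8"))
        reader = csv.DictReader(f)
        header = reader.fieldnames
        if not header or type_column not in header:
            raise ValueError(f"{src} has no '{type_column}' column")
        for row in reader:
            edge_type = row[type_column]
            if edge_type not in writers:
                # First row of this type: open its file and repeat the header.
                path = out_dir / f"edge_{edge_type}.csv"
                out = stack.enter_context(
                    open(path, "w", newline="", encoding="utf-8")
                )
                writer = csv.DictWriter(out, fieldnames=header)
                writer.writeheader()
                writers[edge_type] = writer
                paths[edge_type] = path
            writers[edge_type].writerow(row)
    return paths
```

Each resulting file can then be given its own `input` entry in the mapping file and loaded on its own, so a failure points at a single edge label.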
   
   For a detailed case and complete mapping files, see [community issue 658](https://github.com/apache/incubator-hugegraph-toolchain/issues/658#issuecomment-2727330047) and the loader test cases (the src/test/resources directory).
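
Before starting a load, it can help to confirm that every column an edge mapping refers to really exists in the CSV header. The following Python sketch shows one possible pre-flight check; the helper name `missing_columns` is hypothetical, and it assumes the mapping uses the `source`, `target` and `field_mapping` keys and that the CSV's first line is its header.

```python
import csv


def missing_columns(csv_path, edge_mapping):
    """Return the columns an edge mapping references that the CSV header lacks."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # The first row is taken as the header; an empty file yields [].
        header = next(csv.reader(f), [])
    referenced = set(edge_mapping.get("source", []))
    referenced.update(edge_mapping.get("target", []))
    # The keys of field_mapping are CSV column names.
    referenced.update(edge_mapping.get("field_mapping", {}))
    return sorted(referenced - set(header))
```

An empty result means the header covers the mapping; any names returned are columns to fix in either the CSV or the mapping file before invoking the loader.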
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
