[PR] Refactor: Optimize Vector Index Performance with Parallel Embedding Generation [incubator-hugegraph-ai]

via GitHub Sat, 31 May 2025 04:33:41 -0700


weijinglin opened a new pull request, #264:
URL: https://github.com/apache/incubator-hugegraph-ai/pull/264


   ## Changes
   This PR introduces performance optimizations for vector index building and 
querying by implementing parallel text embedding generation.
   
   ### Key Improvements
   1. Added a new utility module `embedding_utils.py` with parallel batch 
processing capabilities
      - Implements `get_embeddings_parallel` function for efficient batch 
processing
      - Uses asyncio with semaphore for controlled concurrency
      - Supports batch size of 1000 with max 10 concurrent tasks
   
   2. Refactored all index operation classes to use parallel processing:
      - `BuildGremlinExampleIndex`
      - `BuildSemanticIndex`
      - `BuildVectorIndex`
      - `GremlinExampleIndexQuery`
      - `SemanticIdQuery`
      - `VectorIndexQuery`
   
   3. Unified embedding generation approach:
      - Replaced individual `get_text_embedding` calls with batch 
`get_texts_embeddings`
      - Removed duplicate parallel processing code
      - Improved code reusability and maintainability
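
The batching approach described under Key Improvements could look roughly like the sketch below. It is not the PR's actual code: the function name `get_embeddings_parallel_sketch`, the `embed_batch` callable, and the `fake_embed_batch` stand-in are hypothetical names chosen for illustration. Only the asyncio semaphore, the batch size of 1000, and the limit of 10 concurrent tasks come from the description.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical type for an async batch-embedding call; the real
# `get_texts_embeddings` signature is not shown in the PR description.
EmbedBatch = Callable[[List[str]], Awaitable[List[List[float]]]]


async def get_embeddings_parallel_sketch(
    embed_batch: EmbedBatch,
    texts: List[str],
    batch_size: int = 1000,      # batch size stated in the PR description
    max_concurrency: int = 10,   # concurrency limit stated in the PR description
) -> List[List[float]]:
    """Embed `texts` in batches, running at most `max_concurrency` batches at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    async def run_one(batch: List[str]) -> List[List[float]]:
        # The semaphore caps how many batch requests are in flight.
        async with semaphore:
            return await embed_batch(batch)

    # gather() returns results in submission order, so the flattened
    # output lines up with the input texts.
    per_batch = await asyncio.gather(*(run_one(b) for b in batches))
    return [vector for batch_result in per_batch for vector in batch_result]


async def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    # Stand-in for a real embedding backend (assumption, for demonstration only):
    # each text maps to a one-dimensional vector holding its length.
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    sample = ["a", "bb", "ccc", "dddd", "eeeee"]
    vectors = asyncio.run(
        get_embeddings_parallel_sketch(fake_embed_batch, sample, batch_size=2)
    )
    print(vectors)
```

Passing the embedding call in as a parameter keeps the sketch independent of any particular embedding backend, and relying on the ordering guarantee of `asyncio.gather` avoids tracking batch indices by hand.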


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


