kartikeyamandhar opened a new pull request, #1532:
URL: https://github.com/apache/hamilton/pull/1532

   Full pipeline expressed as Hamilton DAGs:
   - Ingestion: TMDB JSON → Neo4j via batched Cypher MERGE (4,803 movies, 111k+ person edges, 20 genres, 5,047 companies)
   - Embedding: OpenAI text-embedding-3-small on Movie nodes with a Neo4j cosine vector index
   - Retrieval: semantic entity resolution + 4-strategy routing (VECTOR / CYPHER / AGGREGATE / HYBRID) with direct Neo4j Cypher
   - Generation: gpt-4o with graph-grounded context
   - Passes 40 test queries across all retrieval categories
   - DAG visualisations in docs/images/
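
For readers unfamiliar with "batched Cypher MERGE", the idea can be sketched roughly as follows. This is not the code from `ingest_module.py`: the `Movie {id, title}` schema, the batch size, and the `run_query` callable (a stand-in for something like a thin wrapper around a Neo4j session's `run`) are all assumptions made for illustration.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical schema: a Movie node keyed by `id` with a `title` property.
MERGE_MOVIES = """
UNWIND $rows AS row
MERGE (m:Movie {id: row.id})
SET m.title = row.title
"""


def batched(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def ingest_movies(
    run_query: Callable[[str, Dict[str, Any]], Any],
    rows: Iterable[Dict[str, Any]],
    batch_size: int = 500,  # assumed value, not taken from the PR
) -> int:
    """Send one UNWIND + MERGE statement per batch; return the batch count."""
    batches = 0
    for chunk in batched(rows, batch_size):
        run_query(MERGE_MOVIES, {"rows": chunk})
        batches += 1
    return batches
```

Because `MERGE` matches on the key before creating, re-running the same batches should not duplicate nodes, and sending a list per statement avoids one round trip per row.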
   
   ## Changes
   
   Adds a standalone Neo4j GraphRAG example to `examples/LLM_Workflows/neo4j_graph_rag/`.
   
   The full pipeline is expressed as Hamilton DAGs across four modules:
   
   - `ingest_module.py`: TMDB JSON → Neo4j via batched Cypher MERGE (4,803 movies, 111k+ person edges, genres, production companies)
   - `embed_module.py`: OpenAI text-embedding-3-small embeddings stored on Movie nodes with a Neo4j cosine vector index
   - `retrieval_module.py`: semantic entity resolution + 4-strategy routing (VECTOR / CYPHER / AGGREGATE / HYBRID) with direct Neo4j Cypher
   - `generation_module.py`: gpt-4o with graph-grounded context
   
   Supporting files:
   
   - `run.py`: entry point for all three pipelines with `--visualise` support
   - `docker-compose.yml`: Neo4j 5 + APOC
   - DAG visualisations in `docs/images/`
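
As a rough illustration of what "expressed as Hamilton DAGs" means, the sketch below shows the function-per-node style: each function is a node, and each parameter name refers to another node or to a runtime input. The function names and the `tmdb_rows` input are hypothetical and are not taken from the modules in this PR.

```python
from typing import Dict, List

# Hypothetical node functions written in the Hamilton style.
# `tmdb_rows` has no defining function, so it would be a runtime input.


def raw_movies(tmdb_rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows that have a non-empty title."""
    return [row for row in tmdb_rows if row.get("title")]


def movie_texts(raw_movies: List[Dict[str, str]]) -> List[str]:
    """Build the text that would be embedded for each movie."""
    return [f"{row['title']}: {row.get('overview', '')}".strip() for row in raw_movies]


def text_count(movie_texts: List[str]) -> int:
    """Number of texts that would be sent to the embedding step."""
    return len(movie_texts)


# Without Hamilton, the same chain can be wired by hand:
# text_count(movie_texts(raw_movies(rows)))
```

With Hamilton installed, a driver built from the module containing these functions should be able to resolve the same chain from the parameter names when asked for `text_count` with `tmdb_rows` supplied as an input.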
   
   ## How I tested this
   
   Manually tested 40 queries covering direct lookup, filmography, co-occurrence, aggregation, multi-hop traversal, semantic similarity, and hybrid filtering. All 40 pass.
   
   ## Notes
   
   The dataset is TMDB 5000 Movies from Kaggle. Download instructions are in `data/README.md`. The existing GraphRAG example in this repo uses FalkorDB; this is a standalone Neo4j variant that uses direct Cypher retrieval rather than LlamaIndex.
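
To make "direct Cypher retrieval" with the four strategies a little more concrete, here is a minimal sketch of dispatching an already-chosen strategy to a Cypher template. How the strategy is selected is not described above and is not covered here. The node labels, relationship types, and parameter names are assumptions; `db.index.vector.queryNodes` is the Neo4j 5 procedure for querying a vector index.

```python
import re
from enum import Enum
from typing import Any, Dict, Tuple


class Strategy(Enum):
    VECTOR = "VECTOR"
    CYPHER = "CYPHER"
    AGGREGATE = "AGGREGATE"
    HYBRID = "HYBRID"


# Hypothetical templates: labels, relationship types and parameters are assumed.
TEMPLATES: Dict[Strategy, str] = {
    Strategy.VECTOR: (
        "CALL db.index.vector.queryNodes($index, $k, $embedding) "
        "YIELD node, score RETURN node.title AS title, score"
    ),
    Strategy.CYPHER: (
        "MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title AS title"
    ),
    Strategy.AGGREGATE: (
        "MATCH (m:Movie)-[:IN_GENRE]->(g:Genre) "
        "RETURN g.name AS genre, count(m) AS movies ORDER BY movies DESC"
    ),
    Strategy.HYBRID: (
        "CALL db.index.vector.queryNodes($index, $k, $embedding) "
        "YIELD node, score "
        "MATCH (node)-[:IN_GENRE]->(:Genre {name: $genre}) "
        "RETURN node.title AS title, score"
    ),
}


def build_query(strategy: Strategy, params: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text plus only the parameters that the template uses."""
    cypher = TEMPLATES[strategy]
    needed = set(re.findall(r"\$(\w+)", cypher))
    missing = needed - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return cypher, {name: params[name] for name in needed}
```

Keeping the Cypher in plain templates with bound parameters means the text sent to Neo4j can be inspected directly, which is one practical difference from going through a framework abstraction.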
   
   ## Checklist
   - [x] PR has an informative and human-readable title
   - [x] Changes are limited to a single goal (no scope creep)
   - [x] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered
   - [x] New functions are documented (with a description, list of inputs, and expected output)
   - [x] Placeholder code is flagged / future TODOs are captured in comments
   - [x] Project documentation has been updated if adding/changing functionality

