kartikeyamandhar opened a new pull request, #1532: URL: https://github.com/apache/hamilton/pull/1532
Full pipeline expressed as Hamilton DAGs:

- Ingestion: TMDB JSON → Neo4j via batched Cypher MERGE (4,803 movies, 111k+ person edges, 20 genres, 5,047 companies)
- Embedding: OpenAI text-embedding-3-small on Movie nodes with a Neo4j cosine vector index
- Retrieval: semantic entity resolution + 4-strategy routing (VECTOR / CYPHER / AGGREGATE / HYBRID) with direct Neo4j Cypher
- Generation: gpt-4o with graph-grounded context
- Passes 40 test queries across all retrieval categories
- DAG visualisations in docs/images/

## Changes

Adds a standalone Neo4j GraphRAG example to `examples/LLM_Workflows/neo4j_graph_rag/`. The full pipeline is expressed as Hamilton DAGs across four modules:

- `ingest_module.py`: TMDB JSON → Neo4j via batched Cypher MERGE (4,803 movies, 111k+ person edges, genres, production companies)
- `embed_module.py`: OpenAI text-embedding-3-small embeddings stored on Movie nodes with a Neo4j cosine vector index
- `retrieval_module.py`: semantic entity resolution + 4-strategy routing (VECTOR / CYPHER / AGGREGATE / HYBRID) with direct Neo4j Cypher
- `generation_module.py`: gpt-4o with graph-grounded context
- `run.py`: entry point for all three pipelines, with `--visualise` support
- `docker-compose.yml`: Neo4j 5 + APOC
- DAG visualisations in `docs/images/`

## How I tested this

Manually tested 40 queries covering direct lookup, filmography, co-occurrence, aggregation, multi-hop traversal, semantic similarity, and hybrid filtering. All 40 pass.

## Notes

The dataset is TMDB 5000 Movies from Kaggle. Download instructions are in `data/README.md`.

The existing GraphRAG example in this repo uses FalkorDB; this is a standalone Neo4j variant that uses direct Cypher retrieval rather than LlamaIndex.
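For reviewers unfamiliar with the batched MERGE pattern mentioned under Changes, here is a minimal sketch, assuming the ingestion step turns parsed TMDB rows into a list of dicts and sends one parameterized `UNWIND ... MERGE` statement per batch instead of one statement per movie. The query text, the property names (`id`, `title`, `release_date`), the `batched` and `movie_merge_batches` helpers, and the default batch size of 500 are all hypothetical and are not taken from `ingest_module.py`.

```python
from typing import Any, Iterator

# Hypothetical Cypher: UNWIND expands the list parameter $rows into one row per
# movie, and MERGE matches an existing Movie by id or creates it if missing.
MOVIE_MERGE_QUERY = """
UNWIND $rows AS row
MERGE (m:Movie {id: row.id})
SET m.title = row.title, m.release_date = row.release_date
"""


def batched(rows: list[dict[str, Any]], batch_size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


def movie_merge_batches(
    movies: list[dict[str, Any]], batch_size: int = 500
) -> list[tuple[str, dict[str, Any]]]:
    """Pair the MERGE query with one parameter dict per batch of movies (hypothetical helper)."""
    # Keep only the fields the query reads, so each parameter payload stays small.
    rows = [
        {"id": m["id"], "title": m["title"], "release_date": m.get("release_date")}
        for m in movies
    ]
    return [(MOVIE_MERGE_QUERY, {"rows": batch}) for batch in batched(rows, batch_size)]


if __name__ == "__main__":
    sample = [{"id": i, "title": f"Movie {i}"} for i in range(1200)]
    for query, params in movie_merge_batches(sample, batch_size=500):
        print(len(params["rows"]))  # one line per batch: 500, 500, 200
```

With the official `neo4j` Python driver, each pair could be executed inside a session as `session.run(query, params)`. Sending one statement per batch keeps the number of round trips proportional to the number of batches rather than to the 4,803 movies, and pairing MERGE with a uniqueness constraint on `Movie.id` would keep re-runs from creating duplicate nodes.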
## Checklist

- [x] PR has an informative and human-readable title
- [x] Changes are limited to a single goal (no scope creep)
- [x] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered
- [x] New functions are documented (with a description, list of inputs, and expected output)
- [x] Placeholder code is flagged / future TODOs are captured in comments
- [x] Project documentation has been updated if adding/changing functionality
