mhamedbenjmaa commented on issue #7077: URL: https://github.com/apache/hop/issues/7077#issuecomment-4613199570
Hi Matt,

Thanks for the detailed feedback and for moving this to the roadmap. ❤️

This architecture is exactly the direction I had in mind: the in-memory compilation approach is clean, and the CDC pattern with `Merge Rows (diff)` is a smart way to avoid expensive delta SQL on the warehouse side.

## On the JSON spec

I see it as the **generated output of the GUI, not something the user writes by hand**. With 50 tables and 500 columns, nobody should be editing JSON manually. The Visual Modeling Perspective (your Phase 2) is where the developer works: they select the business key, tick the satellite columns, and choose the hash algorithm, and the tool generates the JSON silently in the background. The user never sees it unless they want to.

So the full flow from a user perspective would be:

```text
[GUI: checkbox-driven model definition] → [Auto-generated JSON] → [In-memory compile] → [Execution]
```

This also makes your zero-deployment principle even stronger: the JSON is just a save format for the model, not a deployment artifact.

## On schema evolution and maintenance

Vault schema changes (adding a column to a satellite, splitting a satellite, reloading history) are rare events that always require an architectural decision. I propose these remain intentionally **manual and developer-driven** in the initial version. No tool should make that call automatically. This keeps the scope clean and the tool reliable.

## On the Data Mart layer

One thing not yet covered in the discussion: once the Raw Vault exists, I envision a second phase where the same model-driven approach extends to **star schema generation**. The developer selects Hubs, Satellites, and Links from the model and instructs the tool to project them as Fact tables or Dimension tables, applying the same zero-deployment, checkbox-driven philosophy to Data Mart construction.
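
As a rough illustration of the projection idea, the sketch below models one hub with two satellites, serializes the model to JSON the way a GUI might save it, and projects the hub plus a selected satellite into a dimension definition. It is plain Python, not Hop code: the `Hub` and `Satellite` classes, the `model_to_json` and `project_dimension` functions, the JSON layout, and all table and column names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Satellite:
    # Hypothetical satellite: a name plus its descriptive columns.
    name: str
    columns: list[str]


@dataclass
class Hub:
    # Hypothetical hub: business key, hash algorithm, attached satellites.
    name: str
    business_key: str
    hash_algorithm: str = "MD5"
    satellites: list[Satellite] = field(default_factory=list)


def model_to_json(hub: Hub) -> str:
    """Serialize the model as a save format (the layout is illustrative only)."""
    return json.dumps(asdict(hub), indent=2)


def project_dimension(hub: Hub, selected: list[str], table: str) -> dict:
    """Project the hub's business key and the selected satellites' columns
    into a flat dimension definition."""
    columns = [hub.business_key]
    for sat in hub.satellites:
        if sat.name in selected:
            columns.extend(sat.columns)
    return {"table": table, "columns": columns}


customer = Hub(
    name="hub_customer",
    business_key="customer_id",
    satellites=[
        Satellite("sat_customer_details", ["first_name", "last_name"]),
        Satellite("sat_customer_address", ["city", "country"]),
    ],
)

# The developer "ticks" one satellite; the tool derives the dimension columns.
dim = project_dimension(customer, ["sat_customer_details"], "dim_customer")
print(model_to_json(customer))
print(dim)
```

In this sketch the choice of which satellites feed the dimension stays with the developer; the code only automates the repetitive column assembly.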

This would give Apache Hop end-to-end coverage:

```text
Source → Raw Vault → Data Mart
```

That is the full journey of a classic DW project, with the repetitive scaffolding automated and the architectural decisions kept in the hands of the developer. Maintenance of the mart layer would also be manual by design, for the same reasons as the vault.

## Contributing

I would like to actively contribute, starting with Phase 1, the metadata schema definition. I am familiar with DV2.0 conventions and can help draft the `@HopMetadata` schema covering Hubs, Satellites, Links, hash key configuration, load date handling, and record source tracking.

Where would you like me to start? Is there an existing schema convention in Hop I should follow as a reference, or is this greenfield?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
