mhamedbenjmaa commented on issue #7077:
URL: https://github.com/apache/hop/issues/7077#issuecomment-4613199570

   Hi Matt, 
   
   thanks for the detailed feedback and for moving this to the roadmap.❤️
   
   This architecture is exactly the direction I had in mind: the in-memory 
compilation approach is clean, and the CDC pattern with `Merge Rows (diff)` is a 
smart way to avoid expensive delta SQL on the warehouse side.
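
For readers less familiar with the pattern, here is a minimal Python sketch of the comparison a diff-style merge performs between the current vault state and the incoming source rows. It is an illustration only, not Hop's implementation: the function name `merge_rows_diff`, the `flagfield` column name, and the row layout are assumptions made for this example.

```python
def merge_rows_diff(reference, compare, key, value_fields):
    """Flag each key as identical, changed, new, or deleted.

    reference: rows already loaded (list of dicts)
    compare:   rows arriving from the source (list of dicts)
    key:       name of the field that identifies a row
    value_fields: fields compared to detect a change
    """
    ref_by_key = {row[key]: row for row in reference}
    cmp_by_key = {row[key]: row for row in compare}
    result = []
    # Walk the union of keys in sorted order, like a merge over sorted streams.
    for k in sorted(ref_by_key.keys() | cmp_by_key.keys()):
        if k not in cmp_by_key:
            flag, row = "deleted", ref_by_key[k]
        elif k not in ref_by_key:
            flag, row = "new", cmp_by_key[k]
        elif any(ref_by_key[k][f] != cmp_by_key[k][f] for f in value_fields):
            flag, row = "changed", cmp_by_key[k]
        else:
            flag, row = "identical", ref_by_key[k]
        result.append({**row, "flagfield": flag})
    return result
```

Only the rows flagged `new` or `changed` would then need to be written, which is what keeps the delta logic out of the warehouse SQL.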
   
   ## On the JSON spec
   
   I see it as the **generated output of the GUI, not something the user writes 
by hand**. With 50 tables and 500 columns, nobody should be editing JSON 
manually. The Visual Modeling Perspective (your Phase 2) is where the developer 
works: they select the business key, tick the satellite columns, and choose the 
hash algorithm, while the tool generates the JSON silently in the background. The 
user never sees it unless they want to.
   
   So the full flow from a user perspective would be:
   
   ```text
   [GUI: checkbox-driven model definition] → [Auto-generated JSON] → [In-memory compile] → [Execution]
   ```
   
   This also makes your zero-deployment principle even stronger: the JSON is 
just a save format for the model, not a deployment artifact.
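
As a rough illustration of "JSON as a save format", the following Python sketch round-trips a small model through JSON and compiles it into an in-memory load plan without writing any files. Every name here (`Hub`, `Satellite`, `save_model`, `load_model`, `compile_in_memory`, and the JSON keys) is hypothetical; the real spec would be whatever Phase 1 defines.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Tuple


@dataclass
class Satellite:
    name: str
    columns: List[str]


@dataclass
class Hub:
    name: str
    business_key: List[str]
    hash_algorithm: str = "MD5"
    satellites: List[Satellite] = field(default_factory=list)


def save_model(hubs: List[Hub]) -> str:
    """What the GUI would write in the background (hypothetical layout)."""
    return json.dumps({"hubs": [asdict(h) for h in hubs]}, indent=2)


def load_model(text: str) -> List[Hub]:
    """Rebuild the model objects from the saved JSON."""
    raw = json.loads(text)
    return [
        Hub(
            name=h["name"],
            business_key=h["business_key"],
            hash_algorithm=h["hash_algorithm"],
            satellites=[Satellite(**s) for s in h["satellites"]],
        )
        for h in raw["hubs"]
    ]


def compile_in_memory(hubs: List[Hub]) -> List[Tuple[str, str]]:
    """Derive an ordered list of load steps; nothing is written to disk."""
    plan = []
    for hub in hubs:
        plan.append(("load_hub", hub.name))
        for sat in hub.satellites:
            plan.append(("load_satellite", sat.name))
    return plan
```

The point of the sketch is the separation of concerns: the JSON only persists the model, and the executable plan is derived from it on demand.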
   
   ## On schema evolution and maintenance
   
   Vault schema changes (adding a column to a satellite, splitting a satellite, 
reloading history) are rare events that always require an architectural 
decision. I propose these remain intentionally **manual and developer-driven** 
in the initial version. No tool should make that call automatically. This keeps 
the scope clean and the tool reliable.
   
   ## On the Data Mart layer
   
   One thing not yet covered in the discussion: once the Raw Vault exists, I 
envision a second phase where the same model-driven approach extends to **star 
schema generation**. The developer selects Hubs, Satellites, and Links from the 
model and instructs the tool to project them as Fact tables or Dimension 
tables, applying the same zero-deployment, checkbox-driven philosophy to Data 
Mart construction.
   
   This would give Apache Hop end-to-end coverage:
   
   ```text
   Source → Raw Vault → Data Mart
   ```
   
   That is the full journey of a classic DW project, with the repetitive 
scaffolding automated and the architectural decisions kept in the hands of the 
developer. Maintenance of the mart layer would also be manual by design, for the 
same reasons as the vault.
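
To give an idea of what projecting vault objects into a mart table could mean, here is a small Python sketch that builds a dimension from one Hub and one Satellite by keeping the most recent satellite row per hub key. This is only one possible projection rule, and the function name `project_dimension` and the field names (`hub_hash_key`, `load_date`, `record_source`, `business_key`) are assumptions for the example.

```python
def project_dimension(hub_rows, satellite_rows):
    """Join each hub business key with its latest satellite attributes."""
    technical_fields = ("hub_hash_key", "load_date", "record_source")

    # Keep only the most recent satellite row for every hub hash key.
    latest = {}
    for row in satellite_rows:
        current = latest.get(row["hub_hash_key"])
        if current is None or row["load_date"] > current["load_date"]:
            latest[row["hub_hash_key"]] = row

    dimension = []
    for hub in hub_rows:
        sat = latest.get(hub["hub_hash_key"], {})
        # Drop the vault's technical columns; keep the descriptive ones.
        attributes = {k: v for k, v in sat.items() if k not in technical_fields}
        dimension.append({"business_key": hub["business_key"], **attributes})
    return dimension
```

Whether a dimension should expose only the latest state or the full history is exactly the kind of decision that would stay with the developer.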
   
   ## Contributing
   
   I would like to actively contribute, starting with Phase 1: the metadata 
schema definition. I am familiar with DV2.0 conventions and can help draft the 
`@HopMetadata` schema covering Hubs, Satellites, Links, hash key configuration, 
load date handling, and record source tracking.
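
As a starting point for that discussion, here is a short Python sketch of the pieces a Hub definition would need to configure: hash key computation, load date, and record source. It follows one common convention (trim and upper-case each business key part, join with a delimiter, then hash); the delimiter `||`, the function names `hash_key` and `hub_row`, and the field names are assumptions, not an existing Hop schema.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(parts, algorithm="md5", delimiter="||"):
    """Hash a (possibly composite) business key after normalising it."""
    normalized = delimiter.join(str(p).strip().upper() for p in parts)
    digest = hashlib.new(algorithm, normalized.encode("utf-8"))
    return digest.hexdigest().upper()


def hub_row(business_key, record_source, load_date=None, algorithm="md5"):
    """Build one hub record with its technical columns filled in."""
    return {
        "hub_hash_key": hash_key(business_key, algorithm),
        "business_key": list(business_key),
        # Default to the current UTC time when no load date is supplied.
        "load_date": load_date or datetime.now(timezone.utc),
        "record_source": record_source,
    }
```

Normalising before hashing means `" abc "` and `"ABC"` resolve to the same hub key, which is why the normalisation rules and the algorithm belong in the metadata rather than in each pipeline.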
   
   Where would you like me to start? Is there an existing schema convention in 
Hop I should follow as a reference, or is this greenfield?

