mhamedbenjmaa commented on issue #7077:
URL: https://github.com/apache/hop/issues/7077#issuecomment-4618416075

   To start: looking at the Data Vault Configuration screen, I assume that the 
Data Vault Configuration metadata holds global settings, one per project, and 
that the Data Vault Model metadata is a per-table definition. If that is the 
case, I suggest splitting the options into two levels.

   Global configuration (these must be consistent across all tables to 
guarantee vault integrity):
   
   - Hash algorithm
   - hash key data type
   - hash content casing
   - business key delimiter ==> careful: the checksum component in Hop does not 
have this as an automatic option, so when there are multiple business keys you 
will have to build the concatenated value with a formula before calculating 
the hash
   - null placeholder
   - trim business keys
   - generate unknown record
   - use hashdiff for satellites
   - use load end date pattern
   - unknown business key value
   So only these options stay in the global configuration.
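
To make the delimiter point concrete, here is a minimal sketch of how the global options above could combine when hashing several business keys. It is not Hop's implementation: the function name `hash_key`, the `||` delimiter and the `-1` null placeholder are assumptions chosen only for illustration.

```python
import hashlib


def hash_key(business_keys, delimiter="||", null_placeholder="-1",
             trim=True, upper=True, algorithm="sha256"):
    """Hash one or more business keys (hypothetical helper, not a Hop API)."""
    parts = []
    for value in business_keys:
        text = "" if value is None else str(value)
        if trim:
            text = text.strip()
        if text == "":
            # Missing or blank keys get the configured placeholder.
            text = null_placeholder
        if upper:
            text = text.upper()
        parts.append(text)
    # The delimiter keeps ("a", "bc") and ("ab", "c") from colliding.
    joined = delimiter.join(parts)
    return hashlib.new(algorithm, joined.encode("utf-8")).hexdigest()
```

If any of these settings differed between two tables, the same business key would hash to different values and joins between hubs, links and satellites would silently fail, which is why they belong at the global level.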
   
   Per-model configuration (defined at the table level in the Data Vault Model 
metadata, so the developer keeps full control over their own naming 
convention):
   
   - Hub hash key column name
   - link hash key column name
   - satellite hashdiff column name
   - load date column name
   - load end date column name
   - record source column name
   - record source value ==> if you put this in the global configuration, you 
would have to create a separate global configuration for each source
   
   The reason is that naming conventions vary significantly across teams and 
source systems. For example, a developer might name all columns in a customer 
satellite with a _CUS suffix (CUSTOMER_HK_CUS, LOAD_DATE_CUS, SOURCE_CUS) so 
that in any multi-table SQL query the origin of every column is immediately 
obvious.
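
A rough sketch of the two-level split, assuming plain Python dataclasses rather than Hop's actual metadata classes. The class names, field names, default values and the `CRM` record source are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalVaultConfig:
    """One per project: must be identical for every table (assumed defaults)."""
    hash_algorithm: str = "sha256"
    business_key_delimiter: str = "||"
    null_placeholder: str = "-1"
    trim_business_keys: bool = True


@dataclass(frozen=True)
class ModelConfig:
    """One per table: naming is left to the developer."""
    hub_hash_key_column: str
    load_date_column: str
    record_source_column: str
    record_source_value: str


def suffixed_model(entity, suffix, record_source_value):
    # Builds the _CUS-style names from the example above.
    return ModelConfig(
        hub_hash_key_column=f"{entity}_HK_{suffix}",
        load_date_column=f"LOAD_DATE_{suffix}",
        record_source_column=f"SOURCE_{suffix}",
        record_source_value=record_source_value,
    )


project = GlobalVaultConfig()
customer = suffixed_model("CUSTOMER", "CUS", "CRM")
```

With this shape, one `GlobalVaultConfig` is shared by every model, while each table carries its own `ModelConfig`, including its record source value.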
   
   What do you think?
   
   PS: attached are some simple tables I used back then for my training, and 
the expected results using MD5, even though I personally use SHA-256.
   [files.zip](https://github.com/user-attachments/files/28575273/files.zip)
   
   [expected 
result.zip](https://github.com/user-attachments/files/28577444/expected.result.zip)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
