mhamedbenjmaa commented on issue #7077: URL: https://github.com/apache/hop/issues/7077#issuecomment-4618416075
To start: looking at the Data Vault Configuration screen, I assume that the Data Vault Configuration metadata is a global setting (one per project) and that the Data Vault Model metadata is a per-table definition. If that is the case, I suggest splitting the options into two levels.

Global configuration (these must be consistent across all tables to guarantee vault integrity, so keep only these options in the global configuration):

- Hash algorithm
- Hash key data type
- Hash content casing
- Business key delimiter. Careful: the checksum component in Hop does not have this option built in, so when there are multiple business keys you will have to build the concatenation with a formula before calculating the hash.
- Null placeholder
- Trim business keys
- Generate unknown record
- Use hashdiff for satellites
- Use load end date pattern
- Unknown business key value

Per-model configuration (defined at the table level in the Data Vault Model metadata, so the developer keeps full control over their own naming convention):

- Hub hash key column name
- Link hash key column name
- Satellite hashdiff column name
- Load date column name
- Load end date column name
- Record source column name
- Record source value. If you put this in the global configuration, you would have to create one global configuration per source.

The reason is that naming conventions vary significantly across teams and source systems. For example, a developer might name all columns in a customer satellite with a _CUS suffix (CUSTOMER_HK_CUS, LOAD_DATE_CUS, SOURCE_CUS) so that in any multi-table SQL query the origin of every column is immediately obvious.

What do you think?

PS: attached are some simple tables I used back then for my training, along with the expected results using MD5, even though I personally use SHA-256.

[files.zip](https://github.com/user-attachments/files/28575273/files.zip)
[expected result.zip](https://github.com/user-attachments/files/28577444/expected.result.zip)

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
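To illustrate the business key delimiter caveat from the comment above, here is a minimal Python sketch of how the global options (trim, casing, null placeholder, delimiter, hash algorithm) could combine when a hub or link has several business keys. The function name `hash_key`, its parameters, and the default values (`"||"` as delimiter, an empty string as null placeholder) are assumptions made for this example only; they do not describe Hop's checksum transform or the proposed Data Vault metadata.

```python
import hashlib


def hash_key(business_keys, *, delimiter="||", null_placeholder="",
             trim=True, upper=True, algorithm="md5"):
    """Build a hash key from one or more business keys.

    All defaults are illustrative assumptions, not Hop settings.
    """
    parts = []
    for value in business_keys:
        if value is None:
            # Nulls are replaced by the placeholder as-is.
            parts.append(null_placeholder)
            continue
        text = str(value)
        if trim:
            text = text.strip()
        if upper:
            text = text.upper()
        parts.append(text)
    # Join with the delimiter BEFORE hashing, so that ("AB", "C") and
    # ("A", "BC") do not produce the same input string.
    payload = delimiter.join(parts)
    return hashlib.new(algorithm, payload.encode("utf-8")).hexdigest()


# Same keys after trimming and upper-casing give the same hash key.
same = hash_key([" cust1 ", "fr"]) == hash_key(["CUST1", "FR"])

# Without a delimiter, different key combinations collide.
collides = hash_key(["AB", "C"], delimiter="") == hash_key(["A", "BC"], delimiter="")
distinct = hash_key(["AB", "C"]) != hash_key(["A", "BC"])
```

This is also why these options belong at the global level: if two tables normalized or delimited their business keys differently, the same business key would hash to different values and joins on hash keys would silently fail.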
