[ https://issues.apache.org/jira/browse/HUDI-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-1267:
-----------------------------
    Component/s: writer-core

> Additional Metadata Details for Hudi Transactions
> -------------------------------------------------
>
>                 Key: HUDI-1267
>                 URL: https://issues.apache.org/jira/browse/HUDI-1267
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Usability, writer-core
>    Affects Versions: 0.9.0
>            Reporter: Ashish M G
>            Priority: Major
>             Fix For: 0.11.0
>
>
> Whenever any of the following ingestion scenarios occurs:
> # Custom data source (Kafka, for instance) -> Hudi table
> # Hudi table -> Hudi table
> # S3 -> Hudi table
> The following metadata needs to be captured:
> # Table-level metadata
> ** Operation name (record level), such as UPSERT or INSERT, of the last operation performed on the row
> # Transaction-level metadata (logged at the Hudi level, not the table level)
> ** Source (Kafka topic name, S3 URL of the source data for S3 ingestion, etc.)
> ** Target Hudi table name
> ** Last transaction time (last commit time)
>
> In short, point (1) captures details at the table level, while point (2) records every transaction that happened at the Hudi level, across tables.
> Point (1) only requires adding a column for the operation type.
> Example for point (2): suppose one job ingests Kafka topic 'A' into Hudi table 'ingest_kafka', and another ingests RDBMS table 'tableA' through Sqoop into Hudi table 'RDBMSingest'. The captured metadata would then be:
>
> ||Source||Timestamp||Transaction Type||Target||
> |Kafka - 'A'|XXXXXX|UPSERT|ingest_kafka|
> |RDBMS - 'tableA'|XXXXXX|INSERT|RDBMSingest|
>
> The transaction details table from point (2) should be available as a separate, shared table that can either be queried as a Hudi table or be stored as Parquet and queried from Spark.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
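The split proposed in the issue can be sketched in Python. Point (1) is a per-row operation column. Point (2) is a shared, append-only transaction table with the columns Source, Timestamp, Transaction Type and Target. This is only a minimal in-memory illustration. All names in it are hypothetical and are not existing Hudi APIs: `TransactionRecord`, `TransactionLog`, `tag_operation` and the `_operation` column. A real implementation would persist the rows as a Hudi or Parquet table rather than a Python list.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


# Hypothetical row of the proposed Hudi-level "transaction details" table.
# Field names mirror the example table's columns; this is not an existing Hudi schema.
@dataclass(frozen=True)
class TransactionRecord:
    source: str            # e.g. "Kafka - 'A'" or "RDBMS - 'tableA'"
    commit_time: str       # commit time of the write (assumed fixed-width, e.g. yyyyMMddHHmmss)
    transaction_type: str  # write operation, e.g. "UPSERT" or "INSERT"
    target: str            # target Hudi table name


class TransactionLog:
    """Hypothetical in-memory stand-in for the shared transaction details table."""

    def __init__(self) -> None:
        self._records: List[TransactionRecord] = []

    def record(self, rec: TransactionRecord) -> None:
        # Append-only: each ingestion into any Hudi table adds one row.
        self._records.append(rec)

    def last_commit(self, target: str) -> Optional[TransactionRecord]:
        # "Last transaction time" for a table: its row with the greatest commit time.
        # Assumes all commit times share one fixed-width format, so string order
        # matches time order. Returns None if the table has no recorded commits.
        matches = [r for r in self._records if r.target == target]
        return max(matches, key=lambda r: r.commit_time, default=None)

    def rows(self) -> List[Dict[str, str]]:
        # Plain dicts, e.g. to build a Spark DataFrame and write it out as Parquet.
        return [asdict(r) for r in self._records]


def tag_operation(rows: List[Dict[str, object]], operation: str) -> List[Dict[str, object]]:
    # Point (1): add a per-row column holding the last operation applied to the row.
    # "_operation" is a hypothetical column name chosen for this sketch.
    return [{**row, "_operation": operation} for row in rows]


# Usage with placeholder commit times (the issue's example leaves them as XXXXXX).
log = TransactionLog()
log.record(TransactionRecord("Kafka - 'A'", "20220101120000", "UPSERT", "ingest_kafka"))
log.record(TransactionRecord("RDBMS - 'tableA'", "20220101130000", "INSERT", "RDBMSingest"))
print(log.last_commit("ingest_kafka").transaction_type)  # UPSERT
```

Keeping the transaction log append-only matches the issue's intent of recording every transaction at the Hudi level. Per-table questions such as "last commit time" then become simple queries over that log.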