hudi-bot opened a new issue, #14718:
URL: https://github.com/apache/hudi/issues/14718
Hi, all:
We plan to use Hudi to sync MySQL binlog data. A Flink ETL job will consume
binlog records from Kafka and write them to Hudi once every hour. The binlog
records are also grouped by hour, and all records for a given hour are saved
in a single commit. The data pipeline looks like this:
binlog -> kafka -> flink -> parquet.
After the data is synced to Hudi, we want to query the historical hourly
versions of the Hudi table in Hive SQL.
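
To make the intended semantics concrete, here is a minimal sketch of how a requested point in time could map to an hourly commit: pick the latest commit whose instant is at or before the requested time. This is not Hudi's implementation. The function name `resolve_instant`, the sample instants, and the `yyyyMMddHHmmss` instant layout are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Assumed instant layout (yyyyMMddHHmmss); fixed-width strings sort chronologically.
INSTANT_FORMAT = "%Y%m%d%H%M%S"


def resolve_instant(commit_instants, as_of):
    """Return the latest commit instant at or before `as_of`, or None if there is none."""
    ordered = sorted(commit_instants)
    target = as_of.strftime(INSTANT_FORMAT)
    # bisect_right gives the count of instants <= target.
    idx = bisect_right(ordered, target)
    return ordered[idx - 1] if idx else None


# Hypothetical hourly commits, one per hour of binlog data.
hourly_commits = ["20201214010000", "20201214020000", "20201214030000"]
print(resolve_instant(hourly_commits, datetime(2020, 12, 14, 2, 30)))
```

A query "as of 02:30" would therefore see the table as written by the 02:00 commit, and a time earlier than the first commit resolves to no version at all.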
Here is a more detailed description of our issue, along with a simple design
of Time Travel for Hudi. The design is still under development and testing:
[https://docs.google.com/document/d/1r0iwUsklw9aKSDMzZaiq43dy57cSJSAqT9KCvgjbtUo/edit?usp=sharing]
We need to support Time Travel soon for our business needs. We have also seen
[RFC 07|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table].
We would be glad to receive any suggestions or discussion.
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-1460
- Type: New Feature
---
## Comments
14/Dec/20 16:07;xleesf;[~qian heng] sorry, I could not access the Google doc
you provided. It would be better if you could send a discussion email to the
dev ML. ;;;
---
14/Dec/20 18:50;nishith29;[~qian heng] As [~xleesf] pointed out, I was also
unable to access the Google doc. Could you please start a discussion thread on
the dev mailing list? This will help you get feedback from other members as
well. Based on that, we can see whether this needs a separate RFC or whether we
can make changes to RFC-07.;;;
---
15/Dec/20 00:02;vinoth;+1. If we can keep discussions on the mailing list and
then move them onto the cWiki, that would be great.
Happy to provide any access/permissions as needed. ;;;
---
15/Dec/20 06:20;qian heng;The doc is accessible now; sorry for the
mistake.;;;
---
12/Mar/22 14:12;xushiyan;[~x1q1j1] can you please go through the description
and design doc to see if any further work is needed?;;;
---
13/Mar/22 05:39;x1q1j1;Hi [~qian heng],
1. Spark SQL already supports time travel queries on Hudi tables (HUDI-3221).
2. Hive SQL requires adding syntax support to the Hive source code. (This has
lower priority and will be implemented after Presto.)
3. Presto/Trino SQL support for time travel queries on Hudi tables (will be
next).;;;
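
For the Spark SQL path mentioned in item 1, a time travel query can be sketched as below. This assumes Hudi and Spark versions in which the `TIMESTAMP AS OF` clause is available for Hudi tables. The helper name `time_travel_query` and the table name `orders` are hypothetical, and the helper only builds the SQL text; running it would need a Spark session with Hudi configured.

```python
from datetime import datetime


def time_travel_query(table, as_of):
    """Build a Spark SQL statement that reads `table` as of the given time."""
    # Allow only simple identifiers such as `orders` or `db.orders`,
    # since the table name is interpolated into the SQL text.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"


# Hypothetical table; reads the version visible at 05:00 on 13 March 2022.
print(time_travel_query("orders", datetime(2022, 3, 13, 5, 0)))
```

The resulting string could then be passed to `spark.sql(...)` in an environment where the Hudi Spark bundle is on the classpath.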