Hi Shiyan,
Hope you are doing well.

As promised, we have finished writing the RFC proposal and are now ready to submit it as a PR with confidence. According to the RFC Process, we need to add at least two PMC members as reviewers to examine the proposal. We would therefore like to sincerely invite you to be one of the reviewers, to check our RFC proposal and give us comments and feedback. We put a lot of effort into writing this RFC proposal, and you were the first person to give us feedback at the very beginning, so we sincerely hope that you will accept our invitation so that I can put your GitHub account in the RFC. If you have other candidates to suggest, we'd be happy to invite them as reviewers as well, since there is no limit on the number of reviewers.

Wishing you all the best. We look forward to receiving your reply.

Sincerely,
Xinyao Tian

On 08/6/2022 10:11, Shiyan Xu <xu.shiyan.raym...@gmail.com> wrote:

Hi Xinyao, awesome achievement! And really appreciate your keenness in contributing to Hudi. Certainly we'd love to see an RFC for this.

On Fri, Aug 5, 2022 at 4:21 AM 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net> wrote:

Greetings everyone,

My name is Xinyao and I'm currently working for an insurance company. We found that Apache Hudi is an extremely awesome utility, and when it cooperates with Apache Flink it can be even more powerful. Thus, we have been using it for months and still keep benefiting from it.

However, there is one feature that we really desire but Hudi doesn't currently have: it is called "Multiple event_time fields verification". In the insurance industry, data is often distributed across dozens of tables that are conceptually connected by the same primary keys. When the data is used, we often need to associate several or even dozens of tables through Join operations, and stitch all the partial columns into a complete record with dozens or even hundreds of columns for downstream services to use.
Here comes the problem. If we want to guarantee that every part of the joined data is up to date, Hudi must be able to filter on multiple event_time timestamps in a table and keep the most recent records. In this scenario, the single event_time filtering field provided by Hudi (i.e. the option 'write.precombine.field' in Hudi 0.10.0) is a bit inadequate. To cope with use cases involving complex Join operations like the one above, and to give Hudi the potential to support more application scenarios and reach more industries, Hudi needs to support filtering on multiple event_time timestamps in a single table.

The good news is that, after more than two months of development, my colleagues and I have made some changes in the hudi-flink and hudi-common modules based on hudi-0.10.0 and have basically achieved this feature. Currently, my team is using the enhanced source code with Kafka and Flink 1.13.2 to conduct end-to-end testing on a dataset of more than 140 million real-world insurance records and to verify the accuracy of the data. The result is quite good: based on our continuous observations over these weeks, every part of the extremely wide records has been updated to its latest state.

We're very keen to make this new feature available to everyone. We benefit from the Hudi community, so we really want to give back to the community with our efforts.

The only problem is that we are not sure whether we need to create an RFC to illustrate our design and implementation in detail. According to the "RFC Process" in the official Hudi documentation, we have to confirm that this feature does not already exist before we can create a new RFC to share the concept and code and explain them in detail.
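For readers unfamiliar with the idea, the multiple event_time filtering described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the enhanced hudi-flink and hudi-common modules: the column names, the COLUMN_GROUPS layout, and the merge_record helper are all hypothetical, and it assumes that each event_time field guards the columns coming from one source table and that a tie goes to the incoming record.

```python
from typing import Any, Dict, List, Optional

# Hypothetical layout: each event_time field guards the columns that
# originate from one source table of the joined wide record.
COLUMN_GROUPS: Dict[str, List[str]] = {
    "policy_event_time": ["policy_status", "premium"],
    "claim_event_time": ["claim_status", "claim_amount"],
}


def merge_record(
    stored: Optional[Dict[str, Any]], incoming: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge two versions of a wide record, one column group at a time.

    A group's columns are taken from `incoming` only when its event_time
    is at least as new as the stored one (assumption: ties go to incoming).
    """
    if stored is None:
        return dict(incoming)

    merged = dict(stored)
    for time_field, columns in COLUMN_GROUPS.items():
        new_ts = incoming.get(time_field)
        if new_ts is None:
            continue  # incoming record carries no data for this group
        old_ts = stored.get(time_field)
        if old_ts is None or new_ts >= old_ts:
            merged[time_field] = new_ts
            for col in columns:
                merged[col] = incoming.get(col)
    return merged


stored = {
    "id": 1,
    "policy_event_time": 10, "policy_status": "active", "premium": 100,
    "claim_event_time": 5, "claim_status": "open", "claim_amount": 50,
}
incoming = {
    "id": 1,
    "policy_event_time": 8, "policy_status": "lapsed", "premium": 90,
    "claim_event_time": 7, "claim_status": "paid", "claim_amount": 60,
}

# The policy columns keep the stored values (8 < 10), while the claim
# columns take the incoming values (7 >= 5).
merged = merge_record(stored, incoming)
```

With a single precombine field, the whole incoming record would either win or lose as a unit, so one of the two groups in this example would end up stale.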
Thus, we would really like to create a new RFC that explains our implementation in detail with theory and code, and makes it easier for everyone to understand and to make improvements based on it. We look forward to receiving your feedback on whether we should create a new RFC, and to making Hudi better and better to benefit everyone.

Kind regards,
Xinyao Tian

--
Best,
Shiyan