Hi Bill,

Have you tried using Presto (on EMR) to query Hudi tables on S3? It can support real-time queries. You will also need to partition your data properly to minimize the amount of data each query has to scan and process.
Regards,
Felix K Jose

From: Jialun Liu <liujialu...@gmail.com>
Date: Saturday, June 5, 2021 at 3:53 PM
To: dev@hudi.apache.org <dev@hudi.apache.org>
Subject: Could Hudi Data lake support low latency, high throughput random reads?

Hey guys,

I am not sure if this is the right forum for this question. If you know where it should be directed, I would appreciate your help!

The question is: "Could Hudi Data lake support low latency, high throughput random reads?"

I am considering building a data lake that produces auxiliary information for my main service table. For example, say my main service is S3 and I want to produce the S3 object pull count as the auxiliary information. I am going to use Apache Hudi and EMR to process the S3 access log to produce the pull count. What I don't know is whether a data lake can support low latency, high throughput random reads for an online request-response type of service. If it can, I could serve this information to customers in real time.

I could write the auxiliary information, the pull count, back to the main service table, but I personally don't think that is a sustainable architecture. It would be hard to do independent and agile development if I continue to add more derived attributes to the main table.

Any help would be appreciated!

Best regards,
Bill
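
As a rough illustration of the partitioning advice in the reply, here is a minimal sketch of how a date-based partition path and the Hudi write options for a pull-count table might be set up. The table name and the column names (object_key, event_date, updated_at) are hypothetical and not from this thread; the option keys are standard Hudi datasource write configs, but the right record key and partition field depend on your data and query patterns.

```python
from datetime import datetime, timezone

# Hypothetical table name, chosen only for illustration.
TABLE_NAME = "s3_object_pull_counts"


def partition_path(event_time: datetime) -> str:
    """Derive a date-based partition value (yyyy/MM/dd, UTC) from an event timestamp."""
    return event_time.astimezone(timezone.utc).strftime("%Y/%m/%d")


def hudi_write_options(table_name: str = TABLE_NAME) -> dict:
    """Build Hudi datasource write options; the column names are assumptions."""
    return {
        "hoodie.table.name": table_name,
        # Assumed unique key per record: the S3 object key.
        "hoodie.datasource.write.recordkey.field": "object_key",
        # Assumed partition column holding the value from partition_path().
        "hoodie.datasource.write.partitionpath.field": "event_date",
        # Assumed column used to pick the latest record when keys collide.
        "hoodie.datasource.write.precombine.field": "updated_at",
        "hoodie.datasource.write.operation": "upsert",
    }


# With a Spark session and the Hudi bundle available, the write would look
# roughly like this (the S3 path is a placeholder):
#   df.write.format("hudi").options(**hudi_write_options()) \
#       .mode("append").save("s3://<bucket>/<path>")
```

With a layout like this, a query that filters on the partition column only needs to read the matching partitions rather than the whole table, which is the point of partitioning for scan reduction.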