Hey guys, I am not sure if this is the right forum for this question. If you know where it should be directed, I would appreciate your help!
My question is: can a Hudi data lake support low-latency, high-throughput random reads?

I am considering building a data lake that produces auxiliary information for my main service table. For example, say my main service is S3 and I want to produce the S3 object pull count as the auxiliary information. I am going to use Apache Hudi and EMR to process the S3 access logs and produce the pull count. What I don't know is whether a data lake can support low-latency, high-throughput random reads for an online request-response type of service. If it can, I could serve this information to customers in real time.

I could write the auxiliary information (the pull count) back to the main service table, but I don't think that is a sustainable architecture. It would be hard to do independent, agile development if I keep adding more derived attributes to the main table.

Any help would be appreciated!

Best regards,
Bill