Could Hudi Data lake support low latency, high throughput random reads?

Jialun Liu Sat, 05 Jun 2021 12:53:38 -0700

Hey guys,

I am not sure whether this is the right forum for this question. If you
know where it should be directed, I would appreciate your help!


The question is: "Could a Hudi data lake support low-latency,
high-throughput random reads?"

I am considering building a data lake that produces auxiliary information
for my main service table. For example, say my main service is S3 and I want
to produce the S3 object pull count as the auxiliary information. I would
use Apache Hudi and EMR to process the S3 access logs and produce the pull
count. What I don't know is whether a data lake can support low-latency,
high-throughput random reads for an online request-response type of
service. If it can, I could serve this information to customers in real
time.
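
To make the example concrete, here is a rough sketch of the aggregation and
the read pattern in question. It is plain Python, not Hudi or EMR code. The
record shape (bucket, key, operation), the sample data, and the function
name pull_counts are all assumptions for illustration; real S3 access logs
carry many more fields.

```python
from collections import Counter

# Hypothetical, simplified access-log records: (bucket, object_key, operation).
# This shape is an assumption; real S3 server access logs have many more fields.
SAMPLE_LOG = [
    ("photos", "cat.jpg", "REST.GET.OBJECT"),
    ("photos", "cat.jpg", "REST.GET.OBJECT"),
    ("photos", "dog.jpg", "REST.PUT.OBJECT"),
    ("photos", "dog.jpg", "REST.GET.OBJECT"),
]


def pull_counts(records):
    """Count GET requests per (bucket, key); other operations are ignored."""
    counts = Counter()
    for bucket, key, operation in records:
        if operation == "REST.GET.OBJECT":
            counts[(bucket, key)] += 1
    return counts


counts = pull_counts(SAMPLE_LOG)

# A point lookup by key: the random-read pattern an online service would issue.
print(counts[("photos", "cat.jpg")])
```

The batch job corresponds to pull_counts; the open question is whether the
final lookup can be served straight from the data lake at online latencies.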

I could write the auxiliary information, pull count, back to the main
service table, but I personally don't think it is a sustainable
architecture. It would be hard to do independent and agile development if I
continue to add more derived attributes to the main table.

Any help would be appreciated!

Best regards,
Bill
