FelixKJose edited a comment on issue #4891: URL: https://github.com/apache/hudi/issues/4891#issuecomment-1061421066
@codope @suryaprasanna Thank you for the detailed information. A couple of questions:

1. Each of my partitions (by date) is large (e.g. 6.5 TB of uncompressed data), so frequent async clustering is recommended, right? I am running on r5.4xlarge (meaning 37 GB driver memory), so what would be the best clustering frequency? And what would be the best value for `hoodie.clustering.plan.strategy.small.file.limit`?
2. Are there any other configurations I should be using, given the partition size mentioned above?
3. Which lock provider is advised if I am running on AWS EMR?

Note: Our requirement is to ingest data quickly while also supporting interactive workloads on the query side.