nsivabalan commented on issue #4864: URL: https://github.com/apache/hudi/issues/4864#issuecomment-1302903706
Insert drop dups considers file groups in the matching partitions only. So, if your incoming batch contains records for one partition, Hudi will do an index lookup only in that partition and then drop matching records. Another suggestion you can try: enable clustering to batch a lot of small files into larger file groups, so the index lookup has fewer files/bloom filters to check. Alternatively, if records are spread randomly across all file groups in a given partition, you can try the "SIMPLE" index.
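
For illustration, the suggestions above could be expressed as Hudi write options roughly like this. This is only a sketch, not a tuned setup: the helper `build_hudi_write_options`, the table name, and the clustering commit count are assumptions made for the example, and the config keys should be checked against the documentation for the Hudi version in use.

```python
def build_hudi_write_options(table_name, use_simple_index=False, inline_clustering=True):
    """Build a dict of Hudi write options (hypothetical helper, illustrative values)."""
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "insert",
        # Drop incoming records that already exist; the lookup is limited
        # to the partitions present in the incoming batch.
        "hoodie.datasource.write.insert.drop.duplicates": "true",
    }
    if inline_clustering:
        # Merge small files into larger file groups so the index lookup
        # has fewer files/bloom filters to check.
        opts["hoodie.clustering.inline"] = "true"
        # Assumed value: schedule clustering every 4 commits.
        opts["hoodie.clustering.inline.max.commits"] = "4"
    if use_simple_index:
        # Worth trying when records are spread randomly across all
        # file groups in a partition.
        opts["hoodie.index.type"] = "SIMPLE"
    return opts


# Usage with an existing Spark DataFrame `df` (not executed here):
#   opts = build_hudi_write_options("my_table", use_simple_index=True)
#   df.write.format("hudi").options(**opts).mode("append").save("/path/to/table")
```

A real write also needs the usual record key, partition path, and precombine field options, which are omitted here because they depend on the table.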