nsivabalan commented on issue #4864:
URL: https://github.com/apache/hudi/issues/4864#issuecomment-1302903706

   Insert drop dups considers only the file groups in matching partitions. So, 
if your incoming batch contains records for 1 partition, Hudi does an index 
lookup only in that 1 partition and then drops the matching records. 
   
   Another suggestion you can try: enable clustering to batch a lot of small 
files into larger file groups, so the index lookup has fewer files/bloom 
filters to check. 
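
   As a rough sketch of the clustering suggestion, the write options could 
look something like the following. The thresholds (4 commits, 300 MB, 1 GB) 
are illustrative assumptions and not values from this issue; `df` and 
`base_path` in the commented usage line are hypothetical names. 

```python
# Sketch only: inline clustering options for a Hudi write.
# All numeric values are assumptions; tune them for your table.
MB = 1024 * 1024

clustering_opts = {
    # Run clustering as part of the write, every N commits.
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",
    # Files smaller than this are candidates for clustering.
    "hoodie.clustering.plan.strategy.small.file.limit": str(300 * MB),
    # Upper bound on the size of the files clustering produces.
    "hoodie.clustering.plan.strategy.target.file.max.bytes": str(1024 * MB),
}

# Hypothetical usage with an existing DataFrame `df` and table path `base_path`:
# df.write.format("hudi").options(**clustering_opts).mode("append").save(base_path)
```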
   
   Alternatively, if records are spread randomly across all file groups in a 
given partition, you can try the "SIMPLE" index. 
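
   A minimal sketch of switching to the "SIMPLE" index while keeping insert 
drop dups on, assuming the Spark datasource writer is used. `df` and 
`base_path` in the commented usage line are hypothetical names. 

```python
# Sketch only: use the SIMPLE index with insert + drop duplicates.
index_opts = {
    "hoodie.datasource.write.operation": "insert",
    # Drop incoming records whose keys already exist in the table.
    "hoodie.datasource.write.insert.drop.duplicates": "true",
    # SIMPLE joins incoming keys against keys read from the files of the
    # affected partitions instead of probing bloom filters.
    "hoodie.index.type": "SIMPLE",
}

# Hypothetical usage with an existing DataFrame `df` and table path `base_path`:
# df.write.format("hudi").options(**index_opts).mode("append").save(base_path)
```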
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
