umehrot2 commented on issue #1830:
URL: https://github.com/apache/hudi/issues/1830#issuecomment-659057299


   @bvaradar thank you for taking a look at this. We had an internal meeting 
with @srsteinmetz and the team, and yes, at the outset it looks to me like the 
total lookup time is increasing linearly here. It seems that when it 
does `countByKey()` in `WorkloadProfile`, that also triggers some of the 
previous `index lookup` Spark actions on the `taggedRecords RDD`. Could this be 
an artifact of the number of parquet files/bloom filters to check increasing 
over time? Have we seen similar issues reported before with Hudi?
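
To illustrate the lazy-evaluation effect suspected above, here is a toy sketch 
in plain Python rather than Spark. `LazyDataset`, `count_by_key`, and the 
`index_lookup` stage name are made up for this example and are not Hudi or 
Spark APIs. It only shows why work recorded by an earlier transformation can 
end up running (and being timed) inside a later count-by-key action.

```python
from collections import Counter


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, source, stages=()):
        self.source = source
        self.stages = stages  # tuple of (stage_name, function) pairs

    def map(self, name, fn):
        # Transformation: only recorded here, nothing is executed yet.
        return LazyDataset(self.source, self.stages + ((name, fn),))

    def count_by_key(self, log):
        # Action: runs every recorded stage for each record, then counts keys.
        counts = Counter()
        for record in self.source:
            for name, fn in self.stages:
                log.append(name)  # note which upstream stage ran
                record = fn(record)
            counts[record[0]] += 1
        return dict(counts)


def demo():
    log = []
    records = [("k1", "a"), ("k2", "b"), ("k1", "c")]
    # Stand-in for the index lookup that tags each record.
    tagged = LazyDataset(records).map(
        "index_lookup", lambda r: (r[0], r[1], "tagged")
    )
    ran_before_action = len(log)  # still 0: the lookup has not run yet
    counts = tagged.count_by_key(log)
    return ran_before_action, counts, log
```

In this sketch the lookup stage runs once per record only when 
`count_by_key` is called, so any timing of that call also includes the lookup.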

