codope commented on issue #3755: URL: https://github.com/apache/hudi/issues/3755#issuecomment-951914894
> I saw two SparkHoodieBloomIndex operations running. Does that mean two writers ran in parallel?

I believe both are part of the same writer process. Hudi performs an index lookup to find the existing locations of the incoming records. As part of that lookup, it tags each incoming record as an insert (key not found) or an update (key found) by joining the incoming records with the existing record keys. That is why you see two `mapToPair` calls. Check the [SparkHoodieBloomIndex#tagLocation](https://github.com/apache/hudi/blob/e3fc74668fc43fefd73087ff725245b8ed85b4a1/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/SparkHoodieBloomIndex.java#L70) method.
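To make the two passes concrete, here is a minimal, hypothetical model of the tagging flow described above. It is not Hudi's code. `Key`, `Record`, `lookup_index` and `tag_location` are illustrative names, plain Python lists and dicts stand in for Spark RDDs, and the bloom-filter pruning plus candidate-file key checks are replaced by an exact dictionary lookup. It assumes the two `mapToPair` calls correspond to (1) building thin `(partitionPath, recordKey)` pairs for the index lookup and (2) keying the full records so they can be left-outer-joined with the lookup result.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical, simplified stand-ins for HoodieKey / HoodieRecord (not Hudi classes).
@dataclass(frozen=True)
class Key:
    record_key: str
    partition_path: str


@dataclass
class Record:
    key: Key
    payload: dict
    location: Optional[str] = None  # file id after tagging; None means "insert"


def lookup_index(pairs: List[Tuple[str, str]], existing: Dict[Key, str]) -> Dict[Key, str]:
    """Hypothetical index lookup: returns key -> file id for keys already stored.

    The real bloom index prunes candidate files with bloom filters and then
    checks keys in those files; an exact dict lookup is used here instead.
    """
    found: Dict[Key, str] = {}
    for partition_path, record_key in pairs:
        key = Key(record_key, partition_path)
        if key in existing:
            found[key] = existing[key]
    return found


def tag_location(records: List[Record], existing: Dict[Key, str]) -> List[Record]:
    # First pairing (assumed first mapToPair): thin (partitionPath, recordKey)
    # pairs used only for the index lookup.
    partition_key_pairs = [(r.key.partition_path, r.key.record_key) for r in records]
    key_to_file = lookup_index(partition_key_pairs, existing)

    # Second pairing (assumed second mapToPair): (key, record) pairs, then a
    # left outer join with the lookup result. A missing key means an insert.
    keyed_records = [(r.key, r) for r in records]
    tagged: List[Record] = []
    for key, record in keyed_records:
        file_id = key_to_file.get(key)  # None when the key is not in the table yet
        tagged.append(Record(record.key, record.payload, file_id))
    return tagged


if __name__ == "__main__":
    existing = {Key("id1", "2021/10/01"): "file-A"}
    incoming = [
        Record(Key("id1", "2021/10/01"), {"v": 2}),
        Record(Key("id2", "2021/10/01"), {"v": 1}),
    ]
    for r in tag_location(incoming, existing):
        print(r.key.record_key, "update" if r.location else "insert", r.location)
```

Running it prints `id1 update file-A` and `id2 insert None`. Both passes run inside a single `tag_location` call, which is the point of the answer: seeing two pairing steps does not by itself imply two concurrent writers.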