codope commented on issue #3755:
URL: https://github.com/apache/hudi/issues/3755#issuecomment-951914894


   > I found two SparkHoodieBloomIndex stages running. Does that mean two writers ran in parallel?
   
   I believe those are part of the same writer process. Hudi performs an index lookup to get the existing location of records. As part of that, it tags the incoming records as inserts or updates by joining them with the existing record keys. That is why you see two mapToPair calls. Check the [SparkHoodieBloomIndex#tagLocation](https://github.com/apache/hudi/blob/e3fc74668fc43fefd73087ff725245b8ed85b4a1/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/SparkHoodieBloomIndex.java#L70) method.
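
   To make the tagging flow concrete, here is a rough plain-Python sketch of the idea. It is not Hudi's actual code: `tag_location`, `classify`, and the dictionary used as the index are hypothetical stand-ins, and the two "pair mapping" steps only loosely mirror the two mapToPair calls, assuming one keys the incoming records for the lookup and the other keys them for the join.

```python
# Simplified illustration only; not Hudi's actual implementation.
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the index: (partition, record key) -> existing file ID.
ExistingIndex = Dict[Tuple[str, str], str]
# An incoming record: (partition path, record key, payload).
Record = Tuple[str, str, dict]


def tag_location(
    incoming: List[Record], existing: ExistingIndex
) -> List[Tuple[Record, Optional[str]]]:
    """Pair each incoming record with its existing file ID, or None if unseen."""
    # First pair mapping: (partition, record key) pairs that drive the lookup.
    lookup_keys = [(partition, key) for partition, key, _ in incoming]
    # Index lookup: keep only the keys that already have a location.
    key_to_location = {k: existing[k] for k in lookup_keys if k in existing}
    # Second pair mapping: ((partition, key), record) pairs for the join.
    keyed_records = [((partition, key), (partition, key, payload))
                     for partition, key, payload in incoming]
    # Left outer join: records with no matching location get None.
    return [(record, key_to_location.get(k)) for k, record in keyed_records]


def classify(tagged: List[Tuple[Record, Optional[str]]]) -> List[str]:
    """A known location means an update; no location means an insert."""
    return ["update" if location is not None else "insert"
            for _, location in tagged]
```

   Both mapping steps run inside a single call here, which is the point: seeing two mapping stages does not by itself imply two writers.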



