Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214

I might know the root cause: `LongToUnsafeRowMap` is actually accessed by multiple threads. For broadcast hash join, we copy the broadcasted hash relation via `HashedRelation.asReadOnlyCopy` to avoid multi-threading problems. However, this is a shallow copy: the underlying `LongToUnsafeRowMap` is not copied and is likely shared by multiple `HashedRelation`s.

The metrics are per-task, so I think a better fix is to track the hash probe metrics per `HashedRelation` instead of per `LongToUnsafeRowMap`. Copying the `LongToUnsafeRowMap` itself would be too costly, so we should think about how to do this efficiently.

cc @hvanhovell
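To make the shared-state problem concrete, here is a minimal Java sketch (the real Spark code is Scala, and these classes are heavily simplified stand-ins, not the actual implementation). It shows how a probe counter stored on the shared map mixes lookups from every shallow copy together, while a counter on the per-task `HashedRelation` wrapper stays isolated:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for Spark's LongToUnsafeRowMap.
class LongToUnsafeRowMap {
    private final Map<Long, String> rows = new HashMap<>();
    // Counter on the shared map: every wrapper that probes this map bumps it,
    // so per-task metrics get mixed together (and would race without atomics).
    final AtomicLong numKeyLookups = new AtomicLong();

    void put(long key, String row) { rows.put(key, row); }
    String get(long key) { numKeyLookups.incrementAndGet(); return rows.get(key); }
}

// Simplified stand-in for Spark's HashedRelation.
class HashedRelation {
    private final LongToUnsafeRowMap map;
    // Sketch of the proposed fix: one probe counter per (shallow) copy,
    // so each task's read-only copy reports only its own lookups.
    long numKeyLookups = 0;

    HashedRelation(LongToUnsafeRowMap map) { this.map = map; }

    // Shallow copy: only the wrapper is new; the underlying map is shared.
    HashedRelation asReadOnlyCopy() { return new HashedRelation(map); }

    String get(long key) { numKeyLookups++; return map.get(key); }
}

public class SharedMapMetricsDemo {
    public static void main(String[] args) {
        LongToUnsafeRowMap map = new LongToUnsafeRowMap();
        map.put(1L, "row1");

        HashedRelation broadcast = new HashedRelation(map);
        // Pretend each copy is used by a different task.
        HashedRelation task1 = broadcast.asReadOnlyCopy();
        HashedRelation task2 = broadcast.asReadOnlyCopy();

        task1.get(1L);
        task2.get(1L);
        task2.get(1L);

        // The map-level counter conflates both tasks' probes:
        System.out.println("shared map counter: " + map.numKeyLookups.get());
        // The relation-level counters stay per-task:
        System.out.println("task1 probes: " + task1.numKeyLookups);
        System.out.println("task2 probes: " + task2.numKeyLookups);
    }
}
```

Running this prints a shared-map count of 3 against per-task counts of 1 and 2, which is exactly why counting at the `HashedRelation` level gives correct per-task metrics without paying for a deep copy of the map.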