Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23214
  
    I might know the root cause: `LongToUnsafeRowMap` is actually accessed by 
multiple threads.
    
    For broadcast hash join, we copy the broadcasted hash relation to avoid 
multi-threading problems, via `HashedRelation.asReadOnlyCopy`. However, this 
is a shallow copy: the underlying `LongToUnsafeRowMap` is not copied and is 
likely shared by multiple `HashedRelation`s.
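    
    To make the sharing concrete, here is a minimal sketch with simplified 
stand-ins (these are not the real Spark classes, just an illustration of the 
shallow-copy pattern):
    
    ```scala
    class LongToUnsafeRowMap {
      // mutable probe statistics live inside the map itself
      var numKeyLookups: Long = 0L
      var numProbes: Long = 0L
    }
    
    class LongHashedRelation(val map: LongToUnsafeRowMap) {
      // shallow copy: the new relation wraps the SAME map instance
      def asReadOnlyCopy(): LongHashedRelation = new LongHashedRelation(map)
    }
    
    val broadcasted = new LongHashedRelation(new LongToUnsafeRowMap)
    val copy1 = broadcasted.asReadOnlyCopy()
    val copy2 = broadcasted.asReadOnlyCopy()
    // both "copies" point at the same map, so concurrent tasks
    // race on its mutable counters
    assert(copy1.map eq copy2.map)
    ```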
    
    The metrics are per-task, so I think a better fix is to track the hash 
probe metrics per `HashedRelation` instead of per `LongToUnsafeRowMap`. 
Copying the `LongToUnsafeRowMap` itself would be too costly, so we should 
think about how to do this efficiently (rough sketch below). cc @hvanhovell 

