Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22995#discussion_r234396107
  
    --- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
    @@ -93,7 +96,14 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: 
T, id: Long)
       private var checksums: Array[Int] = _
     
       override protected def getValue() = {
    -    _value
    +    val memoized: T = if (_value == null) null.asInstanceOf[T] else 
_value.get
    --- End diff --
    
    I suppose there is a race condition here, in that several threads could end 
up simultaneously setting the reference. It won't be incorrect as the data 
ought to be the same. I am not sure of the access pattern for this object; 
maybe it's always single-threaded. But if lots are reading, you can imagine 
them all causing a call to `readBroadcastBlock()` simultaneously.
    
    Introducing another object to lock on is safe and not too much extra 
legwork. Might be worth it.
    
    Isn't WeakReference cleared on any GC? would SoftReference be better to 
hold out until memory is exhausted? to avoid re-reading. There's a tradeoff 
there.
    
    Good idea, just surprisingly full of possible gotchas.
    
    Nit: isn't `val memoized = if (_value == null) null else _value.get` 
sufficient?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to