Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22995#discussion_r234396107 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -93,7 +96,14 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) private var checksums: Array[Int] = _ override protected def getValue() = { - _value + val memoized: T = if (_value == null) null.asInstanceOf[T] else _value.get --- End diff -- I suppose there is a race condition here, in that several threads could end up simultaneously setting the reference. It won't be incorrect as the data ought to be the same. I am not sure of the access pattern for this object; maybe it's always single-threaded. But if lots are reading, you can imagine them all causing a call to `readBroadcastBlock()` simultaneously. Introducing another object to lock on is safe and not too much extra legwork. Might be worth it. Isn't WeakReference cleared on any GC? would SoftReference be better to hold out until memory is exhausted? to avoid re-reading. There's a tradeoff there. Good idea, just surprisingly full of possible gotchas. Nit: isn't `val memoized = if (_value == null) null else _value.get` sufficient?
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org