dtarima commented on code in PR #45181: URL: https://github.com/apache/spark/pull/45181#discussion_r1517869525
##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########

```diff
@@ -193,10 +193,40 @@ private[sql] object Dataset {
  */
 @Stable
 class Dataset[T] private[sql](
-    @DeveloperApi @Unstable @transient val queryExecution: QueryExecution,
+    @DeveloperApi @Unstable @transient val queryUnpersisted: QueryExecution,
     @DeveloperApi @Unstable @transient val encoder: Encoder[T])
   extends Serializable {

+  private var queryPersisted: Option[(Array[Boolean], QueryExecution)] = None
+
+  def queryExecution: QueryExecution = {
+    val cacheStatesSign = queryUnpersisted.computeCacheStateSignature()
+    // If all children aren't cached, directly return the queryUnpersisted
+    if (cacheStatesSign.forall(b => !b)) {
```

Review Comment:
1. It doesn't look like it's necessary to distinguish between `persisted` and `unpersisted` anymore. If we wanted, we could keep a cache `Map[State, QueryExecution]` for the different states, but I think it'd add unjustified complexity.
2. We cannot use `var` here - it's not thread-safe.

```scala
class Dataset[T] private[sql](
    @Unstable @transient val queryExecutionRef: AtomicReference[(Array[Boolean], QueryExecution)],
    @DeveloperApi @Unstable @transient val encoder: Encoder[T])
  extends Serializable {

  @DeveloperApi
  def queryExecution: QueryExecution = {
    val (state, queryExecution) = queryExecutionRef.get()
    val newState = queryExecution.computeCacheStateSignature()
    if (state.sameElements(newState)) queryExecution
    else {
      val newQueryExecution = new QueryExecution(queryExecution)
      queryExecutionRef.set((newState, newQueryExecution))
      newQueryExecution
    }
  }
  ...
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
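As an aside, the check-then-publish pattern suggested above can be sketched in isolation. This is a minimal, hypothetical Java model (not Spark code): `Plan` stands in for `QueryExecution`, and the externally mutable cache-state signature is simulated with a `volatile` field. Note the `get()`/`set()` pair is not atomic, so two threads may rebuild concurrently; the race is benign here because both rebuilt plans are equivalent for the same signature and the last write wins.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for QueryExecution: a plan tied to a cache-state signature.
final class Plan {
    final boolean[] signature;
    Plan(boolean[] signature) { this.signature = signature; }
}

final class CachedPlanHolder {
    // Pair of (signature observed at creation, plan built for that signature).
    record Entry(boolean[] state, Plan plan) {}

    private final AtomicReference<Entry> ref;
    // Simulates the externally changing cache state that
    // computeCacheStateSignature() would observe in the real code.
    private volatile boolean[] currentSignature;

    CachedPlanHolder(boolean[] initialSignature) {
        this.currentSignature = initialSignature;
        this.ref = new AtomicReference<>(new Entry(initialSignature, new Plan(initialSignature)));
    }

    void setCacheState(boolean[] sig) { currentSignature = sig; }

    Plan queryExecution() {
        Entry e = ref.get();
        boolean[] newState = currentSignature;
        if (Arrays.equals(e.state(), newState)) {
            return e.plan(); // signature unchanged: reuse the cached plan
        }
        Plan rebuilt = new Plan(newState);
        // Benign race: a concurrent caller may also rebuild for the same
        // signature; either plan is valid, last write wins.
        ref.set(new Entry(newState, rebuilt));
        return rebuilt;
    }
}
```

Calling `queryExecution()` repeatedly returns the same cached plan until `setCacheState` changes the signature, after which a new plan is built and published.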
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org