dtarima commented on code in PR #45181:
URL: https://github.com/apache/spark/pull/45181#discussion_r1517869525


##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -193,10 +193,40 @@ private[sql] object Dataset {
  */
 @Stable
 class Dataset[T] private[sql](
-    @DeveloperApi @Unstable @transient val queryExecution: QueryExecution,
+    @DeveloperApi @Unstable @transient val queryUnpersisted: QueryExecution,
     @DeveloperApi @Unstable @transient val encoder: Encoder[T])
   extends Serializable {
 
+  private var queryPersisted: Option[(Array[Boolean], QueryExecution)] = None
+
+  def queryExecution: QueryExecution = {
+    val cacheStatesSign = queryUnpersisted.computeCacheStateSignature()
+    // If all children aren't cached, directly return the queryUnpersisted
+    if (cacheStatesSign.forall(b => !b)) {

Review Comment:
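   1. It doesn't look like it's necessary to distinguish between `persisted`
      and `unpersisted` anymore. If we wanted to, we could keep a cache
      `Map[State, QueryExecution]` with one entry per cache state, but I think
      that would add unjustified complexity.
   2. We cannot use a plain `var` here: it's not thread-safe, because concurrent
      callers of `queryExecution` would read and write it without synchronization.
      An `AtomicReference` holding the (state, `QueryExecution`) pair avoids that: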
   
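   ```scala
   import java.util.concurrent.atomic.AtomicReference

   class Dataset[T] private[sql](
       // Cache-state signature paired with the QueryExecution built for that state.
       @Unstable @transient val queryExecutionRef: AtomicReference[(Array[Boolean], QueryExecution)],
       @DeveloperApi @Unstable @transient val encoder: Encoder[T])
     extends Serializable {

     @DeveloperApi
     def queryExecution: QueryExecution = {
       val (state, queryExecution) = queryExecutionRef.get()
       val newState = queryExecution.computeCacheStateSignature()

       // Cache state unchanged since this QueryExecution was built: reuse it.
       if (state.sameElements(newState)) queryExecution
       else {
         // Cache state changed: build a fresh QueryExecution and publish it
         // together with the new signature.
         val newQueryExecution = new QueryExecution(queryExecution)
         queryExecutionRef.set((newState, newQueryExecution))
         newQueryExecution
       }
     }

     ...
   ```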
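For illustration only, the same idea can be exercised outside Spark. This is a hypothetical sketch: `SignatureCachedValue`, `signature`, and `build` are made-up names standing in for `Dataset`, `computeCacheStateSignature()`, and `new QueryExecution(...)`. It stores the signature as a `Vector[Boolean]` because `Array` equality is by reference (which is why the suggestion above needs `sameElements`). It also publishes with `compareAndSet` rather than `set`, one possible variant so that a thread holding a stale snapshot does not overwrite a pair another thread already replaced.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, Spark-free sketch: `V` stands in for QueryExecution and
// `signature` stands in for computeCacheStateSignature().
final class SignatureCachedValue[V](
    signature: () => Vector[Boolean],
    build: () => V) {

  // The (signature, value) pair that was current when `value` was built.
  private val ref =
    new AtomicReference[(Vector[Boolean], V)]((signature(), build()))

  def get: V = {
    val current = ref.get()
    val (state, value) = current
    val newState = signature()
    if (state == newState) value // unchanged cache state: reuse
    else {
      val rebuilt = build()
      // Publish only if nobody replaced `current` in the meantime. Either way,
      // `rebuilt` matches the state this caller observed, so return it.
      ref.compareAndSet(current, (newState, rebuilt))
      rebuilt
    }
  }
}
```

A value is rebuilt only when the signature changes, so repeated calls with an unchanged cache state return the same instance.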



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
