phaniarnab commented on code in PR #1790:
URL: https://github.com/apache/systemds/pull/1790#discussion_r1142295071


##########
src/main/java/org/apache/sysds/hops/rewrite/RewriteInjectSparkPReadCheckpointing.java:
##########
@@ -68,20 +69,30 @@ private void rInjectCheckpointAfterPRead( Hop hop )
                boolean isMatrix = hop.getDataType().isMatrix();
                boolean isPRead = hop instanceof DataOp  && 
((DataOp)hop).getOp()==OpOpData.PERSISTENTREAD;
                boolean isFrameException = hop.getDataType().isFrame() && 
isPRead && !((DataOp)hop).getFileFormat().isIJV();
-               
-               if( (isMatrix && isPRead) || (hop.requiresReblock() && 
!isFrameException) ) {
+
+               // if the only operation performed is an action then do not add 
chkpoint
+               if((isMatrix && isPRead) || (hop.requiresReblock() && 
!isFrameException)) {
+                       boolean isActionOnly = isActionOnly(hop,  
hop.getParent());
                        //make given hop for checkpointing (w/ default storage 
level)
-                       //note: we do not recursively process childs here in 
order to prevent unnecessary checkpoints
-                       hop.setRequiresCheckpoint(true);
+                       //note: we do not recursively process children here in 
order to prevent unnecessary checkpoints
+                       if(!isActionOnly)
+                               hop.setRequiresCheckpoint(true);
                }
                else {
                        if( hop.getInput() != null ) {
-                               //process all childs (prevent concurrent 
modification by index access)
+                               //process all children (prevent concurrent 
modification by index access)
                                for( int i=0; i<hop.getInput().size(); i++ )
                                        rInjectCheckpointAfterPRead( 
hop.getInput().get(i) );
                        }
                }
                
                hop.setVisited();
        }
+
+       private boolean isActionOnly(Hop hop, List<Hop> parents){
+               // if the number of consumers of this hop is equal to 1 and no 
more
+               // then do not cache block unless that one operation is 
transient write
+               boolean isPWrite = hop instanceof DataOp && 
((DataOp)hop).getOp()==OpOpData.PERSISTENTWRITE;
+               return !isPWrite && parents.size() == 1;
+       }

Review Comment:
   1. If you want to check for transient write, use `OpOpData.TRANSIENTWRITE`
   2. What if the consumer is a unary aggregate or another action? You still 
want to avoid checkpointing right?
   3. Even if the PRead has multiple consumers, but they are all part of a 
single job, you still can avoid checkpointing. But I think it is okay to leave 
that case for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to