phaniarnab commented on code in PR #1790:
URL: https://github.com/apache/systemds/pull/1790#discussion_r1142295071
##########
src/main/java/org/apache/sysds/hops/rewrite/RewriteInjectSparkPReadCheckpointing.java:
##########
@@ -68,20 +69,30 @@ private void rInjectCheckpointAfterPRead( Hop hop )
boolean isMatrix = hop.getDataType().isMatrix();
boolean isPRead = hop instanceof DataOp &&
((DataOp)hop).getOp()==OpOpData.PERSISTENTREAD;
boolean isFrameException = hop.getDataType().isFrame() &&
isPRead && !((DataOp)hop).getFileFormat().isIJV();
-
- if( (isMatrix && isPRead) || (hop.requiresReblock() &&
!isFrameException) ) {
+
+ // if the only operation performed is an action then do not add
chkpoint
+ if((isMatrix && isPRead) || (hop.requiresReblock() &&
!isFrameException)) {
+ boolean isActionOnly = isActionOnly(hop,
hop.getParent());
//make given hop for checkpointing (w/ default storage
level)
- //note: we do not recursively process childs here in
order to prevent unnecessary checkpoints
- hop.setRequiresCheckpoint(true);
+ //note: we do not recursively process children here in
order to prevent unnecessary checkpoints
+ if(!isActionOnly)
+ hop.setRequiresCheckpoint(true);
}
else {
if( hop.getInput() != null ) {
- //process all childs (prevent concurrent
modification by index access)
+ //process all children (prevent concurrent
modification by index access)
for( int i=0; i<hop.getInput().size(); i++ )
rInjectCheckpointAfterPRead(
hop.getInput().get(i) );
}
}
hop.setVisited();
}
+
+ private boolean isActionOnly(Hop hop, List<Hop> parents){
+ // if the number of consumers of this hop is equal to 1 and no
more
+ // then do not cache block unless that one operation is
transient write
+ boolean isPWrite = hop instanceof DataOp &&
((DataOp)hop).getOp()==OpOpData.PERSISTENTWRITE;
+ return !isPWrite && parents.size() == 1;
+ }
Review Comment:
1. If you want to check for transient write, use `OpOpData.TRANSIENTWRITE`
2. Even if the PRead has multiple consumers, but they are all part of a
single job, you still can avoid checkpointing. But I think it is okay to leave
that case for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]