shameersss1 commented on a change in pull request #60:
URL: https://github.com/apache/tez/pull/60#discussion_r821868285

##########
File path: tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
##########
@@ -883,6 +883,25 @@ public TezConfiguration(boolean loadDefaults) {
       + "dag.cleanup.on.completion";
   public static final boolean TEZ_AM_DAG_CLEANUP_ON_COMPLETION_DEFAULT = false;
 
+  /**
+   * Boolean value. Instructs AM to delete vertex shuffle data if a vertex and all its
+   * child vertices at a certain depth are completed.
+   */
+  @ConfigurationScope(Scope.AM)
+  @ConfigurationProperty(type="boolean")
+  public static final String TEZ_AM_VERTEX_CLEANUP_ON_COMPLETION = TEZ_AM_PREFIX
+      + "vertex.cleanup.on.completion";
+  public static final boolean TEZ_AM_VERTEX_CLEANUP_ON_COMPLETION_DEFAULT = false;
+
+  /**
+   * Int value. The height from the vertex that it can issue shuffle data deletion upon completion
+   */
+  @ConfigurationScope(Scope.AM)
+  @ConfigurationProperty(type="integer")
+  public static final String TEZ_AM_VERTEX_CLEANUP_HEIGHT = TEZ_AM_PREFIX
+      + "vertex.cleanup.height";
+  public static final int TEZ_AM_VERTEX_CLEANUP_HEIGHT_DEFAULT = 1;

Review comment:
Just to add to my previous comment: the intention behind making the height configurable (rather than fixing it at 1) is that shuffle data from distant ancestors is requested only very rarely, and only when multiple shuffle nodes are lost. Letting the user choose which ancestors' shuffle data gets cleaned up gives them more flexibility.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@tez.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
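
To make the height setting from the review comment concrete, here is a minimal, self-contained sketch of one possible reading of the Javadoc: a vertex's shuffle data becomes eligible for deletion once the vertex itself and every descendant within `height` levels below it have completed. This is not the Tez AM implementation. The class `VertexCleanupSketch`, the method `canDeleteShuffleData`, and the map-based DAG model are hypothetical names introduced only for illustration, and the two property strings assume that `TEZ_AM_PREFIX` resolves to `tez.am.`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not code from the Tez code base.
final class VertexCleanupSketch {

  // Property names as they would read if TEZ_AM_PREFIX is "tez.am." (assumption).
  static final String CLEANUP_ON_COMPLETION = "tez.am.vertex.cleanup.on.completion";
  static final String CLEANUP_HEIGHT = "tez.am.vertex.cleanup.height";

  /**
   * Returns true if the given vertex and all of its descendants up to
   * `height` levels below it are contained in `completed`.
   *
   * @param children maps each vertex name to its direct child vertices
   */
  static boolean canDeleteShuffleData(String vertex,
                                      Map<String, List<String>> children,
                                      Set<String> completed,
                                      int height) {
    if (!completed.contains(vertex)) {
      return false;
    }
    // Walk the DAG level by level, at most `height` levels down.
    Deque<String> frontier = new ArrayDeque<>();
    frontier.add(vertex);
    for (int depth = 0; depth < height && !frontier.isEmpty(); depth++) {
      Deque<String> next = new ArrayDeque<>();
      for (String v : frontier) {
        for (String child : children.getOrDefault(v, List.of())) {
          if (!completed.contains(child)) {
            return false; // a descendant inside the window is still running
          }
          next.add(child);
        }
      }
      frontier = next;
    }
    return true;
  }
}
```

Under this reading, for a chain `v1 -> v2 -> v3` in which only `v1` and `v2` have completed, a height of 1 would already allow `v1`'s shuffle data to be deleted, while a height of 2 would keep it until `v3` also completes. A larger height therefore holds ancestor data longer, in exchange for cheaper recovery in the rare case the comment describes.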