vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi. URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375667190
########## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ##########
@@ -586,6 +586,7 @@ public boolean savepoint(String commitTime, String user, String comment) {
         HoodieTimeline.compareTimestamps(commitTime, lastCommitRetained, HoodieTimeline.GREATER_OR_EQUAL),
         "Could not savepoint commit " + commitTime + " as this is beyond the lookup window " + lastCommitRetained);
+    jsc.setJobGroup(this.getClass().getSimpleName(), "Collecting latest files in partition");

Review comment:
In general, let's provide some context on the higher-level action being performed, i.e. savepoints, compaction, rollbacks, etc. In that spirit, change this to `Collecting latest files for savepoint`? Also, I wonder if we can include the `commitTime` in the description, i.e. `Collecting latest files for savepoint 20200205010000`. That way, you can go to past runs on the Spark history server and relate them to commits on Hudi. Even better, if someone is running the DeltaStreamer in continuous mode, they can see activity for commits over time.
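A minimal sketch of what the suggested description could look like, assuming the `JavaSparkContext` (`jsc`) and `commitTime` that are in scope in the diff above; the wrapping class and helper method are hypothetical and only the `setJobGroup` call is the point:

```java
import org.apache.spark.api.java.JavaSparkContext;

// Sketch only, not the actual patch: `tagSavepointJobGroup` is a hypothetical
// helper, and `jsc`/`commitTime` stand in for what is available inside
// HoodieWriteClient#savepoint in the diff above.
public class JobGroupNamingSketch {

  void tagSavepointJobGroup(JavaSparkContext jsc, String commitTime) {
    // Name the job group after the client class, and put both the high-level
    // action (savepoint) and the commit time in the description, so past runs
    // on the Spark history server can be related back to individual Hudi commits.
    jsc.setJobGroup(this.getClass().getSimpleName(),
        "Collecting latest files for savepoint " + commitTime);
  }
}
```

With a description like this, a commit such as `20200205010000` shows up directly in the job group on the history server, which is what would let continuous-mode DeltaStreamer activity be traced per commit.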