vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375667190
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##########
 @@ -586,6 +586,7 @@ public boolean savepoint(String commitTime, String user, String comment) {
           HoodieTimeline.compareTimestamps(commitTime, lastCommitRetained, HoodieTimeline.GREATER_OR_EQUAL),
           "Could not savepoint commit " + commitTime + " as this is beyond the lookup window " + lastCommitRetained);
 
+      jsc.setJobGroup(this.getClass().getSimpleName(), "Collecting latest files in partition");
 
 Review comment:
   In general, let's provide some context about the higher-level action being performed, i.e. savepoints, compaction, rollbacks, etc. In that spirit, change to `Collecting latest files for savepoint`?
   
   Also, I wonder if we can include the `commitTime` in the detail, i.e. `Collecting latest files for savepoint 20200205010000`. This way, you can just go to past runs on the Spark history server and relate them to commits on Hudi. Even better, if someone is running DeltaStreamer in continuous mode, they can see activity for commits over time.
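   A minimal sketch of the suggested description format, assuming a hypothetical helper `JobDescriptions.describe` (not an existing Hudi API) that composes the description from the low-level activity, the higher-level action, and the commit time. Only `JavaSparkContext#setJobGroup(String groupId, String description)` is a real Spark method here; it appears in a comment so the sketch runs without Spark on the classpath.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of Hudi) that builds Spark job-group
 * descriptions naming both the Hudi action and the commit time.
 */
public final class JobDescriptions {

  private JobDescriptions() {}

  /**
   * Builds a description such as "Collecting latest files for savepoint 20200205010000".
   *
   * @param activity   what the Spark job does, e.g. "Collecting latest files"
   * @param action     the higher-level Hudi action, e.g. "savepoint", "compaction", "rollback"
   * @param commitTime the Hudi commit time the job belongs to
   */
  public static String describe(String activity, String action, String commitTime) {
    Objects.requireNonNull(activity, "activity");
    Objects.requireNonNull(action, "action");
    Objects.requireNonNull(commitTime, "commitTime");
    return activity + " for " + action + " " + commitTime;
  }

  public static void main(String[] args) {
    String description = describe("Collecting latest files", "savepoint", "20200205010000");
    // In HoodieWriteClient#savepoint this string would be passed as the description, e.g.
    //   jsc.setJobGroup(this.getClass().getSimpleName(), description);
    System.out.println(description);
  }
}
```

   Keeping the action and commit time in the description, rather than in the group id, leaves the group id stable per client class while the Spark UI and history server still show which commit each job belonged to.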

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
