swaminathanmanish commented on code in PR #14623:
URL: https://github.com/apache/pinot/pull/14623#discussion_r1877412335


##########
pinot-common/src/main/java/org/apache/pinot/common/minion/RealtimeToOfflineSegmentsTaskMetadata.java:
##########
@@ -41,19 +47,37 @@
 public class RealtimeToOfflineSegmentsTaskMetadata extends BaseTaskMetadata {
 
   private static final String WATERMARK_KEY = "watermarkMs";
+  private static final String SEGMENT_NAME_SEPARATOR = ",";
 
   private final String _tableNameWithType;
-  private final long _watermarkMs;
+  private long _watermarkMs;
+  private final Map<String, List<String>> _realtimeSegmentVsCorrespondingOfflineSegmentMap;

Review Comment:
   This approach looks more promising.
   I think we might have to track the input -> output segment mapping on a 
per-task basis and undo everything if a task fails (a task's execution 
should be treated as all or nothing). You could maintain two maps keyed by 
taskId: one whose value is the list of inputSegments and one whose value is 
the list of outputSegments. 
   If a task fails, you need to undo everything that task has done, i.e. 
remove all of its outputSegments (if they exist in the offline table) and redo 
all the inputSegments that the task picked. 
   The reason is that a single input segment can map to multiple output 
segments, and multiple input segments can map to a single output segment. The 
cleanest approach is to undo what the task has done on failure (in the minion 
itself, with the generator as a fallback) and then retry the input segments. 
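   To make the suggestion concrete, here is a minimal sketch of the two-map 
bookkeeping, under stated assumptions. `TaskSegmentTracker` and all of its 
methods are hypothetical names, not existing Pinot classes, and the offline 
table is stood in for by a plain `Set<String>` of segment names rather than a 
real segment-deletion call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Pinot.
class TaskSegmentTracker {
  // taskId -> realtime segments the task picked up as input
  private final Map<String, List<String>> _inputSegmentsByTask = new HashMap<>();
  // taskId -> offline segments the task has produced so far
  private final Map<String, List<String>> _outputSegmentsByTask = new HashMap<>();

  void recordInputs(String taskId, List<String> inputSegments) {
    _inputSegmentsByTask.put(taskId, new ArrayList<>(inputSegments));
  }

  void recordOutput(String taskId, String outputSegment) {
    _outputSegmentsByTask.computeIfAbsent(taskId, k -> new ArrayList<>()).add(outputSegment);
  }

  // On success there is nothing to undo, so the bookkeeping is dropped.
  void markSucceeded(String taskId) {
    _inputSegmentsByTask.remove(taskId);
    _outputSegmentsByTask.remove(taskId);
  }

  // On failure, remove every output segment of the task that exists in the
  // offline table (modelled here as a mutable set of segment names, which is
  // an assumption) and return the input segments that must be retried.
  List<String> undoFailedTask(String taskId, Set<String> offlineTableSegments) {
    List<String> outputs = _outputSegmentsByTask.remove(taskId);
    if (outputs != null) {
      for (String output : outputs) {
        offlineTableSegments.remove(output); // no-op if the segment was never uploaded
      }
    }
    List<String> inputs = _inputSegmentsByTask.remove(taskId);
    return inputs == null ? new ArrayList<>() : inputs;
  }
}
```

   Keying both maps by taskId sidesteps the many-to-many relationship between 
input and output segments: the undo step never needs to know which input 
produced which output, only which task touched them.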
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

