Re: [PR] Spark upsert table backfill support [pinot]

via GitHub Thu, 14 Nov 2024 09:07:15 -0800


pengding-stripe commented on code in PR #14443:
URL: https://github.com/apache/pinot/pull/14443#discussion_r1842602704



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-common/src/main/java/org/apache/pinot/plugin/ingestion/batch/common/SegmentGenerationTaskRunner.java:
##########
@@ -167,19 +167,25 @@ private SegmentNameGenerator getSegmentNameGenerator(SegmentGeneratorConfig segm
         return new InputFileSegmentNameGenerator(segmentNameGeneratorConfigs.get(FILE_PATH_PATTERN),
             segmentNameGeneratorConfigs.get(SEGMENT_NAME_TEMPLATE), inputFileUri, appendUUIDToSegmentName);
       case BatchConfigProperties.SegmentNameGeneratorType.UPLOADED_REALTIME:
-        Preconditions.checkState(segmentGeneratorConfig.getCreationTime() != null,
-            "Creation time must be set for uploaded realtime segment name generator");
-        Preconditions.checkState(segmentGeneratorConfig.getUploadedSegmentPartitionId() != -1,
+        Preconditions.checkState(segmentNameGeneratorConfigs.get(BatchConfigProperties.SEGMENT_UPLOAD_TIME_MS) != null,
+            "Upload time must be set for uploaded realtime segment name generator");
+        Preconditions.checkState(segmentNameGeneratorConfigs.get(BatchConfigProperties.SEGMENT_PARTITION_ID) != null,

Review Comment:
   Users need to partition the data themselves. We have a Spark job that replicates how stream ingestion partitions data and writes the partitioned data to a path containing the partition ID, e.g. `s3://.../partition=0/partitioned-data.parquet`.
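   A minimal sketch of such a partitioning step, assuming the upstream stream is Kafka, records are keyed by the upsert primary key serialized as a UTF-8 string, and the topic uses Kafka's default partitioner for keyed records (murmur2 of the key bytes, masked to a positive int, modulo the partition count). This is not the Spark job mentioned above; `StreamPartitioner` and `partitionFor` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the PR): mirrors Kafka's default partitioner for
// keyed records, assuming the topic uses it and keys are UTF-8 strings.
public final class StreamPartitioner {

  private StreamPartitioner() {
  }

  // Murmur2 hash with the seed and constants used by Kafka's default partitioner.
  static int murmur2(byte[] data) {
    int length = data.length;
    final int seed = 0x9747b28c;
    final int m = 0x5bd1e995;
    final int r = 24;
    int h = seed ^ length;
    int length4 = length / 4;
    // Mix the input four bytes at a time (little-endian).
    for (int i = 0; i < length4; i++) {
      int i4 = i * 4;
      int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
          + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
      k *= m;
      k ^= k >>> r;
      k *= m;
      h *= m;
      h ^= k;
    }
    // Mix in the trailing 1-3 bytes (intentional fall-through).
    int tail = length & ~3;
    switch (length % 4) {
      case 3:
        h ^= (data[tail + 2] & 0xff) << 16;
      case 2:
        h ^= (data[tail + 1] & 0xff) << 8;
      case 1:
        h ^= data[tail] & 0xff;
        h *= m;
      default:
        break;
    }
    // Final avalanche.
    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;
    return h;
  }

  // Partition for a primary key: positive murmur2 hash modulo the partition count.
  public static int partitionFor(String primaryKey, int numPartitions) {
    byte[] keyBytes = primaryKey.getBytes(StandardCharsets.UTF_8);
    return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
  }
}
```

   Under these assumptions, a record whose key maps to partition 0 would be written under `partition=0/`, and so on, so each output directory holds exactly the keys that the stream would have routed to that partition.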
   The segment creation task can then read the partition ID from the path, so there is one Spark task per partition.
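   A minimal sketch of reading the ID back from the input path, assuming the `partition=<id>` directory naming from the example above. `PartitionPathParser` and `partitionIdFromPath` are hypothetical names, not code from this PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the PR): extracts the partition id from an input
// path laid out as .../partition=<id>/<file>.
public final class PartitionPathParser {

  // A "partition=<digits>" path segment bounded by '/' or the start/end of the string.
  private static final Pattern PARTITION_SEGMENT =
      Pattern.compile("(?:^|/)partition=(\\d+)(?:/|$)");

  private PartitionPathParser() {
  }

  public static int partitionIdFromPath(String inputFileUri) {
    Matcher matcher = PARTITION_SEGMENT.matcher(inputFileUri);
    if (!matcher.find()) {
      throw new IllegalArgumentException("No partition=<id> segment in path: " + inputFileUri);
    }
    return Integer.parseInt(matcher.group(1));
  }
}
```

   The returned value is presumably what ends up under `BatchConfigProperties.SEGMENT_PARTITION_ID`, which the diff now requires to be non-null; how the PR wires it through is not shown in this excerpt.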



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
