klsince commented on code in PR #13314:
URL: https://github.com/apache/pinot/pull/13314#discussion_r1628481078


##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/ServerSegmentMetadataReader.java:
##########
@@ -235,8 +236,18 @@ public List<ValidDocIdsMetadataInfo> 
getValidDocIdsMetadataFromServer(String tab
           }
         }
       }
-      
serverURLsAndBodies.add(generateValidDocIdsMetadataURL(tableNameWithType, 
segmentsToQuery, validDocIdsType,
-          serverToEndpoints.get(serverToSegments.getKey())));
+
+      // Number of segments to query per server request. If a table has a lot 
of segments, then we might send a
+      // huge payload to pinot-server in request. Batching the requests will 
help in reducing the payload size.

Review Comment:
   use Lists.partition() for short?



##########
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java:
##########
@@ -972,7 +972,10 @@ public String getTableAggregateValidDocIdsMetadata(
       @ApiParam(value = "A list of segments", allowMultiple = true) 
@QueryParam("segmentNames")
       List<String> segmentNames,
       @ApiParam(value = "Valid doc ids type") @QueryParam("validDocIdsType")
-      @DefaultValue("SNAPSHOT") ValidDocIdsType validDocIdsType, @Context 
HttpHeaders headers) {
+      @DefaultValue("SNAPSHOT") ValidDocIdsType validDocIdsType,
+      @ApiParam(value = "Number of segments in a batch per server request")
+      @QueryParam("numSegmentsBatchPerServerRequest") @DefaultValue("500") int 
numSegmentsBatchPerServerRequest,

Review Comment:
   nit: numSegmentsPerServerRequest or serverRequestBatchSize



##########
pinot-controller/src/main/java/org/apache/pinot/controller/util/CompletionServiceHelper.java:
##########
@@ -129,13 +130,20 @@ private CompletionServiceResponse collectResponse(String 
tableNameWithType, int
         int statusCode = 
multiHttpRequestResponse.getResponse().getStatusLine().getStatusCode();
         if (statusCode >= 300) {
           String reason = 
multiHttpRequestResponse.getResponse().getStatusLine().getReasonPhrase();
-          LOGGER.error("Server: {} returned error: {}, reason: {}", instance, 
statusCode, reason);
+          LOGGER.error("Server: {} returned error: {}, reason: {} for uri: 
{}", instance, statusCode, reason, uri);
           completionServiceResponse._failedResponseCount++;
           continue;
         }
         String responseString = 
EntityUtils.toString(multiHttpRequestResponse.getResponse().getEntity());
-        completionServiceResponse._httpResponses
-            .put(multiRequestPerServer ? uri.toString() : instance, 
responseString);
+        String key = multiRequestPerServer ? uri.toString() : instance;
+        // there can be a scenario where all the requests to a particular 
server had the same uri but the
+        // payload might be different. In that scenario, we should append a 
random string to the key so that
+        // we send all the responses back to the caller otherwise in the map, 
the last response will overwrite
+        if (multiRequestPerServer && 
completionServiceResponse._httpResponses.containsKey(key)) {
+          LOGGER.warn("Appending random string to http response key name: {}", 
key);
+          key = key + "__" + RandomStringUtils.randomAlphanumeric(10);

Review Comment:
   use a counter to avoid conflict, to make it easier to tell how many requests 
created for the same uri



##########
pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java:
##########
@@ -657,9 +657,7 @@ private List<Map<String, Object>> 
processValidDocIdsMetadata(String tableNameWit
     }
     try {
       if (!missingSegments.isEmpty()) {
-        throw new WebApplicationException(
-            String.format("Table %s has missing segments: %s)", 
tableNameWithType, segments),
-            Response.Status.NOT_FOUND);
+        LOGGER.warn("Table {} has missing segments {}", tableNameWithType, 
missingSegments);

Review Comment:
   can add a comment on why no need to abort



##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java:
##########
@@ -58,6 +58,7 @@ public class UpsertCompactionTaskGenerator extends 
BaseTaskGenerator {
   private static final String DEFAULT_BUFFER_PERIOD = "7d";
   private static final double DEFAULT_INVALID_RECORDS_THRESHOLD_PERCENT = 0.0;
   private static final long DEFAULT_INVALID_RECORDS_THRESHOLD_COUNT = 0;
+  private static final int DEFAULT_SEGMENT_BATCH_SIZE_FOR_QUERYING_SERVER = 
500;

Review Comment:
   nit: simply `DEFAULT_`NUM_SEGMENTS_BATCH_PER_SERVER_REQUEST



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to