klsince commented on code in PR #13314:
URL: https://github.com/apache/pinot/pull/13314#discussion_r1628481078
##
pinot-controller/src/main/java/org/apache/pinot/controller/util/ServerSegmentMetadataReader.java:
##
@@ -235,8 +236,18 @@ public List
getValidDocIdsMetadataFromServer(String tab
}
}
}
-
serverURLsAndBodies.add(generateValidDocIdsMetadataURL(tableNameWithType,
segmentsToQuery, validDocIdsType,
- serverToEndpoints.get(serverToSegments.getKey(;
+
+ // Number of segments to query per server request. If a table has a lot
of segments, then we might send a
+ // huge payload to pinot-server in request. Batching the requests will
help in reducing the payload size.
Review Comment:
use Lists.partition() for short?
##
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java:
##
@@ -972,7 +972,10 @@ public String getTableAggregateValidDocIdsMetadata(
@ApiParam(value = "A list of segments", allowMultiple = true)
@QueryParam("segmentNames")
List segmentNames,
@ApiParam(value = "Valid doc ids type") @QueryParam("validDocIdsType")
- @DefaultValue("SNAPSHOT") ValidDocIdsType validDocIdsType, @Context
HttpHeaders headers) {
+ @DefaultValue("SNAPSHOT") ValidDocIdsType validDocIdsType,
+ @ApiParam(value = "Number of segments in a batch per server request")
+ @QueryParam("numSegmentsBatchPerServerRequest") @DefaultValue("500") int
numSegmentsBatchPerServerRequest,
Review Comment:
nit: numSegmentsPerServerRequest or serverRequestBatchSize
##
pinot-controller/src/main/java/org/apache/pinot/controller/util/CompletionServiceHelper.java:
##
@@ -129,13 +130,20 @@ private CompletionServiceResponse collectResponse(String
tableNameWithType, int
int statusCode =
multiHttpRequestResponse.getResponse().getStatusLine().getStatusCode();
if (statusCode >= 300) {
String reason =
multiHttpRequestResponse.getResponse().getStatusLine().getReasonPhrase();
- LOGGER.error("Server: {} returned error: {}, reason: {}", instance,
statusCode, reason);
+ LOGGER.error("Server: {} returned error: {}, reason: {} for uri:
{}", instance, statusCode, reason, uri);
completionServiceResponse._failedResponseCount++;
continue;
}
String responseString =
EntityUtils.toString(multiHttpRequestResponse.getResponse().getEntity());
-completionServiceResponse._httpResponses
-.put(multiRequestPerServer ? uri.toString() : instance,
responseString);
+String key = multiRequestPerServer ? uri.toString() : instance;
+// there can be a scenario where all the requests to a particular
server had the same uri but the
+// payload might be different. In that scenario, we should append a
random string to the key so that
+// we send all the responses back to the caller otherwise in the map,
the last response will overwrite
+if (multiRequestPerServer &&
completionServiceResponse._httpResponses.containsKey(key)) {
+ LOGGER.warn("Appending random string to http response key name: {}",
key);
+ key = key + "__" + RandomStringUtils.randomAlphanumeric(10);
Review Comment:
use a counter to avoid conflict, to make it easier to tell how many requests
created for the same uri
##
pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java:
##
@@ -657,9 +657,7 @@ private List>
processValidDocIdsMetadata(String tableNameWit
}
try {
if (!missingSegments.isEmpty()) {
-throw new WebApplicationException(
-String.format("Table %s has missing segments: %s)",
tableNameWithType, segments),
-Response.Status.NOT_FOUND);
+LOGGER.warn("Table {} has missing segments {}", tableNameWithType,
missingSegments);
Review Comment:
can add a comment on why no need to abort
##
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java:
##
@@ -58,6 +58,7 @@ public class UpsertCompactionTaskGenerator extends
BaseTaskGenerator {
private static final String DEFAULT_BUFFER_PERIOD = "7d";
private static final double DEFAULT_INVALID_RECORDS_THRESHOLD_PERCENT = 0.0;
private static final long DEFAULT_INVALID_RECORDS_THRESHOLD_COUNT = 0;
+ private static final int DEFAULT_SEGMENT_BATCH_SIZE_FOR_QUERYING_SERVER =
500;
Review Comment:
nit: simply `DEFAULT_`NUM_SEGMENTS_BATCH_PER_SERVER_REQUEST
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: