mcvsubbu commented on a change in pull request #5221: Add a new server api for
download of segments.
URL: https://github.com/apache/incubator-pinot/pull/5221#discussion_r406536533
##########
File path:
pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java
##########
@@ -175,4 +183,43 @@ public String getCrcMetadataForTable(
}
}
}
+
+ // TODO Add access control similar to PinotSegmentUploadDownloadRestletResource for segment download.
+ @GET
+ @Produces(MediaType.APPLICATION_OCTET_STREAM)
+ @Path("/segments/{tableNameWithType}/{segmentName}")
+ @ApiOperation(value = "Download a segment", notes = "Download a segment in zipped tar format")
+ public Response downloadSegment(
+ @ApiParam(value = "Name of the table with type REALTIME OR OFFLINE", required = true, example = "myTable_OFFLINE") @PathParam("tableNameWithType") String tableNameWithType,
+ @ApiParam(value = "Name of the segment", required = true) @PathParam("segmentName") @Encoded String segmentName,
+ @Context HttpHeaders httpHeaders)
+ throws Exception {
+ LOGGER.info("Received a request to download segment {} for table {}", segmentName, tableNameWithType);
+ TableDataManager tableDataManager = checkGetTableDataManager(tableNameWithType);
+ SegmentDataManager segmentDataManager = tableDataManager.acquireSegment(segmentName);
+ if (segmentDataManager == null) {
+ throw new WebApplicationException(
+ String.format("Table %s segment %s does not exist", tableNameWithType, segmentName),
+ Response.Status.NOT_FOUND);
+ }
+ try {
+ String tableDir = tableDataManager.getTableDataDir();
+ // TODO Limit the number of concurrent downloads of segments because compression is an expensive operation.
+ String tarFilePath = TarGzCompressionUtils.createTarGzOfDirectory(tableDir + File.separator + segmentName);
Review comment:
You have two options here:
(1) Create a temporary tar file with a unique name, and delete it in the
finally block as soon as the segment has been sent. In addition, you can mark
it deleteOnExit in case something goes wrong while tarring or serving. In this
case, use the segmentTarDir config as the location for the file, since it is
served only once, to another host.
(2) Create a semi-permanent tar file with a non-unique name, so that it can be
reused when another request comes in for the same segment. In this case, you
need to make sure that the file is not overwritten when two requests for the
same segment arrive at the same time. Here I think the tar file should be
created in tableDataDir, as you have done. But you do need to synchronize the
creation of the file, and check whether another thread has already created it,
no?
Which technique do you want to go with?
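To make the two options concrete, here is a rough sketch of each. This is not
Pinot code: the class SegmentTarStrategies, the Archiver interface and both
method names are hypothetical, and Archiver only stands in for whatever call
actually writes the tar.gz. It also assumes the response body is written to an
OutputStream before the finally block runs (for example from a streaming
response), otherwise the temporary file would be deleted too early.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative sketch of the two tar-file strategies; all names are hypothetical. */
class SegmentTarStrategies {

  /** Hypothetical stand-in for the step that writes a tar.gz of a segment directory. */
  interface Archiver {
    void archive(Path segmentDir, Path targetTarFile) throws IOException;
  }

  // One lock object per tar file path, so different segments can be archived in parallel.
  private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

  /** Option (1): unique temporary file, deleted as soon as the bytes have been sent. */
  static void serveWithTempFile(Path segmentDir, Path segmentTarDir, Archiver archiver, OutputStream out)
      throws IOException {
    Path tarFile = Files.createTempFile(segmentTarDir, segmentDir.getFileName() + "-", ".tar.gz");
    // Safety net in case the JVM exits before the finally block runs.
    tarFile.toFile().deleteOnExit();
    try {
      archiver.archive(segmentDir, tarFile);
      Files.copy(tarFile, out);
    } finally {
      Files.deleteIfExists(tarFile);
    }
  }

  /** Option (2): fixed file name, created at most once and reused by later requests. */
  static Path getOrCreateSharedTar(Path tableDataDir, String segmentName, Archiver archiver)
      throws IOException {
    Path tarFile = tableDataDir.resolve(segmentName + ".tar.gz");
    Object lock = LOCKS.computeIfAbsent(tarFile.toString(), k -> new Object());
    synchronized (lock) {
      // Re-check under the lock: another thread may have created the file already.
      if (!Files.exists(tarFile)) {
        // Write to a scratch file and rename, so readers never see a partial archive.
        Path scratch = Files.createTempFile(tableDataDir, segmentName + "-", ".tmp");
        try {
          archiver.archive(tableDataDir.resolve(segmentName), scratch);
          Files.move(scratch, tarFile, StandardCopyOption.ATOMIC_MOVE);
        } finally {
          Files.deleteIfExists(scratch);
        }
      }
    }
    return tarFile;
  }
}
```

With option (2) the shared file also has to be removed or rebuilt when the
segment is replaced or deleted, which this sketch does not cover.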