snleee commented on a change in pull request #5221: Add a new server api for download of segments.
URL: https://github.com/apache/incubator-pinot/pull/5221#discussion_r405191232
##########
File path:
pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java
##########
@@ -175,4 +183,41 @@ public String getCrcMetadataForTable(
}
}
}
+
+  @GET
+  @Produces(MediaType.APPLICATION_OCTET_STREAM)
+  @Path("/tables/{tableName}/segments/{segmentName}")
+  @ApiOperation(value = "Download a segment", notes = "Download a segment in zipped tar format")
+  public Response downloadSegment(
+      @ApiParam(value = "Name of the table with type REALTIME OR OFFLINE", required = true, example = "myTable_OFFLINE") @PathParam("tableName") String tableName,
+      @ApiParam(value = "Name of the segment", required = true) @PathParam("segmentName") @Encoded String segmentName,
+      @Context HttpHeaders httpHeaders)
+      throws Exception {
+    LOGGER.info("Get a request to download segment {} for table {}", segmentName, tableName);
+    TableDataManager tableDataManager = checkGetTableDataManager(tableName);
+    SegmentDataManager segmentDataManager = tableDataManager.acquireSegment(segmentName);
+    if (segmentDataManager == null) {
+      throw new WebApplicationException(String.format("Table %s segments %s does not exist", tableName, segmentName),
+          Response.Status.NOT_FOUND);
+    }
+    try {
+      String tableDir = tableDataManager.getTableDataDir();
+      String tarFilePath = TarGzCompressionUtils.createTarGzOfDirectory(tableDir + "/" + segmentName);
Review comment:
The current API behavior compresses the segment every time this endpoint is hit, which looks like an expensive operation. Does `TarGzCompressionUtils.createTarGzOfDirectory` use `tar -cvf` or `tar -czvf`? `cvf` simply groups multiple files/directories into a single archive, while `czvf` also compresses them. I guess `TarGzCompressionUtils.createTarGzOfDirectory` probably tries to compress the file.
Depending on the use case, compression may become a performance bottleneck. Imagine a single server receiving download requests for multiple segments at around the same time: compressing multiple files concurrently will consume a lot of CPU.
One way to improve this is to use `tar cvf`-equivalent logic (no compression) and send the file as-is. Another approach is to keep the compressed files in some directory and use it as a cache (then we also need to handle invalidation). We don't need to address this now, but let's at least add a comment here in case someone hits this bottleneck.
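The caching idea above could be sketched roughly as follows. This is a hypothetical illustration, not Pinot code: the class name `SegmentArchiveCache`, the method names, and the use of `GZIPOutputStream` as a stand-in for the real tar+gzip step are all assumptions made for the example. `ConcurrentHashMap.computeIfAbsent` gives per-key atomicity, so concurrent download requests for the same segment compress it only once, and `invalidate` shows where segment replacement would have to evict the stale archive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a compressed-archive cache; names are illustrative only.
public class SegmentArchiveCache {
  private final ConcurrentMap<String, Path> cache = new ConcurrentHashMap<>();
  final AtomicInteger compressions = new AtomicInteger(); // exposed for the demo below

  // Return a cached archive if present; otherwise compress exactly once,
  // even under concurrent requests (computeIfAbsent is atomic per key).
  Path getOrCreateArchive(String segmentName, Path segmentData) {
    return cache.computeIfAbsent(segmentName, name -> compress(name, segmentData));
  }

  // Invalidation hook: must be called when a segment is replaced or deleted.
  void invalidate(String segmentName) {
    Path stale = cache.remove(segmentName);
    if (stale != null) {
      try {
        Files.deleteIfExists(stale);
      } catch (IOException ignored) {
      }
    }
  }

  private Path compress(String name, Path segmentData) {
    compressions.incrementAndGet();
    try {
      Path out = Files.createTempFile(name, ".tar.gz");
      try (OutputStream gz = new GZIPOutputStream(Files.newOutputStream(out))) {
        gz.write(Files.readAllBytes(segmentData)); // stand-in for real tar + gzip
      }
      return out;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    SegmentArchiveCache cache = new SegmentArchiveCache();
    Path data = Files.createTempFile("segment", ".data");
    Files.write(data, "segment bytes".getBytes());

    Path first = cache.getOrCreateArchive("seg_0", data);
    Path second = cache.getOrCreateArchive("seg_0", data);
    System.out.println(first.equals(second));     // second hit is served from the cache
    System.out.println(cache.compressions.get()); // compressed only once so far

    cache.invalidate("seg_0");
    cache.getOrCreateArchive("seg_0", data);
    System.out.println(cache.compressions.get()); // recompressed after invalidation
  }
}
```

The trade-off is disk space for CPU: each segment pays the compression cost once per version instead of once per download, at the price of keeping the archives around and wiring invalidation into segment replacement.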
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]