keith-turner commented on issue #5096: URL: https://github.com/apache/accumulo/issues/5096#issuecomment-2495547631
> @keith-turner - In regards to adding multi-threading for scans, I'm planning to add a new API call to InstanceOperations that is similar to getActiveCompactions(List servers) and I'm wondering what we want to do with the custom java [iterator](https://github.com/apache/accumulo/blob/bbfd250d8694c210faf31116d199b56570c46f38/shell/src/main/java/org/apache/accumulo/shell/commands/ActiveScanIterator.java#L33) used by the shell for the listscans command.

That iterator seems like it's concatenating iterators and transforming. Seems like that custom iter could be dropped and replaced w/ streams that concat and transform and then turn the stream into an iterator.

> We could update that to just use the new API call and get back all the scans at one time for all the servers and print them out which simplifies things compared to now (currently it just makes calls as it loops through the list of servers instead of all at once). But we could get a lot of scans back depending on how many servers are queried at once.

Pulling too much into memory seems like a valid concern. Could possibly use Guava's Lists.partition for this.

```
List<List<ServerId>> partServerIds = Lists.partition(serverIds, 100);
Stream<String> servers = partServerIds.stream()
    .flatMap(ids -> client.instanceOps.getActiveScans(ids).stream())
    .map(/* do the string mapping that ActiveScanIter does */);
```

Maybe this will fetch 100 servers at a time in parallel, not sure. Depends on flatMap behavior.
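
On the flatMap question, here is a minimal, self-contained sketch of the partition-then-flatMap idea that makes the fetch order observable. Everything in it is an assumption for illustration: `partition` is a plain-Java stand-in for Guava's `Lists.partition`, server ids are plain strings, and the `fetch` function is a hypothetical placeholder for the proposed getActiveScans call, not the real API. On a sequential stream, flatMap applies its function to one batch at a time as the resulting iterator is consumed, so batches are requested lazily and in order rather than concurrently. Whether the servers inside one batch are queried in parallel would be up to the implementation of the fetch call itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class BatchedScanListing {

  // Plain-Java stand-in for Guava's Lists.partition:
  // consecutive sublists of at most `size` elements.
  static <T> List<List<T>> partition(List<T> items, int size) {
    if (size <= 0) {
      throw new IllegalArgumentException("size must be positive");
    }
    List<List<T>> parts = new ArrayList<>();
    for (int i = 0; i < items.size(); i += size) {
      parts.add(items.subList(i, Math.min(i + size, items.size())));
    }
    return parts;
  }

  // `fetch` is a hypothetical placeholder for a call that returns the
  // active scans for one batch of servers.
  static Iterator<String> listScans(List<String> servers, int batchSize,
      Function<List<String>, List<String>> fetch) {
    return partition(servers, batchSize).stream()
        // Invoked once per batch, only when the consumer reaches that batch.
        .flatMap(batch -> fetch.apply(batch).stream())
        // Placeholder for the string formatting the shell iterator does.
        .map(scan -> "active scan: " + scan)
        .iterator();
  }

  public static void main(String[] args) {
    List<String> servers = IntStream.range(0, 250)
        .mapToObj(i -> "server" + i)
        .collect(Collectors.toList());
    AtomicInteger fetches = new AtomicInteger();

    Iterator<String> scans = listScans(servers, 100, batch -> {
      fetches.incrementAndGet(); // count how many batch fetches happened
      return new ArrayList<>(batch);
    });

    System.out.println("fetches before iterating: " + fetches.get());
    scans.next();
    System.out.println("fetches after first element: " + fetches.get());
    while (scans.hasNext()) {
      scans.next();
    }
    System.out.println("fetches after draining: " + fetches.get());
  }
}
```

With this shape, only one batch of results is held at a time while the shell prints, which addresses the memory concern. Getting concurrency across batches would need something explicit, such as a parallel stream or an executor.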
