[GitHub] [iceberg] rdblue commented on pull request #1440: Core: Parallelize manifest list reads in metadata tables

GitBox Tue, 15 Sep 2020 10:48:15 -0700


rdblue commented on pull request #1440:
URL: https://github.com/apache/iceberg/pull/1440#issuecomment-692873721



   @kbendick, sounds like there are two concerns:
   1. Using `Iterables.transform` rather than `Lists.transform`
   2. Using `Iterables` instead of streams
   
   For the first, the fix here is to create an `Iterable` using a lambda, which 
delays reading the file contents until the task runs in another thread. Because 
that `Iterable` is not a `List`, we can't use `Lists.transform` on it. We do use 
`Lists.transform` when transforming a list into another list, and we could have 
used it before this change, but all we need here is an `Iterable`, not a `List`, 
so it doesn't really matter.
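
   As a rough sketch of the lambda idea (plain JDK only; `readEntries`, 
`lazyEntries`, and the file name are hypothetical stand-ins, not the code in 
this PR): `Iterable` has a single abstract method, so a lambda can implement it, 
and nothing is read until a consumer calls `iterator()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LazyIterableSketch {
  // Records which hypothetical files have been "opened", in order.
  static final List<String> opened = new ArrayList<>();

  // Hypothetical stand-in for reading a manifest list; not an Iceberg API.
  static List<String> readEntries(String file) {
    opened.add(file);
    return Arrays.asList(file + "#0", file + "#1");
  }

  // The lambda body runs only when iterator() is called, so creating the
  // Iterable does not read anything.
  static Iterable<String> lazyEntries(String file) {
    return () -> readEntries(file).iterator();
  }

  public static void main(String[] args) {
    Iterable<String> task = lazyEntries("snap-1.avro");
    System.out.println("opened after creation: " + opened.size());
    for (String entry : task) {
      System.out.println(entry);
    }
    System.out.println("opened after iteration: " + opened.size());
  }
}
```

   Whichever thread ends up iterating `task` is the one that does the read.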
   
   For the second concern, this needs to pass an `Iterable` into 
`ParallelIterable`, so we can't really use a stream unless we convert to a 
stream, transform, and convert back to a collection. Mixing streams and 
iterables like that tends to make the code more complicated because it requires 
an extra step (collecting to a list) rather than just transforming.
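
   To illustrate the difference, here is a minimal sketch using only the JDK. 
The `transform` helper is a hand-rolled stand-in for a lazy transform in the 
style of `Iterables.transform`, and `viaStream` is the stream round trip; both 
names and the sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class TransformSketch {
  // Lazy transform: fn is applied only as elements are pulled.
  static <F, T> Iterable<T> transform(Iterable<F> source, Function<F, T> fn) {
    return () -> {
      Iterator<F> it = source.iterator();
      return new Iterator<T>() {
        @Override
        public boolean hasNext() {
          return it.hasNext();
        }

        @Override
        public T next() {
          return fn.apply(it.next());
        }
      };
    };
  }

  // Stream round trip: Iterable -> Stream -> map -> collect back to a List.
  // collect() is a terminal operation, so fn runs for every element here.
  static <F, T> List<T> viaStream(Iterable<F> source, Function<F, T> fn) {
    return StreamSupport.stream(source.spliterator(), false)
        .map(fn)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    Function<String, Integer> length = s -> {
      calls.incrementAndGet();
      return s.length();
    };
    List<String> names = Arrays.asList("a.avro", "bb.avro");

    Iterable<Integer> lazy = transform(names, length);
    System.out.println("calls after lazy transform: " + calls.get());

    List<Integer> eager = viaStream(names, length);
    System.out.println("calls after stream collect: " + calls.get());
  }
}
```

   The stream version has to materialize a list to get back to something 
iterable, while the lazy version does no work until it is consumed.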
   
   Lastly, using `Iterable` / `Iterator` allows more obvious control over when 
files are actually opened because everything is done by pulling records from 
the final result.
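
   A small sketch of what pull-based control means in practice, again with 
plain JDK types and hypothetical names (`lazyFile`, `concat`, `m1`, `m2`): each 
file is opened only when the consumer actually reaches it, so stopping early 
leaves later files untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PullSketch {
  // Records which hypothetical files have been "opened", in order.
  static final List<String> opened = new ArrayList<>();

  // Hypothetical lazy file: it is "opened" when iterator() is called.
  static Iterable<String> lazyFile(String name) {
    return () -> {
      opened.add(name);
      return Arrays.asList(name + "#0", name + "#1").iterator();
    };
  }

  // Minimal lazy concatenation: the next input's iterator() is requested
  // only after the current one is exhausted.
  static <T> Iterable<T> concat(List<Iterable<T>> inputs) {
    return () -> new Iterator<T>() {
      private final Iterator<Iterable<T>> remaining = inputs.iterator();
      private Iterator<T> current = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        while (!current.hasNext() && remaining.hasNext()) {
          current = remaining.next().iterator();
        }
        return current.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }

  public static void main(String[] args) {
    Iterable<String> all = concat(Arrays.asList(lazyFile("m1"), lazyFile("m2")));
    Iterator<String> it = all.iterator();
    System.out.println("opened before pulling: " + opened);
    System.out.println("first record: " + it.next());
    System.out.println("opened after one record: " + opened);
  }
}
```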


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



