rdblue commented on pull request #1440: URL: https://github.com/apache/iceberg/pull/1440#issuecomment-692873721
@kbendick, sounds like there are two concerns:

1. Using `Iterables.transform` rather than `Lists.transform`
2. Using `Iterables` instead of streams

For the first, the fix here is to create an `Iterable` using a lambda, to delay reading the file contents until the task runs in another thread. Because that `Iterable` is not a list, we can't use `Lists.transform`. We do use `Lists.transform` when transforming a list into another list, but it isn't needed here after the change. We could have used it before the change, but we need an `Iterable`, not a `List`, so it doesn't really matter.

For the second concern, this needs to pass an `Iterable` into `ParallelIterable`, so we can't really use a stream unless we convert to a stream, transform, and convert back to a collection. Mixing streams and iterables like that tends to just make the code more complicated because it requires extra steps (collecting to a list) rather than just transforming.

Lastly, using `Iterable` / `Iterator` allows more obvious control over when files are actually opened, because everything is done by pulling records from the final result.
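To make the laziness argument concrete, here is a minimal sketch using only the JDK. It is not the code from this PR: `open`, `lazyEntries`, `transform`, `eagerViaStream`, and the `OPENED` counter are all hypothetical names for illustration, `transform` is a hand-rolled stand-in in the spirit of Guava's `Iterables.transform`, and `ParallelIterable` is not used at all. The point is only that an `Iterable` built from a lambda does no work until something asks for its iterator, while a stream round trip that collects to a list does the work right away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class LazyIterableSketch {
  // Counts how many times the pretend file has been opened.
  static final AtomicInteger OPENED = new AtomicInteger();

  // Hypothetical stand-in for opening a file and reading its entries.
  static Iterator<String> open(String file) {
    OPENED.incrementAndGet();
    return Arrays.asList(file + "#0", file + "#1").iterator();
  }

  // Iterable has a single abstract method, iterator(), so a lambda can
  // implement it. open() is not called until iterator() is requested.
  static Iterable<String> lazyEntries(String file) {
    return () -> open(file);
  }

  // Hand-rolled lazy transform (assumption: similar in spirit to Guava's
  // Iterables.transform). It wraps the source without touching it.
  static <F, T> Iterable<T> transform(Iterable<F> source, Function<? super F, ? extends T> fn) {
    return () -> {
      Iterator<F> it = source.iterator();
      return new Iterator<T>() {
        @Override public boolean hasNext() { return it.hasNext(); }
        @Override public T next() { return fn.apply(it.next()); }
      };
    };
  }

  // The stream alternative: convert, map, and collect back to a collection.
  // Collecting consumes the source immediately, so the file is opened here.
  static <F, T> List<T> eagerViaStream(Iterable<F> source, Function<? super F, ? extends T> fn) {
    return StreamSupport.stream(source.spliterator(), false)
        .map(fn)
        .collect(Collectors.toList());
  }

  // Pulls every element from the iterable, which is what triggers open().
  static List<String> drain(Iterable<String> iterable) {
    List<String> out = new ArrayList<>();
    for (String s : iterable) {
      out.add(s);
    }
    return out;
  }

  public static void main(String[] args) {
    Iterable<String> upper = transform(lazyEntries("manifest-a"), String::toUpperCase);
    System.out.println("opened after building the iterable: " + OPENED.get());
    System.out.println(drain(upper));
    System.out.println("opened after draining it: " + OPENED.get());
  }
}
```

In this sketch the consumer of the final result decides when the file is opened, so a task that runs on another thread is the one that pays for the read. With `eagerViaStream`, the read happens on whichever thread builds the list.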
