kbendick commented on pull request #1440: URL: https://github.com/apache/iceberg/pull/1440#issuecomment-693139777
> @kbendick, sounds like there are two concerns:
>
> 1. Using `Iterables.transform` rather than `Lists.transform`
> 2. Using `Iterables` instead of streams
>
> For the first, the fix here is to create an `Iterable` using a lambda, to delay reading the file contents until the task runs in another thread. Because that `Iterable` is not a list, we can't use `Lists.transform`. We do use `Lists.transform` when transforming a list into another list, but it isn't needed here after the change. We could have used it before the change, but we need an `Iterable`, not a `List`, so it doesn't really matter.
>
> For the second concern, this needs to pass an `Iterable` into `ParallelIterable` so we can't really use a stream unless we convert to stream, transform, and convert back to a collection. Mixing streams and iterables like that tends to just make the code more complicated because it requires extra steps (collecting to a list) rather than just transforming.
>
> Lastly, using `Iterable` / `Iterator` allows more obvious control over when files are actually opened because everything is done by pulling records from the final result.

Cool. Thanks for the clarification.

I'm not typically a Java developer by day (Scala, Python, Go, Node.js, and various flavors of automation "languages" and scripting, with some architectural support and code reviews provided in Java and _occasionally_ library support provided in Java), so I appreciate you taking the time to help clear up the differences for me. I'm not used to needing so many additional libs (e.g. Guava, commons libraries, etc.) for what is often provided out of the box in Scala.

One of my goals when choosing to join this project was to get more hands-on experience with practical Java, so as always I greatly appreciate you taking the time to clarify my question, @rdblue 👍
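For my own notes, here is a minimal sketch of the "create an `Iterable` using a lambda" idea as I understand it. It is not the code from this PR: it uses only the JDK, the `transform` helper is a hand-rolled stand-in for Guava's `Iterables.transform`, and `lazyRead`, the file names, and the `openLog` list are hypothetical names that only exist to make the laziness visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class LazyIterableSketch {

  // Hypothetical stand-in for Guava's Iterables.transform: returns a lazy view.
  // Nothing is computed until someone asks for an iterator and pulls from it.
  static <A, B> Iterable<B> transform(Iterable<A> source, Function<? super A, ? extends B> fn) {
    // Iterable has a single abstract method (iterator), so a lambda can implement it.
    return () -> {
      Iterator<A> it = source.iterator();
      return new Iterator<B>() {
        @Override
        public boolean hasNext() {
          return it.hasNext();
        }

        @Override
        public B next() {
          // The function runs here, one element at a time, on the consumer's thread.
          return fn.apply(it.next());
        }
      };
    };
  }

  // Hypothetical "open the file" step: records each open in openLog so we can
  // observe when it actually happens.
  static Iterable<String> lazyRead(List<String> paths, List<String> openLog) {
    return transform(paths, path -> {
      openLog.add(path);
      return "contents of " + path;
    });
  }

  public static void main(String[] args) {
    List<String> openLog = new ArrayList<>();
    Iterable<String> records = lazyRead(Arrays.asList("a.avro", "b.avro"), openLog);

    // Building the Iterable opened nothing.
    System.out.println("opened before iteration: " + openLog.size()); // prints 0

    for (String record : records) {
      System.out.println(record);
    }

    // Each file was "opened" only as its record was pulled.
    System.out.println("opened after iteration: " + openLog.size()); // prints 2
  }
}
```

Doing the same with a stream would mean converting to a stream, mapping, and collecting back to a collection before handing it on, which is the extra step described above.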
