kbendick commented on pull request #1440:
URL: https://github.com/apache/iceberg/pull/1440#issuecomment-693139777


   > @kbendick, sounds like there are two concerns:
   > 
   > 1. Using `Iterables.transform` rather than `Lists.transform`
   > 2. Using `Iterables` instead of streams
   > 
   > For the first, the fix here is to create an `Iterable` using a lambda, to
   > delay reading the file contents until the task runs in another thread. Because
   > that `Iterable` is not a list, we can't use `Lists.transform`. We do use
   > `Lists.transform` when transforming a list into another list, but it isn't
   > needed here after the change. We could have used it before the change, but we
   > need an `Iterable`, not a `List`, so it doesn't really matter.
   > 
   > For the second concern, this needs to pass an `Iterable` into
   > `ParallelIterable` so we can't really use a stream unless we convert to stream,
   > transform, and convert back to a collection. Mixing streams and iterables like
   > that tends to just make the code more complicated because it requires extra
   > steps (collecting to a list) rather than just transforming.
   > 
   > Lastly, using `Iterable` / `Iterator` allows more obvious control over
   > when files are actually opened because everything is done by pulling records
   > from the final result.
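A minimal, dependency-free Java sketch of the lazy-`Iterable` pattern described in the quoted explanation. It assumes a hypothetical `readFile` helper standing in for opening a data file, a hypothetical `lazyFile` that wraps it in a lambda-built `Iterable`, a hand-written `transform` that behaves like Guava's lazy `Iterables.transform`, and a sequential `concat` used in place of Iceberg's `ParallelIterable` (which is not modeled here). A counter shows that building the pipeline opens nothing, pulling one record opens one file, and a stream collected to a list opens every file up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.stream.Collectors;

class LazyIterableSketch {

  // Demo-only counter of how many "files" have been opened (not thread-safe).
  static int filesOpened = 0;

  // Hypothetical stand-in for opening a data file and reading its records.
  static List<String> readFile(String path) {
    filesOpened++;
    List<String> records = new ArrayList<>();
    records.add(path + ":row0");
    records.add(path + ":row1");
    return records;
  }

  // Hypothetical task: an Iterable built from a lambda, so readFile runs only
  // when iterator() is called, not when the Iterable is created.
  static Iterable<String> lazyFile(String path) {
    return () -> readFile(path).iterator();
  }

  // Lazy transform, similar in spirit to Guava's Iterables.transform:
  // fn is applied to each element only as the iterator advances.
  static <F, T> Iterable<T> transform(Iterable<F> source, Function<? super F, ? extends T> fn) {
    return () -> {
      Iterator<F> it = source.iterator();
      return new Iterator<T>() {
        @Override
        public boolean hasNext() {
          return it.hasNext();
        }

        @Override
        public T next() {
          return fn.apply(it.next());
        }
      };
    };
  }

  // Sequential stand-in for consuming many tasks (ParallelIterable is not modeled):
  // each inner Iterable is opened only after the previous one is exhausted.
  static <T> Iterable<T> concat(Iterable<? extends Iterable<T>> iterables) {
    return () -> new Iterator<T>() {
      private final Iterator<? extends Iterable<T>> outer = iterables.iterator();
      private Iterator<T> current = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        while (!current.hasNext() && outer.hasNext()) {
          current = outer.next().iterator();
        }
        return current.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList("a.parquet", "b.parquet", "c.parquet");

    // Building the lazy pipeline opens nothing.
    filesOpened = 0;
    Iterable<Iterable<String>> tasks = transform(paths, LazyIterableSketch::lazyFile);
    Iterable<String> records = concat(tasks);
    System.out.println("opened after building: " + filesOpened);

    // Pulling the first record opens only the first file.
    Iterator<String> it = records.iterator();
    String first = it.next();
    System.out.println(first + ", opened: " + filesOpened);

    // Stream round-trip: collecting to a list reads every file up front.
    filesOpened = 0;
    List<List<String>> eager =
        paths.stream().map(LazyIterableSketch::readFile).collect(Collectors.toList());
    System.out.println("opened by stream collect: " + filesOpened + ", lists: " + eager.size());
  }
}
```

Running `main` should print `opened after building: 0`, then `a.parquet:row0, opened: 1`, then `opened by stream collect: 3, lists: 3`. This mirrors the quoted point: with iterables the consumer decides when each file is opened, while the stream version has to materialize a list before anything downstream can use it.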
   
   Cool. Thanks for the clarification. I'm not typically a Java developer by
   day (I mostly work in Scala, Python, Go, Node.js, and various flavors of
   automation "languages" and scripting, with some architectural support and
   code reviews in Java and _occasionally_ library support in Java), so I
   appreciate you taking the time to help clear up the differences for me. I'm
   not used to needing so many additional libraries (e.g. Guava, commons
   libraries) for functionality that Scala often provides out of the box.
   
   One of my goals in joining this project was to get more hands-on
   experience with practical Java, so as always I greatly appreciate you taking
   the time to clarify my question, @rdblue 👍


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


